Re: POC: GROUP BY optimization
On Thu, 31 Mar 2022 at 12:19, Tomas Vondra wrote:
> Pushed, after going through the patch once more, running check-world
> under valgrind, and updating the commit message.

I'm still working in this area and I noticed that db0d67db2 updated some
regression tests in partition_aggregate.out without any care as to what the
test was testing. The comment above the test reads:

-- Without ORDER BY clause, to test Gather at top-most path

and you've changed the expected plan from being a parallel plan with a
Gather to being a serial plan. So it looks like the test might have become
useless.

I see that the original plan appears to come back with some adjustments to
parallel_setup_cost and parallel_tuple_cost. It seems a bit strange to me
that the changes with this patch would cause a change of plan for this.
There is only 1 GROUP BY column in the query in question. There's no
rearrangement to do with a single-column GROUP BY.

David
RE: Support load balancing in libpq
Dear Jelte,

I like your idea. But do we have to sort randomly even if
target_session_attrs is set to 'primary' or 'read-write'? I think this
parameter can be used when all the listed servers have the same data, and
we can assume that one of the members is a primary and the others are
secondaries. In that case the user may well put the primary host at the
top of the list, so random sorting could increase the time it takes to
establish a connection.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
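To make the scenario concrete, here is a minimal libpq client for the setup
Kuroda-san describes. The hostnames are hypothetical, and the random host
ordering itself is the proposed patch under discussion, not current libpq
behavior; everything shown below is existing libpq API.

#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

int
main(void)
{
	/*
	 * Several candidate hosts; with target_session_attrs=primary, libpq
	 * tries each host in turn until it finds the primary.  If the primary
	 * is listed first, in-order connection attempts succeed immediately,
	 * whereas randomized ordering (the proposed behavior) could try one or
	 * more standbys, and fail, before reaching the primary.
	 */
	PGconn	   *conn =
	PQconnectdb("host=pg1.example.com,pg2.example.com,pg3.example.com "
				"dbname=postgres target_session_attrs=primary");

	if (PQstatus(conn) != CONNECTION_OK)
	{
		fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
		PQfinish(conn);
		return EXIT_FAILURE;
	}
	printf("connected to %s\n", PQhost(conn));
	PQfinish(conn);
	return EXIT_SUCCESS;
}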
Re: EINTR in ftruncate()
On Fri, Jul 15, 2022 at 9:34 AM Tom Lane wrote:
> (Someday we oughta go ahead and make our Windows signal API look more
> like POSIX, as I suggested back in 2015.  I'm still not taking
> point on that, though.)

For the sigprocmask() part, here's a patch that passes CI. Only the
SIG_SETMASK case is actually exercised by our current code, though.

One weird thing about our PG_SETMASK() macro is that you couldn't have
used its return value portably: on Windows we were returning the old mask
(like sigsetmask(), which has no way to report errors), and on Unix we
were returning 0/-1 (from sigprocmask(), ie the error we never checked).

From b21140e9b3f593b46c3bb4782c6bf84ca248dc15 Mon Sep 17 00:00:00 2001
From: Thomas Munro
Date: Fri, 15 Jul 2022 14:35:01 +1200
Subject: [PATCH] Emulate sigprocmask(), not sigsetmask(), on Windows.

Since commit a65e0864, we've required Unix systems to have
sigprocmask().  For Windows we still emulated the old deprecated 4.2BSD
function sigsetmask().  Emulate sigprocmask() instead, for consistency.

Discussion: https://postgr.es/m/3153247.1657834482%40sss.pgh.pa.us
---
 src/backend/port/win32/signal.c | 29 +++--
 src/include/libpq/pqsignal.h    | 11 +++
 2 files changed, 30 insertions(+), 10 deletions(-)

diff --git a/src/backend/port/win32/signal.c b/src/backend/port/win32/signal.c
index b71164d8db..53b93a50b2 100644
--- a/src/backend/port/win32/signal.c
+++ b/src/backend/port/win32/signal.c
@@ -112,7 +112,7 @@ pgwin32_signal_initialize(void)
 /*
  * Dispatch all signals currently queued and not blocked
  * Blocked signals are ignored, and will be fired at the time of
- * the pqsigsetmask() call.
+ * the pqsigprocmask() call.
  */
 void
 pgwin32_dispatch_queued_signals(void)
@@ -154,12 +154,29 @@ pgwin32_dispatch_queued_signals(void)
 
 /* signal masking. Only called on main thread, no sync required */
 int
-pqsigsetmask(int mask)
+pqsigprocmask(int how, const sigset_t *set, sigset_t *oset)
 {
-	int			prevmask;
+	if (oset)
+		*oset = pg_signal_mask;
 
-	prevmask = pg_signal_mask;
-	pg_signal_mask = mask;
+	if (!set)
+		return 0;
+
+	switch (how)
+	{
+		case SIG_BLOCK:
+			pg_signal_mask |= *set;
+			break;
+		case SIG_UNBLOCK:
+			pg_signal_mask &= ~*set;
+			break;
+		case SIG_SETMASK:
+			pg_signal_mask = *set;
+			break;
+		default:
+			errno = EINVAL;
+			return -1;
+	}
 
 	/*
 	 * Dispatch any signals queued up right away, in case we have unblocked
@@ -167,7 +184,7 @@ pqsigsetmask(int mask)
 	 */
 	pgwin32_dispatch_queued_signals();
 
-	return prevmask;
+	return 0;
 }
 
diff --git a/src/include/libpq/pqsignal.h b/src/include/libpq/pqsignal.h
index 41227a30e2..d17ddb787e 100644
--- a/src/include/libpq/pqsignal.h
+++ b/src/include/libpq/pqsignal.h
@@ -15,15 +15,18 @@
 
 #include <signal.h>
 
-#ifndef WIN32
 #define PG_SETMASK(mask)	sigprocmask(SIG_SETMASK, mask, NULL)
-#else
+
+#ifdef WIN32
 /* Emulate POSIX sigset_t APIs on Windows */
 typedef int sigset_t;
 
-extern int	pqsigsetmask(int mask);
+extern int	pqsigprocmask(int how, const sigset_t *set, sigset_t *oset);
 
-#define PG_SETMASK(mask)	pqsigsetmask(*(mask))
+#define SIG_BLOCK		1
+#define SIG_UNBLOCK		2
+#define SIG_SETMASK		3
+#define sigprocmask(how, set, oset) pqsigprocmask((how), (set), (oset))
 #define sigemptyset(set)	(*(set) = 0)
 #define sigfillset(set)		(*(set) = ~0)
 #define sigaddset(set, signum) (*(set) |= (sigmask(signum)))
-- 
2.36.1
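For readers less familiar with the POSIX interface being emulated here, a
minimal sketch of the usual save-and-restore pattern with sigprocmask()
follows; BlockSig is the signal set PostgreSQL sets up at startup, the
function name is illustrative, and error checking is elided since in-tree
callers traditionally assume these calls cannot fail:

#include <signal.h>

extern sigset_t BlockSig;		/* all blockable signals, set up at startup */

static void
do_critical_work(void)
{
	sigset_t	save_sigmask;

	/* Block signals, remembering the caller's mask rather than assuming it. */
	sigprocmask(SIG_SETMASK, &BlockSig, &save_sigmask);

	/* ... work that must not be interrupted by signal handlers ... */

	/* Restore exactly the mask we found, not a hard-coded UnBlockSig. */
	sigprocmask(SIG_SETMASK, &save_sigmask, NULL);
}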
Re: Allowing REINDEX to have an optional name
On Mon, Jul 04, 2022 at 03:59:55PM +0900, Michael Paquier wrote:
> Please note that patch authors should not switch a patch as RfC by
> themselves.  This is something that a reviewer should do.

This patch has been marked as waiting for a review, however the CF bot is
completely red:
http://commitfest.cputube.org/simon-riggs.html

Could you take care of those issues first?
--
Michael
Re: Collect ObjectAddress for ATTACH DETACH PARTITION to use in event trigger
On Fri, Jul 15, 2022 at 03:21:30AM +, kuroda.hay...@fujitsu.com wrote:
> Sounds good. I grepped the ATExecXXX() functions called in ATExecCmd(),
> and I confirmed that all returned values have been collected except those.
>
> While checking the test code for EVENT TRIGGER,
> I found there were no tests related to partitions there.
> How about adding them?

Agreed. It would be good to track down what changes once those
ObjectAddresses are collected.
--
Michael
RE: Collect ObjectAddress for ATTACH DETACH PARTITION to use in event trigger
Hi,

> > I noticed that we didn't collect the ObjectAddress returned by
> > ATExec[Attach|Detach]Partition. I think collecting this information can
> > make it easier for users to get the partition OID of the attached or
> > detached table in the event trigger. So how about collecting it like the
> > attached patch?
>
> Added to next CF.

Sounds good. I grepped the ATExecXXX() functions called in ATExecCmd(),
and I confirmed that all returned values have been collected except those.

While checking the test code for EVENT TRIGGER,
I found there were no tests related to partitions there.
How about adding them?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
RE: Collect ObjectAddress for ATTACH DETACH PARTITION to use in event trigger
On Wednesday, July 13, 2022 5:58 PM Hou, Zhijie wrote:
> Hi hackers,
>
> I noticed that we didn't collect the ObjectAddress returned by
> ATExec[Attach|Detach]Partition. I think collecting this information can
> make it easier for users to get the partition OID of the attached or
> detached table in the event trigger. So how about collecting it like the
> attached patch?

Added to next CF.

Best regards,
Hou zhijie
Re: [PATCH v1] remove redundant check of item pointer
On Fri, 15 Jul 2022 at 10:31, Bruce Momjian wrote:
> for non-Assert builds, ItemPointerGetOffsetNumberNoCheck() and
> ItemPointerGetOffsetNumber() are the same, so I don't see the point to
> making this change.  Frankly, I don't know why we even have two
> functions for this.  I am guessing ItemPointerGetOffsetNumberNoCheck is
> for cases where you have an Assert build and do not want the check.

We'll want to use ItemPointerGetOffsetNumberNoCheck() where the TID comes
from sources we can't verify, e.g. user input such as '(2,0)'::tid.

We want to use ItemPointerGetOffsetNumber() for item pointers that come
from locations that we want to ensure are correct, e.g. TIDs we're storing
in an index.

David
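As a sketch of the two cases David distinguishes (the helper functions here
are hypothetical; the two accessors are the real ones from
storage/itemptr.h):

#include "postgres.h"
#include "storage/itemptr.h"

/*
 * A value parsed from user input, e.g. '(2,0)'::tid, may carry an offset
 * of 0, which is not a valid offset number.  Reading it must not assert.
 */
static OffsetNumber
offset_from_user_tid(ItemPointer tid)
{
	return ItemPointerGetOffsetNumberNoCheck(tid);
}

/*
 * A TID about to be stored in an index is expected to be valid, so the
 * checked variant documents (and, in assert-enabled builds, enforces)
 * that expectation.
 */
static OffsetNumber
offset_from_index_tid(ItemPointer tid)
{
	return ItemPointerGetOffsetNumber(tid);
}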
Re: doc: Clarify Routines and Extension Membership
On Thu, Jul 14, 2022 at 2:41 PM Bruce Momjian wrote:
> On Fri, Jul 8, 2022 at 10:55:55PM -0400, Bruce Momjian wrote:
> > > The/that inconsistency ... choose one.  Or actually, the "an ... the"
> > > combination you used elsewhere doesn't grate on the ear either.
> > >
> > > +   For each extension, refuse to drop anything if any objects (other than the
> > > +   extensions listed) depend on it.  However, its own member objects, and routines
> > > +   that are explicitly dependent on this extension, are skipped.
> > > +   This is the default.
> > >
> > > "skipped" seems like a horrible choice of word; it could easily be read as
> > > "they don't get dropped".  I am not convinced that mentioning the member
> > > objects here is an improvement either.  In the first sentence you are
> > > treating each extension as a monolithic object; why not in the second?
> >
> > I created a modified patch based on this feedback; patch attached.  I
> > rewrote the last change.
>
> Patch applied to PG 13 and later, where extension dependency was added.
> Thank you for the patch.

Thank you, and apologies for being quiet here and on a few of the other
threads. I've been on vacation and flagged as ToDo some of the non-simple
feedback items that have come this way.

The change to restrict and the description in drop extension need to be
fixed up (the other pages look good):

"This option prevents the specified extensions from being dropped if there
exists non-extension-member objects that depends on any the extensions.
This is the default."

At minimum: "...that depend on any of the extensions."

I did just now confirm that if any of the named extensions fails to be
dropped, the entire command fails. There is no partial success mode.

I'd like to avoid "non-extension-member", and one of the main points is
that the routine dependency is member-like, not actual membership. Hence
the separate wording. I thus propose to replace the drop extension /
restrict paragraph with the following:

"This option prevents the specified extensions from being dropped if other
objects - besides these extensions, their members, and their explicitly
dependent routines - depend on them. This is the default."

Also, I'm thinking to change, on the same page (description):

"Dropping an extension causes its component objects,"

to be:

"Dropping an extension causes its member objects,"

I'm not sure why I originally chose component over member...

David J.
Re: Doc about how to set max_wal_senders when setting minimal wal_level
On Tue, Jul 5, 2022 at 08:02:33PM -0400, Tom Lane wrote:
> "Precondition" is an overly fancy word that makes things less clear
> not more so.  Does it mean that setting wal_level = minimal will fail
> if you don't do these other things, or does it just mean that you
> won't be getting the absolute minimum WAL volume?  If the former,
> I think it'd be better to say something like "To set wal_level to minimal,
> you must also set [these variables], which has the effect of disabling
> both WAL archiving and streaming replication."

I have created the attached patch to try to improve this text.

--
Bruce Momjian        https://momjian.us
EDB                  https://enterprisedb.com

Indecision is a decision. Inaction is an action. Mark Batterson

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 37fd80388c..964f9fb379 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -2764,9 +2764,10 @@ include_dir 'conf.d'
         levels. This parameter can only be set at server start.
-        In minimal level, no information is logged for
-        permanent relations for the remainder of a transaction that creates or
-        rewrites them. This can make operations much faster (see
+        The minimal level generates the least WAL
+        volume. It logs no row information for permanent relations
+        in transactions that create or
+        rewrite them. This can make operations much faster (see
         ). Operations that initiate this optimization include:
@@ -2778,19 +2779,18 @@ include_dir 'conf.d'
         REINDEX
         TRUNCATE
-        But minimal WAL does not contain enough information to reconstruct the
-        data from a base backup and the WAL logs, so replica or
-        higher must be used to enable WAL archiving
-        () and streaming replication.
+        However, minimal WAL does not contain sufficient information for
+        point-in-time recovery, so replica or
+        higher must be used to enable continuous archiving
+        () and streaming binary replication.
         Note that changing wal_level to
-        minimal makes any base backups taken before
-        unavailable for archive recovery and standby server, which may
-        lead to data loss.
+        minimal makes any base backups taken before this
+        unusable for point-in-time recovery and standby servers.
         In logical level, the same information is logged as
-        with replica, plus information needed to allow
-        extracting logical change sets from the WAL. Using a level of
+        with replica, plus information needed to
+        extract logical change sets from the WAL. Using a level of
         logical will increase the WAL volume, particularly if
         many tables are configured for REPLICA IDENTITY FULL
         and many UPDATE and DELETE statements are
Re: Fix typo in progress reporting doc
On Fri, Mar 11, 2022 at 04:07:32PM +0530, Nitin Jadhav wrote:
> Hi,
>
> I have observed that the table naming conventions used in
> 'progress-reporting.html' are not consistent across different
> sections.  For some cases "Phases" (Table 28.37. CREATE INDEX Phases)
> is used and for some cases "phases" (Table 28.35. ANALYZE phases) is
> used.  I have attached a patch to correct this.

Patch applied to PG 13, where it first appeared, and all later releases.
Thanks.

--
Bruce Momjian        https://momjian.us
EDB                  https://enterprisedb.com

Indecision is a decision. Inaction is an action. Mark Batterson
Re: doc: New cumulative stats subsystem obsoletes comment in maintenance.sgml
Hi,

I had missed David's original email on this topic...

On 2022-07-14 18:58:09 -0400, Bruce Momjian wrote:
> On Wed, Apr 20, 2022 at 04:40:44PM -0700, David G. Johnston wrote:
> > The new cumulative stats subsystem no longer has a "lost under heavy load"
> > problem so that parenthetical should go (or at least be modified).
> >
> > These stats can be reset so some discussion about how the system uses them
> > given that possibility seems like it would be good to add here.  I'm not
> > sure what that should look like though.
> >
> > diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
> > index 04a04e0e5f..360807c8f9 100644
> > --- a/doc/src/sgml/maintenance.sgml
> > +++ b/doc/src/sgml/maintenance.sgml
> > @@ -652,9 +652,8 @@ vacuum insert threshold = vacuum base insert threshold + vacuum insert scale fac
> >      tuples to be frozen by earlier vacuums.  The number of obsolete tuples and
> >      the number of inserted tuples are obtained from the cumulative statistics system;
> >      it is a semi-accurate count updated by each UPDATE,
> > -    DELETE and INSERT operation.  (It is
> > -    only semi-accurate because some information might be lost under heavy
> > -    load.)  If the relfrozenxid value of the table
> > +    DELETE and INSERT operation.
> > +    If the relfrozenxid value of the table
> >      is more than vacuum_freeze_table_age transactions old,
> >      an aggressive vacuum is performed to freeze old tuples and advance
> >      relfrozenxid; otherwise, only pages that have been modified
>
> Yes, I agree and plan to apply this patch soon.

It might make sense to still say semi-accurate, but adjust the explanation
to say that stats reporting is not instantaneous?

Greetings,

Andres Freund
Re: [PATCH v1] remove redundant check of item pointer
On Thu, Jul 14, 2022 at 3:59 PM Tom Lane wrote:
> Even in an assert-enabled build, wouldn't you expect the compiler to
> optimize away the second assertion as unreachable code?

I think that it probably would, even at -O0 (GCC doesn't really allow you
to opt out of all optimizations). I did think of that myself, but it
seemed rather beside the point.

There have been individual cases where individual assertions were deemed a
bit too heavyweight. But those have been few and far between. I myself
tend to use *lots* of technically-redundant assertions like this for
preconditions and postconditions. At worst they're code comments that are
all but guaranteed to stay current.

--
Peter Geoghegan
Re: [PATCH v1] remove redundant check of item pointer
Peter Geoghegan writes:
> The proposal doesn't seem like an improvement.  Technically the
> assertion cannot possibly fail here because the earlier assertion
> would always fail instead, so strictly speaking it is redundant -- at
> least right now.  That is true.  But it seems much more important to be
> consistent about which variant to use.  Especially because there is
> obviously no overhead in builds without assertions enabled.

Even in an assert-enabled build, wouldn't you expect the compiler to
optimize away the second assertion as unreachable code?

			regards, tom lane
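For reference, the function being debated (quoted in full further down this
digest) is ItemPointerEquals(); the annotations below explain the
redundancy, and are not part of the tree code:

/*
 * ItemPointerEquals, as it stands in itemptr.c.  Both checked accessors
 * Assert(ItemPointerIsValid(...)) on their argument, so by the time the
 * offset comparison runs, its assertions repeat the ones already made for
 * the block-number comparison; a failing second Assert is unreachable,
 * and a compiler may drop it even in assert-enabled builds.
 */
bool
ItemPointerEquals(ItemPointer pointer1, ItemPointer pointer2)
{
	if (ItemPointerGetBlockNumber(pointer1) ==
		ItemPointerGetBlockNumber(pointer2) &&
		ItemPointerGetOffsetNumber(pointer1) ==
		ItemPointerGetOffsetNumber(pointer2))
		return true;
	else
		return false;
}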
Re: doc: New cumulative stats subsystem obsoletes comment in maintenance.sgml
On Wed, Apr 20, 2022 at 04:40:44PM -0700, David G. Johnston wrote:
> Hackers,
>
> The new cumulative stats subsystem no longer has a "lost under heavy load"
> problem so that parenthetical should go (or at least be modified).
>
> These stats can be reset so some discussion about how the system uses them
> given that possibility seems like it would be good to add here.  I'm not
> sure what that should look like though.
>
> diff --git a/doc/src/sgml/maintenance.sgml b/doc/src/sgml/maintenance.sgml
> index 04a04e0e5f..360807c8f9 100644
> --- a/doc/src/sgml/maintenance.sgml
> +++ b/doc/src/sgml/maintenance.sgml
> @@ -652,9 +652,8 @@ vacuum insert threshold = vacuum base insert threshold + vacuum insert scale fac
>      tuples to be frozen by earlier vacuums.  The number of obsolete tuples and
>      the number of inserted tuples are obtained from the cumulative statistics system;
>      it is a semi-accurate count updated by each UPDATE,
> -    DELETE and INSERT operation.  (It is
> -    only semi-accurate because some information might be lost under heavy
> -    load.)  If the relfrozenxid value of the table
> +    DELETE and INSERT operation.
> +    If the relfrozenxid value of the table
>      is more than vacuum_freeze_table_age transactions old,
>      an aggressive vacuum is performed to freeze old tuples and advance
>      relfrozenxid; otherwise, only pages that have been modified

Yes, I agree and plan to apply this patch soon.

--
Bruce Momjian        https://momjian.us
EDB                  https://enterprisedb.com

Indecision is a decision. Inaction is an action. Mark Batterson
Re: pg_parameter_aclcheck() and trusted extensions
On Thu, Jul 14, 2022 at 06:03:45PM -0400, Tom Lane wrote:
> Nathan Bossart writes:
>> On Thu, Jul 14, 2022 at 04:02:30PM -0400, Tom Lane wrote:
>>> I noted something that ought to be looked at separately:
>>> validate_option_array_item() seems like it needs to be taught about
>>> grantable permissions on GUCs.  I think that right now it may report
>>> permissions failures in some cases where it should succeed.
>
>> Which cases do you think might be inappropriately reporting permissions
>> failures?  It looked to me like this stuff was mostly used for
>> pg_db_role_setting, which wouldn't be impacted by the current set of
>> grantable GUC permissions.  Is the idea that you should be able to do ALTER
>> ROLE SET for GUCs that you have SET permissions for?
>
> Well, that's what I'm wondering.  Obviously that wouldn't *alone* be
> enough permissions, but it seems like it could be a component of it.
> Specifically, this bit:
>
> 	/* manual permissions check so we can avoid an error being thrown */
> 	if (gconf->context == PGC_USERSET)
> 		/* ok */ ;
> 	else if (gconf->context == PGC_SUSET && superuser())
> 		/* ok */ ;
> 	else if (skipIfNoPermissions)
> 		return false;
>
> seems like it's trying to duplicate what set_config_option would do,
> and it's now missing a component of that.  If it shouldn't check
> per-GUC permissions along with superuser(), we should add a comment
> explaining why not.

I looked into this a bit closer. I found that having SET permissions on a
GUC seems to allow you to ALTER ROLE SET it to others.

	postgres=# CREATE ROLE test CREATEROLE;
	CREATE ROLE
	postgres=# CREATE ROLE other;
	CREATE ROLE
	postgres=# GRANT SET ON PARAMETER zero_damaged_pages TO test;
	GRANT
	postgres=# SET ROLE test;
	SET
	postgres=> ALTER ROLE other SET zero_damaged_pages = 'on';
	ALTER ROLE
	postgres=> SELECT * FROM pg_db_role_setting;
	 setdatabase | setrole |        setconfig
	-------------+---------+-------------------------
	           0 |   16385 | {zero_damaged_pages=on}
	(1 row)

However, ALTER ROLE RESET ALL will be blocked, while resetting only the
individual GUC will go through.

	postgres=> ALTER ROLE other RESET ALL;
	ALTER ROLE
	postgres=> SELECT * FROM pg_db_role_setting;
	 setdatabase | setrole |        setconfig
	-------------+---------+-------------------------
	           0 |   16385 | {zero_damaged_pages=on}
	(1 row)

	postgres=> ALTER ROLE other RESET zero_damaged_pages;
	ALTER ROLE
	postgres=> SELECT * FROM pg_db_role_setting;
	 setdatabase | setrole | setconfig
	-------------+---------+-----------
	(0 rows)

I think this is because GUCArrayReset() is the only caller of
validate_option_array_item() that sets skipIfNoPermissions to true. The
others fall through to set_config_option(), which does a
pg_parameter_aclcheck(). So, you are right.

Regarding whether SET privileges should be enough to allow ALTER ROLE SET,
I'm not sure I have an opinion yet. You would need WITH GRANT OPTION to be
able to grant SET to that role, but that's a bit different than altering
the setting for the role. You'll already have privileges to alter the role
(e.g., CREATEROLE), so requiring extra permissions to set GUCs on roles
feels like it might be excessive. But there might be a good argument for
it.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Re: Move Section 9.27.7 (Data Object Management Functions) to System Information Chapter
Bruce Momjian writes:
> On Mon, Apr 25, 2022 at 08:33:47AM -0700, David G. Johnston wrote:
>> Both the location and name of the linked to section make no sense to me:
>> https://www.postgresql.org/docs/current/functions-admin.html#FUNCTIONS-ADMIN-DBOBJECT
>> Neither of the tables listed there manage (cause to change) anything.
>> They are pure informational functions - size and path of objects
>> respectively.  It belongs in the previous chapter "System Information
>> Functions and Operators" with a different name.

> So, the section title is:
> 	9.27.7. Database Object Management Functions
> I think the idea is that they _help_ to manage database objects by
> reporting their size or location.  I do think it is in the right
> chapter, but maybe needs a better title?  I can't think of one.

I'm hesitant to move functions to a different documentation page without a
really solid reason. Just a couple days ago I fielded a complaint from
somebody who couldn't find string_to_array anymore because we'd moved it
from "array functions" to "string functions".

I'd be the first to say that the division between 9.26 and 9.27 is pretty
arbitrary ... but without a clearer demarcation rule, moving functions
between the two pages seems more likely to add confusion than subtract it.

			regards, tom lane
Re: [PATCH v1] remove redundant check of item pointer
On Thu, Jul 14, 2022 at 3:31 PM Bruce Momjian wrote:
> On Wed, Apr 27, 2022 at 08:04:00PM +0800, Junwang Zhao wrote:
> for non-Assert builds, ItemPointerGetOffsetNumberNoCheck() and
> ItemPointerGetOffsetNumber() are the same, so I don't see the point to
> making this change.  Frankly, I don't know why we even have two
> functions for this.  I am guessing ItemPointerGetOffsetNumberNoCheck is
> for cases where you have an Assert build and do not want the check.

Sometimes we use ItemPointerData for things that aren't actually TIDs.
For example, both GIN and B-Tree type-pun the ItemPointerData field from
the IndexTuple struct. Plus we do something like that with UPDATEs that
affect a partitioning key in a partitioned table.

The proposal doesn't seem like an improvement. Technically the assertion
cannot possibly fail here because the earlier assertion would always fail
instead, so strictly speaking it is redundant -- at least right now. That
is true. But it seems much more important to be consistent about which
variant to use. Especially because there is obviously no overhead in
builds without assertions enabled.

--
Peter Geoghegan
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
In addition to adding several new tests, the attached version 26 fixes a
major bug in constructing the view.

The only valid combination of IOPATH/IOOP that is not tested now is
IOPATH_STRATEGY + IOOP_WRITE. In most cases when I ran this in regress,
the checkpointer wrote out the dirty strategy buffer before VACUUM got
around to reusing and writing it out in my tests.

I've also changed the BACKEND_NUM_TYPES definition. Now arrays will have
that dead spot for B_INVALID, but I feel like it is much easier to
understand without trying to skip that spot and use those special helper
functions.

I also stopped adding rows to the view for WAL_RECEIVER and WAL_WRITER,
and, for IOPATH_LOCAL, for all BackendTypes except B_BACKEND and
WAL_SENDER.

On Tue, Jul 12, 2022 at 1:18 PM Andres Freund wrote:
> On 2022-07-11 22:22:28 -0400, Melanie Plageman wrote:
> > Yes, per an off list suggestion by you, I have changed the tests to use a
> > sum of writes.  I've also added a test for IOPATH_LOCAL and fixed some of
> > the missing calls to count IO Operations for IOPATH_LOCAL and
> > IOPATH_STRATEGY.
> >
> > I struggled to come up with a way to test that writes for a particular
> > type of backend are counted correctly since a dirty buffer could be
> > written out by another type of backend before the target BackendType has
> > a chance to write it out.
>
> I guess temp file writes would be reliably done by one backend... Don't
> have a good idea otherwise.

This was mainly an issue for IOPATH_STRATEGY writes, as I mentioned. I
still have not solved this.

> > I'm not sure how to cause a strategy "extend" for testing.
>
> COPY into a table should work.  But might be unattractive due to the size
> of the COPY ringbuffer.

Did it with a CTAS, as Horiguchi-san suggested.

> > > Would be nice to have something testing that the ringbuffer stats stuff
> > > does something sensible - that feels not entirely trivial.
> >
> > I've added a test to test that reused strategy buffers are counted as
> > allocs.  I would like to add a test which checks that if a buffer in the
> > ring is pinned and thus not reused, that it is not counted as a strategy
> > alloc, but I found it challenging without a way to pause vacuuming, pin
> > a buffer, then resume vacuuming.
>
> Yea, that's probably too hard to make reliable to be worth it.

Yes, I have skipped this.

- Melanie

From f7772e4d19821e0aeb19e906ba6f5e4bb046cfdb Mon Sep 17 00:00:00 2001
From: Melanie Plageman
Date: Wed, 29 Jun 2022 18:37:42 -0400
Subject: [PATCH v26 3/4] Track IO operation statistics

Introduce "IOOp", an IO operation done by a backend, and "IOPath", the
location or type of IO done by a backend.  For example, the checkpointer
may write a shared buffer out.  This would be counted as an IOOp "write"
on an IOPath IOPATH_SHARED by BackendType "checkpointer".

Each IOOp (alloc, extend, fsync, read, write) is counted per IOPath
(local, shared, or strategy) through a call to pgstat_count_io_op().

The primary concern of these statistics is IO operations on data blocks
during the course of normal database operations.  IO done by, for
example, the archiver or syslogger is not counted in these statistics.

IOPATH_LOCAL and IOPATH_SHARED IOPaths concern operations on local and
shared buffers.  The IOPATH_STRATEGY IOPath concerns buffers
alloc'd/extended/fsync'd/read/written as part of a BufferAccessStrategy.

IOOP_ALLOC is counted for IOPATH_SHARED and IOPATH_LOCAL whenever a
buffer is acquired through [Local]BufferAlloc().  IOOP_ALLOC for
IOPATH_STRATEGY is counted whenever a buffer already in the strategy
ring is reused.  And IOOP_WRITE for IOPATH_STRATEGY is counted whenever
the reused dirty buffer is written out.

Stats on IOOps for all IOPaths for a backend are initially accumulated
locally.  Later they are flushed to shared memory and accumulated with
those from all other backends, exited and live.  The accumulated stats
in shared memory could be extended in the future with per-backend stats
-- useful for per connection IO statistics and monitoring.

Some BackendTypes will not flush their pending statistics at regular
intervals and explicitly call pgstat_flush_io_ops() during the course of
normal operations to flush their backend-local IO Operation statistics
to shared memory in a timely manner.

Author: Melanie Plageman
Reviewed-by: Justin Pryzby, Kyotaro Horiguchi
Discussion: https://www.postgresql.org/message-id/flat/20200124195226.lth52iydq2n2uilq%40alap3.anarazel.de
---
 src/backend/postmaster/checkpointer.c        |  1 +
 src/backend/storage/buffer/bufmgr.c          | 53 -
 src/backend/storage/buffer/freelist.c        | 25 ++-
 src/backend/storage/buffer/localbuf.c        |  5 +
 src/backend/storage/sync/sync.c              |  2 +
 src/backend/utils/activity/Makefile          |  1 +
 src/backend/utils/activity/pgstat.c          | 36
 src/backend/utils/activity/pgstat_bgwriter.c |  7 +-
 .../utils/activity/pgstat_checkpointer.c
Re: Move Section 9.27.7 (Data Object Management Functions) to System Information Chapter
On Mon, Apr 25, 2022 at 08:33:47AM -0700, David G. Johnston wrote:
> Hi,
>
> Both the location and name of the linked to section make no sense to me:
>
> https://www.postgresql.org/docs/current/functions-admin.html#FUNCTIONS-ADMIN-DBOBJECT
>
> Neither of the tables listed there manage (cause to change) anything.
> They are pure informational functions - size and path of objects
> respectively.  It belongs in the previous chapter "System Information
> Functions and Operators" with a different name.

So, the section title is:

	9.27.7. Database Object Management Functions

I think the idea is that they _help_ to manage database objects by
reporting their size or location. I do think it is in the right chapter,
but maybe needs a better title? I can't think of one.

--
Bruce Momjian        https://momjian.us
EDB                  https://enterprisedb.com

Indecision is a decision. Inaction is an action. Mark Batterson
Re: [PATCH v1] remove redundant check of item pointer
On Wed, Apr 27, 2022 at 08:04:00PM +0800, Junwang Zhao wrote:
> In function ItemPointerEquals, the ItemPointerGetBlockNumber
> already checked the ItemPointer if valid, there is no need
> to check it again in ItemPointerGetOffset, so use
> ItemPointerGetOffsetNumberNoCheck instead.
>
> Signed-off-by: Junwang Zhao
> ---
>  src/backend/storage/page/itemptr.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/src/backend/storage/page/itemptr.c b/src/backend/storage/page/itemptr.c
> index 9011337aa8..61ad727b1d 100644
> --- a/src/backend/storage/page/itemptr.c
> +++ b/src/backend/storage/page/itemptr.c
> @@ -37,8 +37,8 @@ ItemPointerEquals(ItemPointer pointer1, ItemPointer pointer2)
>
>  	if (ItemPointerGetBlockNumber(pointer1) ==
>  		ItemPointerGetBlockNumber(pointer2) &&
> -		ItemPointerGetOffsetNumber(pointer1) ==
> -		ItemPointerGetOffsetNumber(pointer2))
> +		ItemPointerGetOffsetNumberNoCheck(pointer1) ==
> +		ItemPointerGetOffsetNumberNoCheck(pointer2))
>  		return true;
>  	else
>  		return false;

Looking at the code:

	/*
	 * ItemPointerGetOffsetNumberNoCheck
	 *		Returns the offset number of a disk item pointer.
	 */
	static inline OffsetNumber
	ItemPointerGetOffsetNumberNoCheck(const ItemPointerData *pointer)
	{
		return pointer->ip_posid;
	}

	/*
	 * ItemPointerGetOffsetNumber
	 *		As above, but verifies that the item pointer looks valid.
	 */
	static inline OffsetNumber
	ItemPointerGetOffsetNumber(const ItemPointerData *pointer)
	{
		Assert(ItemPointerIsValid(pointer));
		return ItemPointerGetOffsetNumberNoCheck(pointer);
	}

for non-Assert builds, ItemPointerGetOffsetNumberNoCheck() and
ItemPointerGetOffsetNumber() are the same, so I don't see the point to
making this change. Frankly, I don't know why we even have two functions
for this. I am guessing ItemPointerGetOffsetNumberNoCheck is for cases
where you have an Assert build and do not want the check.

--
Bruce Momjian        https://momjian.us
EDB                  https://enterprisedb.com

Indecision is a decision. Inaction is an action. Mark Batterson
Re: pg_parameter_aclcheck() and trusted extensions
Nathan Bossart writes:
> On Thu, Jul 14, 2022 at 04:02:30PM -0400, Tom Lane wrote:
>> I noted something that ought to be looked at separately:
>> validate_option_array_item() seems like it needs to be taught about
>> grantable permissions on GUCs.  I think that right now it may report
>> permissions failures in some cases where it should succeed.

> Which cases do you think might be inappropriately reporting permissions
> failures?  It looked to me like this stuff was mostly used for
> pg_db_role_setting, which wouldn't be impacted by the current set of
> grantable GUC permissions.  Is the idea that you should be able to do ALTER
> ROLE SET for GUCs that you have SET permissions for?

Well, that's what I'm wondering. Obviously that wouldn't *alone* be
enough permissions, but it seems like it could be a component of it.
Specifically, this bit:

	/* manual permissions check so we can avoid an error being thrown */
	if (gconf->context == PGC_USERSET)
		/* ok */ ;
	else if (gconf->context == PGC_SUSET && superuser())
		/* ok */ ;
	else if (skipIfNoPermissions)
		return false;

seems like it's trying to duplicate what set_config_option would do, and
it's now missing a component of that. If it shouldn't check per-GUC
permissions along with superuser(), we should add a comment explaining
why not.

			regards, tom lane
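Purely as a sketch of the kind of change being hinted at here (this is not
a proposal from the thread, and the helper function is hypothetical), the
manual check could consult the per-GUC ACL the same way set_config_option()
now does, using the existing pg_parameter_aclcheck() and ACL_SET:

#include "postgres.h"
#include "miscadmin.h"
#include "utils/acl.h"
#include "utils/guc_tables.h"

/*
 * Sketch only: would the current user be allowed to set this GUC?
 * Mirrors the manual check in validate_option_array_item(), extended
 * with the per-GUC ACL lookup that set_config_option() performs.
 */
static bool
user_can_set_guc(struct config_generic *gconf)
{
	if (gconf->context == PGC_USERSET)
		return true;			/* anyone may set USERSET GUCs */
	if (gconf->context == PGC_SUSET &&
		(superuser() ||
		 pg_parameter_aclcheck(gconf->name, GetUserId(),
							   ACL_SET) == ACLCHECK_OK))
		return true;			/* superuser, or granted SET on this GUC */
	return false;
}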
Re: pg_parameter_aclcheck() and trusted extensions
On Thu, Jul 14, 2022 at 04:02:30PM -0400, Tom Lane wrote:
> Here's a draft patch for that.  I initially ran around and changed all
> the set_config_option callers as I threatened before, but as I did it
> I could not help observing that they were all changing in exactly the
> same way: basically, they were passing GetUserId() if the GucSource
> is PGC_S_SESSION and BOOTSTRAP_SUPERUSERID otherwise.  Not counting
> guc.c internal call sites, there is a grand total of one caller that
> fails to fit the pattern.  So that brought me around to liking the idea
> of keeping set_config_option's API stable by making it a thin wrapper
> around another function with an explicit role argument.  The result,
> attached, poses far less of an API/ABI hazard than I was anticipating.
> If you're not poking into the GUC tables you have little to fear.
>
> Most of the bulk of this is mechanical changes to pass the source
> role around properly in guc.c's data structures.  That's all basically
> copy-and-paste from the code to track the source context (scontext).

At first glance, this looks pretty reasonable to me.

> I noted something that ought to be looked at separately:
> validate_option_array_item() seems like it needs to be taught about
> grantable permissions on GUCs.  I think that right now it may report
> permissions failures in some cases where it should succeed.

Which cases do you think might be inappropriately reporting permissions
failures? It looked to me like this stuff was mostly used for
pg_db_role_setting, which wouldn't be impacted by the current set of
grantable GUC permissions. Is the idea that you should be able to do
ALTER ROLE SET for GUCs that you have SET permissions for?

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Re: standby recovery fails (tablespace related) (tentative patch and discussion)
Here's a couple of fixups. 0001 is the same as before.

In 0002 I think CheckTablespaceDirectory ends up easier to read if we split
out the test for validity of the link.

Looking at that again, I think we don't need to piggyback on
ignore_invalid_pages, which is already a stretch, so let's not -- instead
we can use allow_in_place_tablespaces if users need a workaround. So
that's 0003 (this bit needs more than zero docs, however.)

0004 is straightforward: let's check for bad directories before logging
about consistent state.

After all this, I'm not sure what to think of dbase_redo. At line 3102, is
the directory supposed to exist or not? I'm confused as to what is the
expected state at that point. I rewrote this, but now I think my rewrite
continues to be confusing, so I'll have to think more about it.

Another aspect is the tests. Robert described a scenario where the
previously committed version of this patch created trouble. Do we have a
test case to cover that problematic case? I think we should strive to
cover it, if possible.

--
Álvaro Herrera               PostgreSQL Developer — https://www.EnterpriseDB.com/
"The eagle never lost so much time, as when he submitted
to learn of the crow." (William Blake)

From bdb691c2a86301e5369b3ae7f735fa5f7c29304d Mon Sep 17 00:00:00 2001
From: Alvaro Herrera
Date: Wed, 13 Jul 2022 18:14:18 +0200
Subject: [PATCH v25 1/4] Fix replay of create database records on standby

Crash recovery on standby may encounter missing directories when
replaying create database WAL records.  Prior to this patch, the standby
would fail to recover in such a case.  However, the directories could be
legitimately missing.  Consider a sequence of WAL records as follows:

    CREATE DATABASE
    DROP DATABASE
    DROP TABLESPACE

If, after replaying the last WAL record and removing the tablespace
directory, the standby crashes and has to replay the create database
record again, the crash recovery must be able to move on.

This patch allows missing tablespaces to be created during recovery
before reaching consistency.  The tablespaces are created as real
directories that should not exist but will be removed before reaching
consistency.  CheckRecoveryConsistency is responsible for making sure
they have disappeared.

Similar to the log_invalid_page mechanism, the GUC ignore_invalid_pages
turns the PANIC errors detected by this patch into WARNINGs, which
allows recovery to continue.

Diagnosed-by: Paul Guo
Author: Paul Guo
Author: Kyotaro Horiguchi
Author: Asim R Praveen
Discussion: https://postgr.es/m/CAEET0ZGx9AvioViLf7nbR_8tH9-=27dn5xwj2p9-roh16e4...@mail.gmail.com
---
 doc/src/sgml/config.sgml                    |   5 +-
 src/backend/access/transam/xlogrecovery.c   |  59
 src/backend/commands/dbcommands.c           |  71 +
 src/backend/commands/tablespace.c           |  28 +---
 src/backend/utils/misc/guc.c                |   8 +-
 src/include/access/xlogutils.h              |   2 +
 src/test/recovery/t/029_replay_tsp_drops.pl | 155
 7 files changed, 296 insertions(+), 32 deletions(-)
 create mode 100644 src/test/recovery/t/029_replay_tsp_drops.pl

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 37fd80388c..1e1c8c1cb7 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -11363,11 +11363,12 @@ LOG:  CleanUpLock: deleting: lock(0xb7acd844) id(24688,24696,0,0,0,1)
         If set to off (the default), detection of
-        WAL records having references to invalid pages during
+        WAL records having references to invalid pages or
+        WAL records resulting in invalid directory operations during
         recovery causes PostgreSQL to raise a PANIC-level error,
         aborting the recovery.
         Setting ignore_invalid_pages to on
-        causes the system to ignore invalid page references in WAL records
+        causes the system to ignore invalid actions caused by such WAL records
         (but still report a warning), and continue the recovery.
         This behavior may cause crashes, data loss, propagate or hide
         corruption, or other serious problems.

diff --git a/src/backend/access/transam/xlogrecovery.c b/src/backend/access/transam/xlogrecovery.c
index 5d6f1b5e46..ae81244e06 100644
--- a/src/backend/access/transam/xlogrecovery.c
+++ b/src/backend/access/transam/xlogrecovery.c
@@ -2008,6 +2008,57 @@ xlogrecovery_redo(XLogReaderState *record, TimeLineID replayTLI)
 	}
 }
 
+/*
+ * Makes sure that ./pg_tblspc directory doesn't contain a real directory.
+ *
+ * This is intended to be called after reaching consistency.
+ * ignore_invalid_pages=on turns the PANIC error into a WARNING so that
+ * recovery can continue.
+ *
+ * This can't be checked in allow_in_place_tablespaces mode, so skip it in
+ * that case.
+ */
+static void
+CheckTablespaceDirectory(void)
+{
+	char	   *tblspc_path = "./pg_tblspc";
+	DIR		   *dir;
+	struct dirent *de;
+
+	/* Do not check for this
Re: doc: Clarify Routines and Extension Membership
On Fri, Jul 8, 2022 at 10:55:55PM -0400, Bruce Momjian wrote:
> > The/that inconsistency ... choose one.  Or actually, the "an ... the"
> > combination you used elsewhere doesn't grate on the ear either.
> >
> > +   For each extension, refuse to drop anything if any objects (other than the
> > +   extensions listed) depend on it.  However, its own member objects, and routines
> > +   that are explicitly dependent on this extension, are skipped.
> > +   This is the default.
> >
> > "skipped" seems like a horrible choice of word; it could easily be read as
> > "they don't get dropped".  I am not convinced that mentioning the member
> > objects here is an improvement either.  In the first sentence you are
> > treating each extension as a monolithic object; why not in the second?
>
> I created a modified patch based on this feedback; patch attached.  I
> rewrote the last change.

Patch applied to PG 13 and later, where extension dependency was added.
Thank you for the patch.

--
Bruce Momjian        https://momjian.us
EDB                  https://enterprisedb.com

Indecision is a decision. Inaction is an action. Mark Batterson
Re: Introduce "log_connection_stages" setting.
On Thu, Jul 14, 2022, at 8:20 AM, Sergey Dudoladov wrote:
> I've taken connection stages and terminology from the existing log messages.
> The reason I have separated "authorized" and "authenticated" are [1]
> and [2] usages of "log_connections";
> "received" is mentioned at [3].

After checking the commit 9afffcb833d, I agree that you need 4 stages:
connected, authorized, authenticated, and disconnected.

> I have thought about enums too, but we need to cover arbitrary
> combinations of message types, for example log only "received" and
> "disconnected".
> Hence the proposed type "string" with individual values within the
> string really drawn from an enum.

Oops. I said enum but I meant string list.

> Are there any specific deprecation guidelines?  I have not found any
> after a quick search for GUC deprecation in Google and the commit history.
> A deprecation scheme could look like this:
> 1. Mention in the docs that "log_(dis)connections" are deprecated in
>    favor of "log_connection_stages".
> 2. Map "log_(dis)connections" to relevant values of
>    "log_connection_stages" in code if the latter is unset.
> 3. Complain in the logs if a conflict arises between the old params
>    and "log_connection_stages", with "log_connection_stages"
>    taking precedence.

No. AFAICS in this case, you just remove log_connections and
log_disconnections and create the new one (see for example commit
88e98230268, which replaced checkpoint_segments with min_wal_size and
max_wal_size). We don't generally keep ConfigureNames* entries for
deprecated GUCs -- unless you are renaming a GUC (see map_old_guc_names);
that's not the case here.

When we remove a GUC, we are introducing an incompatibility, so the only
place it will be mentioned is the release notes (there is a section called
"Migration to Version X" that lists all incompatibilities). From the
developer's point of view, you only need to mention in the commit message
that this commit is introducing an incompatibility. Hence, when it is time
to write the release notes, the information about the removal and the new
replacement will be added.

--
Euler Taveira
EDB   https://www.enterprisedb.com/
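For readers curious how a string-list GUC like the proposed
log_connection_stages is typically validated, here is a minimal sketch of a
check hook in the usual backend style; the GUC and the stage names are the
ones under discussion (not committed), while SplitIdentifierString() and
GUC_check_errdetail() are the existing backend helpers:

#include "postgres.h"
#include "nodes/pg_list.h"
#include "utils/guc.h"
#include "utils/varlena.h"

/* Sketch of a check hook for a comma-separated list of connection stages. */
static bool
check_log_connection_stages(char **newval, void **extra, GucSource source)
{
	char	   *rawstring;
	List	   *elemlist;
	ListCell   *l;
	bool		ok = true;

	/* Need a modifiable copy; SplitIdentifierString scribbles on its input. */
	rawstring = pstrdup(*newval);

	if (!SplitIdentifierString(rawstring, ',', &elemlist))
	{
		GUC_check_errdetail("List syntax is invalid.");
		ok = false;
	}
	else
	{
		foreach(l, elemlist)
		{
			char	   *stage = (char *) lfirst(l);

			/* Accept only the stage names proposed in this thread. */
			if (strcmp(stage, "received") != 0 &&
				strcmp(stage, "authenticated") != 0 &&
				strcmp(stage, "authorized") != 0 &&
				strcmp(stage, "disconnected") != 0)
			{
				GUC_check_errdetail("Unrecognized connection stage: \"%s\".",
									stage);
				ok = false;
				break;
			}
		}
	}

	pfree(rawstring);
	list_free(elemlist);
	return ok;
}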
Re: EINTR in ftruncate()
Thomas Munro writes:
> So why would I add another wrapper like PG_SETMASK and leave it
> unimplemented for now on Windows, when I could just use sigprocmask()
> directly and leave it unimplemented for now on Windows?

Fair enough, I guess. No objection to this patch.

(Someday we oughta go ahead and make our Windows signal API look more
like POSIX, as I suggested back in 2015. I'm still not taking
point on that, though.)

			regards, tom lane
Re: EINTR in ftruncate()
On Fri, Jul 15, 2022 at 3:27 AM Tom Lane wrote:
> Alvaro Herrera writes:
> > ISTM it would be cleaner to patch PG_SETMASK to have a second argument
> > and to return the original mask if that's not NULL.  This is more
> > invasive, but there aren't that many callsites of that macro.
>
> [ shoulda read your message before replying ]
>
> Given that this needs back-patched, I think changing PG_SETMASK
> is a bad idea: there might be outside callers.  However, we could
> add another macro with the additional argument.  PG_GET_AND_SET_MASK?

It's funny though, the reason we had PG_SETMASK in the first place is not
for Windows. Ancient commit 47937403676 added that for long gone
pre-POSIX systems like NeXTSTEP, which only had single-argument
sigsetmask(), not sigprocmask(). In general on Windows we're emulating
POSIX signal interfaces with normal names like sigemptyset() etc; it just
so happens that we chose to emulate that pre-standard sigsetmask()
interface (as you complained about in the commit message for a65e0864).

So why would I add another wrapper like PG_SETMASK and leave it
unimplemented for now on Windows, when I could just use sigprocmask()
directly and leave it unimplemented for now on Windows?

The only reason I can think of for a wrapper is to provide a place to
check the return code and ereport (panic?). That seems to be of limited
value (how can it fail ... bad "how" value, or a sandbox denying some
system calls, ...?). I did make sure to preserve the errno though; even
though we're assuming these calls can't fail by long standing tradition,
I didn't feel like additionally assuming that successful calls don't
clobber errno.

I guess, coded like this, it should also be safe to do it in the
postmaster, but I think maybe we should leave it conditional, rather than
relying on BlockSig being initialised and sane during early postmaster
initialisation.

From 17eb588af303873dbd76d0f24b536f74747cb3bf Mon Sep 17 00:00:00 2001
From: Thomas Munro
Date: Fri, 15 Jul 2022 09:00:12 +1200
Subject: [PATCH] Make dsm_impl_resize more future-proof.

Commit 4518c798 blocks signals for a short region of code, but it
assumed that whatever calls it had the mask set to UnBlockSig on entry.
That may be true today, but it would be more resilient to save and
restore the caller's signal mask.

Previously we couldn't do this because our macro for abstracting signal
mask changes was based on the old pre-POSIX sigsetmask(), due to 1990s
portability concerns.  Since this is a POSIX-only code path, and since
a65e0864 established that supported Unix systems must have
sigprocmask(), we can just use sigprocmask() directly.

It would also be implementable for Windows, but that's not needed for
this.

Back-patch to all releases, like 4518c798 and 80845b7c.

Reviewed-by: Alvaro Herrera
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/CA%2BhUKGKx6Biq7_UuV0kn9DW%2B8QWcpJC1qwhizdtD9tN-fn0H0g%40mail.gmail.com
---
 src/backend/storage/ipc/dsm_impl.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/src/backend/storage/ipc/dsm_impl.c b/src/backend/storage/ipc/dsm_impl.c
index 0247d13a91..69c6df75b4 100644
--- a/src/backend/storage/ipc/dsm_impl.c
+++ b/src/backend/storage/ipc/dsm_impl.c
@@ -49,6 +49,7 @@
 #include "postgres.h"
 
 #include <fcntl.h>
+#include <signal.h>
 #include <unistd.h>
 #ifndef WIN32
 #include <sys/mman.h>
 #endif
@@ -62,7 +63,7 @@
 #endif
 
 #include "common/file_perm.h"
-#include "libpq/pqsignal.h"		/* for PG_SETMASK macro */
+#include "libpq/pqsignal.h"
 #include "miscadmin.h"
 #include "pgstat.h"
 #include "portability/mem.h"
@@ -355,6 +356,7 @@ dsm_impl_posix_resize(int fd, off_t size)
 {
 	int			rc;
 	int			save_errno;
+	sigset_t	save_sigmask;
 
 	/*
 	 * Block all blockable signals, except SIGQUIT.  posix_fallocate() can run
@@ -363,7 +365,7 @@ dsm_impl_posix_resize(int fd, off_t size)
 	 * conflicts), the retry loop might never succeed.
 	 */
 	if (IsUnderPostmaster)
-		PG_SETMASK(&BlockSig);
+		sigprocmask(SIG_SETMASK, &BlockSig, &save_sigmask);
 
 	pgstat_report_wait_start(WAIT_EVENT_DSM_ALLOCATE);
 #if defined(HAVE_POSIX_FALLOCATE) && defined(__linux__)
@@ -402,7 +404,7 @@ dsm_impl_posix_resize(int fd, off_t size)
 	if (IsUnderPostmaster)
 	{
 		save_errno = errno;
-		PG_SETMASK(&UnBlockSig);
+		sigprocmask(SIG_SETMASK, &save_sigmask, NULL);
 		errno = save_errno;
 	}
 
-- 
2.30.2
Re: Issue with recovery_target = 'immediate'
On 7/14/22 04:26, Kyotaro Horiguchi wrote:
> At Wed, 13 Jul 2022 14:41:40 -0400, David Steele wrote in
>> While it is certainly true that timeline 2 cannot be replayed to from
>> timeline 1, it should not matter for an immediate recovery that stops at
>> consistency.  No timeline switch will occur until promotion.  Of course
>> the cluster could be shut down before promotion and the target changed,
>> but in that case couldn't the timeline be adjusted at that point?
>>
>> This works by default for PostgreSQL < 12 because the default timeline
>> is current.  Since PostgreSQL 12 the default has been latest, which does
>> not work by default.  When a user does a number of recoveries it is
>> pretty common for the timelines to get in a state that will make most
>> subsequent recoveries fail.
>>
>> We think it makes sense for recovery_target = 'immediate' to be a fail
>> safe that always works no matter the state of the latest timeline.  Our
>> solution has been to force recovery_target_timeline = 'current' when
>> recovery_target = 'immediate', but it seems like this is something that
>> should be done in PostgreSQL instead.  Thoughts?
>
> I think it is natural that recovery by default targets the most recent
> update.  In that sense, at the time the admin decided to recover the
> server from the first backup, the second backup is kind of dead, at
> least which should be forgotten in the future operation.

Well, I dislike the idea of a dead backup. Certainly no backup can be
followed along all timelines, but it should still be recoverable.

> Even if we want "any" backup usable, just re-targeting to the current
> timeline after the timeline error looks kind of inconsistent to the
> behavior mentioned above.  To make "dead" backups behave like the "live"
> ones, we would need to check if the backup is in the history of each
> "future" timeline, then choose the latest timeline from them.

I think this makes sense for non-immediate targets. The idea is that
recovering to the "latest" timeline would actually recover to the latest
timeline that is valid for the backup. Is that what you had in mind?

However, for immediate targets, only the current timeline makes sense, so
I feel like it would be better to simply force the current timeline.

> # Mmm. I remember about a recent patch for pg_rewind to do the same...

Do you have a link for this?

Regards,
-David
Re: Making the subquery alias optional in the FROM clause
Dean Rasheed writes:
> Attached is an update with the following changes:
> * Docs updated as suggested.
> * transformLockingClause() updated to skip subquery and values RTEs
>   without aliases.
> * eref->aliasname changed to "unnamed_subquery" for subqueries without
>   aliases.

This looks good to me. Marked RFC.

			regards, tom lane
Re: [PATCH] Log details for client certificate failures
On Thu, Jul 14, 2022 at 1:12 PM Peter Eisentraut wrote:
> Concretely, I was thinking like the attached top-up patch.
>
> The other way can surely be made to work somehow, but this seems much
> simpler and with fewer questions about the details.

Ah, seeing it side-by-side helps. That's much easier, I agree.

Thanks,
--Jacob
Re: [PATCH] Log details for client certificate failures
On 13.07.22 01:06, Jacob Champion wrote:
>> I had to read up on this "ex_data" API.  Interesting.  But I'm wondering
>> a bit about how the life cycle of these objects is managed.  What happens
>> if the allocated error string is deallocated before its containing
>> object?  Or vice versa?
>
> Yeah, I'm currently leaning heavily on the lack of any memory context
> switches here.  And I end up leaking out a pointer to the stale stack of
> be_tls_open_server(), which is gross -- it works since there are no other
> clients, but that could probably come back to bite us.
>
> The ex_data API exposes optional callbacks for new/dup/free (I'm
> currently setting those to NULL), so we can run custom code whenever the
> SSL* is destroyed.  If you'd rather the data have the same lifetime of
> the SSL* object, we can switch to malloc/strdup/free (or even
> OPENSSL_strdup() in later versions).
>
> But since we don't have any use for the ex_data outside of this function,
> maybe we should just clear it before we return, rather than carrying it
> around.

Concretely, I was thinking like the attached top-up patch.

The other way can surely be made to work somehow, but this seems much
simpler and with fewer questions about the details.

From 662bc1d19101865f91c6ed4a98c0f13f3757e1c7 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Thu, 14 Jul 2022 22:08:53 +0200
Subject: [PATCH v5 2/2] squash! Log details for client certificate failures

---
 src/backend/libpq/be-secure-openssl.c | 29 +++-------------------------
 1 file changed, 5 insertions(+), 24 deletions(-)

diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 8d5eee0d16..2147524ffc 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -81,13 +81,7 @@ static bool ssl_is_server_start;
 static int	ssl_protocol_version_to_openssl(int v);
 static const char *ssl_protocol_version_to_string(int v);
 
-/* Helpers for storing application data in the SSL handle */
-struct pg_ex_data
-{
-	/* to bubble errors out of callbacks */
-	char	   *errdetail;
-};
-static int	ssl_ex_index;
+static const char *cert_errdetail;
 
 /* ------------------------------------------------------------ */
 /*						 Public interface						  */
@@ -110,7 +104,6 @@ be_tls_init(bool isServerStart)
 		SSL_library_init();
 		SSL_load_error_strings();
 #endif
-		ssl_ex_index = SSL_get_ex_new_index(0, NULL, NULL, NULL, NULL);
 		SSL_initialized = true;
 	}
 
@@ -422,7 +415,6 @@ be_tls_open_server(Port *port)
 	int			err;
 	int			waitfor;
 	unsigned long ecode;
-	struct pg_ex_data exdata = {0};
 	bool		give_proto_hint;
 
 	Assert(!port->ssl);
@@ -455,14 +447,6 @@ be_tls_open_server(Port *port)
 						SSLerrmessage(ERR_get_error()))));
 		return -1;
 	}
-	if (!SSL_set_ex_data(port->ssl, ssl_ex_index, &exdata))
-	{
-		ereport(COMMERROR,
-				(errcode(ERRCODE_PROTOCOL_VIOLATION),
-				 errmsg("could not attach application data to SSL handle: %s",
-						SSLerrmessage(ERR_get_error()))));
-		return -1;
-	}
 
 	port->ssl_in_use = true;
 
@@ -561,7 +545,7 @@ be_tls_open_server(Port *port)
 						(errcode(ERRCODE_PROTOCOL_VIOLATION),
 						 errmsg("could not accept SSL connection: %s",
 								SSLerrmessage(ecode)),
-						 exdata.errdetail ? errdetail_internal("%s", exdata.errdetail) : 0,
+						 cert_errdetail ? errdetail_internal("%s", cert_errdetail) : 0,
 						 give_proto_hint ?
 						 errhint("This may indicate that the client does not support any SSL protocol version between %s and %s.",
								 ssl_min_protocol_version ?
								 ssl_protocol_version_to_string(ssl_min_protocol_version) :
								 MIN_OPENSSL_TLS_VERSION,
								 ssl_max_protocol_version ?
								 ssl_protocol_version_to_string(ssl_max_protocol_version) :
								 MAX_OPENSSL_TLS_VERSION) : 0));
+				cert_errdetail = NULL;
				break;
			case SSL_ERROR_ZERO_RETURN:
				ereport(COMMERROR,
@@ -1145,12 +1130,10 @@ truncate_cert_name(char *name)
 static int
 verify_cb(int ok,
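For readers unfamiliar with the ex_data lifecycle question raised above,
here is a minimal sketch, independent of the patch, of how a free callback
can tie heap-allocated ex_data to the lifetime of the SSL object; the index
variable and function names are illustrative, while the OpenSSL calls and
callback signature are the real API:

#include <stdlib.h>
#include <string.h>
#include <openssl/ssl.h>

static int errdetail_ex_index = -1;

/* Called by OpenSSL when the owning SSL* is freed; releases our string. */
static void
errdetail_free_cb(void *parent, void *ptr, CRYPTO_EX_DATA *ad,
				  int idx, long argl, void *argp)
{
	free(ptr);
}

static void
init_ex_data(void)
{
	/* Register once; the free callback runs at SSL_free() time. */
	errdetail_ex_index = SSL_get_ex_new_index(0, NULL, NULL, NULL,
											  errdetail_free_cb);
}

static void
set_errdetail(SSL *ssl, const char *msg)
{
	/* strdup() so the string outlives the caller's stack frame. */
	SSL_set_ex_data(ssl, errdetail_ex_index, strdup(msg));
}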
Re: System catalog documentation chapter
On Tue, Jul 12, 2022 at 05:16:41PM -0400, Bruce Momjian wrote:
> On Tue, Jul 12, 2022 at 08:56:01PM +0200, Peter Eisentraut wrote:
> > On 08.07.22 18:07, Bruce Momjian wrote:
> > > so I guess we can backpatch this with no issues.
> >
> > It inserts a new chapter, which would renumber all other chapters.
> > That's a pretty big change to backpatch.  I'm against that.
>
> Okay, I can see the renumbering as being confusing so I will do PG 15
> and head only.

Patch applied to PG 15 and master.

--
Bruce Momjian        https://momjian.us
EDB                  https://enterprisedb.com

Indecision is a decision. Inaction is an action. Mark Batterson
Re: [RFC] building postgres with meson -v9
Hi,

On 2022-07-07 12:09:32 +0200, Peter Eisentraut wrote:
> On 06.07.22 15:21, Andres Freund wrote:
> > > Here is my rough assessment of where we are with this patch set:
> > >
> > > 08b4330ded prereq: deal with \ paths in basebackup_to_shell tests.
> > >
> > > This still needs clarification, per my previous review.
> >
> > Hm.  I thought I had explained that bit, but apparently not.  Well, it's
> > pretty simple - without this, the tests fail on Windows for me, as soon
> > as one of the binaries is in a directory with spaces (which is common on
> > Windows).  Imagine what happens with e.g.
> >     qq{$gzip --fast > "$escaped_backup_path%f.gz"}
> > if $gzip contains spaces.
> >
> > This doesn't happen currently on CI because nothing runs these tests on
> > Windows yet.
>
> Hmm, maybe this patch looked different the last time I saw it.

Don't think it had changed since I wrote it first.

> The quoting of "$gzip" is clearly necessary.
>
> What about the backslash replacements s{\\}{/}g ?  Is that also required
> for passing the path through the shell?

I don't recall the details, but it's definitely needed for embedding it
into postgresql.conf. We could double the escapes instead, but that
doesn't seem an improvement (and wouldn't work when passing it to the
shell anymore).

> If so, the treatment of $tar in that way doesn't seem necessary, since
> that doesn't get called through an intermediate shell.  (That would then
> also explain why $gzip in the pg_basebackup tests doesn't require that
> treatment, which had previously confused me.)

Yea, it doesn't immediately look like it's needed, and the test passes
without it. I guess I might just have tried to be complete...

Greetings,

Andres Freund
Re: pg_parameter_aclcheck() and trusted extensions
I wrote: > Michael Paquier writes: >> Looks like a bug to me, so I have added an open item assigned to Tom. > Yeah. So the fix here seems pretty obvious: rather than applying the > permissions check using bare GetUserId(), we need to remember the role > OID that originally applied the setting, and use that. Here's a draft patch for that. I initially ran around and changed all the set_config_option callers as I threatened before, but as I did it I could not help observing that they were all changing in exactly the same way: basically, they were passing GetUserId() if the GucContext is PGC_S_SESSION and BOOTSTRAP_SUPERUSERID otherwise. Not counting guc.c internal call sites, there is a grand total of one caller that fails to fit the pattern. So that brought me around to liking the idea of keeping set_config_option's API stable by making it a thin wrapper around another function with an explicit role argument. The result, attached, poses far less of an API/ABI hazard than I was anticipating. If you're not poking into the GUC tables you have little to fear. Most of the bulk of this is mechanical changes to pass the source role around properly in guc.c's data structures. That's all basically copy-and-paste from the code to track the source context (scontext). I noted something that ought to be looked at separately: validate_option_array_item() seems like it needs to be taught about grantable permissions on GUCs. I think that right now it may report permissions failures in some cases where it should succeed. regards, tom lane diff --git a/src/backend/commands/extension.c b/src/backend/commands/extension.c index 3db859c3ea..6b6720c690 100644 --- a/src/backend/commands/extension.c +++ b/src/backend/commands/extension.c @@ -912,6 +912,9 @@ execute_extension_script(Oid extensionOid, ExtensionControlFile *control, * We use the equivalent of a function SET option to allow the setting to * persist for exactly the duration of the script execution. guc.c also * takes care of undoing the setting on error. + * + * log_min_messages can't be set by ordinary users, so for that one we + * pretend to be superuser. 
*/ save_nestlevel = NewGUCNestLevel(); @@ -920,9 +923,10 @@ execute_extension_script(Oid extensionOid, ExtensionControlFile *control, PGC_USERSET, PGC_S_SESSION, GUC_ACTION_SAVE, true, 0, false); if (log_min_messages < WARNING) - (void) set_config_option("log_min_messages", "warning", - PGC_SUSET, PGC_S_SESSION, - GUC_ACTION_SAVE, true, 0, false); + (void) set_config_option_ext("log_min_messages", "warning", + PGC_SUSET, PGC_S_SESSION, + BOOTSTRAP_SUPERUSERID, + GUC_ACTION_SAVE, true, 0, false); /* * Similarly disable check_function_bodies, to ensure that SQL functions diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c index 0328029d43..9e6cb590c7 100644 --- a/src/backend/utils/misc/guc.c +++ b/src/backend/utils/misc/guc.c @@ -5172,7 +5172,8 @@ static void reapply_stacked_values(struct config_generic *variable, struct config_string *pHolder, GucStack *stack, const char *curvalue, - GucContext curscontext, GucSource cursource); + GucContext curscontext, GucSource cursource, + Oid cursrole); static void ShowGUCConfigOption(const char *name, DestReceiver *dest); static void ShowAllGUCConfig(DestReceiver *dest); static char *_ShowOption(struct config_generic *record, bool use_units); @@ -5897,10 +5898,10 @@ InitializeWalConsistencyChecking(void) check_wal_consistency_checking_deferred = false; - set_config_option("wal_consistency_checking", - wal_consistency_checking_string, - PGC_POSTMASTER, guc->source, - GUC_ACTION_SET, true, ERROR, false); + set_config_option_ext("wal_consistency_checking", + wal_consistency_checking_string, + guc->scontext, guc->source, guc->srole, + GUC_ACTION_SET, true, ERROR, false); /* checking should not be deferred again */ Assert(!check_wal_consistency_checking_deferred); @@ -5979,6 +5980,8 @@ InitializeOneGUCOption(struct config_generic *gconf) gconf->reset_source = PGC_S_DEFAULT; gconf->scontext = PGC_INTERNAL; gconf->reset_scontext = PGC_INTERNAL; + gconf->srole = BOOTSTRAP_SUPERUSERID; + gconf->reset_srole = BOOTSTRAP_SUPERUSERID; gconf->stack = NULL; gconf->extra = NULL; gconf->last_reported = NULL; @@ -6356,6 +6359,7 @@ ResetAllOptions(void) gconf->source = gconf->reset_source; gconf->scontext = gconf->reset_scontext; + gconf->srole = gconf->reset_srole; if (gconf->flags & GUC_REPORT) { @@ -6401,6 +6405,7 @@ push_old_value(struct config_generic *gconf, GucAction action) { /* SET followed by SET LOCAL, remember SET's value */ stack->masked_scontext = gconf->scontext; + stack->masked_srole = gconf->srole; set_stack_value(gconf, >masked);
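The wrapper pattern described above would look roughly like this (a sketch, not the exact committed code; set_config_option() keeps its old signature while set_config_option_ext() takes the role explicitly, with the role-defaulting rule stated in the mail):

int
set_config_option(const char *name, const char *value,
				  GucContext context, GucSource source,
				  GucAction action, bool changeVal, int elevel,
				  bool is_reload)
{
	Oid			srole;

	/*
	 * Per the rule above: session-sourced settings act as the current
	 * user, everything else as the bootstrap superuser.
	 */
	srole = (source == PGC_S_SESSION) ? GetUserId() : BOOTSTRAP_SUPERUSERID;

	return set_config_option_ext(name, value, context, source, srole,
								 action, changeVal, elevel, is_reload);
}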
Re: doc: Clarify Savepoint Behavior
On Sat, Jul 9, 2022 at 12:59:23PM -0400, Bruce Momjian wrote: > On Sun, Jun 26, 2022 at 09:14:56AM -0700, David G. Johnston wrote: > > So leave the "release" behavior implied from the rollback behavior? > > > > On the whole I'm slightly in favor of your proposed wording (mostly due to > > the > > better fitting of the ROLLBACK command, though at the omission of > > RELEASE...) > > but are you seeing anything beyond personal style as to why you feel one is > > better than the other? Is there some existing wording in the docs that I > > should be conforming to here? > > I have developed the attached patch based on the discussion here. I > tried to simplify the language and example to clarify the intent. > > I was confused why the first part of the patch added a mention of > releasing savepoints to the ROLLBACK TO manual page --- I have removed > that and improved the text in RELEASE SAVEPOINT. Patch applied to all supported versions. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Indecision is a decision. Inaction is an action. Mark Batterson
Re: doc: Clarify what "excluded" represents for INSERT ON CONFLICT
On Fri, Jul 8, 2022 at 11:18:43PM -0400, Bruce Momjian wrote: > On Fri, Jul 1, 2022 at 08:11:36AM -0700, David G. Johnston wrote: > > That said, I still think that the current wording should be tweak with > > respect > > to row vs. rows (especially if we continue to call it a table): > > > > Current: > > "The SET and WHERE clauses in ON CONFLICT DO UPDATE have access to the > > existing row using the table's name (or an alias), and to [rows] proposed > > for insertion using the special excluded table." > > > > Change [rows] to: > > > > "the row" > > > > > > I'm undecided whether "FROM excluded" should be something that works - but I > > also don't think it would actually be used in any case. > > I found two places where a singular "row" would be better, doc patch > attached. Patch applied to all supported versions. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Indecision is a decision. Inaction is an action. Mark Batterson
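For readers skimming the thread, a minimal example of the singular-row semantics being documented (hypothetical table):

CREATE TABLE counters (name text PRIMARY KEY, hits int);

INSERT INTO counters (name, hits) VALUES ('home', 1)
ON CONFLICT (name)
DO UPDATE SET hits = counters.hits + excluded.hits;

Here "counters" refers to the existing row that caused the conflict, while "excluded" exposes the single row that was proposed for insertion.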
Re: doc: Bring mention of unique index forced transaction wait behavior outside of the internal section
On Mon, Jul 11, 2022 at 05:22:41PM +0300, Aleksander Alekseev wrote: > Hi Bruce, > > > I was not happy with putting this in the Transaction Isolation section. > > I rewrote it and put it in the INSERT secion, right before ON CONFLICT; > > patch attached. > > Looks good. Applied to all supported PG versions. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Indecision is a decision. Inaction is an action. Mark Batterson
Re: Pre-allocating WAL files
On Fri, Apr 08, 2022 at 01:30:03PM -0700, Nathan Bossart wrote: > On Thu, Mar 17, 2022 at 04:12:12PM -0700, Nathan Bossart wrote: >> It seems unlikely that this will be committed for v15, so I've adjusted the >> commitfest entry to v16 and moved it to the next commitfest. > > rebased It's now been over a year since I first posted a patch in this thread, and I still sense very little interest for this feature. I intend to mark it as Withdrawn at the end of this commitfest. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: optimize lookups in snapshot [sub]xip arrays
Hi Bharath, Thanks for taking a look. On Thu, Jul 14, 2022 at 03:10:56PM +0530, Bharath Rupireddy wrote: > Aren't these snapshot arrays always sorted? I see the following code: > > /* sort so we can bsearch() */ > qsort(snapshot->xip, snapshot->xcnt, sizeof(TransactionId), xidComparator); > > /* sort so we can bsearch() later */ > qsort(snap->subxip, snap->subxcnt, sizeof(TransactionId), xidComparator); AFAICT these arrays are sorted in limited cases, such as pg_current_snapshot() and logical replication. GetSnapshotData() does not appear to sort them, so I don't think we can always assume they are sorted. In the previous thread, Tomas analyzed simply sorting the arrays [0] and found that it provided much less improvement compared to the hash table approach, so I have not seriously considered it here. > If the ordering isn't an invariant of these snapshot arrays, can we > also use the hash table mechanism for all of the snapshot arrays > infrastructure rather than qsort+bsearch in a few places and hash > table for others? Unless there is demonstrable benefit in doing so for the few places that sort the arrays, I'm skeptical it's worth the complexity. This patch is targeted at XidInMVCCSnapshot(), which we can demonstrate has a clear impact on TPS for some workloads. > + * The current value worked well in testing, but it's still mostly a > guessed-at > + * number that might need updating in the future. > + */ > +#define XIP_HASH_MIN_ELEMENTS (128) > + > > Do you see a regression with a hash table for all the cases? Why can't > we just build a hash table irrespective of these limits and use it for > all the purposes instead of making it complex with different > approaches if we don't have measurable differences in the performance > or throughput? I performed the same tests as before with a variety of values. Here are the results (columns are values of XIP_HASH_MIN_ELEMENTS):

writers   HEAD    1     16    32    64    128
16        659     698   678   659   665   664
32        645     661   688   657   649   663
64        659     656   653   649   663   692
128       641     636   639   679   643   716
256       619     641   619   643   653   610
512       530     609   582   602   605   702
768       469     610   608   551   571   582
1000      367     610   538   557   556   577

I was surprised to see that there really wasn't a regression at the low end, but keep in mind that this is a rather large machine and a specialized workload for generating snapshots with long [sub]xip arrays. That being said, there really wasn't any improvement at the low end, either. If we always built a hash table, we'd be introducing more overhead and memory usage in return for approximately zero benefit. My intent was to only take on the overhead in cases where we believe it might have a positive impact, which is why I picked the somewhat conservative value of 128. If the overhead isn't a concern, it might be feasible to always make [sub]xip a hash table. > +static inline bool > +XidInXip(TransactionId xid, TransactionId *xip, uint32 xcnt, > + xip_hash_hash **xiph) > > + /* Make sure the hash table is built. */ > + if (*xiph == NULL) > + { > + *xiph = xip_hash_create(TopTransactionContext, xcnt, NULL); > + > + for (int i = 0; i < xcnt; i++) > > Why create a hash table on the first search? Why can't it be built > while inserting or creating these snapshots? Basically, instead of the > array, can these snapshot structures be hash tables by themselves? I > know this requires a good amount of code refactoring, but worth > considering IMO as it removes bsearch thus might improve the > performance further. The idea is to avoid the overhead unless something actually needs to inspect these arrays.
[0] https://postgr.es/m/057a9a95-19d2-05f0-17e2-f46ff20e9b3e%402ndquadrant.com -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
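For context, the lazy build Nathan describes has roughly this shape (a sketch using the simplehash-style names from the patch; the xip_hash_* functions are the generated accessors):

#define XIP_HASH_MIN_ELEMENTS 128

static inline bool
XidInXip(TransactionId xid, TransactionId *xip, uint32 xcnt,
		 xip_hash_hash **xiph)
{
	/* Small arrays: a linear search is cheap and needs no setup. */
	if (xcnt < XIP_HASH_MIN_ELEMENTS)
	{
		for (uint32 i = 0; i < xcnt; i++)
			if (TransactionIdEquals(xip[i], xid))
				return true;
		return false;
	}

	/*
	 * Build the hash table on first use, so snapshots whose arrays are
	 * never inspected pay nothing.
	 */
	if (*xiph == NULL)
	{
		*xiph = xip_hash_create(TopTransactionContext, xcnt, NULL);
		for (uint32 i = 0; i < xcnt; i++)
		{
			bool		found;

			xip_hash_insert(*xiph, xip[i], &found);
		}
	}

	return xip_hash_lookup(*xiph, xid) != NULL;
}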
Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
On Thu, Jul 14, 2022 at 12:06 PM Masahiko Sawada wrote: > > On Thu, Jul 14, 2022 at 11:16 AM shiy.f...@fujitsu.com > wrote: > > > > On Tue, Jul 12, 2022 5:23 PM Masahiko Sawada wrote: > > > > > > On Tue, Jul 12, 2022 at 5:58 PM shiy.f...@fujitsu.com > > > wrote: > > > > > > > > It happened when executing the following code because it tried to free a > > > NULL > > > > pointer (catchange_xip). > > > > > > > > /* be tidy */ > > > > if (ondisk) > > > > pfree(ondisk); > > > > + if (catchange_xip) > > > > + pfree(catchange_xip); > > > > } > > > > > > > > It seems to be related to configure option. I could reproduce it when > > > > using > > > > `./configure --enable-debug`. > > > > But I couldn't reproduce with `./configure --enable-debug CFLAGS="-Og - > > > ggdb"`. > > > > > > Hmm, I could not reproduce this problem even if I use ./configure > > > --enable-debug. And it's weird that we checked if catchange_xip is not > > > null but we did pfree for it: > > > > > > #1 pfree (pointer=0x0) at mcxt.c:1177 > > > #2 0x0078186b in SnapBuildSerialize (builder=0x1fd5e78, > > > lsn=25719712) at snapbuild.c:1792 > > > > > > Is it reproducible in your environment? > > > > Thanks for your reply! Yes, it is reproducible. And I also reproduced it on > > the > > v4 patch you posted [1]. > > Thank you for testing! I've found out the exact cause of this problem and how to fix it. I'll submit an updated patch next week with my analysis. Thank you for testing and providing additional information off-list, Shi yu. Regards, -- Masahiko Sawada EDB: https://www.enterprisedb.com/
Re: Refactor to make use of a common function for GetSubscriptionRelations and GetSubscriptionNotReadyRelations.
On Thu, Jul 14, 2022 at 4:34 AM Peter Smith wrote: > > On Wed, Jul 13, 2022 at 7:55 PM vignesh C wrote: > > > > On Wed, Jul 13, 2022 at 1:13 PM Michael Paquier wrote: > > > > > > On Wed, Jul 13, 2022 at 12:22:06PM +0530, vignesh C wrote: > > > > Most of the code is common between GetSubscriptionRelations and > > > > GetSubscriptionNotReadyRelations. Added a parameter to > > > > GetSubscriptionRelations which could provide the same functionality as > > > > the existing GetSubscriptionRelations and > > > > GetSubscriptionNotReadyRelations. Attached patch has the changes for > > > > the same. Thoughts? > > > > > > Right. Using all_rels to mean that we'd filter relations that are not > > > ready is a bit confusing, though. Perhaps this could use a bitmask as > > > argument. > > > > The attached v2 patch has the modified version which includes the > > changes to make the argument as bitmask. > > > > By using a bitmask I think there is an implication that the flags can > be combined... > > Perhaps it is not a problem today, but later you may want more flags. e.g. > #define SUBSCRIPTION_REL_STATE_READY 0x02 /* READY relations */ > > then the bitmask idea falls apart because IIUC you have no intentions > to permit things like: > (SUBSCRIPTION_REL_STATE_NOT_READY | SUBSCRIPTION_REL_STATE_READY) > > IMO using an enum might be a better choice for that parameter. Changed it to enum so that it can be extended to support other subscription relations like ready state subscription relations later easily. The attached v3 patch has the changes for the same. Regards, Vignesh From e0186ae685acb6334b711ea821b287be61d6cbd3 Mon Sep 17 00:00:00 2001 From: Vigneshwaran C Date: Wed, 13 Jul 2022 11:54:59 +0530 Subject: [PATCH v3] Refactor to make use of a common function for GetSubscriptionRelations and GetSubscriptionNotReadyRelations. Most of the code is common between GetSubscriptionRelations and GetSubscriptionNotReadyRelations. Added a parameter to GetSubscriptionRelations which could provide the same functionality for GetSubscriptionRelations and GetSubscriptionNotReadyRelations. --- src/backend/catalog/pg_subscription.c | 71 - src/backend/commands/subscriptioncmds.c | 5 +- src/backend/replication/logical/tablesync.c | 3 +- src/include/catalog/pg_subscription_rel.h | 19 +- src/tools/pgindent/typedefs.list| 1 + 5 files changed, 34 insertions(+), 65 deletions(-) diff --git a/src/backend/catalog/pg_subscription.c b/src/backend/catalog/pg_subscription.c index 8856ce3b50..46eedbc062 100644 --- a/src/backend/catalog/pg_subscription.c +++ b/src/backend/catalog/pg_subscription.c @@ -525,65 +525,15 @@ HasSubscriptionRelations(Oid subid) } /* - * Get all relations for subscription. + * Get the relations for subscription. * - * Returned list is palloc'ed in current memory context. + * If subrel_options is SUBSCRIPTION_REL_ALL, return all the relations for + * subscription. If SUBSCRIPTION_REL_NOT_READY, return all the relations for + * subscription that are not in a ready state. Returned list is palloc'ed in + * current memory context. 
*/ List * -GetSubscriptionRelations(Oid subid) -{ - List *res = NIL; - Relation rel; - HeapTuple tup; - ScanKeyData skey[1]; - SysScanDesc scan; - - rel = table_open(SubscriptionRelRelationId, AccessShareLock); - - ScanKeyInit([0], -Anum_pg_subscription_rel_srsubid, -BTEqualStrategyNumber, F_OIDEQ, -ObjectIdGetDatum(subid)); - - scan = systable_beginscan(rel, InvalidOid, false, - NULL, 1, skey); - - while (HeapTupleIsValid(tup = systable_getnext(scan))) - { - Form_pg_subscription_rel subrel; - SubscriptionRelState *relstate; - Datum d; - bool isnull; - - subrel = (Form_pg_subscription_rel) GETSTRUCT(tup); - - relstate = (SubscriptionRelState *) palloc(sizeof(SubscriptionRelState)); - relstate->relid = subrel->srrelid; - relstate->state = subrel->srsubstate; - d = SysCacheGetAttr(SUBSCRIPTIONRELMAP, tup, - Anum_pg_subscription_rel_srsublsn, ); - if (isnull) - relstate->lsn = InvalidXLogRecPtr; - else - relstate->lsn = DatumGetLSN(d); - - res = lappend(res, relstate); - } - - /* Cleanup */ - systable_endscan(scan); - table_close(rel, AccessShareLock); - - return res; -} - -/* - * Get all relations for subscription that are not in a ready state. - * - * Returned list is palloc'ed in current memory context. - */ -List * -GetSubscriptionNotReadyRelations(Oid subid) +GetSubscriptionRelations(Oid subid, SubscriptionRelationState subrel_state) { List *res = NIL; Relation rel; @@ -599,10 +549,11 @@ GetSubscriptionNotReadyRelations(Oid subid) BTEqualStrategyNumber, F_OIDEQ, ObjectIdGetDatum(subid)); - ScanKeyInit([nkeys++], -Anum_pg_subscription_rel_srsubstate, -BTEqualStrategyNumber, F_CHARNE, -CharGetDatum(SUBREL_STATE_READY)); + if (subrel_state == SUBSCRIPTION_REL_NOT_READY) + ScanKeyInit([nkeys++], + Anum_pg_subscription_rel_srsubstate, +
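For clarity, the resulting interface looks roughly like this (the names are taken from the v3 patch; the enum body is reconstructed from its comments and call sites, so treat it as a sketch):

typedef enum SubscriptionRelationState
{
	SUBSCRIPTION_REL_ALL,		/* all relations of the subscription */
	SUBSCRIPTION_REL_NOT_READY	/* only relations not yet in READY state */
} SubscriptionRelationState;

extern List *GetSubscriptionRelations(Oid subid,
									  SubscriptionRelationState subrel_state);

/* callers then read naturally: */
rstates = GetSubscriptionRelations(subid, SUBSCRIPTION_REL_NOT_READY);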
MERGE and parsing with prepared statements
We've used INSERT ON CONFLICT for a few years (with partitions as the target). That's also combined with prepared statements, for bulk loading. I was looking to see if we should use MERGE (probably not, but looking anyway). And came across this behavior. I'm not sure if it's any issue. CREATE TABLE CustomerAccount (CustomerId int, Balance float); PREPARE p AS MERGE INTO CustomerAccount CA USING (SELECT $1 AS CustomerId, $2 AS TransactionValue) AS T ON CA.CustomerId = T.CustomerId WHEN NOT MATCHED THEN INSERT (CustomerId, Balance) VALUES (T.CustomerId, T.TransactionValue) WHEN MATCHED THEN UPDATE SET Balance = Balance + TransactionValue; ERROR: operator does not exist: integer = text LINE 3: ON CA.CustomerId = T.CustomerId postgres: pryzbyj postgres [local] PREPARE(+0x2337be) [0x56108322e7be] postgres: pryzbyj postgres [local] PREPARE(oper+0x198) [0x56108322f1fb] postgres: pryzbyj postgres [local] PREPARE(make_op+0x7e) [0x56108322f55a] postgres: pryzbyj postgres [local] PREPARE(+0x228f2b) [0x561083223f2b] postgres: pryzbyj postgres [local] PREPARE(+0x227aa9) [0x561083222aa9] postgres: pryzbyj postgres [local] PREPARE(transformExpr+0x1c) [0x5610832227f9] postgres: pryzbyj postgres [local] PREPARE(transformMergeStmt+0x339) [0x56108322d988] postgres: pryzbyj postgres [local] PREPARE(transformStmt+0x70) [0x5610831f4071] postgres: pryzbyj postgres [local] PREPARE(+0x1fa350) [0x5610831f5350] postgres: pryzbyj postgres [local] PREPARE(transformTopLevelStmt+0x11) [0x5610831f5385] postgres: pryzbyj postgres [local] PREPARE(parse_analyze_varparams+0x5b) [0x5610831f54f4] postgres: pryzbyj postgres [local] PREPARE(pg_analyze_and_rewrite_varparams+0x38) [0x5610834bcdfe] postgres: pryzbyj postgres [local] PREPARE(PrepareQuery+0xcc) [0x561083292155] postgres: pryzbyj postgres [local] PREPARE(standard_ProcessUtility+0x4ea) [0x5610834c31a0] Why is $1 construed to be of type text ?
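A likely explanation (an educated guess, not verified against the parser code here): inside the USING subquery there is nothing to infer the parameters' types from, and a subquery output column whose type remains unknown is treated as text by the time the ON clause is analyzed. Declaring the parameter types in PREPARE, or casting the parameters where they first appear, avoids the failure:

PREPARE p (int, float8) AS
MERGE INTO CustomerAccount CA
USING (SELECT $1 AS CustomerId, $2 AS TransactionValue) AS T
ON CA.CustomerId = T.CustomerId
WHEN NOT MATCHED THEN
    INSERT (CustomerId, Balance) VALUES (T.CustomerId, T.TransactionValue)
WHEN MATCHED THEN
    UPDATE SET Balance = Balance + TransactionValue;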
Re: SQL/JSON documentation JSON_TABLE
On 2022-07-08 Fr 16:20, Andrew Dunstan wrote: > On 2022-07-08 Fr 16:03, Erik Rijkers wrote: >> Hi, >> >> Attached are a few small changes to the JSON_TABLE section in func.sgml. >> >> The first two changes are simple typos. >> >> Then there was this line: >> >> >> context_item, path_expression [ AS json_path_name ] [ PASSING { value >> AS varname } [, ...]] >> >> >> those are the parameters to JSON_TABLE() so I changed that line to: >> >> >> JSON_TABLE(context_item, path_expression [ AS json_path_name ] [ >> PASSING { value AS varname } [, ...]]) >> >> >> Some parts of the JSON_TABLE text strike me as opaque. For instance, >> there are paragraphs that more than once use the term: >> json_api_common_syntax >> >> 'json_api_common_syntax' is not explained. It turns out it's a relic >> from Nikita's original docs. I dug up a 2018 patch where the term is >> used as: >> >> 2018: >> JSON_TABLE ( >> json_api_common_syntax [ AS path_name ] >> COLUMNS ( json_table_column [, ...] ) >> (etc...) >> >> >> with explanation: >> >> 2018: >> json_api_common_syntax: >> The input data to query, the JSON path expression defining the >> query, and an optional PASSING clause. >> >> >> So that made sense then (input+jsonpath+params=api), but it doesn't >> now fit as such in the current docs. >> >> I think it would be best to remove all uses of that compound term, and >> rewrite the explanations using only the current parameter names >> (context_item, path_expression, etc). >> >> But I wasn't sure and I haven't done any such changes in the attached. >> >> Perhaps I'll give it a try during the weekend. >> >> >> > > Thanks for this. If you want to follow up that last sentence I will try > to commit a single fix early next week. > > Here's a patch that deals with most of this. There's one change you wanted that I don't think is correct, which I omitted. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index b6783b7ad0..478d6eccd8 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -18030,9 +18030,9 @@ FROM or array, but if it is CONDITIONAL it will not be applied to a single array or object. UNCONDITIONAL is the default. -If the result is a a scalar string, by default the value returned will have -surrounding quotes making it a valid JSON value. However, this behavior -is reversed if OMIT QUOTES is specified. +If the result is a scalar string, by default the value returned will +have surrounding quotes making it a valid JSON value. However, this +behavior is reversed if OMIT QUOTES is specified. The ON ERROR and ON EMPTY clauses have similar semantics to those clauses for json_value. @@ -18101,7 +18101,7 @@ FROM columns. Columns produced by NESTED PATHs at the same level are considered to be siblings, while a column produced by a NESTED PATH is - considered to be a child of the column produced by and + considered to be a child of the column produced by a NESTED PATH or row expression at a higher level. Sibling columns are always joined first. Once they are processed, the resulting rows are joined to the parent row. @@ -18151,9 +18151,9 @@ FROM the specified column. - The provided PATH expression parses the - row pattern defined by json_api_common_syntax - and fills the column with produced SQL/JSON items, one for each row. + The provided PATH expression is evaluated and + and the column is filled with the produced SQL/JSON items, one for each + row. 
If the PATH expression is omitted, JSON_TABLE uses the $.name path expression, @@ -18185,9 +18185,8 @@ FROM item into each row of this column. - The provided PATH expression parses the - row pattern defined by json_api_common_syntax - and fills the column with produced SQL/JSON items, one for each row. + The provided PATH expression is evaluated and + the column is filled with the produced SQL/JSON items, one for each row. If the PATH expression is omitted, JSON_TABLE uses the $.name path expression, @@ -18216,11 +18215,10 @@ FROM Generates a column and inserts a boolean item into each row of this column. - The provided PATH expression parses the - row pattern defined by json_api_common_syntax, - checks whether any SQL/JSON items were returned, and fills the column with - resulting boolean value, one for each row. - The specified type should have cast from + The provided PATH expression is evaluated, + a check whether any SQL/JSON items were returned is done, and + the column is filled with the resulting boolean value, one for each row. + The specified type should have a cast from the boolean. If the PATH expression is omitted,
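As a reader aid, here is a small usage example of the syntax whose documentation is being fixed (reflecting the JSON_TABLE support in the PG 15 development tree at the time of this thread, so treat the details as provisional):

SELECT jt.*
FROM JSON_TABLE(
         '[{"name": "a", "n": 1}, {"name": "b", "n": 2}]',
         '$[*]'
         COLUMNS (name text PATH '$.name',
                  n    int  PATH '$.n')
     ) AS jt;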
Re: EINTR in ftruncate()
Thomas Munro writes: > ... but now I'm wondering if we should be more defensive and possibly > even save/restore the mask. +1, sounds like a more future-proof solution. > Originally I discounted that because I > thought I had to go through PG_SETMASK for portability reasons, but on > closer inspection, I don't see any reason not to use sigprocmask > directly in Unix-only code. Seems like the thing to do is to add a suitable operation to the pqsignal.h API. We could leave it unimplemented for now on Windows, and then worry what to do if we ever need it in that context. regards, tom lane
Re: EINTR in ftruncate()
Alvaro Herrera writes: > ISTM it would be cleaner to patch PG_SETMASK to have a second argument > and to return the original mask if that's not NULL. This is more > invasive, but there aren't that many callsites of that macro. [ shoulda read your message before replying ] Given that this needs back-patched, I think changing PG_SETMASK is a bad idea: there might be outside callers. However, we could add another macro with the additional argument. PG_GET_AND_SET_MASK? regards, tom lane
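One plausible shape for it, on the Unix side at least (a sketch; the Windows emulation in signal.c would need a matching extension):

/* set the process signal mask, returning the previous one in *oldmask */
#define PG_GET_AND_SET_MASK(mask, oldmask) \
	sigprocmask(SIG_SETMASK, (mask), (oldmask))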
Re: First draft of the PG 15 release notes
On Thu, Jul 14, 2022 at 01:24:41PM +0700, John Naylor wrote: > Regarding this item: > > "Allow hash lookup for NOT IN clauses with many constants (David Rowley, James > Coleman) > Previously the code always sequentially scanned the list of values." > > The todo list has an entry titled "Planning large IN lists", which links to > > https://www.postgresql.org/message-id/1178821226.6034.63.camel@goldbach > > Did we already have a hash lookup for IN clauses with constants and the above > commit adds NOT IN? If so, maybe we have enough to remove this todo item. Agreed, I have removed it now. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Indecision is a decision. Inaction is an action. Mark Batterson
Re: replacing role-level NOINHERIT with a grant-level option
On 7/11/22 11:01 PM, Robert Haas wrote: Oops. Here is a rebased version of v3 which aims to fix this bug. I found one issue where pg_upgrade fails. On PG v14.4, create these objects: create user u1 with superuser; create user u3; create group g2 with user u1; Now try to perform pg_upgrade with v16 (with the patch applied); it fails with these messages: [edb@centos7tushar bin]$ tail -10 dc2/pg_upgrade_output.d/20220714T195919.494/log/pg_upgrade_utility.log -- -- CREATE ROLE "u3"; CREATE ROLE ALTER ROLE "u3" WITH NOSUPERUSER INHERIT NOCREATEROLE NOCREATEDB LOGIN NOREPLICATION NOBYPASSRLS; ALTER ROLE GRANT "g2" TO "u1" WITH GRANTED BY "edb"; psql:dc2/pg_upgrade_output.d/20220714T195919.494/dump/pg_upgrade_dump_globals.sql:47: ERROR: syntax error at or near "BY" LINE 1: GRANT "g2" TO "u1" WITH GRANTED BY "edb"; ^ This issue is not reproducible on PG v16 (without the patch). -- regards, tushar EnterpriseDB https://www.enterprisedb.com/ The Enterprise PostgreSQL Company
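For reference, the role-grant grammar does accept a GRANTED BY clause, just not preceded by WITH, so the dump presumably meant to emit something like:

GRANT "g2" TO "u1" GRANTED BY "edb";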
Re: EINTR in ftruncate()
On 2022-Jul-15, Thomas Munro wrote: > I checked that this throw-away assertion doesn't fail currently: > > if (IsUnderPostmaster) > + { > + sigset_t old; > + sigprocmask(SIG_SETMASK, NULL, ); > + Assert(memcmp(, , sizeof(UnBlockSig)) == 0); > PG_SETMASK(); > + } > > ... but now I'm wondering if we should be more defensive and possibly > even save/restore the mask. Yeah, that sounds better to me. > Originally I discounted that because I thought I had to go through > PG_SETMASK for portability reasons, but on closer inspection, I don't > see any reason not to use sigprocmask directly in Unix-only code. ISTM it would be cleaner to patch PG_SETMASK to have a second argument and to return the original mask if that's not NULL. This is more invasive, but there aren't that many callsites of that macro. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Re: EINTR in ftruncate()
On Fri, Jul 15, 2022 at 1:02 AM Thomas Munro wrote: > On Fri, Jul 15, 2022 at 12:15 AM Thomas Munro wrote: > > Yeah. Done, and pushed. 0002 not back-patched. > > Hmm, there were a couple of hard to understand build farm failures. > My first thought is that the signal mask stuff should only be done if > IsUnderPostmaster, otherwise it clobbers the postmaster's signal mask > when reached from dsm_postmaster_startup(). Looking into that. I pushed that change, and I hope that clears up the problems seen on eg curculio. It does raise the more general question of why it's safe to assume the signal mask is UnBlockSig on entry in regular backends. I expect it to be in released branches, but things get more complicated as we use DSM in more ways and it's not ideal to bet on that staying true. I checked that this throw-away assertion doesn't fail currently: if (IsUnderPostmaster) + { + sigset_t old; + sigprocmask(SIG_SETMASK, NULL, ); + Assert(memcmp(, , sizeof(UnBlockSig)) == 0); PG_SETMASK(); + } ... but now I'm wondering if we should be more defensive and possibly even save/restore the mask. Originally I discounted that because I thought I had to go through PG_SETMASK for portability reasons, but on closer inspection, I don't see any reason not to use sigprocmask directly in Unix-only code.
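A save/restore variant of that sequence might look like this (a sketch of the defensive idea, not the committed fix):

sigset_t	save_sigmask;

if (IsUnderPostmaster)
{
	/* unblock signals, remembering whatever mask we inherited */
	if (sigprocmask(SIG_SETMASK, &UnBlockSig, &save_sigmask) < 0)
		elog(ERROR, "sigprocmask() failed: %m");

	/* ... interruptible fallocate/write loop runs here ... */

	/* restore the caller's mask rather than assuming it was UnBlockSig */
	if (sigprocmask(SIG_SETMASK, &save_sigmask, NULL) < 0)
		elog(ERROR, "sigprocmask() failed: %m");
}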
Re: Problem about postponing gathering partial paths for topmost scan/join rel
Richard Guo wrote: > On Wed, Jul 28, 2021 at 3:42 PM Richard Guo wrote: > > To fix this problem, I'm thinking we can leverage 'root->all_baserels' > to tell if we are at the topmost scan/join rel, something like: > > --- a/src/backend/optimizer/path/allpaths.c > +++ b/src/backend/optimizer/path/allpaths.c > @@ -3041,7 +3041,7 @@ standard_join_search(PlannerInfo *root, int > levels_needed, List *initial_rels) > * partial paths. We'll do the same for the topmost > scan/join rel > * once we know the final targetlist (see > grouping_planner). > */ > - if (lev < levels_needed) > + if (!bms_equal(rel->relids, root->all_baserels)) > generate_useful_gather_paths(root, rel, > false); > > Any thoughts? > > Attach a patch to include the fix described upthread. Would appreciate > any comments on this topic. I think I understand the idea but I'm not sure about the regression test. I suspect that in the plan

EXPLAIN (COSTS OFF)
SELECT count(*) FROM tenk1 a JOIN tenk1 b ON a.two = b.two
  FULL JOIN tenk1 c ON b.two = c.two;

                  QUERY PLAN
 Aggregate
   ->  Hash Full Join
         Hash Cond: (b.two = c.two)
         ->  Gather
               Workers Planned: 4
               ->  Parallel Hash Join
                     Hash Cond: (a.two = b.two)
                     ->  Parallel Seq Scan on tenk1 a
                     ->  Parallel Hash
                           ->  Parallel Seq Scan on tenk1 b
         ->  Hash
               ->  Gather
                     Workers Planned: 4
                     ->  Parallel Seq Scan on tenk1 c

the Gather node is located below the "Hash Full Join" node only because that kind of join currently cannot be executed by parallel workers. If the parallel "Hash Full Join" gets implemented (I've noticed but not checked in detail [1]), it might break this test. I'd prefer a test that demonstrates that the Gather node at the top of the "subproblem plan" is useful purely from the *cost* perspective, rather than due to executor limitation. [1] https://commitfest.postgresql.org/38/2903/ -- Antonin Houska Web: https://www.cybertec-postgresql.com
Re: Handle infinite recursion in logical replication setup
On Thu, 14 Jul 2022 at 6:34 PM, vignesh C wrote: > On Thu, Jul 14, 2022 at 11:26 AM Dilip Kumar > wrote: > > > > On Wed, Jul 13, 2022 at 4:49 PM Dilip Kumar > wrote: > > > > > > On Tue, Jul 12, 2022 at 2:58 PM vignesh C wrote: > > > > > > > > On Tue, Jul 12, 2022 at 9:51 AM Amit Kapila > wrote: > > > > > > I find one thing confusing about this patch. Basically, this has two > > > option 'local' and 'any', so I would assume that all the local server > > > changes should be covered under the 'local' but now if we set some > > > origin using 'select pg_replication_origin_session_setup('aa');' then > > > changes from that session will be ignored because it has an origin id. > > > I think actually the name is creating confusion, because by local it > > > seems like a change which originated locally and the document is also > > > specifying the same. > > > > > > + If local, the subscription will request the > publisher > > > + to only send changes that originated locally. If > any, > > > > > > I think if we want to keep the option local then we should look up all > > > the origin in the replication origin catalog and identify whether it > > > is a local origin id or remote origin id and based on that filter out > > > the changes. > > > > On the other hand if we are interested in receiving the changes which > > are generated without any origin then I think we should change 'local' > > to 'none' and then in future we can provide a new option which can > > send the changes generated by all the local origin? I think other > > than this the patch LGTM. > > Thanks for the comment. The attached v33 patch has the changes to > specify origin as 'none' instead of 'local' which will not publish the > data having any origin. I think 'none' might be a problem from an extensibility point of view. What if in the future we support actual origin names; what would 'none' mean then, no origin or an origin named 'none'? Should we instead use the empty origin name ''? Or is there some other issue? — Dilip -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
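For concreteness, the v33 behavior under discussion would be used roughly like this on the subscriber (the option name and accepted values are exactly what is being debated here, so treat this as provisional):

CREATE SUBSCRIPTION sub_node2
    CONNECTION 'host=node1 port=5432 dbname=postgres'
    PUBLICATION pub_node1
    WITH (origin = 'none');  -- skip changes that carry any replication origin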
Re: EINTR in ftruncate()
On Fri, Jul 15, 2022 at 12:15 AM Thomas Munro wrote: > Yeah. Done, and pushed. 0002 not back-patched. Hmm, there were a couple of hard to understand build farm failures. My first thought is that the signal mask stuff should only be done if IsUnderPostmaster, otherwise it clobbers the postmaster's signal mask when reached from dsm_postmaster_startup(). Looking into that.
Re: Lazy JIT IR code generation to increase JIT speed with partitions
On Mon, Jul 4, 2022 at 10:32 PM Andres Freund wrote: > Hi, > > On 2022-06-27 16:55:55 +0200, David Geier wrote: > > Indeed, the total JIT time increases the more modules are used. The > reason > > for this to happen is that the inlining pass loads and deserializes all > to > > be inlined modules (.bc files) from disk prior to inlining them via > > llvm::IRMover. There's already a cache for such modules in the code, but > it > > is currently unused. This is because llvm::IRMover takes the module to be > > inlined as std::unique_ptr. The by-value argument requires > > the source module to be moved, which means it cannot be reused > afterwards. > > The code is accounting for that by erasing the module from the cache > after > > inlining it, which in turns requires reloading the module next time a > > reference to it is encountered. > > > > Instead of each time loading and deserializing all to be inlined modules > > from disk, they can reside in the cache and instead be cloned via > > llvm::CloneModule() before they get inlined. Key to calling > > llvm::CloneModule() is fully deserializing the module upfront, instead of > > loading the module lazily. That is why I changed the call from > > LLVMGetBitcodeModuleInContext2() (which lazily loads the module via > > llvm::getOwningLazyBitcodeModule()) to LLVMParseBitCodeInContext2() > (which > > fully loads the module via llvm::parseBitcodeFile()). Beyond that it > seems > > like that prior to LLVM 13, cloning modules could fail with an assertion > > (not sure though if that would cause problems in a release build without > > assertions). Andres reported this problem back in the days here [1]. In > the > > meanwhile the issue got discussed in [2] and finally fixed for LLVM 13, > see > > [3]. > > Unfortunately that doesn't work right now - that's where I had started. The > problem is that IRMover renames types. Which, in the case of cloned modules > unfortunately means that types used cloned modules are also renamed in the > "origin" module. Which then causes problems down the line, because parts of > the LLVM code match types by type names. > > That can then have the effect of drastically decreasing code generation > quality over time, because e.g. inlining never manages to find signatures > compatible. > > Looking at the LLVM patches these issues seem to be gone. The same suggests dumping the bitcode and running all tests with JIT force-enabled. See below. > > > However, curiously the time spent on optimizing is also reduced (95ms > > instead of 164ms). Could this be because some of the applied > optimizations > > are ending up in the cached module? > > I suspect it's more that optimization stops being able to do a lot, due to > the > type renamign issue. > > > > @Andres: could you provide me with the queries that caused the assertion > > failure in LLVM? > > I don't think I have the concrete query. What I tend to do is to run the > whole > regression tests with forced JITing. I'm fairly certain this triggered the > bug > at the time. > > > > Have you ever observed a segfault with a non-assert-enabled build? > > I think I observed bogus code generation that then could lead to segfaults > or > such. > OK, then using a non-assert-enabled LLVM build seems good enough. Especially, given the fixes they merged in [3]. > > > > I just want to make sure this is truly fixed in LLVM 13. Running 'make > > check-world' all tests passed. > > With jit-ing forced for everything? > > One more thing to try is to jit-compile twice and ensure the code is the > same. 
It certainly wasn't in the past due to the above issue. > > Yes, with jitting force-enabled everything passes (I removed all checks for the various 'jit_above' costs from planner.c). I also dumped the bitcode via 'SET jit_dump_bitcode = ON' and compared the bitcode output after inlining and after optimization for two successive executions of the very same query. The .bc files after inlining only differ in two bytes. The .bc files after optimization differ in a lot more bytes. However, the same is true for the master branch. How do you exactly conclude the identity of compilation output? -- David Geier (ServiceNow)
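The caching scheme being validated here boils down to the following (a sketch using the LLVM-C entry points named in the thread; error handling trimmed):

#include <llvm-c/BitReader.h>
#include <llvm-c/Core.h>

static LLVMModuleRef
get_inline_module(LLVMContextRef ctx, LLVMMemoryBufferRef buf,
				  LLVMModuleRef *cached)
{
	if (*cached == NULL)
	{
		/* parse eagerly (not lazily) so the module can be cloned later */
		if (LLVMParseBitcodeInContext2(ctx, buf, cached))
			return NULL;		/* parse failure */
	}

	/* hand out a clone; the cached original stays usable for next time */
	return LLVMCloneModule(*cached);
}

Whether handing the clone to the IRMover is safe hinges on the LLVM >= 13 fixes discussed above, since with older versions the IRMover's type renaming leaked back into the cached original.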
Re: EINTR in ftruncate()
On Tue, Jul 12, 2022 at 5:45 AM Andres Freund wrote: > Hm - given that we've observed ftruncate failing with strace, and that > stracing to find problems isn't insane, shouldn't we retry the ftruncate too? > It's kind of obsoleted by your next patch, but not really, because it's not > unconceivable that other OSs behave similarly? And IIUC you're planning to not > backpatch 0002? Yeah. Done, and pushed. 0002 not back-patched. > > + pgstat_report_wait_start(WAIT_EVENT_DSM_FILL_ZERO_WRITE); > > (not new in this patch, just moved around) - FILL_ZERO_WRITE is imo an odd > description of what's happening... In my understanding this isn't doing any > writing, just allocating. But whatever... True. Fixed in master.
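The retry added here is the classic EINTR loop; in isolation it looks like this (a sketch):

#include <errno.h>
#include <unistd.h>

/* retry ftruncate() until it succeeds or fails with a real error */
static int
ftruncate_retry(int fd, off_t size)
{
	int			rc;

	do
		rc = ftruncate(fd, size);
	while (rc < 0 && errno == EINTR);

	return rc;
}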
Re: Fix gcc warning in sync.c (usr/src/backend/storage/sync/sync.c)
On Mon, Jul 11, 2022 at 9:45 PM Kyotaro Horiguchi wrote: > At Mon, 11 Jul 2022 01:45:16 -0400, Tom Lane wrote in > > Kyotaro Horiguchi writes: > > > At Sat, 9 Jul 2022 21:53:31 -0300, Ranier Vilela > > > wrote in > > >> 528 |entry = (PendingUnlinkEntry *) lfirst(cell); > > > > > Actually, I already see the following line (maybe) at the place instead. > > >> PendingUnlinkEntry *entry = (PendingUnlinkEntry *) lfirst(cell); > > > > Yeah, I see no line matching that in HEAD either. Confusing report :-) > > However, I do not much like the code at line 528, because its > > "PendingUnlinkEntry *entry" is masking an outer variable > > "PendingFsyncEntry *entry" from line 513. We should rename > > one or both variables to avoid that masking. Fair point. > I thought the same at the moment looking this. In this case, changing > entry->syncent, unl(del)lent works. But at the same time I don't think > that can be strictly applied. Yeah, let's rename both of them. Done. > So, for starters, I compiled the whole tree with -Wshadow=local. and I > saw many warnings with it. At a glance all of them are reasonably > "fixed" but I don't think it is what we want... Wow, yeah.
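For anyone unfamiliar with the flag, this is the kind of masking -Wshadow=local complains about, in miniature (illustrative code, not the sync.c functions):

#include <stdio.h>

int
main(void)
{
	int			entry = 1;		/* outer variable */

	for (int i = 0; i < 3; i++)
	{
		int			entry = i * 10;	/* masks the outer "entry"; gcc
									 * -Wshadow=local warns here */
		printf("inner: %d\n", entry);
	}

	printf("outer: %d\n", entry);	/* still 1 */
	return 0;
}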
Re: enable/disable broken for statement triggers on partitioned tables
On Thu, Jul 14, 2022 at 8:20 PM Dmitry Koval wrote: > > I agree that the patch shouldn't have changed that behavior, so I've > > fixed the patch so that EnableDisableTrigger() recurses even if the > > parent trigger is unchanged. > > Thanks, I think this patch is ready for committer. Great, thanks. -- Thanks, Amit Langote EDB: http://www.enterprisedb.com
Re: Introduce "log_connection_stages" setting.
Hello, Thank you for the constructive feedback. > Your proposal will add more confusion to the already-confused logging-related > GUCs. > I wouldn't introduce a new GUC that depends on the stage of other GUC as you > proposed. Agreed, coupling a new GUC with "log_connections" is likely to lead to extra confusion. > There are 3 stages: connect (received), authorized (authenticated), > disconnect. I've taken connection stages and terminology from the existing log messages. The reason I have separated "authorized" and "authenticated" are [1] and [2] usages of "log_connections"; "received" is mentioned at [3]. >> Example: log_connection_stages=’authorized,disconnected’. > would turn the boolean value into an enum value I have thought about enums too, but we need to cover arbitrary combinations of message types, for example log only "received" and "disconnected". Hence the proposed type "string" with individual values within the string really drawn from an enum. > merge log_connections and log_disconnections into a new GUC (?) and deprecate > them. Are there any specific deprecation guidelines ? I have not found any after a quick search for GUC deprecation in Google and commit history. A deprecation scheme could look like that: 1. Mention in the docs "log_(dis)connections" are deprecated in favor of "log_connection_stages" 2. Map "log_(dis)connections" to relevant values of "log_connection_stages" in code if the latter is unset. 3. Complain in the logs if a conflict arises between the old params and "log_connection_stages", with "log_connection_stages" taking the precedence. Regards, Sergey [1] https://github.com/postgres/postgres/blob/3f8148c256e067dc2e8929ed174671ba7dc3339c/src/backend/utils/init/postinit.c#L257-L262 [2] https://github.com/postgres/postgres/blob/02c408e21a6e78ff246ea7a1beb4669634fa9c4c/src/backend/libpq/auth.c#L372 [3] https://github.com/postgres/postgres/blob/master/src/backend/postmaster/postmaster.c#L4393
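To make the proposal concrete, usage in postgresql.conf could look like this (hypothetical values, since the GUC is only proposed at this point):

# log only connection receipt and disconnection
log_connection_stages = 'received,disconnected'

# rough equivalent of log_connections=on plus log_disconnections=on
log_connection_stages = 'received,authenticated,authorized,disconnected'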
Re: enable/disable broken for statement triggers on partitioned tables
I agree that the patch shouldn't have changed that behavior, so I've fixed the patch so that EnableDisableTrigger() recurses even if the parent trigger is unchanged. Thanks, I think this patch is ready for committer. -- With best regards, Dmitry Koval Postgres Professional: http://postgrespro.com
Re: Bug: Reading from single byte character column type may cause out of bounds memory reads.
In a message of Wednesday, July 13, 2022 16:14:39 MSK, Aleksander Alekseev wrote: Hi! Let me join the review process. Postgres data types are a field of expertise I am interested in. 0. Though it looks like a steady bug, I can't reproduce it, neither using valgrind nor ASan (the address sanitizer should catch out-of-bounds reads too). I am running Debian Bullseye, and tried to build both PostgreSQL 14.4 and current master. Nevertheless, I would dig into this issue, and start with the parts that are not covered by the patch but seem important to me. 1. The typename "char" (with quotes) is very, very confusing. It is described in the documentation, but you need to be a Postgres expert or a careful documentation reader to notice the important difference between "char" and char. What is the history of the "char" type? Is it specified by some standard? Maybe it would be a good idea to create a more understandable alias, like byte_char, ascii_char or something, for use in practice, and keep "char" for backward compatibility only. 2. I totally agree with Tom Lane and Isaac Morland that the problem should also be fixed on the side of type conversion. There is a whole big thread about it; I guess we should come to some conclusion there. 3. Fixing out-of-bounds reads for broken Unicode is also important, though for now I am not quite sure it is possible. > - p += pg_mblen(p); > + { > + int t = pg_mblen(p); > + p += t; > + max_copy_bytes -= t; > + } Minor issue: here I would change the variable name from "t" to "char_len" or something similar, to make the code easier to understand. Major issue: is the pg_mblen function safe to call with broken encoding at the end of the buffer? What if the last byte of the buffer is 0xF0 and you call pg_mblen on it? >+ copy_bytes = p - s; >+ if(copy_bytes > max_copy_bytes) >+ copy_bytes = max_copy_bytes; Here I would suggest adding a comment about the broken UTF encoding case; that would explain why we might end up in a situation where we try to copy more than we have. I would also suggest issuing a warning here. I guess a person who uses Postgres would prefer to know that they managed to stuff a string with broken UTF encoding into the database before it leads to some terrible consequences. > Hi Spyridon, > > > The column "single_byte_col" is supposed to store only 1 byte. > > Nevertheless, the INSERT command implicitly casts the '' text into > > "char". This means that only the first byte of '' ends up stored in the > > column. gdb reports that "pg_mblen(p) = 4" (line 1046), which is expected > > since the pg_mblen('') is indeed 4. Later at line 1050, the memcpy will > > copy 4 bytes instead of 1, hence an out of bounds memory read happens for > > pointer 's', which effectively copies random bytes. > Many thanks for reporting this! > > > - OS: Ubuntu 20.04 > > - PSQL version 14.4 > > I can confirm the bug exists in the `master` branch as well and > doesn't depend on the platform. > > Although the bug is easy to fix for this particular case (see the > patch) I'm not sure if this solution is general enough. E.g. is there > something that generally prevents pg_mblen() from doing out of bound > reading in cases similar to this one? Should we prevent such an INSERT > from happening instead? -- Nikolay Shaplov aka Nataraj Fuzzing Engineer at Postgres Professional Matrix IM: @dhyan:nataraj.su
Re: [PATCH] Implement INSERT SET syntax
Hello, Here is a new version of the patch that applies to HEAD. It also adds some regression tests for overriding {system,user} values based on Wenjing Zeng's work. Gareth On Thu, 14 Jul 2022 at 22:40, Jacob Champion wrote: > > On Thu, Apr 7, 2022 at 11:29 AM Marko Tiikkaja wrote: > > I can help with review and/or other work here. Please give me a > > couple of weeks. > > Hi Marko, did you get a chance to pick up this patchset? If not, no > worries; I can mark this RwF and we can try again in a future > commitfest. > > Thanks, > --Jacob > > > > From f1ab7edadac846883b9c12f02ecc6c64dd293060 Mon Sep 17 00:00:00 2001 From: Gareth Palmer Date: Thu, 14 Jul 2022 01:47:17 + Subject: [PATCH] Implement INSERT SET syntax Allow the target column and values of an INSERT statement to be specified using a SET clause in the same manner as that of an UPDATE statement. The advantage of using the INSERT SET style is that the columns and values are kept together, which can make changing or removing a column or value from a large list easier. A simple example that uses SET instead of a VALUES() clause: INSERT INTO t SET c1 = 'foo', c2 = 'bar', c3 = 'baz'; Values can also be sourced from other tables similar to the INSERT INTO SELECT FROM syntax: INSERT INTO t SET c1 = x.c1, c2 = x.c2 FROM x WHERE x.c2 > 10 LIMIT 10; INSERT SET is not part of any SQL standard, however this syntax is also implemented by MySQL. Their implementation does not support specifying a FROM clause. --- doc/src/sgml/ref/insert.sgml | 56 +- src/backend/parser/gram.y | 77 ++- src/test/regress/expected/identity.out| 42 ++ src/test/regress/expected/insert.out | 13 +++- src/test/regress/expected/insert_conflict.out | 2 + src/test/regress/expected/with.out| 20 + src/test/regress/sql/identity.sql | 9 +++ src/test/regress/sql/insert.sql | 1 + src/test/regress/sql/insert_conflict.sql | 3 + src/test/regress/sql/with.sql | 9 +++ 10 files changed, 210 insertions(+), 22 deletions(-) diff --git a/doc/src/sgml/ref/insert.sgml b/doc/src/sgml/ref/insert.sgml index a9af9959c0..b774b6ce74 100644 --- a/doc/src/sgml/ref/insert.sgml +++ b/doc/src/sgml/ref/insert.sgml @@ -28,6 +28,16 @@ INSERT INTO table_name [ AS conflict_target ] conflict_action ] [ RETURNING * | output_expression [ [ AS ] output_name ] [, ...] ] + +[ WITH [ RECURSIVE ] with_query [, ...] ] +INSERT INTO table_name [ AS alias ] +[ OVERRIDING { SYSTEM | USER} VALUE ] +SET { column_name = { expression | DEFAULT } } [, ...] +[ FROM from_clause ] +[ ON CONFLICT [ conflict_target ] conflict_action ] +[ RETURNING * | output_expression [ [ AS ] output_name ] [, ...] ] + + where conflict_target can be one of: ( { index_column_name | ( index_expression ) } [ COLLATE collation ] [ opclass ] [, ...] ) [ WHERE index_predicate ] @@ -263,6 +273,18 @@ INSERT INTO table_name [ AS + + from_clause + + +A list of table expressions, allowing columns from other tables +to be used as values in the expression. +Refer to the statement for a +description of the syntax. + + + + DEFAULT @@ -650,6 +672,15 @@ INSERT INTO films (code, title, did, date_prod, kind) VALUES + +Insert a row using SET syntax: + + +INSERT INTO films SET code = 'MH832', title = 'Blade Runner', +did = 201, date_prod = DEFAULT, kind = 'SciFi'; + + + This example inserts some rows into table films from a table tmp_films @@ -696,6 +727,16 @@ WITH upd AS ( INSERT INTO employees_log SELECT *, current_timestamp FROM upd; + + Insert multiple rows into employees_log containing +the hours worked by each employee from time_sheets. 
+ +INSERT INTO employees_log SET id = time_sheets.employee, +total_hours = sum(time_sheets.hours) FROM time_sheets +WHERE time_sheets.date '2019-11-15' GROUP BY time_sheets.employee; + + + Insert or update new distributors as appropriate. Assumes a unique index has been defined that constrains values appearing in the @@ -752,6 +793,18 @@ INSERT INTO distributors (did, dname) VALUES (9, 'Antwerp Design') INSERT INTO distributors (did, dname) VALUES (10, 'Conrad International') ON CONFLICT (did) WHERE is_active DO NOTHING; + + Insert a new film into watched_films or increment the + number of times seen. Returns the new seen count, example assumes a + unique index has been defined that constrains the values appearing in + the title and year columns and + that seen_count defaults to 1. + +INSERT INTO watched_films SET title = 'Akira', year = 1988 + ON CONFLICT (title, year) DO UPDATE SET seen_count = watched_films.seen_count + 1 + RETURNING watched_films.seen_count; + + @@ -762,7 +815,8 @@ INSERT INTO distributors (did, dname)
Improving scalability of Parallel Bitmap Heap/Index Scan
Hi hackers,

While debugging some slow queries containing Bitmap Heap/Index Scans (in short BHS / BIS), we observed a few issues regarding scalability:

1. The BIS always runs in just a single process, even when the parent BHS is parallel. The first process arriving in the BHS serves as leader and executes the BIS.

2. As long as execution is "exact" (TIDs are stored instead of page bits), the parallel BHS sorts all TIDs to ensure pages are accessed sequentially. The sort is also performed by just a single worker. Already with a few tens of thousands of pages to scan, the sort time can make up a significant portion of the total runtime. Large page counts and the need for parallelism are not uncommon for BHS, as one use case is closing the gap between index and sequential scans. The BHS costing seems not to account for that.

3. The BHS does not scale well with an increasing number of parallel workers, even when accounting for the sequential parts of execution. A perf profile shows that the TID list / bitmap iteration code heavily contends on a mutex taken for every single TID / page bit (see the LWLockAcquire(..., LW_EXCLUSIVE) call on the shared iterator's lock in tidbitmap.c:1067).

4. The EXPLAIN ANALYZE statistics of the parallel BHS do not include the statistics of the parallel workers. For example, the number of heap pages processed is just what the leader did. Similarly to other parallel plan nodes, we should aggregate statistics across workers.

The EXPLAIN ANALYZE output below shows (1) to (3) happening in action for different numbers of workers. I had to obfuscate the query slightly. The difference between the startup time of the BHS and the BIS is the time it takes to sort the TID list. The self time of the BHS is just the time spent on processing the shared TID list and processing the pages. That part runs in parallel but does not scale.

Workers | Total runtime | Startup time BIS | Startup time BHS | Self time BHS (excl. sorting)
--------|---------------|------------------|------------------|------------------------------
2       | 15322 ms      | 3107 ms          | 5912 ms          | 9269 ms
4       | 13277 ms      | 3094 ms          | 5869 ms          | 7260 ms
8       | 14628 ms      | 3106 ms          | 5882 ms          | 8598 ms

None of this is really new and some of it is even documented. So, what I am more wondering about is why things are the way they are and how hard it would be to change them. I am especially curious about:

- What stops us from extending the BIS to run in parallel? Parallel Bitmap Index Scans are also supported.
- What about reducing the sort time by, e.g.
  - dividing TIDs across workers, ending up with N sorted streams produced in parallel,
  - cooperatively sorting the TIDs with multiple workers using barriers for synchronization,
  - optimizing the PagetableEntry data structure for size and using a faster sorting algorithm like e.g. radix sort,
  - a combination of the first three options.
- With separate TID lists per worker process the iteration problem would be solved. Otherwise, we could
  - optimize the iteration code and thereby minimize the duration of the critical section,
  - have worker processes acquire chunks of TIDs / page bits to reduce locking (see the sketch after the plans below).

Is there interest in patches improving on the above-mentioned shortcomings? If so, which options do you deem best?
-- David Geier (ServiceNow) -- 2 workers Finalize Aggregate (actual time=15228.937..15321.356 rows=1 loops=1) Output: count(*) -> Gather (actual time=15187.942..15321.345 rows=2 loops=1) Output: (PARTIAL count(*)) Workers Planned: 2 Workers Launched: 2 -> Partial Aggregate (actual time=15181.486..15181.488 rows=1 loops=2) Output: PARTIAL count(*) Worker 0: actual time=15181.364..15181.366 rows=1 loops=1 Worker 1: actual time=15181.608..15181.610 rows=1 loops=1 -> Parallel Bitmap Heap Scan on foo (actual time=5912.731..15166.992 rows=269713 loops=2) Filter: ... Rows Removed by Filter: 4020149 Worker 0: actual time=5912.498..15166.936 rows=269305 loops=1 Worker 1: actual time=5912.963..15167.048 rows=270121 loops=1 -> Bitmap Index Scan on foo_idx (actual time=3107.947..3107.948 rows=8579724 loops=1) Index Cond: - Worker 1: actual time=3107.947..3107.948 rows=8579724 loops=1 Planning Time: 0.167 ms Execution Time: 15322.081 ms -- 4 workers Finalize Aggregate (actual time=13175.765..13276.415 rows=1 loops=1) Output: count(*) -> Gather (actual time=13137.981..13276.403 rows=4 loops=1) Output: (PARTIAL count(*)) Workers Planned: 4 Workers Launched: 4 -> Partial Aggregate (actual time=13130.344..13130.346
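To illustrate the last option from the list above, chunked acquisition could look roughly like this (hypothetical structure and names; the point is simply to amortize one lock acquisition over many entries):

#define BHS_CHUNK_SIZE 64

typedef struct SharedIterState
{
	LWLock		lock;
	int			next;			/* next entry to hand out */
	int			nentries;
	PagetableEntry *entries;	/* the sorted shared TID/page list */
} SharedIterState;

static int
acquire_entry_chunk(SharedIterState *st, PagetableEntry *out)
{
	int			n = 0;

	LWLockAcquire(&st->lock, LW_EXCLUSIVE);
	while (n < BHS_CHUNK_SIZE && st->next < st->nentries)
		out[n++] = st->entries[st->next++];
	LWLockRelease(&st->lock);

	return n;					/* entries are then processed lock-free */
}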
Re: remove_useless_groupby_columns is too enthusiastic
On Wed, Jul 13, 2022 at 1:31 AM Tom Lane wrote: > I tried the attached quick-hack patch that just prevents > remove_useless_groupby_columns from removing anything that > appears in ORDER BY. That successfully fixes the complained-of > case, and it doesn't change any existing regression test results. > I'm not sure whether there are any cases that it makes significantly > worse. If there happens to be an index on t2.othercol, t2.x, the patch would definitely win since we can perform a GroupAggregate with no explicit sort. If there is no such index, considering the redundant sorting work due to the excess columns, do we still win? I'm testing with the query below: create table t (a int primary key, b int, c int); insert into t select i, i%1000, i from generate_series(1,100)i; analyze t; create index t_b_a_idx on t (b, a); select sum(c) from t group by a, b order by b limit 10; If we have index 't_b_a_idx', there would be big performance improvement. Without the index, I can see some performance drop (I'm not using parallel query mechanism). > (I also kind of wonder if the fundamental problem here is that > remove_useless_groupby_columns is being done at the wrong time, > and we ought to do it later when we have more semantic info. Concur with that. Thanks Richard
Re: Patch proposal: New hooks in the connection path
On Fri, Jul 8, 2022 at 5:54 PM Bharath Rupireddy wrote:
>
> Looking at the v2-0003 patch and emit_log_hook, how about we filter out
> those connectivity errors either based on error codes or, if those
> aren't unique, perhaps by passing special flags to the ereport API
> indicating that it's a connectivity error? In the emit_log_hook we can
> then look for those connectivity error codes or flags to collect the
> stats about the failed connections (with MyProcPort being present in
> emit_log_hook). This way, we don't need a new hook. Thoughts?

Bertrand and other hackers, the above comment may have been lost in the wild - any thoughts on it?

Regards,
Bharath Rupireddy.
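For concreteness, a minimal sketch of what the emit_log_hook route could look like. The hook and MyProcPort are real, but the choice of which error codes count as connectivity failures is exactly the open question above (the single errcode checked here is just one plausible example), and the plain counter is a placeholder: a real implementation would need shared memory, since the backend failing the connection exits right after emitting the FATAL report.

/* Sketch of an emit_log_hook that counts failed connection attempts. */
#include "postgres.h"

#include "fmgr.h"
#include "miscadmin.h"          /* MyProcPort */
#include "utils/elog.h"         /* emit_log_hook, ErrorData */

PG_MODULE_MAGIC;

static emit_log_hook_type prev_emit_log_hook = NULL;
static uint64 failed_conn_count = 0;    /* placeholder; would live in shmem */

static void
conn_stats_emit_log(ErrorData *edata)
{
    /*
     * Only look at errors raised while a client connection exists
     * (MyProcPort is set) and the error is FATAL. Which sqlerrcodes to
     * treat as "connectivity errors" is the open question from the
     * thread; ERRCODE_INVALID_AUTHORIZATION_SPECIFICATION is just one
     * candidate.
     */
    if (MyProcPort != NULL &&
        edata->elevel == FATAL &&
        edata->sqlerrcode == ERRCODE_INVALID_AUTHORIZATION_SPECIFICATION)
        failed_conn_count++;

    if (prev_emit_log_hook)
        prev_emit_log_hook(edata);
}

void
_PG_init(void)
{
    /* Assumes the library is loaded via shared_preload_libraries. */
    prev_emit_log_hook = emit_log_hook;
    emit_log_hook = conn_stats_emit_log;
}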
Re: optimize lookups in snapshot [sub]xip arrays
On Wed, Jul 13, 2022 at 10:40 PM Nathan Bossart wrote:
>
> Hi hackers,
>
> A few years ago, there was a proposal to create hash tables for long
> [sub]xip arrays in snapshots [0], but the thread seems to have fizzled out.
> I was curious whether this idea still showed measurable benefits, so I
> revamped the patch and ran the same test as before [1]. Here are the
> results for 60-second runs on an r5d.24xlarge with the data directory on
> the local NVMe storage:
>
> writers    HEAD    patch    diff
> 16         659     664      +1%
> 32         645     663      +3%
> 64         659     692      +5%
> 128        641     716      +12%
> 256        619     610      -1%
> 512        530     702      +32%
> 768        469     582      +24%
> 1000       367     577      +57%

Impressive.

> As before, the hash table approach seems to provide a decent benefit at
> higher client counts, so I felt it was worth reviving the idea.
>
> The attached patch has some key differences from the previous proposal.
> For example, the new patch uses simplehash instead of open-coding a new
> hash table. Also, I've bumped up the threshold for creating hash tables to
> 128 based on the results of my testing. The attached patch waits until a
> lookup of [sub]xip before generating the hash table, so we only need to
> allocate enough space for the current elements in the [sub]xip array, and
> we avoid allocating extra memory for workloads that do not need the hash
> tables. I'm slightly worried about increasing the number of memory
> allocations in this code path, but the results above seemed encouraging on
> that front.
>
> Thoughts?
>
> [0] https://postgr.es/m/35960b8af917e9268881cd8df3f88320%40postgrespro.ru
> [1] https://postgr.es/m/057a9a95-19d2-05f0-17e2-f46ff20e9b3e%402ndquadrant.com

Aren't these snapshot arrays always sorted? I see the following code:

    /* sort so we can bsearch() */
    qsort(snapshot->xip, snapshot->xcnt, sizeof(TransactionId), xidComparator);

    /* sort so we can bsearch() later */
    qsort(snap->subxip, snap->subxcnt, sizeof(TransactionId), xidComparator);

If the ordering isn't an invariant of these snapshot arrays, can we also use the hash table mechanism for all of the snapshot arrays infrastructure, rather than qsort+bsearch in a few places and a hash table for others?

    + * The current value worked well in testing, but it's still mostly a guessed-at
    + * number that might need updating in the future.
    + */
    +#define XIP_HASH_MIN_ELEMENTS (128)
    +

Do you see a regression with a hash table for all the cases? Why can't we just build a hash table irrespective of these limits and use it for all the purposes, instead of making it complex with different approaches, if we don't have measurable differences in the performance or throughput?

    +static inline bool
    +XidInXip(TransactionId xid, TransactionId *xip, uint32 xcnt,
    +         xip_hash_hash **xiph)

    +    /* Make sure the hash table is built. */
    +    if (*xiph == NULL)
    +    {
    +        *xiph = xip_hash_create(TopTransactionContext, xcnt, NULL);
    +
    +        for (int i = 0; i < xcnt; i++)

Why create a hash table on the first search? Why can't it be built while inserting or creating these snapshots? Basically, instead of the array, can these snapshot structures be hash tables by themselves? I know this requires a good amount of code refactoring, but it is worth considering IMO, as it removes bsearch and thus might improve the performance further.

Regards,
Bharath Rupireddy.
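To make the lazy-build behavior being debated easier to follow, here is a standalone sketch of the same pattern: linear search below the threshold, and a hash table built on the first lookup into a large array. It deliberately uses a toy open-addressing table over uint32 values instead of simplehash and TransactionId so it compiles on its own; only the control flow mirrors the patch's XidInXip().

/*
 * Standalone sketch of a threshold-gated, lazily built lookup table.
 * Compile with: cc -O2 xip_sketch.c -o xip_sketch
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define XIP_HASH_MIN_ELEMENTS 128   /* same threshold as the patch */

typedef struct XidTable
{
    uint32_t   *slots;
    uint32_t    nslots;             /* power of two, >= 2x element count */
} XidTable;

static XidTable *
xid_table_create(const uint32_t *xip, uint32_t xcnt)
{
    XidTable   *t = malloc(sizeof(XidTable));
    uint32_t    n = 4;

    while (n < xcnt * 2)
        n *= 2;
    t->nslots = n;
    t->slots = calloc(n, sizeof(uint32_t)); /* 0 = empty; real xids are >= 3 */
    for (uint32_t i = 0; i < xcnt; i++)
    {
        uint32_t    h = (xip[i] * 2654435761u) & (n - 1);

        while (t->slots[h] != 0)            /* linear probing */
            h = (h + 1) & (n - 1);
        t->slots[h] = xip[i];
    }
    return t;
}

/* Mirrors the patch's XidInXip(): build the table lazily, only when worth it. */
static bool
xid_in_xip(uint32_t xid, const uint32_t *xip, uint32_t xcnt, XidTable **xiph)
{
    if (xcnt < XIP_HASH_MIN_ELEMENTS)
    {
        for (uint32_t i = 0; i < xcnt; i++) /* plain linear search */
            if (xip[i] == xid)
                return true;
        return false;
    }

    if (*xiph == NULL)                      /* first lookup: build table */
        *xiph = xid_table_create(xip, xcnt);

    for (uint32_t h = (xid * 2654435761u) & ((*xiph)->nslots - 1);
         (*xiph)->slots[h] != 0;
         h = (h + 1) & ((*xiph)->nslots - 1))
        if ((*xiph)->slots[h] == xid)
            return true;
    return false;
}

int
main(void)
{
    uint32_t    xip[200];
    XidTable   *xiph = NULL;

    for (uint32_t i = 0; i < 200; i++)
        xip[i] = 3 + i * 7;                 /* fake in-progress xids */

    printf("%d %d\n", xid_in_xip(10, xip, 200, &xiph),
           xid_in_xip(11, xip, 200, &xiph)); /* prints: 1 0 */
    return 0;
}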
Re: [PATCH] Add native windows on arm64 support
Hello,

Please find a new patch (no further changes) rebased on top of master.

On Tue, 10 May 2022 at 02:02, Michael Paquier wrote:
> On Mon, May 09, 2022 at 12:44:22PM +0100, Niyas Sait wrote:
> > Microsoft updated the documentation [1] and clarified that ASLR cannot be
> > disabled for Arm64 targets.
> >
> > > Because ASLR can't be disabled on ARM, ARM64, or ARM64EC architectures,
> > > /DYNAMICBASE:NO isn't supported for these targets.
>
> Better than nothing, I guess, though this does not stand as a reason
> explaining why this choice has been made. And it looks like we will
> never know. We are going to need more advanced testing to check that
> DYNAMICBASE is stable enough if we begin to enable it. Perhaps we
> should really look at that for all the build targets we currently
> support rather than just ARM, for the most modern Win32 environments,
> if we are going to cut support for most of the oldest releases.
> --
> Michael

--
Niyas Sait
Linaro Limited

v2-0001-Enable-postgres-native-build-for-windows-arm64-pl.patch
Description: Binary data
Re: Issue with recovery_target = 'immediate'
At Wed, 13 Jul 2022 14:41:40 -0400, David Steele wrote in
> While it is certainly true that timeline 2 cannot be replayed to from
> timeline 1, it should not matter for an immediate recovery that stops
> at consistency. No timeline switch will occur until promotion. Of
> course the cluster could be shut down before promotion and the target
> changed, but in that case couldn't the timeline be adjusted at that point?
>
> This works by default for PostgreSQL < 12 because the default timeline
> is current. Since PostgreSQL 12 the default has been latest, which
> does not work by default.
>
> When a user does a number of recoveries it is pretty common for the
> timelines to get into a state that will make most subsequent recoveries
> fail. We think it makes sense for recovery_target = 'immediate' to be
> a fail-safe that always works no matter the state of the latest
> timeline.
>
> Our solution has been to force recovery_target_timeline = 'current'
> when recovery_target = 'immediate', but it seems like this is
> something that should be done in PostgreSQL instead.
>
> Thoughts?

I think it is natural that recovery targets the most recent update by default. In that sense, at the time the admin decided to recover the server from the first backup, the second backup is in a sense dead, or at least should be forgotten in future operation. Even if we want "any" backup to be usable, just re-targeting to the current timeline after the timeline error looks somewhat inconsistent with the behavior mentioned above. To make "dead" backups behave like "live" ones, we would need to check whether the backup is in the history of each "future" timeline, then choose the latest timeline from among them.

# Mmm. I remember a recent patch for pg_rewind doing the same...

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
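For reference, the workaround David describes amounts to pairing the two settings in postgresql.conf (the restore_command and other recovery settings are left out here):

# Recover to the earliest consistent point, and stay on the timeline the
# backup was taken on instead of chasing the latest timeline history.
recovery_target = 'immediate'
recovery_target_timeline = 'current'

With recovery_target_timeline = 'latest', the default since PostgreSQL 12, the same recovery can fail once a higher, divergent timeline exists in the archive, which is the failure mode being discussed.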
Re: Add SPLIT PARTITION/MERGE PARTITIONS commands
This is not a review, but I think the isolation tests should be expanded. At least, include the case of serializable transactions being involved.

--
Álvaro Herrera         Breisgau, Deutschland — https://www.EnterpriseDB.com/
"Pensar que el espectro que vemos es ilusorio no lo despoja de espanto,
sólo le suma el nuevo terror de la locura" (Perelandra, C.S. Lewis)
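As a rough illustration of what expanding the coverage could look like, an isolation spec along these lines could run the proposed SPLIT PARTITION command against a concurrent serializable transaction. All object and step names here are invented for the sketch, and the SPLIT PARTITION syntax is taken from the proposed patch, so treat this as an outline rather than a ready-to-commit test:

# Hypothetical isolation spec sketch: SPLIT PARTITION vs. a concurrent
# serializable reader.
setup
{
  CREATE TABLE tpart (i int) PARTITION BY RANGE (i);
  CREATE TABLE tpart_all PARTITION OF tpart FOR VALUES FROM (0) TO (20);
  INSERT INTO tpart SELECT generate_series(0, 19);
}

teardown
{
  DROP TABLE tpart;
}

session s1
step s1begin  { BEGIN ISOLATION LEVEL SERIALIZABLE; }
step s1select { SELECT count(*) FROM tpart; }
step s1commit { COMMIT; }

session s2
step s2split  { ALTER TABLE tpart SPLIT PARTITION tpart_all INTO
                (PARTITION tpart_lo FOR VALUES FROM (0) TO (10),
                 PARTITION tpart_hi FOR VALUES FROM (10) TO (20)); }

permutation s1begin s1select s2split s1commit

Further permutations would interleave the split before and after the reader's first access, so both the blocking behavior and the visibility of the split are pinned down.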
Re: Backup command and functions can cause assertion failure and segmentation fault
On Fri, Jul 08, 2022 at 08:56:14AM -0400, Robert Haas wrote:
> On Thu, Jul 7, 2022 at 10:58 AM Fujii Masao wrote:
>> But if many think that it's worth adding the test, I will give a
>> try. But even in that case, I think it's better to commit the
>> proposed patch at first to fix the bug, and then to write the patch
>> adding the test.

I have looked at that in detail, and it is possible to rely on pg_stat_activity.wait_event being BaseBackupThrottle, which makes sure that the checkpoint triggered at the beginning of the backup finishes and that we are in the middle of the base backup. The command for the test should be a psql command with two -c switches without ON_ERROR_STOP, so that the second pg_backup_stop() runs after BASE_BACKUP is cancelled on the same connection, something like this:

psql -c "BASE_BACKUP (CHECKPOINT 'fast', MAX_RATE 32);" \
     -c "select pg_backup_stop()" "replication=database"

The last part of the test should do a pump_until() and capture "backup is not in progress" from the stderr output of the command run. This is leading me to the attached, which crashes quickly without the fix and passes with the fix.

> It's true that we don't really have good test coverage of write-ahead
> logging and recovery, but this doesn't seem like the most important
> thing to be testing in that area, either, and developing stable tests
> for stuff like this can be a lot of work.

Well, stability does not seem like a problem to me here.

> I do kind of feel like the patch is fixing two separate bugs. The
> change to SendBaseBackup() is fixing the problem that, because there's
> SQL access on replication connections, we could try to start a backup
> in the middle of another backup by mixing and matching the two
> different methods of doing backups. The change to do_pg_abort_backup()
> is fixing the fact that, after aborting a base backup, we don't reset
> the session state properly so that another backup can be tried
> afterwards.
>
> I don't know if it's worth committing them separately - they are very
> small fixes. But it would probably at least be good to highlight in
> the commit message that there are two different issues.

Grouping both fixes in the same commit sounds fine by me. No objections from here.
--
Michael

From 062bf8a06f68b7d543288260dd0cdf50bb5e9a72 Mon Sep 17 00:00:00 2001
From: Michael Paquier
Date: Thu, 14 Jul 2022 16:56:17 +0900
Subject: [PATCH] Add TAP test for BASE_BACKUP cancellation with
 pg_stop_backup()

---
 src/test/recovery/t/033_basebackup_cancel.pl | 60
 1 file changed, 60 insertions(+)
 create mode 100644 src/test/recovery/t/033_basebackup_cancel.pl

diff --git a/src/test/recovery/t/033_basebackup_cancel.pl b/src/test/recovery/t/033_basebackup_cancel.pl
new file mode 100644
index 00..56cbaf83d0
--- /dev/null
+++ b/src/test/recovery/t/033_basebackup_cancel.pl
@@ -0,0 +1,60 @@
+# Copyright (c) 2021-2022, PostgreSQL Global Development Group
+
+# BASE_BACKUP cancellation with replication database connection.
+use strict; +use warnings; +use PostgreSQL::Test::Cluster; +use PostgreSQL::Test::Utils; +use Test::More; + +# Initialize primary node +my $node = PostgreSQL::Test::Cluster->new('node'); +$node->init(allows_streaming => 1); +$node->append_conf('postgresql.conf', 'log_replication_commands = on'); +$node->start; + +note "testing BASE_BACKUP cancellation"; + +my $sigchld_bb_timeout = + IPC::Run::timer($PostgreSQL::Test::Utils::timeout_default); + +# This test requires a replication connection with a database, as it mixes +# a replication command and a SQL command. The first BASE_BACKUP is throttled +# to give enough room for the cancellation running below. The second command +# for pg_backup_stop() should fail. +my $connstr = + $node->connstr('postgres') . " replication=database dbname=postgres"; +my ($sigchld_bb_stdin, $sigchld_bb_stdout, $sigchld_bb_stderr) = ('', '', ''); +my $sigchld_bb = IPC::Run::start( + [ + 'psql', '-X', '-c', "BASE_BACKUP (CHECKPOINT 'fast', MAX_RATE 32);", + '-c', 'SELECT pg_backup_stop()', + '-d', $connstr + ], + '<', + \$sigchld_bb_stdin, + '>', + \$sigchld_bb_stdout, + '2>', + \$sigchld_bb_stderr, + $sigchld_bb_timeout); + +# Waiting on the wait event BaseBackupThrottle ensures that the checkpoint +# issued at backup start completes, making the cancellation happen in the +# middle of the base backup sent. +is( $node->poll_query_until( + 'postgres', + "SELECT pg_cancel_backend(pid) FROM pg_stat_activity WHERE " + . "wait_event = 'BaseBackupThrottle' " + . "AND backend_type = 'walsender' AND query ~ 'BASE_BACKUP'"), + "1", + "WAL sender sending base backup killed"); + +# The psql command should fail on pg_stop_backup(). +ok( pump_until( + $sigchld_bb, $sigchld_bb_timeout, + \$sigchld_bb_stderr, qr/backup is not in progress/), + 'base backup cleanly cancelled'); +$sigchld_bb->finish(); + +done_testing(); -- 2.36.1 signature.asc Description: PGP signature
Re: [PROPOSAL] Detecting plan changes with plan_id in pg_stat_activity
Hi,

On Thu, Jul 14, 2022 at 08:51:24AM +0200, Antonin Houska wrote:
> Shouldn't the patch status be set to "Waiting on Author"?
>
> (I was curious if this is a patch that I can review.)

Ah indeed, I just updated the CF entry!
Re: [PROPOSAL] Detecting plan changes with plan_id in pg_stat_activity
Shouldn't the patch status be set to "Waiting on Author"?

(I was curious if this is a patch that I can review.)

Julien Rouhaud wrote:
> On Wed, Jun 22, 2022 at 11:05:54PM +, Imseih (AWS), Sami wrote:
> > > Can you describe how it's kept in sync, and how it makes sure that the property
> > > is maintained over restart / gc? I don't see any change in the code for the
> > > 2nd part so I don't see how it could work (and after a quick test it indeed
> > > doesn't).
> >
> > There is a routine called qtext_hash_sync which removes all entries from
> > the qtext_hash and reloads it with all the query ids from pgss_hash.
> > [...]
> > At all the points where it's called, an exclusive lock is held. This allows for
> > syncing all the present query ids in pgss_hash with qtext_hash.
>
> So your approach is to let the current gc / file loading behavior happen as
> before and construct your mapping hash using the resulting query text / offset
> info. That can't work.
>
> > > 2nd part so I don't see how it could work (and after a quick test it indeed
> > > doesn't).
> >
> > Can you tell me what test you used to determine it is not in sync?
>
> What test did you use to determine it is in sync? Have you checked how the
> gc / file loading actually works?
>
> In my case I just checked the size of the query text file after running some
> script that makes sure that there are the same few queries run by multiple
> different roles, then:
>
> Size of $PGDATA/pg_stat_tmp/pgss_query_texts.stat: 559B
> pg_ctl restart
> Size of $PGDATA/pg_stat_tmp/pgss_query_texts.stat: 2383B
>
> > > Can you share more details on the benchmarks you did? Did you also run
> > > benchmark on workloads that induce entry eviction, with and without need for
> > > gc? Eviction in pgss is already very expensive, and making things worse just
> > > to save a bit of disk space doesn't seem like a good compromise.
> >
> > Sorry, this was poorly explained by me. I went back and did some benchmarks.
> > Attached is the script and results. But here is a summary:
> > On an EC2 r5.2xlarge. The benchmark I performed is:
> > 1. create 10k tables
> > 2. create 5 users
> > 3. run a pgbench script that performs, per transaction, a select on
> > a randomly chosen table for each of the 5 users.
> > 4. 2 variants of the test were executed: one variant with the default
> > pg_stat_statements.max = 5000,
> > and one test with a larger pg_stat_statements.max = 1.
>
> But you wrote:
>
> > ##
> > ## pg_stat_statements.max = 15000
> > ##
>
> So which one is it?
>
> > So 10-15% is not accurate. I originally tested on a less powered machine.
> > For this benchmark I see a 6% increase in TPS (732k vs 683k) when a larger
> > pg_stat_statements.max is used and there are fewer gc runs/deallocations.
> > Both tests show a drop in gc runs/deallocations and a net increase
> > in tps. Less GC makes sense since the external file has fewer duplicate SQLs.
>
> On the other hand you're rebuilding the new query_offset hashtable every time
> there's an entry eviction, which seems quite expensive.
>
> Also, as I mentioned, you didn't change any of the heuristics in
> need_gc_qtexts(). So if the query texts are indeed deduplicated, doesn't it
> mean that gc will artificially be called less often? The wanted target of
> "50% bloat" will become "50% bloat assuming no deduplication is done", and
> the average query text file size will stay the same whether the query texts
> are deduplicated or not.
>
> I'm wondering whether the improvements you see are due to the patch or simply
> due to artificially calling gc less often. What are the results if, instead
> of using vanilla pg_stat_statements, you patch it to perform roughly the same
> number of gc runs as your version does?
>
> Also your benchmark workload is very friendly to your feature; what are the
> results with other workloads? Having the results with query texts that aren't
> artificially long would be interesting for instance, after fixing the problems
> mentioned previously.
>
> Also, you said that if you run that benchmark with a single user you don't see
> any regression. I don't see how rebuilding an extra hashtable in
> entry_dealloc(), i.e. while holding an exclusive lwlock, without saving any
> other work elsewhere, could be free.
>
> Looking at the script, you have:
> echo "log_min_messages=debug1" >> $PGDATA/postgresql.conf; \
>
> Is that really necessary? Couldn't you upgrade the gc message to a higher
> level for your benchmark need, or expose some new counter in
> pg_stat_statements_info maybe? Have you done the benchmark using a debug build
> or a normal build?

--
Antonin Houska
Web: https://www.cybertec-postgresql.com
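For readers unfamiliar with the gc heuristic being referenced: from memory, need_gc_qtexts() in pg_stat_statements decides roughly as sketched below, which is why deduplicating texts without touching it would change how often gc fires. This is a paraphrase, not the exact source; consult pg_stat_statements.c for the real logic.

#include <stdbool.h>
#include <stddef.h>

/*
 * Rough paraphrase of pg_stat_statements' need_gc_qtexts() decision
 * (simplified from memory): gc is skipped while the external query
 * text file stays under roughly twice its expected size, i.e. under
 * ~50% bloat.
 */
static bool
need_gc_qtexts_sketch(size_t extent,          /* current file size */
                      size_t mean_query_len,  /* avg stored text length */
                      size_t pgss_max)        /* max number of entries */
{
    /* Don't bother if the file is under 512 bytes per possible entry. */
    if (extent < 512 * pgss_max)
        return false;

    /* Don't bother below ~50% bloat over the expected text volume. */
    if (extent < mean_query_len * pgss_max * 2)
        return false;

    return true;
}

With deduplicated texts, mean_query_len and the expected volume no longer describe what is actually in the file, so the threshold is reached later; that is the distortion Julien is pointing at.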
Re: i.e. and e.g.
At Thu, 14 Jul 2022 09:40:25 +0700, John Naylor wrote in
> On Wed, Jul 13, 2022 at 4:13 PM Kyotaro Horiguchi wrote:
> >
> > At Wed, 13 Jul 2022 18:09:43 +0900 (JST), Kyotaro Horiguchi < horikyota@gmail.com> wrote in
> > > So, "e.g." (for example) in the message sounds like "that is", which I
> > > think is "i.e.". It should be fixed if this is correct. I'm not sure
> > > whether to keep using Latin-origin acronyms like this, but in the
> > > attached I used "i.e.".
>
> I did my own quick scan and found one use of i.e. that doesn't really fit,
> in a sentence that has other grammatical issues:
>
> -Due to the differences how ECPG works compared to Informix's ESQL/C (i.e., which steps
> +Due to differences in how ECPG works compared to Informix's ESQL/C (namely, which steps
> are purely grammar transformations and which steps rely on the

Oh!

> I've pushed that in addition to your changes, thanks!

Thanks!

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: First draft of the PG 15 release notes
Regarding this item:

"Allow hash lookup for NOT IN clauses with many constants (David Rowley, James Coleman)

Previously the code always sequentially scanned the list of values."

The todo list has an entry titled "Planning large IN lists", which links to https://www.postgresql.org/message-id/1178821226.6034.63.camel@goldbach

Did we already have a hash lookup for IN clauses with constants, with the above commit adding the NOT IN case? If so, maybe we have enough to remove this todo item.

--
John Naylor
EDB: http://www.enterprisedb.com