Re: PGDOCS - add more links in the pub/sub reference pages
On Mon, Oct 9, 2023 at 3:32 PM Amit Kapila wrote:
>
> On Fri, Oct 6, 2023 at 12:15 PM Peter Smith wrote:
> >
> > Here is a patch to add the 2 missing references:
> >
> > ref/alter_subscription.sgml ==> added more ids
> > ref/alter_publication.sgml ==> added link to
> > "sql-altersubscription-refresh-publication"
> > ref/drop_subscription.sgml ==> added link to "sql-altersubscription"
> >
> > [diff hunks adding ids to the new_owner and new_name entries,
> > including "@@ -281,7 +281,7 @@ ALTER SUBSCRIPTION name RENAME TO";
> > the SGML markup was stripped in the archive]

Thanks for looking at this patch!

> Shall we append 'params' in the above and other ids in the patch? For
> example, sql-altersubscription-params-new-name. Other places like
> alter_role.sgml and alter_table.sgml use similar ids. Is there a
> reason to be different here?

In v1, I used the same pattern as on the CREATE SUBSCRIPTION page, which
doesn't look like those...

~~~

The "Parameters" section describes some things that really are
parameters, e.g.:

"sql-altersubscription-name"
"sql-altersubscription-new-owner"
"sql-altersubscription-new-name"

I agree, emphasising that those ones are parameters is better. Changed
like this in v2:

"sql-altersubscription-params-name"
"sql-altersubscription-params-new-owner"
"sql-altersubscription-params-new-name"

~

But the "Parameters" section also describes other SQL syntax clauses
which are not really parameters at all, e.g.:

"sql-altersubscription-refresh-publication"
"sql-altersubscription-enable"
"sql-altersubscription-disable"

So I felt those ones are more intuitive left as they are, instead of
having ids/linkends like:

"sql-altersubscription-params-refresh-publication"
"sql-altersubscription-params-enable"
"sql-altersubscription-params-disable"

~~

PSA v2.

==
Kind Regards,
Peter Smith.
Fujitsu Australia

v2-0001-Add-more-link.patch
Description: Binary data
Re: pg16: XX000: could not find pathkey item to sort
On Mon, Oct 9, 2023 at 12:13 PM David Rowley wrote:
> I've now pushed the patch that trims off the Pathkeys for the ORDER BY
> / DISTINCT aggregates.

Thanks for pushing!

> Those results are a bit noisy. Perhaps a few more runs might yield
> more consistency, but it seems that there's not too much overhead to
> it. If I take the minimum value out of the 3 runs from each, it comes
> to about 1.5% extra time spent in planning. Perhaps that's OK.

I agree that the overhead is acceptable, especially since it only
happens in USE_ASSERT_CHECKING builds.

Thanks
Richard
Re: Synchronizing slots from primary to standby
On Mon, Oct 9, 2023 at 10:51 AM Drouvot, Bertrand wrote:
>
> I like the idea and I think that's the one that seems the most
> reasonable to me. I'd vote for this idea with:
>
> - standby_slot_names on the primary (could also be set on standbys in
>   case of cascading context)
> - enable_failover at logical slot creation + API to enable/disable it
>   at wish
> - enable_syncslot on the standbys
>

Thanks, these definitions sound reasonable to me.

--
With Regards,
Amit Kapila.
Re: Synchronizing slots from primary to standby
Hi,

On 10/6/23 6:48 PM, Amit Kapila wrote:
> On Wed, Oct 4, 2023 at 5:34 PM Drouvot, Bertrand wrote:
> > On 10/4/23 1:50 PM, shveta malik wrote:
> > > On Wed, Oct 4, 2023 at 5:00 PM Amit Kapila wrote:
> > > > On Wed, Oct 4, 2023 at 11:55 AM Drouvot, Bertrand wrote:
> > > > > On 10/4/23 6:26 AM, shveta malik wrote:
> > > > > > On Wed, Oct 4, 2023 at 5:36 AM Amit Kapila wrote:
> > > > > > > How about an alternate scheme where we define sync_slot_names on
> > > > > > > standby but then store the physical_slot_name in the corresponding
> > > > > > > logical slot (ReplicationSlotPersistentData) to be synced? So, the
> > > > > > > standby will send the list of 'sync_slot_names' and the primary will
> > > > > > > add the physical standby's slot_name in each of the corresponding
> > > > > > > sync_slot. Now, if we do this then even after restart, we should be
> > > > > > > able to know for which physical slot each logical slot needs to wait.
> > > > > > > We can even provide an SQL API to reset the value of
> > > > > > > standby_slot_names in logical slots as a way to unblock decoding in
> > > > > > > case of emergency (for example, when the corresponding physical
> > > > > > > standby never comes up).
> > > > > >
> > > > > > Looks like a better approach to me. It solves most of the pain
> > > > > > points like:
> > > > > > 1) Avoids the need of multiple GUCs
> > > > > > 2) Primary and standby need not worry about being in sync if we
> > > > > > maintain the sync-slot-names GUC on both
> > > >
> > > > As per my understanding of this approach, we don't want
> > > > 'sync-slot-names' to be set on the primary. Do you have a different
> > > > understanding?
> > >
> > > Same understanding. We do not need it to be set on primary by user.
> > > It will be a GUC on standby and standby will convey it to primary.
> >
> > +1, same understanding here.
>
> At PGConf NYC, I had a brief discussion on this topic with Andres
> where yet another approach to achieve this came up.

Great!

> Have a parameter like enable_failover at the slot level (this will be
> persistent information). Users can set it during the create/alter
> subscription or via pg_create_logical_replication_slot(). Also, on
> physical standby, there will be a parameter like enable_syncslot. All
> the physical standbys that have set enable_syncslot will receive all
> the logical slots that are marked as enable_failover. To me, whether
> to sync a particular slot is a slot-level property, so defining it in
> this new way seems reasonable.

Yeah, as this is a slot-level property, I agree that this seems
reasonable. Also that sounds more natural to me with this approach:
the primary is really the one that "drives" which slots can be synced.
I like it.

One could also set enable_failover while creating a logical slot on a
physical standby (so that cascading standbys could also have an "extra
slot" to sync as compared to "level 1" standbys).

> I think this will simplify the scheme a bit but still, the list of
> physical standbys for which logical slots wait during decoding needs
> to be maintained as we thought.

Right.

> But, how about with the above two parameters (enable_failover and
> enable_syncslot), we have standby_slot_names defined on the primary.
> That avoids the need to store the list of standby_slot_names in
> logical slots and simplifies the implementation quite a bit, right?

Agree.

> Now, one can think if we have a parameter like 'standby_slot_names'
> then why do we need enable_syncslot on physical standby, but that will
> be required to invoke the sync worker which will pull the logical
> slots' information.

Yes, and enable_syncslot on the standby could also be used to "pause"
the sync on standbys (by disabling the parameter) if one would want to
(without the need to modify anything on the primary).

> The advantage of having standby_slot_names defined on primary is that
> we can selectively wait on the subset of physical standbys where we
> are syncing the slots.

Yeah, and this flexibility/filtering looks somehow mandatory to me.

> I think this will be something similar to 'synchronous_standby_names'
> in the sense that the physical standbys mentioned in standby_slot_names
> will behave as synchronous copies with respect to slots and after
> failover user can switch to one of these physical standbys and others
> can start following the new master/publisher. Thoughts?

I like the idea and I think that's the one that seems the most
reasonable to me. I'd vote for this idea with:

- standby_slot_names on the primary (could also be set on standbys in
  case of cascading context)
- enable_failover at logical slot creation + API to enable/disable it
  at wish
- enable_syncslot on the standbys

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: REL_15_STABLE: pgbench tests randomly failing on CI, Windows only
On Sun, Oct 8, 2023 at 9:10 PM Noah Misch wrote:
>
> I didn't think of any phrasing that clearly explained things without the
> reader consulting the code. I considered these:
>
> "socket file descriptor out of range: %d" [what range?]
>

Quick drive-by... but it seems that < 0 is a distinctly different
problem than exceeding FD_SETSIZE. I'm unsure what would cause the
former, but the error for the latter seems like:

error: "Requested socket file descriptor %d exceeds connection limit of %d", fd, FD_SETSIZE - 1
hint: Reduce the requested number of concurrent connections

In short, the concept of range doesn't seem applicable here. There is
an error state at the max, and some invalid system state condition
where the position within a set is somehow negative. These should be
separated - with the < 0 check happening first.

And apparently this condition isn't applicable when dealing with jobs
in connect_slot? Also, I see that for connections we immediately issue
FD_SET, so this check on the boundary of the file descriptor makes
sense. But the remaining code in connect_slot doesn't involve FD_SET,
so the test for the file descriptor falling within its maximum seems to
be coming out of nowhere. Likely this is all good, and the lack of
symmetry is just due to the natural progressive development of the
code, but it stands out to the uninitiated looking at this patch.

David J.
Re: Making aggregate deserialization (and WAL receive) functions slightly faster
David Rowley writes:
> On Thu, 5 Oct 2023 at 21:24, David Rowley wrote:
>> I looked at the patch again and I just couldn't bring myself to change
>> it to that. If it were a macro going into stringinfo.h then I'd agree
>> with having a macro or inline function as it would allow the reader to
>> conceptualise what's happening after learning what the function does.

> I've pushed this patch. I didn't go with the macros in the end. I
> just felt it wasn't an improvement and none of the existing code which
> does the same thing bothers with a macro. I got the idea you were not
> particularly for the macro given that you used the word "Perhaps".

Sorry for not having paid more attention to this thread ... but I'm
pretty desperately unhappy with the patch as-pushed. I agree with the
criticism that this is a very repetitive coding pattern that could have
used a macro. But my real problem with this:

+	buf.data = VARDATA_ANY(sstate);
+	buf.len = VARSIZE_ANY_EXHDR(sstate);
+	buf.maxlen = 0;
+	buf.cursor = 0;

is that it totally breaks the StringInfo API without even attempting to
fix the API specs that it falsifies, particularly this in stringinfo.h:

 * maxlen	is the allocated size in bytes of 'data', i.e. the maximum
 *		string size (including the terminating '\0' char) that we can
 *		currently store in 'data' without having to reallocate
 *		more space.  We must always have maxlen > len.

I could see inventing a notion of a "read-only StringInfo" to
legitimize what you've done here, but you didn't bother to try.

I do not like this one bit. This is a fairly fundamental API and we
shouldn't be so cavalier about breaking it.

			regards, tom lane
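Tom's idea of a "read-only StringInfo" could be made concrete with a small initializer helper. The sketch below is hypothetical: the struct is a simplified stand-in for PostgreSQL's StringInfoData, and init_readonly_stringinfo / stringinfo_is_readonly are illustrative names, not APIs from the thread. It shows how treating maxlen = 0 as an explicit "read-only" sentinel could legitimize the pattern Tom quotes.

```c
#include <string.h>

/* Simplified stand-in for PostgreSQL's StringInfoData (illustrative only). */
typedef struct StringInfoData
{
	char	   *data;
	int			len;
	int			maxlen;		/* 0 is used here as a "read-only" marker */
	int			cursor;
} StringInfoData;
typedef StringInfoData *StringInfo;

/*
 * Hypothetical helper: wrap an existing buffer as a read-only StringInfo.
 * Setting maxlen = 0 signals that 'data' was not allocated by the
 * StringInfo code and must never be enlarged or NUL-terminated in place,
 * which is the notion Tom suggests formalizing.
 */
static void
init_readonly_stringinfo(StringInfo buf, char *data, int len)
{
	buf->data = data;
	buf->len = len;
	buf->maxlen = 0;			/* read-only: appends are not allowed */
	buf->cursor = 0;
}

/* Hypothetical check that append-style functions could assert on. */
static int
stringinfo_is_readonly(const StringInfoData *buf)
{
	return buf->maxlen == 0;
}
```

With such a helper, the repeated four-line field-assignment pattern collapses to one self-documenting call, and the header comment could describe the maxlen = 0 case honestly.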
Re: PGDOCS - add more links in the pub/sub reference pages
On Fri, Oct 6, 2023 at 12:15 PM Peter Smith wrote:
>
> Here is a patch to add the 2 missing references:
>
> ref/alter_subscription.sgml ==> added more ids
> ref/alter_publication.sgml ==> added link to
> "sql-altersubscription-refresh-publication"
> ref/drop_subscription.sgml ==> added link to "sql-altersubscription"
>
> [diff hunks adding ids to the new_owner and new_name entries,
> including "@@ -281,7 +281,7 @@ ALTER SUBSCRIPTION name RENAME TO";
> the SGML markup was stripped in the archive]

Shall we append 'params' in the above and other ids in the patch? For
example, sql-altersubscription-params-new-name. Other places like
alter_role.sgml and alter_table.sgml use similar ids. Is there a
reason to be different here?

--
With Regards,
Amit Kapila.
Re: Making aggregate deserialization (and WAL receive) functions slightly faster
On Thu, 5 Oct 2023 at 21:24, David Rowley wrote:
>
> On Thu, 5 Oct 2023 at 18:23, Michael Paquier wrote:
> > Ahem, well. Based on this argument my own argument does not hold
> > much. Perhaps I'd still use a macro at the top of array_userfuncs.c
> > and numeric.c, to avoid repeating the same pattern respectively two
> > and four times, documenting once on top of both macros that this is a
> > fake StringInfo because of the reasons documented in these code paths.
>
> I looked at the patch again and I just couldn't bring myself to change
> it to that. If it were a macro going into stringinfo.h then I'd agree
> with having a macro or inline function as it would allow the reader to
> conceptualise what's happening after learning what the function does.

I've pushed this patch. I didn't go with the macros in the end. I
just felt it wasn't an improvement and none of the existing code which
does the same thing bothers with a macro. I got the idea you were not
particularly for the macro given that you used the word "Perhaps".

Anyway, thank you for having a look at this.

David
Re: pg16: XX000: could not find pathkey item to sort
On Mon, 9 Oct 2023 at 12:42, David Rowley wrote:
> Maybe it's worth checking the total planning time spent in a run of
> the regression tests with and without the patch to see how much
> overhead it adds to the "average case".

I've now pushed the patch that trims off the Pathkeys for the ORDER BY
/ DISTINCT aggregates.

As for the patch to verify the pathkeys during create plan, I patched
master with the attached plan_times.patch.txt and used the following to
check the time spent in the planner for 3 runs of make installcheck.

$ for i in {1..3}; do pg_ctl start -D pgdata -l plantime.log > /dev/null && cd pg_src && make installcheck > /dev/null && cd .. && grep "planning time in" plantime.log | sed -E -e 's/.*planning time in (.*) nanoseconds/\1/' | awk '{nanoseconds += $1} END{print nanoseconds}' && pg_ctl stop -D pgdata > /dev/null && rm plantime.log; done

Master:
1855788104
1839655412
1740769066

Patched:
1917797221
1766606115
1881322655

Those results are a bit noisy. Perhaps a few more runs might yield
more consistency, but it seems that there's not too much overhead to
it. If I take the minimum value out of the 3 runs from each, it comes
to about 1.5% extra time spent in planning. Perhaps that's OK.
David

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 1e4dd27dba..3c713782f1 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -17,6 +17,7 @@
 
 #include <limits.h>
 #include <math.h>
+#include <time.h>
 
 #include "access/genam.h"
 #include "access/htup_details.h"
@@ -274,11 +275,22 @@ planner(Query *parse, const char *query_string, int cursorOptions,
 		ParamListInfo boundParams)
 {
 	PlannedStmt *result;
+	struct timespec start, end;
+
+	clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start);
 
 	if (planner_hook)
 		result = (*planner_hook) (parse, query_string, cursorOptions,
 								  boundParams);
 	else
 		result = standard_planner(parse, query_string, cursorOptions,
 								  boundParams);
+
+	clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &end);
+
+	elog(LOG,
+		 "planning time in %f nanoseconds",
+		 ((double) (end.tv_sec * 1000000000 + end.tv_nsec) -
+		  (double) (start.tv_sec * 1000000000 + start.tv_nsec)));
+
 	return result;
 }
Re: REL_15_STABLE: pgbench tests randomly failing on CI, Windows only
On Mon, Oct 09, 2023 at 04:22:52PM +1300, Thomas Munro wrote:
> On Mon, Oct 9, 2023 at 3:25 PM Noah Misch wrote:
> > The "fd >= FD_SETSIZE" check is irrelevant on Windows. See comments
> > in the attached patch; in brief, Windows assigns FDs and uses
> > FD_SETSIZE differently. The first associated failure was commit
> > dea12a1 (2023-08-03); as a doc commit, it's an innocent victim.
> > Bisect blamed 8488bab "ci: Use windows VMs instead of windows
> > containers" (2023-02), long before the failures began. I'll guess
> > some 2023-08 Windows update or reconfiguration altered file
> > descriptor assignment, hence the onset of failures. In my tests of
> > v16, the highest file descriptor was 948. I could make v16 fail by
> > changing --client=5 to --client=90 in the test. With the attached
> > patch and --client=90, v16 peaked at file descriptor 2040.
>
> Ohhh. Thanks for figuring that out. I'd never noticed that quirk. I
> didn't/can't test it but the patch looks reasonable after reading the
> referenced docs.

For what it's worth, I did all that testing through CI, using hacks
like https://cirrus-ci.com/task/5352974499708928 to get the relevant
information. I didn't bother trying non-CI, since buildfarm animals
aren't failing.

> Maybe instead of just "out of range" I'd say "out of range for
> select()" since otherwise it might seem a little baffling -- what
> range?

I was trying to follow this from the style guide:

    Avoid mentioning called function names, either; instead say what
    the code was trying to do:
    BAD:    open() failed: %m
    BETTER: could not open file %s: %m

I didn't think of any phrasing that clearly explained things without
the reader consulting the code. I considered these:

"socket file descriptor out of range: %d" [what range?]
"socket file descriptor out of range for select(): %d" [style guide violation]
"socket file descriptor out of range for checking whether ready for reading: %d" [?]
"socket file descriptor out of range for synchronous I/O multiplexing: %d" [term from POSIX]

> Random water cooler speculation about future ideas: I wonder
> whether/when we can eventually ditch select() and use WSAPoll() for
> this on Windows, which is supposed to be like poll(). There's a
> comment explaining that we prefer select() because it has a higher
> resolution sleep than poll() (us vs ms), so we wouldn't want to use
> poll() on Unixen, but we know that Windows doesn't even use high
> resolution timers for any user space APIs anyway so that's just not a
> concern on that platform. The usual reason nobody ever uses WSAPoll()
> is because the Curl guys told[1] everyone that it's terrible in a way
> that would quite specifically break our usage. But I wonder, because
> the documentation now says "As of Windows 10 version 2004, when a TCP
> socket fails to connect, (POLLHUP | POLLERR | POLLWRNORM) is
> indicated", it *sounds* like it might have been fixed ~3 years ago?
> One idea would be to hide it inside WaitEventSet, and let WaitEventSet
> be used in front end code (we couldn't use the
> WaitForMultipleObjects() version because it's hard-limited to 64
> events, but in front end code we don't need latches and other stuff,
> so we could have a sockets-only version with WSAPoll()). The idea of
> using WaitEventSet for pgbench on Unix was already mentioned a couple
> of times by others for general scalability reasons -- that way we
> could finish up using a better kernel interface on all supported
> platforms. We'd probably want to add high resolution time-out support
> (I already posted a patch for that somewhere), or we'd be back to 1ms
> timeouts.
>
> [1] https://daniel.haxx.se/blog/2012/10/10/wsapoll-is-broken/

Interesting. I have no current knowledge to add there.
Re: REL_15_STABLE: pgbench tests randomly failing on CI, Windows only
On Mon, Oct 9, 2023 at 3:25 PM Noah Misch wrote:
> The "fd >= FD_SETSIZE" check is irrelevant on Windows. See comments in
> the attached patch; in brief, Windows assigns FDs and uses FD_SETSIZE
> differently. The first associated failure was commit dea12a1
> (2023-08-03); as a doc commit, it's an innocent victim. Bisect blamed
> 8488bab "ci: Use windows VMs instead of windows containers" (2023-02),
> long before the failures began. I'll guess some 2023-08 Windows update
> or reconfiguration altered file descriptor assignment, hence the onset
> of failures. In my tests of v16, the highest file descriptor was 948.
> I could make v16 fail by changing --client=5 to --client=90 in the
> test. With the attached patch and --client=90, v16 peaked at file
> descriptor 2040.

Ohhh. Thanks for figuring that out. I'd never noticed that quirk. I
didn't/can't test it but the patch looks reasonable after reading the
referenced docs. Maybe instead of just "out of range" I'd say "out of
range for select()" since otherwise it might seem a little baffling --
what range?

Random water cooler speculation about future ideas: I wonder
whether/when we can eventually ditch select() and use WSAPoll() for
this on Windows, which is supposed to be like poll(). There's a
comment explaining that we prefer select() because it has a higher
resolution sleep than poll() (us vs ms), so we wouldn't want to use
poll() on Unixen, but we know that Windows doesn't even use high
resolution timers for any user space APIs anyway, so that's just not a
concern on that platform. The usual reason nobody ever uses WSAPoll()
is because the Curl guys told[1] everyone that it's terrible in a way
that would quite specifically break our usage. But I wonder, because
the documentation now says "As of Windows 10 version 2004, when a TCP
socket fails to connect, (POLLHUP | POLLERR | POLLWRNORM) is
indicated", it *sounds* like it might have been fixed ~3 years ago?

One idea would be to hide it inside WaitEventSet, and let WaitEventSet
be used in front end code (we couldn't use the WaitForMultipleObjects()
version because it's hard-limited to 64 events, but in front end code
we don't need latches and other stuff, so we could have a sockets-only
version with WSAPoll()). The idea of using WaitEventSet for pgbench on
Unix was already mentioned a couple of times by others for general
scalability reasons -- that way we could finish up using a better
kernel interface on all supported platforms. We'd probably want to add
high resolution time-out support (I already posted a patch for that
somewhere), or we'd be back to 1ms timeouts.

[1] https://daniel.haxx.se/blog/2012/10/10/wsapoll-is-broken/
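As a point of comparison for the poll()/WSAPoll() idea above: poll() takes an explicit array of descriptors, so it has no FD_SETSIZE ceiling on descriptor values, at the cost of the millisecond timeout granularity mentioned in the comment Thomas cites. A minimal POSIX sketch (wait_readable is an illustrative helper, not the proposed WaitEventSet design):

```c
#include <poll.h>

/*
 * Wait up to timeout_ms for fd to become readable using poll(), which
 * accepts arbitrarily large descriptor values (no FD_SETSIZE limit).
 * Returns 1 if readable, 0 on timeout, -1 on error.  Note poll()'s
 * millisecond timeout versus select()'s microsecond one -- the
 * resolution trade-off discussed in the thread.
 */
static int
wait_readable(int fd, int timeout_ms)
{
	struct pollfd pfd;
	int			rc;

	pfd.fd = fd;
	pfd.events = POLLIN;
	pfd.revents = 0;

	rc = poll(&pfd, 1, timeout_ms);
	if (rc < 0)
		return -1;
	if (rc == 0)
		return 0;
	return (pfd.revents & (POLLIN | POLLHUP | POLLERR)) ? 1 : 0;
}
```

WSAPoll() mirrors this interface on Windows; the historical complaint was that it failed to report connect() failures in revents, which is exactly what the Windows 10 version 2004 documentation note quoted above appears to address.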
Re: Fix output of zero privileges in psql
On Sun, Oct 8, 2023 at 6:55 PM Erik Wienhold wrote:
> On 2023-10-08 06:14 +0200, Laurenz Albe wrote:
> > On Sat, 2023-10-07 at 20:41 +0200, Erik Wienhold wrote:
> > > > If you are happy enough with my patch, shall we mark it as ready for
> > > > committer?
> > >
> > > I amended your patch to also document the effect of \pset null in this
> > > case. See the attached v2.
> >
> > +1
> >
> > If you mention in ddl.sgml that you can use "\pset null" to distinguish
> > default from no privileges, you should mention that this only works with
> > psql. Many people out there don't use psql.
>
> I don't think this is necessary because that section in ddl.sgml is
> already about psql and \dp.

I agree that we are simply detailing how psql makes this information
available to the reader, and leave users of other clients on their own
to figure out their own methods - which many clients probably have
handled for them anyway.

For us, I would suggest the following wording:

    In addition to the situation of printing all acl items, the Access
    and Column privileges columns report two other situations specially.
    In the rare case where all privileges for an object have been
    explicitly removed, including from the owner and PUBLIC (i.e., the
    object has empty privileges), these columns will display NULL. The
    other case is where the built-in default privileges are in effect,
    in which case these columns will display the empty string. (Note
    that by default psql will print NULL as an empty string, so in order
    to visually distinguish these two cases you will need to issue the
    \pset null meta-command and choose some other string to print for
    NULLs.) Built-in default privileges include all privileges for the
    owner, as well as those granted to PUBLIC for relevant object types
    as described above. The built-in default privileges are only in
    effect if the object has not been the target of a GRANT or REVOKE
    and also has not had its default privileges modified using ALTER
    DEFAULT PRIVILEGES. (???: if it is possible to revert back to the
    state of built-in privileges, that would need to be described here.)

The above removes the parenthetical regarding null in the catalogs;
this is intentional, as it seems that the goal here is to use psql
instead of the catalogs, and adding its use of null being printed as
the empty string just seems likely to add confusion.

> > Also, I'm not sure if "zero privileges" will be readily understood by
> > everybody. Perhaps "no privileges at all, even for the object owner"
> > would be a better wording.
>
> Changed in v3 to "empty privileges" with an explanation that this means
> "no privileges at all, even for the object owner".

+1

We probably should add the two terms to the glossary:

empty privileges: all privileges explicitly revoked from the owner and
PUBLIC (where applicable), and none otherwise granted.

built-in default privileges: owner having all privileges and no
privileges granted or removed via ALTER DEFAULT PRIVILEGES.

> > Perhaps it would also be good to mention this in the psql documentation.
>
> Just once under \pset null with a reference to section 5.7? Something
> like "This is also useful for distinguishing default privileges from
> empty privileges as explained in Section 5.7."
>
> Or instead under every command affected by this change? \dp and \ddp
> already have such a reference ("The meaning of the privilege display is
> explained in Section 5.7.")
>
> I prefer the first one because it's less effort ;) Also done in v3.

We've chosen a poor default, and I'd rather inform the user of specific
meta-commands to be wary of this poor default and change it at the
point where they will be learning about the meta-commands adversely
affected.

That said, I'd be willing to document that these commands, because they
are affected by empty string versus null, require a non-empty-string
value for \pset null and will choose the string '' for the duration of
the meta-command's execution if the user's setting is incompatible.

David J.
Re: Does anyone ever use OPTIMIZER_DEBUG?
On Mon, 9 Oct 2023 at 10:28, Tom Lane wrote:
>
> David Rowley writes:
> > It looks like nobody is objecting to this. I understand that not
> > everyone who might object will have read this email thread, so what I
> > propose to do here is move along and just commit the patch to swap out
> > debug_print_rel and use pprint instead. If that's done now then there
> > are around 10 months where we could realistically revert this again if
> > someone were to come forward with an objection.
> >
> > Sound ok?
>
> WFM.

Ok. I've pushed the patch. Let's see if anyone comes forward.

David
Re: Server crash on RHEL 9/s390x platform against PG16
It looks like an issue with JIT. If I disable the JIT then the above
query runs successfully.

postgres=# set jit to off;
SET
postgres=# SELECT * FROM rm32044_t1 LEFT JOIN rm32044_t2 ON rm32044_t1.pkey = rm32044_t2.pkey, rm32044_t3 LEFT JOIN rm32044_t4 ON rm32044_t3.pkey = rm32044_t4.pkey order by rm32044_t1.pkey,label,hidden;
 pkey | val  | pkey |  label  | hidden | pkey | val | pkey
------+------+------+---------+--------+------+-----+------
    1 | row1 |    1 | hidden  | t      |    1 |   1 |
    1 | row1 |    1 | hidden  | t      |    2 |   1 |
    2 | row2 |    2 | visible | f      |    1 |   1 |
    2 | row2 |    2 | visible | f      |    2 |   1 |
(4 rows)

Any idea on this?

On Mon, Sep 18, 2023 at 11:20 AM Suraj Kharage <suraj.khar...@enterprisedb.com> wrote:

> Few more details on this:
>
> (gdb) p val
> $1 = 0
> (gdb) p i
> $2 = 3
> (gdb) f 3
> #3  0x01a1ef70 in ExecCopySlotMinimalTuple (slot=0x202e4f8) at ../../../../src/include/executor/tuptable.h:472
> 472             return slot->tts_ops->copy_minimal_tuple(slot);
> (gdb) p *slot
> $3 = {type = T_TupleTableSlot, tts_flags = 16, tts_nvalid = 8, tts_ops = 0x1b6dcc8,
>   tts_tupleDescriptor = 0x202e0e8, tts_values = 0x202e540, tts_isnull = 0x202e580,
>   tts_mcxt = 0x1f54550, tts_tid = {ip_blkid = {bi_hi = 65535, bi_lo = 65535},
>   ip_posid = 0}, tts_tableOid = 0}
> (gdb) p *slot->tts_tupleDescriptor
> $2 = {natts = 8, tdtypeid = 2249, tdtypmod = -1, tdrefcount = -1, constr = 0x0, attrs = 0x202cd28}
>
> (gdb) p slot.tts_values[3]
> $4 = 0
> (gdb) p slot.tts_values[2]
> $5 = 1
> (gdb) p slot.tts_values[1]
> $6 = 34027556
>
> As per the resultslot, it has 0 value for the third attribute (column
> label).
> I'm testing this on a docker container and facing some issues with gdb,
> hence was not able to debug it further.
>
> Here is the explain plan:
>
> postgres=# explain (verbose, costs off) SELECT * FROM rm32044_t1 LEFT JOIN rm32044_t2 ON rm32044_t1.pkey = rm32044_t2.pkey, rm32044_t3 LEFT JOIN rm32044_t4 ON rm32044_t3.pkey = rm32044_t4.pkey order by rm32044_t1.pkey,label,hidden;
>
>                                  QUERY PLAN
> -----------------------------------------------------------------------------
>  Incremental Sort
>    Output: rm32044_t1.pkey, rm32044_t1.val, rm32044_t2.pkey, rm32044_t2.label, rm32044_t2.hidden, rm32044_t3.pkey, rm32044_t3.val, rm32044_t4.pkey
>    Sort Key: rm32044_t1.pkey, rm32044_t2.label, rm32044_t2.hidden
>    Presorted Key: rm32044_t1.pkey
>    ->  Merge Left Join
>          Output: rm32044_t1.pkey, rm32044_t1.val, rm32044_t2.pkey, rm32044_t2.label, rm32044_t2.hidden, rm32044_t3.pkey, rm32044_t3.val, rm32044_t4.pkey
>          Merge Cond: (rm32044_t1.pkey = rm32044_t2.pkey)
>          ->  Sort
>                Output: rm32044_t3.pkey, rm32044_t3.val, rm32044_t4.pkey, rm32044_t1.pkey, rm32044_t1.val
>                Sort Key: rm32044_t1.pkey
>                ->  Nested Loop
>                      Output: rm32044_t3.pkey, rm32044_t3.val, rm32044_t4.pkey, rm32044_t1.pkey, rm32044_t1.val
>                      ->  Merge Left Join
>                            Output: rm32044_t3.pkey, rm32044_t3.val, rm32044_t4.pkey
>                            Merge Cond: (rm32044_t3.pkey = rm32044_t4.pkey)
>                            ->  Sort
>                                  Output: rm32044_t3.pkey, rm32044_t3.val
>                                  Sort Key: rm32044_t3.pkey
>                                  ->  Seq Scan on public.rm32044_t3
>                                        Output: rm32044_t3.pkey, rm32044_t3.val
>                            ->  Sort
>                                  Output: rm32044_t4.pkey
>                                  Sort Key: rm32044_t4.pkey
>                                  ->  Seq Scan on public.rm32044_t4
>                                        Output: rm32044_t4.pkey
>                      ->  Materialize
>                            Output: rm32044_t1.pkey, rm32044_t1.val
>                            ->  Seq Scan on public.rm32044_t1
>                                  Output: rm32044_t1.pkey, rm32044_t1.val
>          ->  Sort
>                Output: rm32044_t2.pkey, rm32044_t2.label, rm32044_t2.hidden
>                Sort Key: rm32044_t2.pkey
>                ->  Seq Scan on public.rm32044_t2
>                      Output: rm32044_t2.pkey, rm32044_t2.label, rm32044_t2.hidden
> (34 rows)
>
> It seems like while building the innerslot for merge join, the value
> for attnum 1 is not getting fetched correctly.
>
> On Tue, Sep 12, 2023 at 3:27 PM Suraj Kharage <suraj.khar...@enterprisedb.com> wrote:
>
> > Hi,
> >
> > Found server crash on RHEL 9/s390x platform with below test case -
> >
> > *Machine details:*
> >
> > [edb@9428da9d2137 postgres]$ cat /etc/redhat-release
> > AlmaLinux release 9.2 (Turquoise Kodkod)
> > [edb@9428da9d2137 postgres]$ lscpu
> > Architecture: s390x
> > CPU op-mode(s):
Re: REL_15_STABLE: pgbench tests randomly failing on CI, Windows only
On Mon, Sep 04, 2023 at 03:18:40PM +1200, Thomas Munro wrote:
> Somehow these tests have recently become unstable and have failed a few times:
>
> https://github.com/postgres/postgres/commits/REL_15_STABLE
>
> The failures are like:
>
> [22:32:26.722] # Failed test 'pgbench simple update stdout
> /(?^:builtin: simple update)/'
> [22:32:26.722] # at t/001_pgbench_with_server.pl line 119.
> [22:32:26.722] # 'pgbench (15.4)
> [22:32:26.722] # '
> [22:32:26.722] # doesn't match '(?^:builtin: simple update)'

Fun. That's a test of "pgbench -C". The test harness isn't reporting
pgbench's stderr, so I hacked things to get that and the actual file
descriptor values being assigned. The failure case gets "pgbench:
error: too many client connections for select()" in stderr, from this
pgbench.c function:

static void
add_socket_to_set(socket_set *sa, int fd, int idx)
{
	if (fd < 0 || fd >= FD_SETSIZE)
	{
		/*
		 * Doing a hard exit here is a bit grotty, but it doesn't seem worth
		 * complicating the API to make it less grotty.
		 */
		pg_fatal("too many client connections for select()");
	}
	FD_SET(fd, &sa->fds);
	if (fd > sa->maxfd)
		sa->maxfd = fd;
}

The "fd >= FD_SETSIZE" check is irrelevant on Windows. See comments in
the attached patch; in brief, Windows assigns FDs and uses FD_SETSIZE
differently. The first associated failure was commit dea12a1
(2023-08-03); as a doc commit, it's an innocent victim. Bisect blamed
8488bab "ci: Use windows VMs instead of windows containers" (2023-02),
long before the failures began. I'll guess some 2023-08 Windows update
or reconfiguration altered file descriptor assignment, hence the onset
of failures. In my tests of v16, the highest file descriptor was 948.
I could make v16 fail by changing --client=5 to --client=90 in the
test. With the attached patch and --client=90, v16 peaked at file
descriptor 2040.

Thanks,
nm

P.S. Later, we should change test code so the pgbench stderr can't grow
an extra line without that line appearing in test logs.
Author: Noah Misch
Commit: Noah Misch

Don't spuriously report FD_SETSIZE exhaustion on Windows.

Starting on 2023-08-03, this intermittently terminated a "pgbench -C" test in CI.  It could affect a high-client-count "pgbench" without "-C".  While parallel reindexdb and vacuumdb reach the same problematic check, sufficient client count and/or connection turnover is less plausible for them.  Given the lack of examples from the buildfarm or from manual builds, reproducing this must entail rare operating system configurations.  Also correct the associated error message, which was wrong for non-Windows and did not comply with the style guide.

Back-patch to v12, where the pgbench check first appeared.  While v11 vacuumdb has the problematic check, reaching it with typical vacuumdb usage is implausible.

Reviewed by FIXME.

Discussion: https://postgr.es/m/CA+hUKG+JwvTNdcyJTriy9BbtzF1veSRQ=9m_zkfn9_lqe7k...@mail.gmail.com

diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index e3919395..a1f566c 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -7837,14 +7837,22 @@ clear_socket_set(socket_set *sa)
 static void
 add_socket_to_set(socket_set *sa, int fd, int idx)
 {
+	/* See connect_slot() for background on this code. */
+#ifdef WIN32
+	if (sa->fds.fd_count + 1 >= FD_SETSIZE)
+	{
+		pg_log_error("too many concurrent database clients for this platform: %d",
+					 sa->fds.fd_count + 1);
+		exit(1);
+	}
+#else
 	if (fd < 0 || fd >= FD_SETSIZE)
 	{
-		/*
-		 * Doing a hard exit here is a bit grotty, but it doesn't seem worth
-		 * complicating the API to make it less grotty.
-		 */
-		pg_fatal("too many client connections for select()");
+		pg_log_error("socket file descriptor out of range: %d", fd);
+		pg_log_error_hint("Try fewer concurrent database clients.");
+		exit(1);
 	}
+#endif
 	FD_SET(fd, &sa->fds);
 	if (fd > sa->maxfd)
 		sa->maxfd = fd;

diff --git a/src/fe_utils/parallel_slot.c b/src/fe_utils/parallel_slot.c
index aca238b..2ee4b6d 100644
--- a/src/fe_utils/parallel_slot.c
+++ b/src/fe_utils/parallel_slot.c
@@ -295,8 +295,40 @@ connect_slot(ParallelSlotArray *sa, int slotno, const char *dbname)
 	slot->connection = connectDatabase(sa->cparams, sa->progname, sa->echo, false, true);
 	sa->cparams->override_dbname = old_override;

-	if (PQsocket(slot->connection) >= FD_SETSIZE)
-		pg_fatal("too many jobs for this platform");
+	/*
+	 * POSIX defines FD_SETSIZE as the highest file descriptor acceptable to
+	 * FD_SE
Re: pg16: XX000: could not find pathkey item to sort
On Mon, Oct 9, 2023 at 7:42 AM David Rowley wrote:
> On Sun, 8 Oct 2023 at 23:52, Richard Guo wrote:
> > If the pathkeys that were added by adjust_group_pathkeys_for_groupagg()
> > are computable from the targetlist, it seems that we do not need to trim
> > them off, because prepare_sort_from_pathkeys() will add resjunk target
> > entries for them.  But it's also no harm if we trim them off.  So I
> > think the patch is a pretty safe fix.  +1 to it.
>
> hmm, I think one of us does not understand what is going on here. I
> tried to explain in [1] why we *need* to strip off the pathkeys added
> by adjust_group_pathkeys_for_groupagg().

Sorry I didn't make myself clear.  I understand why we need to trim off the pathkeys added by adjust_group_pathkeys_for_groupagg().  What I meant was that if the newly added pathkeys are *computable* from the existing target entries, then prepare_sort_from_pathkeys() will add resjunk target entries for them, so there seems to be no problem even if we do not trim them off.  For example:

explain (verbose, costs off)
select a, count(distinct a+1) from prt1 group by a order by a;
                                  QUERY PLAN
 Result
   Output: prt1.a, (count(DISTINCT ((prt1.a + 1))))
   ->  Merge Append
         Sort Key: prt1.a, ((prt1.a + 1))
         ->  GroupAggregate
               Output: prt1.a, count(DISTINCT ((prt1.a + 1))), ((prt1.a + 1))
               Group Key: prt1.a
               ->  Sort
                     Output: prt1.a, ((prt1.a + 1))
                     Sort Key: prt1.a, ((prt1.a + 1))
                     ->  Seq Scan on public.prt1_p1 prt1
                           Output: prt1.a, (prt1.a + 1)
...

Expression 'a+1' is *computable* from the existing entry 'a', so we just add a new resjunk target entry for 'a+1', and there is no error planning this query.  But if we change 'a+1' to something that is not computable, then we would have problems (without your fix), and the reason has been well explained by your messages.

explain (verbose, costs off)
select a, count(distinct b) from prt1 group by a order by a;
ERROR:  could not find pathkey item to sort

Having said that, I think it's the right thing to do to trim off the newly added pathkeys, even if they are *computable*.  In the plan above, the '(prt1.a + 1)' in GroupAggregate's targetlist and MergeAppend's pathkeys is actually redundant.  It's good to remove it.

> Can you explain why you think we can put a resjunk "b" in the target
> list of the GroupAggregate in the above case?

Hmm, I don't think we can do that, because 'b' is not *computable* from the existing target entries, as I explained above.

Thanks
Richard
Re: Fix output of zero privileges in psql
On 2023-10-08 06:14 +0200, Laurenz Albe wrote:
> On Sat, 2023-10-07 at 20:41 +0200, Erik Wienhold wrote:
> > > If you are happy enough with my patch, shall we mark it as ready for
> > > committer?
> >
> > I amended your patch to also document the effect of \pset null in this
> > case.  See the attached v2.
>
> +1
>
> If you mention in ddl.sgml that you can use "\pset null" to distinguish
> default from no privileges, you should mention that this only works with
> psql.  Many people out there don't use psql.

I don't think this is necessary because that section in ddl.sgml is already about psql and \dp.

> Also, I'm not sure if "zero privileges" will be readily understood by
> everybody.  Perhaps "no privileges at all, even for the object owner"
> would be a better wording.

Changed in v3 to "empty privileges" with an explanation that this means "no privileges at all, even for the object owner".

> Perhaps it would also be good to mention this in the psql documentation.

Just once under \pset null with a reference to section 5.7?  Something like "This is also useful for distinguishing default privileges from empty privileges as explained in Section 5.7."  Or instead under every command affected by this change?  \dp and \ddp already have such a reference ("The meaning of the privilege display is explained in Section 5.7.")  I prefer the first one because it's less effort ;)  Also done in v3.

--
Erik

>From b7ddd4daa87baab83ad7e857c996b5a4a0b3ffa7 Mon Sep 17 00:00:00 2001
From: Laurenz Albe
Date: Fri, 6 Oct 2023 22:12:33 +0200
Subject: [PATCH v3] psql: honor "\pset null" in backslash commands

In the output of backslash commands, NULL was always displayed as empty string, rather than honoring the values set with "\pset null".  This was surprising, and in some cases it was downright confusing: for example, default privileges (stored as NULL) were displayed just like an empty aclitem[], making these cases undistinguishable in the output.

Add this missing detail to the documentation.

Discussion: https://postgr.es/m/96d6885a-5e25-9ae8-4a1a-d7e557a5fe9c%40mtneva.com
---
 doc/src/sgml/ddl.sgml          |  7 --
 doc/src/sgml/ref/psql-ref.sgml |  3 ++-
 src/bin/psql/describe.c        | 43 --
 3 files changed, 7 insertions(+), 46 deletions(-)

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 075ff32991..436d655d3c 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -2353,8 +2353,9 @@ GRANT SELECT (col1), UPDATE (col1) ON mytable TO miriam_rw;
    If the Access privileges column is empty for a given
    object, it means the object has default privileges (that is, its
-   privileges entry in the relevant system catalog is null). Default
-   privileges always include all privileges for the owner, and can include
+   privileges entry in the relevant system catalog is null) or empty privileges
+   (that is, no privileges at all, even for the object owner).
+   Default privileges always include all privileges for the owner, and can include
    some privileges for PUBLIC depending on the object
    type, as explained above. The first GRANT or
    REVOKE on an object will instantiate the default
@@ -2362,6 +2363,8 @@ GRANT SELECT (col1), UPDATE (col1) ON mytable TO miriam_rw;
    example, miriam=arwdDxt/miriam) and then modify them
    per the specified request. Similarly, entries are shown in Column
    privileges only for columns with nondefault privileges.
+   You can, for example, set \pset null '(default)' to
+   distinguish default privileges from empty privileges.
    (Note: for this purpose, default privileges always means
    the built-in default privileges for the object's type. An object whose
    privileges have been affected by an ALTER DEFAULT

diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index d94e3cacfc..966697eb50 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -3079,7 +3079,8 @@ lo_import 152801
         Sets the string to be printed in place of a null value.
         The default is to print nothing, which can easily be mistaken for
         an empty string. For example, one might prefer \pset null
-        '(null)'.
+        '(null)'. This is also useful for distinguishing default
+        privileges from empty privileges as explained in .

diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index bac94a338c..224aa22575 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -124,7 +124,6 @@ describeAggregates(const char *pattern, bool verbose, bool showSystem)
 	if (!res)
 		return false;

-	myopt.nullPrint = NULL;
 	myopt.title = _("List of aggregate functions");
 	myopt.translate_header = true;

@@ -197,7 +196,6 @@ describeAccessMethods(const char *pattern, bool verbose)
 	if (!res)
 		return false;

-	myopt.nullPrint
Re: Draft LIMIT pushdown to Append and MergeAppend patch
On Mon, Oct 9, 2023 at 8:52 AM David Rowley wrote:
> On Sun, 8 Oct 2023 at 18:32, Michał Kłeczek wrote:
> > On 8 Oct 2023, at 03:33, Andy Fan wrote:
> >> For the patches for performance improvement, it is better to provide
> >> an example to show how much benefits we can get.  As for this case,
> >> I'm doubtful it can work as an improvement.
> >
> > Could you elaborate a little why you think it won't work as an improvement?
> > Is it because in practice LIMIT _is_ pushed down already during execution?
> > From what I understand postgres_fdw does indeed fetch on demand.
> > OTOH pushing down LIMIT is considered an improvement (as witnessed in
> > the postgres_fdw code itself after d50d172e51)
>
> In my opinion, analysis of where we can push LIMIT node deeper into
> the plan is an interesting area for research and work.

Well, really impressive feedback, always professional and PATIENT, just like what you helped me with in pretty many areas for the last N years.  Yes, I think "analysis of where we can .." is the key point.  SQL is a complex area because it is ever-changing, so providing an example is pretty helpful for communication.  However "doubtful" might not have been a good word :(

> The Append / MergeAppend case is certainly one place where pushing
> LIMIT nodes down could be beneficial.  Of course, if there was some
> filter or join or aggregation/distinct, etc that occurred after the
> Append/MergeAppend then you'd not be able to do this as the pushed
> limit might be met before we've got enough rows at the top level of
> the plan and that could result in fewer rows being output than
> what was requested in the query (aka wrong results).

I don't totally agree with this; my main idea came from Tom's reply at [1].  The best situation is that we know we should "plan for" top-N rows without really adding the execution overhead, and to my current knowledge, in a pretty large number of cases we have achieved this already.  If there is an area that is missed and can be shown with an example, I would be pretty happy to change my mind.

> Andy was working around this area recently (see [1] and corresponding
> thread).  He had to add a bunch of code that checked for operations that
> might mean we'd need to read more than the tuple_fraction rows from the
> Append node.  If we had a nice way to know when building base rel paths if
> there were or were not upper-level operations that affect if LIMIT
> pushing can be done, then that might make such a patch cleaner.  Andy
> in one of his proposed patches [2] added a field to PlannerInfo to
> mark this.  That field wasn't well named or documented, so anything
> you did would have to be an improvement on that.
>
> Looking at your patch, I see you've solved this by delaying the
> pushing down until the upper planner and just checking if the lower
> planner (join search) produced an Append or MergeAppend path.  I've not
> really developed an opinion yet what's the best method.  I feel
> creating the correct paths up-front is likely more flexible and more
> true to the path costing code.
>
> It might also be worth taking a step backwards and seeing if there are
> any other cases where we could push down LIMITs and try to see if
> there's something more generic that could be built to do this in a way
> that's beneficial to more cases.  I can't think of any off the top of
> my head, but I've not thought very hard about it.
>
> FWIW, it's trivial to mock up the possible benefits of pushing LIMIT
> nodes down with something like the following:
>
> create table rp (a int) partition by range (a);
> create table rp1 partition of rp for values from (0) to (100);
> create table rp2 partition of rp for values from (100) to (200);
> insert into rp select x from generate_series(0, 199)x;
>
> -- limit not pushed:
> explain analyze select * from rp order by a limit 10;
> Execution Time: 148.041 ms
>
> -- limit pushed (mocked):
> explain analyze (select * from rp1 order by a limit 10) union all
> (select * from rp2 order by a limit 10) limit 10;
> Execution Time: 49.263 ms
>
> about 3x faster for this case.
>
> However, it may also be worth you reading over [3] and the ultimate
> reason I changed my mind on that being a good idea.  Pushing LIMITs
> below an Append seems quite incomplete when we don't yet push sorts
> below Appends, which is what that patch did.  I just was not comfortable
> proceeding with [3] as nodeSort.c holds onto the tuplesort until
> executor shutdown.  That'll be done for rescan reasons, but it does
> mean if you pushed Sort below Append that we could have a very large
> number of sort nodes holding onto work_mem all at once.  I find that a
> bit scary, especially so given the excessive partitioning cases I've
> seen and worked on recently.  I did consider if we maybe could adjust
> nodeSort.c to do tuplesort_end() after the final row.  We'd need to
> only do that if we could be somehow certain there were going to be
Re: pg_stat_statements and "IN" conditions
Hi, this is my first email to pgsql-hackers.

I came across this email thread while looking at https://github.com/rails/rails/pull/49388, a pull request that makes Ruby on Rails (a popular web application framework) replace every `IN` clause in generated queries with `ANY`, to reduce similar entries in `pg_stat_statements`.

I want this to be solved on the PostgreSQL side, mainly because I want to avoid having to replace every IN clause with ANY just to reduce similar entries in pg_stat_statements.  It would be nice to have this patch reviewed.

As I'm not familiar with C or the PostgreSQL source code, I'm not reviewing this patch myself, but I applied it to my local PostgreSQL and the Active Record unit tests ran successfully.

--
Yasuo Honda
Re: Draft LIMIT pushdown to Append and MergeAppend patch
On Sun, 8 Oct 2023 at 18:32, Michał Kłeczek wrote:
> On 8 Oct 2023, at 03:33, Andy Fan wrote:
>> For the patches for performance improvement, it is better to provide
>> an example to show how much benefits we can get.  As for this case,
>> I'm doubtful it can work as an improvement.
>
> Could you elaborate a little why you think it won't work as an improvement?
> Is it because in practice LIMIT _is_ pushed down already during execution?
> From what I understand postgres_fdw does indeed fetch on demand.
> OTOH pushing down LIMIT is considered an improvement (as witnessed in the
> postgres_fdw code itself after d50d172e51)

In my opinion, analysis of where we can push the LIMIT node deeper into the plan is an interesting area for research and work.

The Append / MergeAppend case is certainly one place where pushing LIMIT nodes down could be beneficial.  Of course, if there was some filter or join or aggregation/distinct, etc. that occurred after the Append/MergeAppend then you'd not be able to do this, as the pushed limit might be met before we've got enough rows at the top level of the plan, and that could result in fewer rows being output than what was requested in the query (aka wrong results).

Andy was working around this area recently (see [1] and corresponding thread).  He had to add a bunch of code that checked for operations that might mean we'd need to read more than the tuple_fraction rows from the Append node.  If we had a nice way to know, when building base rel paths, whether there were or were not upper-level operations that affect if LIMIT pushing can be done, then that might make such a patch cleaner.  Andy in one of his proposed patches [2] added a field to PlannerInfo to mark this.  That field wasn't well named or documented, so anything you did would have to be an improvement on that.

Looking at your patch, I see you've solved this by delaying the pushing down until the upper planner and just checking if the lower planner (join search) produced an Append or MergeAppend path.  I've not really developed an opinion yet on what's the best method.  I feel creating the correct paths up-front is likely more flexible and more true to the path costing code.

It might also be worth taking a step backwards and seeing if there are any other cases where we could push down LIMITs and try to see if there's something more generic that could be built to do this in a way that's beneficial to more cases.  I can't think of any off the top of my head, but I've not thought very hard about it.

FWIW, it's trivial to mock up the possible benefits of pushing LIMIT nodes down with something like the following:

create table rp (a int) partition by range (a);
create table rp1 partition of rp for values from (0) to (100);
create table rp2 partition of rp for values from (100) to (200);
insert into rp select x from generate_series(0, 199)x;

-- limit not pushed:
explain analyze select * from rp order by a limit 10;
Execution Time: 148.041 ms

-- limit pushed (mocked):
explain analyze (select * from rp1 order by a limit 10) union all
(select * from rp2 order by a limit 10) limit 10;
Execution Time: 49.263 ms

about 3x faster for this case.

However, it may also be worth you reading over [3] and the ultimate reason I changed my mind on that being a good idea.  Pushing LIMITs below an Append seems quite incomplete when we don't yet push sorts below Appends, which is what that patch did.  I just was not comfortable proceeding with [3] as nodeSort.c holds onto the tuplesort until executor shutdown.  That'll be done for rescan reasons, but it does mean if you pushed Sort below Append that we could have a very large number of sort nodes holding onto work_mem all at once.  I find that a bit scary, especially so given the excessive partitioning cases I've seen and worked on recently.  I did consider if we maybe could adjust nodeSort.c to do tuplesort_end() after the final row.  We'd need to only do that if we could be somehow certain there were going to be no rescans.  I don't have a plan on how that would be detected.

Anyway, I don't think anything above is that useful to push you forward with.  I just didn't want you running off thinking we don't want to see improvements in this area.  I certainly do.

David

[1] https://postgr.es/m/caaphdvomgyc+1eb8g5remuumegwhe2c_f8yvjjuo1kuhzj0...@mail.gmail.com
[2] https://postgr.es/m/caku4awqatnprycb_cmeddywzvgffxuujr75f5lbzwnuq0jv...@mail.gmail.com
[3] https://postgr.es/m/caaphdvojkdbr3mr59jxmacybyhb6q_5qpru+dy93en8wm+x...@mail.gmail.com
Re: pg16: XX000: could not find pathkey item to sort
On Sun, 8 Oct 2023 at 23:52, Richard Guo wrote:
> On Thu, Oct 5, 2023 at 2:26 PM David Rowley wrote:
>> So in short, I propose the attached fix without any regression tests
>> because I feel that any regression test would just mark that there was
>> a bug in create_agg_path() and not really help with ensuring we don't
>> end up with some similar problem in the future.
>
> If the pathkeys that were added by adjust_group_pathkeys_for_groupagg()
> are computable from the targetlist, it seems that we do not need to trim
> them off, because prepare_sort_from_pathkeys() will add resjunk target
> entries for them.  But it's also no harm if we trim them off.  So I
> think the patch is a pretty safe fix.  +1 to it.

hmm, I think one of us does not understand what is going on here.  I tried to explain in [1] why we *need* to strip off the pathkeys added by adjust_group_pathkeys_for_groupagg().  Given the following example:

create table ab (a int, b int);

explain (costs off) select a, count(distinct b) from ab group by a;
      QUERY PLAN
 GroupAggregate
   Group Key: a
   ->  Sort
         Sort Key: a, b
         ->  Seq Scan on ab
(5 rows)

adjust_group_pathkeys_for_groupagg() will add the pathkey for the "b" column and that results in the Sort node sorting on {a,b}.  It's simply not at all valid to have the GroupAggregate path claim that its pathkeys are also (effectively) {a,b}, as "b" does not and cannot legally exist after the aggregation takes place.  We cannot put a resjunk "b" in the targetlist of the GroupAggregate either, as there could be any number of "b" values aggregated.

Can you explain why you think we can put a resjunk "b" in the target list of the GroupAggregate in the above case?

>> I have some concerns that the assert_pathkeys_in_target() function
>> might be a little heavyweight for USE_ASSERT_CHECKING builds.  So I'm
>> not proposing to commit that without further discussion.
>
> Yeah, it looks somewhat heavy to call assert_pathkeys_in_target() for
> each path node.  Can we run some benchmarks to see how much overhead it
> would bring to a USE_ASSERT_CHECKING build?

I think it'll be easy to show that there is an overhead to it.  It'll be in the realm of the ~41ms patched vs ~24ms unpatched that I showed in [2].  That's quite an extreme case.  Maybe it's worth checking the total planning time spent in a run of the regression tests with and without the patch to see how much overhead it adds to the "average case".

David

[1] https://postgr.es/m/CAApHDvpJJigQRW29TppTOPYp+Aui0mtd3MpfRxyKv=n-tb6...@mail.gmail.com
[2] https://postgr.es/m/caaphdvo7rzcqyw-gnkzr6qcijcqf8vjlkj4xfk-kawvyaw1...@mail.gmail.com
Re: Problem, partition pruning for prepared statement with IS NULL clause.
On Sat, 7 Oct 2023 at 03:11, Sergei Glukhov wrote:
> I noticed that the combination of a prepared statement with a generic plan
> and an 'IS NULL' clause could lead partition pruning to crash.
>
> Test case:
> --
> set plan_cache_mode to force_generic_plan;
> prepare stmt AS select * from hp where a is null and b = $1;
> explain execute stmt('xxx');

Thanks for the detailed report and proposed patch.

I think your proposed fix isn't quite correct.  I think the problem lies in InitPartitionPruneContext() where we assume that the list positions of step->exprs are in sync with the keyno.  If you look at perform_pruning_base_step(), the code there makes a special effort to skip over any keyno when a bit is set in opstep->nullkeys.  It seems that your patch is adjusting the keyno that's given to PruneCxtStateIdx() and it looks like (for your test case) it'll end up passing keyno==0 when it should be passing keyno==1.  keyno is the index of the partition key, so you can't pass 0 when it's for key index 1.

I wonder if it's worth expanding the tests further to cover more of the pruning cases, to cover run-time pruning too.

David

fix_runtime_pruning_keyno_idx.patch
Description: Binary data
Re: CREATE DATABASE with filesystem cloning
On Mon, Oct 9, 2023 at 2:20 AM Andrew Dunstan wrote:
> I've had to disable COW on my BTRFS-resident buildfarm animals (see
> previous discussion re Direct I/O).

Right, because it is still buggy[1].  I don't see any sign that a fix has been committed yet, assuming that is the right thing (and it sure sounds like it).  It means you still have to disable COW to run the 004_io_direct.pl test for now, but that's an independent thing due hopefully to be fixed soon, and you can still run PostgreSQL just fine with COW enabled as it is by default, as long as you don't turn on debug_io_direct (which isn't for users yet anyway).

Since I hadn't actually tried this cloning stuff out on Linux/btrfs before and was claiming that it should work, I took it for a quick unscientific spin (literally, this is on a spinning SATA disk for extra crunchy slowness...).  I created a scale 500 pgbench database, saw that du -h showed 7.4G, and got these times:

postgres=# create database foodb_copy template=foodb strategy=file_copy;
CREATE DATABASE
Time: 124019.885 ms (02:04.020)
postgres=# create database foodb_clone template=foodb strategy=file_clone;
CREATE DATABASE
Time: 8618.195 ms (00:08.618)

That's something, but not as good as I was expecting, so let's also try Linux/XFS for reference on the same spinny rust...  One thing I learned is that if you have an existing XFS partition, it might have been created without reflinks enabled (see output of xfs_info), as that was the default not very long ago and it's not changeable later, so on the box I'm writing from I had to do a fresh mkfs.xfs to see any benefit from this.

postgres=# create database foodb_copy template=foodb strategy=file_copy;
CREATE DATABASE
Time: 49157.876 ms (00:49.158)
postgres=# create database foodb_clone template=foodb strategy=file_clone;
CREATE DATABASE
Time: 1026.455 ms (00:01.026)

Not bad.

To understand what that did, we can check which physical blocks on disk hold the first segment of the pgbench_accounts table in foodb and foodb_clone:

$ sudo xfs_bmap /mnt/xfs/pgdata/base/16384/16400
/mnt/xfs/pgdata/base/16384/16400:
        0: [0..1637439]: 977586048..979223487
        1: [1637440..2097151]: 1464966136..1465425847
$ sudo xfs_bmap /mnt/xfs/pgdata/base/16419/16400
/mnt/xfs/pgdata/base/16419/16400:
        0: [0..1637439]: 977586048..979223487
        1: [1637440..2097151]: 1464966136..1465425847

The same blocks.  Now let's update a tuple on the second page of pgbench_accounts in the clone:

foodb=# update pgbench_accounts set bid = bid + 1 where ctid = '(1, 1)';
UPDATE 1
foodb=# checkpoint;
CHECKPOINT

Now some new physical disk blocks have been allocated just for that page, but the rest are still clones:

$ sudo xfs_bmap /mnt/xfs/pgdata/base/16419/16400
/mnt/xfs/pgdata/base/16419/16400:
        0: [0..15]: 977586048..977586063
        1: [16..31]: 977586064..977586079
        2: [32..1637439]: 977586080..979223487
        3: [1637440..2097151]: 1464966136..1465425847

I tried changing it to work in 1MB chunks and add the CFI() (v2 attached), and it didn't affect the time measurably and also didn't generate any extra extents as displayed by xfs_bmap, so the end result is the same.  I haven't looked into the chunked version on the other file systems yet.

I don't have the numbers to hand (different machines, far from me right now) but FreeBSD/ZFS and macOS/APFS were on the order of a few hundred milliseconds for the same scale of pgbench on laptop storage (so not comparable with the above).  I also tried a -s 5000 database, and saw that XFS could clone a 74GB database just as fast as the 7.4GB database (still ~1s).

At a guess, this is going to scale not so much with total data size, but more with things like number of relations, segment size, and internal (invisible) fragmentation due to previous cloning/update history in filesystem-dependent ways, since those are the things that generate extents (contiguous ranges of physical blocks to be referenced by the new file).

[1] https://lore.kernel.org/linux-btrfs/ae81e48b0e954bae1c3451c0da1a24ae7146606c.1676684984.git.bo...@bur.io/T/#u

>From dd5c07d873e90a6feac371d2879015a5e6154632 Mon Sep 17 00:00:00 2001
From: Thomas Munro
Date: Sat, 2 Sep 2023 22:21:49 +1200
Subject: [PATCH v2] WIP: CREATE DATABASE ... STRATEGY=FILE_CLONE.

Similar to STRATEGY=FILE_COPY, but using facilities that tell the OS explicitly that we're copying, so that it has the opportunity to share block ranges in copy-on-write file systems, or maybe push down the copy to network file systems and storage devices.

Currently works on Linux, FreeBSD and macOS.  More systems could be supported.

XXX need docs
XXX need to think more about chunk size and interruptibility
XXX need redo -- what to do if unsupported during redo, plain copy?

Discussion: https://postgr.es/m/CA%2BhUKGLM%2Bt%2BSwBU-cHeMUXJCOgBxSHLGZutV5zCwY4qrCcE02w%40mail.gmail.com
---
 configure    | 2 +-
 configure.ac | 1 +
 meson.build  | 1 +
Re: JSON Path and GIN Questions
On Sep 12, 2023, at 21:00, Erik Wienhold wrote:
>> I posted this question on Stack Overflow
>> (https://stackoverflow.com/q/77046554/79202),
>> and from the suggestion I got there, it seems that @@ expects a boolean to be
>> returned by the path query, while @? wraps it in an implicit exists(). Is that
>> right?
>
> That's also my understanding. We had a discussion about the docs on @@, @?, and
> jsonb_path_query on -general a while back [1]. Maybe it's useful also.

Hi, finally getting back to this, still fiddling to figure out the differences.

From the thread you reference [1], is the point that @@ and jsonb_path_match() can only be properly used with a JSON Path expression that's a predicate check?  If so, as far as I can tell, that means only exists() around the entire path query, or the deviation from the SQL standard that allows an expression to be a predicate?

This suggests to me that the "Only the first item of the result is taken into account" bit from the docs may not be quite right.  Consider this example:

david=# select jsonb_path_query('{"a":[false,true,false]}', '$.a ?(@[*] == false)');
 jsonb_path_query
------------------
 false
 false
(2 rows)

david=# select jsonb_path_match('{"a":[false,true,false]}', '$.a ?(@[*] == false)');
ERROR:  single boolean result is expected

jsonb_path_match(), it turns out, only wants a single result.  But furthermore, perhaps the use of a filter predicate rather than a predicate expression for the entire path query is an error?  Curiously, @@ seems okay with it:

david=# select '{"a":[false,true,false]}' @@ '$.a ?(@[*] == false)';
 ?column?
----------
 t

Not a predicate query, and somehow it returns true even though the first item of the result is false?  Is that how it should be?

Best,
David

[1] https://www.postgresql.org/message-id/CACJufxE01sxgvtG4QEvRZPzs_roggsZeVvBSGpjM5tzE5hMCLA%40mail.gmail.com
Re: Synchronizing slots from primary to standby
On Fri, Oct 6, 2023 at 7:37 PM Alvaro Herrera wrote:
> On 2023-Sep-27, Peter Smith wrote:
> > 3. get_local_synced_slot_names
> >
> > + for (int i = 0; i < max_replication_slots; i++)
> > + {
> > + ReplicationSlot *s = &ReplicationSlotCtl->replication_slots[i];
> > +
> > + /* Check if it is logical synchronized slot */
> > + if (s->in_use && SlotIsLogical(s) && s->data.synced)
> > + {
> > + for (int j = 0; j < MySlotSyncWorker->dbcount; j++)
> > + {
> >
> > Loop variables are not declared in the common PG code way.
>
> Note that since we added C99 as a mandatory requirement for compilers in
> commit d9dd406fe281, we've been using declarations in loop initializers
> (see 143290efd079).  We have almost 500 occurrences of this already.
> Older code, obviously, does not use them, but that's no reason not to
> introduce them in new code.  I think they make the code a bit leaner, so
> I suggest to use these liberally.

I also prefer the C99 style, but I had misunderstood there was still a convention to keep using the old style for code consistency (e.g. many new patches I see still seem to use the old style).  Thanks for confirming that C99 loop variables are fine for any new code.

@Shveta/Ajin - please ignore/revert all my old review comments about this point.

==
Kind Regards,
Peter Smith.
Fujitsu Australia.
Re: Add a new BGWORKER_BYPASS_ROLELOGINCHECK flag
Michael Paquier writes:
> On Fri, Oct 06, 2023 at 03:29:28PM +0900, Michael Paquier wrote:
>> Andres, are there logs for this TAP test on serinus? Or perhaps there is a core file that could be looked at? The other animals are not showing anything for the moment.
>
> Well, it looks OK. Still that's itching a bit.

There have been intermittent failures on various buildfarm machines since this went in. After seeing one on my own animal mamba [1], I tried to reproduce it manually on that machine, and it does indeed fail about one time in two. The buildfarm script is not managing to capture the relevant log files, but what I see in a manual run is that 001_worker_spi.pl logs this:

...
# Postmaster PID for node "mynode" is 21897
[01:19:53.931](2.663s) ok 5 - bgworkers all launched
[01:19:54.711](0.780s) ok 6 - dynamic bgworkers all launched
error running SQL: 'psql::1: ERROR: could not start background process
HINT: More details may be available in the server log.'
while running 'psql -XAtq -d port=56393 host=/tmp/PETPK0Stwi dbname='postgres' -f - -v ON_ERROR_STOP=1' with sql 'SELECT worker_spi_launch(12, 16394, 16395);' at /home/tgl/pgsql/src/test/modules/worker_spi/../../../../src/test/perl/PostgreSQL/Test/Cluster.pm line 2009.
# Postmaster PID for node "mynode" is 21897
### Stopping node "mynode" using mode immediate
# Running: pg_ctl -D /home/tgl/pgsql/src/test/modules/worker_spi/tmp_check/t_001_worker_spi_mynode_data/pgdata -m immediate stop
waiting for server to shut down done
server stopped
# No postmaster PID for node "mynode"
[01:19:55.032](0.321s) # Tests were run but no plan was declared and done_testing() was not seen.
[01:19:55.035](0.002s) # Looks like your test exited with 29 just after 6.
and in the postmaster log

2023-10-08 01:19:54.265 EDT [5820] LOG: worker_spi dynamic worker 10 initialized with schema10.counted
2023-10-08 01:19:54.378 EDT [27776] 001_worker_spi.pl LOG: statement: SELECT worker_spi_launch(11, 5, 16395);
2023-10-08 01:19:54.476 EDT [18120] 001_worker_spi.pl LOG: statement: SELECT datname, usename, wait_event FROM pg_stat_activity WHERE backend_type = 'worker_spi dynamic' AND pid IN (5820, 428) ORDER BY datname;
2023-10-08 01:19:54.548 EDT [428] LOG: worker_spi dynamic worker 11 initialized with schema11.counted
2023-10-08 01:19:54.680 EDT [152] 001_worker_spi.pl LOG: statement: SELECT datname, usename, wait_event FROM pg_stat_activity WHERE backend_type = 'worker_spi dynamic' AND pid IN (5820, 428) ORDER BY datname;
2023-10-08 01:19:54.779 EDT [1675] 001_worker_spi.pl LOG: statement: ALTER DATABASE mydb ALLOW_CONNECTIONS false;
2023-10-08 01:19:54.854 EDT [26562] 001_worker_spi.pl LOG: statement: SELECT worker_spi_launch(12, 16394, 16395);
2023-10-08 01:19:54.878 EDT [23636] FATAL: database "mydb" is not currently accepting connections
2023-10-08 01:19:54.888 EDT [21897] LOG: background worker "worker_spi dynamic" (PID 23636) exited with exit code 1
2023-10-08 01:19:54.888 EDT [26562] 001_worker_spi.pl ERROR: could not start background process
2023-10-08 01:19:54.888 EDT [26562] 001_worker_spi.pl HINT: More details may be available in the server log.
2023-10-08 01:19:54.888 EDT [26562] 001_worker_spi.pl STATEMENT: SELECT worker_spi_launch(12, 16394, 16395);
2023-10-08 01:19:54.912 EDT [21897] LOG: received immediate shutdown request
2023-10-08 01:19:55.014 EDT [21897] LOG: database system is shut down

What it looks like to me is that there is a code path by which "could not start background process" is reported as a failure of the SELECT worker_spi_launch() query itself. The test script is not expecting that, because it executes that query with

# bgworker cannot be launched with connection restriction.
my $worker3_pid = $node->safe_psql('postgres',
    qq[SELECT worker_spi_launch(12, $mydb_id, $myrole_id);]);
$node->wait_for_log(
    qr/database "mydb" is not currently accepting connections/, $log_offset);

so safe_psql bails out and we get no further. Since this only seems to happen on slow machines, I'd call it a timing problem or race condition. Unless you want to argue that the race should not happen, probably the fix is to make the test script cope with this worker_spi_launch() call failing. As long as we see the expected result from wait_for_log, we can be pretty sure the right thing happened.

regards, tom lane

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mamba&dt=2023-10-08%2001%3A00%3A22
Re: Does anyone ever use OPTIMIZER_DEBUG?
David Rowley writes: > It looks like nobody is objecting to this. I understand that not > everyone who might object will have read this email thread, so what I > propose to do here is move along and just commit the patch to swap out > debug_print_rel and use pprint instead. If that's done now then there > are around 10 months where we could realistically revert this again if > someone were to come forward with an objection. > Sound ok? WFM. regards, tom lane
Re: Does anyone ever use OPTIMIZER_DEBUG?
On Tue, 3 Oct 2023 at 12:29, David Rowley wrote: > > On Fri, 29 Sept 2023 at 10:59, Tom Lane wrote: > > We could also discuss keeping the "tracing" aspect of it, but > > replacing debug_print_rel with pprint(rel), which'd still allow > > removal of all the "DEBUG SUPPORT" stuff at the bottom of allpaths.c. > > That's pretty much all of the maintenance-requiring stuff in it. > > To assist discussion, I've attached a patch for that. It looks like nobody is objecting to this. I understand that not everyone who might object will have read this email thread, so what I propose to do here is move along and just commit the patch to swap out debug_print_rel and use pprint instead. If that's done now then there are around 10 months where we could realistically revert this again if someone were to come forward with an objection. Sound ok? David
Re: [PoC/RFC] Multiple passwords, interval expirations
On Fri, Oct 6, 2023 at 1:29 PM Jeff Davis wrote:
>
> On Thu, 2023-10-05 at 14:28 -0700, Gurjeet Singh wrote:
> > This way there's a notion of a 'new' and 'old' passwords.
>
> IIUC, you are proposing that there are exactly two slots, NEW and OLD. When adding a password, OLD must be unset and it moves NEW to OLD, and adds the new password in NEW. DROP only works on OLD. Is that right?

Yes, that's what I was proposing. But thinking a bit more about it, the _implicit_ shuffling of labels 'new' and 'old' doesn't feel right to me. The password that used to be referred to as 'new' now automatically gets labeled 'old'.

> It's close to the idea of deprecation, except that adding a new password implicitly deprecates the existing one. I'm not sure about that -- it could be confusing.

+1

> We could also try using a verb like "expire" that could be coupled with a date, and that way all old passwords would always have some validity period.

Forcing the users to pick an expiry date for a password they intend to roll over, when an expiry date did not exist before for that password, feels like adding more burden to their password-rollover decision making. The dates and rules of password rollover may be a part of a system external to their database (wiki, docs, calendar, etc.), which now they will be forced to translate into a timestamp to specify in the rollover commands.

I believe we should fix the _names_ of the slots the 2 passwords are stored in, and provide commands that manipulate those slots by their respective names; the commands should not implicitly move the passwords between the slots. Additionally, we may provide functions that provide observability info for these slots.

I propose the slot names FIRST and SECOND (I picked these because these keywords/tokens already exist in the grammar, but am not yet sure if the grammar rules would allow their use; feel free to propose better names). FIRST refers to the existing slot, namely rolpassword.
SECOND refers to the new slot we'd add, that is, a pgauthid column named rolsecondpassword. Existing commands, and commands where neither FIRST nor SECOND is specified, apply to the existing slot, a.k.a. FIRST.

The user interface might look like the following:

-- Create a user, as usual
CREATE ROLE u1 PASSWORD 'p1' VALID UNTIL '2020/01/01';
-- This automatically occupies the 'first' slot

-- Add another password that the user can use for authentication.
ALTER ROLE u1 ADD SECOND PASSWORD 'p2' VALID UNTIL '2021/01/01';

-- Change both the passwords' validity independently; this solves the
-- problem with the previous '2-element stack' approach, where we
-- could not address the password at the bottom of the stack.
ALTER ROLE u1 SECOND PASSWORD VALID UNTIL '2022/01/01';
ALTER ROLE u1 [ [ FIRST ] PASSWORD ] VALID UNTIL '2022/01/01';

-- If, for some reason, the user wants to get rid of the latest password added.
ALTER ROLE u1 DROP SECOND PASSWORD;

-- Add a new password (p3) in 'second' slot
ALTER ROLE u1 ADD SECOND PASSWORD 'p3' VALID UNTIL '2023/01/01';

-- Attempting to add a password while the respective slot is occupied
-- results in error
ALTER ROLE u1 ADD [ [ FIRST ] PASSWORD ] 'p4' VALID UNTIL '2024/01/01';
ERROR: first password already exists
ALTER ROLE u1 ADD SECOND PASSWORD 'p4' VALID UNTIL '2024/01/01';
ERROR: second password already exists

-- Users can use this function to check whether a password slot is occupied
=> select password_exists('u1', 'first');
 password_exists
-----------------
 t

=> select password_exists('u1', 'second');
 password_exists
-----------------
 t

-- Remove all passwords from the role. Both, 'first' and 'second', passwords are removed.
ALTER ROLE u1 DROP ALL PASSWORD;

Best regards,
Gurjeet
http://Gurje.et
Re: [PoC/RFC] Multiple passwords, interval expirations
On Sun, Oct 8, 2023 at 10:50:15AM -0700, Gurjeet Singh wrote:
> On Sun, Oct 8, 2023 at 10:29 AM Bruce Momjian wrote:
> >
> > I was speaking of autoremoving in cases where we are creating a new one, and taking the previous new one and making it the old one, if that was not clear.
>
> Yes, I think I understood it differently. I understood it to mean that this behaviour would apply to all passwords, those created by existing commands, as well as to those created by new commands for rollover use case. Whereas you meant this autoremove behaviour to apply only to those passwords created by/for rollover related commands. I hope I've understood your proposal correctly this time around :-)

Yes, it is only during the addition of a new password when the previous new password becomes the new old password. The previous old password would need to have a rolvaliduntil in the past.

> I believe the passwords created by rollover feature should behave by the same rules as the rules for passwords created by existing CREATE/ALTER ROLE commands. If we implement the behaviour to delete expired passwords, then I believe that behaviour should apply to all passwords, irrespective of which command/feature was used to create a password.

This would only apply when we are moving the previous new password to old and the old one is removed.

--
Bruce Momjian  https://momjian.us
EDB  https://enterprisedb.com

Only you can decide what is important to you.
Re: Trigger violates foreign key constraint
On Mon, Oct 02, 2023 at 09:49:53AM -0400, Tom Lane wrote: > Laurenz Albe writes: > > CREATE FUNCTION silly() RETURNS trigger LANGUAGE plpgsql AS 'BEGIN RETURN > > NULL; END;'; > > CREATE TRIGGER silly BEFORE DELETE ON child FOR EACH ROW EXECUTE FUNCTION > > silly(); > > > The trigger function cancels the cascaded delete on "child", and we are > > left with > > a row in "child" that references no row in "parent". > > Yes. This is by design: triggers operate at a lower level than > foreign keys, so an ill-conceived trigger can break an FK constraint. > That's documented somewhere, though maybe not visibly enough. > > There are good reasons to want triggers to be able to see and > react to FK-driven updates, I agree with that, but I also think it's a bug that other triggers can invalidate the constraint, without even going out of their way to do so. Ideally, triggers would be able to react, yet when all non-superuser-defined code settles, the constraint would still hold. While UNIQUE indexes over expressions aren't that strict, at least for those you need to commit the clear malfeasance of redefining an IMMUTABLE function. On Mon, Oct 02, 2023 at 12:02:17PM +0200, Laurenz Albe wrote: > Perhaps it would be enough to run "RI_FKey_noaction_del" after > "RI_FKey_cascade_del", although that would impact the performance. Yes. A cure that doubles the number of heap fetches would be worse than the disease, but a more-optimized version of this idea could work. The FK system could use a broader optimization-oriented rewrite, to deal with the unbounded memory usage and redundant I/O. If that happens, it could be a good time to plan for closing the trigger hole.
Re: [PoC/RFC] Multiple passwords, interval expirations
On Sun, Oct 8, 2023 at 10:29 AM Bruce Momjian wrote: > > I was speaking of autoremoving in cases where we are creating a new one, > and taking the previous new one and making it the old one, if that was > not clear. Yes, I think I understood it differently. I understood it to mean that this behaviour would apply to all passwords, those created by existing commands, as well as to those created by new commands for rollover use case. Whereas you meant this autoremove behaviour to apply only to those passwords created by/for rollover related commands. I hope I've understood your proposal correctly this time around :-) I believe the passwords created by rollover feature should behave by the same rules as the rules for passwords created by existing CREATE/ALTER ROLE commands. If we implement the behaviour to delete expired passwords, then I believe that behaviour should apply to all passwords, irrespective of which command/feature was used to create a password. Best regards, Gurjeet http://Gurje.et
Re: [PoC/RFC] Multiple passwords, interval expirations
On Sun, Oct 8, 2023 at 10:24:42AM -0700, Gurjeet Singh wrote: > On Fri, Oct 6, 2023 at 1:46 PM Bruce Momjian wrote: > > > > On Fri, Oct 6, 2023 at 01:20:03PM -0700, Jeff Davis wrote: > > > The basic problem, as I see it, is: how do we keep users from > > > accidentally dropping the wrong password? Generated unique names or > > > > I thought we could auto-remove old password if the valid-until date is > > in the past. > > Autoremoving expired passwords will surprise users, and not in a good > way. Making a password, even an expired one, disappear from the system > will lead to astonishment. Among uses of an expired password are cases > of it acting like a tombstone, and the case where the user may want to > extend the validity of a password, instead of having to create a new > one and change application configuration(s) to specify the new > password. I was speaking of autoremoving in cases where we are creating a new one, and taking the previous new one and making it the old one, if that was not clear. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Re: [PoC/RFC] Multiple passwords, interval expirations
On Fri, Oct 6, 2023 at 1:46 PM Bruce Momjian wrote: > > On Fri, Oct 6, 2023 at 01:20:03PM -0700, Jeff Davis wrote: > > The basic problem, as I see it, is: how do we keep users from > > accidentally dropping the wrong password? Generated unique names or > > I thought we could auto-remove old password if the valid-until date is > in the past. Autoremoving expired passwords will surprise users, and not in a good way. Making a password, even an expired one, disappear from the system will lead to astonishment. Among uses of an expired password are cases of it acting like a tombstone, and the case where the user may want to extend the validity of a password, instead of having to create a new one and change application configuration(s) to specify the new password. Best regards, Gurjeet http://Gurje.et
Re: wal recycling problem
## Fabrice Chapuis (fabrice636...@gmail.com):

> From a conceptual point of view I think that specific wals per subscription should be used and stored in the pg_replslot folder in order to avoid working directly on the wals of the instance.
> What do you think about this proposal?

I think that would open a wholly new can of worms. The most obvious point here is: that WAL is primarily generated for the operation of the database itself - it's our kind of transaction log, or "Redo Log" in other systems' lingo. Replication (be it physical or logical) is a secondary purpose (an obvious and important one, but still secondary).

How would you know which part of WAL is needed for any specific replication slot? You'd have to decode and filter it, and already you're back at square one. How would you handle multiple replications for the same table (in the same publication, or even over multiple (overlapping) publications) - do you multiply the WAL?

For now, we have "any replication using replication slots, be it logical or physical replication, retains WAL up to max_slot_wal_keep_size (or "unlimited" if not set - and on PostgreSQL 12 and before); and you need to monitor the state of your replication slots", which is a totally usable rule, I think.

Regards,
Christoph

--
Spare Space
Re: CREATE DATABASE with filesystem cloning
On 2023-10-07 Sa 01:51, Thomas Munro wrote:
> Hello hackers,
>
> Here is an experimental POC of fast/cheap database cloning. For clones from little template databases, no one cares much, but it might be useful to be able to create a snapshot or fork of a very large database for testing/experimentation like this:
>
>   create database foodb_snapshot20231007 template=foodb strategy=file_clone
>
> It should be a lot faster, and use less physical disk, than the two existing strategies on recent-ish XFS, BTRFS, very recent OpenZFS, APFS (= macOS), and it could in theory be extended to other systems that invented different system calls for this with more work (Solaris, Windows). Then extra physical disk space will be consumed only as the two clones diverge.
>
> It's just like the old strategy=file_copy, except it asks the OS to do its best copying trick. If you try it on a system that doesn't support copy-on-write, then copy_file_range() should fall back to plain old copy, but it might still be better than we could do, as it can push copy commands to network storage or physical storage.
>
> Therefore, the usual caveats from strategy=file_copy also apply here. Namely that it has to perform checkpoints which could be very expensive, and there are some quirks/brokenness about concurrent backups and PITR. Which makes me wonder if it's worth pursuing this idea. Thoughts?
>
> I tested on bleeding edge FreeBSD/ZFS, where you need to set sysctl vfs.zfs.bclone_enabled=1 to enable the optimisation, as it's still a very new feature that is still being rolled out. The system call succeeds either way, but that controls whether the new database initially shares blocks on disk, or gets new copies. I also tested on a Mac. In both cases I could clone large databases in a fraction of a second.

I've had to disable COW on my BTRFS-resident buildfarm animals (see previous discussion re Direct I/O).

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Re: pg16: XX000: could not find pathkey item to sort
On Thu, Oct 5, 2023 at 2:26 PM David Rowley wrote:
> So in short, I propose the attached fix without any regression tests because I feel that any regression test would just mark that there was a bug in create_agg_path() and not really help with ensuring we don't end up with some similar problem in the future.

If the pathkeys that were added by adjust_group_pathkeys_for_groupagg() are computable from the targetlist, it seems that we do not need to trim them off, because prepare_sort_from_pathkeys() will add resjunk target entries for them. But it's also no harm if we trim them off. So I think the patch is a pretty safe fix. +1 to it.

> I have some concerns that the assert_pathkeys_in_target() function might be a little heavyweight for USE_ASSERT_CHECKING builds. So I'm not proposing to commit that without further discussion.

Yeah, it looks somewhat heavy to call assert_pathkeys_in_target() for each path node. Can we run some benchmarks to see how much overhead it would bring to a USE_ASSERT_CHECKING build?

Thanks
Richard