Re: Enable data checksums by default
On 16.10.24 08:54, Peter Eisentraut wrote:
On 14.10.24 11:28, Peter Eisentraut wrote:
On 03.10.24 23:13, Nathan Bossart wrote:
On Tue, Oct 01, 2024 at 11:15:02AM -0400, Peter Eisentraut wrote:

I have committed 0001 (the new option) and 0004 (the docs tweak). I think there is consensus for the rest, too, but I'll leave it for a few more days to think about.

I guess the test failure has to be addressed.

Here is a rebased patch with the test fix (for cfbot). I have made no other changes.

I have committed the test changes (patch 0002). (I renamed the option to no_data_checksums to keep the wording consistent with the initdb option.) Right now, with checksums off by default, this doesn't do much, but you can test this like

PG_TEST_INITDB_EXTRA_OPTS=--data-checksums meson test ...

and everything will pass. To make that work, I had to adjust the order of how the initdb options are assembled in Cluster.pm a bit. I will work on the patch that flips the default next.

The patch that flips the default has been committed. I also started a PG18 open items page and made a note that we follow up on the upgrade experience, as was discussed in this thread.

Ah yes, and the upgrade tests on the buildfarm don't like this. What shall we do about this? Maybe adjust the buildfarm scripts to use --no-data-checksums?
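[Editor's note: for readers following along, whether a given cluster was initialized with data checksums can be checked with a standard, pre-existing query; shown here only as a reference, independent of the patch under discussion.]

SHOW data_checksums;
-- reports "on" when the cluster was initialized with checksums
-- (the default after the patch discussed above)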
Re: Doc: typo in config.sgml
On 15.10.24 23:51, Bruce Momjian wrote:

I don't see why we need to enforce this at this level. Whatever downstream toolchain has requirements about which characters are allowed will complain if it encounters a character it doesn't like.

Uh, the PDF build does not complain if you pass it non-Latin-1 UTF8 characters. To test this I added some Russian characters (non-Latin-1) to release.sgml: (⟨б⟩, ⟨в⟩, ⟨г⟩, ⟨д⟩, ⟨ж⟩, ⟨з⟩, ⟨к⟩, ⟨л⟩, ⟨м⟩, ⟨н⟩, ⟨п⟩, ⟨р⟩, ⟨с⟩, ⟨т⟩, ⟨ф⟩, ⟨х⟩, ⟨ц⟩, ⟨ч⟩, ⟨ш⟩, ⟨щ⟩), ten vowels (⟨а⟩, ⟨е⟩, ⟨ё⟩, ⟨и⟩, ⟨о⟩, ⟨у⟩, ⟨ы⟩, ⟨э⟩, ⟨ю⟩, ⟨я⟩), a semivowel / consonant (⟨й⟩), and two modifier letters or "signs" (⟨ъ⟩, ⟨ь⟩), and I ran 'make postgres-US.pdf', then removed the Russian characters and ran the same command again. The output, including stderr, was identical. The PDFs, of course, were not, with the Russian characters showing as "". Makefile output attached.

Hmm, mine complains:

/opt/homebrew/bin/fop -fo postgres-A4.fo -pdf postgres-A4.pdf
Picked up JAVA_TOOL_OPTIONS: -Djava.awt.headless=true
[WARN] FOUserAgent - Font "Symbol,normal,700" not found. Substituting with "Symbol,normal,400".
[WARN] FOUserAgent - Font "ZapfDingbats,normal,700" not found. Substituting with "ZapfDingbats,normal,400".
[WARN] FOUserAgent - Glyph "⟨" (0x27e8) not available in font "Times-Roman".
[WARN] FOUserAgent - Glyph "б" (0x431, afii10066) not available in font "Times-Roman".
[WARN] FOUserAgent - Glyph "⟩" (0x27e9) not available in font "Times-Roman".
[WARN] FOUserAgent - Glyph "в" (0x432, afii10067) not available in font "Times-Roman".
[WARN] FOUserAgent - Glyph "г" (0x433, afii10068) not available in font "Times-Roman".
[WARN] FOUserAgent - Glyph "д" (0x434, afii10069) not available in font "Times-Roman".
[WARN] FOUserAgent - Glyph "ж" (0x436, afii10072) not available in font "Times-Roman".
[WARN] FOUserAgent - Glyph "з" (0x437, afii10073) not available in font "Times-Roman".
[WARN] PropertyMaker - span="inherit" on fo:block, but no explicit value found on the parent FO.
Re: Doc: typo in config.sgml
On 15.10.24 23:51, Bruce Momjian wrote:
On Tue, Oct 15, 2024 at 05:27:49PM -0400, Tom Lane wrote:
Bruce Momjian writes:

Well, we can only use Latin-1, so the idea is that we will be explicit about specifying Latin-1 only as HTML entities, rather than letting non-Latin-1 creep in as UTF8. We can exclude certain UTF8 or SGML files if desired.

That policy would cause substantial problems with contributor names in the release notes. I agree with Peter that we don't need this. Catching otherwise-invisible characters seems sufficient.

Uh, why can't we use HTML entities going forward? Is that harder?

I think the question should be the other way around. The entities are a historical workaround for when encoding support and rendering support was poor. Now you can just type in the characters you want as is, which seems nicer.
Re: Enable data checksums by default
> On 16 Oct 2024, at 09:49, Peter Eisentraut wrote: > Ah yes, and the upgrade tests on the buildfarm don't like this. What shall > we do about this? Maybe adjust the buildfarm scripts to use > --no-data-checksums? I can't see many other options, and it represents a reasonable use-case, so +1. -- Daniel Gustafsson
Re: Vacuum statistics
Hi!

On 16.10.2024 13:31, Ilia Evdokimov wrote:
On 08.10.2024 19:18, Alena Rybakina wrote:

Made a rebase on a fresh master branch.

--
Regards,
Alena Rybakina
Postgres Professional

Thank you for rebasing.

I have noticed that when I create a table or an index on this table, there is no information about the table or index in pg_stat_vacuum_tables and pg_stat_vacuum_indexes until we perform a VACUUM. Example:

CREATE TABLE t (i INT, j INT);
INSERT INTO t SELECT i/10, i/100 FROM GENERATE_SERIES(1,100) i;

SELECT * FROM pg_stat_vacuum_tables WHERE relname = 't';
(0 rows)

CREATE INDEX ON t (i);

SELECT * FROM pg_stat_vacuum_indexes WHERE relname = 't_i_idx';
...
(0 rows)

I can see the entries after running VACUUM or when autovacuum is executed. I would suggest adding a line about the relation even if it has not yet been processed by vacuum.

Interestingly, this issue does not occur with pg_stat_vacuum_database:

CREATE DATABASE example_db;

SELECT * FROM pg_stat_vacuum_database WHERE dbname = 'example_db';
 dboid | dbname     | ...
 ...   | example_db | ...
(1 row)

BTW, I recommend renaming the view pg_stat_vacuum_database to pg_stat_vacuum_databaseS for consistency with pg_stat_vacuum_tables and pg_stat_vacuum_indexes.

Thanks for the review. I'm investigating this. I agree with the renaming, I will do it in the next version of the patch.

--
Regards,
Alena Rybakina
Postgres Professional
Re: Add “FOR UPDATE NOWAIT” lock details to the log.
On 2024/09/13 20:49, Seino Yuki wrote:

Hello, I would like to add the information of the PID that caused the failure when acquiring a lock with "FOR UPDATE NOWAIT". When "FOR UPDATE" is executed and interrupted by lock_timeout, xid and PID are output in the logs,

Could you explain how to reproduce this? In my quick test, lock waits canceled by lock_timeout didn't report either xid or PID, so I might be missing something.

but in the case of "FOR UPDATE NOWAIT", no information is output, making it impossible to identify the cause of the lock failure. Therefore, I would like to output information in the logs in the same way as when "FOR UPDATE" is executed and interrupted by lock_timeout.

I think NOWAIT is often used for lock waits that can fail frequently. Logging detailed information for each failed NOWAIT lock wait could lead to excessive and potentially noisy log messages for some users. On the other hand, I understand that even with NOWAIT, we might want to investigate why a lock wait failed. In such cases, detailed information about the locking process would be useful.

Considering both points, I'm leaning toward adding a new GUC parameter to control whether detailed logs should be generated for failed NOWAIT locks (and possibly for those canceled by lock_timeout). Alternatively, if adding a new GUC is not ideal, we could extend log_lock_waits to accept a new value like 'error', which would trigger detailed logging when a lock wait fails due to NOWAIT, lock_timeout, or cancellation.

The patch is attached as well.

I got the following compiler errors:

proc.c:1240:3: error: call to undeclared function 'CollectLockHoldersAndWaiters'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
 1240 |   CollectLockHoldersAndWaiters(proclock, lock, &lock_holders_sbuf, &lock_waiters_sbuf, &lockHoldersNum);
      |   ^
proc.c:1549:4: error: call to undeclared function 'CollectLockHoldersAndWaiters'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
 1549 |    CollectLockHoldersAndWaiters(NULL, lock, &lock_holders_sbuf, &lock_waiters_sbuf, &lockHoldersNum);
      |    ^
proc.c:1126:8: warning: unused variable 'first_holder' [-Wunused-variable]
 1126 |  bool  first_holder = true,
      |        ^~~~
proc.c:1127:5: warning: unused variable 'first_waiter' [-Wunused-variable]
 1127 |      first_waiter = true;
      |      ^~~~
proc.c:1982:1: error: conflicting types for 'CollectLockHoldersAndWaiters'
 1982 | CollectLockHoldersAndWaiters(PROCLOCK *waitProcLock, LOCK *lock, StringInfo lock_holders_sbuf, StringInfo lock_waiters_sbuf, int *lockHoldersNum)
      | ^
proc.c:1240:3: note: previous implicit declaration is here
 1240 |   CollectLockHoldersAndWaiters(proclock, lock, &lock_holders_sbuf, &lock_waiters_sbuf, &lockHoldersNum);
      |   ^
2 warnings and 3 errors generated.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
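[Editor's note: the NOWAIT failure being discussed is easy to reproduce with two sessions; this is standard, existing behavior, shown only to illustrate which detail is currently missing from the log. Table and column names are illustrative.]

-- Session 1: take a row lock and keep the transaction open.
BEGIN;
SELECT * FROM accounts WHERE id = 1 FOR UPDATE;

-- Session 2: NOWAIT fails immediately instead of waiting, but the server
-- log currently does not identify the PID/xid holding the conflicting lock.
SELECT * FROM accounts WHERE id = 1 FOR UPDATE NOWAIT;
-- ERROR:  could not obtain lock on row in relation "accounts"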
Re: replace strtok()
On Wed, Oct 16, 2024 at 04:45, Peter Eisentraut wrote:
> On 15.10.24 14:07, Ranier Vilela wrote:
> > I also wonder, if other places touched by 5d2e1cc11 need corrections
> > too. I played with
> >
> >     PG_COLOR=always PG_COLORS="error=01;31" .../initdb
> >
> > and it looks like this free() call in pg_logging_init():
> >
> >     char *colors = strdup(pg_colors_env);
> >
> >     if (colors)
> >     {
> >         ...
> >         while ((token = strsep(&colors, ":")))
> >         {
> >             ...
> >         }
> >
> >         free(colors);
> >     }
> >
> > gets null in colors.
> >
> > Yeah, I also saw this usage, but I was waiting for a definition for the
> > first report. The solution IMO, would be the same.
> >
> > diff --git a/src/common/logging.c b/src/common/logging.c
> > index aedd1ae2d8..45b5316d48 100644
> > --- a/src/common/logging.c
> > +++ b/src/common/logging.c
> > @@ -121,7 +121,7 @@ pg_logging_init(const char *argv0)
> >    {
> >        char *token;
> >
> > -      while ((token = strsep(&colors, ":")))
> > +      while ((token = strsep(&colors, ":")) != NULL && colors != NULL)
> >    {
> >        char *e = strchr(token, '=');
> >
> > The advantage of this change is that it would avoid processing
> > unnecessary tokens.
>
> This wouldn't fix anything, I think. If colors is NULL, then strsep()
> already returns NULL, so the added code does nothing.

If *colors* is NULL, then the delimiter is not found and strsep will return the entire string *stringp, so the token becomes invalid. IMO, I think it must be necessary to check if *colors* is NULL too.

best regards,
Ranier Vilela
Re: New function normal_rand_array function to contrib/tablefunc.
Hello Dean,

Thanks for the detailed feedback! Here is the rebased version.

> 1). In the tests:
>
> +select setseed(0.8);
> +select rand_array(10, 0, 3, 50::int, 80::int);
> ..
> this should really have a comment block to distinguish these new tests
> from the preceding normal_rand() tests.

> 3). It's only necessary to call setseed() once to get a reproducible
> set of results from then on.

> 4). I'd use setseed(0) or setseed(0.5), since those values are exactly
> representable as double precision values, unlike 0.8, ensuring that it
> works the same on all platforms. It might not matter, but why take the
> risk?

All done; very interesting about the reason why 0.8 is bad.

> 2). It's worth also having tests for the error cases.

After the code refactor, it looks like the ERROR is not reachable, so I added both Assert(false) and elog(ERROR), with no test case for that.

> 5). The float8 case still has minimum and maximum value arguments that
> it just ignores. It should either not take those arguments, or it
> should respect them, and try to return float8 values in the requested
> range. I suspect that trying to return float8 values in the requested
> range would be hard, if not impossible, due to the inexact nature of
> double precision arithmetic. That's why I suggested earlier having the
> float8 function not take minval/maxval arguments, and just return
> values in the range 0 to 1.
>
> 11). If the float8 case is made to not have minval/maxval arguments, this
> code:
>
> +    Datum minval = PG_GETARG_DATUM(3),
> +          maxval = PG_GETARG_DATUM(4);
>
> and the FunctionCallInfo setup code needs to be different for the
> float8 case, in order to avoid reading and setting undefined
> arguments. Perhaps introduce a "need_val_bounds" boolean variable,
> based on the datatype.

I should have noticed the float8 issue after reading your first reply. Actually I don't have a special reason for which I have to support float8. At least when I was working on the patch, I wanted some toast and non-toast data types, so supporting both int/bigint and numeric should be good, at least for my purpose. So in the attached version, I removed the support for float8 for simplification.

> 6). This new function:
>
> +static Datum
> +rand_array_internal(FunctionCallInfo fcinfo, Oid datatype)
> +{
>
> should have a comment block. In particular, it should document what
> its inputs and outputs are.

Done.

> 7). This code:
>
> +    int num_tuples = PG_GETARG_INT32(0);
> +    int minlen = PG_GETARG_INT32(1);
> +    int maxlen = PG_GETARG_INT32(2);
> +    Datum minval = PG_GETARG_DATUM(3),
> +          maxval = PG_GETARG_DATUM(4);
> +    rand_array_fctx *fctx;
> +
> +    if (datatype == INT4OID)
> +        random_fn_oid = F_RANDOM_INT4_INT4;
> +    else if (datatype == INT8OID)
> ...
> should be inside the "if (SRF_IS_FIRSTCALL())" block. There's no point
> repeating all those checks for every call.

Done.

> 8). I think it would be neater to code the "if (datatype == INT4OID)
> ... else if ..." as a switch statement.

Done.

> 9). I would swap the last 2 bound checks round, and simplify the error
> messages as follows:

Done.

> Also note that primary error messages should not end with a period.
> See https://www.postgresql.org/docs/current/error-style-guide.html

Thanks for sharing this!

> 10). In this error:
>
> +    elog(ERROR, "unsupported type %d for rand_array function.",
> +         datatype);
>
> "datatype" is of type Oid, which is unsigned, and so it should use
> "%u" not "%d". Also, as above, it should not end with a period, so it
> should be:
>
> +    elog(ERROR, "unsupported type %u for rand_array function",
> +         datatype);

Done.

> 12). This code:
>
> +    random_len_fcinfo->args[0].value = minlen;
> +    random_len_fcinfo->args[1].value = maxlen;
>
> should really be:
>
> +    random_len_fcinfo->args[0].value = Int32GetDatum(minlen);
> +    random_len_fcinfo->args[1].value = Int32GetDatum(maxlen);
>
> It amounts to the same thing, but documents the fact that it's
> converting an integer to a Datum.

Done.

> 13). These new functions are significantly under-documented,
> especially when compared to all the other functions on
> https://www.postgresql.org/docs/current/tablefunc.html
>
> They really should have their own subsection, along the same lines as
> "F.41.1.1. Normal_rand", explaining what the functions do, what their
> arguments mean, and giving a couple of usage examples.

Done, very impressive feedback, and I know why PostgreSQL can always have user-friendly documentation now.

--
Best Regards
Andy Fan

>From 2024871241c35959a259e54a8151d4e9372dbfbf Mon Sep 17 00:00:00 2001
From: Andy Fan
Date: Mon, 26 Aug 2024 18:50:57 +0800
Subject: [PATCH v20241016 1/1] Add functions rand_array function to contrib/tablefunc.

It produces an array of numeric_type with its length in the range of [minlen,maxlen] and each value is in the range of [minval,maxval].

---
 contrib/tab
Re: replace strtok()
On 15.10.24 14:07, Ranier Vilela wrote:

I also wonder, if other places touched by 5d2e1cc11 need corrections too. I played with

PG_COLOR=always PG_COLORS="error=01;31" .../initdb

and it looks like this free() call in pg_logging_init():

    char *colors = strdup(pg_colors_env);

    if (colors)
    {
        ...
        while ((token = strsep(&colors, ":")))
        {
            ...
        }

        free(colors);
    }

gets null in colors.

Yeah, I also saw this usage, but I was waiting for a definition for the first report. The solution IMO, would be the same.

diff --git a/src/common/logging.c b/src/common/logging.c
index aedd1ae2d8..45b5316d48 100644
--- a/src/common/logging.c
+++ b/src/common/logging.c
@@ -121,7 +121,7 @@ pg_logging_init(const char *argv0)
    {
        char *token;

-      while ((token = strsep(&colors, ":")))
+      while ((token = strsep(&colors, ":")) != NULL && colors != NULL)
    {
        char *e = strchr(token, '=');

The advantage of this change is that it would avoid processing unnecessary tokens.

This wouldn't fix anything, I think. If colors is NULL, then strsep() already returns NULL, so the added code does nothing.
Re: Vacuum statistics
Hi Ilia,

On Wed, 2024-10-16 at 13:31 +0300, Ilia Evdokimov wrote:
> BTW, I recommend renaming the view pg_stat_vacuum_database to
> pg_stat_vacuum_databaseS for consistency with pg_stat_vacuum_tables
> and pg_stat_vacuum_indexes

Such renaming doesn't seem correct to me, because pg_stat_vacuum_database is consistent with the pg_stat_database view, while pg_stat_vacuum_tables is consistent with pg_stat_all_tables. This inconsistency is in the Postgres views themselves, so it should be changed synchronously.

--
regards, Andrei Zubkov
Re: ECPG Refactor: move sqlca variable in ecpg_log()
On 2024/10/16 15:04, Yuto Sasaki (Fujitsu) wrote:

> Hmm, I'm not sure we want that, because if we do this, then
> ECPGget_sqlca() gets run with the debug_mutex held. In the original
> coding, it's run without the mutex.

Thank you for pointing that out. I agree with your observation. But there is another room for improvement - we can avoid unnecessary calls to ECPGget_sqlca() by moving it to just before the mutex lock.

+1

PSA a patch.

The patch looks good to me.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION
Re: BF mamba failure
Hi,

On Wed, Oct 16, 2024 at 09:43:48AM +0900, Michael Paquier wrote:
> On Fri, Oct 11, 2024 at 08:18:58AM +, Bertrand Drouvot wrote:
> > On Fri, Oct 11, 2024 at 11:07:29AM +0300, Kouber Saparev wrote:
> >> Unfortunately not, we are running 15.4 and planning to upgrade very soon.
> >> Is the patch mentioned already merged in PostgreSQL 16?
> >
> > Yes, as of 16.4.
>
> Right. I'd surely welcome more eyes on what I have posted here:
> https://www.postgresql.org/message-id/zm-8xo93k9yd9...@paquier.xyz

I applied your patch and do confirm that it fixes the issue.

While that works, I wonder whether there is another way to fix the issue. Indeed, in pgstat_release_entry_ref(), we're doing:

"
    if (pg_atomic_fetch_sub_u32(&entry_ref->shared_entry->refcount, 1) == 1)
    .
    .
        shent = dshash_find(pgStatLocal.shared_hash,
                            &entry_ref->shared_entry->key,
                            true);
"

I wonder if we are not decrementing &entry_ref->shared_entry->refcount too early. I mean, wouldn't it make sense to decrement it after the dshash_find() call? (to ensure a "proper" whole-entry locking, done in dshash_find(), first)

> I am a bit annoyed by the fact of making PgStatShared_HashEntry
> slightly larger to track the age of the entries,

Yeah. FWIW, we would be going from 32 bytes:

(gdb) ptype /o struct PgStatShared_HashEntry
/* offset      |    size */  type = struct PgStatShared_HashEntry {
/*      0      |      16 */    PgStat_HashKey key;
/*     16      |       1 */    _Bool dropped;
/* XXX  3-byte hole      */
/*     20      |       4 */    pg_atomic_uint32 refcount;
/*     24      |       8 */    dsa_pointer body;
                               /* total size (bytes):   32 */
                             }

to 40 bytes (with the patch applied):

(gdb) ptype /o struct PgStatShared_HashEntry
/* offset      |    size */  type = struct PgStatShared_HashEntry {
/*      0      |      16 */    PgStat_HashKey key;
/*     16      |       1 */    _Bool dropped;
/* XXX  3-byte hole      */
/*     20      |       4 */    pg_atomic_uint32 refcount;
/*     24      |       4 */    pg_atomic_uint32 agecount;
/* XXX  4-byte hole      */
/*     32      |       8 */    dsa_pointer body;
                               /* total size (bytes):   40 */
                             }

Due to the padding, that's an 8-byte increase while we're adding "only" 4 bytes.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: Remove deprecated -H option from oid2name
> On 15 Oct 2024, at 22:48, Peter Eisentraut wrote: > > On 09.10.24 20:30, Daniel Gustafsson wrote: >>> On 9 Oct 2024, at 19:15, Nathan Bossart wrote: >>> Another problem is that "deprecated" may or may not imply that the feature >>> will be removed in the future. IMHO we should be clear about that when we >>> intend to remove something down the road (e.g., "this flag is deprecated >>> and will be removed in a future major release of PostgreSQL"). >> That's a fair point, but if we don't aim to remove something we have, IMHO, a >> social contract to maintain the feature instead and at that point it's >> questionable if it is indeed deprecated. I guess I think we should separate >> between discouraged and deprecated. > > The dictionary definition of "deprecate" is "to express disapproval of", > which I think is about the same as your "discouraged". If we need another > term, then we should say something like "planned/scheduled to be removed". I guess my interpretation of "deprecated" is more in line with the Wikipedia paragraph on deprecation in software. "Deprecated status may also indicate the feature will be removed in the future. Features are deprecated, rather than immediately removed, to provide backward compatibility and to give programmers time to bring affected code into compliance with the new standard." Either way, it's not a hill to die on, if the consensus is to keep features marked deprecated then I'll withdraw this one. It's not an important thing, just something stumbled upon when looking at another thing. -- Daniel Gustafsson
Re: Set query_id for query contained in utility statement
On Tue, Oct 15, 2024 at 3:23 PM jian he wrote: > explain(verbose) SELECT 1, 2, 3\; explain SELECT 1, 2, 3, 4; > will transformed to > explain(verbose) SELECT 1, 2, 3; explain SELECT 1, 2, 3, 4; > > it seems to me your patch care about following position. > explain(verbose) SELECT 1, 2, 3; explain SELECT 1, 2, 3, 4; > ^ > but this patch [1] at another thread will get the top level statement > (passed the raw parse, no syntax error) beginning more effortless. > explain(verbose) SELECT 1, 2, 3; explain SELECT 1, 2, 3, 4; > ^ ^ > > can you try to looking at [1]. it may help to resolve this patch problem. > > [1] https://www.postgresql.org/message-id/2245576.1728418678%40sss.pgh.pa.us The top level statement beginning doesn't have an impact on this patch. The patch's focus is on the nested statement and RawStmt's location and length are only used to get the nested statement length. I will need the nested statement's location and length no matter what, to handle cases like "COPY (UPDATE ...) to stdout;" and only report the statement within parentheses. The linked patch will allow a simplification: the "if (@$ < 0) @$ = @2;" I've added won't be needed anymore. But I think that's the only possible simplification? I've run the tests on top of the linked patch and there was no change in the regression output.
Re: Vacuum statistics
On 08.10.2024 19:18, Alena Rybakina wrote:

Made a rebase on a fresh master branch.

--
Regards,
Alena Rybakina
Postgres Professional

Thank you for rebasing.

I have noticed that when I create a table or an index on this table, there is no information about the table or index in pg_stat_vacuum_tables and pg_stat_vacuum_indexes until we perform a VACUUM. Example:

CREATE TABLE t (i INT, j INT);
INSERT INTO t SELECT i/10, i/100 FROM GENERATE_SERIES(1,100) i;

SELECT * FROM pg_stat_vacuum_tables WHERE relname = 't';
(0 rows)

CREATE INDEX ON t (i);

SELECT * FROM pg_stat_vacuum_indexes WHERE relname = 't_i_idx';
...
(0 rows)

I can see the entries after running VACUUM or when autovacuum is executed. I would suggest adding a line about the relation even if it has not yet been processed by vacuum.

Interestingly, this issue does not occur with pg_stat_vacuum_database:

CREATE DATABASE example_db;

SELECT * FROM pg_stat_vacuum_database WHERE dbname = 'example_db';
 dboid | dbname     | ...
 ...   | example_db | ...
(1 row)

BTW, I recommend renaming the view pg_stat_vacuum_database to pg_stat_vacuum_databaseS for consistency with pg_stat_vacuum_tables and pg_stat_vacuum_indexes.

--
Regards,
Ilia Evdokimov,
Tantor Labs LLC.
Re: replace strtok()
On 15.10.24 14:00, Alexander Lakhin wrote:

I also wonder, if other places touched by 5d2e1cc11 need corrections too. I played with

PG_COLOR=always PG_COLORS="error=01;31" .../initdb

and it looks like this free() call in pg_logging_init():

    char *colors = strdup(pg_colors_env);

    if (colors)
    {
        ...
        while ((token = strsep(&colors, ":")))
        {
            ...
        }

        free(colors);
    }

gets null in colors.

Yes, this is indeed incorrect. We need to keep a separate pointer to the start of the string to free later. This matches the example on the strsep man page (https://man.freebsd.org/cgi/man.cgi?strsep(3)).

Patch attached.

From 1b842bfdc2b303264134687f3ec21fd3e53a0c82 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Wed, 16 Oct 2024 09:37:54 +0200
Subject: [PATCH] Fix memory leaks from incorrect strsep() uses

Commit 5d2e1cc117b introduced some strsep() uses, but it did the memory management wrong in some cases. We need to keep a separate pointer to the allocated memory so that we can free it later, because strsep() advances the pointer we pass to it, and at the end it will be NULL, so any free() calls won't do anything.

(This fixes two of the four places changed in commit 5d2e1cc117b. The other two don't have this problem.)

Reported-by: Alexander Lakhin
Discussion: https://www.postgresql.org/message-id/flat/79692bf9-17d3-41e6-b9c9-fc8c39442...@eisentraut.org
---
 src/common/logging.c          | 3 ++-
 src/test/regress/pg_regress.c | 7 +--
 2 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/src/common/logging.c b/src/common/logging.c
index aedd1ae2d8c..3cf119090a5 100644
--- a/src/common/logging.c
+++ b/src/common/logging.c
@@ -120,8 +120,9 @@ pg_logging_init(const char *argv0)
 	if (colors)
 	{
 		char	   *token;
+		char	   *cp = colors;

-		while ((token = strsep(&colors, ":")))
+		while ((token = strsep(&cp, ":")))
 		{
 			char	   *e = strchr(token, '=');

diff --git a/src/test/regress/pg_regress.c b/src/test/regress/pg_regress.c
index 5157629b1cc..6c188954b14 100644
--- a/src/test/regress/pg_regress.c
+++ b/src/test/regress/pg_regress.c
@@ -233,14 +233,17 @@ free_stringlist(_stringlist **listhead)
 static void
 split_to_stringlist(const char *s, const char *delim, _stringlist **listhead)
 {
-	char	   *sc = pg_strdup(s);
 	char	   *token;
+	char	   *sc;
+	char	   *tofree;
+
+	tofree = sc = pg_strdup(s);

 	while ((token = strsep(&sc, delim)))
 	{
 		add_stringlist_item(listhead, token);
 	}
-	free(sc);
+	free(tofree);
 }

 /*
--
2.47.0
Re: System views for versions reporting
On 06.10.24 17:36, Dmitry Dolgov wrote: Based on the feedback in [1], here is my attempt at implementing system views for versions reporting. It adds pg_system_versions for showing things like core version, compiler, LLVM, etc, and pg_system_libraries for showing linked shared objects. Is a system view the right interface? For example, in pgbouncer we just added it to the version output: $ pgbouncer --version PgBouncer 1.18.0 libevent 2.1.12-stable adns: c-ares 1.19.0 tls: OpenSSL 3.3.2 3 Sep 2024 That way, you can get this information without having to start a server instance. (Maybe you can't start a server instance because it just crashed because of some library version issue ...)
Re: System views for versions reporting
On 10/16/24 08:47, Peter Eisentraut wrote: On 06.10.24 17:36, Dmitry Dolgov wrote: Based on the feedback in [1], here is my attempt at implementing system views for versions reporting. It adds pg_system_versions for showing things like core version, compiler, LLVM, etc, and pg_system_libraries for showing linked shared objects. Is a system view the right interface? For example, in pgbouncer we just added it to the version output: $ pgbouncer --version PgBouncer 1.18.0 libevent 2.1.12-stable adns: c-ares 1.19.0 tls: OpenSSL 3.3.2 3 Sep 2024 That way, you can get this information without having to start a server instance. (Maybe you can't start a server instance because it just crashed because of some library version issue ...) While it is also useful to be able to get the info without being able to start the server, I think that would be an addition not a replacement. When you have a fleet with no direct access to run shell commands, being able to get this info via SQL is valuable. -- Joe Conway PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
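[Editor's note: as a concrete illustration of the SQL access being argued for here, the proposal upthread would expose the information roughly like this. The view names are taken from the proposal and are not committed; the columns are only sketched.]

SELECT * FROM pg_system_versions;   -- e.g. core version, compiler, LLVM, ICU
SELECT * FROM pg_system_libraries;  -- linked shared objects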
Re: BitmapHeapScan streaming read user and prelim refactoring
Hi,

I took a quick look on the first couple parts of this series, up to 0007. I only have a couple minor review comments:

1) 0001

I find it a bit weird that we use ntuples to determine if it's exact or lossy. Not an issue caused by this patch, of course, but maybe we could improve that somehow? I mean this:

    if (tbmres->ntuples >= 0)
        (*exact_pages)++;
    else
        (*lossy_pages)++;

Also, maybe consider manually wrapping long lines - I haven't tried, but I guess pgindent did not wrap this. I mean this line, for example:

    ... whitespace ... &node->stats.lossy_pages, &node->stats.exact_pages))

2) 0003

The "tidr" name is not particularly clear, IMO. Maybe just "range" would be better?

    + struct
    + {
    +     ItemPointerData rs_mintid;
    +     ItemPointerData rs_maxtid;
    + } tidr;

Isn't it a bit strange that this patch removes tbmres from heapam_scan_bitmap_next_block, when it was already removed from table_scan_bitmap_next_tuple in the previous commit? Does it make sense to separate it like this? (Maybe it does, not sure.)

I find this not very clear:

    + * recheck    do current page's tuples need recheck
    + * blockno    used to validate pf and current block in sync
    + * pfblockno  used to validate pf stays ahead of current block

The "blockno" one sounds weird - shouldn't it say "are in sync"? Also, it's not clear what "pf" stands for without context (sure, it's "prefetch"). Why not write "prefetch_blockno" or something like that?

3) 0006

Seems unnecessary to keep this as a separate commit, it's just a log message improvement. I'd merge it into the earlier patch.

4) 0007

Seems fine to me in principle. This adds a "subclass" similar to what we do for plan nodes, except that the states are not derived from Node. But it's the same approach.

Why not do

    scan = (HeapScanDesc *) bscan;

instead of

    scan = &bscan->rs_heap_base;

I think the simple cast is what we do for the plan nodes, and we even do it this way in the opposite direction in a couple places:

    BitmapHeapScanDesc *bscan = (BitmapHeapScanDesc *) scan;

BTW would it be a good idea for heapam_scan_bitmap_next_block to check that the passed "scan" has SO_TYPE_BITMAPSCAN? We haven't been checking that so far I think, but we only had a single struct ...

regards

--
Tomas Vondra
Re: Proposal to Enable/Disable Index using ALTER INDEX
On 24.09.24 20:21, Tom Lane wrote:
Peter Eisentraut writes:
On 23.09.24 22:51, Shayon Mukherjee wrote:

I am happy to draft a patch for this as well. I think I have a working idea so far of where the necessary checks might go. However if you don't mind, can you elaborate further on how the effect would be similar to enable_indexscan?

Planner settings like enable_indexscan used to just add a large number (disable_cost) to the estimated plan node costs. It's a bit more sophisticated in PG17. But in any case, I imagine "disabling an index" could use the same mechanism. Or maybe not, maybe the setting would just completely ignore the index.

Yeah, I'd be inclined to implement this by having create_index_paths just not make any paths for rejected indexes. Or you could back up another step and keep plancat.c from building IndexOptInfos for them. The latter might have additional effects, such as preventing the plan from relying on a uniqueness condition enforced by the index. Not clear to me if that's desirable or not.

Yes, I think you'd want that, because one of the purposes of this feature would be to test whether it's okay to drop an index. So you don't want the planner to take any account of the index for its decisions.
Re: Proposal to Enable/Disable Index using ALTER INDEX
On Wed, Oct 16, 2024 at 8:33 AM Peter Eisentraut wrote:
> > Yeah, I'd be inclined to implement this by having create_index_paths
> > just not make any paths for rejected indexes. Or you could back up
> > another step and keep plancat.c from building IndexOptInfos for them.
> > The latter might have additional effects, such as preventing the plan
> > from relying on a uniqueness condition enforced by the index. Not
> > clear to me if that's desirable or not.
>
> Yes, I think you'd want that, because one of the purposes of this
> feature would be to test whether it's okay to drop an index. So you
> don't want the planner to take any account of the index for its decisions.

I think this is right. I think we want to avoid invalidating the index, so we still need to consider it in determining where HOT updates must be performed, but we don't want to "improve" the plan using the index if it's disabled.

--
Robert Haas
EDB: http://www.enterprisedb.com
Re: Large expressions in indexes can't be stored (non-TOASTable)
On Wed, Oct 16, 2024 at 10:24:32AM +0900, Michael Paquier wrote:
> On Tue, Oct 15, 2024 at 08:20:17PM -0500, Nathan Bossart wrote:
>> I assume all of this will get compiled out in non-USE_ASSERT_CHECKING
>> builds as-is, but I see no problem with surrounding it with an #ifdef to be
>> sure.
>
> Yeah, I'm not sure that that would always be the case when optimized.
> Code generated can be dumb sometimes even if compilers got much
> smarter in the last 10 years or so (not compiler guy here).

Here is a new version of the patch set with this #ifdef added. I plan to give each of the code paths adjusted by 0001 a closer look, but otherwise I'm hoping to commit these soon.

--
nathan

>From b9444fd22d8dcc05e3a27817943be7f5296503fa Mon Sep 17 00:00:00 2001
From: Nathan Bossart
Date: Mon, 14 Oct 2024 14:43:26 -0500
Subject: [PATCH v4 1/3] Ensure we have a snapshot when updating various
 system catalogs.

Reported-by: Alexander Lakhin
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/18127-fe54b6a667f29658%40postgresql.org
Discussion: https://postgr.es/m/18309-c0bf914950c46692%40postgresql.org
Discussion: https://postgr.es/m/ZvMSUPOqUU-VNADN%40nathan
Backpatch-through: FIXME
---
 src/backend/commands/tablecmds.c            | 13 +++
 src/backend/replication/logical/tablesync.c | 21 ++
 src/backend/replication/logical/worker.c    | 24 +
 3 files changed, 58 insertions(+)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 1ccc80087c..c6f82e93b9 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -19283,9 +19283,22 @@ ATExecDetachPartition(List **wqueue, AlteredTableInfo *tab, Relation rel,
 		tab->rel = rel;
 	}

+	/*
+	 * Detaching the partition might involve TOAST table access, so ensure we
+	 * have a valid snapshot. We only expect to get here without a snapshot
+	 * in the concurrent path.
+	 */
+	if (concurrent)
+		PushActiveSnapshot(GetTransactionSnapshot());
+	else
+		Assert(HaveRegisteredOrActiveSnapshot());
+
 	/* Do the final part of detaching */
 	DetachPartitionFinalize(rel, partRel, concurrent, defaultPartOid);

+	if (concurrent)
+		PopActiveSnapshot();
+
 	ObjectAddressSet(address, RelationRelationId, RelationGetRelid(partRel));

 	/* keep our lock until commit */
diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c
index e03e761392..d3fbb18438 100644
--- a/src/backend/replication/logical/tablesync.c
+++ b/src/backend/replication/logical/tablesync.c
@@ -377,6 +377,12 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
 		replorigin_session_origin_lsn = InvalidXLogRecPtr;
 		replorigin_session_origin_timestamp = 0;

+		/*
+		 * Updating pg_replication_origin might involve TOAST table access, so
+		 * ensure we have a valid snapshot.
+		 */
+		PushActiveSnapshot(GetTransactionSnapshot());
+
 		/*
 		 * Drop the tablesync's origin tracking if exists.
 		 *
@@ -387,6 +393,8 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn)
 		 */
 		replorigin_drop_by_name(originname, true, false);

+		PopActiveSnapshot();
+
 		finish_sync_worker();
 	}
 	else
@@ -483,6 +491,12 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
 				started_tx = true;
 			}

+			/*
+			 * Updating pg_replication_origin might involve TOAST table
+			 * access, so ensure we have a valid snapshot.
+			 */
+			PushActiveSnapshot(GetTransactionSnapshot());
+
 			/*
 			 * Remove the tablesync origin tracking if exists.
 			 *
@@ -500,6 +514,8 @@ process_syncing_tables_for_apply(XLogRecPtr current_lsn)
 									   sizeof(originname));
 			replorigin_drop_by_name(originname, true, false);

+			PopActiveSnapshot();
+
 			/*
 			 * Update the state to READY only after the origin cleanup.
 			 */
@@ -1455,8 +1471,13 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos)
 	 * Then advance to the LSN got from walrcv_create_slot. This is WAL
 	 * logged for the purpose of recovery. Locks are to prevent the
 	 * replication origin from vanishing while advancing.
+
Re: sunsetting md5 password support
On Fri, Oct 11, 2024 at 04:36:27PM -0500, Nathan Bossart wrote:
> Here is a first attempt at a patch for marking MD5 passwords as deprecated.
> It's quite bare-bones at the moment, so I anticipate future revisions will
> add more content. Besides sprinkling several deprecation notices
> throughout the documentation, this patch teaches CREATE ROLE and ALTER ROLE
> to emit warnings when setting MD5 passwords. A new GUC named
> md5_password_warnings can be set to "off" to disable these warnings. I
> considered adding even more warnings (e.g., when authenticating), but I
> felt that would be far too noisy.

In v2, I've added an entry for the new md5_password_warnings GUC to the documentation, and I've simplified the passwordcheck test changes a bit.

--
nathan

>From e7786f4d2e4c892d34c7992ffddbd1cf9e109be7 Mon Sep 17 00:00:00 2001
From: Nathan Bossart
Date: Fri, 11 Oct 2024 16:21:09 -0500
Subject: [PATCH v2 1/1] Deprecate MD5 passwords.

MD5 has been considered to be unsuitable for use as a cryptographic hash algorithm for some time. Furthermore, MD5 password hashes in PostgreSQL are vulnerable to pass-the-hash attacks, i.e., knowing the username and hashed password is sufficient to authenticate. The SCRAM-SHA-256 method added in v10 is not subject to these problems and is considered to be superior to MD5.

This commit marks MD5 password support in PostgreSQL as deprecated and to be removed in a future release. The documentation now contains several deprecation notices, and CREATE ROLE and ALTER ROLE now emit deprecation warnings when setting MD5 passwords. The warnings can be disabled by setting the md5_password_warnings parameter to "off".

Reviewed-by: ???
Discussion: https://postgr.es/m/ZwbfpJJol7lDWajL%40nathan
---
 .../passwordcheck/expected/passwordcheck.out  |  1 +
 .../expected/passwordcheck_1.out              |  1 +
 contrib/passwordcheck/sql/passwordcheck.sql   |  1 +
 doc/src/sgml/catalogs.sgml                    |  9 +++
 doc/src/sgml/client-auth.sgml                 |  8 +++
 doc/src/sgml/config.sgml                      | 24 +++
 doc/src/sgml/libpq.sgml                       |  9 +++
 doc/src/sgml/protocol.sgml                    |  8 +++
 doc/src/sgml/ref/create_role.sgml             |  8 +++
 doc/src/sgml/runtime.sgml                     | 10
 src/backend/libpq/crypt.c                     | 10
 src/backend/utils/misc/guc_tables.c           |  9 +++
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 src/include/libpq/crypt.h                     |  3 +++
 src/test/modules/test_misc/t/003_check_guc.pl |  2 +-
 src/test/regress/expected/password.out        | 15
 src/test/regress/expected/password_1.out      |  9 +++
 17 files changed, 127 insertions(+), 1 deletion(-)

diff --git a/contrib/passwordcheck/expected/passwordcheck.out b/contrib/passwordcheck/expected/passwordcheck.out
index 2027681daf..dfb2ccfe00 100644
--- a/contrib/passwordcheck/expected/passwordcheck.out
+++ b/contrib/passwordcheck/expected/passwordcheck.out
@@ -1,3 +1,4 @@
+SET md5_password_warnings = off;
 LOAD 'passwordcheck';
 CREATE USER regress_passwordcheck_user1;
 -- ok
diff --git a/contrib/passwordcheck/expected/passwordcheck_1.out b/contrib/passwordcheck/expected/passwordcheck_1.out
index 5d8d5dcc1c..9519d60a49 100644
--- a/contrib/passwordcheck/expected/passwordcheck_1.out
+++ b/contrib/passwordcheck/expected/passwordcheck_1.out
@@ -1,3 +1,4 @@
+SET md5_password_warnings = off;
 LOAD 'passwordcheck';
 CREATE USER regress_passwordcheck_user1;
 -- ok
diff --git a/contrib/passwordcheck/sql/passwordcheck.sql b/contrib/passwordcheck/sql/passwordcheck.sql
index 1fbd6b0e96..5953ece5c2 100644
--- a/contrib/passwordcheck/sql/passwordcheck.sql
+++ b/contrib/passwordcheck/sql/passwordcheck.sql
@@ -1,3 +1,4 @@
+SET md5_password_warnings = off;
 LOAD 'passwordcheck';
 CREATE USER regress_passwordcheck_user1;
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 964c819a02..0b9ca087c8 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -1618,6 +1618,15 @@
 will store the md5 hash of xyzzyjoe.
+
+
+Support for MD5-encrypted passwords is deprecated and will be removed in a
+future release of PostgreSQL. Refer to
+ for details about migrating to another
+password type.
+
+
+
 If the password is encrypted with SCRAM-SHA-256, it has the format:
diff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
index 51343de7ca..60a1d16288 100644
--- a/doc/src/sgml/client-auth.sgml
+++ b/doc/src/sgml/client-auth.sgml
@@ -1260,6 +1260,14 @@ omicron bryanh guest1
 server is encrypted for SCRAM (see below), then SCRAM-based
 authentication will automatically be chosen instead.
+
+
+
+Support for MD5-encrypted passwords is deprecated and will be removed
+in a future release of PostgreS
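[Editor's note: to make the proposed behavior concrete, here is a sketch of what the patch described above would do. The GUC name and the general behavior are taken from the patch text; the exact warning wording is not reproduced and may differ.]

-- With the patch applied, storing an MD5 password is expected to warn:
SET password_encryption = 'md5';
CREATE ROLE md5_test_role PASSWORD 'secret';   -- would emit a deprecation WARNING

-- The patch adds a GUC to silence these warnings:
SET md5_password_warnings = off;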
Re: Add support to TLS 1.3 cipher suites and curves lists
On Tue, Oct 15, 2024 at 3:42 AM Daniel Gustafsson wrote:
> Thanks! I think the v8 posted todays is about ready to go in and unless there
> are objections I'll go ahead with it shortly.

This new paragraph is missing a close-paren:

> +
> +     Additionally, LibreSSL is supported using the
> +     OpenSSL compatibility layer. The minimum
> +     required version is 3.4 (from class="osname">OpenBSD
> +     version 7.0.
>

Other than that, LGTM!

Thanks,
--Jacob
Re: Proposal to Enable/Disable Index using ALTER INDEX (with patch)
On Oct 15, 2024, at 7:25 PM, David Rowley wrote:
On Wed, 16 Oct 2024 at 03:40, Robert Haas wrote:
On Sat, Oct 12, 2024 at 5:56 AM Shayon Mukherjee wrote:

Thank you for sharing this Robert. I like the idea behind "allowing extensions to control planner behavior" overall and I think it does help towards a powerful extension ecosystem too. I wonder if there is a reality where we can achieve both the outcomes here:
- Support disabling of indexes [1] through ALTER command
- While also building on "allowing extensions to control planner behavior" for the reasons above?
[1] https://www.postgresql.org/message-id/ABD42A12-4DCF-4EE4-B903-4C657903CECE%40gmail.com

Yes, I think we can do both things.

I think so too. I imagine there'd be cases where even hints global to all queries running on the server wouldn't result in the index being completely disabled. For example, a physical replica might not be privy to the hints defined on the primary and it might just be the queries running on the physical replica that are getting the most use out of the given index. Having the change made in pg_index would mean physical replicas have the index disabled too. For the primary use case I have in mind (test disabling indexes you're considering dropping), having the disabledness replicate would be very useful.

+1 and I agree. That said - thank you everyone for the discussions and pointers.

I now have a new patch that introduces the ability to enable or disable indexes using ALTER INDEX and CREATE INDEX commands, and updates get_relation_info in plancat.c to skip disabled indexes entirely by baking the concept into the IndexOptInfo structure. Below are all the relevant details.

Original motivation for the problem and proposal for a patch can be found at [1]. This patch passes all the existing specs and the newly added regression tests. The patch is ready for review and test. It compiles, so the patch can be applied for testing as well.

Implementation details:
- New grammar (see the usage sketch after this message):
  - ALTER INDEX ... ENABLE/DISABLE
  - CREATE INDEX ... DISABLE
- Default state is enabled. Indexes are only disabled when explicitly instructed via CREATE INDEX ... DISABLE or ALTER INDEX ... DISABLE.
- Primary key and unique constraint indexes are always enabled as well. The ENABLE/DISABLE grammar is not supported for these types of indexes. They can be later disabled via ALTER INDEX ... ENABLE/DISABLE if needed.
- ALTER INDEX ... ENABLE/DISABLE performs an in-place update of the pg_index catalog to protect against indcheckxmin [2] (older unrelated thread).
- pg_get_indexdef() supports the new functionality and grammar. This change is reflected in \d output for tables and pg_dump. We show the DISABLE syntax accordingly.
- Updated the create_index.sql regression test to cover the new grammar and verify that disabled indexes are not used in queries (CATALOG_VERSION_NO). The tests cover:
  - Basic single-column and multi-column indexes
  - Partial indexes
  - Expression indexes
  - Join indexes
  - GIN and GiST indexes
  - Covering indexes
  - Range indexes
  - Unique indexes and constraints
- Adds a new enabled attribute to the IndexOptInfo structure.
- Modifies get_relation_info in plancat.c to skip disabled indexes entirely, thus reducing the number of places we need to check whether an index is disabled or not. Inspired by the conversation started at [3].
- I chose to modify the logic within get_relation_info as compared to, say, reducing the cost to make the planner not consider an index during planning, mostly to keep the number of changes being introduced to a minimum and also to keep the logic itself self-contained and easier to understand, perhaps (?).
- No changes are made to stop the index from getting rebuilt. This way we ensure no data is missed or corrupted when the index is re-enabled.
- TOAST indexes are supported and enabled by default as well.
- REINDEX CONCURRENTLY is supported as well, and the existing state of pg_index.indisenabled is carried over accordingly.
- See the changes in create_index.sql to get an idea of the grammar and SQL statements.
- See the changes in create_index.out to get an idea of the catalogue states and EXPLAIN output to see when an index is getting used or isn't (when disabled).
- Incorporated DavidR's feedback from [4] around documentation, and also you will see that by skipping disabled indexes entirely in get_relation_info in plancat.c (as mentioned above), we address the other mentioned issues as well.

Looking forward to any and all feedback on this patch, including but not limited to code quality, tests, fundamental logic.

[1] https://www.postgresql.org/message-id/CANqtF-oXKe0M%3D0QOih6H%2BsZRjE2BWAbkW_1%2B9nMEAMLxUJg5jA%40mail.gmail.com
[2] https://www.postgresql.org/message-id/20180618215635.m5vrnxdxhxytv...@alap3.anarazel.de
[3] https://www.postgresql.org/message-id/3465209.1727202064%40sss.pgh.pa.us
[4] https://www.postgresql.org/message-id/CAApHDvpUNu%3DiVcdJ74sypvgeaCF%2Btfpyb8VRhZaF7DTd1xVr7g%40mail.gmail.com

Thanks
Shayon

v1-0001-Introduce-the-ability-to-enable-disa
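[Editor's note: a brief sketch of how the grammar described above would be exercised. This syntax comes from the uncommitted patch, so the exact spelling and behavior may still change.]

-- Proposed syntax (not available in any released PostgreSQL version):
CREATE INDEX t_i_idx ON t (i);
ALTER INDEX t_i_idx DISABLE;           -- planner stops considering the index
EXPLAIN SELECT * FROM t WHERE i = 42;  -- expected to show a sequential scan
ALTER INDEX t_i_idx ENABLE;            -- index becomes usable for planning again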
Re: ECPG cleanup and fix for clang compile-time problem
Alexander Lakhin writes: > Maybe you would like to fix in passing several (not new) defects, I've > found while playing with ecpg under Valgrind: Done. After evaluation I concluded that none of these were worth the trouble to back-patch, but by all means let's fix such things in HEAD. regards, tom lane
Re: New "raw" COPY format
Joel Jacobson wrote:

> However, I think rejecting such column data seems like the
> better alternative, to ensure data exported with COPY TO
> can always be imported back using COPY FROM,
> for the same format.

On the other hand, that might prevent cases where we want to export, for instance, a valid json array:

copy (select json_agg(col) from table) to 'file' RAW

This is a variant of the discussion in [1] where the OP does:

copy (select json_agg(row_to_json(t)) from t) TO 'file'

and he complains that both text and csv "break the JSON". That discussion morphed into a proposed patch adding a JSON format to COPY, but RAW would work directly as the OP expected. That is, unless the data happens to include JSON fields with LF/CRLF in them, and the RAW format says this is an error condition. In that case it's quite annoying to make it an error, rather than simply letting it pass.

[1] https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU=k...@mail.gmail.com

Best regards,
--
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
Re: Misleading error "permission denied for table"
Nathan Bossart writes: > On Wed, Oct 16, 2024 at 07:36:29PM +0530, Ashutosh Bapat wrote: >> Shouldn't we report "permission defined for column atest5.three? > We do have "permission denied for column" messages in aclchk.c (e.g., > aclcheck_error_col()), but I don't see them actually used anywhere (or at > least they don't show up in any expected regression test output). I'm > inclined to agree that a more specific error would be nice, but I worry > there's some hidden complexity that's prevented it thus far... See ExecCheckOneRelPerms, which seems to regard this as a TODO. I think the hard part is how to report the cases that involve pg_attribute_aclcheck_all(..., ACLMASK_ANY), which means * If 'how' is ACLMASK_ANY, then returns ACLCHECK_OK if user has any of the * privileges identified by 'mode' on any non-dropped column in the relation; so that you can't really finger a specific column as being in violation. Maybe we could leave those cases as just mentioning the rel; not sure if that leads to failing to move the goalposts much. Otherwise, we could extend ExecCheckOneRelPerms and pg_attribute_aclcheck_all to pass back a column number, and then modify the error reporting in ExecCheckPermissions. regards, tom lane
Re: allowing extensions to control planner behavior
On Mon, Oct 14, 2024 at 6:02 AM Jakub Wartak wrote: >> I wonder if we could think about reversing the order of operations >> here and making it so that we do the distinct-ification during parse >> analysis or maybe early in planning, so that the name you see EXPLAIN >> print out is the real name of that thing and not just a value invented >> for display purposes. > > This if that's possible?, or simply some counter and numbering the plan > operation? or Andrei's response/idea of using hashing?? I spent some time looking at this. I think everything that we might ever want to do here is possible, but what I unfortunately couldn't find is a place to do it where it would be more-or-less free. In parse analysis, we do detect certain kinds of namespace collisions. For example, "SELECT * FROM x, x" or "SELECT * FROM x, y x" will fail with ERROR: table name "x" specified more than once. Similarly, "SELECT * FROM generate_series(1,2), x generate_series" will fail with ERROR: table name "generate_series" specified more than once, even though generate_series is not, strictly speaking, a table name. From this you might infer that each alias can be used only once, but it turns out that's not entirely correct. If you say "SELECT * FROM x, q.x" that is totally fine even though both relations end up with the alias "x". There's a specific exception in the code to allow this case, when the aliases are the same and but they refer to different relation OIDs, and this is justified by reference to the SQL standard. Furthermore, "SELECT * FROM (x JOIN y ON true) x, y" is totally fine even though there are two x's and two y's. The fact that the parenthesized part has its own alias makes everything inside the parentheses a separate scope from the names outside the parentheses, so there's no name conflict. If you delete the "x" just after the closing parenthesis then it's all a single scope and you get a complaint about "y" being used twice. In this example, we're allowed to have collisions even though there are no subqueries, but it is also true that each subquery level gets its own scope e.g. "SELECT * FROM (SELECT * FROM x) x" is permitted. What I think all this means is that piggy-backing on the existing logic here doesn't look especially promising. If we're trying to identify a specific usage of a specific relation inside of a query, we want global uniqueness across the whole query, but this isn't even local uniqueness within a certain query level -- it's uniqueness within a part of a query level with a special carve-out for same-named tables in different schemas. Just to drive the point home, the duplicate checks are done with two nested for loops, a good old-fashioned O(n^2) algorithm, instead of something like entering everything into a hash table and complaining if the entry already exists. We're basically relying on the fact that there won't be very many items at a single level of a FROM-list for this to have reasonable performance; and scaling it even as far as the whole query sounds like it might be dubious. Maybe it's possible to rewrite this somehow to use a hash table and distinguish the cases where it shouldn't complain, but it doesn't look like a ton of fun. Once we get past parse analysis, there's really nothing that cares about duplicate names, as far as I can see. The planner is perfectly happy to work on multiple instances of the same table or of different tables with the same alias, and it really has no reason to care about what anything would look like if it were deparsed and printed out. 
I suppose this is why the code works the way it does: by postponing renaming until somebody actually does EXPLAIN or some other deparsing operation, we avoid doing it when it isn't "really needed". Aside from this nudge-the-planner use case there's no real reason to do anything else. In terms of solving the problem, nothing prevents us from installing a hash table in PlannerGlobal and using that to label every RangeTblEntry (except unnamed joins, probably) with a unique alias. My tentative idea is to have the hash table key be a string. We'd check whether the table alias is of the form string+'_'+positive_integer_without_a_leading_zero. If so, we'd look up the string in the hash table, else the entire alias. The hash table entry would tell us which integers were already used for that string, so then we could arrange to use one that isn't used yet, essentially re-aliasing tables wherever collisions occur. However, in the "normal" case where the query plan is not being printed and no query-plan-frobbing extensions are in use, this effort is all wasted. Maybe there's a way to do this kind of work only on demand, but it seems to be somewhat complicated by the way we handle flattening the range table: initially, every subquery level gets its own range table, and at the end of planning, those range tables are all collapsed into one. This means that depending on WHEN we get asked for information about the re-aliased table
Re: New "raw" COPY format
On Tue, Oct 15, 2024 at 1:38 PM Joel Jacobson wrote: > > However, I thinking rejecting such column data seems like the > better alternative, to ensure data exported with COPY TO > can always be imported back using COPY FROM, > for the same format. If text column data contains newlines, > users probably ought to be using the text or csv format instead. Yeah. I think _someone's_ going to have strong opinions one way or the other, but that person is not me. And I assume a contents check during COPY TO is going to have a noticeable performance impact... > > - RAW seems like an okay-ish label, but for something that's doing as > > much magic end-of-line detection as this patch is, I'd personally > > prefer SINGLE (as in, "single column"). > > It's actually the same end-of-line detection as the text format > in copyfromparse.c's CopyReadLineText(), except the code > is simpler thanks to not having to deal with quotes or escapes. Right, sorry, I hadn't meant to imply that you made it up. :D Just that a "raw" format that is actually automagically detecting things doesn't seem very "raw" to me, so I prefer the other name. > It basically just learns the newline sequence based on the first > occurrence, and then require it to be the same throughout the file. A hypothetical type whose text representation can contain '\r' but not '\n' still can't be unambiguously round-tripped under this scheme: COPY FROM will see the "mixed" line endings and complain, even though there's no ambiguity. Maybe no one will run into that problem in practice? But if they did, I think that'd be a pretty frustrating limitation. It'd be nice to override the behavior, to change it from "do what you think I mean" to "do what I say". > > - Speaking of magic end-of-line detection, can there be a way to turn > > that off? Say, via DELIMITER? > > - Generic DELIMITER support, for any single-byte separator at all, > > might make a "single-column" format more generally applicable. But I > > might be over-architecting. And it would make the COPY TO issue even > > worse... > > That's an interesting idea that would provide more flexibility, > though, at the cost of complicating things by overloading the meaning > of DELIMITER. I think that'd be a docs issue rather than a conceptual one, though... it's still a delimiter. I wouldn't really expect end-user confusion. > If aiming to make this more generally applicable, > then at least DELIMITER would need to be multi-byte, > since otherwise the Windows case \r\n couldn't be specified. True. > What I found appealing with the idea of a new COPY format, > was that instead of overloading the existing options > with more complexity, a new format wouldn't need to affect > the existing options, and the new format could be explained > separately, without making things worse for users not > using this format. I agree that we should not touch the existing formats. If RAW/SINGLE/whatever needed a multibyte line delimiter, I'm not proposing that the other formats should change. --Jacob
Re: Misleading error "permission denied for table"
On Wed, Oct 16, 2024 at 07:36:29PM +0530, Ashutosh Bapat wrote: > In privileges.sql there are tests for column level privileges e.g. > > INSERT INTO atest5(two) VALUES (6) ON CONFLICT (two) DO UPDATE set > three = 10 RETURNING atest5.three; > ERROR: permission denied for table atest5 > > In the above case the current user, regress_priv_user4, doesn't have > privileges to access atest5.three. But the error does not mention > atest5.three anywhere. In fact, if the same query were to be changed > to return atest5.four, it would succeed since the user has privileges > to access column atest5.four. > > Shouldn't we report "permission denied for column atest5.three"? We do have "permission denied for column" messages in aclchk.c (e.g., aclcheck_error_col()), but I don't see them actually used anywhere (or at least they don't show up in any expected regression test output). I'm inclined to agree that a more specific error would be nice, but I worry there's some hidden complexity that's prevented it thus far... -- nathan
Re: System views for versions reporting
Joe Conway writes: > On 10/16/24 08:47, Peter Eisentraut wrote: >> That way, you can get this information without having to start a server >> instance. (Maybe you can't start a server instance because it just >> crashed because of some library version issue ...) > While it is also useful to be able to get the info without being able to > start the server, I think that would be an addition not a replacement. > When you have a fleet with no direct access to run shell commands, being > able to get this info via SQL is valuable. Yeah. In addition, I envisioned that this might include information that's only readily available at runtime. Don't have a concrete example at hand (-ENOCAFFEINE) but I think that --version is necessarily going to be exceedingly constrained in what it can do. Another problem is that, just like with version(), there is already code making assumptions about what --version will output. Most of that is under our control, but perhaps not all. The main value of a new system view, IMV, is that it's a completely green field for us to define the contents of. regards, tom lane
Re: Doc: shared_memory_size_in_huge_pages with the "SHOW" command.
On Wed, Oct 16, 2024 at 03:31:05PM +0900, Fujii Masao wrote: > Another idea is to use the same wording as for num_os_semaphores in > runtime.sgml, like this: > > This parameter can be viewed > before starting the server with a postgres command > like: WFM -- nathan
Misleading error "permission denied for table"
Hi hackers, In privileges.sql there are tests for column level privileges e.g. INSERT INTO atest5(two) VALUES (6) ON CONFLICT (two) DO UPDATE set three = 10 RETURNING atest5.three; ERROR: permission denied for table atest5 In the above case the current user, regress_priv_user4, doesn't have privileges to access atest5.three. But the error does not mention atest5.three anywhere. In fact, if the same query were to be changed to return atest5.four, it would succeed since the user has privileges to access column atest5.four. Shouldn't we report "permission denied for column atest5.three"? -- Best Wishes, Ashutosh Bapat
Re: Doc: typo in config.sgml
On Wed, Oct 16, 2024 at 10:00:15AM +0200, Peter Eisentraut wrote: > On 15.10.24 23:51, Bruce Momjian wrote: > > On Tue, Oct 15, 2024 at 05:27:49PM -0400, Tom Lane wrote: > > > Bruce Momjian writes: > > > > Well, we can only use Latin-1, so the idea is that we will be explicit > > > > about specifying Latin-1 only as HTML entities, rather than letting > > > > non-Latin-1 creep in as UTF8. We can exclude certain UTF8 or SGML files > > > > if desired. > > > > > > That policy would cause substantial problems with contributor names > > > in the release notes. I agree with Peter that we don't need this. > > > Catching otherwise-invisible characters seems sufficient. > > > > Uh, why can't we use HTML entities going forward? Is that harder? > > I think the question should be the other way around. The entities are a > historical workaround for when encoding support and rendering support was > poor. Now you can just type in the characters you want as is, which seems > nicer. Yes, that does make sense, and if we fully supported Unicode, we could ignore all of this. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com When a patient asks the doctor, "Am I going to die?", he means "Am I going to die soon?"
Re: Doc: typo in config.sgml
On Wed, Oct 16, 2024 at 09:58:23AM +0200, Peter Eisentraut wrote: > On 15.10.24 23:51, Bruce Momjian wrote: > > > I don't see why we need to enforce this at this level. Whatever > > > downstream > > > toolchain has requirements about which characters are allowed will > > > complain > > > if it encounters a character it doesn't like. > > > > Uh, the PDF build does not complain if you pass it a non-Latin-1 UTF8 > > characters. To test this I added some Russian characters (non-Latin-1) > > to release.sgml: > > > > (⟨б⟩, ⟨в⟩, ⟨г⟩, ⟨д⟩, ⟨ж⟩, ⟨з⟩, ⟨к⟩, ⟨л⟩, ⟨м⟩, ⟨н⟩, ⟨п⟩, ⟨р⟩, ⟨с⟩, ⟨т⟩, > > ⟨ф⟩, ⟨х⟩, ⟨ц⟩, ⟨ч⟩, ⟨ш⟩, ⟨щ⟩), ten vowels (⟨а⟩, ⟨е⟩, ⟨ё⟩, ⟨и⟩, ⟨о⟩, ⟨у⟩, > > ⟨ы⟩, ⟨э⟩, ⟨ю⟩, ⟨я⟩), a semivowel / consonant (⟨й⟩), and two modifier > > letters or "signs" (⟨ъ⟩, ⟨ь⟩) > > > > and I ran 'make postgres-US.pdf', and then removed the Russian > > characters and ran the same command again. The output, including stderr > > was identical. The PDFs, of course, were not, with the Russian > > characters showing as "". Makefile output attached. > > Hmm, mine complains: My Debian 12 toolchain must be older. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com When a patient asks the doctor, "Am I going to die?", he means "Am I going to die soon?"
Re: [PoC] Federated Authn/z with OAUTHBEARER
On 15.10.24 20:10, Jacob Champion wrote: On Tue, Oct 15, 2024 at 11:00 AM Alexander Lakhin wrote: I've discovered that starting from 0785d1b8b, make check -C src/bin/pg_combinebackup fails under Valgrind, with the following diagnostics: Yep, sorry for that (and thanks for the report!). It's currently tracked over at [1], but I should have mentioned it here. The patch I used is attached, renamed to not stress out the cfbot. I have committed this fix.
Re: Improve node type forward reference
On 15.10.24 16:43, Nathan Bossart wrote: On Tue, Oct 15, 2024 at 09:02:48AM +0200, Peter Eisentraut wrote: On 14.10.24 23:28, Nathan Bossart wrote: On Mon, Oct 14, 2024 at 09:47:59AM +0200, Peter Eisentraut wrote: But we can do this better by using an incomplete struct, like struct Query *viewQuery ...; That way, everything has the correct type and fewer casts are required. This technique is already used elsewhere in node type definitions. I noticed that the examples in parsenodes.h are for structs defined within the same file. If the struct is defined in a separate file, I guess you might need to include another header file wherever it is used, but that doesn't seem too bad. No, you can leave the struct incomplete. You only need to provide its full definition (= include the other header file) if you actually want to access the struct's fields. That's what I figured. This one LGTM, too, then. Committed, thanks.
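For readers unfamiliar with the technique being referenced, here is a minimal sketch (the surrounding names are hypothetical, not PostgreSQL's actual headers) of how an incomplete struct declaration lets a header expose a correctly typed pointer without including the header that defines the struct:

/* --- demo.h (hypothetical header) ------------------------------------ */
struct Query;					/* incomplete declaration is sufficient here */

typedef struct ViewInfo
{
	struct Query *viewQuery;	/* correctly typed pointer, no cast needed */
} ViewInfo;

void set_view_query(ViewInfo *vi, struct Query *q);

/* --- demo.c (hypothetical implementation) ----------------------------- */
/*
 * Only code that dereferences the pointer (accesses Query's fields) needs
 * the full struct definition, i.e. needs to include the real header.
 * Merely storing or passing the pointer compiles fine without it.
 */
void
set_view_query(ViewInfo *vi, struct Query *q)
{
	vi->viewQuery = q;
}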
Re: minor doc issue in 9.16.2.1.1. Boolean Predicate Check Expressions
On Thu, Oct 17, 2024 at 7:59 AM Bruce Momjian wrote: > > > Where are we on this? I still see this behavior. > > --- > > but I found following two examples returning different results, > > i think they should return the same value. > > select json_value('1', '$ == "1"' returning jsonb error on error); > > select json_query('1', '$ == "1"' returning jsonb error on error); This part has been resolved; see the Note section in https://www.postgresql.org/docs/current/functions-json.html#SQLJSON-QUERY-FUNCTIONS > > On Fri, Jun 21, 2024 at 04:53:55PM +0800, jian he wrote: > > On Fri, Jun 21, 2024 at 11:11 AM David G. Johnston > > wrote: > > > > > > On Thu, Jun 20, 2024 at 7:30 PM jian he > > > wrote: > > >> > > >> "predicate check expressions return the single three-valued result of > > >> > > >> the predicate: true, false, or unknown." > > >> "unknown" is wrong, because `select 'unknown'::jsonb;` will fail. > > >> here "unknown" should be "null"? see jsonb_path_query doc entry also. > > >> doc (https://www.postgresql.org/docs/devel/functions-json.html#FUNCTIONS-SQLJSON-CHECK-EXPRESSIONS) <> While SQL-standard path expressions return the relevant element(s) of the queried JSON value, predicate check expressions return the single three-valued result of the predicate: true, false, or unknown. <> https://www.postgresql.org/docs/current/datatype-boolean.html says "The boolean type can have several states: “true”, “false”, and a third state, “unknown”, which is represented by the SQL null value." However, here select jsonb_path_query('1', '$ == "a"'); returns a JSON null value, not an SQL null value, whereas select jsonb_path_match('1', '$ == "a"'); returns an SQL null value. Maybe we can change the wording to "predicate check expressions return the single three-valued result of the predicate: true, false, or null". Then, in that section, mention that when a predicate check expression cannot be applied, jsonb_path_query returns a JSON null, while jsonb_path_match and the @@ operator return SQL NULL.
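To make the difference described above easier to see, here are the two queries side by side with the outputs as characterized in this thread (shown for illustration):

SELECT jsonb_path_query('1', '$ == "a"');
 jsonb_path_query
------------------
 null
(1 row)

SELECT jsonb_path_match('1', '$ == "a"');
 jsonb_path_match
------------------

(1 row)

The first returns the JSON value null as a jsonb result; the second returns SQL NULL, which psql displays as an empty cell.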
Re: Set query_id for query contained in utility statement
hi. Anthonin, please check the attached v9-0001, v9-0002, and v9-0003.

v9-0001-Better-error-reporting-from-extension-scripts-Was.patch is the same as v4-0001-Improve-parser-s-reporting-of-statement-start-loc.patch in [1]. v9-0002-Add-tests-covering-pgss-nested-queries.patch is the same as v8-0001-Add-tests-covering-pgss-nested-queries.patch in [2], which is your work. v9-0003-Track-location-in-nested-explain-statement.patch is the main change I made based on your patch.

In [3] I mentioned adding "ParseLoc location" to ExplainStmt; then you found some problems at [4] with multi-statements, and now I have found a way to resolve them. I also added "ParseLoc location;" to typedef struct CopyStmt. With copy (select 1) to stdout; I tested that my change tracks the beginning location and length of the nested query ("select 1"). I didn't touch the other nested-query cases yet; so far only ExplainStmt and CopyStmt are handled. IMHO, it's neater than your patches. Can you give it a try?

[1] https://www.postgresql.org/message-id/2245576.1728418678%40sss.pgh.pa.us
[2] https://www.postgresql.org/message-id/CAO6_XqqMYOxJmHJWCKjP44T9AsW0MmKV87XUYCP3R9JZvYcVaw%40mail.gmail.com
[3] https://www.postgresql.org/message-id/CACJufxEXSfk4o2jHDhf50fOY6WC%2BdFQke2gmpcz%2BEHVUsmEptg%40mail.gmail.com
[4] https://www.postgresql.org/message-id/CAO6_Xqrjr_1Ss0bRe5VFm6OsUwX2nuN_VhbhYj0LFP3acoaaWw%40mail.gmail.com

SELECT pg_stat_statements_reset() IS NOT NULL AS t;
explain(verbose) SELECT 1, 2, 3;
explain(verbose) (SELECT 1, 2, 3);
SELECT toplevel, calls, query FROM pg_stat_statements ORDER BY query COLLATE "C", toplevel;

will have 2 calls for "SELECT $1, $2, $3"

SELECT pg_stat_statements_reset() IS NOT NULL AS t;
explain(verbose) (SELECT 1, 2, 3);
explain(verbose) SELECT 1, 2, 3;
SELECT toplevel, calls, query FROM pg_stat_statements ORDER BY query COLLATE "C", toplevel;

will have 2 calls for " (SELECT $1, $2, $3)"

I think that's fine.
From 8718894eecfaf03dcce44f6dd3c90e0bd7294291 Mon Sep 17 00:00:00 2001 From: jian he Date: Wed, 16 Oct 2024 20:56:28 +0800 Subject: [PATCH v9 1/3] Better error reporting from extension scripts (Was: Extend ALTER OPERATOR) --- .../pg_stat_statements/expected/select.out| 5 +- contrib/pg_stat_statements/sql/select.sql | 3 +- src/backend/nodes/queryjumblefuncs.c | 6 ++ src/backend/parser/gram.y | 66 +++ 4 files changed, 34 insertions(+), 46 deletions(-) diff --git a/contrib/pg_stat_statements/expected/select.out b/contrib/pg_stat_statements/expected/select.out index dd6c756f67..e0e2fa265c 100644 --- a/contrib/pg_stat_statements/expected/select.out +++ b/contrib/pg_stat_statements/expected/select.out @@ -19,8 +19,9 @@ SELECT 1 AS "int"; 1 (1 row) +/* this comment should not appear in the output */ SELECT 'hello' - -- multiline + -- but this one will appear AS "text"; text --- @@ -129,7 +130,7 @@ SELECT calls, rows, query FROM pg_stat_statements ORDER BY query COLLATE "C"; ---+--+-- 1 |1 | PREPARE pgss_test (int) AS SELECT $1, $2 LIMIT $3 4 |4 | SELECT $1 + - | | -- multiline + + | | -- but this one will appear + | | AS "text" 2 |2 | SELECT $1 + $2 3 |3 | SELECT $1 + $2 + $3 AS "add" diff --git a/contrib/pg_stat_statements/sql/select.sql b/contrib/pg_stat_statements/sql/select.sql index eb45cb81ad..e0be58d5e2 100644 --- a/contrib/pg_stat_statements/sql/select.sql +++ b/contrib/pg_stat_statements/sql/select.sql @@ -12,8 +12,9 @@ SELECT pg_stat_statements_reset() IS NOT NULL AS t; -- SELECT 1 AS "int"; +/* this comment should not appear in the output */ SELECT 'hello' - -- multiline + -- but this one will appear AS "text"; SELECT 'world' AS "text"; diff --git a/src/backend/nodes/queryjumblefuncs.c b/src/backend/nodes/queryjumblefuncs.c index 5e43fd9229..e8bf95690b 100644 --- a/src/backend/nodes/queryjumblefuncs.c +++ b/src/backend/nodes/queryjumblefuncs.c @@ -90,6 +90,12 @@ CleanQuerytext(const char *query, int *location, int *len) /* * Discard leading and trailing whitespace, too. Use scanner_isspace() * not libc's isspace(), because we want to match the lexer's behavior. + * + * Note: the parser now strips leading comments and whitespace from the + * reported stmt_location, so this first loop will only iterate in the + * unusual case that the location didn't propagate to here. But the + * statement length will extend to the end-of-string or terminating + * semicolon, so the second loop often does something useful. */ while (query_len > 0 && scanner_isspace(query[0])) query++, query_location++, query_len--; diff --git a/src/backend/parser/gram.y b/src/backend/
Re: Should we document how column DEFAULT expressions work?
On 10/17/24 06:19, Bruce Momjian wrote: On Fri, Jul 5, 2024 at 05:11:22PM -0400, Bruce Momjian wrote: On Fri, Jul 5, 2024 at 05:03:35PM -0400, Tom Lane wrote: Bruce Momjian writes: Well, 'now()' certainly _looks_ like a function call, though it isn't. The fact that 'now()'::timestamptz and 'now'::timestamptz generate volatile results via a function call was my point. The only reason 'now()'::timestamptz works is that timestamptz_in ignores irrelevant punctuation (or what it thinks is irrelevant, anyway). I do not think we should include examples that look like that, because it will further confuse readers who don't already have a solid grasp of how this works. Wow, I see that now: test=> SELECT 'now('::timestamptz; timestamptz --- 2024-07-05 17:04:33.457915-04 If I remove the 'now()' mention in the docs, patch attached, I am concerned people will be confused whether it is the removal of the single quotes or the use of "()" which causes insert-time evaluation, and they might try 'now()'. Does anyone like this patch? I changed now()::timestamptz to now::timestamptz. Pardon the noise, but can you consider the idea of replacing the phrase 'data insertion time' with something like 'immediately before the insertion operation starts'? Sometimes people (especially younglings) ask which time it is precisely: will it differ for each tuple? -- regards, Andrei Lepikhov
Re: BF mamba failure
On Wed, Oct 16, 2024 at 10:11:08AM +, Bertrand Drouvot wrote: > Indeed, in pgstat_release_entry_ref(), we're doing: > > if (pg_atomic_fetch_sub_u32(&entry_ref->shared_entry->refcount, 1) == 1) > . > . > shent = dshash_find(pgStatLocal.shared_hash, > &entry_ref->shared_entry->key, > true); > > I wonder if we are not decrementing &entry_ref->shared_entry->refcount too > early. > > I mean, wouldn't that make sense to decrement it after the dshash_find() call? > (to ensure a "proper" whole entry locking, done in dshash_find(), first) Making that a two-step process (first step to read the refcount, second step to decrement it after taking the exclusive lock through dshash_find) would make the logic more complicated than what we have now, without offering more protection, afaik, because you'd still need to worry about a race condition between the first and second steps. Making this one step only would incur more dshash_find() calls than we actually need, and I would not underestimate that cost under a heavily concurrent pgstat_release_entry_ref() path. > Yeah, FWIW, we would be going from 32 bytes: >/* total size (bytes): 32 */ > > to 40 bytes (with the patch applied): >/* total size (bytes): 40 */ > > Due to the padding, that's a 8 bytes increase while we're adding "only" 4 > bytes. I have spent some time digging into all the original pgstat threads dealing with the shmem implementation or dshash, and I'm not really seeing anybody stressing over the size of the individual hash entries. This stuff was already wasting space with padding originally, so perhaps we're stressing too much here? Would anybody like to comment? -- Michael
Re: Considering fractional paths in Append node
On 10/17/24 07:05, Andy Fan wrote: Nikita Malakhov writes: Hello Nikita, Hi hackers! Sorry, I forgot to attach the patch itself. Please check it out. Could you check if [1] is related to this subject? I think the hard part would be which tuple_fraction to use during adding add_paths_to_append_rel since root->tuple_fraction is on subquery level, while the add_paths_to_append_rel is only on RelOptInfo level. [1] https://www.postgresql.org/message-id/CAApHDvry0nSV62kAOH3iccvfPhGPLN0Q97%2B%3Db1RsDPXDz3%3DCiQ%40mail.gmail.com Yes, this thread is connected to the code Nikita has proposed. It is mostly related to partitioned cases, of course. I'm not sure about partitionwise joins - it looks overcomplicated for the moment. We just see cases where SeqScan is preferred to IndexScan. One logical way is to add an alternative fractional path to the Append node; then, if at the top level some Limit, IncrementalSort or another fractional-friendly node requests only a small part of the data, the optimiser will have the option to choose the fractional path. -- regards, Andrei Lepikhov
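For illustration, here is a hypothetical schema and query of the kind this optimisation targets, i.e. a small LIMIT over an ordered scan of a partitioned table, where a fast-start ("fractional") child path such as a per-partition index scan can beat SeqScan plus Sort:

CREATE TABLE events (id int, ts timestamptz) PARTITION BY RANGE (ts);
CREATE TABLE events_2023 PARTITION OF events
    FOR VALUES FROM ('2023-01-01') TO ('2024-01-01');
CREATE TABLE events_2024 PARTITION OF events
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');
CREATE INDEX ON events_2023 (ts);
CREATE INDEX ON events_2024 (ts);

-- Only a handful of rows are needed, so a fractional (fast-start) path
-- per child would be attractive under the Append/MergeAppend node:
EXPLAIN SELECT * FROM events ORDER BY ts LIMIT 10;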
Re: Should we document how column DEFAULT expressions work?
Andrei Lepikhov writes: > Pardon the noise, but can you consider the idea of replacing the phrase > 'data insertion time' with something like 'immediately before the > insertion operation starts'? Sometimes people (especially younglings) > ask which time it is precisely: will it differ for each tuple? If we're discussing the meaning of "now", it's the transaction start timestamp, cf https://www.postgresql.org/docs/devel/functions-datetime.html#FUNCTIONS-DATETIME-CURRENT I do not think other wordings will improve on that. regards, tom lane
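A quick illustration (not from the thread) of that definition:

BEGIN;
SELECT now() = transaction_timestamp();   -- true: now() is the transaction start time
SELECT pg_sleep(1);
SELECT now(), clock_timestamp();          -- now() is unchanged; clock_timestamp() has moved on
COMMIT;

Because now() is fixed at transaction start, every row inserted by the same statement (or transaction) sees the same value.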
Re: Doc: shared_memory_size_in_huge_pages with the "SHOW" command.
Thank you, everyone. the point being to encourage its use before the server is running as we don't want to allocate anything when tuning it. I was mistaken and now understand that it needs to be run before the server is running. Another idea is to use the same wording as for num_os_semaphores in runtime.sgml, like this: WFM I've fixed the patch and will register it in the CF. Regards, -- Yuki Seino NTT DATA CORPORATIONdiff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml index 2c4d5ef640..63669f2d2e 100644 --- a/doc/src/sgml/runtime.sgml +++ b/doc/src/sgml/runtime.sgml @@ -1431,9 +1431,9 @@ export PG_OOM_ADJUST_VALUE=0 the operating system to provide enough huge pages of the desired size. To determine the number of huge pages needed, use the postgres command to see the value of -. Note that the -server must be shut down to view this runtime-computed parameter. -This might look like: +. This parameter +can be viewed before starting the server with a postgres +command like: $ postgres -D $PGDATA -C shared_memory_size_in_huge_pages 3170
Re: Should we document how column DEFAULT expressions work?
On Wed, Oct 16, 2024 at 04:45:39PM -0700, David G. Johnston wrote: > On Wednesday, October 16, 2024, Bruce Momjian wrote: > I do not, but maybe I’m being overly pedantic. All string literals are parsed > during the create table command. It’s only the situations where that parsing > is non-deterministic that cause an issue. > > Is there anything wrong with the patch I proposed? I thought it was too verbose so the point was not clear. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com When a patient asks the doctor, "Am I going to die?", he means "Am I going to die soon?"
Re: ltree docs imprecise about sorting order
On Thu, May 23, 2024 at 04:17:50PM +, PG Doc comments form wrote: > The following documentation comment has been logged on the website: > > Page: https://www.postgresql.org/docs/16/ltree.html > Description: > > The ltree docs available at > https://www.postgresql.org/docs/current/ltree.html state "Comparison sorts > in the order of a tree traversal" without specifying the strategy > implemented to walk the tree. > A quick experiment suggests that the implemented solution is pre-ordered > depth-first search. > I suggest the ltree docs be amended to "Comparison sorts in the order of a > pre-ordered depth-first tree traversal". [ moved to hackers ] Can someone confirm this and/or create a patch? -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com When a patient asks the doctor, "Am I going to die?", he means "Am I going to die soon?"
Re: Considering fractional paths in Append node
Hi hackers! Sorry, I forgot to attach the patch itself. Please check it out. -- Regards, Nikita Malakhov Postgres Professional The Russian Postgres Company https://postgrespro.ru/ 0001_append_limit_v1.patch Description: Binary data
Re: Statistics Import and Export
> > Code fix with comment on why nobody expects a relpages -1. Test case to > demonstrate that relpages -1 can happen, and updated doc to reflect the new > lower bound. > Additional fixes, now in a patch-set: 1. Allow relpages to be set to -1 (partitioned tables with partitions have this value after ANALYZE). 2. Turn off autovacuum on tables (where possible) if they are going to be the target of pg_set_relation_stats(). 3. Allow pg_set_relation_stats to continue past an out-of-range detection on one attribute, rather than immediately returning false. From c0efe3fb2adac102e186e086bf94fb29fabba28b Mon Sep 17 00:00:00 2001 From: Corey Huinker Date: Tue, 15 Oct 2024 19:09:59 -0400 Subject: [PATCH v2 1/3] Allow pg_set_relation_stats() to set relpages to -1. While the default value for relpages is 0, if a partitioned table with at least one child has been analyzed, then the partititoned table will have a relpages value of -1. pg_set_relation_stats() should therefore be able to set relpages to -1. Add test case to demonstrate the behavior of ANALYZE on a partitioned table with child table(s), and to demonstrate that pg_set_relation_stats() now accepts -1. --- src/backend/statistics/relation_stats.c| 8 +++-- src/test/regress/expected/stats_import.out | 38 +- src/test/regress/sql/stats_import.sql | 25 ++ doc/src/sgml/func.sgml | 2 +- 4 files changed, 69 insertions(+), 4 deletions(-) diff --git a/src/backend/statistics/relation_stats.c b/src/backend/statistics/relation_stats.c index 26f15061e8..b90f70b190 100644 --- a/src/backend/statistics/relation_stats.c +++ b/src/backend/statistics/relation_stats.c @@ -99,11 +99,15 @@ relation_statistics_update(FunctionCallInfo fcinfo, int elevel) { int32 relpages = PG_GETARG_INT32(RELPAGES_ARG); - if (relpages < 0) + /* + * While the default value for relpages is 0, a partitioned table with at + * least one child partition can have a relpages of -1. + */ + if (relpages < -1) { ereport(elevel, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("relpages cannot be < 0"))); + errmsg("relpages cannot be < -1"))); table_close(crel, RowExclusiveLock); return false; } diff --git a/src/test/regress/expected/stats_import.out b/src/test/regress/expected/stats_import.out index cd1b80aa43..46e45a5fa3 100644 --- a/src/test/regress/expected/stats_import.out +++ b/src/test/regress/expected/stats_import.out @@ -135,9 +135,45 @@ SELECT 'stats_import.testview'::regclass); ERROR: cannot modify statistics for relation "testview" DETAIL: This operation is not supported for views. 
+-- Partitioned tables with at least 1 child partition will, when analyzed, +-- have a relpages of -1 +CREATE TABLE stats_import.part_parent ( i integer ) PARTITION BY RANGE(i); +CREATE TABLE stats_import.part_child_1 + PARTITION OF stats_import.part_parent + FOR VALUES FROM (0) TO (10); +ANALYZE stats_import.part_parent; +SELECT relpages +FROM pg_class +WHERE oid = 'stats_import.part_parent'::regclass; + relpages +-- + -1 +(1 row) + +-- nothing stops us from setting it to not -1 +SELECT +pg_catalog.pg_set_relation_stats( +relation => 'stats_import.part_parent'::regclass, +relpages => 2::integer); + pg_set_relation_stats +--- + t +(1 row) + +-- nothing stops us from setting it to -1 +SELECT +pg_catalog.pg_set_relation_stats( +relation => 'stats_import.part_parent'::regclass, +relpages => -1::integer); + pg_set_relation_stats +--- + t +(1 row) + DROP SCHEMA stats_import CASCADE; -NOTICE: drop cascades to 4 other objects +NOTICE: drop cascades to 5 other objects DETAIL: drop cascades to type stats_import.complex_type drop cascades to table stats_import.test drop cascades to sequence stats_import.testseq drop cascades to view stats_import.testview +drop cascades to table stats_import.part_parent diff --git a/src/test/regress/sql/stats_import.sql b/src/test/regress/sql/stats_import.sql index 3e9f6d9124..ad129b1ab9 100644 --- a/src/test/regress/sql/stats_import.sql +++ b/src/test/regress/sql/stats_import.sql @@ -95,4 +95,29 @@ SELECT pg_catalog.pg_clear_relation_stats( 'stats_import.testview'::regclass); +-- Partitioned tables with at least 1 child partition will, when analyzed, +-- have a relpages of -1 +CREATE TABLE stats_import.part_parent ( i integer ) PARTITION BY RANGE(i); +CREATE TABLE stats_import.part_child_1 + PARTITION OF stats_import.part_parent + FOR VALUES FROM (0) TO (10); + +ANALYZE stats_import.part_parent; + +SELECT relpages +FROM pg_class +WHERE oid = 'stats_import.part_parent'::regclass; + +-- nothing stops us from setting it to not -1 +SELECT +pg_catalog.pg_set_relation_stats( +relation => 'stats_import.part_parent'::regclass, +relpages => 2::integer); + +-- nothing stops us from setting it to -1 +SELECT +pg_catalog.pg_set_r
Re: Should we document how column DEFAULT expressions work?
On Fri, Jul 5, 2024 at 05:11:22PM -0400, Bruce Momjian wrote: > On Fri, Jul 5, 2024 at 05:03:35PM -0400, Tom Lane wrote: > > Bruce Momjian writes: > > > Well, 'now()' certainly _looks_ like a function call, though it isn't. > > > The fact that 'now()'::timestamptz and 'now'::timestamptz generate > > > volatile results via a function call was my point. > > > > The only reason 'now()'::timestamptz works is that timestamptz_in > > ignores irrelevant punctuation (or what it thinks is irrelevant, > > anyway). I do not think we should include examples that look like > > that, because it will further confuse readers who don't already > > have a solid grasp of how this works. > > Wow, I see that now: > > test=> SELECT 'now('::timestamptz; > timestamptz > --- >2024-07-05 17:04:33.457915-04 > > If I remove the 'now()' mention in the docs, patch attached, I am > concerned people will be confused whether it is the removal of the > single quotes or the use of "()" which causes insert-time evaluation, > and they might try 'now()'. Does anyone like this patch? I changed now()::timestamptz to now::timestamptz. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com When a patient asks the doctor, "Am I going to die?", he means "Am I going to die soon?" diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml index 83859bac76f..f5b861b387f 100644 --- a/doc/src/sgml/ref/create_table.sgml +++ b/doc/src/sgml/ref/create_table.sgml @@ -897,6 +897,13 @@ WITH ( MODULUS numeric_literal, REM does not specify a value for the column. If there is no default for a column, then the default is null. + + + Note, a string that returns a volatile result once cast to a data + type, like 'now'::timestamptz, is evaluated at + table creation time, while now::timestamptz + (without quotes) is evaluated at data insertion time. +
Re: pgindent exit status if a file encounters an error
Uh, where are we on this? --- On Fri, Jun 28, 2024 at 06:05:35PM +0530, Ashutosh Bapat wrote: > > > On Wed, Jun 26, 2024 at 8:54 PM Tom Lane wrote: > > Ashutosh Bapat writes: > > The usage help mentions exit code 2 specifically while describing > --check > > option but it doesn't mention exit code 1. Neither does the README. So I > > don't think we need to document exit code 3 anywhere. Please let me know > if > > you think otherwise. > > I think we should have at least a code comment summarizing the > possible exit codes, along the lines of > > # Exit codes: > # 0 -- all OK > # 1 -- could not invoke pgindent, nothing done > # 2 -- --check mode and at least one file requires changes > # 3 -- pgindent failed on at least one file > > > Thanks. Here's a patch with these lines. > > In an offline chat Andrew mentioned that he expects more changes in the patch > and he would take care of those. Will review and test his patch. > > -- > Best Wishes, > Ashutosh Bapat > From 2a61830f0a2b6559d04bef75c6aa224e30ed1609 Mon Sep 17 00:00:00 2001 > From: Ashutosh Bapat > Date: Wed, 26 Jun 2024 15:46:53 +0530 > Subject: [PATCH 1/2] pgindent exit status on error > > When pg_bsd_indent exits with a non-zero status or reports an error, > make pgindent exit with non-zero status 3. The program does not exit on > the first instance of the error. Instead it continues to process > remaining files as long as some other exit condition is encountered, in > which case exit code 3 is reported. > > This helps to detect errors automatically when pgindent is run in shells > which interpret non-zero exit status as failure. > > Author: Ashutosh Bapat > Reviewed by: Andrew Dunstan > Discussion: > https://www.postgresql.org/message-id/caexhw5sprsifeldp-u1fa5ba7ys2f0gvljmkoobopkadjwq...@mail.gmail.com > --- > src/tools/pgindent/pgindent | 10 +- > 1 file changed, 9 insertions(+), 1 deletion(-) > > diff --git a/src/tools/pgindent/pgindent b/src/tools/pgindent/pgindent > index 48d83bc434..03b319c9c3 100755 > --- a/src/tools/pgindent/pgindent > +++ b/src/tools/pgindent/pgindent > @@ -1,5 +1,12 @@ > #!/usr/bin/perl > > +# Program to maintain uniform layout style in our C and Perl code. > +# Exit codes: > +# 0 -- all OK > +# 1 -- could not invoke pgindent, nothing done > +# 2 -- --check mode and at least one file requires changes > +# 3 -- pgindent failed on at least one file > + > # Copyright (c) 2021-2024, PostgreSQL Global Development Group > > use strict; > @@ -409,6 +416,7 @@ foreach my $source_filename (@files) > if ($source eq "") > { > print STDERR "Failure in $source_filename: " . $error_message . > "\n"; > + $status = 3; > next; > } > > @@ -429,7 +437,7 @@ foreach my $source_filename (@files) > > if ($check) > { > - $status = 2; > + $status ||= 2; > last unless $diff; > } > } > -- > 2.34.1 > -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com When a patient asks the doctor, "Am I going to die?", he means "Am I going to die soon?"
Re: Limiting overshoot in nbtree's parallel SAOP index scans
On Thu, 17 Oct 2024 at 00:33, Peter Geoghegan wrote:
> On Wed, Oct 16, 2024 at 5:48 PM Matthias van de Meent wrote:
> > In v17 and the master branch you'll note 16 buffer hits for the test query. However, when we use more expensive btree compare operations (e.g. by adding pg_usleep(1) to both btint8cmp and btint4cmp), the buffer access count starts to vary a lot and skyrockets to 30+ on my machine, in some cases reaching >100 buffer hits. After applying my patch, the buffer access count is capped to a much more agreeable 16-18 hits - it still shows signs of overshooting the serial bounds, but the number of buffers we overshoot our target is capped and thus significantly lower.
>
> It's not exactly capped, though. Since in any case you're always prone to getting extra leaf page reads at the end of each primitive index scan. That's not something that's new to Postgres 17, though.

True, but the SAOP-enabled continued overshoot _is_ new: previously, each backend would only get up to one additional buffer access for every SAOP scan entry, while now it's only limited by outer SAOP bounds and index size.

> Anyway, I'm still not convinced. Your test case requires adding a one second delay to each ORDER proc comparison,

Microsecond. pg_usleep() takes microseconds as argument, and so pg_usleep(1) should sleep for >=1 microsecond, but definitely much less than 1 second.

> and so has an unrealistic adversarial character.

I don't think it's too adversarial to expect some compare operations for a page to take one microsecond, especially when that compare operation is related to SAOPs. I think complex comparators like those for jsonb or collated text can certainly take much more CPU time than your usual integer compare operation, which can definitely be long enough for other workers to seize the scan. It's just that I didn't want to spend a lot of time building a dataset that would expose the issue when a single and obvious modification can do the same.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)
Should we document how column DEFAULT expressions work?
On Wednesday, October 16, 2024, Bruce Momjian wrote: > On Fri, Jul 5, 2024 at 05:11:22PM -0400, Bruce Momjian wrote: > > On Fri, Jul 5, 2024 at 05:03:35PM -0400, Tom Lane wrote: > > > Bruce Momjian writes: > > > > Well, 'now()' certainly _looks_ like a function call, though it > isn't. > > > > The fact that 'now()'::timestamptz and 'now'::timestamptz generate > > > > volatile results via a function call was my point. > > > > > > The only reason 'now()'::timestamptz works is that timestamptz_in > > > ignores irrelevant punctuation (or what it thinks is irrelevant, > > > anyway). I do not think we should include examples that look like > > > that, because it will further confuse readers who don't already > > > have a solid grasp of how this works. > > > > Wow, I see that now: > > > > test=> SELECT 'now('::timestamptz; > > timestamptz > > --- > >2024-07-05 17:04:33.457915-04 > > > > If I remove the 'now()' mention in the docs, patch attached, I am > > concerned people will be confused whether it is the removal of the > > single quotes or the use of "()" which causes insert-time evaluation, > > and they might try 'now()'. > > Does anyone like this patch? I changed now()::timestamptz to > now::timestamptz. > I do not, but maybe I’m being overly pedantic. All string literals are parsed during the create table command. It’s only the situations where that parsing is non-deterministic that cause an issue. Is there anything wrong with the patch I proposed? David J.
Re: POC, WIP: OR-clause support for indexes
On Wed, Oct 16, 2024 at 7:22 AM Andrei Lepikhov wrote: > On 10/12/24 21:25, Alexander Korotkov wrote: > > I forgot to specify (COSTS OFF) for EXPLAINs in regression tests. Fixed in > > v42. > I've passed through the patch set. > > Let me put aside the v42-0003 patch—it looks debatable, and I need time > to analyse the change in regression tests caused by this patch. Yes, 0003 patch is for illustration purposes for now. I will not keep rebasing it. We can pick it later when main patches are committed. > Comments look much better according to my current language level. Ideas > with fast exits also look profitable and are worth an additional > 'matched' variable. > > So, in general, it is ok. I think only one place with > inner_other_clauses can be improved. Maybe it will be enough to create > this list only once, outside 'foreach(j, groupedArgs)' cycle? Also, the > comment on the necessity of this operation was unclear to me. See the > attachment for my modest attempt at improving it. Thank you, I've integrated your patch with minor edits from me. -- Regards, Alexander Korotkov Supabase v43-0002-Teach-bitmap-path-generation-about-transforming-.patch Description: Binary data v43-0001-Transform-OR-clauses-to-SAOP-s-during-index-matc.patch Description: Binary data
Re: Using per-transaction memory contexts for storing decoded tuples
On Wed, Oct 16, 2024 at 10:32 AM Masahiko Sawada wrote: > > On Tue, Oct 15, 2024 at 9:01 PM Amit Kapila wrote: > > > > On Tue, Oct 15, 2024 at 11:15 PM Masahiko Sawada > > wrote: > > > > > > On Sun, Oct 13, 2024 at 11:00 PM Amit Kapila > > > wrote: > > > > > > > > On Fri, Oct 11, 2024 at 3:40 AM Masahiko Sawada > > > > wrote: > > > > > > > > > > Please find the attached patches. > > > > > > > > > > > Thank you for reviewing the patch! > > > > > > > > > > > @@ -343,9 +343,9 @@ ReorderBufferAllocate(void) > > > > */ > > > > buffer->tup_context = GenerationContextCreate(new_ctx, > > > > "Tuples", > > > > - SLAB_LARGE_BLOCK_SIZE, > > > > - SLAB_LARGE_BLOCK_SIZE, > > > > - SLAB_LARGE_BLOCK_SIZE); > > > > + SLAB_DEFAULT_BLOCK_SIZE, > > > > + SLAB_DEFAULT_BLOCK_SIZE, > > > > + SLAB_DEFAULT_BLOCK_SIZE); > > > > > > > > Shouldn't we change the comment atop this change [1] which states that > > > > we should benchmark the existing values? > > > > > > Agreed. > > > > > > > Can we slightly tweak the comments as follows so that it doesn't sound > > like a fix for a bug? > > > > /* To minimize memory fragmentation caused by long-running > > transactions with changes spanning multiple memory blocks, we use a > > single fixed-size memory block for decoded tuple storage. The tests > > showed that the default memory block size maintains logical decoding > > performance without causing fragmentation due to concurrent > > transactions. One might think that we can use the max size as > > SLAB_LARGE_BLOCK_SIZE but the test also showed it doesn't help resolve > > the memory fragmentation. > > Agreed. I've incorporated your comment in the latest patches. I'm > going to push them today, barring any objections or further comments. Pushed. Thank you all for reviewing and testing the patch. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: Should we document how column DEFAULT expressions work?
Bruce Momjian writes: > Does anyone like this patch? I changed now()::timestamptz to > now::timestamptz. No, because you clearly didn't bother to test it: regression=# select now::timestamptz; ERROR: column "now" does not exist LINE 1: select now::timestamptz; ^ Also "a string that returns a volatile result once cast to a data type" is pretty meaningless to my eyes, not least because the real problem is that the construct is *not* volatile, but gets folded to a constant at CREATE TABLE time. The distinction we want to draw is that 'now'::timestamptz is a parse-time constant and so is not equivalent to the function call now() (which already produces timestamptz, so no need for a cast). This is already covered at the end of section 9.9.5 Current Date/Time, although I have no objection to repeating it in some form in the CREATE TABLE docs. regards, tom lane
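A minimal illustration of that distinction, with made-up table names:

-- 'now'::timestamptz is folded to a constant while CREATE TABLE parses the
-- default expression, so every row gets the table-creation timestamp:
CREATE TABLE t_const (ts timestamptz DEFAULT 'now'::timestamptz);

-- now() is kept as a function call in the stored default and is evaluated
-- for each insertion (returning the inserting transaction's start time):
CREATE TABLE t_func (ts timestamptz DEFAULT now());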
Re: Set AUTOCOMMIT to on in script output by pg_dump
On 2024-10-10 14:56, Shinya Kato wrote: A new patch is attached. I am not a native English, so corrections to the texts are welcome. I created a commit fest entry. https://commitfest.postgresql.org/50/5306/ -- Regards, Shinya Kato NTT DATA GROUP CORPORATION
Re: MergeAppend could consider sorting cheapest child path
Bruce Momjian writes: > Is this still being considered? I'd +1 this feature. I guess this would be more useful in the parallel case, where the Sort can be pushed down to a parallel worker, and in the distributed-database case, where the Sort can be pushed down to multiple nodes; as a result, the leader just does the merge work. At the high level, the implementation of sorting the *cheapest* child path doesn't look like it adds too much overhead to the planning effort. -- Best Regards Andy Fan
Re: wrong comment in libpq.h
On Fri, May 10, 2024 at 11:14:51AM -0700, David Zhang wrote: > > > It looks like this wording "prototypes for functions in" is used many > > times in src/include/, in many cases equally inaccurately, so I would > > suggest creating a more comprehensive patch for this. > > I noticed this "prototypes for functions in" in many header files and > briefly checked them. It kind of all make sense except the bufmgr.h has > something like below. > > /* in buf_init.c */ > extern void InitBufferPool(void); > extern Size BufferShmemSize(void); > > /* in localbuf.c */ > extern void AtProcExit_LocalBuffers(void); > > /* in freelist.c */ > > which doesn't say "prototypes for functions xxx", but it still make sense > for me. I looked at this and decided the GUC section was illogical, so I just moved the variables up into the be-secure.c section. Patch attached. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com When a patient asks the doctor, "Am I going to die?", he means "Am I going to die soon?" diff --git a/src/include/libpq/libpq.h b/src/include/libpq/libpq.h index 142c98462ed..2a1ff89eb71 100644 --- a/src/include/libpq/libpq.h +++ b/src/include/libpq/libpq.h @@ -86,19 +86,6 @@ extern bool pq_check_connection(void); /* * prototypes for functions in be-secure.c */ -extern PGDLLIMPORT char *ssl_library; -extern PGDLLIMPORT char *ssl_cert_file; -extern PGDLLIMPORT char *ssl_key_file; -extern PGDLLIMPORT char *ssl_ca_file; -extern PGDLLIMPORT char *ssl_crl_file; -extern PGDLLIMPORT char *ssl_crl_dir; -extern PGDLLIMPORT char *ssl_dh_params_file; -extern PGDLLIMPORT char *ssl_passphrase_command; -extern PGDLLIMPORT bool ssl_passphrase_command_supports_reload; -#ifdef USE_SSL -extern PGDLLIMPORT bool ssl_loaded_verify_locations; -#endif - extern int secure_initialize(bool isServerStart); extern bool secure_loaded_verify_locations(void); extern void secure_destroy(void); @@ -109,6 +96,27 @@ extern ssize_t secure_write(Port *port, void *ptr, size_t len); extern ssize_t secure_raw_read(Port *port, void *ptr, size_t len); extern ssize_t secure_raw_write(Port *port, const void *ptr, size_t len); +/* + * declarations for variables defined in be-secure.c + */ +extern PGDLLIMPORT char *ssl_library; +extern PGDLLIMPORT char *ssl_ca_file; +extern PGDLLIMPORT char *ssl_cert_file; +extern PGDLLIMPORT char *ssl_crl_file; +extern PGDLLIMPORT char *ssl_crl_dir; +extern PGDLLIMPORT char *ssl_key_file; +extern PGDLLIMPORT int ssl_min_protocol_version; +extern PGDLLIMPORT int ssl_max_protocol_version; +extern PGDLLIMPORT char *ssl_passphrase_command; +extern PGDLLIMPORT bool ssl_passphrase_command_supports_reload; +extern PGDLLIMPORT char *ssl_dh_params_file; +extern PGDLLIMPORT char *SSLCipherSuites; +extern PGDLLIMPORT char *SSLECDHCurve; +extern PGDLLIMPORT bool SSLPreferServerCiphers; +#ifdef USE_SSL +extern PGDLLIMPORT bool ssl_loaded_verify_locations; +#endif + /* * prototypes for functions in be-secure-gssapi.c */ @@ -116,13 +124,6 @@ extern ssize_t secure_raw_write(Port *port, const void *ptr, size_t len); extern ssize_t secure_open_gssapi(Port *port); #endif -/* GUCs */ -extern PGDLLIMPORT char *SSLCipherSuites; -extern PGDLLIMPORT char *SSLECDHCurve; -extern PGDLLIMPORT bool SSLPreferServerCiphers; -extern PGDLLIMPORT int ssl_min_protocol_version; -extern PGDLLIMPORT int ssl_max_protocol_version; - enum ssl_protocol_versions { PG_TLS_ANY = 0,
Re: Fixing Hash Join bug I caused with adf97c156
On Wed, 16 Oct 2024 at 15:21, David Rowley wrote: > Here's a patch including a test this time. I've pushed this patch. Thanks for looking. David
Re: ltree docs imprecise about sorting order
On Wednesday, October 16, 2024, Bruce Momjian wrote: > On Thu, May 23, 2024 at 04:17:50PM +, PG Doc comments form wrote: > > The following documentation comment has been logged on the website: > > > > Page: https://www.postgresql.org/docs/16/ltree.html > > Description: > > > > The ltree docs available at > > https://www.postgresql.org/docs/current/ltree.html state "Comparison > sorts > > in the order of a tree traversal" without specifying the strategy > > implemented to walk the tree. > > A quick experiment suggests that the implemented solution is pre-ordered > > depth-first search. > > I suggest the ltree docs be amended to "Comparison sorts in the order of > a > > pre-ordered depth-first tree traversal". > > [ moved to hackers ] > > Can someone confirm this and/or create a patch? > If we are going to update the description of sorting we should probably take the chance to mention collation, or I believe the lack thereof. Including an example with order by wouldn’t hurt either. The example data is also a perfect tree, all intermediate nodes exist as their own rows. It may be informative to include exceptions to this rule. David J.
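Something along these lines could serve as the ORDER BY example (hypothetical data; the output shown is what a pre-order depth-first ordering produces, with a parent sorting before its children and siblings ordered by label):

CREATE EXTENSION IF NOT EXISTS ltree;
CREATE TABLE paths (p ltree);
INSERT INTO paths VALUES
    ('Top'),
    ('Top.Science.Astronomy'),
    ('Top.Hobbies'),
    ('Top.Science');

SELECT p FROM paths ORDER BY p;
           p
-----------------------
 Top
 Top.Hobbies
 Top.Science
 Top.Science.Astronomy
(4 rows)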
Re: New "raw" COPY format
On Wed, Oct 16, 2024, at 18:04, Jacob Champion wrote: > A hypothetical type whose text representation can contain '\r' but not > '\n' still can't be unambiguously round-tripped under this scheme: > COPY FROM will see the "mixed" line endings and complain, even though > there's no ambiguity. Yeah, that's quite an ugly limitation. > Maybe no one will run into that problem in practice? But if they did, > I think that'd be a pretty frustrating limitation. It'd be nice to > override the behavior, to change it from "do what you think I mean" to > "do what I say". That would be nice. >> That's an interesting idea that would provide more flexibility, >> though, at the cost of complicating things by overloading the meaning >> of DELIMITER. > > I think that'd be a docs issue rather than a conceptual one, though... > it's still a delimiter. I wouldn't really expect end-user confusion. Yeah, I meant the docs, but that's probably fine, we could just add to DELIMITER. >> What I found appealing with the idea of a new COPY format, >> was that instead of overloading the existing options >> with more complexity, a new format wouldn't need to affect >> the existing options, and the new format could be explained >> separately, without making things worse for users not >> using this format. > > I agree that we should not touch the existing formats. If > RAW/SINGLE/whatever needed a multibyte line delimiter, I'm not > proposing that the other formats should change. Right, I didn't think you did either, I meant overloading the existing options, from a docs perspective. But I agree it's probably fine if we just overload DELIMITER in the docs, that should be possible to explain in a pedagogic way, without causing confusion. /Joel
Re: ECPG cleanup and fix for clang compile-time problem
I wrote: > Attached are rebased and renumbered 0006-0008, mostly to keep the > cfbot happy. For some reason I thought the stuff I pushed later on Monday didn't interact with these patches, but the cfbot disabused me of that folly. Here's a rebased v7 --- no substantive change. regards, tom lane From 52e50770ca723278cf04da52fee0ffaddfae8235 Mon Sep 17 00:00:00 2001 From: Tom Lane Date: Wed, 16 Oct 2024 13:38:13 -0400 Subject: [PATCH v7 1/3] ecpg: fix some memory leakage of data-type-related structures. ECPGfree_type() and related functions were quite incomplete about removing subsidiary data structures. Possibly this is because ecpg wasn't careful to make sure said data structures always had their own storage. Previous patches in this series cleaned up a lot of that, and I had to add a couple more mm_strdup's here. Also, ecpg.trailer tended to overwrite struct_member_list[struct_level] without bothering to free up its previous contents, thus potentially leaking a lot of struct-member-related storage. Add ECPGfree_struct_member() calls at appropriate points. (Note: the lifetime of those lists is not obvious. They are still live after initial construction, in order to handle cases like struct foo { ... } foovar1, foovar2; We can't delete the list immediately after parsing the right brace, because it needs to be copied for each of the variables. Instead, it's kept around until the next struct declaration.) Discussion: https://postgr.es/m/2011420.1713493...@sss.pgh.pa.us --- src/interfaces/ecpg/preproc/ecpg.header | 3 ++- src/interfaces/ecpg/preproc/ecpg.trailer | 11 +-- src/interfaces/ecpg/preproc/type.c | 15 +-- src/interfaces/ecpg/preproc/type.h | 5 +++-- src/interfaces/ecpg/preproc/variable.c | 7 ++- 5 files changed, 29 insertions(+), 12 deletions(-) diff --git a/src/interfaces/ecpg/preproc/ecpg.header b/src/interfaces/ecpg/preproc/ecpg.header index a415780faf..9554e2b02e 100644 --- a/src/interfaces/ecpg/preproc/ecpg.header +++ b/src/interfaces/ecpg/preproc/ecpg.header @@ -507,11 +507,12 @@ add_typedef(const char *name, const char *dimension, const char *length, this->name = mm_strdup(name); this->brace_level = braces_open; this->type = (struct this_type *) mm_alloc(sizeof(struct this_type)); + this->type->type_storage = NULL; this->type->type_enum = type_enum; this->type->type_str = mm_strdup(name); this->type->type_dimension = mm_strdup(dimension); /* dimension of array */ this->type->type_index = mm_strdup(length); /* length of string */ - this->type->type_sizeof = ECPGstruct_sizeof; + this->type->type_sizeof = ECPGstruct_sizeof ? mm_strdup(ECPGstruct_sizeof) : NULL; this->struct_member_list = (type_enum == ECPGt_struct || type_enum == ECPGt_union) ? 
ECPGstruct_member_dup(struct_member_list[struct_level]) : NULL; diff --git a/src/interfaces/ecpg/preproc/ecpg.trailer b/src/interfaces/ecpg/preproc/ecpg.trailer index e48ea2..0b15252433 100644 --- a/src/interfaces/ecpg/preproc/ecpg.trailer +++ b/src/interfaces/ecpg/preproc/ecpg.trailer @@ -752,6 +752,7 @@ var_type: simple_type else $$.type_sizeof = cat_str(3, "sizeof(", this->name, ")"); + ECPGfree_struct_member(struct_member_list[struct_level]); struct_member_list[struct_level] = ECPGstruct_member_dup(this->struct_member_list); } } @@ -879,6 +880,7 @@ var_type: simple_type else $$.type_sizeof = cat_str(3, "sizeof(", this->name, ")"); + ECPGfree_struct_member(struct_member_list[struct_level]); struct_member_list[struct_level] = ECPGstruct_member_dup(this->struct_member_list); } } @@ -901,6 +903,7 @@ var_type: simple_type $$.type_dimension = this->type->type_dimension; $$.type_index = this->type->type_index; $$.type_sizeof = this->type->type_sizeof; + ECPGfree_struct_member(struct_member_list[struct_level]); struct_member_list[struct_level] = ECPGstruct_member_dup(this->struct_member_list); } else @@ -910,6 +913,7 @@ var_type: simple_type $$.type_dimension = "-1"; $$.type_index = "-1"; $$.type_sizeof = ""; + ECPGfree_struct_member(struct_member_list[struct_level]); struct_member_list[struct_level] = NULL; } } @@ -925,6 +929,7 @@ enum_definition: '{' c_list '}' struct_union_type_with_symbol: s_struct_union_symbol { + ECPGfree_struct_member(struct_member_list[struct_level]); struct_member_list[struct_level++] = NULL; if (struct_level >= STRUCT_DEPTH) mmerror(PARSE_ERROR, ET_ERROR, "too many levels in nested structure/union definition"); @@ -966,12 +971,13 @@ struct_union_type_with_symbol: s_struct_union_symbol this->name = mm_strdup(su_type.type_str); this->brace_level = braces_open; this->type = (struct this_type *) mm_alloc(sizeof(struct this_type)); + this->type->type_storage = NULL; this->type->type_enum = su_type.type_enum; this->type->type_str = mm_strdup(su_type.type_str); this->type->type_dimension = mm_s
Re: Fix for consume_xids advancing XIDs incorrectly
On Tue, Oct 15, 2024 at 10:06 PM Yushi Ogiwara wrote: > > Hi, > > Thank you for your comment. > > Regarding the first patch, I believe it works correctly when > consume_xids(1) is called. This is because the lastxid variable in the > consume_xids_common function is initialized as lastxid = > ReadNextFullTransactionId(), where the ReadNextFullTransactionId > function returns the (current XID) + 1. But it's possible that concurrent transactions consume XIDs in meanwhile, no? > > Separately, I found that consume_xids(0) does not behave as expected. > Below is an example: > > postgres=# select txid_current(); > txid_current > -- > 45496 > (1 row) > > postgres=# select consume_xids(0); > consume_xids > -- > 45497 > (1 row) > > postgres=# select consume_xids(0); > consume_xids > -- > 45497 > (1 row) > > In the example, the argument to consume_xids is 0, meaning it should not > consume any XIDs. However, the first invocation of consume_xids(0) looks > like unexpectedly consuming 1 XID though it's not consuming actually. > This happens because consume_xids(0) returns the value from > ReadNextFullTransactionId. > > I have updated the patch (skip.diff, attached to this e-mail) to address > this issue. Now, when consume_xids(0) is called, it returns > ReadNextFullTransactionId().value - 1, ensuring no XID is consumed as > shown below: > > postgres=# select txid_current(); > txid_current > -- > 45498 > (1 row) > > postgres=# select consume_xids(0); > consume_xids > -- > 45498 > (1 row) Hmm, I think if we expect this function to return the last XID that the function actually consumed, calling consume_xids with 0 should raise an error instead. Even if it returns ReadNextFullTransactionId().value - 1 as you proposed, other concurrent transactions might consume XIDs between txid_current() and consume_xids(0), resulting in consume_xids() appearing to have consumed XIDs. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: Pgoutput not capturing the generated columns
On Wed, Oct 9, 2024 at 11:00 AM vignesh C wrote: > > On Tue, 8 Oct 2024 at 11:37, Shubham Khanna > wrote: > > > > On Fri, Oct 4, 2024 at 9:36 AM Peter Smith wrote: > > > > > > Hi Shubham, here are my review comments for v36-0001. > > > > > > == > > > 1. General - merge patches > > > > > > It is long past due when patches 0001 and 0002 should've been merged. > > > AFAIK the split was only because historically these parts had > > > different authors. But, keeping them separated is not helpful anymore. > > > > > > == > > > src/backend/catalog/pg_publication.c > > > > > > 2. > > > Bitmapset * > > > -pub_collist_validate(Relation targetrel, List *columns) > > > +pub_collist_validate(Relation targetrel, List *columns, bool pubgencols) > > > > > > Since you removed the WARNING, this parameter 'pubgencols' is unused > > > so it should also be removed. > > > > > > == > > > src/backend/replication/pgoutput/pgoutput.c > > > > > > 3. > > > /* > > > - * If the publication is FOR ALL TABLES then it is treated the same as > > > - * if there are no column lists (even if other publications have a > > > - * list). > > > + * To handle cases where the publish_generated_columns option isn't > > > + * specified for all tables in a publication, we must create a column > > > + * list that excludes generated columns. So, the publisher will not > > > + * replicate the generated columns. > > > */ > > > - if (!pub->alltables) > > > + if (!(pub->alltables && pub->pubgencols)) > > > > > > I still found that comment hard to understand. Does this mean to say > > > something like: > > > > > > -- > > > Process potential column lists for the following cases: > > > > > > a. Any publication that is not FOR ALL TABLES. > > > > > > b. When the publication is FOR ALL TABLES and > > > 'publish_generated_columns' is false. > > > A FOR ALL TABLES publication doesn't have user-defined column lists, > > > so all columns will be replicated by default. However, if > > > 'publish_generated_columns' is set to false, column lists must still > > > be created to exclude any generated columns from being published > > > -- > > > > > > == > > > src/test/regress/sql/publication.sql > > > > > > 4. > > > +SET client_min_messages = 'WARNING'; > > > +CREATE TABLE gencols (a int, gen1 int GENERATED ALWAYS AS (a * 2) > > > STORED); > > > > > > AFAIK you don't need to keep changing 'client_min_messages', > > > particularly now that you've removed the WARNING message that was > > > previously emitted. > > > > > > ~ > > > > > > 5. > > > nit - minor comment changes. > > > > > > == > > > Please refer to the attachment which implements any nits from above. > > > > > > > I have fixed all the given comments. Also, I have created a new 0003 > > patch for the TAP-Tests related to the '011_generated.pl' file. I am > > planning to merge 0001 and 0003 patches once they will get fixed. > > The attached patches contain the required changes. > > There is inconsistency in replication when a generated column is > specified in the column list. 
The generated column data is not > replicated during initial sync whereas it is getting replicated during > incremental sync: > -- publisher > CREATE TABLE t1(c1 int, c2 int GENERATED ALWAYS AS (c1 * 2) STORED) > INSERT INTO t1 VALUES (1); > CREATE PUBLICATION pub1 for table t1(c1, c2); > > --subscriber > CREATE TABLE t1(c1 int, c2 int) > CREATE SUBSCRIPTION sub1 connection 'dbname=postgres host=localhost > port=5432' PUBLICATION pub1; > > -- Generate column data is not synced during initial sync > postgres=# select * from t1; > c1 | c2 > + > 1 | > (1 row) > > -- publisher > INSERT INTO t1 VALUES (2); > > -- Whereas generated column data is synced during incremental sync > postgres=# select * from t1; > c1 | c2 > + > 1 | > 2 | 4 > (2 rows) > There was an issue for this scenario: CREATE TABLE t1(c1 int, c2 int GENERATED ALWAYS AS (c1 * 2) STORED) create publication pub1 for table t1(c1, c2) In this case included_cols was getting set to NULL. Changed it to get included_cols as it is instead of replacing with NULL and changed the condition to: if (server_version >= 18) { remotegenlist[natt] = DatumGetBool(slot_getattr(slot, 5, &isnull)); /* * If the column is generated and neither the generated column * option is specified nor it appears in the column list, we will * skip it. */ if (remotegenlist[natt] && !has_pub_with_pubgencols && !included_cols) { ExecClearTuple(slot); continue; } } I will further think if there is a better solution for this. Please refer to the updated v39 Patches here in [1]. See [1] for the changes added. [1] https://www.postgresql.org/message-id/CAHv8RjLjb%2B98i5ZQUphivxdOZ3hSGLfq2SiWQetUvk8zGyAQwQ%40mail.gmail.com Thanks and Regards, Shubham Khanna.
Re: Pgoutput not capturing the generated columns
On Wed, Oct 9, 2024 at 11:13 AM Peter Smith wrote: > > Hi, here are my review comments for patch v37-0001. > > == > Commit message > > 1. > Example usage of subscription option: > CREATE PUBLICATION FOR TABLE tab_gencol WITH (publish_generated_columns > = true); > > ~ > > This is wrong -- it's not a "subscription option". Better to just say > "Example usage:" > > ~~~ > > 2. > When 'copy_data' is true, during the initial sync, the data is replicated from > the publisher to the subscriber using the COPY command. The normal COPY > command does not copy generated columns, so when 'publish_generated_columns' > is true... > > ~ > > By only mentioning the "when ... is true" case this description does > not cover the scenario when 'publish_generated_columns' is false when > the publication column list has a generated column. > > ~~~ > > 3. > typo - /replication of generated column/replication of generated columns/ > typo - /filed/filled/ > typo - 'pg_publicataion' catalog > > == > src/backend/replication/logical/tablesync.c > > make_copy_attnamelist: > 4. > nit - missing word in a comment > > ~~~ > > fetch_remote_table_info: > 5. > + appendStringInfo(&cmd, > " FROM pg_catalog.pg_attribute a" > " LEFT JOIN pg_catalog.pg_index i" > " ON (i.indexrelid = pg_get_replica_identity_index(%u))" > " WHERE a.attnum > 0::pg_catalog.int2" > - " AND NOT a.attisdropped %s" > + " AND NOT a.attisdropped", lrel->remoteid); > + > + appendStringInfo(&cmd, > " AND a.attrelid = %u" > " ORDER BY a.attnum", > - lrel->remoteid, > - (walrcv_server_version(LogRepWorkerWalRcvConn) >= 12 ? > - "AND a.attgenerated = ''" : ""), > lrel->remoteid); > > Version v37-0001 has removed a condition previously between these two > appendStringInfo's. But, that now means there is no reason to keep > these statements separated. These should be combined now to use one > appendStringInfo. > > ~ > > 6. > + if (server_version >= 12) > + remotegenlist[natt] = DatumGetBool(slot_getattr(slot, 5, &isnull)); > + > > Are you sure the version check for 12 is correct? IIUC, this 5 > matches the 'attgenerated' column, but the SQL for that was > constructed using a different condition: > if (server_version >= 18) > appendStringInfo(&cmd, ", a.attgenerated != ''"); > > It is this 12 versus 18 difference that makes me suspicious of > a potential mistake. > > ~~~ > > 7. > + /* > + * If the column is generated and neither the generated column option > + * is specified nor it appears in the column list, we will skip it. > + */ > + if (remotegenlist[natt] && !has_pub_with_pubgencols && > + !bms_is_member(attnum, included_cols)) > + { > + ExecClearTuple(slot); > + continue; > + } > > 7b. > I am also suspicious about how this condition interacts with the other > condition (shown below) that came earlier: > /* If the column is not in the column list, skip it. */ > if (included_cols != NULL && !bms_is_member(attnum, included_cols)) > > Something doesn't seem right. e.g. If we can only get here by passing > the earlier condition, then it means we already know the generated > condition was *not* a member of a column list in which case that > should affect this new condition and the new comment too. > > == > src/backend/replication/pgoutput/pgoutput.c > > pgoutput_column_list_init: > > 8. > /* > - * If the publication is FOR ALL TABLES then it is treated the same as > - * if there are no column lists (even if other publications have a > - * list). > + * Process potential column lists for the following cases: a. Any > + * publication that is not FOR ALL TABLES. b. 
When the publication is > + * FOR ALL TABLES and 'publish_generated_columns' is false. FOR ALL > + * TABLES publication doesn't have user-defined column lists, so all > + * columns will be replicated by default. However, if > + * 'publish_generated_columns' is set to false, column lists must > + * still be created to exclude any generated columns from being > + * published. > */ > > nit - please reformat this comment so the bullets are readable > I have fixed all the comments and posted the v39 patches for them. Please refer to the updated v39 Patches here in [1]. See [1] for the changes added. [1] https://www.postgresql.org/message-id/CAHv8RjLjb%2B98i5ZQUphivxdOZ3hSGLfq2SiWQetUvk8zGyAQwQ%40mail.gmail.com Thanks and Regards, Shubham Khanna.
Soft errors and ThrowErrorData() comment updates
The comments around ThrowErrorData() and how it might apply to soft errors is slightly confusing. Attached a patch which hopefully clarifies things. >From a distance, ThrowErrorData() is somewhat like ReThrowError(), but it's actually quite different. The former is expecting a free-standing ErrorData that hasn't been processed at all; while the latter is for an ERROR specifically, for which errstart() and errfinish() have already been called. We also might consider adding some asserts. Regards, Jeff Davis From 2fc18f71afc89d95eb38fe2d2606b5ff07a8b8f9 Mon Sep 17 00:00:00 2001 From: Jeff Davis Date: Wed, 16 Oct 2024 13:16:02 -0700 Subject: [PATCH v1] Improve ThrowErrorData() comments for use with soft errors. --- src/backend/utils/error/elog.c | 15 +-- src/include/nodes/miscnodes.h | 7 --- 2 files changed, 13 insertions(+), 9 deletions(-) diff --git a/src/backend/utils/error/elog.c b/src/backend/utils/error/elog.c index 987ff98067..8acca3e0a0 100644 --- a/src/backend/utils/error/elog.c +++ b/src/backend/utils/error/elog.c @@ -1881,12 +1881,15 @@ FlushErrorState(void) /* * ThrowErrorData --- report an error described by an ErrorData structure * - * This is somewhat like ReThrowError, but it allows elevels besides ERROR, - * and the boolean flags such as output_to_server are computed via the - * default rules rather than being copied from the given ErrorData. - * This is primarily used to re-report errors originally reported by - * background worker processes and then propagated (with or without - * modification) to the backend responsible for them. + * This function should be called on an ErrorData structure that isn't stored + * on the errordata stack and hasn't been processed yet. It will call + * errstart() and errfinish() as needed, so those should not have already been + * called. + * + * ThrowErrorData() is useful for handling soft errors. It's also useful for + * re-reporting errors originally reported by background worker processes and + * then propagated (with or without modification) to the backend responsible + * for them. */ void ThrowErrorData(ErrorData *edata) diff --git a/src/include/nodes/miscnodes.h b/src/include/nodes/miscnodes.h index 1612b63acd..f6f2dd4377 100644 --- a/src/include/nodes/miscnodes.h +++ b/src/include/nodes/miscnodes.h @@ -36,9 +36,10 @@ * After calling code that might report an error this way, check * error_occurred to see if an error happened. If so, and if details_wanted * is true, error_data has been filled with error details (stored in the - * callee's memory context!). FreeErrorData() can be called to release - * error_data, although that step is typically not necessary if the called - * code was run in a short-lived context. + * callee's memory context!). The ErrorData can be modified (i.e. downgraded + * to a WARNING) and reported with ThrowErrorData(). FreeErrorData() can be + * called to release error_data, although that step is typically not necessary + * if the called code was run in a short-lived context. */ typedef struct ErrorSaveContext { -- 2.34.1
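For context, here is a minimal sketch of the usage pattern the updated comments describe (the helper name and surrounding setup are invented for illustration; only the ErrorSaveContext / InputFunctionCallSafe / ThrowErrorData calls reflect the real API):

#include "postgres.h"

#include "fmgr.h"
#include "nodes/miscnodes.h"

/* "flinfo", "typioparam" and "typmod" are assumed to be set up by the caller */
static bool
parse_datum_or_warn(FmgrInfo *flinfo, char *str,
                    Oid typioparam, int32 typmod, Datum *result)
{
    ErrorSaveContext escontext = {T_ErrorSaveContext};

    escontext.details_wanted = true;    /* ask for a filled-in ErrorData */

    if (InputFunctionCallSafe(flinfo, str, typioparam, typmod,
                              (Node *) &escontext, result))
        return true;                    /* parsed cleanly */

    /*
     * Soft error: the ErrorData is free-standing (allocated in our memory
     * context, not on the errordata stack), which is exactly what
     * ThrowErrorData() expects.  Downgrade it to a WARNING and report it.
     */
    Assert(escontext.error_occurred);
    escontext.error_data->elevel = WARNING;
    ThrowErrorData(escontext.error_data);

    FreeErrorData(escontext.error_data);
    return false;
}

This is the "free-standing ErrorData" case the new wording is about: nothing here has gone through errstart()/errfinish() before ThrowErrorData() is called.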
Re: [BUG FIX] Fix validation of COPY options FORCE_NOT_NULL/FORCE_NULL
On Wed, Oct 16, 2024 at 02:50:53PM +0900, Michael Paquier wrote: > You are right. f6d4c9cf162b got that wrong. Will fix and backpatch > with the extra tests. And done down to 17 for 0002, down to 16 for 0001, with tweaks in 0001 to limit the use of COPY TO in the queries where we want to force error patterns linked to the TO clause. The two tests for the all-column cases with force_quote were not strictly required for the bug, still are useful to have for coverage purposes. -- Michael
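For readers following along, a small sketch of the option combinations this validation is about (table name is hypothetical):

-- Valid: FORCE_NOT_NULL / FORCE_NULL apply to COPY FROM in CSV mode
COPY copytest FROM STDIN (FORMAT csv, FORCE_NOT_NULL (a), FORCE_NULL (b));

-- Rejected: these options make no sense for COPY TO
COPY copytest TO STDOUT (FORMAT csv, FORCE_NOT_NULL (a));

-- Valid: FORCE_QUOTE is the COPY TO counterpart, including the all-column form
COPY copytest TO STDOUT (FORMAT csv, FORCE_QUOTE *);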
Re: minor doc issue in 9.16.2.1.1. Boolean Predicate Check Expressions
Where are we on this? I still see this behavior. --- On Fri, Jun 21, 2024 at 04:53:55PM +0800, jian he wrote: > On Fri, Jun 21, 2024 at 11:11 AM David G. Johnston > wrote: > > > > On Thu, Jun 20, 2024 at 7:30 PM jian he wrote: > >> > >> "predicate check expressions return the single three-valued result of > >> > >> the predicate: true, false, or unknown." > >> "unknown" is wrong, because `select 'unknown'::jsonb;` will fail. > >> here "unknown" should be "null"? see jsonb_path_query doc entry also. > >> > > > > The syntax for json_exists belies this claim (assuming our docs are > > accurate there). Its "on error" options are true/false/unknown. > > Additionally, the predicate test operator is named "is unknown" not "is > > null". > > > > The result of the predicate test, which is never produced as a value, only > > a concept, is indeed "unknown" - which then devolves to false when it is > > practically applied to determining whether to output the path item being > > tested. As it does also when used in a parth expression. > > > > in [1] says > The similar predicate check expression simply returns true, indicating > that a match exists: > > => select jsonb_path_query(:'json', '$.track.segments[*].HR > 130'); > jsonb_path_query > -- > true > > > > but in this example > select jsonb_path_query('1', '$ == "1"'); > return null. > > I guess here, the match evaluation cannot be applied, thus returning null. > > > So summary: > if the boolean predicate check expressions are applicable, return true or > false. > > the boolean predicate check expressions are not applicable, return null. > example: select jsonb_path_query('1', '$ == "a"'); > > > but I found following two examples returning different results, > i think they should return the same value. > select json_value('1', '$ == "1"' returning jsonb error on error); > select json_query('1', '$ == "1"' returning jsonb error on error); > > [1] > https://www.postgresql.org/docs/devel/functions-json.html#FUNCTIONS-SQLJSON-CHECK-EXPRESSIONS > > -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com When a patient asks the doctor, "Am I going to die?", he means "Am I going to die soon?"
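To summarize the behavior being described in the quoted discussion, the results look like this (matching the examples given above):

SELECT jsonb_path_query('1', '$ == 1');    -- true: the predicate applies and matches
SELECT jsonb_path_query('1', '$ == 2');    -- false: the predicate applies but does not match
SELECT jsonb_path_query('1', '$ == "a"');  -- null: the comparison cannot be applied

That is, the predicate's "unknown" outcome surfaces as SQL NULL, not as a jsonb value, which is the wording issue under discussion.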
Re: MergeAppend could consider sorting cheapest child path
Is this still being considered? --- On Tue, Jun 18, 2024 at 07:45:09PM +0300, Alexander Pyhalov wrote: > Hi. > > Now when planner finds suitable pathkeys in generate_orderedappend_paths(), > it uses them, even if explicit sort of the cheapest child path could be more > efficient. > > We encountered this issue on partitioned table with two indexes, where one > is suitable for sorting, and another is good for selecting data. MergeAppend > was generated > with subpaths doing index scan on suitably ordered index and filtering a lot > of data. > The suggested fix allows MergeAppend to consider sorting on cheapest > childrel total path as an alternative. > > -- > Best regards, > Alexander Pyhalov, > Postgres Professional > From d5eb3d326d83e2ca47c17552fcc6fd27b6b98d4e Mon Sep 17 00:00:00 2001 > From: Alexander Pyhalov > Date: Tue, 18 Jun 2024 15:56:18 +0300 > Subject: [PATCH] MergeAppend could consider using sorted best path. > > This helps when index with suitable pathkeys is not > good for filtering data. > --- > .../postgres_fdw/expected/postgres_fdw.out| 6 +- > src/backend/optimizer/path/allpaths.c | 21 + > src/test/regress/expected/inherit.out | 45 +- > src/test/regress/expected/partition_join.out | 87 +++ > src/test/regress/expected/union.out | 6 +- > src/test/regress/sql/inherit.sql | 17 > 6 files changed, 141 insertions(+), 41 deletions(-) > > diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out > b/contrib/postgres_fdw/expected/postgres_fdw.out > index ea566d50341..687591e4a97 100644 > --- a/contrib/postgres_fdw/expected/postgres_fdw.out > +++ b/contrib/postgres_fdw/expected/postgres_fdw.out > @@ -10074,13 +10074,15 @@ SELECT t1.a, t2.b FROM fprt1 t1 INNER JOIN fprt2 t2 > ON (t1.a = t2.b) WHERE t1.a > -> Nested Loop > Join Filter: (t1.a = t2.b) > -> Append > - -> Foreign Scan on ftprt1_p1 t1_1 > + -> Sort > + Sort Key: t1_1.a > + -> Foreign Scan on ftprt1_p1 t1_1 > -> Foreign Scan on ftprt1_p2 t1_2 > -> Materialize > -> Append > -> Foreign Scan on ftprt2_p1 t2_1 > -> Foreign Scan on ftprt2_p2 t2_2 > -(10 rows) > +(12 rows) > > SELECT t1.a, t2.b FROM fprt1 t1 INNER JOIN fprt2 t2 ON (t1.a = t2.b) WHERE > t1.a % 25 = 0 ORDER BY 1,2 FOR UPDATE OF t1; >a | b > diff --git a/src/backend/optimizer/path/allpaths.c > b/src/backend/optimizer/path/allpaths.c > index 4895cee9944..827bc469269 100644 > --- a/src/backend/optimizer/path/allpaths.c > +++ b/src/backend/optimizer/path/allpaths.c > @@ -1845,6 +1845,27 @@ generate_orderedappend_paths(PlannerInfo *root, > RelOptInfo *rel, > /* Assert we do have an unparameterized path > for this child */ > Assert(cheapest_total->param_info == NULL); > } > + else > + { > + /* > + * Even if we found necessary pathkeys, using > unsorted path > + * can be more efficient. 
> + */ > + Pathsort_path; > + > + cost_sort(&sort_path, > + root, > + pathkeys, > + > childrel->cheapest_total_path->total_cost, > + > childrel->cheapest_total_path->rows, > + > childrel->cheapest_total_path->pathtarget->width, > + 0.0, > + work_mem, > + -1.0 /* need all tuples to > sort them */ ); > + > + if (compare_path_costs(&sort_path, > cheapest_total, TOTAL_COST) < 0) > + cheapest_total = > childrel->cheapest_total_path; > + } > > /* >* When building a fractional path, determine a cheapest > diff --git a/src/test/regress/expected/inherit.out > b/src/test/regress/expected/inherit.out > index ad732134148..16e78c8d2ff 100644 > --- a/src/test/regress/expected/inherit.out > +++ b/src/test/regress/expected/inherit.out > @@ -1555,6 +1555,7 @@ insert into matest2 (name) values ('Test 4'); > insert into matest3 (name) values ('Test 5'); > insert into matest3 (name) values ('Test 6'); > set enable_indexscan = off; -- force use of seqscan/sort, so no merge > +set enable_sort = off; -- avoid sorting below MergeAppend > explain (verbose, cost
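To make the scenario concrete, here is a hypothetical schema of the kind described (not taken from the patch or its regression tests):

CREATE TABLE events (id bigint, ts timestamptz, payload text) PARTITION BY RANGE (id);
CREATE TABLE events_p1 PARTITION OF events FOR VALUES FROM (0) TO (1000000);
CREATE TABLE events_p2 PARTITION OF events FOR VALUES FROM (1000000) TO (2000000);
CREATE INDEX ON events (ts);        -- provides the pathkeys for MergeAppend
CREATE INDEX ON events (payload);   -- far more selective for the filter below

EXPLAIN SELECT * FROM events WHERE payload = 'rare-value' ORDER BY ts;

Without the patch the MergeAppend subpaths may be ordered scans on (ts) that filter away most rows; the patch also costs "cheapest child path plus explicit Sort" per child and picks whichever is cheaper.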
Re: Considering fractional paths in Append node
Nikita Malakhov writes: Hello Nikita, > Hi hackers! > > Sorry, I've forgot to attach the patch itself. Please check it out. Could you check if [1] is related to this subject? I think the hard part would be deciding which tuple_fraction to use in add_paths_to_append_rel, since root->tuple_fraction is at the subquery level, while add_paths_to_append_rel only works at the RelOptInfo level. [1] https://www.postgresql.org/message-id/CAApHDvry0nSV62kAOH3iccvfPhGPLN0Q97%2B%3Db1RsDPXDz3%3DCiQ%40mail.gmail.com -- Best Regards Andy Fan
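For readers unfamiliar with the term: root->tuple_fraction is non-zero for fast-start plans, e.g. (hypothetical table name):

-- Only a fraction of each appendrel child's output is ever needed here,
-- which is when a fractional (fast-start) path can beat the cheapest-total path.
SELECT * FROM measurements ORDER BY recorded_at LIMIT 10;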
Re: Popcount optimization using AVX512
On Tue, Oct 08, 2024 at 09:36:03PM -0500, Nathan Bossart wrote: > On Wed, Jul 31, 2024 at 04:43:02PM -0500, Nathan Bossart wrote: >> On Wed, Jul 31, 2024 at 01:52:54PM -0700, Andres Freund wrote: >>> I think we'd be better off enabling architectural features on a per-function >>> basis, roughly like this: >>> >>> [...] >>> >>> /* FIXME: Should be gated by configure check of -mavx512vpopcntdq >>> -mavx512bw support */ >>> pg_enable_target("avx512vpopcntdq,avx512bw") >>> uint64_t >>> pg_popcount_avx512(const char *buf, int bytes) >>> ... >> >> I remember wondering why the CRC-32C code wasn't already doing something >> like this (old compiler versions? non-gcc-like compilers?), and I'm not >> sure I ever discovered the reason, so out of an abundance of caution I used >> the same approach for AVX-512. If we can convince ourselves that >> __attribute__((target("..."))) is standard enough at this point, +1 for >> moving to that. > > [...] > > So, at least for the CRC code, __attribute__((target("..."))) was probably > not widely available enough yet when it was first added. Unfortunately, > the ARMv8 CRC target support (without -march) is still pretty new, but it > might be possible to switch the others to a per-function approach in v18. Here is a first attempt at using __attribute__((target("..."))) for the AVX-512 stuff. Besides allowing us to consolidate the code into a single file, this simplifies the build file changes quite a bit. -- nathan >From c97e25e56347c90f169a5ce069a9ea06c873915b Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Wed, 16 Oct 2024 15:57:55 -0500 Subject: [PATCH v1 1/1] use __attribute__((target(...))) for AVX-512 stuff --- config/c-compiler.m4 | 60 +- configure| 163 ++- configure.ac | 17 +-- meson.build | 17 +-- src/Makefile.global.in | 5 - src/include/c.h | 10 ++ src/makefiles/meson.build| 4 +- src/port/Makefile| 12 +- src/port/meson.build | 7 +- src/port/pg_popcount_avx512.c| 86 +- src/port/pg_popcount_avx512_choose.c | 102 - 11 files changed, 171 insertions(+), 312 deletions(-) delete mode 100644 src/port/pg_popcount_avx512_choose.c diff --git a/config/c-compiler.m4 b/config/c-compiler.m4 index 10f8c7bd0a..aa90f8ef33 100644 --- a/config/c-compiler.m4 +++ b/config/c-compiler.m4 @@ -700,20 +700,20 @@ undefine([Ac_cachevar])dnl # Check if the compiler supports the XSAVE instructions using the _xgetbv # intrinsic function. # -# An optional compiler flag can be passed as argument (e.g., -mxsave). If the -# intrinsic is supported, sets pgac_xsave_intrinsics and CFLAGS_XSAVE. +# If the intrinsics are supported, sets pgac_xsave_intrinsics. 
AC_DEFUN([PGAC_XSAVE_INTRINSICS], -[define([Ac_cachevar], [AS_TR_SH([pgac_cv_xsave_intrinsics_$1])])dnl -AC_CACHE_CHECK([for _xgetbv with CFLAGS=$1], [Ac_cachevar], -[pgac_save_CFLAGS=$CFLAGS -CFLAGS="$pgac_save_CFLAGS $1" -AC_LINK_IFELSE([AC_LANG_PROGRAM([#include ], - [return _xgetbv(0) & 0xe0;])], +[define([Ac_cachevar], [AS_TR_SH([pgac_cv_xsave_intrinsics])])dnl +AC_CACHE_CHECK([for _xgetbv], [Ac_cachevar], +[AC_LINK_IFELSE([AC_LANG_PROGRAM([#include +__attribute__((target("xsave"))) +static int xsave_test(void) +{ + return _xgetbv(0) & 0xe0; +}], + [return xsave_test();])], [Ac_cachevar=yes], - [Ac_cachevar=no]) -CFLAGS="$pgac_save_CFLAGS"]) + [Ac_cachevar=no])]) if test x"$Ac_cachevar" = x"yes"; then - CFLAGS_XSAVE="$1" pgac_xsave_intrinsics=yes fi undefine([Ac_cachevar])dnl @@ -725,29 +725,27 @@ undefine([Ac_cachevar])dnl # _mm512_setzero_si512, _mm512_maskz_loadu_epi8, _mm512_popcnt_epi64, # _mm512_add_epi64, and _mm512_reduce_add_epi64 intrinsic functions. # -# Optional compiler flags can be passed as argument (e.g., -mavx512vpopcntdq -# -mavx512bw). If the intrinsics are supported, sets -# pgac_avx512_popcnt_intrinsics and CFLAGS_POPCNT. +# If the intrinsics are supported, sets pgac_avx512_popcnt_intrinsics. AC_DEFUN([PGAC_AVX512_POPCNT_INTRINSICS], -[define([Ac_cachevar], [AS_TR_SH([pgac_cv_avx512_popcnt_intrinsics_$1])])dnl -AC_CACHE_CHECK([for _mm512_popcnt_epi64 with CFLAGS=$1], [Ac_cachevar], -[pgac_save_CFLAGS=$CFLAGS -CFLAGS="$pgac_save_CFLAGS $1" -AC_LINK_IFELSE([AC_LANG_PROGRAM([#include ], - [const char buf@<:@sizeof(__m512i)@:>@; - PG_INT64_TYPE popcnt = 0; - __m512i accum = _mm512_setzero_si512(); - const __m512i val = _mm512_maskz_loadu_epi8((__mmask64) 0xf0f0f0f0f0f0f0f0, (const __m512i *) buf); - const __m512i cnt = _mm512_popcnt_epi64(val); - accum = _mm512_add_epi64(accum, cnt); - popcnt = _mm512_reduce_add_epi64(accum); - /* return computed value, to prevent the above being optimized away */ - return popcnt == 0;])], +[define([Ac_cachevar], [AS_TR_SH([pgac_cv_avx512_popcnt_intrinsics]
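For readers following the discussion, a sketch of what the per-function target-attribute style looks like (illustration only, not the committed code; it mirrors the intrinsics used by the configure probe above, assumes a GCC/Clang-style compiler, and elides the runtime CPUID/XGETBV gating that callers still need):

#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

__attribute__((target("avx512vpopcntdq,avx512bw")))
uint64_t
popcount_avx512(const char *buf, size_t bytes)
{
    __m512i     accum = _mm512_setzero_si512();

    /* full 64-byte chunks */
    while (bytes >= sizeof(__m512i))
    {
        const __m512i val = _mm512_loadu_si512((const void *) buf);

        accum = _mm512_add_epi64(accum, _mm512_popcnt_epi64(val));
        buf += sizeof(__m512i);
        bytes -= sizeof(__m512i);
    }

    /* trailing partial chunk via a masked load */
    if (bytes > 0)
    {
        const __mmask64 mask = ((__mmask64) 1 << bytes) - 1;
        const __m512i   val = _mm512_maskz_loadu_epi8(mask, (const void *) buf);

        accum = _mm512_add_epi64(accum, _mm512_popcnt_epi64(val));
    }

    return (uint64_t) _mm512_reduce_add_epi64(accum);
}

The point of the attribute is that the rest of the translation unit keeps being compiled for the baseline target, so the file no longer needs its own CFLAGS, which is what allows consolidating the choose/impl files into one.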
Re: Limiting overshoot in nbtree's parallel SAOP index scans
On Wed, 16 Oct 2024 at 20:52, Peter Geoghegan wrote: > > On Fri, Oct 11, 2024 at 10:27 AM Matthias van de Meent > wrote: > > With the introduction of the new SAOP handling in PG17, however, the > > shared state has become a bit more muddied. Because the default has > > changed from "stop scanning at the end of a SAOP value's range" to > > "continue scanning, unless ...", and because it is relatively > > expensive to determine whether we need to stop or continue we release > > the parallel scan to continue before we know if a new primitive scan > > would be preferred. > > It's not just relatively expensive. It's essential that _bt_readpage > be able to release the parallel scan at the earliest opportunity (just > as soon as the next sibling link is known). Why is this essential? AFAIK, that's only for performance reasons - and it's also only for performance reasons that we start a new primitive scan, so in my view there is a trade-off between releasing the scan early (and making better use of parallel resources) and scheduling a new primitive scan (to reduce overall resource usage). > Whereas a parallel backend > worker will only decide that it's likely a good idea to do another > primitive index scan when _bt_readpage has finished executing. Correct. > Clearly it would not be sensible to wait until _bt_readpage had gotten > as far as deciding on the need for another descent of the index. To do > so would essentially serialize the scan. There are clear advantages to > doing as little coordination as possible among backends. (I don't > think that you disagree with any of this, but just want to establish > context for anybody else that's reading along -- you seem to want to > do some limited/conditional form of this behavior.) In general, yes, but if we see strong signals that it may be worth the wait, then why not? > > Problem > > --- > > The new parallel index scan code can only get into a new primitive > > scan IFF no parallel worker has taken the next block in the time > > between the call to _bt_parallel_release at the start of _bt_readpage, > > and the call to _bt_parallel_primscan_schedule through _bt_checkkeys > > slightly later in _bt_readpage. This makes the code susceptible to > > race conditions > > Do you have a test case? Yes, see below. > I was aware of the issues in this area when designing the > _bt_parallel_primscan_schedule mechanism. I believe it's true that the > scan is only strictly guaranteed to be able to make forward progress > via naively scanning the next leaf page, again and again. This is a > fairly theoretical point, though. > > I believe that it's practically impossible for this to cause us > problems. That would require a group of backends that all want to > start another primitive index scan, but each block each other out, > constantly. Not just once, mind you -- again and again (enough to be > noticeable). This would have to happen in spite of the fact that we > only release one backend per _bt_parallel_release call > (_bt_parallel_release uses ConditionVariableSignal under the hood, > which only releases the oldest worker waiting on the CV, not all > backends). I may not have communicated this well enough, but what happens in the topical issue is as follows: 0. backend 1 has seized the scan; backend 2 is now waiting for another page to process. 1. backend 1: _release 2. backend 1: start _readpage 3. backend 2: wake up 4. backend 2: _seize 5. backend 1: finish _readpage; 6. backend 1: _primscan_schedule -> fail to schedule. ... 
swap roles of backend 1 and 2, and restart at 0. This is easy to replicate with somewhat expensive compare operator, one which take enough CPU time so that a signaled backend can wake up and seize the scan before a page is processed in _readpage. Because the pages contain no interesting tuples, the scan infrastructure won't ever need to visit HEAP VM pages or return any data from the scan in this overshoot case, and thus won't easily get into a state that might stop seizing the scan for long enough that _primscan_schedule actually does something - after _readpage finds out a new primitive scan is needed, it immediately returns and is ready to _seize the scan again. A test case for this: Schema and data population: CREATE TABLE test (a int, b bigint); INSERT INTO test (a, b) SELECT i % 101, i % 1001 FROM generate_series(1, 1000) i; CREATE INDEX ON test (a, b); DELETE FROM test WHERE a = 0 AND b = 1000; DELETE FROM test WHERE a = 100 AND b = 0; VACUUM (FREEZE, INDEX_CLEANUP on, ANALYZE) test; This gets you an index of 9239 blocks. Config to force actual parallel scan (note: debug_parallel_query doesn't exhibit the issue due to single-worker processing of data :-( ): SET parallel_setup_cost = 0; SET parallel_tuple_cost = 0; SET min_parallel_table_scan_size = 1; SET min_parallel_index_scan_size = 1; Test query: SELECT count(*) FROM test WHERE a IN (0, 100) AND b IN (0, 1000); -- count = 196 In v17 and the master branch you'll no
Re: Proposal to Enable/Disable Index using ALTER INDEX (with patch)
> On Oct 16, 2024, at 2:15 PM, Shayon Mukherjee wrote: > > >> On Oct 16, 2024, at 12:19 PM, Shayon Mukherjee wrote: >> >> - ALTER INDEX ... ENABLE/DISABLE performs an in-place update of the pg_index >> catalog to protect against indcheckxmin [2] (older unrelated thread). > > Performing the in place update of the pg_index row from > ATExecEnableDisableIndex using systable_inplace_update_begin was failing in > CI weirdly but not on my local MacBook when running spec suite. I am also > wondering what is causing the requirement [1] to update the row in-place vs > say using CatalogTupleUpdate since using the later is working well locally + > CI? > I believe I somewhat understand why the issue might be occurring, where CI is failing the create_index regression test and crashing (signal 6 in postmaster logs). I suspect it might be due to a segmentation fault or a similar issue. IsInplaceUpdateOid is used in systable_inplace_update_begin (recently introduced), but it doesn’t yet support pg_index. Based on check_lock_if_inplace_updateable_rel in heapam.c and IsInplaceUpdateOid in catalog.c, introducing support for in-place updates via this new mechanism might not be straightforward for pg_index. It would require updating the callers to handle in-place updates and locks, I presume? I did confirm that, based on the v2 PATCH, when not doing in-place updates on pg_index - it means that the xmin for pg_index would increment. I suppose there’s a risk of indcheckxmin being marked as TRUE when xmin exceeds, potentially causing an index to be accidentally skipped ? (I am not 100% sure) I'll take some time to think this through and familiarize myself with the new systable_inplace_update_begin. In the meantime, aside from the in-place updates on pg_index, I would love to learn any first impressions or feedback on the patch folks may have. Thank you, Shayon
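For reference, the syntax being proposed looks like the following (hypothetical index name; this is not available in any released PostgreSQL):

ALTER INDEX orders_status_idx DISABLE;  -- planner stops considering the index
ALTER INDEX orders_status_idx ENABLE;   -- flips the pg_index flag back; no rebuild needed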
Re: Using per-transaction memory contexts for storing decoded tuples
On Tue, Oct 15, 2024 at 9:01 PM Amit Kapila wrote: > > On Tue, Oct 15, 2024 at 11:15 PM Masahiko Sawada > wrote: > > > > On Sun, Oct 13, 2024 at 11:00 PM Amit Kapila > > wrote: > > > > > > On Fri, Oct 11, 2024 at 3:40 AM Masahiko Sawada > > > wrote: > > > > > > > > Please find the attached patches. > > > > > > > > Thank you for reviewing the patch! > > > > > > > > @@ -343,9 +343,9 @@ ReorderBufferAllocate(void) > > > */ > > > buffer->tup_context = GenerationContextCreate(new_ctx, > > > "Tuples", > > > - SLAB_LARGE_BLOCK_SIZE, > > > - SLAB_LARGE_BLOCK_SIZE, > > > - SLAB_LARGE_BLOCK_SIZE); > > > + SLAB_DEFAULT_BLOCK_SIZE, > > > + SLAB_DEFAULT_BLOCK_SIZE, > > > + SLAB_DEFAULT_BLOCK_SIZE); > > > > > > Shouldn't we change the comment atop this change [1] which states that > > > we should benchmark the existing values? > > > > Agreed. > > > > Can we slightly tweak the comments as follows so that it doesn't sound > like a fix for a bug? > > /* To minimize memory fragmentation caused by long-running > transactions with changes spanning multiple memory blocks, we use a > single fixed-size memory block for decoded tuple storage. The tests > showed that the default memory block size maintains logical decoding > performance without causing fragmentation due to concurrent > transactions. One might think that we can use the max size as > SLAB_LARGE_BLOCK_SIZE but the test also showed it doesn't help resolve > the memory fragmentation. Agreed. I've incorporated your comment in the latest patches. I'm going to push them today, barring any objections or further comments. > Other than that the changes in the patch look good to me. Note that I > haven't tested the patch by myself but the test results shown by you > and others in the thread seem sufficient to proceed with this change. Understood. Thank you for the discussion and your help reviewing the patch. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com REL16_v3-0001-Reduce-memory-block-size-for-decoded-tuple-storag.patch Description: Binary data REL15_v3-0001-Reduce-memory-block-size-for-decoded-tuple-storag.patch Description: Binary data REL14_v3-0001-Reduce-memory-block-size-for-decoded-tuple-storag.patch Description: Binary data master_v3-0001-Reduce-memory-block-size-for-decoded-tuple-storag.patch Description: Binary data REL17_v3-0001-Reduce-memory-block-size-for-decoded-tuple-storag.patch Description: Binary data REL12_v3-0001-Reduce-memory-block-size-for-decoded-tuple-storag.patch Description: Binary data REL13_v3-0001-Reduce-memory-block-size-for-decoded-tuple-storag.patch Description: Binary data
Failing assertion in predicate.c
Hi pg-hackers, I am encountering an assertion failure in predicate.c when running a high volume of short serializable transactions. The error occurs during stress testing with many concurrent connections. I have replicated this issue on two separate macOS M1 systems, but I have not been able to reproduce it on Linux. The issue arises in both PostgreSQL 16 and 17. Interestingly, introducing a delay of just 1 ms between statements seems to prevent the assertion from failing. The specific failure I encounter is: ``` TRAP: failed Assert("TransactionIdIsValid(tailXid)"), File: "predicate.c", Line: 885, PID: 1350 ``` For anyone interested, I have created a reproduction program available here: https://github.com/yrashk/pg-predicate-tailxid-assertion-issue. The program establishes a connection pool and issues simple INSERT statements, each wrapped in its own serializable transaction. I would appreciate any insights into this issue or potential steps for further debugging. -- Founder at Omnigres https://omnigres.com
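Not the linked reproduction program itself, but a minimal sketch of the described workload (object names and pgbench settings are arbitrary):

-- one-time setup
CREATE TABLE stress_test (v int);

-- repro.sql, driven by something like: pgbench -n -c 50 -j 8 -T 60 -f repro.sql
BEGIN ISOLATION LEVEL SERIALIZABLE;
INSERT INTO stress_test (v) VALUES (1);
COMMIT;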
Re: Limiting overshoot in nbtree's parallel SAOP index scans
On Wed, Oct 16, 2024 at 5:48 PM Matthias van de Meent wrote: > In v17 and the master branch you'll note 16 buffer hits for the test > query. However, when we use more expensive btree compare operations > (e.g. by adding pg_usleep(1) to both btint8cmp and btint4cmp), the > buffer access count starts to vary a lot and skyrockets to 30+ on my > machine, in some cases reaching >100 buffer hits. After applying my > patch, the buffer access count is capped to a much more agreeable > 16-18 hits - it still shows signs of overshooting the serial bounds, > but the number of buffers we overshoot our target is capped and thus > significantly lower. It's not exactly capped, though. Since in any case you're always prone to getting extra leaf page reads at the end of each primitive index scan. That's not something that's new to Postgres 17, though. Anyway, I'm still not convinced. Your test case requires adding a one second delay to each ORDER proc comparison, and so has an unrealistic adversarial character. It uses an index-only scan that is drastically faster if we don't use a parallel scan at all. The serial case is 0.046 ms for me, whereas the parallel case is 3.094 ms (obviously that's without the addition of a 1 second delay). You've thrown everything but the kitchen sink at the issue, and yet the impact on buffer hits really isn't too bad. Does anybody else have an opinion on this? -- Peter Geoghegan
Re: ActiveState Perl is not valid anymore to build PG17 on the Windows 10/11 platforms, So Documentation still suggesting it should be updated
On Tue, Jul 16, 2024 at 08:23:11AM -0400, Andrew Dunstan wrote: > > On 2024-07-16 Tu 7:46 AM, Yasir wrote: > > Hi Hackers, > > Recently, I compiled PG17 on the windows. Till PG16 "ActiveState Perl", as > instructed in the documentation, was being used successfully on the > Windows > 10/11 to compile PG. > However, it looks like that "ActiveState Perl" is not valid anymore to > compile PG17 on Windows 10/11 but documentation still suggests it. > Therefore, I think documentation needs to be updated. > Moreover, I had to install "strawberry's perl" in order to compile PG17 on > Windows 10/11. Please check out the thread "errors building on windows > using meson" highlighting the issue. > > > > See https://postgr.es/m/4acddcd4-1c08-44e7-ba60-cab102259...@dunslane.net > > I agree we should fix the docco. I have written the attached patch to make these changes. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com When a patient asks the doctor, "Am I going to die?", he means "Am I going to die soon?" diff --git a/doc/src/sgml/installation.sgml b/doc/src/sgml/installation.sgml index 3a491b59896..0e16bb01d85 100644 --- a/doc/src/sgml/installation.sgml +++ b/doc/src/sgml/installation.sgml @@ -3809,14 +3809,12 @@ make: *** [postgres] Error 1 - ActiveState Perl + Strawberry Perl -ActiveState Perl is required to run the build generation scripts. MinGW +Strawberry Perl is required to run the build generation scripts. MinGW or Cygwin Perl will not work. It must also be present in the PATH. Binaries can be downloaded from -https://www.activestate.com";> -(Note: version 5.14 or later is required, -the free Standard Distribution is sufficient). +https://strawberryperl.com";>. @@ -3868,10 +3866,9 @@ make: *** [postgres] Error 1 - ActiveState Tcl + Magicsplat Tcl -Required for building PL/Tcl (Note: version -8.4 is required, the free Standard Distribution is sufficient). +Required for building PL/Tcl.
Re: DOCS - pg_replication_slot . Fix the 'inactive_since' description
On Mon, Sep 9, 2024 at 01:15:32PM +1000, Peter Smith wrote: > On Mon, Sep 9, 2024 at 12:20 PM David G. Johnston > wrote: > > > > > > > > On Sun, Sep 8, 2024, 18:55 Peter Smith wrote: > >> > >> Saying "The time..." is fine, but the suggestions given seem backwards to > >> me: > >> - The time this slot was inactivated > >> - The time when the slot became inactive. > >> - The time when the slot was deactivated. > >> > >> e.g. It is not like light switch. So, there is no moment when the > >> active slot "became inactive" or "was deactivated". > > > > > > While this is plausible the existing wording and the name of the field > > definitely fail to convey this. > > > >> > >> Rather, the 'inactive_since' timestamp field is simply: > >> - The time the slot was last active. > >> - The last time the slot was active. > > > > > > I see your point but that wording is also quite confusing when an active > > slot returns null for this field. > > > > At this point I'm confused enough to need whatever wording is taken to be > > supported by someone explaining the code that interacts with this field. > > > > Me too. I created this thread primarily to get the description changed > to clarify this field represents a moment in time, rather than a > duration. So I will be happy with any wording that addresses that. I dug into the code and came up with the attached patch. "active" means there is a process streaming the slot, and the "inactive_since" time means the last time synchronous slot streaming was stopped. Doc patch attached. I am not sure what other things are needed, but this is certainly unclear. This comment from src/backend/replication/logical/slotsync.c helped me understand this: * We need to update inactive_since only when we are promoting standby to * correctly interpret the inactive_since if the standby gets promoted * without a restart. We don't want the slots to appear inactive for a * long time after promotion if they haven't been synchronized recently. * Whoever acquires the slot, i.e., makes the slot active, will reset it. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com When a patient asks the doctor, "Am I going to die?", he means "Am I going to die soon?" diff --git a/doc/src/sgml/system-views.sgml b/doc/src/sgml/system-views.sgml index 61d28e701f2..934104e1bc9 100644 --- a/doc/src/sgml/system-views.sgml +++ b/doc/src/sgml/system-views.sgml @@ -2435,7 +2435,7 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx active bool - True if this slot is currently actively being used + True if this slot is currently currently being streamed @@ -2444,9 +2444,8 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx active_pid int4 - The process ID of the session using this slot if the slot - is currently actively being used. NULL if - inactive. + The process ID of the session streaming data for this slot. + NULL if inactive. @@ -2566,15 +2565,13 @@ SELECT * FROM pg_locks pl LEFT JOIN pg_prepared_xacts ppx inactive_since timestamptz -The time since the slot has become inactive. -NULL if the slot is currently being used. -Note that for slots on the standby that are being synced from a -primary server (whose synced field is -true), the -inactive_since indicates the last -synchronization (see -) -time. +The time slot synchronization (see ) +was most recently stopped. NULL if the slot +has always been synchronized. 
This is useful for slots on the +standby that are intended to be synced from a primary server (whose +synced field is true) +so they know when the slot stopped being synchronized. diff --git a/src/backend/replication/logical/slotsync.c b/src/backend/replication/logical/slotsync.c index f9649eec1a5..cbd8c00aa3f 100644 --- a/src/backend/replication/logical/slotsync.c +++ b/src/backend/replication/logical/slotsync.c @@ -1517,7 +1517,7 @@ update_synced_slots_inactive_since(void) * correctly interpret the inactive_since if the standby gets promoted * without a restart. We don't want the slots to appear inactive for a * long time after promotion if they haven't been synchronized recently. - * Whoever acquires the slot i.e.makes the slot active will reset it. + * Whoever acquires the slot, i.e., makes the slot active, will reset it. */ if (!StandbyMode) return; diff --git a/src/include/replication/slot.h b/src/include/replication/slot.h index 45582cf9d89..e3e0816bcd3 100644 --- a/src/include/replication/slot.h +++ b/src/include/replication/slot.h @@ -205,7 +205,7 @@ typedef struct ReplicationSlot */ XLogRecPtr last_saved_confirmed_flush; - /* The time since the slot has become inactiv
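For anyone following along, the fields in question can be inspected directly (column names as they exist in PostgreSQL 17):

SELECT slot_name, active, active_pid, synced, inactive_since
FROM pg_replication_slots;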
Re: Avoiding superfluous buffer locking during nbtree backwards scans
On Fri, Oct 11, 2024 at 7:29 PM Peter Geoghegan wrote: > Attached is v5 Now I'm attaching a v6, which further polishes things. Current plan is to commit something very close to this in the next day or two. v6 is mostly just further comment polishing. But it also merges together the two existing _bt_readnextpage loops (for forward and backward scan directions) into one single loop that handles everything. This is definitely a win for brevity, and might also be a win for clarity -- but I'm not 100% sure which way I prefer it just yet. I'll need to make a final decision on this loop merging business before committing. Anybody else have an opinion, either way? -- Peter Geoghegan From 17959762e34cf8ee75af37f98dc5b085d05c7860 Mon Sep 17 00:00:00 2001 From: Peter Geoghegan Date: Tue, 8 Oct 2024 11:40:54 -0400 Subject: [PATCH v6] Optimize and simplify nbtree backwards scans. This enhancement completely supersedes the one added by commit 3f44959f. Author: Matthias van de Meent Author: Peter Geoghegan Discussion: https://postgr.es/m/CAEze2WgpBGRgTTxTWVPXc9+PB6fc1a7t+VyGXHzfnrFXcQVxnA@mail.gmail.com Discussion: https://postgr.es/m/CAH2-WzkBTuFv7W2+84jJT8mWZLXVL0GHq2hMUTn6c9Vw=eYrCw@mail.gmail.com --- src/include/access/nbtree.h | 23 +- src/backend/access/nbtree/nbtree.c| 77 ++- src/backend/access/nbtree/nbtsearch.c | 744 -- src/backend/access/nbtree/nbtutils.c | 2 +- 4 files changed, 401 insertions(+), 445 deletions(-) diff --git a/src/include/access/nbtree.h b/src/include/access/nbtree.h index 90a68375a..22bf38934 100644 --- a/src/include/access/nbtree.h +++ b/src/include/access/nbtree.h @@ -954,6 +954,7 @@ typedef struct BTScanPosData XLogRecPtr lsn; /* pos in the WAL stream when page was read */ BlockNumber currPage; /* page referenced by items array */ + BlockNumber prevPage; /* page's left link when we scanned it */ BlockNumber nextPage; /* page's right link when we scanned it */ /* @@ -965,15 +966,6 @@ typedef struct BTScanPosData bool moreLeft; bool moreRight; - /* - * Direction of the scan at the time that _bt_readpage was called. - * - * Used by btrestrpos to "restore" the scan's array keys by resetting each - * array to its first element's value (first in this scan direction). This - * avoids the need to directly track the array keys in btmarkpos. - */ - ScanDirection dir; - /* * If we are doing an index-only scan, nextTupleOffset is the first free * location in the associated tuple storage workspace. @@ -1022,6 +1014,7 @@ typedef BTScanPosData *BTScanPos; #define BTScanPosInvalidate(scanpos) \ do { \ (scanpos).currPage = InvalidBlockNumber; \ + (scanpos).prevPage = InvalidBlockNumber; \ (scanpos).nextPage = InvalidBlockNumber; \ (scanpos).buf = InvalidBuffer; \ (scanpos).lsn = InvalidXLogRecPtr; \ @@ -1073,6 +1066,7 @@ typedef struct BTScanOpaqueData * markPos. 
*/ int markItemIndex; /* itemIndex, or -1 if not valid */ + ScanDirection markDir; /* direction of scan with mark */ /* keep these last in struct for efficiency */ BTScanPosData currPos; /* current position data */ @@ -1091,7 +1085,6 @@ typedef struct BTReadPageState OffsetNumber minoff; /* Lowest non-pivot tuple's offset */ OffsetNumber maxoff; /* Highest non-pivot tuple's offset */ IndexTuple finaltup; /* Needed by scans with array keys */ - BlockNumber prev_scan_page; /* previous _bt_parallel_release block */ Page page; /* Page being read */ /* Per-tuple input parameters, set by _bt_readpage for _bt_checkkeys */ @@ -1192,12 +1185,14 @@ extern int btgettreeheight(Relation rel); /* * prototypes for internal functions in nbtree.c */ -extern bool _bt_parallel_seize(IndexScanDesc scan, BlockNumber *pageno, - bool first); -extern void _bt_parallel_release(IndexScanDesc scan, BlockNumber scan_page); +extern bool _bt_parallel_seize(IndexScanDesc scan, BlockNumber *next_scan_page, + BlockNumber *curr_page, bool first); +extern void _bt_parallel_release(IndexScanDesc scan, + BlockNumber next_scan_page, + BlockNumber curr_page); extern void _bt_parallel_done(IndexScanDesc scan); extern void _bt_parallel_primscan_schedule(IndexScanDesc scan, - BlockNumber prev_scan_page); + BlockNumber curr_page); /* * prototypes for functions in nbtdedup.c diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c index 9b968aa0f..ff4fa2cf9 100644 --- a/src/backend/access/nbtree/nbtree.c +++ b/src/backend/access/nbtree/nbtree.c @@ -66,7 +66,9 @@ typedef enum */ typedef struct BTParallelScanDescData { - BlockNumber btps_scanPage; /* latest or next page to be scanned */ + BlockNumber btps_nextScanPage; /* next page to be scanned */ + BlockNumber btps_lastCurrPage; /* page whose sibling link was copied into + * btps_scanPage */ BTPS_State btps_pageStatus; /* indicates whether next page is * available for scan.
Re: Pgoutput not capturing the generated columns
On Wed, Oct 9, 2024 at 11:52 AM vignesh C wrote: > > On Tue, 8 Oct 2024 at 11:37, Shubham Khanna > wrote: > > > > On Fri, Oct 4, 2024 at 9:36 AM Peter Smith wrote: > > > > > > Hi Shubham, here are my review comments for v36-0001. > > > > > > == > > > 1. General - merge patches > > > > > > It is long past due when patches 0001 and 0002 should've been merged. > > > AFAIK the split was only because historically these parts had > > > different authors. But, keeping them separated is not helpful anymore. > > > > > > == > > > src/backend/catalog/pg_publication.c > > > > > > 2. > > > Bitmapset * > > > -pub_collist_validate(Relation targetrel, List *columns) > > > +pub_collist_validate(Relation targetrel, List *columns, bool pubgencols) > > > > > > Since you removed the WARNING, this parameter 'pubgencols' is unused > > > so it should also be removed. > > > > > > == > > > src/backend/replication/pgoutput/pgoutput.c > > > > > > 3. > > > /* > > > - * If the publication is FOR ALL TABLES then it is treated the same as > > > - * if there are no column lists (even if other publications have a > > > - * list). > > > + * To handle cases where the publish_generated_columns option isn't > > > + * specified for all tables in a publication, we must create a column > > > + * list that excludes generated columns. So, the publisher will not > > > + * replicate the generated columns. > > > */ > > > - if (!pub->alltables) > > > + if (!(pub->alltables && pub->pubgencols)) > > > > > > I still found that comment hard to understand. Does this mean to say > > > something like: > > > > > > -- > > > Process potential column lists for the following cases: > > > > > > a. Any publication that is not FOR ALL TABLES. > > > > > > b. When the publication is FOR ALL TABLES and > > > 'publish_generated_columns' is false. > > > A FOR ALL TABLES publication doesn't have user-defined column lists, > > > so all columns will be replicated by default. However, if > > > 'publish_generated_columns' is set to false, column lists must still > > > be created to exclude any generated columns from being published > > > -- > > > > > > == > > > src/test/regress/sql/publication.sql > > > > > > 4. > > > +SET client_min_messages = 'WARNING'; > > > +CREATE TABLE gencols (a int, gen1 int GENERATED ALWAYS AS (a * 2) > > > STORED); > > > > > > AFAIK you don't need to keep changing 'client_min_messages', > > > particularly now that you've removed the WARNING message that was > > > previously emitted. > > > > > > ~ > > > > > > 5. > > > nit - minor comment changes. > > > > > > == > > > Please refer to the attachment which implements any nits from above. > > > > > > > I have fixed all the given comments. Also, I have created a new 0003 > > patch for the TAP-Tests related to the '011_generated.pl' file. I am > > planning to merge 0001 and 0003 patches once they will get fixed. > > The attached patches contain the required changes. 
> > Few comments: > 1) I felt this change need not be part of this patch, if required it > can be proposed as a separate patch: > + if (server_version >= 15) > { > WalRcvExecResult *pubres; > TupleTableSlot *tslot; > Oid attrsRow[] = {INT2VECTOROID}; > - StringInfoData pub_names; > - > - initStringInfo(&pub_names); > - foreach(lc, MySubscription->publications) > - { > - if (foreach_current_index(lc) > 0) > - appendStringInfoString(&pub_names, ", "); > - appendStringInfoString(&pub_names, > quote_literal_cstr(strVal(lfirst(lc; > - } > + StringInfo pub_names = makeStringInfo(); > > 2) These two statements can be combined in to single appendStringInfo: > + appendStringInfo(&cmd, > " FROM pg_catalog.pg_attribute a" > " LEFT JOIN pg_catalog.pg_index i" > " ON (i.indexrelid = > pg_get_replica_identity_index(%u))" > " WHERE a.attnum > > 0::pg_catalog.int2" > -" AND NOT a.attisdropped %s" > +" AND NOT a.attisdropped", > lrel->remoteid); > + > + appendStringInfo(&cmd, > " AND a.attrelid = %u" > " ORDER BY a.attnum", > -lrel->remoteid, > - > (walrcv_server_version(LogRepWorkerWalRcvConn) >= 12 ? > - "AND a.attgenerated = ''" : ""), > lrel->remoteid); > > 3) In which scenario this will be hit: > + /* > +* Construct column list for COPY, excluding columns that are > subscription > +
Re: Pgoutput not capturing the generated columns
On Thu, Oct 10, 2024 at 10:53 AM Peter Smith wrote: > > Here are some comments for TAP test patch v37-0003. > > I’m not in favour of the removal of such a large number of > 'combination' and other 'misc' tests. In the commit message, please > delete me as a "co-author" of this patch. > > == > > 1. > Any description or comment that still mentions "all combinations" is > no longer valid: > > (e.g. in the comment message) > Add tests for all combinations of generated column replication. > > (e.g. in the test file) > # The following test cases exercise logical replication for all combinations > # where there is a generated column on one or both sides of pub/sub: > > and > > # Furthermore, all combinations are tested using: > > == > 2. > +# -- > +# Testcase: generated -> normal > +# Publisher table has generated column 'b'. > +# Subscriber table has normal column 'b'. > +# -- > + > > Now that COPY for generated columns is already implemented in patch > 0001, shouldn't this test be using 'copy_data' enabled, so it can test > replication both for initial tablesync as well as normal replication? > > That was the whole point of having the "# XXX copy_data=false for now. > This will be changed later." reminder comment in this file. > > == > > 3. > Previously there were some misc tests to ensure that a generated > column which was then altered using DROP EXPRESSION would work as > expected. The test scenario was commented like: > > +# > = > +# Misc test. > +# > +# A "normal -> generated" replication fails, reporting an error that the > +# subscriber side column is missing. > +# > +# In this test case we use DROP EXPRESSION to change the subscriber generated > +# column into a normal column, then verify replication works ok. > +# > = > > Now in patch v37 this test no longer exists. Why? > > == > 4. > +# > = > +# The following test cases demonstrate behavior of generated column > replication > +# when publish_generated_colums=false/true: > +# > +# Test: column list includes gencols, when publish_generated_columns=false > +# Test: column list does not include gencols, when > publish_generated_columns=false > +# > +# Test: column list includes gencols, when publish_generated_columns=true > +# Test: column list does not include gencols, when > publish_generated_columns=true > +# Test: no column list, when publish_generated_columns=true > +# > = > > These tests are currently only testing the initial tablesync > replication. Since the COPY logic is different from the normal > replication logic, I think it would be better to test some normal > replication records as well, to make sure both parts work > consistently. This comment applies to all of the following test cases. > > ~~~ > > 5. > +# Create table and publications. > +$node_publisher->safe_psql( > + 'postgres', qq( > + CREATE TABLE nogen_to_gen3 (a int, b int, gen1 int GENERATED ALWAYS > AS (a * 2) STORED, gen2 int GENERATED ALWAYS AS (a * 2) STORED); > + CREATE TABLE nogen_to_gen4 (c int, d int, gen1 int GENERATED ALWAYS > AS (c * 2) STORED, gen2 int GENERATED ALWAYS AS (c * 2) STORED); > + INSERT INTO nogen_to_gen3 VALUES (1, 1); > + INSERT INTO nogen_to_gen4 VALUES (1, 1); > + CREATE PUBLICATION pub1 FOR table nogen_to_gen3, nogen_to_gen4(gen1) > WITH (publish_generated_columns=true); > +)); > + > > 5a. > The code should do only what the comments say it does. So, the INSERTS > should be done separately after the CREATE PUBLICATION, but before the > CREATE SUBSCRIPTION. A similar change should be made for all of these > test cases. 
> > # Insert some initial data > INSERT INTO nogen_to_gen3 VALUES (1, 1); > INSERT INTO nogen_to_gen4 VALUES (1, 1); > > ~ > > 5b. > The tables are badly named. Why are they 'nogen_to_gen', when the > publisher side has generated cols and the subscriber side does not? > This problem seems repeated in multiple subsequent test cases. > > ~ > > 6. > +$result = $node_subscriber->safe_psql('postgres', > + "SELECT * FROM gen_to_nogen ORDER BY a"); > +is($result, qq(1|1||2), > + 'gen_to_nogen initial sync, when publish_generated_columns=false'); > + > +$result = $node_subscriber->safe_psql('postgres', > + "SELECT * FROM gen_to_nogen2 ORDER BY c"); > +is($result, qq(1|1||), > + 'gen_to_nogen2 initial sync, when publish_generated_columns=false'); > > IMO all the "result" queries like these ones ought to have to have a > comment which explains the reason for the expected results. This > review comment applies to multiple places. Please add comments to all > of them. > > ~~~ > > 7. > +# -
Re: optimize hashjoin
On Fri, Aug 23, 2024 at 08:17:26AM -0400, Robert Haas wrote: > On Fri, Aug 23, 2024 at 7:02 AM bucoo wrote: > > Howerver, the non-parallel hashjoin indeed showed about a 10% performance > > improvement. > >-> Hash Join (cost=508496.00..2302429.31 rows=47989008 width=0) > > (actual time=1075.213..9503.727 rows=47989007 loops=1) > >-> Hash Join (cost=508496.00..2302429.31 rows=47989008 width=0) > > (actual time=1087.588..8726.441 rows=47989007 loops=1) > > It's not a good idea to test performance with EXPLAIN ANALYZE, > generally speaking. And you usually need to test a few times and > average or something, rather than just a single test. But also, this > doesn't show the hash join being 10% faster. It shows the hash join > being essentially the same speed (1075ms unpatched, 1087ms patched), > and the aggregate node on top of it being faster. > > Now, it does seem possible to me that changing one node could cause a > performance improvement for the node above it, but I don't quite see > why that would happen in this case. Where are we on this patch? -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com When a patient asks the doctor, "Am I going to die?", he means "Am I going to die soon?"
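One common way to get the repeated, averaged timings Robert asks for, rather than a single EXPLAIN ANALYZE run (script and table names here are hypothetical):

-- bench.sql: the join under test, run without per-node instrumentation
SELECT count(*) FROM fact f JOIN dim d ON f.dim_id = d.id;

Running it with, e.g., pgbench -n -T 60 -f bench.sql on both builds and comparing the reported average latency avoids the timing overhead that EXPLAIN ANALYZE adds to each node.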
Re: Proposal to Enable/Disable Index using ALTER INDEX (with patch)
On Oct 16, 2024, at 12:19 PM, Shayon Mukherjee wrote: > - ALTER INDEX ... ENABLE/DISABLE performs an in-place update of the pg_index catalog to protect against indcheckxmin [2] (older unrelated thread). Performing the in place update of the pg_index row from ATExecEnableDisableIndex using systable_inplace_update_begin was failing in CI weirdly but not on my local MacBook when running the spec suite. I am also wondering what is causing the requirement [1] to update the row in-place vs say using CatalogTupleUpdate, since using the latter is working well locally + CI? I have attached a v2 patch (following from the last v1 patch [1]) that uses CatalogTupleUpdate, and local + CI [2] is passing. [1] https://www.postgresql.org/message-id/20180618215635.m5vrnxdxhxytv...@alap3.anarazel.de [2] https://www.postgresql.org/message-id/EF2313B8-A017-4869-9B7F-A24EDD8795DE%40gmail.com [3] https://github.com/shayonj/postgres/pull/1 Thanks, Shayon v2-0001-Introduce-the-ability-to-enable-disable-indexes-u.patch Description: Binary data
Re: Better error reporting from extension scripts (Was: Extend ALTER OPERATOR)
jian he writes: > just found out the"elog(INFO, "should not reached here");" part never reached. You didn't check any of the cases we were discussing I guess? (That is, places in gram.y that throw an error without a parser_errposition call.) Note that even if we fix all of those and keep them fixed, we still couldn't assume the case is unreachable, because gram.y isn't self-contained. For instance, if we hit out-of-memory during raw parsing, the OOM error out of mcxt.c isn't going to provide a syntax error position. I'm not too concerned about doing better than what the patch does now (i.e. nothing) in such edge cases, but we can't do worse. > i guess, we don't need performance in script_error_callback, > but in script_error_callback arrange code seperate syntax error(raw > parser) and other error seems good. > please check the attached minor refactor. I do not think that's an improvement. It's more complicated and less readable, and I don't see why we need to squeeze more performance out of this error-reporting path that should never be taken in production. regards, tom lane
Re: Limiting overshoot in nbtree's parallel SAOP index scans
On Fri, Oct 11, 2024 at 10:27 AM Matthias van de Meent wrote: > With the introduction of the new SAOP handling in PG17, however, the > shared state has become a bit more muddied. Because the default has > changed from "stop scanning at the end of a SAOP value's range" to > "continue scanning, unless ...", and because it is relatively > expensive to determine whether we need to stop or continue we release > the parallel scan to continue before we know if a new primitive scan > would be preferred. It's not just relatively expensive. It's essential that _bt_readpage be able to release the parallel scan at the earliest opportunity (just as soon as the next sibling link is known). Whereas a parallel backend worker will only decide that it's likely a good idea to do another primitive index scan when _bt_readpage has finished executing. Clearly it would not be sensible to wait until _bt_readpage had gotten as far as deciding on the need for another descent of the index. To do so would essentially serialize the scan. There are clear advantages to doing as little coordination as possible among backends. (I don't think that you disagree with any of this, but just want to establish context for anybody else that's reading along -- you seem to want to do some limited/conditional form of this behavior.) > Problem > --- > The new parallel index scan code can only get into a new primitive > scan IFF no parallel worker has taken the next block in the time > between the call to _bt_parallel_release at the start of _bt_readpage, > and the call to _bt_parallel_primscan_schedule through _bt_checkkeys > slightly later in _bt_readpage. This makes the code susceptible to > race conditions Do you have a test case? I was aware of the issues in this area when designing the _bt_parallel_primscan_schedule mechanism. I believe it's true that the scan is only strictly guaranteed to be able to make forward progress via naively scanning the next leaf page, again and again. This is a fairly theoretical point, though. I believe that it's practically impossible for this to cause us problems. That would require a group of backends that all want to start another primitive index scan, but each block each other out, constantly. Not just once, mind you -- again and again (enough to be noticeable). This would have to happen in spite of the fact that we only release one backend per _bt_parallel_release call (_bt_parallel_release uses ConditionVariableSignal under the hood, which only releases the oldest worker waiting on the CV, not all backends). There is no signalling of condition variables within _bt_parallel_primscan_schedule, either. So you just have this one call to ConditionVariableSignal. Why should it be possible to get a stampede of parallel backends? (I'm not 100% sure if your concern is even theoretically valid.) > Given the above, a parallel index scan with ScanKey `id = ANY > (minvalue, maxvalue)` and bad enough timing/luck could thus scan the > whole index, while just 2 primitive index scans (the behaviour before > PG17) are sufficient. > > In practice it is quite unlikely that parallel index scans have such > bad luck, but I don't think we should have luck or timing as a big > factor for the performance picture in parallel index scans. It's also possible for a hash table to degenerate when there are excessively many hash collisions. This is certainly possible with adversarial inputs -- it's not hard to write a test case that'll generate some. 
So hash tables generally also "have luck as a big factor", in about the same sense. No? I don't think that it's fundamentally unreasonable to have concerns about things like this, of course. Safety-critical avionics systems tend to just not use hash tables at all. > I think I've found a reasonably elegant solution, which limits the > number of buffers we can fail to start a new primitive scan to the > number of parallel workers + 1: If we detect that a parallel worker > in the same primscan range thinks this is the right moment to start a > new primscan (but couldn't due to concurrent advancement), we don't > release the parallel scan immediately, but instead only release it > after reading the page's contents, to find out if we really should > start a new primitive scan. If so, we do that, and if not, we instead > mark that the primitive scan has reached a new primscan range, do some > cleanup, and then continue business as usual. I'm opposed to this patch. It'll introduce a behavior that's more difficult to reason about and test than the one that it purports to fix. -- Peter Geoghegan
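
To spell out the ordering at issue for readers who don't have the nbtree code open, here is a timeline sketch of one worker's pass through _bt_readpage. It is illustrative pseudocode only, not the real function, and call arguments are elided.

/*
 * Timeline sketch (illustrative only) of the window being discussed.
 * One worker, one leaf page:
 */
static void
readpage_timeline_sketch(void)
{
    /*
     * Step 1: as soon as the sibling link of the current page is known,
     * hand the shared scan position to the next waiting worker:
     *
     *     _bt_parallel_release(scan, next_sibling_block);
     *
     * From this instant, other workers can advance the scan.
     */

    /*
     * Step 2: evaluate the page's tuples against the scan keys
     * (_bt_checkkeys).  Only now can this worker conclude that another
     * primitive index scan would be preferable, in which case it tries:
     *
     *     _bt_parallel_primscan_schedule(scan, current_block);
     */

    /*
     * Step 3: the attempt in step 2 is a no-op if some other worker has
     * already advanced the shared scan position between steps 1 and 2.
     * That is the window Matthias describes; the question in this thread
     * is whether it can realistically repeat often enough to matter.
     */
}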
Re: New "raw" COPY format
On Wed, Oct 16, 2024, at 18:34, Daniel Verite wrote: > Joel Jacobson wrote: > >> However, I think rejecting such column data seems like the >> better alternative, to ensure data exported with COPY TO >> can always be imported back using COPY FROM, >> for the same format. > > On the other hand, that might prevent cases where we > want to export, for instance, a valid json array: > > copy (select json_agg(col) from table ) to 'file' RAW > > This is a variant of the discussion in [1] where the OP does: > > copy (select json_agg(row_to_json(t)) from t) TO 'file' > > and he complains that both text and csv "break the JSON". > That discussion morphed into a proposed patch adding JSON > format to COPY, but RAW would work directly as the OP > expected. > > That is, unless the data happens to include JSON fields with LF/CRLF > in them, and the RAW format says this is an error condition. > In that case it's quite annoying to make it an error, rather than > simply let it pass. > > > [1] > https://postgr.es/m/CALvfUkBxTYy5uWPFVwpk_7ii2zgT07t3d-yR_cy4sfrrLU=k...@mail.gmail.com Thanks for finding this related thread. A very good example that would be solved with RAW. I've used a different hack myself many times to export text "as is", which is to use COPY TO ... BINARY, and then to manually edit the file using vim, stripping away the header and footer from the file. :D I don't see how JSON fields that come from PostgreSQL could contain LF/CRLF, though. Could they? LF/CR must be escaped inside a JSON field, and casting JSONB to text doesn't inject newlines anywhere between fields. I can only see how a value of the legacy JSON type could have newlines in it. That doesn't matter, though; this is still a very good example of the need to export text "as is" from PostgreSQL to a file. I like Jacob's idea of letting the user specify a DELIMITER also for the RAW format, to override the automagical newline detection. This would, for example, allow importing values containing \r when the newline delimiter is \n, which would otherwise be rejected with an "inconsistent newline style" error. I think it would also be useful to allow specifying DELIMITER NONE, which would allow importing an entire text file, "as is", into a single column and single row. When exporting a single column and single row, the delimiter wouldn't matter, since none would be emitted. But DELIMITER NONE would also be useful for COPY TO, when you want to concatenate text "as is" to a file without adding newlines in between, similar to the UNIX "cat" command. Regarding the name for the format, I thought SINGLE was nice before I read this message, but now that more rawness seems desirable, I like RAW the most again: if the automagical newline detection can be overridden, then it's really raw for real. A final thought is to maybe consider just skipping the automagical newline detection for RAW? Instead of the automagical detection, the default newline delimiter could be the OS default, similar to how COPY TO works. That way, it would almost always just work for most users, as long as they process files within their own OS, and when not, they would just need to specify the DELIMITER. /Joel
Considering fractional paths in Append node
Hi hackers! A colleague of mine, Andrei Lepikhov, has found an interesting behavior in the path cost calculation for the Append node: when evaluating the cheapest path, it does not take fractional path costs into account. We've prepared a patch that makes the add_paths_to_append_rel function consider non-parameterized fractional path costs. The effect is easily seen in one of the standard PG tests:

Vanilla (current master):

explain analyze select t1.unique1 from tenk1 t1 inner join tenk2 t2 on t1.tenthous = t2.tenthous and t2.thousand = 0 union all (values(1)) limit 1;

                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..219.55 rows=1 width=4) (actual time=6.309..6.312 rows=1 loops=1)
   ->  Append  (cost=0.00..2415.09 rows=11 width=4) (actual time=6.308..6.310 rows=1 loops=1)
         ->  Nested Loop  (cost=0.00..2415.03 rows=10 width=4) (actual time=6.307..6.308 rows=1 loops=1)
               Join Filter: (t1.tenthous = t2.tenthous)
               Rows Removed by Join Filter: 4210
               ->  Seq Scan on tenk1 t1  (cost=0.00..445.00 rows=1 width=8) (actual time=0.004..0.057 rows=422 loops=1)
               ->  Materialize  (cost=0.00..470.05 rows=10 width=4) (actual time=0.000..0.014 rows=10 loops=422)
                     Storage: Memory  Maximum Storage: 17kB
                     ->  Seq Scan on tenk2 t2  (cost=0.00..470.00 rows=10 width=4) (actual time=0.076..5.535 rows=10 loops=1)
                           Filter: (thousand = 0)
                           Rows Removed by Filter: 9990
         ->  Result  (cost=0.00..0.01 rows=1 width=4) (never executed)
 Planning Time: 0.324 ms
 Execution Time: 6.336 ms
(14 rows)

Patched, the same test:

explain analyze select t1.unique1 from tenk1 t1 inner join tenk2 t2 on t1.tenthous = t2.tenthous and t2.thousand = 0 union all (values(1)) limit 1;

                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.29..126.00 rows=1 width=4) (actual time=0.105..0.106 rows=1 loops=1)
   ->  Append  (cost=0.29..1383.12 rows=11 width=4) (actual time=0.104..0.105 rows=1 loops=1)
         ->  Nested Loop  (cost=0.29..1383.05 rows=10 width=4) (actual time=0.104..0.104 rows=1 loops=1)
               ->  Seq Scan on tenk2 t2  (cost=0.00..470.00 rows=10 width=4) (actual time=0.076..0.076 rows=1 loops=1)
                     Filter: (thousand = 0)
                     Rows Removed by Filter: 421
               ->  Index Scan using tenk1_thous_tenthous on tenk1 t1  (cost=0.29..91.30 rows=1 width=8) (actual time=0.026..0.026 rows=1 loops=1)
                     Index Cond: (tenthous = t2.tenthous)
         ->  Result  (cost=0.00..0.01 rows=1 width=4) (never executed)
 Planning Time: 0.334 ms
 Execution Time: 0.130 ms
(11 rows)

Hope this optimization could be useful. -- Regards, Nikita Malakhov Postgres Professional The Russian Postgres Company https://postgrespro.ru/
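
For readers who want the shape of the change without opening the patch, the idea is roughly the following fragment inside add_paths_to_append_rel(). This is a hedged sketch, not the submitted patch: get_cheapest_fractional_path(), root->tuple_fraction and live_childrels are existing planner facilities, while "fractional_subpaths" is an illustrative local name.

/*
 * Sketch only: when the query needs just a fraction of the rows (e.g.
 * there is a LIMIT above the Append), also collect each child's cheapest
 * fractional path and build an extra Append path from those, so the
 * planner can pick it over the cheapest-total-cost plan.
 */
List       *fractional_subpaths = NIL;
ListCell   *l;

foreach(l, live_childrels)
{
    RelOptInfo *childrel = (RelOptInfo *) lfirst(l);

    if (root->tuple_fraction > 0)
    {
        Path       *cheapest_fractional;

        cheapest_fractional =
            get_cheapest_fractional_path(childrel, root->tuple_fraction);

        /* for this sketch, only keep non-parameterized paths */
        if (cheapest_fractional && cheapest_fractional->param_info == NULL)
            fractional_subpaths = lappend(fractional_subpaths,
                                          cheapest_fractional);
    }
}

/*
 * If every child contributed a path, an Append over fractional_subpaths
 * would then be offered via add_path(), alongside the existing
 * cheapest-total and cheapest-startup Append paths.
 */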
Re: New "raw" COPY format
On Wed, Oct 16, 2024, at 20:30, Joel Jacobson wrote: > A final thought is to maybe consider just skipping > the automagical newline detection for RAW? > > Instead of the automagical detection, > the default newline delimiter could be the OS default, > similar to how COPY TO works. > > That way, it would almost always just work for most users, > as long as they process files within their own OS, > and when not, they would just need to specify the DELIMITER. I would guess that nowadays, dealing with unstructured text files is probably less common than dealing with structured text files, such as JSON, YAML, TOML, XML, etc. Therefore, maybe DELIMITER NONE would be a better default for RAW? Especially since it would then also be more honest about being "raw". If you need to import an unstructured text file that is just newline-delimited, and don't want the entire file as a single value, the newline style would then just need to be specified using the DELIMITER option. /Joel
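
To make the proposed semantics concrete, here is a small standalone sketch (plain C, deliberately not PostgreSQL code; the function and its behavior are invented for illustration) of the difference between DELIMITER NONE and a fixed one-byte delimiter when splitting a RAW input buffer into values, with no quoting or escaping applied.

#include <stdio.h>
#include <string.h>

/*
 * Illustrative only: split "buf" into values.  With delim == 0 (standing in
 * for DELIMITER NONE), the whole buffer is one value; otherwise every
 * occurrence of the delimiter byte ends a value, taken verbatim.
 */
static void
raw_split(const char *buf, size_t len, char delim)
{
    if (delim == 0)
    {
        printf("1 value of %zu bytes\n", len);  /* entire input, as is */
        return;
    }

    size_t start = 0;
    size_t nvalues = 0;

    for (size_t i = 0; i <= len; i++)
    {
        if (i == len || buf[i] == delim)
        {
            if (i == len && i == start)
                break;          /* trailing delimiter: no empty last value */
            printf("value %zu: %.*s\n", ++nvalues,
                   (int) (i - start), buf + start);
            start = i + 1;
        }
    }
}

int
main(void)
{
    const char *data = "{\"a\":1}\n{\"b\":2}\n";

    raw_split(data, strlen(data), '\n');  /* newline-delimited: two values */
    raw_split(data, strlen(data), 0);     /* "DELIMITER NONE": one value, whole buffer */
    return 0;
}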
Re: POC, WIP: OR-clause support for indexes
On 10/17/24 03:39, Alexander Korotkov wrote: On Wed, Oct 16, 2024 at 7:22 AM Andrei Lepikhov wrote: On 10/12/24 21:25, Alexander Korotkov wrote: I forgot to specify (COSTS OFF) for EXPLAINs in regression tests. Fixed in v42. I've gone through the patch set. Let me put aside the v42-0003 patch -- it looks debatable, and I need time to analyse the change in regression tests caused by this patch. Yes, the 0003 patch is for illustration purposes for now. I will not keep rebasing it. We can pick it up later when the main patches are committed. Got it. I will save it in the TODO list. The comments look much better, as far as my current language level lets me judge. The fast-exit ideas also look profitable and are worth the additional 'matched' variable. So, in general, it is ok. I think only one place, involving inner_other_clauses, can be improved. Maybe it would be enough to create this list only once, outside the 'foreach(j, groupedArgs)' loop? Also, the comment on the necessity of this operation was unclear to me. See the attachment for my modest attempt at improving it. Thank you, I've integrated your patch with minor edits from me. Thanks. I'm not sure about the necessity of checking whether the list is NIL (list_free also does that), but I'm OK with the edits. -- regards, Andrei Lepikhov
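
For context, the review suggestion above amounts to something like the following fragment. It is an illustrative sketch, not the actual v42 patch; collect_inner_other_clauses() and its argument are hypothetical stand-ins for whatever builds the list in the real code.

/*
 * Sketch of the suggestion: build inner_other_clauses once, before the loop
 * over groupedArgs, since it does not depend on the loop variable.
 */
List       *inner_other_clauses;
ListCell   *j;

inner_other_clauses = collect_inner_other_clauses(rinfo);  /* hypothetical helper */

foreach(j, groupedArgs)
{
    Node       *orarg = (Node *) lfirst(j);

    /* ... consult inner_other_clauses read-only for each group ... */
}

/* list_free() already copes with a NIL list, so no separate check is needed. */
list_free(inner_other_clauses);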