Re: race condition when writing pg_control
On Fri, May 17, 2024 at 4:46 PM Thomas Munro wrote:
> The specific problem here is that LocalProcessControlFile() runs in
> every launched child for EXEC_BACKEND builds.  Windows uses
> EXEC_BACKEND, and Windows' NTFS file system is one of the two file
> systems known to this list to have the concurrent read/write data
> mashing problem (the other being ext4).

Phngh... this is surprisingly difficult to fix.  Things that don't work:
We "just" need to acquire ControlFileLock while reading the file or
examining the object in shared memory, or get a copy of it passed
through the EXEC_BACKEND BackendParameters that was acquired while
holding the lock.  But the current location of this code in child
startup is too early to use LWLocks, and the postmaster can't acquire
locks either, so it can't even safely take a copy to pass on.

You could reorder startup so that we are allowed to acquire LWLocks in
children at that point, but then you'd need to convince yourself that
there is no danger of breaking some ordering requirement in external
preload libraries, and figure out what to do about children that don't
even attach to shared memory.  Maybe that's possible, but it doesn't
sound like a good idea to back-patch.

The first idea I've come up with to avoid all of that: pass a copy of
the "proto-controlfile", to coin a term for the one read early in
postmaster startup by LocalProcessControlFile().  As far as I know, the
only reason we need it is to suck some settings out of it that don't
change while a cluster is running (mostly can't change after initdb,
and checksums can only be {en,dis}abled while down).  Right?  Children
can just "import" that sucker instead of calling
LocalProcessControlFile() to figure out the size of WAL segments yada
yada, I think?  Later they will attach to the real one in shared memory
for all future purposes, once normal interlocking is allowed.

I dunno.  Draft patch attached.  Better plans welcome.
This passes CI on Linux systems afflicted by EXEC_BACKEND, and Windows.
Thoughts?

From 48c2de14bd9368b4708a99ecbb75452dc327e608 Mon Sep 17 00:00:00 2001
From: Thomas Munro
Date: Sat, 18 May 2024 13:41:09 +1200
Subject: [PATCH v1 1/3] Fix pg_control corruption in EXEC_BACKEND startup.

When backend processes were launched in EXEC_BACKEND builds, they would
run LocalProcessControlFile() to read in pg_control and extract several
important settings.  This happens too early to acquire ControlFileLock,
and the postmaster is also not allowed to acquire ControlFileLock, so
it can't safely take a copy to give to the child.

Instead, pass down the "proto-controlfile" that was read by the
postmaster in LocalProcessControlFile().  Introduce functions
ExportProtoControlFile() and ImportProtoControlFile() to allow that.
Subprocesses will extract information from that, and then later attach
to the current control file in shared memory.

Reported-by: Melanie Plageman per Windows CI failure
Discussion: https://postgr.es/m/CAAKRu_YNGwEYrorQYza_W8tU%2B%3DtoXRHG8HpyHC-KDbZqA_ZVSA%40mail.gmail.com
---
 src/backend/access/transam/xlog.c       | 46 +++++++++++++++++++------
 src/backend/postmaster/launch_backend.c | 19 +++++++++++
 src/include/access/xlog.h               |  5 +++
 3 files changed, 62 insertions(+), 8 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 330e058c5f2..b69a0d95af9 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -568,6 +568,10 @@ static WALInsertLockPadded *WALInsertLocks = NULL;
  */
 static ControlFileData *ControlFile = NULL;
 
+#ifdef EXEC_BACKEND
+static ControlFileData *ProtoControlFile = NULL;
+#endif
+
 /*
  * Calculate the amount of space left on the page after 'endptr'. Beware
  * multiple evaluation!
@@ -686,6 +690,7 @@ static bool PerformRecoveryXLogAction(void);
 static void InitControlFile(uint64 sysidentifier);
 static void WriteControlFile(void);
 static void ReadControlFile(void);
+static void ScanControlFile(void);
 static void UpdateControlFile(void);
 static char *str_time(pg_time_t tnow);
 
@@ -4309,9 +4314,7 @@ WriteControlFile(void)
 static void
 ReadControlFile(void)
 {
-	pg_crc32c	crc;
 	int			fd;
-	static char wal_segsz_str[20];
 	int			r;
 
 	/*
@@ -4344,6 +4347,15 @@ ReadControlFile(void)
 	close(fd);
 
+	ScanControlFile();
+}
+
+static void
+ScanControlFile(void)
+{
+	static char wal_segsz_str[20];
+	pg_crc32c	crc;
+
 	/*
 	 * Check for expected pg_control format version.  If this is wrong, the
 	 * CRC check will likely fail because we'll be checking the wrong number
@@ -4815,8 +4827,33 @@ LocalProcessControlFile(bool reset)
 	Assert(reset || ControlFile == NULL);
 	ControlFile = palloc(sizeof(ControlFileData));
 	ReadControlFile();
+
+#ifdef EXEC_BACKEND
+	/* We need to be able to give this to subprocesses. */
+	ProtoControlFile = ControlFile;
+#endif
+}
+
+#ifdef EXEC_BACKEND
+void
+ExportProtoControlFile(ControlFileData *copy)
+{
+	*copy =
Re: Refactoring backend fork+exec code
On Mon, Mar 18, 2024 at 10:41 PM Heikki Linnakangas wrote:
> Committed, with some final cosmetic cleanups. Thanks everyone!

Nitpicking from UBSan with EXEC_BACKEND on Linux (line numbers may be a
bit off, from a branch of mine):

../src/backend/postmaster/launch_backend.c:772:2: runtime error: null pointer passed as argument 2, which is declared to never be null
==13303==Using libbacktrace symbolizer.
    #0 0x564b0202 in save_backend_variables ../src/backend/postmaster/launch_backend.c:772
    #1 0x564b0242 in internal_forkexec ../src/backend/postmaster/launch_backend.c:311
    #2 0x564b0bdd in postmaster_child_launch ../src/backend/postmaster/launch_backend.c:244
    #3 0x564b3121 in StartChildProcess ../src/backend/postmaster/postmaster.c:3928
    #4 0x564b933a in PostmasterMain ../src/backend/postmaster/postmaster.c:1357
    #5 0x562de4ad in main ../src/backend/main/main.c:197
    #6 0x7667ad09 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x23d09)
    #7 0x55e34279 in _start (/tmp/cirrus-ci-build/build/tmp_install/usr/local/pgsql/bin/postgres+0x8e0279)

This silences it:

-	memcpy(param->startup_data, startup_data, startup_data_len);
+	if (startup_data_len > 0)
+		memcpy(param->startup_data, startup_data, startup_data_len);

(I found that out by testing EXEC_BACKEND on CI.  I also learned that
the Mac and FreeBSD tasks fail with EXEC_BACKEND because of SysV shmem
bleating.  We probably should go and crank up the relevant sysctls in
the .cirrus.tasks.yml...)
Re: libpq compression (part 3)
On Fri, May 17, 2024 at 11:40 AM Robert Haas wrote:
> To be clear, I am not arguing that it should be the receiver's choice.
> I'm arguing it should be the client's choice, which means the client
> decides what it sends and also tells the server what to send to it.
> I'm open to counter-arguments, but as I've thought about this more,
> I've come to the conclusion that letting the client control the
> behavior is the most likely to be useful and the most consistent with
> existing facilities.  I think we're on the same page based on the rest
> of your email: I'm just clarifying.

This is what I am imagining too.

> I have some quibbles with the syntax but I agree with the concept.
> What I'd probably do is separate the server side thing into two GUCs,
> each with a list of algorithms, comma-separated, like we do for other
> lists in postgresql.conf.  Maybe make the default 'all' meaning
> "everything this build of the server supports".  On the client side,
> I'd allow both things to be specified using a single option, because
> wanting to do the same thing in both directions will be common, and
> you actually have to type in connection strings sometimes, so
> verbosity matters more.
>
> As far as the format of the value for that keyword, what do you think
> about either compression=DO_THIS_BOTH_WAYS or
> compression=DO_THIS_WHEN_SENDING/DO_THIS_WHEN_RECEIVING, with each "do
> this" being a specification of the same form already accepted for
> server-side compression e.g. gzip or gzip:level=9?  If you don't like
> that, why do you think the proposal you made above is better, and why
> is that one now punctuated with slashes instead of semicolons?

I like this more than what I proposed, and will update the patches to
reflect this proposal.  (I've gotten them locally back into a state of
applying cleanly and dealing with the changes needed to support direct
SSL connections, so refactoring the protocol layer shouldn't be too
hard now.)
On Fri, May 17, 2024 at 11:10 AM Jacob Champion wrote:
> We're talking about a transport-level option, though -- I thought the
> proposal enabled compression before authentication completed?  Or has
> that changed?

On Fri, May 17, 2024 at 1:03 PM Jelte Fennema-Nio wrote:
> I think it would make sense to only compress messages after
> authentication has completed.  The gain of compressing authentication
> related packets seems pretty limited.

At the protocol level, compressed data is a message type that can be
used to wrap arbitrary data as soon as the startup packet is processed.
However, as an implementation detail that clients should not rely on
(but that we can rely on in thinking about the implications), the only
message types that are compressed (except in the 0005 CI patch, for
test running only) are PqMsg_CopyData, PqMsg_DataRow, and PqMsg_Query,
none of which are sent before authentication.

--
Jacob Burroughs | Staff Software Engineer
E: jburrou...@instructure.com
Re: Streaming read-ready sequential scan code
On Sat, May 18, 2024 at 11:30 AM Thomas Munro wrote:
> Andres happened to have TPC-DS handy, and reproduced that regression
> in q15.  We tried some stuff and figured out that it requires
> parallel_leader_participation=on, ie that this looks like some kind of
> parallel fairness and/or timing problem.  It seems to be a question of
> which worker finishes up processing matching rows, and the leader gets
> a ~10ms head start but may be a little more greedy with the new
> streaming code.  He tried reordering the table contents and then saw
> 17 beat 16.  So for q15, initial indications are that this isn't a
> fundamental regression, it's just a test that is sensitive to some
> arbitrary conditions.
>
> I'll try to figure out some more details about that, ie is it being
> too greedy on small-ish tables,

After more debugging, we learned a lot more things...

1.  That query produces spectacularly bad estimates, so we finish up
having to increase the number of buckets in a parallel hash join many
times.  That is quite interesting, but unrelated to new code.

2.  Parallel hash join is quite slow at negotiating an increase in the
number of hash buckets if all of the input tuples are being filtered
out by quals, because of the choice of where workers check for
PHJ_GROWTH_NEED_MORE_BUCKETS.  That could be improved quite easily, I
think.  I have put that on my todo list 'cause that's also my code, but
it's not a new issue, it's just one that is now highlighted...

3.  This bit of read_stream.c is exacerbating unfairness in the
underlying scan, so that 1 and 2 come together and produce a nasty
slowdown, which goes away if you change it like so:

-	BlockNumber blocknums[16];
+	BlockNumber blocknums[1];

I will follow up after some more study.
Re: Propagate sanity checks of ProcessUtility() to standard_ProcessUtility()?
On Fri, May 17, 2024 at 03:53:58PM -0400, Robert Haas wrote:
> The usual pattern for using hooks like this is to call the next
> implementation, or the standard implementation, and pass down the
> arguments that you got.  If you try to do that and mess it up, you're
> going to get a crash really, really quickly and it shouldn't be very
> difficult at all to figure out what you did wrong.  Honestly, that
> doesn't seem like it would be hard even without the assertions: for
> the most part, a quick glance at the stack backtrace ought to be
> enough.  If you're trying to do something more sophisticated, like
> mutating the node tree before passing it down, the chances of your
> mistakes getting caught by these assertions are pretty darn low, since
> they're very superficial checks.

Perhaps, still that would be something.

Hmm.  We've had bugs in the past where DDL paths were playing with the
manipulation of Querys in core, corrupting their contents.  It seems
like what you mean is to have a check at the *end* of ProcessUtility()
that does an equalFuncs() comparison of the Query against a copy of it
taken at the beginning, all that hidden within two USE_ASSERT_CHECKING
blocks, for the cases where we expect the tree not to change.  With
readOnlyTree, that would be just changing from one policy to another,
which is not really interesting at this stage based on how
ProcessUtility() is shaped.
--
Michael
Re: Internal error codes triggered by tests
On Fri, May 17, 2024 at 01:41:29PM -0400, Robert Haas wrote:
> On Tue, Apr 23, 2024 at 12:55 AM Michael Paquier wrote:
>> Thoughts?
>
> The patch as proposed seems fine.  Marking RfC.

Thanks.  I'll look again at that once v18 opens up for business.
--
Michael
Re: CREATE TABLE creates a composite type corresponding to the table row, which is and is not there
On 2024-05-18 03:27 +0200, David G. Johnston wrote:
> > On 2024-05-16 17:47 +0200, David G. Johnston wrote:
> > > We have a glossary.
>
> If sticking with stand-alone composite type as a formal term we should
> document it in the glossary.  It's unclear whether this will survive
> review though.  With the wording provided in this patch it doesn't
> really add enough to continue a strong defense of it.

Oh, I thought you meant we already have that term in the glossary (I
haven't checked until now).  Let's see if we can convince Robert of the
rewording.

> > It's now a separate error message (like I already had in v1) which
> > states that the specified type must not be a row type of another
> > table (based on Peter's feedback).  And the hint directs the user to
> > CREATE TYPE.
> >
> > In passing, I also quoted the type name in the existing error
> > message for consistency.  I saw that table names etc. are already
> > quoted in other error messages.
>
> The hint and the quoting both violate the documented rules for these
> things:
>
> https://www.postgresql.org/docs/current/error-style-guide.html#ERROR-STYLE-GUIDE-QUOTES
> There are functions in the backend that will double-quote their own
> output as needed (for example, format_type_be()).  Do not put
> additional quotes around the output of such functions.
>
> https://www.postgresql.org/docs/current/error-style-guide.html#ERROR-STYLE-GUIDE-GRAMMAR-PUNCTUATION
> Detail and hint messages: Use complete sentences, and end each with a
> period.  Capitalize the first word of sentences.

Thanks, I didn't know that guideline.  Both fixed in v6.

--
Erik

>From 39d2dc9b58dfa3b68245070648ecdf9eed892c7a Mon Sep 17 00:00:00 2001
From: Erik Wienhold
Date: Fri, 8 Mar 2024 04:21:56 +0100
Subject: [PATCH v6] Document that typed tables rely on CREATE TYPE

CREATE TABLE OF requires a stand-alone composite type that is not the
row type of another table.  Clarify that with a reference to CREATE
TYPE in the docs.

Also report a separate error message in this case.

Reworded docs provided by David G. Johnston.
---
 doc/src/sgml/ref/create_table.sgml        | 16 ++++++++--------
 src/backend/commands/tablecmds.c          |  9 ++++++++-
 src/test/regress/expected/typed_table.out |  7 ++++++-
 src/test/regress/sql/typed_table.sql      |  4 ++++
 4 files changed, 26 insertions(+), 10 deletions(-)

diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index f19306e776..11458be9cf 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -249,18 +249,18 @@ WITH ( MODULUS numeric_literal, REM
       Creates a typed table, which takes its
-      structure from the specified composite type (name optionally
-      schema-qualified).  A typed table is tied to its type; for
-      example the table will be dropped if the type is dropped
-      (with DROP TYPE ... CASCADE).
+      structure from an existing (name optionally schema-qualified) stand-alone
+      composite type (i.e., created using )
+      though it still produces a new composite type as well.  The table will
+      have a dependency on the referenced type such that cascaded alter and
+      drop actions on the type will propagate to the table.

-      When a typed table is created, then the data types of the
-      columns are determined by the underlying composite type and are
-      not specified by the CREATE TABLE command.
+      A typed table always has the same column names and data types as the
+      type it is derived from, and you cannot specify additional columns.
       But the CREATE TABLE command can add defaults
-      and constraints to the table and can specify storage parameters.
+      and constraints to the table, as well as specify storage parameters.

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 313c782cae..2331a9185a 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -6935,8 +6935,15 @@ check_of_type(HeapTuple typetuple)
		 * the type before the typed table creation/conversion commits.
		 */
		relation_close(typeRelation, NoLock);
+
+		if (!typeOk)
+			ereport(ERROR,
+					(errcode(ERRCODE_WRONG_OBJECT_TYPE),
+					 errmsg("type %s must not be a row type of another table",
+							format_type_be(typ->oid)),
+					 errhint("Must be a stand-alone composite type created with CREATE TYPE.")));
	}
-	if (!typeOk)
+	else
		ereport(ERROR,
				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
				 errmsg("type %s is not a composite type",
diff --git a/src/test/regress/expected/typed_table.out b/src/test/regress/expected/typed_table.out
index 2e47ecbcf5..5743e74978 100644
--- a/src/test/regress/expected/typed_table.out
+++ b/src/test/regress/expected/typed_table.out
@@ -89,8 +89,13 @@
 drop cascades to function get_all_persons()
 drop cascades to table persons2
 drop cascades to table persons3
 CREATE TABLE persons5
Re: Underscore in positional parameters?
On 2024-05-17 02:06 +0200, Michael Paquier wrote:
> On Thu, May 16, 2024 at 08:41:11AM +0200, Peter Eisentraut wrote:
> > On this specific patch, maybe reword "parameter too large" to
> > "parameter number too large".
>
> WFM here.

Done in v3.

I noticed this compiler warning with my previous patch:

scan.l:997:41: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
  997 |                 ErrorSaveContext escontext = {T_ErrorSaveContext};
      |                 ^~~~

I thought that I had to factor this out into a function similar to
process_integer_literal (which also uses ErrorSaveContext).  But moving
that declaration to the start of the {param} action was enough in the
end.

While trying out the refactoring, I noticed two small things that can
be fixed as well in scan.l:

* Prototype and definition of addunicode do not match.  The prototype
  uses yyscan_t while the definition uses core_yyscan_t.

* Parameter base of process_integer_literal is unused.

But those should be on a separate thread, right, even for minor fixes?

--
Erik

>From d3ad2971dcb8ab15fafe1a5c69a15e5994eac76d Mon Sep 17 00:00:00 2001
From: Erik Wienhold
Date: Tue, 14 May 2024 14:12:11 +0200
Subject: [PATCH v3 1/3] Fix overflow in parsing of positional parameter

Replace atol with pg_strtoint32_safe in the backend parser and with
strtoint in ECPG to reject overflows when parsing the number of a
positional parameter.  With atol from glibc, parameters $2147483648 and
$4294967297 turn into $-2147483648 and $1, respectively.
---
 src/backend/parser/scan.l                | 8 +++++++-
 src/interfaces/ecpg/preproc/pgc.l        | 5 ++++-
 src/test/regress/expected/numerology.out | 4 ++++
 src/test/regress/sql/numerology.sql      | 1 +
 4 files changed, 16 insertions(+), 2 deletions(-)

diff --git a/src/backend/parser/scan.l b/src/backend/parser/scan.l
index 9b33fb8d72..436cc64f07 100644
--- a/src/backend/parser/scan.l
+++ b/src/backend/parser/scan.l
@@ -992,8 +992,14 @@ other			.
				}
{param}			{
+					ErrorSaveContext escontext = {T_ErrorSaveContext};
+					int32		val;
+
					SET_YYLLOC();
-					yylval->ival = atol(yytext + 1);
+					val = pg_strtoint32_safe(yytext + 1, (Node *) &escontext);
+					if (escontext.error_occurred)
+						yyerror("parameter number too large");
+					yylval->ival = val;
					return PARAM;
				}
{param_junk}	{
diff --git a/src/interfaces/ecpg/preproc/pgc.l b/src/interfaces/ecpg/preproc/pgc.l
index d117cafce6..7122375d72 100644
--- a/src/interfaces/ecpg/preproc/pgc.l
+++ b/src/interfaces/ecpg/preproc/pgc.l
@@ -938,7 +938,10 @@ cppline		{space}*#([^i][A-Za-z]*|{if}|{ifdef}|{ifndef}|{import})((\/\*[^*/]*\*+
				}
{param}			{
-					base_yylval.ival = atol(yytext+1);
+					errno = 0;
+					base_yylval.ival = strtoint(yytext+1, NULL, 10);
+					if (errno == ERANGE)
+						mmfatal(PARSE_ERROR, "parameter number too large");
					return PARAM;
				}
{param_junk}	{
diff --git a/src/test/regress/expected/numerology.out b/src/test/regress/expected/numerology.out
index c8196d2c85..5bac05c3b3 100644
--- a/src/test/regress/expected/numerology.out
+++ b/src/test/regress/expected/numerology.out
@@ -206,6 +206,10 @@ PREPARE p1 AS SELECT $1a;
 ERROR:  trailing junk after parameter at or near "$1a"
 LINE 1: PREPARE p1 AS SELECT $1a;
                              ^
+PREPARE p1 AS SELECT $2147483648;
+ERROR:  parameter number too large at or near "$2147483648"
+LINE 1: PREPARE p1 AS SELECT $2147483648;
+                             ^
 SELECT 0b;
 ERROR:  invalid binary integer at or near "0b"
 LINE 1: SELECT 0b;
diff --git a/src/test/regress/sql/numerology.sql b/src/test/regress/sql/numerology.sql
index 3f0ec34ecf..6722a09c5f 100644
--- a/src/test/regress/sql/numerology.sql
+++ b/src/test/regress/sql/numerology.sql
@@ -52,6 +52,7 @@ SELECT 0.0e1a;
 SELECT 0.0e;
 SELECT 0.0e+a;
 PREPARE p1 AS SELECT $1a;
+PREPARE p1 AS SELECT $2147483648;
 SELECT 0b;
 SELECT 1b;
--
2.45.1
Re: CREATE TABLE creates a composite type corresponding to the table row, which is and is not there
On Fri, May 17, 2024 at 4:57 PM Erik Wienhold wrote:
> On 2024-05-16 17:47 +0200, David G. Johnston wrote:
> > On Wed, May 15, 2024 at 8:46 AM Robert Haas wrote:
> > > On Thu, Apr 4, 2024 at 12:41 AM Erik Wienhold wrote:
> > > > Thanks, fixed in v4.  Looks like American English prefers that
> > > > comma and it's also more common in our docs.
> > >
> > > Reviewing this patch:
> > >
> > > -      Creates a typed table, which takes its
> > > -      structure from the specified composite type (name optionally
> > > -      schema-qualified).  A typed table is tied to its type; for
> > > -      example the table will be dropped if the type is dropped
> > > -      (with DROP TYPE ... CASCADE).
> > > +      Creates a typed table, which takes its structure
> > > +      from an existing (name optionally schema-qualified) stand-alone composite
> > > +      type (i.e., created using ) though it
> > > +      still produces a new composite type as well.  The table will have
> > > +      a dependency on the referenced type such that cascaded alter and drop
> > > +      actions on the type will propagate to the table.
> > >
> > > It would be better if this diff didn't reflow the unchanged portions
> > > of the paragraph.
>
> Right.  I now reformatted it so that first line remains unchanged.  But
> the rest of the para is still a complete rewrite.
>
> > > I agree that it's a good idea to mention that the table must have been
> > > created using CREATE TYPE .. AS here, but I disagree with the rest of
> > > the rewording in this hunk.  I think we could just add "creating using
> > > CREATE TYPE" to the end of the first sentence, with an xref, and leave
> > > the rest as it is.
> > >
> > > I don't see a reason to mention that the typed
> > > table also spawns a rowtype; that's just standard CREATE TABLE
> > > behavior and not really relevant here.
> >
> > I figured it wouldn't be immediately obvious that the system would create a
> > second type with identical structure.  Of course, in order for SELECT tbl
> > FROM tbl; to work it must indeed do so.  I'm not married to pointing out
> > this dynamic explicitly though.
> >
> > > And I don't understand what the
> > > rest of the rewording does for us.
> >
> > It calls out the explicit behavior that the table's columns can change due
> > to actions on the underlying type.  Mentioning this unique behavior seems
> > worth a sentence.
> >
> > > -      When a typed table is created, then the data types of the
> > > -      columns are determined by the underlying composite type and are
> > > -      not specified by the CREATE TABLE command.
> > > +      A typed table always has the same column names and data types as the
> > > +      type it is derived from, and you cannot specify additional columns.
> > >        But the CREATE TABLE command can add defaults
> > > -      and constraints to the table and can specify storage parameters.
> > > +      and constraints to the table, as well as specify storage parameters.
> > >
> > > I don't see how this is better.
> >
> > I'll agree this is more of a stylistic change, but mainly because the talk
> > about data types reasonably implies the other items the patch explicitly
> > mentions - names and additional columns.
>
> I prefer David's changes to both paras because right now the details of
> typed tables are spread over the respective CREATE and ALTER commands
> for types and tables.  Or maybe we should add those details to the
> existing "Typed Tables" section at the very end of CREATE TABLE?
>
> > > -      errmsg("type %s is not a composite type",
> > > +      errmsg("type %s is not a stand-alone composite type",
> > >
> > > I agree with Peter's complaint that people aren't going to understand
> > > what a stand-alone composite type means when they see the revised
> > > error message; to really help people, we're going to need to do better
> > > than this, I think.
> >
> > We have a glossary.

If sticking with stand-alone composite type as a formal term we should
document it in the glossary.  It's unclear whether this will survive
review though.  With the wording provided in this patch it doesn't
really add enough to continue a strong defense of it.

> It's now a separate error message (like I already had in v1) which
> states that the specified type must not be a row type of another table
> (based on Peter's feedback).  And the hint directs the user to CREATE
> TYPE.
>
> In passing, I also quoted the type name in the existing error message
> for consistency.  I saw that table names etc. are already quoted in
> other error messages.

The hint and the quoting both violate the documented rules for these
things:

https://www.postgresql.org/docs/current/error-style-guide.html#ERROR-STYLE-GUIDE-QUOTES
There are functions in the backend that will double-quote their own
output as needed (for example, format_type_be()).  Do not put
Re: allow sorted builds for btree_gist
On Fri, May 17, 2024 at 12:41 PM Tomas Vondra wrote:
> I've been looking at GiST to see if there could be a good way to do
> parallel builds - and there might be, if the opclass supports sorted
> builds, because then we could parallelize the sort.
>
> But then I noticed we support this mode only for point_ops, because
> that's the only opclass with sortsupport procedure.  Which mostly
> makes sense, because types like geometries, polygons, ... do not have
> a well defined order.

Oh, I'm excited about this for temporal tables.  It seems to me that
ranges and multiranges should have a well-defined order (assuming their
base types do), since you can do dictionary-style ordering (compare the
first value, then the next, then the next, etc.).  Is there any reason
not to support those?

No reason not to commit these btree_gist functions first though.

If you aren't interested in adding GiST sortsupport for ranges &
multiranges I'm willing to do it myself; just let me know.

Do note that the 1.7 -> 1.8 changes have been reverted though (as part
of my temporal work), and it looks like your patch is a bit messed up
from that.  You'll want to take 1.8 for yourself, and also your 1.9
upgrade script is trying to add the reverted stratnum functions.

Yours,
Paul
Re: CREATE TABLE creates a composite type corresponding to the table row, which is and is not there
On 2024-05-16 17:47 +0200, David G. Johnston wrote:
> On Wed, May 15, 2024 at 8:46 AM Robert Haas wrote:
> > On Thu, Apr 4, 2024 at 12:41 AM Erik Wienhold wrote:
> > > Thanks, fixed in v4.  Looks like American English prefers that
> > > comma and it's also more common in our docs.
> >
> > Reviewing this patch:
> >
> > -      Creates a typed table, which takes its
> > -      structure from the specified composite type (name optionally
> > -      schema-qualified).  A typed table is tied to its type; for
> > -      example the table will be dropped if the type is dropped
> > -      (with DROP TYPE ... CASCADE).
> > +      Creates a typed table, which takes its structure
> > +      from an existing (name optionally schema-qualified) stand-alone composite
> > +      type (i.e., created using ) though it
> > +      still produces a new composite type as well.  The table will have
> > +      a dependency on the referenced type such that cascaded alter and drop
> > +      actions on the type will propagate to the table.
> >
> > It would be better if this diff didn't reflow the unchanged portions
> > of the paragraph.

Right.  I now reformatted it so that first line remains unchanged.  But
the rest of the para is still a complete rewrite.

> > I agree that it's a good idea to mention that the table must have been
> > created using CREATE TYPE .. AS here, but I disagree with the rest of
> > the rewording in this hunk.  I think we could just add "creating using
> > CREATE TYPE" to the end of the first sentence, with an xref, and leave
> > the rest as it is.
> >
> > I don't see a reason to mention that the typed
> > table also spawns a rowtype; that's just standard CREATE TABLE
> > behavior and not really relevant here.
>
> I figured it wouldn't be immediately obvious that the system would create a
> second type with identical structure.  Of course, in order for SELECT tbl
> FROM tbl; to work it must indeed do so.  I'm not married to pointing out
> this dynamic explicitly though.
>
> > And I don't understand what the
> > rest of the rewording does for us.
>
> It calls out the explicit behavior that the table's columns can change due
> to actions on the underlying type.  Mentioning this unique behavior seems
> worth a sentence.
>
> > -      When a typed table is created, then the data types of the
> > -      columns are determined by the underlying composite type and are
> > -      not specified by the CREATE TABLE command.
> > +      A typed table always has the same column names and data types as the
> > +      type it is derived from, and you cannot specify additional columns.
> >        But the CREATE TABLE command can add defaults
> > -      and constraints to the table and can specify storage parameters.
> > +      and constraints to the table, as well as specify storage parameters.
> >
> > I don't see how this is better.
>
> I'll agree this is more of a stylistic change, but mainly because the talk
> about data types reasonably implies the other items the patch explicitly
> mentions - names and additional columns.

I prefer David's changes to both paras because right now the details of
typed tables are spread over the respective CREATE and ALTER commands
for types and tables.  Or maybe we should add those details to the
existing "Typed Tables" section at the very end of CREATE TABLE?

> > -      errmsg("type %s is not a composite type",
> > +      errmsg("type %s is not a stand-alone composite type",
> >
> > I agree with Peter's complaint that people aren't going to understand
> > what a stand-alone composite type means when they see the revised
> > error message; to really help people, we're going to need to do better
> > than this, I think.
>
> We have a glossary.
>
> That said, leave the wording as-is and add a conditional hint: The
> composite type must not also be a table.

It's now a separate error message (like I already had in v1) which
states that the specified type must not be a row type of another table
(based on Peter's feedback).  And the hint directs the user to CREATE
TYPE.

In passing, I also quoted the type name in the existing error message
for consistency.  I saw that table names etc. are already quoted in
other error messages.

--
Erik

>From 575bfec362bde5bf77ccb793bd8e2d78679c062f Mon Sep 17 00:00:00 2001
From: Erik Wienhold
Date: Fri, 8 Mar 2024 04:21:56 +0100
Subject: [PATCH v5] Document that typed tables rely on CREATE TYPE

CREATE TABLE OF requires a stand-alone composite type that is not the
row type of another table.  Clarify that with a reference to CREATE
TYPE in the docs.

Also report a separate error message in this case.

Reworded docs provided by David G. Johnston.
---
 doc/src/sgml/ref/create_table.sgml        | 16 ++++++++--------
 src/backend/commands/tablecmds.c          | 11 +++++++++--
 src/test/regress/expected/typed_table.out |  7 ++++++-
 src/test/regress/sql/typed_table.sql      |  4 ++++
 4 files
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Fri, 17 May 2024 at 21:24, Robert Haas wrote: > I think the > fear that we're going to run into cases where such complex handshaking > is necessary is a major reason why I'm afraid of Jelte's ideas about > ParameterSet: it seems much more opinionated to me than he seems to > think it is. I think that fear is valid, and I agree that we might want to add a bespoke message for cases where ParameterSet is not enough. But as far as I can tell ParameterSet would at least cover all the protocol changes that have been suggested so far. Using an opinionated but limited message type for 90% of the cases and a bespoke message for the last 10% seems much better to me than having a bespoke one for each (especially if currently none of the protocol proposals fall into the last 10%). > To the extent that I can wrap my head around what Jelte is proposing, > and all signs point to that extent being quite limited, I suppose I > was thinking of something like his option (2). That is, I assumed that > a client would request all the optional features that said client > might wish to use at connection startup time. But I can see how that > assumption might be somewhat problematic in a connection-pooling > environment. To be clear, I'd also be totally fine with my option (2). I'm personally slightly leaning towards my option (1), due to the reasons listed before. But connection poolers could request all the protocol extensions at the start just fine (using the default "disabled" value) to check for support. So I think option (2) would probably be the most conservative, i.e. we could always decide that option (1) is fine in some future release. > Still, at least to me, it seems better than trying to > rely on GUC_REPORT. My opinion is (1) GUC_REPORT isn't a particularly > well-designed mechanism so I dislike trying to double down on it I agree that GUC_REPORT is not particularly well designed, currently... 
But even in its current form it's already a very effective mechanism for connection poolers to find out which value a specific GUC is set to, and if something similar to patch 0014 were merged, my main gripe with GUC_REPORT would be gone. Tracking GUC settings by using ParameterSet would actually be harder, because if ParameterSet errors at Postgres then the connection pooler would have to roll back its cache of that setting, while with the GUC_REPORT response from Postgres it can simply rely on Postgres telling the pooler that the GUC has changed; even rollbacks are handled correctly this way. > and > (2) trying to mix these protocol-level parameters and the > transactional GUCs we have together in a single mechanism seems > potentially problematic I don't understand what potential problems you're worried about here. Could you clarify? > and (3) I'm still not particularly happy about > the idea of making protocol parameters into GUCs in the first place. Similar to the above: Could you clarify why you're not happy about that? > Perhaps these are all minority positions, but I can't tell you what > everyone thinks, only what I think. I'd love to hear some opinions from others on these design choices. So far it seems like we're the only two that have opinions on this (which seems hard to believe), and our opinions are clearly conflicting. And above all I'd like to move forward with this, be it my way or yours (although I'd prefer my way of course ;) )
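To make the pooler-side difference concrete, here is a minimal sketch (in Python, with hypothetical method and message names, not the actual wire protocol) of the two tracking strategies described above: with GUC_REPORT the pooler only updates its cache when the server volunteers a ParameterStatus message, whereas with ParameterSet the pooler must update speculatively and roll its cache back on error.

```python
# Illustrative sketch of a connection pooler's GUC cache; the message
# shapes and names here are assumptions for illustration only.

class GucCache:
    def __init__(self):
        self.gucs = {}

    # Strategy 1: GUC_REPORT.  The server sends ParameterStatus whenever a
    # reported GUC changes (including on transaction rollback), so the
    # pooler just mirrors what it is told.
    def on_parameter_status(self, name, value):
        self.gucs[name] = value

    # Strategy 2: ParameterSet.  The pooler changes the value itself, so it
    # must remember the old value and restore it if the server errors.
    def set_via_parameter_set(self, name, value, send):
        old = self.gucs.get(name)
        self.gucs[name] = value
        try:
            send(name, value)          # may raise on an ErrorResponse
        except Exception:
            if old is None:
                self.gucs.pop(name, None)
            else:
                self.gucs[name] = old  # manual rollback of the cache
            raise
```

The error path is the point Jelte makes: strategy 2 forces every pooler to reimplement rollback bookkeeping that strategy 1 gets for free from the server.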
Re: Streaming read-ready sequential scan code
On Sat, May 18, 2024 at 8:09 AM Thomas Munro wrote: > On Sat, May 18, 2024 at 1:00 AM Alexander Lakhin wrote: > > I decided to compare v17 vs v16 performance (as I did the last year [1]) > > and discovered that v17 loses to v16 in the pg_tpcds (s64da_tpcds) > > benchmark, query15 (and several others, but I focused on this one): > > Best pg-src-master--.* worse than pg-src-16--.* by 52.2 percents (229.84 > > > 151.03): pg_tpcds.query15 > > Average pg-src-master--.* worse than pg-src-16--.* by 53.4 percents (234.20 > > > 152.64): pg_tpcds.query15 > > Please look at the full html report attached in case you're interested. > > > > (I used my pg-mark tool to measure/analyze performance, but I believe the > > same results can be seen without it.) > > Will investigate, but if it's easy for you to rerun, does it help if > you increase Linux readahead, eg blockdev --setra setting? Andres happened to have TPC-DS handy, and reproduced that regression in q15. We tried some stuff and figured out that it requires parallel_leader_participation=on, ie that this looks like some kind of parallel fairness and/or timing problem. It seems to be a question of which worker finishes up processing matching rows, and the leader gets a ~10ms head start but may be a little more greedy with the new streaming code. He tried reordering the table contents and then saw 17 beat 16. So for q15, initial indications are that this isn't a fundamental regression, it's just a test that is sensitive to some arbitrary conditions. I'll try to figure out some more details about that, ie is it being too greedy on small-ish tables, and generally I do wonder about the interactions between the heuristics and batching working at different levels (OS, seq scan, read stream, hence my earlier ra question which is likely a red herring) and how there might be unintended consequences/interference patterns, but this particular case seems more data dependent.
Re: libpq compression (part 3)
On Fri, 17 May 2024 at 23:10, Jacob Champion wrote: > We're talking about a transport-level option, though -- I thought the > proposal enabled compression before authentication completed? Or has > that changed? I think it would make sense to only compress messages after authentication has completed. The gain of compressing authentication-related packets seems pretty limited. > (I'm suspicious of arguments that begin "well you can already do bad > things" Once logged in it's really easy to max out a core of the backend you're connected as. There are many trivial queries you can use to do that. An example would be: SELECT sum(i) FROM generate_series(1, 100000000) i; So I don't think it makes sense to worry about an attacker using a high compression level as a means to DoS the server. Sending a few of the above queries seems much easier.
Re: libpq compression (part 3)
On Fri, 17 May 2024 at 23:40, Robert Haas wrote: > To be clear, I am not arguing that it should be the receiver's choice. > I'm arguing it should be the client's choice, which means the client > decides what it sends and also tells the server what to send to it. > I'm open to counter-arguments, but as I've thought about this more, > I've come to the conclusion that letting the client control the > behavior is the most likely to be useful and the most consistent with > existing facilities. I think we're on the same page based on the rest > of your email: I'm just clarifying. +1 > I have some quibbles with the syntax but I agree with the concept. > What I'd probably do is separate the server side thing into two GUCs, > each with a list of algorithms, comma-separated, like we do for other > lists in postgresql.conf. Maybe make the default 'all' meaning > "everything this build of the server supports". On the client side, > I'd allow both things to be specified using a single option, because > wanting to do the same thing in both directions will be common, and > you actually have to type in connection strings sometimes, so > verbosity matters more. > > As far as the format of the value for that keyword, what do you think > about either compression=DO_THIS_BOTH_WAYS or > compression=DO_THIS_WHEN_SENDING/DO_THIS_WHEN_RECEIVING, with each "do > this" being a specification of the same form already accepted for > server-side compression e.g. gzip or gzip:level=9? If you don't like > that, why do you think the proposal you made above is better, and why > is that one now punctuated with slashes instead of semicolons? +1
Re: Speed up clean meson builds by ~25%
On 17.05.24 23:01, Robert Haas wrote: On Fri, May 17, 2024 at 4:59 PM Tom Lane wrote: As I mentioned upthread, I'm more worried about confusing error reports than the machine time. Well, Jelte fixed that. I grant that there are people who are more affected, but still, I'd just as soon not contort the build rules for a temporary problem. Arguably by doing this, but I don't think it's enough of a contortion to get excited about. Anyone else want to vote? I retested the patch from 2024-04-07 (I think that's the one that "fixed that"? There are multiple "v1" patches in this thread.) using gcc-14 and clang-18, with ccache disabled of course. The measured effects of the patch are: gcc-14: 1% slower clang-18: 3% faster So with that, it doesn't seem very interesting.
Re: pg_combinebackup does not detect missing files
On 5/17/24 22:20, Robert Haas wrote: On Fri, May 17, 2024 at 1:18 AM David Steele wrote: However, I think allowing the user to optionally validate the input would be a good feature. Running pg_verifybackup as a separate step is going to be more expensive than verifying/copying at the same time. Even with storage tricks to copy ranges of data, pg_combinebackup is going to be aware of files that do not need to be verified for the current operation, e.g. old copies of free space maps. In cases where pg_combinebackup reuses a checksum from the input manifest rather than recomputing it, this could accomplish something. However, for any file that's actually reconstructed, pg_combinebackup computes the checksum as it's writing the output file. I don't see how it's sensible to then turn around and verify that the checksum that we just computed is the same one that we now get. Here's an example. First make a few backups: $ pg_basebackup -c fast -X none -D test/backup/full -F plain $ pg_basebackup -c fast -D test/backup/incr1 -F plain -i /home/dev/test/backup/full/backup_manifest Then intentionally corrupt a file in the incr backup: $ truncate -s 0 test/backup/incr1/base/5/3764_fsm In this case pg_verifybackup will error: $ pg_verifybackup test/backup/incr1 pg_verifybackup: error: "base/5/3764_fsm" has size 0 on disk but size 24576 in the manifest But pg_combinebackup does not complain: $ pg_combinebackup test/backup/full test/backup/incr1 -o test/backup/combine $ ls -lah test/backup/combine/base/5/3764_fsm -rw------- 1 dev dialout 0 May 17 22:08 test/backup/combine/base/5/3764_fsm It would be nice if pg_combinebackup would (at least optionally, but preferably by default) complain in this case rather than the user needing to separately run pg_verifybackup. Regards, -David
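The size check David demonstrates with pg_verifybackup can be sketched in a few lines. Assuming the backup_manifest JSON layout (a top-level "Files" array whose entries carry "Path" and "Size" fields), a minimal size-only verifier might look like this; it is an illustration of the kind of cross-check being discussed, not the actual pg_verifybackup logic:

```python
import json
import os

def find_size_mismatches(backup_dir):
    """Compare on-disk file sizes against the backup_manifest.

    Returns a list of (path, disk_size, manifest_size) tuples for files
    whose on-disk size differs from the manifest, similar in spirit to the
    size error pg_verifybackup reports (disk_size is None if missing).
    """
    with open(os.path.join(backup_dir, "backup_manifest")) as f:
        manifest = json.load(f)
    problems = []
    for entry in manifest["Files"]:
        path = os.path.join(backup_dir, entry["Path"])
        disk_size = os.path.getsize(path) if os.path.exists(path) else None
        if disk_size != entry["Size"]:
            problems.append((entry["Path"], disk_size, entry["Size"]))
    return problems
```

Run against the truncated incr1 backup above, such a check would flag base/5/3764_fsm (0 bytes on disk vs. 24576 in the manifest), which is the validation pg_combinebackup currently skips.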
Re: libpq compression (part 3)
On Fri, May 17, 2024 at 4:53 PM Jacob Burroughs wrote: > I think I was more thinking that trying to let both parties control > the parameter seemed like a recipe for confusion and sadness, and so > the choice that felt most natural to me was to let the sender control > it, but I'm definitely open to changing that the other way around. To be clear, I am not arguing that it should be the receiver's choice. I'm arguing it should be the client's choice, which means the client decides what it sends and also tells the server what to send to it. I'm open to counter-arguments, but as I've thought about this more, I've come to the conclusion that letting the client control the behavior is the most likely to be useful and the most consistent with existing facilities. I think we're on the same page based on the rest of your email: I'm just clarifying. > On the server side, we use slash separated sets of options > connection_compression=DEFAULT_VALUE_FOR_BOTH_DIRECTIONS/client_to_server=OVERRIDE_FOR_THIS_DIRECTION/server_to_client=OVERRIDE_FOR_THIS_DIRECTION > with the values being semicolon separated compression algorithms. > On the client side, you can specify > compression=, > but on the client side you can actually specify compression options, > which the server will use if provided, and otherwise it will fall back > to defaults. I have some quibbles with the syntax but I agree with the concept. What I'd probably do is separate the server side thing into two GUCs, each with a list of algorithms, comma-separated, like we do for other lists in postgresql.conf. Maybe make the default 'all' meaning "everything this build of the server supports". On the client side, I'd allow both things to be specified using a single option, because wanting to do the same thing in both directions will be common, and you actually have to type in connection strings sometimes, so verbosity matters more. 
As far as the format of the value for that keyword, what do you think about either compression=DO_THIS_BOTH_WAYS or compression=DO_THIS_WHEN_SENDING/DO_THIS_WHEN_RECEIVING, with each "do this" being a specification of the same form already accepted for server-side compression e.g. gzip or gzip:level=9? If you don't like that, why do you think the proposal you made above is better, and why is that one now punctuated with slashes instead of semicolons? > If we think we need to, we could let the server specify defaults for > server-side compression. My overall thought though is that having an > excessive number of knobs increases the surface area for testing and > bugs while also increasing potential user confusion and that allowing > configuration on *both* sides doesn't seem sufficiently useful to be > worth adding that complexity. I agree. -- Robert Haas EDB: http://www.enterprisedb.com
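The per-algorithm specification format Robert refers to (e.g. gzip or gzip:level=9) is simple to split mechanically. The following is a hedged sketch of how a client or server might parse such a value; it is not the actual libpq or pg_basebackup parser, and the comma-separated-options assumption is mine:

```python
def parse_compression_spec(spec):
    """Split a spec like 'gzip:level=9' into (algorithm, options dict).

    A bare algorithm name ('gzip') yields empty options.  Multiple options
    are assumed to be comma-separated ('zstd:level=5,long=1').
    """
    algorithm, sep, rest = spec.partition(":")
    options = {}
    if sep:
        for item in rest.split(","):
            key, _, value = item.partition("=")
            options[key] = value
    return algorithm, options
```

For example, parse_compression_spec("gzip:level=9") yields ("gzip", {"level": "9"}), which is the shape a server would need before handing the level to the compression library.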
Re: Fix log_line_prefix to display the transaction id (%x) for statements not in a transaction block
On Thu, May 16, 2024 at 08:37:43AM -0400, Robert Haas wrote: > On Thu, May 16, 2024 at 6:01 AM Quan Zongliang wrote: > > I thought about writing the statement log when the xid is assigned, but it > > seemed too complicated. > > I'm inclined to keep it for a while, until we find a good way or give > > up. It's a reasonable request, after all. > > I don't think it's reasonable at all. We can't log the XID before it's > assigned, and we can't log the statement after the XID is assigned > without completely changing how the parameter works. I have removed the TODO item. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Re: libpq compression (part 3)
On Fri, May 17, 2024 at 1:53 PM Jacob Burroughs wrote: > New proposal, predicated on the assumption that if you enable > compression you are ok with the client picking whatever level they > want. At least with the currently enabled algorithms I don't think > any of them are so insane that they would knock over a server or > anything, and in general postgres servers are usually connected to by > clients that the server admin has some channel to talk to (after all > they somehow had to get access to log in to the server in the first > place) if they are doing something wasteful, given that a client can > do a lot worse things than enable aggressive compression by writing > bad queries. We're talking about a transport-level option, though -- I thought the proposal enabled compression before authentication completed? Or has that changed? (I'm suspicious of arguments that begin "well you can already do bad things", anyway... It seems like there's a meaningful difference between consuming resources running a parsed query and consuming resources trying to figure out what the parsed query is. I don't know if the solution is locking in a compression level, or something else; maybe they're both reasonably mitigated in the same way. I haven't really looked into zip bombs much.) --Jacob
Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.
Hi, Mark! > > At first glance it's not clear to me: > > - why your test creates cross-page unique constraint violations? > > To see if they are detected. > > > - how do you know they are not detected? > > It appears that they are detected. At least, rerunning the test after > adjusting the expected output, I no longer see problems. > I understand your point. It was unclear how the test modified the index so that only the cross-page unique constraint check would fail while the other checks passed. Anyway, thanks for your testing and efforts! I'm happy that the test now passes and confirms that the amcheck feature works as intended. Kind regards, Pavel Borisov
Re: commitfest.postgresql.org is no longer fit for purpose
Robert Haas writes: > On Fri, May 17, 2024 at 3:51 PM Laurenz Albe wrote: >> I think it is a good rule. I don't think that it should lead to putting >> people on the pillory or kicking their patches, but I imagine that a >> committer >> looking for somebody else's patch to work on could prefer patches by people >> who are doing their share of reviews. > If you give me an automated way to find that out, I'll consider paying > some attention to it. Yeah, I can't imagine that any committer (or reviewer, really) is doing any such thing, because it would take far too much effort to figure out how much work anyone else is doing. I see CFMs reminding everybody that this rule exists, but I don't think they ever try to check it either. It's pretty much the honor system, and I'm sure some folk ignore it. regards, tom lane
Re: Speed up clean meson builds by ~25%
On Fri, May 17, 2024 at 4:59 PM Tom Lane wrote: > As I mentioned upthread, I'm more worried about confusing error > reports than the machine time. Well, Jelte fixed that. > I grant that there are people who are more affected, but still, I'd > just as soon not contort the build rules for a temporary problem. Arguably by doing this, but I don't think it's enough of a contortion to get excited about. Anyone else want to vote? -- Robert Haas EDB: http://www.enterprisedb.com
Re: Speed up clean meson builds by ~25%
Robert Haas writes: > If this is the consensus opinion, then > https://commitfest.postgresql.org/48/4914/ should be marked Rejected. > However, while I think the improvements that Tom was able to make here > sound fantastic, I don't understand why that's an argument against > Jelte's patches. After all, Tom's work will only go into v18, but this > patch could be adopted in v17 and back-patched to all releases that > support meson builds, saving oodles of compile time for as long as > those releases are supported. The most obvious beneficiary of that > course of action would seem to be Tom himself, since he back-patches > more fixes than anybody, last I checked, but it'd be also be useful to > get slightly quicker results from the buildfarm and slightly quicker > results for anyone using CI on back-branches and for other hackers who > are looking to back-patch bug fixes. I don't quite understand why we > want to throw those potential benefits out the window just because we > have a better fix for the future. As I mentioned upthread, I'm more worried about confusing error reports than the machine time. It would save me personally exactly nada, since (a) I usually develop with gcc not clang, (b) when I do use clang it's not a heavily-affected version, and (c) since we *very* seldom change the grammar in stable branches, ccache will hide the problem pretty effectively anyway in the back branches. (If you're not using ccache, please don't complain about build time.) I grant that there are people who are more affected, but still, I'd just as soon not contort the build rules for a temporary problem. regards, tom lane
Re: libpq compression (part 3)
On Thu, May 16, 2024 at 3:28 AM Robert Haas wrote: > > Well, I mean, I don't really know what the right answer is here, but > right now I can say pg_dump --compress=gzip to compress the dump with > gzip, or pg_dump --compress=gzip:9 to compress with gzip level 9. Now, > say that instead of compressing the output, I want to compress the > data sent to me over the connection. So I figure I should be able to > say pg_dump 'compress=gzip' or pg_dump 'compress=gzip:9'. I think you > want to let me do the first of those but not the second. But, to turn > your question on its head, what would be the reasoning behind such a > restriction? I think I was more thinking that trying to let both parties control the parameter seemed like a recipe for confusion and sadness, and so the choice that felt most natural to me was to let the sender control it, but I'm definitely open to changing that the other way around. > Note also the precedent of pg_basebackup. I can say pg_basebackup > --compress=server-gzip:9 to ask the server to compress the backup with > gzip at level 9. In that case, what I request from the server changes > the actual output that I get, which is not the case here. Even so, I > don't really understand what the justification would be for refusing > to let the client ask for a specific compression level. > > And on the flip side, I also don't understand why the server would > want to mandate a certain compression level. If compression is very > expensive for a certain algorithm when the level is above some > threshold X, we could have a GUC to limit the maximum level that the > client can request. But, given that the gzip compression level > defaults to 6 in every other context, why would the administrator of a > particular server want to say, well, the default for my server is 3 or > 9 or whatever? > > (This is of course all presuming you want to use gzip at all, which > you probably don't, because gzip is crazy slow. Use lz4 or zstd! But > it makes the point.) 
New proposal, predicated on the assumption that if you enable compression you are ok with the client picking whatever level they want. At least with the currently enabled algorithms I don't think any of them are so insane that they would knock over a server or anything, and in general postgres servers are usually connected to by clients that the server admin has some channel to talk to (after all they somehow had to get access to log in to the server in the first place) if they are doing something wasteful, given that a client can do a lot worse things than enable aggressive compression by writing bad queries. On the server side, we use slash separated sets of options connection_compression=DEFAULT_VALUE_FOR_BOTH_DIRECTIONS/client_to_server=OVERRIDE_FOR_THIS_DIRECTION/server_to_client=OVERRIDE_FOR_THIS_DIRECTION with the values being semicolon separated compression algorithms. On the client side, you can specify compression=, but on the client side you can actually specify compression options, which the server will use if provided, and otherwise it will fall back to defaults. If we think we need to, we could let the server specify defaults for server-side compression. My overall thought though is that having an excessive number of knobs increases the surface area for testing and bugs while also increasing potential user confusion and that allowing configuration on *both* sides doesn't seem sufficiently useful to be worth adding that complexity. -- Jacob Burroughs | Staff Software Engineer E: jburrou...@instructure.com
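Jacob's proposed server-side format can be sketched to make its shape explicit: a slash-separated list in which an unprefixed set of semicolon-separated algorithms is the default for both directions, and client_to_server/server_to_client entries override it per direction. This is an illustrative parser for the proposal as written in the email, not an implementation from the patch:

```python
def parse_connection_compression(value):
    """Parse e.g. 'lz4;gzip/client_to_server=gzip/server_to_client=zstd'.

    The unprefixed set is the default for both directions; the named sets
    override it per direction.  Returns the effective algorithm lists.
    """
    default = []
    overrides = {}
    for part in value.split("/"):
        key, sep, algos = part.partition("=")
        if sep:
            overrides[key] = algos.split(";")
        else:
            default = part.split(";")
    return {
        "client_to_server": overrides.get("client_to_server", default),
        "server_to_client": overrides.get("server_to_client", default),
    }
```

Seeing the format spelled out this way also makes Robert's counter-proposal upthread easier to evaluate: two comma-separated-list GUCs would replace the slash/semicolon nesting entirely.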
Re: broken tables on hot standby after migration on PostgreSQL 16 (3x times last month)
pá 17. 5. 2024 v 22:05 odesílatel Pavel Stehule napsal: > > > pá 17. 5. 2024 v 21:50 odesílatel Andres Freund > napsal: > >> Hi, >> >> On 2024-05-17 15:12:31 +0200, Pavel Stehule wrote: >> > after migration on PostgreSQL 16 I seen 3x times (about every week) >> broken >> > tables on replica nodes. The query fails with error >> >> Migrating from what version? >> > > I think 14, but it should be verified tomorrow > upgrade was from 15.2 > >> >> You're saying that the data is correctly accessible on primaries, but >> broken >> on standbys? Is there any difference in how the page looks like on the >> primary >> vs standby? >> > > I saved one page from master and standby. Can I send it privately? There > are some private data (although not too sensitive) > > >> >> >> > ERROR: could not access status of transaction 1442871302 >> > DETAIL: Could not open file "pg_xact/0560": No such file or directory >> > >> > verify_heapam reports >> > >> > ^[[Aprd=# select * from verify_heapam('account_login_history') where >> blkno >> > = 179036; >> > blkno | offnum | attnum |msg >> > >> > >> +++--- >> > 179036 | 30 || xmin 1393743382 precedes oldest valid >> > transaction ID 3:1687012112 >> >> So that's not just a narrow race... >> >> >> > master >> > >> > (2024-05-17 14:36:57) prd=# SELECT * FROM >> > page_header(get_raw_page('account_login_history', 179036)); >> > lsn │ checksum │ flags │ lower │ upper │ special │ pagesize │ >> > version │ prune_xid >> > >> ───┼──┼───┼───┼───┼─┼──┼─┼─── >> > A576/810F4CE0 │0 │ 4 │ 296 │ 296 │8192 │ 8192 │ >> > 4 │ 0 >> > (1 row) >> > >> > >> > replica >> > prd_aukro=# SELECT * FROM >> page_header(get_raw_page('account_login_history', >> > 179036)); >> > lsn | checksum | flags | lower | upper | special | pagesize | >> > version | prune_xid >> > >> ---+--+---+---+---+-+--+-+--- >> > A56C/63979DA0 |0 | 0 | 296 | 296 |8192 | 8192 | >> > 4 | 0 >> > (1 row) >> >> Is the replica behind the primary? Or did we somehow end up with diverging >> data? 
The page LSNs differ by about 40GB... >> >> Is there evidence of failed truncations of the relation in the log? From >> autovacuum? >> > > no I did not see these bugs, > >> >> Does the data in the readable versions of the tuples on that page actually >> look valid? Is it possibly duplicated data? >> > > looks well, I didn't see any strange content > >> >> >> I'm basically wondering whether it's possible that we errored out during >> truncation (e.g. due to a file permission issue or such). Due to some >> brokenness in RelationTruncate() that can lead to data divergence between >> primary and standby and to old tuples re-appearing on either. >> >> >> Another question: Do you use pg_repack or such? >> > > pg_repack was used 2 months before migration > > > >> >> Greetings, >> >> Andres Freund >> >
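Andres's "~40GB" figure follows directly from the two page LSNs quoted above: an LSN written as 'X/Y' is the 64-bit WAL byte position X * 2^32 + Y, so the distance between the primary's and the standby's page LSN can be checked with a few lines (a quick sanity-check sketch, not part of the original thread):

```python
def lsn_to_int(lsn):
    """Convert an LSN of the form 'X/Y' (two hex halves) to a byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

# Page LSNs from the page_header() output quoted above
primary = lsn_to_int("A576/810F4CE0")   # master
standby = lsn_to_int("A56C/63979DA0")   # replica
delta_gib = (primary - standby) / 2**30  # ≈ 40.5 GiB, matching "~40GB"
```

A standby page that is merely behind would catch up on replay; the concern in the thread is whether this reflects replication lag or genuinely diverged data.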
Re: First draft of PG 17 release notes
On Thu, 2024-05-09 at 00:03 -0400, Bruce Momjian wrote: > I have committed the first draft of the PG 17 release notes; you can > see the results here: > > https://momjian.us/pgsql_docs/release-17.html For this item: Create a "builtin" collation provider similar to libc's C locale (Jeff Davis) It uses a "C" locale which is identical but independent of libc, but it allows the use of non-"C" collations like "en_US" and "C.UTF-8" with the "C" locale, which libc does not. MORE? I suggest something more like: New, platform-independent "builtin" collation provider. (Jeff Davis) Currently, it offers the "C" and "C.UTF-8" locales. The "C.UTF-8" locale combines stable and fast code point order collation with Unicode character semantics. Regards, Jeff Davis
Re: commitfest.postgresql.org is no longer fit for purpose
On 5/17/24 21:59, Robert Haas wrote: > On Fri, May 17, 2024 at 3:51 PM Laurenz Albe wrote: >> On Fri, 2024-05-17 at 13:12 -0400, Greg Sabino Mullane wrote: A long time ago there was a "rule" that people submitting patches are expected to do reviews. Perhaps we should be more strict about this. >>> >>> Big -1. How would we even be more strict about this? Public shaming? >>> Withholding a commit? >> >> I think it is a good rule. I don't think that it should lead to putting >> people on the pillory or kicking their patches, but I imagine that a >> committer >> looking for somebody else's patch to work on could prefer patches by people >> who are doing their share of reviews. > Yeah, I don't have any particular idea how the rule should be "enforced", and I certainly did not imagine public shaming or anything like that. My thoughts were more about reminding people that reviews are part of the deal, that's it ... maybe "more strict" was not quite what I meant. > If you give me an automated way to find that out, I'll consider paying > some attention to it. However, in order to sort the list of patches > needing review by the amount of review done by the patch author, we'd > first need to have a list of patches needing review. > > And right now we don't, or at least not in any usable way. > commitfest.postgresql.org is supposed to give us that, but it doesn't. > It'd certainly help to know which patches to consider for review, but I guess I'd still look at patches from people doing more reviews first, even if I had to find out in what shape the patch is. I'm far more skeptical about an "automated way" to track this, though. I'm not sure it's quite possible - reviews can have a lot of very different forms, and deciding what is or is not a review is pretty subjective. So it's not clear how we would quantify that. Not to mention I'm sure we'd promptly find ways to game that. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Comments on Custom RMGRs
On Fri, May 17, 2024 at 4:20 PM Jeff Davis wrote: > Regarding this particular change: the checkpointing hook seems more > like a table AM feature, so I agree with you that we should have a good > idea how a real table AM might use this, rather than only > pg_stat_statements. I would even be OK with a pg_stat_statements example that is fully working and fully explained. I just don't want to have no example at all. The original proposal has been changed twice because of complaints that the hook wasn't quite useful enough, but I think that only proves that v3 is closer to being useful than v1. If v1 is 40% of the way to useful and v3 is 120% of the way to useful, wonderful! But if v1 is 20% of the way to being useful and v3 is 60% of the way to being useful, it's not time to commit anything yet. I don't know which is the case, and I think if someone wants this to be committed, they need to explain clearly why it's the first and not the second. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Comments on Custom RMGRs
On Fri, 2024-05-17 at 14:56 -0400, Robert Haas wrote: > (2) a detailed > description of how a non-core table AM or index AM is expected to be > able to make use of this. Bonus points if the person providing that > rationale can say credibly that they've actually implemented this and > it works great with 100TB of production data. That's a chicken-and-egg problem and we should be careful about setting the bar too high for table AM improvements. Ultimately, AM authors will benefit more from a steady stream of improvements that sometimes miss the mark than complete stagnation, as long as we use reasonable judgement. There aren't a lot of table AMs, and to create a good one you need a lot of internals knowledge. If it's an important AM, the developers are surely going to try it out on mainline occasionally, and expect API breaks. If the API breaks for them in some fundamental way, they can complain and we still have time to fix it. > The problem here is not only that we don't want to commit a hook that > does nothing useful. We also don't want to commit a hook that works > wonderfully for someone but we have no idea why. If we do that, then > we don't know whether it's OK to modify the hook in the future as the > code evolves, or more to the point, which kinds of modifications will > be acceptable. We have to have some kind of understanding between us and AM authors that they need to participate in discussions when using these APIs, try changes during development, be adaptable when they change from release to release, and come back and tell us when something is wrong. > And also, the next person who wants to use it is likely > to have to figure out all on their own how to do so, instead of being > able to refer to comments or documentation or the commit message or > at > least a mailing list post. Obviously it would be better to have a nice example table AM in /contrib, different enough from heap, but nobody has done that yet. 
You could argue that we never should have exposed the API without something like this in the first place, but that's also a big ask and we'd probably still not have it. Regarding this particular change: the checkpointing hook seems more like a table AM feature, so I agree with you that we should have a good idea how a real table AM might use this, rather than only pg_stat_statements. Regards, Jeff Davis
Re: Speed up clean meson builds by ~25%
On Wed, Apr 17, 2024 at 11:11 PM Tom Lane wrote: > I think we should hold off on this. I found a simpler way to address > ecpg's problem than what I sketched upthread. I have a not-ready-to-show-yet > patch that allows the vast majority of ecpg's grammar > productions to use the default semantic action. Testing on my M1 > Macbook with clang 16.0.6 from MacPorts, I see the compile time for > preproc.o in HEAD as about 1m44 sec; but with this patch, 0.6 sec. If this is the consensus opinion, then https://commitfest.postgresql.org/48/4914/ should be marked Rejected. However, while I think the improvements that Tom was able to make here sound fantastic, I don't understand why that's an argument against Jelte's patches. After all, Tom's work will only go into v18, but this patch could be adopted in v17 and back-patched to all releases that support meson builds, saving oodles of compile time for as long as those releases are supported. The most obvious beneficiary of that course of action would seem to be Tom himself, since he back-patches more fixes than anybody, last I checked, but it'd also be useful to get slightly quicker results from the buildfarm and slightly quicker results for anyone using CI on back-branches and for other hackers who are looking to back-patch bug fixes. I don't quite understand why we want to throw those potential benefits out the window just because we have a better fix for the future. -- Robert Haas EDB: http://www.enterprisedb.com
Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.
On Fri, May 17, 2024 at 4:10 PM Mark Dilger wrote: > The quick-and-dirty TAP test I wrote this morning is intended to introduce > duplicates across page boundaries, not to test for ones that got there by > normal database activity. In other words, the TAP test forcibly corrupts the > index by changing a value on one side of a boundary to be equal to the value > on the other side of the boundary. Prior to the corrupting action the values > were all unique. I understood that. I was just pointing out that an index that looks even somewhat like that is already quite unnatural. -- Peter Geoghegan
Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.
> On May 17, 2024, at 1:00 PM, Peter Geoghegan wrote: > > Many different parts of the B-Tree code will fight against allowing > duplicates of the same value to span multiple leaf pages -- this is > especially true for unique indexes. The quick-and-dirty TAP test I wrote this morning is intended to introduce duplicates across page boundaries, not to test for ones that got there by normal database activity. In other words, the TAP test forcibly corrupts the index by changing a value on one side of a boundary to be equal to the value on the other side of the boundary. Prior to the corrupting action the values were all unique. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Streaming read-ready sequential scan code
On Sat, May 18, 2024 at 1:00 AM Alexander Lakhin wrote: > I decided to compare v17 vs v16 performance (as I did the last year [1]) > and discovered that v17 loses to v16 in the pg_tpcds (s64da_tpcds) > benchmark, query15 (and several others, but I focused on this one): > Best pg-src-master--.* worse than pg-src-16--.* by 52.2 percents (229.84 > > 151.03): pg_tpcds.query15 > Average pg-src-master--.* worse than pg-src-16--.* by 53.4 percents (234.20 > > 152.64): pg_tpcds.query15 > Please look at the full html report attached in case you're interested. > > (I used my pg-mark tool to measure/analyze performance, but I believe the > same results can be seen without it.) Will investigate, but if it's easy for you to rerun, does it help if you increase Linux readahead, eg blockdev --setra setting?
Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.
> On May 17, 2024, at 12:42 PM, Pavel Borisov wrote: > > At the first glance it's not clear to me: > - why your test creates cross-page unique constraint violations? To see if they are detected. > - how do you know they are not detected? It appears that they are detected. At least, rerunning the test after adjusting the expected output, I no longer see problems. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: broken tables on hot standby after migration on PostgreSQL 16 (3x times last month)
On Fri, May 17, 2024 at 9:50 PM Andres Freund wrote:

> Hi,
>
> On 2024-05-17 15:12:31 +0200, Pavel Stehule wrote:
> > after migration on PostgreSQL 16 I seen 3x times (about every week) broken
> > tables on replica nodes. The query fails with error
>
> Migrating from what version?

I think 14, but I will verify it tomorrow.

> You're saying that the data is correctly accessible on primaries, but broken
> on standbys? Is there any difference in how the page looks on the primary
> vs standby?

I saved one page from master and one from standby. Can I send them privately? They contain some private data (although not too sensitive).

> > ERROR: could not access status of transaction 1442871302
> > DETAIL: Could not open file "pg_xact/0560": No such file or directory
> >
> > verify_heapam reports
> >
> > prd=# select * from verify_heapam('account_login_history') where blkno = 179036;
> >  blkno  | offnum | attnum |                              msg
> > --------+--------+--------+------------------------------------------------------------------
> >  179036 |     30 |        | xmin 1393743382 precedes oldest valid transaction ID 3:1687012112
>
> So that's not just a narrow race...
>
> > master
> >
> > (2024-05-17 14:36:57) prd=# SELECT * FROM page_header(get_raw_page('account_login_history', 179036));
> >       lsn      | checksum | flags | lower | upper | special | pagesize | version | prune_xid
> > ---------------+----------+-------+-------+-------+---------+----------+---------+-----------
> >  A576/810F4CE0 |        0 |     4 |   296 |   296 |    8192 |     8192 |       4 |         0
> > (1 row)
> >
> > replica
> > prd_aukro=# SELECT * FROM page_header(get_raw_page('account_login_history', 179036));
> >       lsn      | checksum | flags | lower | upper | special | pagesize | version | prune_xid
> > ---------------+----------+-------+-------+-------+---------+----------+---------+-----------
> >  A56C/63979DA0 |        0 |     0 |   296 |   296 |    8192 |     8192 |       4 |         0
> > (1 row)
>
> Is the replica behind the primary? Or did we somehow end up with diverging
> data? The page LSNs differ by about 40GB...
>
> Is there evidence of failed truncations of the relation in the log? From
> autovacuum?

No, I did not see anything like that.

> Does the data in the readable versions of the tuples on that page actually
> look valid? Is it possibly duplicated data?

It looks fine; I didn't see any strange content.

> I'm basically wondering whether it's possible that we errored out during
> truncation (e.g. due to a file permission issue or such). Due to some
> brokenness in RelationTruncate() that can lead to data divergence between
> primary and standby and to old tuples re-appearing on either.
>
> Another question: Do you use pg_repack or such?

pg_repack was used two months before the migration.

> Greetings,
>
> Andres Freund
Re: broken tables on hot standby after migration on PostgreSQL 16 (3x times last month)
On Fri, May 17, 2024 at 3:50 PM Andres Freund wrote: > You're saying that the data is correctly accessible on primaries, but broken > on standbys? Is there any difference in how the page looks like on the primary > vs standby? There clearly is. The relevant infomask bits are different. I didn't examine it much closer than that, though. -- Peter Geoghegan
Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.
On Fri, May 17, 2024 at 3:42 PM Mark Dilger wrote: > On further review, the test was not anticipating the error message "high key > invariant violated for index". That wasn't seen in calls to > bt_index_parent_check(), but appears as one of the errors from > bt_index_check(). I am rerunning the test now Many different parts of the B-Tree code will fight against allowing duplicates of the same value to span multiple leaf pages -- this is especially true for unique indexes. For example, nbtsplitloc.c has a variety of strategies that will prevent choosing a split point that necessitates including a distinguishing heap TID in the new high key. In other words, nbtsplitloc.c is very aggressive about picking a split point between (rather than within) groups of duplicates. Of course it's still *possible* for a unique index to have multiple leaf pages containing the same individual value. The regression tests do have coverage for certain relevant code paths (e.g., there is coverage for code paths only hit when _bt_check_unique has to go to the page to the right). This is only the case because I went out of my way to make sure of it, by adding tests that allow a huge number of version duplicates to accumulate within a unique index. (The "move right" _bt_check_unique branches had zero test coverage for a year or two.) Just how important it is that amcheck covers cases where the version duplicates span multiple leaf pages is of course debatable -- it's always better to be more thorough, when practical. But it's certainly something that needs to be assessed based on the merits. -- Peter Geoghegan
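The version-duplicate buildup Peter describes can be reproduced by hand. A rough sketch, not taken from the thread — the table name and loop count are invented, and the checkunique parameter assumes an amcheck version that has it (PostgreSQL 17 or later):

```sql
-- Updating an indexed column (counter) makes every UPDATE non-HOT, so a
-- new index entry for id = 1 is inserted into the primary-key index each
-- time; the dead versions accumulate as duplicates of the same key.
CREATE EXTENSION IF NOT EXISTS amcheck;
CREATE TABLE dup_test (id int PRIMARY KEY, counter int NOT NULL);
CREATE INDEX ON dup_test (counter);
INSERT INTO dup_test VALUES (1, 0);
DO $$
BEGIN
  FOR i IN 1..5000 LOOP
    UPDATE dup_test SET counter = counter + 1 WHERE id = 1;
  END LOOP;
END $$;
-- With enough version duplicates, the id = 1 entries span multiple leaf
-- pages -- the cross-page case under discussion.
SELECT bt_index_check('dup_test_pkey',
                      heapallindexed => true,
                      checkunique => true);
```

Whether the duplicates actually cross a page boundary depends on how aggressively vacuum and nbtsplitloc.c intervene, which is exactly Peter's point about such states being unnatural.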
Re: commitfest.postgresql.org is no longer fit for purpose
On Fri, May 17, 2024 at 3:51 PM Laurenz Albe wrote: > On Fri, 2024-05-17 at 13:12 -0400, Greg Sabino Mullane wrote: > > > Long time ago there was a "rule" that people submitting patches are > > > expected > > > to do reviews. Perhaps we should be more strict about this. > > > > Big -1. How would we even be more strict about this? Public shaming? > > Withholding a commit? > > I think it is a good rule. I don't think it should lead to putting > people on the pillory or kicking their patches, but I imagine that a committer > looking for somebody else's patch to work on could prefer patches by people > who are doing their share of reviews. If you give me an automated way to find that out, I'll consider paying some attention to it. However, in order to sort the list of patches needing review by the amount of review done by the patch author, we'd first need to have a list of patches needing review. And right now we don't, or at least not in any usable way. commitfest.postgresql.org is supposed to give us that, but it doesn't. -- Robert Haas EDB: http://www.enterprisedb.com
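For illustration only: if the commitfest app did track reviews per community account, the ordering Robert asks for would be a simple query. Everything below is hypothetical; neither table exists in the real app's schema.

```sql
-- Hypothetical tables: cf_patches (patch_id, author, status) and
-- cf_reviews (review_id, reviewer). Patches needing review are ordered
-- by how many reviews their author has done, most first.
SELECT p.patch_id,
       p.author,
       count(r.review_id) AS reviews_by_author
FROM cf_patches p
LEFT JOIN cf_reviews r ON r.reviewer = p.author
WHERE p.status = 'Needs review'
GROUP BY p.patch_id, p.author
ORDER BY reviews_by_author DESC;
```

The hard part, as Robert notes, is not the query but getting a trustworthy "needs review" list in the first place.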
Re: Propagate sanity checks of ProcessUtility() to standard_ProcessUtility()?
On Thu, Feb 29, 2024 at 2:21 AM Michael Paquier wrote: > It's been brought to me that an extension may finish by breaking the > assumptions ProcessUtility() relies on when calling > standard_ProcessUtility(), causing breakages when passing down data to > cascading utility hooks. > > Isn't the state of the arguments given something we should check not > only in the main entry point ProcessUtility() but also in > standard_ProcessUtility(), to prevent issues if an extension > incorrectly manipulates the arguments it needs to pass down to other > modules that use the utility hook, like using a NULL query string? I can't imagine a scenario where this change saves more than 5 minutes of debugging, so I'd rather leave things the way they are. If you do this, then people will see the macro and have to go look at what it does, whereas right now, they can see the assertions themselves, which is better. The usual pattern for using hooks like this is to call the next implementation, or the standard implementation, and pass down the arguments that you got. If you try to do that and mess it up, you're going to get a crash really, really quickly and it shouldn't be very difficult at all to figure out what you did wrong. Honestly, that doesn't seem like it would be hard even without the assertions: for the most part, a quick glance at the stack backtrace ought to be enough. If you're trying to do something more sophisticated, like mutating the node tree before passing it down, the chances of your mistakes getting caught by these assertions are pretty darn low, since they're very superficial checks. -- Robert Haas EDB: http://www.enterprisedb.com
Re: commitfest.postgresql.org is no longer fit for purpose
On Fri, 2024-05-17 at 13:12 -0400, Greg Sabino Mullane wrote: > > Long time ago there was a "rule" that people submitting patches are expected > > to do reviews. Perhaps we should be more strict about this. > > Big -1. How would we even be more strict about this? Public shaming? > Withholding a commit? I think it is a good rule. I don't think it should lead to putting people on the pillory or kicking their patches, but I imagine that a committer looking for somebody else's patch to work on could prefer patches by people who are doing their share of reviews. Yours, Laurenz Albe
Re: broken tables on hot standby after migration on PostgreSQL 16 (3x times last month)
Hi,

On 2024-05-17 15:12:31 +0200, Pavel Stehule wrote:
> after migration on PostgreSQL 16 I seen 3x times (about every week) broken
> tables on replica nodes. The query fails with error

Migrating from what version?

You're saying that the data is correctly accessible on primaries, but broken on standbys? Is there any difference in how the page looks on the primary vs standby?

> ERROR: could not access status of transaction 1442871302
> DETAIL: Could not open file "pg_xact/0560": No such file or directory
>
> verify_heapam reports
>
> prd=# select * from verify_heapam('account_login_history') where blkno = 179036;
>  blkno  | offnum | attnum |                              msg
> --------+--------+--------+------------------------------------------------------------------
>  179036 |     30 |        | xmin 1393743382 precedes oldest valid transaction ID 3:1687012112

So that's not just a narrow race...

> master
>
> (2024-05-17 14:36:57) prd=# SELECT * FROM page_header(get_raw_page('account_login_history', 179036));
>       lsn      | checksum | flags | lower | upper | special | pagesize | version | prune_xid
> ---------------+----------+-------+-------+-------+---------+----------+---------+-----------
>  A576/810F4CE0 |        0 |     4 |   296 |   296 |    8192 |     8192 |       4 |         0
> (1 row)
>
> replica
> prd_aukro=# SELECT * FROM page_header(get_raw_page('account_login_history', 179036));
>       lsn      | checksum | flags | lower | upper | special | pagesize | version | prune_xid
> ---------------+----------+-------+-------+-------+---------+----------+---------+-----------
>  A56C/63979DA0 |        0 |     0 |   296 |   296 |    8192 |     8192 |       4 |         0
> (1 row)

Is the replica behind the primary? Or did we somehow end up with diverging data? The page LSNs differ by about 40GB...

Is there evidence of failed truncations of the relation in the log? From autovacuum?

Does the data in the readable versions of the tuples on that page actually look valid? Is it possibly duplicated data?

I'm basically wondering whether it's possible that we errored out during truncation (e.g. due to a file permission issue or such). Due to some brokenness in RelationTruncate() that can lead to data divergence between primary and standby and to old tuples re-appearing on either.
Another question: Do you use pg_repack or such? Greetings, Andres Freund
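Andres's first question — whether the replica is merely behind — can be answered with standard built-in functions. A quick sketch, not from the thread:

```sql
-- On the primary: the current WAL write position.
SELECT pg_current_wal_lsn();
-- On the standby: the last WAL position replayed.
SELECT pg_last_wal_replay_lsn();
-- Distance between the two page LSNs reported above. If the standby has
-- replayed well past A576/810F4CE0 yet its copy of the page still shows
-- A56C/63979DA0, the servers have diverged rather than merely lagged.
SELECT pg_wal_lsn_diff('A576/810F4CE0', 'A56C/63979DA0') AS bytes_apart;
```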
Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.
Hi, Mark! On Fri, 17 May 2024 at 23:10, Mark Dilger wrote: > > > > On May 17, 2024, at 11:51 AM, Pavel Borisov > wrote: > > > > Amcheck with checkunique option does check uniqueness violation between > pages. But it doesn't warranty detection of cross page uniqueness > violations in extremely rare cases when the first equal index entry on the > next page corresponds to tuple that is not visible (e.g. dead). In this, I > followed the Peter's notion [1] that checking across a number of dead equal > entries that could theoretically span even across many pages is an unneeded > code complication and amcheck is not a tool that provides any warranty when > checking an index. > > This confuses me a bit. The regression test creates a table and index but > never performs any DELETE nor any UPDATE operations, so none of the index > entries should be dead. If I am understanding you correct, I'd be forced > to conclude that the uniqueness checking code is broken. Can you take a > look? > At first glance it's not clear to me: - why does your test create cross-page unique constraint violations? - how do you know they are not detected?
Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.
> On May 17, 2024, at 12:10 PM, Mark Dilger > wrote: > >> Amcheck with checkunique option does check uniqueness violation between >> pages. But it doesn't warranty detection of cross page uniqueness violations >> in extremely rare cases when the first equal index entry on the next page >> corresponds to tuple that is not visible (e.g. dead). In this, I followed >> the Peter's notion [1] that checking across a number of dead equal entries >> that could theoretically span even across many pages is an unneeded code >> complication and amcheck is not a tool that provides any warranty when >> checking an index. > > This confuses me a bit. The regression test creates a table and index but > never performs any DELETE nor any UPDATE operations, so none of the index > entries should be dead. If I am understanding you correct, I'd be forced to > conclude that the uniqueness checking code is broken. Can you take a look? On further review, the test was not anticipating the error message "high key invariant violated for index". That wasn't seen in calls to bt_index_parent_check(), but appears as one of the errors from bt_index_check(). I am rerunning the test now — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
allow sorted builds for btree_gist
Hi, I've been looking at GiST to see if there could be a good way to do parallel builds - and there might be, if the opclass supports sorted builds, because then we could parallelize the sort. But then I noticed we support this mode only for point_ops, because that's the only opclass with a sortsupport procedure. Which mostly makes sense, because types like geometries, polygons, ... do not have a well defined order. Still, we have btree_gist, and I don't think there's much reason to not support sorted builds for those opclasses, where the gist opclass is defined on top of btree ordering semantics. So this patch does that, and the difference (compared to builds with buffering=on) can easily be an order of magnitude - at least that's what my tests show:

test=# create index on test_int8 using gist (a) with (buffering = on);
CREATE INDEX
Time: 578799.450 ms (09:38.799)

test=# create index on test_int8 using gist (a) with (buffering = auto);
CREATE INDEX
Time: 53022.593 ms (00:53.023)

test=# create index on test_uuid using gist (a) with (buffering = on);
CREATE INDEX
Time: 39322.799 ms (00:39.323)

test=# create index on test_uuid using gist (a) with (buffering = auto);
CREATE INDEX
Time: 6466.341 ms (00:06.466)

The WIP patch enables this for data types with a usable sortsupport procedure, which excludes time, timetz, cash, interval, bit, vbit, bool, enum and macaddr8. I assume time, timetz and macaddr8 could be added; it's just that I didn't find any existing sortsupport procedure. Same for cash, but IIRC it's mostly deprecated. Of course, people probably don't use btree_gist with a single column, because they could just use btree. It's useful for multi-column GiST indexes, with data types requiring GiST. And if the other opclasses also allow sorted builds, this patch makes that possible. Of course, most "proper GiST opclasses" don't support that, but why not - it's easy.
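A minimal sketch of the multi-column use case mentioned above (the table is invented; the classic reason to install btree_gist is an exclusion constraint mixing a scalar column with a type that genuinely needs GiST):

```sql
CREATE EXTENSION IF NOT EXISTS btree_gist;
-- room is an ordinary int, handled by btree_gist's int4 opclass; during
-- needs GiST. Overlapping bookings for the same room are rejected.
CREATE TABLE reservations (
    room   int,
    during tsrange,
    EXCLUDE USING gist (room WITH =, during WITH &&)
);
-- A sorted build needs sortsupport for every indexed column, so this
-- particular index would still use the buffered path; an all-scalar
-- multi-column GiST index could use the fast sorted build instead.
```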
FWIW this is also why sorted builds likely are not a very practical way to do parallel builds for GiST - it would help only with a small part of cases, I think. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL CompanyFrom 011e6eee923a6f51668f3277f470d32922923d17 Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Fri, 17 May 2024 20:36:08 +0200 Subject: [PATCH v20240517] allow sorted builds for btree_gist --- contrib/btree_gist/Makefile | 2 +- contrib/btree_gist/btree_gist--1.8--1.9.sql | 82 + contrib/btree_gist/btree_gist.control | 2 +- 3 files changed, 84 insertions(+), 2 deletions(-) create mode 100644 contrib/btree_gist/btree_gist--1.8--1.9.sql diff --git a/contrib/btree_gist/Makefile b/contrib/btree_gist/Makefile index 7ac2df26c10..68190ac5e46 100644 --- a/contrib/btree_gist/Makefile +++ b/contrib/btree_gist/Makefile @@ -34,7 +34,7 @@ DATA = btree_gist--1.0--1.1.sql \ btree_gist--1.1--1.2.sql btree_gist--1.2.sql btree_gist--1.2--1.3.sql \ btree_gist--1.3--1.4.sql btree_gist--1.4--1.5.sql \ btree_gist--1.5--1.6.sql btree_gist--1.6--1.7.sql \ - btree_gist--1.7--1.8.sql + btree_gist--1.7--1.8.sql btree_gist--1.8--1.9.sql PGFILEDESC = "btree_gist - B-tree equivalent GiST operator classes" REGRESS = init int2 int4 int8 float4 float8 cash oid timestamp timestamptz \ diff --git a/contrib/btree_gist/btree_gist--1.8--1.9.sql b/contrib/btree_gist/btree_gist--1.8--1.9.sql new file mode 100644 index 000..3fd6f33f41b --- /dev/null +++ b/contrib/btree_gist/btree_gist--1.8--1.9.sql @@ -0,0 +1,82 @@ +/* contrib/btree_gist/btree_gist--1.8--1.9.sql */ + +-- complain if script is sourced in psql, rather than via CREATE EXTENSION +\echo Use "ALTER EXTENSION btree_gist UPDATE TO '1.9'" to load this file. 
\quit + +ALTER OPERATOR FAMILY gist_oid_ops USING gist ADD + FUNCTION 11 (oid, oid) btoidsortsupport (internal) ; + +ALTER OPERATOR FAMILY gist_int2_ops USING gist ADD + FUNCTION 11 (int2, int2) btint2sortsupport (internal) ; + +ALTER OPERATOR FAMILY gist_int4_ops USING gist ADD + FUNCTION 11 (int4, int4) btint4sortsupport (internal) ; + +ALTER OPERATOR FAMILY gist_int8_ops USING gist ADD + FUNCTION 11 (int8, int8) btint8sortsupport (internal) ; + +ALTER OPERATOR FAMILY gist_float4_ops USING gist ADD + FUNCTION 11 (float4, float4) btfloat4sortsupport (internal) ; + +ALTER OPERATOR FAMILY gist_float8_ops USING gist ADD + FUNCTION 11 (float8, float8) btfloat8sortsupport (internal) ; + +ALTER OPERATOR FAMILY gist_timestamp_ops USING gist ADD + FUNCTION 11 (timestamp, timestamp) timestamp_sortsupport (internal) ; + +ALTER OPERATOR FAMILY gist_timestamptz_ops USING gist ADD + FUNCTION 11 (timestamptz, timestamptz) timestamp_sortsupport (internal) ; + +-- ALTER OPERATOR FAMILY gist_time_ops USING gist ADD +-- FUNCTION 12 (time, time) gist_stratnum_btree (int2) ; + +ALTER OPERATOR FAMILY gist_date_ops USING gist ADD + FUNCTION 11 (date, date) date_sortsupport (internal) ; + +-- ALTER OPERATOR FAMILY
Re: problems with "Shared Memory and Semaphores" section of docs
> postgres -D pgdev-dev -c shared_buffers=16MB -C shared_memory_size_in_huge_pages
> 13
> postgres -D pgdev-dev -c shared_buffers=16MB -c huge_page_size=1GB -C shared_memory_size_in_huge_pages
> 1
>
> Which is very useful to be able to actually configure that number of huge
> pages. I don't think a system view or such would help here.

Oops. Totally missed the -C flag. Thanks for clarifying! Regards, Sami
Re: problems with "Shared Memory and Semaphores" section of docs
Hi, On 2024-05-17 18:30:08 +0000, Imseih (AWS), Sami wrote: > > The advantage of the GUC is that its value could be seen before trying to > > actually start the server. > > Only if they have a sample in postgresql.conf file, right? > A GUC like shared_memory_size_in_huge_pages will not be. You can query GUCs with -C. E.g.

postgres -D pgdev-dev -c shared_buffers=16MB -C shared_memory_size_in_huge_pages
13
postgres -D pgdev-dev -c shared_buffers=16MB -c huge_page_size=1GB -C shared_memory_size_in_huge_pages
1

Which is very useful to be able to actually configure that number of huge pages. I don't think a system view or such would help here. Greetings, Andres Freund
Re: problems with "Shared Memory and Semaphores" section of docs
On Fri, May 17, 2024 at 06:30:08PM +, Imseih (AWS), Sami wrote: >> The advantage of the GUC is that its value could be seen before trying to >> actually start the server. > > Only if they have a sample in postgresql.conf file, right? > A GUC like shared_memory_size_in_huge_pages will not be. shared_memory_size_in_huge_pages is computed at runtime and can be viewed with "postgres -C" before actually trying to start the server [0]. [0] https://www.postgresql.org/docs/devel/kernel-resources.html#LINUX-HUGE-PAGES -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Fri, May 17, 2024 at 1:26 PM Jacob Burroughs wrote: > I was leaving the details around triggering that for this conversation > and in that patch just designing the messages in a way that would > allow adding the reconfiguration/restarting to be easily added in a > backwards-compatible way in a future patch. I would imagine that an > explicit `ParameterSet` call that sets `_pq_.connection_compression` > (or whatever the implementation details turn out to be) would also > trigger a compressor restart, and when restarted it would pick an > algorithm/configuration based on the new value of the parameter rather > than the one used at connection establishment. Hmm, OK, interesting. I suppose that I thought we were going to handle problems like this by just adding bespoke messages for each case (e.g. CompressionSet). I wasn't thinking it would be practical to make a message whose remit was as general as what it seems like Jelte wants to do with ParameterSet, because I'm not sure everything can be handled that simply. It's not an exact analogy, but when you want to stop the server, it's not enough to say that you want to change from the Running state to the Stopped state. You have to specify which type of shutdown should be used to make the transition. You could even have more complicated cases, where one side says "prepare to do X" and the other side eventually says "OK, I'm prepared" and the first side says "great, now activate X" and the other side eventually says "OK, it's activated, please confirm that you've also activated it from your side" and the first side eventually says "OK, I confirm that". I think the fear that we're going to run into cases where such complex handshaking is necessary is a major reason why I'm afraid of Jelte's ideas about ParameterSet: it seems much more opinionated to me than he seems to think it is.
To the extent that I can wrap my head around what Jelte is proposing, and all signs point to that extent being quite limited, I suppose I was thinking of something like his option (2). That is, I assumed that a client would request all the optional features that said client might wish to use at connection startup time. But I can see how that assumption might be somewhat problematic in a connection-pooling environment. Still, at least to me, it seems better than trying to rely on GUC_REPORT. My opinion is (1) GUC_REPORT isn't a particularly well-designed mechanism so I dislike trying to double down on it and (2) trying to mix these protocol-level parameters and the transactional GUCs we have together in a single mechanism seems potentially problematic and (3) I'm still not particularly happy about the idea of making protocol parameters into GUCs in the first place. Perhaps these are all minority positions, but I can't tell you what everyone thinks, only what I think. -- Robert Haas EDB: http://www.enterprisedb.com
Re: problems with "Shared Memory and Semaphores" section of docs
On Fri, May 17, 2024 at 12:48:37PM -0500, Nathan Bossart wrote: > On Fri, May 17, 2024 at 01:09:55PM -0400, Tom Lane wrote: >> Nathan Bossart writes: >>> At a bare minimum, we should probably fix the obvious problems, but I >>> wonder if we could simplify this section a bit, too. >> >> Yup. "The definition of insanity is doing the same thing over and >> over and expecting different results." Time to give up on documenting >> these things in such detail. Anybody who really wants to know can >> look at the source code. > > Cool. I'll at least fix the back-branches as-is, but I'll see about > revamping this stuff for v18. Attached is probably the absolute least we should do for the back-branches. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From 853104fa59bb1c219f02f71ece0d5106cb6c0588 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Fri, 17 May 2024 14:17:59 -0500 Subject: [PATCH v1 1/1] fix kernel resources docs on back-branches --- doc/src/sgml/runtime.sgml | 9 + 1 file changed, 5 insertions(+), 4 deletions(-) diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml index 6047b8171d..883a849e6f 100644 --- a/doc/src/sgml/runtime.sgml +++ b/doc/src/sgml/runtime.sgml @@ -781,13 +781,13 @@ psql: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed: No such SEMMNI Maximum number of semaphore identifiers (i.e., sets) -at least ceil((max_connections + autovacuum_max_workers + max_wal_senders + max_worker_processes + 5) / 16) plus room for other applications +at least ceil((max_connections + autovacuum_max_workers + max_wal_senders + max_worker_processes + 7) / 16) plus room for other applications SEMMNS Maximum number of semaphores system-wide -ceil((max_connections + autovacuum_max_workers + max_wal_senders + max_worker_processes + 5) / 16) * 17 plus room for other applications +ceil((max_connections + autovacuum_max_workers + max_wal_senders + max_worker_processes + 7) / 16) * 17 plus room for other applications @@ -838,7 
+838,8 @@ psql: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed: No such When using System V semaphores, PostgreSQL uses one semaphore per allowed connection (), allowed autovacuum worker process -() and allowed background +(), allowed WAL sender process +(), and allowed background process (), in sets of 16. Each such set will also contain a 17th semaphore which contains a magic @@ -852,7 +853,7 @@ psql: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed: No such linkend="sysvipc-parameters"/>). The parameter SEMMNI determines the limit on the number of semaphore sets that can exist on the system at one time. Hence this parameter must be at -least ceil((max_connections + autovacuum_max_workers + max_wal_senders + max_worker_processes + 5) / 16). +least ceil((max_connections + autovacuum_max_workers + max_wal_senders + max_worker_processes + 7) / 16). Lowering the number of allowed connections is a temporary workaround for failures, which are usually confusingly worded No space -- 2.25.1
Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.
> On May 17, 2024, at 11:51 AM, Pavel Borisov wrote: > > Amcheck with checkunique option does check uniqueness violation between > pages. But it doesn't warranty detection of cross page uniqueness violations > in extremely rare cases when the first equal index entry on the next page > corresponds to tuple that is not visible (e.g. dead). In this, I followed the > Peter's notion [1] that checking across a number of dead equal entries that > could theoretically span even across many pages is an unneeded code > complication and amcheck is not a tool that provides any warranty when > checking an index. This confuses me a bit. The regression test creates a table and index but never performs any DELETE nor any UPDATE operations, so none of the index entries should be dead. If I am understanding you correctly, I'd be forced to conclude that the uniqueness checking code is broken. Can you take a look? — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Comments on Custom RMGRs
On Fri, Mar 29, 2024 at 1:09 PM Jeff Davis wrote: > I am fine with this. > > You've moved the discussion forward in two ways: > > 1. Changes to pg_stat_statements to actually use the API; and > 2. The hook is called at multiple points. > > Those at least partially address the concerns raised by Andres and > Robert. But given that there was pushback from multiple people on the > feature, I'd like to hear from at least one of them. It's very late in > the cycle so I'm not sure we'll get more feedback in time, though. In my seemingly-neverending pass through the July CommitFest, I reached this patch. My comment is: it's possible that rmgr_003.v3.patch is enough to be useful, but does anyone in the world think they know that for a fact? I mean, pgss_001.v1.patch purports to demonstrate that it can be used, but that's based on rmgr_003.v2.patch, not the v3 patch, and the emails seem to indicate that it may not actually work. I also think, looking at it, that it looks much more like a POC than something we'd consider ready for commit. It also seems very unclear that we'd want pg_stat_statements to behave this way, and indeed "this way" isn't really spelled out anywhere. I think it would be nice if we had an example that uses the proposed hook that we could actually commit. Maybe that's asking too much, though. I think the minimum thing we need is a compelling rationale for why this particular hook design is going to be good enough. That could be demonstrated by means of (1) a well-commented example that accomplishes some understandable goal and/or (2) a detailed description of how a non-core table AM or index AM is expected to be able to make use of this. Bonus points if the person providing that rationale can say credibly that they've actually implemented this and it works great with 100TB of production data. The problem here is not only that we don't want to commit a hook that does nothing useful. 
We also don't want to commit a hook that works wonderfully for someone but we have no idea why. If we do that, then we don't know whether it's OK to modify the hook in the future as the code evolves, or more to the point, which kinds of modifications will be acceptable. And also, the next person who wants to use it is likely to have to figure out all on their own how to do so, instead of being able to refer to comments or documentation or the commit message or at least a mailing list post. My basic position is not that this patch is a bad idea, but that it isn't really finished. The idea is probably a pretty good one, but whether this is a reasonable implementation of the idea doesn't seem clear, at least not to me. -- Robert Haas EDB: http://www.enterprisedb.com
Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.
Hi, Mark! > The documentation in > https://www.postgresql.org/docs/devel/amcheck.html#AMCHECK-FUNCTIONS is > ambiguous: > > "bt_index_check does not verify invariants that span child/parent > relationships, but will verify the presence of all heap tuples as index > tuples within the index when heapallindexed is true. When checkunique is > true bt_index_check will check that no more than one among duplicate > entries in unique index is visible. When a routine, lightweight test for > corruption is required in a live production environment, using > bt_index_check often provides the best trade-off between thoroughness of > verification and limiting the impact on application performance and > availability." > > The second sentence, "When checkunique is true bt_index_check will check > that no more than one among duplicate entries in unique index is visible." > is not strictly true, as it won't check if the violation spans a page > boundary. > Amcheck with the checkunique option does check uniqueness violations between pages. But it doesn't guarantee detection of cross-page uniqueness violations in the extremely rare case where the first equal index entry on the next page corresponds to a tuple that is not visible (e.g. dead). In this, I followed Peter's notion [1] that checking across a number of dead equal entries that could theoretically span even many pages is unneeded code complication, and that amcheck is not a tool that provides any guarantee when checking an index. I'm not against modifying the docs in any way that clarifies the exact usage and limitations. Kind regards, Pavel Borisov [1] https://www.postgresql.org/message-id/CAH2-Wz%3DttG__BTZ-r5ccopBRb5evjg%3DzsF_o_3C5h4zRBA_LjQ%40mail.gmail.com
Re: problems with "Shared Memory and Semaphores" section of docs
>>> If the exact values >>> are important, maybe we could introduce more GUCs like >>> shared_memory_size_in_huge_pages that can be consulted (instead of >>> requiring users to break out their calculators). >> >> I don't especially like shared_memory_size_in_huge_pages, and I don't >> want to introduce more of those. GUCs are not the right way to expose >> values that you can't actually set. (Yeah, I'm guilty of some of the >> existing ones like that, but it's still not a good thing.) Maybe it's >> time to introduce a system view for such things? It could be really >> simple, with name and value, or we could try to steal some additional >> ideas such as units from pg_settings. I have always found some of the preset GUCs [1] to be useful for writing SQL queries used by DBAs, particularly block_size, wal_block_size, server_version and server_version_num. > The advantage of the GUC is that its value could be seen before trying to > actually start the server. Only if there is a sample in the postgresql.conf file, right? A GUC like shared_memory_size_in_huge_pages will not have one. [1] https://www.postgresql.org/docs/current/runtime-config-preset.html Regards, Sami
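The preset GUCs mentioned above are script-friendly precisely because their encodings are stable. As a small sketch (the version value below is a made-up example, not one fetched from a live server):

```python
# server_version_num encodes the server version as major * 10000 + minor
# (for PostgreSQL 10 and later), so version checks in DBA scripts become
# plain integer comparisons. 160002 is an illustrative value only.
server_version_num = 160002
major, minor = divmod(server_version_num, 10000)
print(major, minor)                   # 16 2
assert server_version_num >= 150000   # e.g. "running at least PG 15?"
```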
Re: Postgres and --config-file option
Robert Haas writes: > If the reason that somebody is upset is because it's not technically > true to say that you *must* do one of those things, we could fix that > with "You must" -> "You can" or with "You must specify" -> "Specify". > The patch you propose is also not terrible or anything, but it goes in > the direction of listing every alternative, which will become > unpalatable as soon as somebody adds one more way to do it, or maybe > it's unpalatable already. I agree that it's not necessary or particularly useful for this hint to be exhaustive. I could get behind your suggestion of s/You must specify/Specify/, but I also think it's fine to do nothing. regards, tom lane
Re: Postgres and --config-file option
On Thu, May 16, 2024 at 4:57 AM Aleksander Alekseev wrote: > I propose my original v1 patch for correcting the --help output of > 'postgres' too. I agree with the above comments that corresponding > changes in v4 became somewhat unwieldy. So, who is it exactly that will be confused by the status quo? I mean, let's say you get this error: postgres does not know where to find the server configuration file. You must specify the --config-file or -D invocation option or set the PGDATA environment variable. As I see it, either you know how it works and just made a mistake this time, or you are a beginner. If it's the former, the fact that the error message doesn't mention every possible way of solving the problem does not matter, because you already know how to fix your mistake. If it's the latter, you don't need to know *every* way to fix the problem. You just need to know *one* way to fix the problem. I don't really understand why somebody would look at the existing message and say "gosh, it didn't tell me that I could also use -c!". If you already know that, don't you just ignore the hint and get busy with fixing the problem? If the reason that somebody is upset is because it's not technically true to say that you *must* do one of those things, we could fix that with "You must" -> "You can" or with "You must specify" -> "Specify". The patch you propose is also not terrible or anything, but it goes in the direction of listing every alternative, which will become unpalatable as soon as somebody adds one more way to do it, or maybe it's unpalatable already. Even if we don't do that, I don't see that there's a huge problem here. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Comments about TLS (no SSLRequest) and ALPN
On Sat, 11 May 2024 at 15:36, AJ ONeal wrote: > Having postgres TLS/SNI/ALPN routable by default will just be more intuitive > (it's what I assumed would have been the default anyway), and help increase > adoption in cloud, enterprise, and other settings. Routing is primarily a feature for "cloud-first" deployments, i.e. things like Kubernetes or equivalent. I don't think people deploying to their own metal or on their own network often need this kind of feature today. Of course we don't know what the future holds and it could well become more common. In that context I think it's clear that user-oriented tools like psql shouldn't change their default behaviour. They need the maximum flexibility of being able to negotiate plain-text and GSSAPI connections if possible. Only applications deployed by the same cloud-environment tooling that deploys the database and SSL proxies will know where direct SSL connections are necessary. > We live in the world of ACME / Let's Encrypt / ZeroSSL. Many proxies have > that built in. As such optimizing for unverified TLS takes the user down a > path that's just more difficult to begin with (it's easier to get a valid TLS > cert than it is to get a self-signed cert these days), and more nuanced > (upcoming implementors are accustomed to TLS being verified). It's easy to > document how to use the letsencrypt client with postgres. It will also be > increasingly easy to configure an ACME-enabled proxy for postgres and not > worry about it in the server at all. I tend to agree that it would be good for our documentation and install scripts to assume letsencrypt certs can be requested. That said, there are still a lot of database environments that are not on networks that can reach internet services directly without special firewall or routing rules set up.
> Allow the user to specify ALPN > > I don't think this is particularly controversial or nuanced, so I don't have > much to say here - most TLS tools allow the user to specify ALPN for the same > reason they allow specifying the port number - either for privacy, > security-by-obscurity, or navigating some form of application or user routing. I think I need a citation before I believe this. I can't imagine it makes sense for anything other than general purpose TLS testing tools to allow arbitrary protocol names. It seems like something that would be mostly useful for pentesting or regression tests. But for actual deployed applications it doesn't make any sense to me. -- greg
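For context on what "specifying ALPN" means mechanically: a TLS client sends a list of protocol names in its ClientHello and the server picks one (RFC 7301). A minimal sketch of the client side in Python's standard ssl module; the token "postgresql" is used purely illustratively here, and is not assumed to be the final name the proposal settles on:

```python
import ssl

# Build a client-side TLS context and advertise an ALPN protocol list.
# The handshake itself is not performed here; this only shows where the
# protocol name is specified on the client.
ctx = ssl.create_default_context()
ctx.set_alpn_protocols(["postgresql"])  # raises if a name is malformed
```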
Re: problems with "Shared Memory and Semaphores" section of docs
On Fri, May 17, 2024 at 01:09:55PM -0400, Tom Lane wrote: > Nathan Bossart writes: >> [ many, many problems in documented formulas ] > >> At a bare minimum, we should probably fix the obvious problems, but I >> wonder if we could simplify this section a bit, too. > > Yup. "The definition of insanity is doing the same thing over and > over and expecting different results." Time to give up on documenting > these things in such detail. Anybody who really wants to know can > look at the source code. Cool. I'll at least fix the back-branches as-is, but I'll see about revamping this stuff for v18. >> If the exact values >> are important, maybe we could introduce more GUCs like >> shared_memory_size_in_huge_pages that can be consulted (instead of >> requiring users to break out their calculators). > > I don't especially like shared_memory_size_in_huge_pages, and I don't > want to introduce more of those. GUCs are not the right way to expose > values that you can't actually set. (Yeah, I'm guilty of some of the > existing ones like that, but it's still not a good thing.) Maybe it's > time to introduce a system view for such things? It could be really > simple, with name and value, or we could try to steal some additional > ideas such as units from pg_settings. The advantage of the GUC is that its value could be seen before trying to actually start the server. I don't dispute that it's not the right way to surface this information, though. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Internal error codes triggered by tests
On Tue, Apr 23, 2024 at 12:55 AM Michael Paquier wrote: > Thoughts? The patch as proposed seems fine. Marking RfC. -- Robert Haas EDB: http://www.enterprisedb.com
Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.
> On May 17, 2024, at 3:11 AM, Alexander Korotkov wrote: > > On Mon, May 13, 2024 at 4:42 AM Alexander Korotkov > wrote: >> On Mon, May 13, 2024 at 12:23 AM Alexander Korotkov >> wrote: >>> On Sat, May 11, 2024 at 4:13 AM Mark Dilger >>> wrote: > On May 10, 2024, at 12:05 PM, Alexander Korotkov > wrote: > The only bt_target_page_check() caller is > bt_check_level_from_leftmost(), which overrides state->target in the > next iteration anyway. I think the patch is just refactoring to > eliminate the confusion pointed out by Peter Geoghegan upthread. I find your argument unconvincing. After bt_target_page_check() returns at line 919, and before bt_check_level_from_leftmost() overrides state->target in the next iteration, bt_check_level_from_leftmost() conditionally fetches an item from the page referenced by state->target. See line 963. I'm left with four possibilities: 1) bt_target_page_check() never gets to the code that uses "rightpage" rather than "state->target" in the same iteration where bt_check_level_from_leftmost() conditionally fetches an item from state->target, so the change you're making doesn't matter. 2) The code prior to v2-0003 was wrong, having changed state->target in an inappropriate way, causing the wrong thing to happen at what is now line 963. The patch fixes the bug, because state->target no longer gets overwritten where you are now using "rightpage" for the value. 3) The code used to work, having set up state->target correctly in the place where you are now using "rightpage", but v2-0003 has broken that. 4) It's been broken all along and your patch just changes from wrong to wrong. If you believe (1) is true, then I'm complaining that you are relying far too much on action at a distance, and that you are not documenting it. Even with documentation of this interrelationship, I'd be unhappy with how brittle the code is. I cannot easily discern that the two don't ever happen in the same iteration, and I'm not at all convinced one way or the other.
I tried to set up some Asserts about that, but none of the test cases actually reach the new code, so adding Asserts doesn't help to investigate the question. If (2) is true, then I'm complaining that the commit message doesn't mention the fact that this is a bug fix. Bug fixes should be clearly documented as such, otherwise future work might assume the commit can be reverted with only stylistic consequences. If (3) is true, then I'm complaining that the patch is flat busted. If (4) is true, then maybe we should revert the entire feature, or have a discussion of mitigation efforts that are needed. Regardless of which of 1..4 you pick, I think it could all do with more regression test coverage. For reference, I said something similar earlier today in another email to this thread: This patch introduces a change that stores a new page into variable "rightpage" rather than overwriting "state->target", which the old implementation most certainly did. That means that after returning from bt_target_page_check() into the calling function bt_check_level_from_leftmost() the value in state->target is not what it would have been prior to this patch. Now, that'd be irrelevant if nobody goes on to consult that value, but just 44 lines further down in bt_check_level_from_leftmost() state->target is clearly used. So the behavior at that point is changing between the old and new versions of the code, and I think I'm within reason to ask if it was wrong before the patch, wrong after the patch, or something else? Is this a bug being introduced, being fixed, or ... ? >>> >>> Thank you for your analysis. I'm inclined to believe in 2, but not >>> yet completely sure. It's really pity that our tests don't cover >>> this. I'm investigating this area. >> >> It seems that I got to the bottom of this. Changing >> BtreeCheckState.target for a cross-page unique constraint check is >> wrong, but that happens only for leaf pages. 
After that >> BtreeCheckState.target is only used for setting the low key. The low >> key is only used for non-leaf pages. So, that didn't lead to any >> visible bug. I've revised the commit message to reflect this. >> >> So, the picture for the patches is the following now. >> 0001 – optimization, but rather simple and giving huge effect >> 0002 – refactoring >> 0003 – fix for the bug >> 0004 – better error reporting > > I think the thread contains enough motivation on why 0002, 0003 and > 0004 are material for post-FF. They are fixes and refactoring for > new-in-v17 feature. I'm going to push them if no objections.
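The aliasing hazard at the core of possibilities (1)-(4) above can be illustrated outside PostgreSQL with a minimal, purely illustrative sketch: a checker that overwrites a shared state field changes what a later caller observes, whereas keeping a local variable does not. None of the names below are amcheck identifiers; they only mirror the shape of the problem described in the thread.

```python
class State:
    """Stand-in for a shared check-state struct with a 'target' field."""
    def __init__(self):
        self.target = "leaf-page-1"

def check_overwriting(state):
    # Old behavior: clobbers the caller's view of the target page.
    state.target = "right-sibling"

def check_local(state):
    # Patched behavior: the sibling page lives in a local variable,
    # leaving state.target untouched for the caller.
    rightpage = "right-sibling"
    return rightpage

s1, s2 = State(), State()
check_overwriting(s1)
check_local(s2)
print(s1.target)  # right-sibling  (caller now sees the sibling page)
print(s2.target)  # leaf-page-1    (caller's view unchanged)
```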
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Fri, May 17, 2024 at 3:15 AM Robert Haas wrote: > > OK, so you made it so that compressed data is fully self-identifying. > Hence, there's no need to worry if something gets changed: the > receiver, if properly implemented, can't help but notice. The only > downside that I can see to this design is that you only have one byte > to identify the compression algorithm, but that doesn't actually seem > like a real problem at all, because I expect the number of supported > compression algorithms to grow very slowly. I think it would take > centuries, possibly millennia, before we started to get short of > identifiers. So, cool. > > But, in your system, how does the client ask the server to switch to a > different compression algorithm, or to restart the compression stream? I was leaving the details of triggering that to this conversation, and in that patch just designing the messages in a way that would allow the reconfiguration/restarting to be added easily, in a backwards-compatible way, in a future patch. I would imagine that an explicit `ParameterSet` call that sets `_pq_.connection_compression` (or whatever the implementation details turn out to be) would also trigger a compressor restart, and when restarted it would pick an algorithm/configuration based on the new value of the parameter rather than the one used at connection establishment. -- Jacob Burroughs | Staff Software Engineer E: jburrou...@instructure.com
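The "self-identifying" framing Robert describes, with a single leading byte naming the compression algorithm, can be sketched as follows. This is an illustration of the idea only: the byte values, helper names, and use of zlib are made up for the example and are not the patch's actual wire format.

```python
import zlib

# Hypothetical one-byte algorithm identifiers (not the real protocol values).
ALG_NONE = 0x00
ALG_ZLIB = 0x01

def frame(payload: bytes, alg: int) -> bytes:
    """Prefix the (possibly compressed) payload with its algorithm byte."""
    if alg == ALG_ZLIB:
        payload = zlib.compress(payload)
    return bytes([alg]) + payload

def unframe(msg: bytes) -> bytes:
    """Dispatch on the leading byte; an unknown id is detected immediately."""
    alg, payload = msg[0], msg[1:]
    if alg == ALG_NONE:
        return payload
    if alg == ALG_ZLIB:
        return zlib.decompress(payload)
    raise ValueError(f"unknown compression algorithm id {alg}")

data = b"SELECT 1;" * 100
assert unframe(frame(data, ALG_ZLIB)) == data
assert unframe(frame(data, ALG_NONE)) == data
```

Because every message carries its algorithm byte, the sender can switch algorithms mid-stream and a properly implemented receiver cannot fail to notice, which is the property being discussed.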
Re: broken tables on hot standby after migration on PostgreSQL 16 (3x times last month)
On Fri, May 17, 2024 at 1:18 PM Pavel Stehule wrote: > On Fri, May 17, 2024 at 6:02 PM, Peter Geoghegan wrote: >> You've shown an inconsistency between the primary and standby with >> respect to the heap tuple infomask bits related to freezing. It looks >> like a FREEZE WAL record from the primary was never replayed on the >> standby. > > > I think it is possible that the broken tuples were created before the > upgrade from Postgres 15 to Postgres 16 - not long before, so this bug > could be a side effect of the upgrade I half suspected something like that myself. Maybe the problem happened *during* the upgrade, even. There have been historical bugs affecting pg_upgrade and relfrozenxid. Commit 74cf7d46 is one good example from only a few years ago. -- Peter Geoghegan
Re: broken tables on hot standby after migration on PostgreSQL 16 (3x times last month)
On Fri, May 17, 2024 at 6:02 PM, Peter Geoghegan wrote: > On Fri, May 17, 2024 at 9:13 AM Pavel Stehule > wrote: > > after migrating to PostgreSQL 16 I have seen broken tables on replica > nodes three times (about once a week). The query fails with the error > > > > ERROR: could not access status of transaction 1442871302 > > DETAIL: Could not open file "pg_xact/0560": No such file or directory > > You've shown an inconsistency between the primary and standby with > respect to the heap tuple infomask bits related to freezing. It looks > like a FREEZE WAL record from the primary was never replayed on the > standby. > I think it is possible that the broken tuples were created before the upgrade from Postgres 15 to Postgres 16 - not long before, so this bug could be a side effect of the upgrade > > It's natural for me to wonder if my Postgres 16 work on page-level > freezing might be a factor here. If that really was true, then it > would be necessary to explain why the primary and standby are > inconsistent (no reason to suspect a problem on the primary here). > It'd have to be the kind of issue that could be detected mechanically > using wal_consistency_checking, but wasn't detected that way before > now -- that seems unlikely. > > It's worth considering if the more aggressive behavior around > relfrozenxid advancement (in 15) and freezing (in 16) has increased > the likelihood of problems like these in setups that were already > faulty, in whatever way. The standby database is indeed corrupt, but > even on 16 it's fairly isolated corruption in practical terms. The > full extent of the problem is clear once amcheck is run, but only one > tuple can actually cause the system to error due to the influence of > hint bits (for better or worse, hint bits mask the problem quite well, > even on 16). > > -- > Peter Geoghegan >
Re: commitfest.postgresql.org is no longer fit for purpose
On Fri, May 17, 2024 at 7:11 AM Tomas Vondra wrote: > It does seem to me a part of the solution needs to be helping to get > those patches reviewed. I don't know how to do that, but perhaps there's > a way to encourage people to review more stuff, or review stuff from a > wider range of contributors. Say by treating reviews more like proper > contributions. This is a huge problem. I've been in the situation before where I had some cycles to do a review, but actually finding a patch to review is super-difficult. You simply cannot tell without clicking on the link and wading through the email thread. Granted, it's easy as an occasional reviewer to simply disregard potential patches if the email thread is over a certain size, but it's still a lot of work. Having some sort of summary/status field would be great, even if not everything was labelled. It would also be nice if simpler patches were NOT picked up by experienced hackers, as we want to encourage new/inexperienced people, and having some "easy to review" patches available will help people gain confidence and grow. > A long time ago there was a "rule" that people submitting patches are > expected to do reviews. Perhaps we should be more strict about this. > Big -1. How would we even be more strict about this? Public shaming? Withholding a commit? Cheers, Greg
Re: problems with "Shared Memory and Semaphores" section of docs
Nathan Bossart writes: > [ many, many problems in documented formulas ] > At a bare minimum, we should probably fix the obvious problems, but I > wonder if we could simplify this section a bit, too. Yup. "The definition of insanity is doing the same thing over and over and expecting different results." Time to give up on documenting these things in such detail. Anybody who really wants to know can look at the source code. > If the exact values > are important, maybe we could introduce more GUCs like > shared_memory_size_in_huge_pages that can be consulted (instead of > requiring users to break out their calculators). I don't especially like shared_memory_size_in_huge_pages, and I don't want to introduce more of those. GUCs are not the right way to expose values that you can't actually set. (Yeah, I'm guilty of some of the existing ones like that, but it's still not a good thing.) Maybe it's time to introduce a system view for such things? It could be really simple, with name and value, or we could try to steal some additional ideas such as units from pg_settings. regards, tom lane
Re: remove Todo item: Allow infinite intervals just like infinite timestamps
On Fri, May 17, 2024 at 9:12 AM jian he wrote: > https://wiki.postgresql.org/wiki/Todo > Dates and Times[edit] > Allow infinite intervals just like infinite timestamps > https://www.postgresql.org/message-id/4eb095c8.1050...@agliodbs.com > > this done at > https://git.postgresql.org/cgit/postgresql.git/commit/?id=519fc1bd9e9d7b408903e44f55f83f6db30742b7 > > Should we remove this item? If the item is done, you should edit the wiki page and mark it that way. Note this note at the top of the page: [D] Completed item - marks changes that are done, and will appear in the PostgreSQL 17 release. Note that the Todo list is not a very good Todo list and most people don't use it to find projects (and haven't for a long time). So it may not get updated, or consulted, very often. -- Robert Haas EDB: http://www.enterprisedb.com
problems with "Shared Memory and Semaphores" section of docs
(moving to a new thread) On Thu, May 16, 2024 at 09:16:46PM -0500, Nathan Bossart wrote: > On Thu, May 16, 2024 at 04:37:10PM +, Imseih (AWS), Sami wrote: >> Also, not sure if I am mistaken here, but the "+ 5" in the existing docs >> seems wrong. >> >> If it refers to NUM_AUXILIARY_PROCS defined in >> include/storage/proc.h, it should be a "6" >> >> #define NUM_AUXILIARY_PROCS 6 >> >> This is not a consequence of this patch, and can be dealt with >> in a separate thread if my understanding is correct. > > Ha, I think it should actually be "+ 7"! The value is calculated as > > MaxConnections + autovacuum_max_workers + 1 + max_worker_processes + > max_wal_senders + 6 > > Looking at the history, this documentation tends to be wrong quite often. > In v9.2, the checkpointer was introduced, and these formulas were not > updated. In v9.3, background worker processes were introduced, and the > formulas were still not updated. Finally, in v9.6, it was fixed in commit > 597f7e3. Then, in v14, the archiver process was made an auxiliary process > (commit d75288f), making the formulas out-of-date again. And in v17, the > WAL summarizer was added. > > On top of this, IIUC you actually need even more semaphores if your system > doesn't support atomics, and from a quick skim this doesn't seem to be > covered in this documentation. A couple of other problems I noticed: * max_wal_senders is missing from this sentence: When using System V semaphores, PostgreSQL uses one semaphore per allowed connection (), allowed autovacuum worker process () and allowed background process (), in sets of 16. * AFAICT the discussion about the formulas in the paragraphs following the table doesn't explain the reason for the constant.
* IMHO the following sentence is difficult to decipher, and I can't tell if it actually matches the formula in the table: The maximum number of semaphores in the system is set by SEMMNS, which consequently must be at least as high as max_connections plus autovacuum_max_workers plus max_wal_senders, plus max_worker_processes, plus one extra for each 16 allowed connections plus workers (see the formula in ). At a bare minimum, we should probably fix the obvious problems, but I wonder if we could simplify this section a bit, too. If the exact values are important, maybe we could introduce more GUCs like shared_memory_size_in_huge_pages that can be consulted (instead of requiring users to break out their calculators). -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
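Taking the corrected formula quoted upthread at face value, the SEMMNS arithmetic works out as follows. The GUC values below are illustrative defaults (an assumption for the example), and the "one extra for each 16 allowed connections plus workers" rule is the one stated in the docs excerpt above:

```python
import math

# Illustrative GUC values (roughly PostgreSQL defaults; assumptions here).
max_connections = 100
autovacuum_max_workers = 3
max_worker_processes = 8
max_wal_senders = 10
NUM_AUXILIARY_PROCS = 6  # per include/storage/proc.h, as quoted upthread

# Process count per the corrected formula:
# MaxConnections + autovacuum_max_workers + 1 + max_worker_processes
#   + max_wal_senders + 6
procs = (max_connections + autovacuum_max_workers + 1
         + max_worker_processes + max_wal_senders + NUM_AUXILIARY_PROCS)

# System V semaphores are allocated in sets of 16, with one extra per set,
# so SEMMNS must be at least the process count plus one per 16.
semmns = procs + math.ceil(procs / 16)
print(procs, semmns)  # 128 136
```

This is exactly the kind of calculator work that a system view (or GUC) exposing the computed value would spare users.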
Re: Why is citext/regress failing on hamerkop?
TAKATSUKA Haruka writes: > I'm a hamerkop maintainer. > Sorry I have missed the scm error for so long. > Today I switched scmrepo from git.postgresql.org/git/postgresql.git > to github.com/postgres/postgres.git and successfully modernized > the build target code. Thanks very much! I see hamerkop has gone green in HEAD. It looks like it succeeded in v13 too but failed in v12, which suggests that the isolationcheck problem is intermittent, which is not too surprising given our current theory about what's causing that. At this point I think we are too close to the 17beta1 release freeze to mess with it, but I'd support pushing Thomas' proposed patch after the freeze is over. regards, tom lane
Re: commitfest.postgresql.org is no longer fit for purpose
"David G. Johnston" writes: > On Friday, May 17, 2024, Joe Conway wrote: >> A solution to both of these issues (yours and mine) would be to allow >> things to be added *during* the CF month. What is the point of having a >> "freeze" before every CF anyway? Especially if they start out clean. If >> something is ready for review on day 8 of the CF, why not let it be added >> for review? > In conjunction with WIP removing this limitation on the bimonthlies makes > sense to me. I think that the original motivation for this was two-fold: 1. A notion of fairness, that you shouldn't get your patch reviewed ahead of others that have been waiting (much?) longer. I'm not sure how much this is really worth. In particular, even if we do delay an incoming patch by one CF, it's still going to compete with the older stuff in future CFs. There's already a sort feature in the CF dashboard whereby patches that have lingered for more CFs appear ahead of patches that are newer, so maybe just ensuring that late-arriving patches sort as "been around for 0 CFs" is sufficient. 2. As I mentioned a bit ago, the original idea was that we didn't exit a CF until it was empty of un-handled patches, so obviously allowing new patches to come in would mean we'd never get to empty. That idea didn't work and we don't think that way anymore. So yeah, I'm okay with abandoning the must-submit-to-a-future-CF restriction. regards, tom lane
RE: Proposal for Updating CRC32C with AVX-512 Algorithm.
Hi, forgive the top-post, but I have not seen any response to this post? Thanks, Paul > -Original Message- > From: Amonson, Paul D > Sent: Wednesday, May 1, 2024 8:56 AM > To: pgsql-hackers@lists.postgresql.org > Cc: Nathan Bossart ; Shankaran, Akash > > Subject: Proposal for Updating CRC32C with AVX-512 Algorithm. > > Hi, > > Comparing the current SSE4.2 implementation of the CRC32C algorithm in > Postgres to an optimized AVX-512 algorithm [0], we observed significant > gains. The result was a ~6.6x average multiplier of increased performance > measured on 3 different Intel products. Details below. The AVX-512 algorithm > in C is a port of the ISA-L library [1] assembler code. > > Workload call size distribution details (write heavy): >* Average was approximately 1,010 bytes per call >* ~80% of the calls were under 256 bytes >* ~20% of the calls were greater than or equal to 256 bytes, up to the max > buffer size of 8192 > > The 256-byte threshold is important because if the buffer is smaller, it makes sense > to fall back to the existing implementation. This is because the AVX-512 algorithm > needs a minimum of 256 bytes to operate. > > Using the above workload data distribution, > at 0% of calls < 256 bytes, an 841% improvement on average for crc32c > functionality was observed. > at 50% of calls < 256 bytes, a 758% improvement on average for crc32c > functionality was observed. > at 90% of calls < 256 bytes, a 44% improvement on average for crc32c > functionality was observed. > at 97.6% of calls < 256 bytes, the workload's crc32c performance breaks even. > at 100% of calls < 256 bytes, a 14% regression is seen when using the AVX-512 > implementation. > > The results above are averages over 3 machines, and were measured on: Intel > Sapphire Rapids bare metal, and using EC2 on AWS cloud: Intel Sapphire Rapids > (m7i.2xlarge) and Intel Ice Lake (m6i.2xlarge).
> Summary Data (Sapphire Rapids bare metal, AWS m7i-2xl, and AWS m6i-2xl):
>
>  Rates in Bytes/us   |    Bare Metal     |    AWS m6i-2xl    |    AWS m7i-2xl    |  Overall
>  (larger is better)  | SSE 4.2 | AVX-512 | SSE 4.2 | AVX-512 | SSE 4.2 | AVX-512 | Multiplier
>  --------------------+---------+---------+---------+---------+---------+---------+-----------
>  Numbers 256-8192    |  12,046 |  83,196 |   7,471 |  39,965 |  11,867 |  84,589 |    6.62
>  Numbers 64-255      |  16,865 |  15,909 |   9,209 |   7,363 |  12,496 |  10,046 |    0.86
>  Weighted Multiplier [*]                                                         |    1.44
>
> There was no evidence of AVX-512 frequency throttling from perf data, which > stayed steady during the test. > > Feedback on this proposed improvement is appreciated. Some questions: > 1) This AVX-512 ISA-L derived code uses the BSD-3 license [2]. Is this compatible > with the PostgreSQL License [3]? They both appear to be very permissive > licenses, but I am not an expert on licenses. > 2) Is there a preferred benchmark I should run to test this change? > > If licensing is a non-issue, I can post the initial patch along with my > Postgres benchmark function patch for further review. > > Thanks, > Paul > > [0] https://www.researchgate.net/publication/263424619_Fast_CRC_computation#full-text > [1] https://github.com/intel/isa-l > [2] https://opensource.org/license/bsd-3-clause > [3] https://opensource.org/license/postgresql > > [*] Weights used were 90% of requests less than 256 bytes, 10% greater than > or equal to 256 bytes.
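The weighted multiplier and the break-even point in the quoted proposal follow directly from the two per-range multipliers in the table; a quick check of the arithmetic (numbers taken from the message above):

```python
# AVX-512 vs SSE 4.2 throughput multipliers from the summary table.
small = 0.86   # calls of 64-255 bytes (AVX-512 is a slight regression)
large = 6.62   # calls of 256-8192 bytes

# Stated workload mix: 90% of calls < 256 bytes, 10% >= 256 bytes.
weighted = 0.90 * small + 0.10 * large
print(round(weighted, 2))   # 1.44, matching the table's weighted multiplier

# Stated break-even mix: 97.6% of calls < 256 bytes.
breakeven = 0.976 * small + 0.024 * large
print(round(breakeven, 3))  # ~1.0, matching the stated break-even point
```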
Re: commitfest.postgresql.org is no longer fit for purpose
On Fri, 17 May 2024 at 17:59, Robert Haas wrote: > If they > get attention, it's much more likely to be because somebody saw their > email and wrote back than it is to be because somebody went through > the CommitFest and found their entry and was like "oh, I should review > this". I think this is an important insight. I used the commitfest app to find patches to review when I was just starting out in postgres development, since I didn't subscribe to all of pgsql-hackers yet. I do subscribe nowadays, but now I rarely look at the commitfest app; instead I skim the titles of emails that go into my "Postgres" folder in my mailbox. Going from such an email to a commitfest entry is nearly impossible (at least I don't know how to do this efficiently). I guess I'm not the only one doing this. > Honestly, if we get to a situation where a patch author is sad > because they have to click a link every 2 months to say "yeah, I'm > still here, please review my patch," we've already lost the game. That > person isn't sad because we asked them to click a link. They're sad > it's already been N * 2 months and nothing has happened. Maybe it wouldn't be so bad for an author to click the 2 months button, if doing so would actually give their patch a higher chance of being reviewed. And given the previous insight, that people don't look at the commitfest app that often, it might be good if pressing this button would also bump the item in people's mailboxes.
Re: commitfest.postgresql.org is no longer fit for purpose
On 5/17/24 11:58, Robert Haas wrote: I realize that many people here are (rightly!) concerned with burdening patch authors with more steps that they have to follow. But the current system is serving new patch authors very poorly. If they get attention, it's much more likely to be because somebody saw their email and wrote back than it is to be because somebody went through the CommitFest and found their entry and was like "oh, I should review this". Honestly, if we get to a situation where a patch author is sad because they have to click a link every 2 months to say "yeah, I'm still here, please review my patch," we've already lost the game. That person isn't sad because we asked them to click a link. They're sad it's already been N * 2 months and nothing has happened. +many -- Joe Conway PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: broken tables on hot standby after migration on PostgreSQL 16 (3x times last month)
On Fri, May 17, 2024 at 9:13 AM Pavel Stehule wrote: > after migration on PostgreSQL 16 I seen 3x times (about every week) broken > tables on replica nodes. The query fails with error > > ERROR: could not access status of transaction 1442871302 > DETAIL: Could not open file "pg_xact/0560": No such file or directory You've shown an inconsistency between the primary and standby with respect to the heap tuple infomask bits related to freezing. It looks like a FREEZE WAL record from the primary was never replayed on the standby. It's natural for me to wonder if my Postgres 16 work on page-level freezing might be a factor here. If that really was true, then it would be necessary to explain why the primary and standby are inconsistent (no reason to suspect a problem on the primary here). It'd have to be the kind of issue that could be detected mechanically using wal_consistency_checking, but wasn't detected that way before now -- that seems unlikely. It's worth considering if the more aggressive behavior around relfrozenxid advancement (in 15) and freezing (in 16) has increased the likelihood of problems like these in setups that were already faulty, in whatever way. The standby database is indeed corrupt, but even on 16 it's fairly isolated corruption in practical terms. The full extent of the problem is clear once amcheck is run, but only one tuple can actually cause the system to error due to the influence of hint bits (for better or worse, hint bits mask the problem quite well, even on 16). -- Peter Geoghegan
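The file name in the error message quoted above can be checked arithmetically: with PostgreSQL's default CLOG layout (2 status bits per transaction, 8192-byte pages, 32 pages per SLRU segment file, per clog.c/slru.c), XID 1442871302 does land in segment "0560". A sketch, assuming those default constants:

```python
# Which pg_xact (CLOG) segment file holds a given transaction ID?
# Constants follow PostgreSQL's defaults: 2 status bits per xact
# (4 xacts per byte), 8192-byte pages, 32 pages per SLRU segment.

CLOG_XACTS_PER_BYTE = 4
BLCKSZ = 8192
CLOG_XACTS_PER_PAGE = BLCKSZ * CLOG_XACTS_PER_BYTE   # 32768 xacts per page
SLRU_PAGES_PER_SEGMENT = 32

def pg_xact_segment(xid: int) -> str:
    page = xid // CLOG_XACTS_PER_PAGE
    segment = page // SLRU_PAGES_PER_SEGMENT
    return "%04X" % segment           # segment files are named in hex, e.g. "0560"

# The XID from the error message above maps to the missing file:
print(pg_xact_segment(1442871302))   # 0560
```

This confirms the reported XID and file name are consistent with each other; the file is simply gone on the standby, presumably truncated away after the primary froze the tuples.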
Re: commitfest.postgresql.org is no longer fit for purpose
On Fri, May 17, 2024 at 11:57 AM Tom Lane wrote: > Melanie Plageman writes: > > So, anyway, I'd argue that we need a parking lot for patches which we > > all agree are important and have a path forward but need someone to do > > the last 20-80% of the work. To avoid this being a dumping ground, > > patches should _only_ be allowed in the parking lot if they have a > > clear path forward. Patches which haven't gotten any interest don't go > > there. Patches in which the author has clearly not addressed feedback > > that is reasonable for them to address don't go there. These are > > effectively community TODOs which we agree need to be done -- if only > > someone had the time. > > Hmm. I was envisioning "parking lot" as meaning "this is on my > personal TODO list, and I'd like CI support for it, but I'm not > expecting anyone else to pay attention to it yet". +1. > I think what > you are describing is valuable but different. Maybe call it > "pending" or such? Or invent a different name for the other thing. Yeah, there should be someplace that we keep a list of things that are thought to be important but we haven't gotten around to doing anything about yet, but I think that's different from the parking lot CF. -- Robert Haas EDB: http://www.enterprisedb.com
Re: commitfest.postgresql.org is no longer fit for purpose
On Fri, May 17, 2024 at 11:05 AM Tom Lane wrote: > > We already have gone back to that model. We just haven't admitted it > > yet. And we're never going to get out of it until we find a way to get > > the contents of the CommitFest application down to a more reasonable > > size and level of complexity. There's just no way everyone's up for > > that level of pain. I'm not sure I'm up for that level of pain. > > Yeah, we clearly need to get the patch list to a point of > manageability, but I don't agree that abandoning time-boxed CFs > will improve anything. I'm not sure. Suppose we plotted commits generally, or commits of non-committer patches, or reviews on-list, vs. time. Would we see any uptick in activity during CommitFests? Would it vary by committer? I don't know. I bet the difference wouldn't be as much as Tom Lane would like to see. Realistically, we can't manage how contributors spend their time all that much, and trying to do so is largely tilting at windmills. To me, the value of time-based CommitFests is as a vehicle to ensure freshness, or cleanup, or whatever word you want to use. If you just make a list of things that need attention and keep incrementally updating it, eventually for various reasons that list no longer reflects your current list of priorities. At some point, you have to throw it out and make a new list, or at least that's what always happens to me. We've fallen into a system where the default treatment of a patch is to be carried over to the next CommitFest and CfMs are expected to exert effort to keep patches from getting that default treatment when they shouldn't. But that does not scale. We need a system where things drop off the list unless somebody makes an effort to keep them on the list, and the easiest way to do that is to periodically make a *fresh* list that *doesn't* just clone some previous list. I realize that many people here are (rightly!) concerned with burdening patch authors with more steps that they have to follow. 
But the current system is serving new patch authors very poorly. If they get attention, it's much more likely to be because somebody saw their email and wrote back than it is to be because somebody went through the CommitFest and found their entry and was like "oh, I should review this". Honestly, if we get to a situation where a patch author is sad because they have to click a link every 2 months to say "yeah, I'm still here, please review my patch," we've already lost the game. That person isn't sad because we asked them to click a link. They're sad it's already been N * 2 months and nothing has happened. -- Robert Haas EDB: http://www.enterprisedb.com
Re: commitfest.postgresql.org is no longer fit for purpose
Melanie Plageman writes: > So, anyway, I'd argue that we need a parking lot for patches which we > all agree are important and have a path forward but need someone to do > the last 20-80% of the work. To avoid this being a dumping ground, > patches should _only_ be allowed in the parking lot if they have a > clear path forward. Patches which haven't gotten any interest don't go > there. Patches in which the author has clearly not addressed feedback > that is reasonable for them to address don't go there. These are > effectively community TODOs which we agree need to be done -- if only > someone had the time. Hmm. I was envisioning "parking lot" as meaning "this is on my personal TODO list, and I'd like CI support for it, but I'm not expecting anyone else to pay attention to it yet". I think what you are describing is valuable but different. Maybe call it "pending" or such? Or invent a different name for the other thing. regards, tom lane
Re: commitfest.postgresql.org is no longer fit for purpose
On Fri, May 17, 2024 at 7:40 AM Robert Haas wrote: > > On Fri, May 17, 2024 at 10:31 AM Tom Lane wrote: > > To my mind, the point of the time-boxed commitfests is to provide > > a structure wherein people will (hopefully) pay some actual attention > > to other peoples' patches. Conversely, the fact that we don't have > > one running all the time gives committers some defined intervals > > where they can work on their own stuff without feeling guilty that > > they're not handling other people's patches. > > > > If we go back to the old its-development-mode-all-the-time approach, > > what is likely to happen is that the commit rate for not-your-own- > > patches goes to zero, because it's always possible to rationalize > > your own stuff as being more important. > > We already have gone back to that model. We just haven't admitted it > yet. I've worked on teams that used the short-timebox CF calendar to organize community work, like Tom describes. That was a really positive thing for us. Maybe it feels different from the committer point of view, but I don't think all of the community is operating on the long-timebox model, and I really wouldn't want to see us lengthen the cycles to try to get around the lack of review/organization that's being complained about. (But maybe you're not arguing for that in the first place.) --Jacob
Re: commitfest.postgresql.org is no longer fit for purpose
On Fri, May 17, 2024 at 7:11 AM Tomas Vondra wrote: > > On 5/16/24 23:43, Peter Eisentraut wrote: > > On 16.05.24 23:06, Joe Conway wrote: > >> On 5/16/24 16:57, Jacob Champion wrote: > >>> On Thu, May 16, 2024 at 1:31 PM Joe Conway wrote: > Maybe we should just make it a policy that *nothing* gets moved forward > from commitfest-to-commitfest and therefore the author needs to care > enough to register for the next one? > >>> > >>> I think that's going to severely disadvantage anyone who doesn't do > >>> this as their day job. Maybe I'm bristling a bit too much at the > >>> wording, but not having time to shepherd a patch is not the same as > >>> not caring. > >> > >> Maybe the word "care" was a poor choice, but forcing authors to think > >> about and decide if they have the "time to shepherd a patch" for the > >> *next CF* is exactly the point. If they don't, why clutter the CF with > >> it. > > > > Objectively, I think this could be quite effective. You need to prove > > your continued interest in your project by pressing a button every two > > months. > > > > But there is a high risk that this will double the annoyance for > > contributors whose patches aren't getting reviews. Now, not only are > > you being ignored, but you need to prove that you're still there every > > two months. > > > > Yeah, I 100% agree with this. If a patch bitrots and no one cares enough > to rebase it once in a while, then sure - it's probably fine to mark it > RwF. But forcing all contributors to do a dance every 2 months just to > have a chance someone might take a look, seems ... not great. > > I try to see this from the contributors' PoV, and with this process I'm > sure I'd start questioning if I even want to submit patches. Agreed. > That is not to say we don't have a problem with patches that just move > to the next CF, and that we don't need to do something about that ... 
> > Incidentally, I've been preparing some git+CF stats because of a talk > I'm expected to do, and it's very obvious the number of committed (and > rejected) CF entries is more or less stable over time, while the number > of patches that move to the next CF just snowballs. > > My impression is a lot of these contributions/patches just never get the > review & attention that would allow them to move forward. Sure, some do > bitrot and/or get abandoned, and let's RwF those. But forcing everyone > to re-register the patches over and over seems like "reject by default". > I'd expect a lot of people to stop bothering and give up, and in a way > that would "solve" the bottleneck. But I'm not sure it's the solution > we'd like ... I don't think we should reject by default. It is discouraging, and it is already hard enough as it is to contribute. > It does seem to me a part of the solution needs to be helping to get > those patches reviewed. I don't know how to do that, but perhaps there's > a way to encourage people to review more stuff, or review stuff from a > wider range of contributors. Say by treating reviews more like proper > contributions. One reason I support the parking lot idea is for patches like the one in [1]. EXPLAIN for parallel bitmap heap scan is just plain broken. The patch in this commitfest entry is functionally 80% of the way there. It just needs someone to do the rest of the work to make it committable. I actually think it is unreasonable of us to expect the original author to do this. I have had it on my list for weeks to get back around to helping with this patch. However, I spent the better part of my coding time in the last two weeks trying to reproduce and fix a bug on stable branches that causes vacuum processes to infinitely loop. Arguably that is a bigger problem. 
Because I knew this EXPLAIN patch was slipping down my TODO list, I changed the patch to "waiting on author", but I honestly don't think the original author should have to do the rest of the work. Should I spend more time on this patch reviewing it and moving it forward? Yes. Maybe I'm just too slow at writing postgres code or I have bad time management or I should spend less time doing things like figuring out how many lavalier mics we need in each room for PGConf.dev. I don't know. But it is hard for me to figure out how to do more review work and guarantee that this kind of thing won't happen. So, anyway, I'd argue that we need a parking lot for patches which we all agree are important and have a path forward but need someone to do the last 20-80% of the work. To avoid this being a dumping ground, patches should _only_ be allowed in the parking lot if they have a clear path forward. Patches which haven't gotten any interest don't go there. Patches in which the author has clearly not addressed feedback that is reasonable for them to address don't go there. These are effectively community TODOs which we agree need to be done -- if only someone had the time. - Melanie [1] https://commitfest.postgresql.org/48/4248/
Re: Why does pgindent's README say to download typedefs.list from the buildfarm?
Peter Eisentraut writes: > On 16.05.24 16:45, Tom Lane wrote: >> Yeah, that was bothering me too, but I went for the minimum delta. >> I did think that a couple of integer macros would be a better idea, >> so +1 for what you did here. > I committed this, and Michael took care of WaitEventExtension, so we > should be all clear here. Thanks. I just made the committed typedefs.list exactly match the current buildfarm output, so we're clean for now. regards, tom lane
Re: WAL record CRC calculated incorrectly because of underlying buffer modification
On Fri, May 17, 2024 at 10:56 AM Jeff Davis wrote: > I'm still not entirely clear on why hash indexes can't just follow the > rules and exclusive lock the buffer and dirty it. Presumably > performance would suffer, but I asked that question previously and > didn't get an answer: > > https://www.postgresql.org/message-id/CA%2BTgmoY%2BdagCyrMKau7UQeQU6w4LuVEu%2ByjsmJBoXKAo6XbUUA%40mail.gmail.com In my defense, the last time I worked on hash indexes was 7 years ago. If this question had come up within a year or two of that work, I probably would have both (a) had a much clearer idea of what the answer was and (b) felt obliged to drop everything and go research it if I didn't. But at this point, I feel like it's fair for me to tell you what I know and leave it to you to do further research if you feel like that's warranted. I know that we're each responsible for what we commit, but I don't really think that should extend to having to prioritize answering a hypothetical question ("what would happen if X thing worked like Y instead of the way it does?") about an area I haven't touched in long enough for every release that doesn't contain those commits to be out of support. If you feel otherwise, let's have that argument, but I have a feeling that it may be more that you're hoping I have some kind of oracular powers which, in reality, I lack. -- Robert Haas EDB: http://www.enterprisedb.com
Re: improve performance of pg_dump --binary-upgrade
rebased

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

>From ded5e61ff631c2d02835fdba941068dcd86741ce Mon Sep 17 00:00:00 2001
From: Nathan Bossart
Date: Thu, 18 Apr 2024 15:15:19 -0500
Subject: [PATCH v5 1/2] Remove is_index parameter from
 binary_upgrade_set_pg_class_oids().

---
 src/bin/pg_dump/pg_dump.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index e324070828..0fbb8e8831 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -324,7 +324,7 @@ static void binary_upgrade_set_type_oids_by_rel(Archive *fout,
                                                 const TableInfo *tbinfo);
 static void binary_upgrade_set_pg_class_oids(Archive *fout,
                                              PQExpBuffer upgrade_buffer,
-                                             Oid pg_class_oid, bool is_index);
+                                             Oid pg_class_oid);
 static void binary_upgrade_extension_member(PQExpBuffer upgrade_buffer,
                                             const DumpableObject *dobj,
                                             const char *objtype,
@@ -5391,8 +5391,7 @@ binary_upgrade_set_type_oids_by_rel(Archive *fout,
 
 static void
 binary_upgrade_set_pg_class_oids(Archive *fout,
-                                 PQExpBuffer upgrade_buffer, Oid pg_class_oid,
-                                 bool is_index)
+                                 PQExpBuffer upgrade_buffer, Oid pg_class_oid)
 {
     PQExpBuffer upgrade_query = createPQExpBuffer();
     PGresult   *upgrade_res;
@@ -5441,7 +5440,8 @@ binary_upgrade_set_pg_class_oids(Archive *fout,
     appendPQExpBufferStr(upgrade_buffer,
                          "\n-- For binary upgrade, must preserve pg_class oids and relfilenodes\n");
 
-    if (!is_index)
+    if (relkind != RELKIND_INDEX &&
+        relkind != RELKIND_PARTITIONED_INDEX)
     {
         appendPQExpBuffer(upgrade_buffer,
                           "SELECT pg_catalog.binary_upgrade_set_next_heap_pg_class_oid('%u'::pg_catalog.oid);\n",
@@ -11668,7 +11668,7 @@ dumpCompositeType(Archive *fout, const TypeInfo *tyinfo)
         binary_upgrade_set_type_oids_by_rel(fout, q,
                                             tyinfo->dobj.catId.oid,
                                             false, false);
-        binary_upgrade_set_pg_class_oids(fout, q, tyinfo->typrelid, false);
+        binary_upgrade_set_pg_class_oids(fout, q, tyinfo->typrelid);
     }
 
     qtypname = pg_strdup(fmtId(tyinfo->dobj.name));
@@ -15802,7 +15802,7 @@ dumpTableSchema(Archive *fout, const TableInfo *tbinfo)
 
         if (dopt->binary_upgrade)
             binary_upgrade_set_pg_class_oids(fout, q,
-                                             tbinfo->dobj.catId.oid, false);
+                                             tbinfo->dobj.catId.oid);
 
         appendPQExpBuffer(q, "CREATE VIEW %s", qualrelname);
 
@@ -15904,7 +15904,7 @@ dumpTableSchema(Archive *fout, const TableInfo *tbinfo)
 
         if (dopt->binary_upgrade)
             binary_upgrade_set_pg_class_oids(fout, q,
-                                             tbinfo->dobj.catId.oid, false);
+                                             tbinfo->dobj.catId.oid);
 
         appendPQExpBuffer(q, "CREATE %s%s %s",
                           tbinfo->relpersistence == RELPERSISTENCE_UNLOGGED ?
@@ -16755,7 +16755,7 @@ dumpIndex(Archive *fout, const IndxInfo *indxinfo)
 
         if (dopt->binary_upgrade)
             binary_upgrade_set_pg_class_oids(fout, q,
-                                             indxinfo->dobj.catId.oid, true);
+                                             indxinfo->dobj.catId.oid);
 
         /* Plain secondary index */
         appendPQExpBuffer(q, "%s;\n", indxinfo->indexdef);
@@ -17009,7 +17009,7 @@ dumpConstraint(Archive *fout, const ConstraintInfo *coninfo)
 
         if (dopt->binary_upgrade)
             binary_upgrade_set_pg_class_oids(fout, q,
-                                             indxinfo->dobj.catId.oid, true);
+                                             indxinfo->dobj.catId.oid);
 
         appendPQExpBuffer(q, "ALTER %sTABLE ONLY %s\n", foreign,
                           fmtQualifiedDumpable(tbinfo));
@@ -17403,7 +17403,7 @@ dumpSequence(Archive *fout, const TableInfo *tbinfo)
     if (dopt->binary_upgrade)
     {
         binary_upgrade_set_pg_class_oids(fout, query,
-                                         tbinfo->dobj.catId.oid, false);
+                                         tbinfo->dobj.catId.oid);
 
         /*
          * In older PG versions a sequence will have a pg_type entry, but v14
-- 
2.25.1

>From 7ff5168f5984865bd405e5d53dc6a190f989e7cd Mon Sep 17 00:00:00 2001
From: Nathan Bossart
Date: Mon, 22 Apr 2024 13:21:18 -0500
Subject: [PATCH v5 2/2] Improve performance of pg_dump --binary-upgrade.

---
 src/bin/pg_dump/pg_dump.c        | 141 +--
 src/tools/pgindent/typedefs.list |   1 +
 2 files changed, 96 insertions(+), 46 deletions(-)

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 0fbb8e8831..7b8ddc6443 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -55,6 +55,7 @@
 #include "catalog/pg_trigger_d.h"
 #include "catalog/pg_type_d.h"
 #include "common/connect.h"
+#include "common/int.h"
 #include "common/relpath.h"
 #include "compress_io.h"
 #include "dumputils.h"
@@ -92,6 +93,17 @@ typedef struct
     int         objsubid;       /* subobject (table column #) */
 } SecLabelItem;
 
+typedef struct
+{
+    Oid         oid;            /* object OID */
+    char        relkind;        /* object kind */
+    RelFileNumber relfilenode;  /* object filenode */
+    Oid         reltoastrelid;  /* toast table OID */
+    RelFileNumber toast_relfilenode;    /* toast table filenode */
+    Oid         indexrelid;     /* toast table
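The second patch is truncated above, but its shape is visible from the new struct: fetch the needed pg_class fields for all objects in one query, keep them sorted by OID, and binary-search per dumped object instead of issuing one catalog query per object. A sketch of that lookup pattern, with illustrative field values (the real patch's details may differ):

```python
from bisect import bisect_left

# Sketch of the optimization: one bulk catalog fetch, sorted by OID,
# then O(log n) lookups per object instead of one round-trip query each.
# Tuple fields mirror part of the struct fragment above: (oid, relkind,
# relfilenode). The row values here are made up for illustration.

catalog_rows = [
    (16401, 'r', 16401),
    (16384, 'r', 16384),
    (16390, 'i', 16390),
]
catalog_rows.sort(key=lambda row: row[0])      # sort once after the bulk fetch
oids = [row[0] for row in catalog_rows]        # parallel key array for bisect

def lookup(oid):
    i = bisect_left(oids, oid)
    if i < len(oids) and oids[i] == oid:
        return catalog_rows[i]
    return None

assert lookup(16390)[1] == 'i'                 # index found via binary search
assert lookup(99999) is None                   # absent OID handled cleanly
```

With N dumped objects this turns N server round trips into one query plus N in-memory binary searches, which is where the --binary-upgrade speedup comes from.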
Re: commitfest.postgresql.org is no longer fit for purpose
Robert Haas writes: > On Fri, May 17, 2024 at 10:31 AM Tom Lane wrote: >> If we go back to the old its-development-mode-all-the-time approach, >> what is likely to happen is that the commit rate for not-your-own- >> patches goes to zero, because it's always possible to rationalize >> your own stuff as being more important. > We already have gone back to that model. We just haven't admitted it > yet. And we're never going to get out of it until we find a way to get > the contents of the CommitFest application down to a more reasonable > size and level of complexity. There's just no way everyone's up for > that level of pain. I'm not sure I'm up for that level of pain. Yeah, we clearly need to get the patch list to a point of manageability, but I don't agree that abandoning time-boxed CFs will improve anything. regards, tom lane
Re: WAL record CRC calculated incorrectly because of underlying buffer modification
On Fri, 2024-05-17 at 10:12 +0900, Michael Paquier wrote: > That's something I've done four weeks ago in the hash replay code > path, and having the image with XLR_CHECK_CONSISTENCY even if > REGBUF_NO_CHANGE was necessary because replay was setting up a LSN on > a REGBUF_NO_CHANGE page it should not have touched. Then the candidate fix to selectively break XLR_CHECK_CONSISTENCY is not acceptable. > > Yeah, agreed that getting rid of REGBUF_NO_CHANGE would be nice in > the > final picture. It still strikes me as a weird concept that WAL > replay > for hash indexes logs full pages just to be able to lock them at > replay based on what's in the records. :/ I'm still not entirely clear on why hash indexes can't just follow the rules and exclusive lock the buffer and dirty it. Presumably performance would suffer, but I asked that question previously and didn't get an answer: https://www.postgresql.org/message-id/CA%2BTgmoY%2BdagCyrMKau7UQeQU6w4LuVEu%2ByjsmJBoXKAo6XbUUA%40mail.gmail.com And if that does affect performance, what about following the same protocol as XLogSaveBufferForHint()? Regards, Jeff Davis
Re: commitfest.postgresql.org is no longer fit for purpose
On Fri, May 17, 2024 at 10:31 AM Tom Lane wrote: > To my mind, the point of the time-boxed commitfests is to provide > a structure wherein people will (hopefully) pay some actual attention > to other peoples' patches. Conversely, the fact that we don't have > one running all the time gives committers some defined intervals > where they can work on their own stuff without feeling guilty that > they're not handling other people's patches. > > If we go back to the old its-development-mode-all-the-time approach, > what is likely to happen is that the commit rate for not-your-own- > patches goes to zero, because it's always possible to rationalize > your own stuff as being more important. We already have gone back to that model. We just haven't admitted it yet. And we're never going to get out of it until we find a way to get the contents of the CommitFest application down to a more reasonable size and level of complexity. There's just no way everyone's up for that level of pain. I'm not sure I'm up for that level of pain. -- Robert Haas EDB: http://www.enterprisedb.com
Re: commitfest.postgresql.org is no longer fit for purpose
Robert Haas writes: > On Fri, May 17, 2024 at 9:54 AM Joe Conway wrote: >> I agree with you. Before commitfests were a thing, we had no structure >> at all as I recall. The dates for the dev cycle were more fluid as I >> recall, and we had no CF app to track things. Maybe retaining the >> structure but going back to the continuous development would be just the >> thing, with the CF app tracking just the currently (as of this >> week/month/???) active stuff. > The main thing that we'd gain from that is to avoid all the work of > pushing lots of things forward to the next CommitFest at the end of > every CommitFest. To my mind, the point of the time-boxed commitfests is to provide a structure wherein people will (hopefully) pay some actual attention to other peoples' patches. Conversely, the fact that we don't have one running all the time gives committers some defined intervals where they can work on their own stuff without feeling guilty that they're not handling other people's patches. If we go back to the old its-development-mode-all-the-time approach, what is likely to happen is that the commit rate for not-your-own- patches goes to zero, because it's always possible to rationalize your own stuff as being more important. > Like, we could also just have a button that says "move everything > that's left to the next CommitFest". Clearly, CFMs would appreciate some more tooling to make that sort of thing easier. Perhaps we omitted it in the original CF app coding because we expected the end-of-CF backlog to be minimal, but it's now obvious that it generally isn't. BTW, I was reminded while trawling old email yesterday that we used to freeze the content of a CF at its start and then hold the CF open until the backlog actually went to zero, which resulted in multi-month death-march CFs and no clarity at all as to release timing. Let's not go back to that. But the CF app was probably built with that model in mind. regards, tom lane
Re: commitfest.postgresql.org is no longer fit for purpose
On Fri, May 17, 2024 at 9:09 AM Peter Eisentraut wrote: > How would the development process look like if we > just had one commitfest per dev cycle that runs from July 1st to March 31st? Exactly the same? -- Peter Geoghegan
Re: psql JSON output format
On Fri, 17 May 2024 at 16:04, Robert Haas wrote: > On Fri, May 17, 2024 at 9:42 AM Christoph Berg wrote: > > Thanks for summarizing the thread. > > > > Things mentioned in the thread: > > > > 1) rendering of SQL NULLs - include or omit the column > > > > 2) rendering of JSON values - both "quoted string" and "inline as > > JSON" make sense > > > > 3) not quoting numeric values and booleans > > > > 4) no special treatment of other datatypes like arrays or compound > > values, just quote them > > > > 5) row format: JSON object or array (array would be close to CSV > > format) > > > > 6) overall format: array of rows, or simply print each row separately > > ("JSON Lines" format, https://jsonlines.org/) > > > > I think 1, 2 and perhaps 6 make sense to have configurable. Two or > > three \pset options (or one option with a list of flags) don't sound > > too bad complexity-wise. > > > > Or maybe just default to "omit NULL columns" and "inline JSON" (and > > render null as NULL). > > If we're going to just have one option, I agree with making that the > default, and I'd default to an array of row objects. If we're going to > have something configurable, I'd at least consider making (4) > configurable. > > It's tempting to just have one option, like \pset jsonformat > > nullcolumns=omit;inlinevalues=json,array;rowformat=object;resultcontainer=array > simply because adding a ton of new options just for this isn't very > appealing. But looking at how long that is, it's probably not a great > idea. So I guess separate options is probably better? +1 for separate options. A lot of these proposed options could be used for XML too. Regards, Pavel > -- > Robert Haas > EDB: http://www.enterprisedb.com
Re: commitfest.postgresql.org is no longer fit for purpose
On Fri, May 17, 2024 at 9:54 AM Joe Conway wrote: > I agree with you. Before commitfests were a thing, we had no structure > at all as I recall. The dates for the dev cycle were more fluid as I > recall, and we had no CF app to track things. Maybe retaining the > structure but going back to the continuous development would be just the > thing, with the CF app tracking just the currently (as of this > week/month/???) active stuff. The main thing that we'd gain from that is to avoid all the work of pushing lots of things forward to the next CommitFest at the end of every CommitFest. While I agree that we need to find a better way to handle that, I don't think it's really the biggest problem. The core problems here are (1) keeping stuff out of CommitFests that don't belong there and (2) labelling stuff that does belong in the CommitFest in useful ways. We should shape the solution around those problems. Maybe that will solve this problem along the way, but if it doesn't, that's easy enough to fix afterward. Like, we could also just have a button that says "move everything that's left to the next CommitFest". That, too, would avoid the manual work that this would avoid. But it wouldn't solve any other problems, so it's not really worth much consideration. -- Robert Haas EDB: http://www.enterprisedb.com
Re: psql JSON output format
On Fri, May 17, 2024 at 9:42 AM Christoph Berg wrote: > Thanks for summarizing the thread. > > Things mentioned in the thread: > > 1) rendering of SQL NULLs - include or omit the column > > 2) rendering of JSON values - both "quoted string" and "inline as >JSON" make sense > > 3) not quoting numeric values and booleans > > 4) no special treatment of other datatypes like arrays or compound >values, just quote them > > 5) row format: JSON object or array (array would be close to CSV >format) > > 6) overall format: array of rows, or simply print each row separately >("JSON Lines" format, https://jsonlines.org/) > > I think 1, 2 and perhaps 6 make sense to have configurable. Two or > three \pset options (or one option with a list of flags) don't sound > too bad complexity-wise. > > Or maybe just default to "omit NULL columns" and "inline JSON" (and > render null as NULL). If we're going to just have one option, I agree with making that the default, and I'd default to an array of row objects. If we're going to have something configurable, I'd at least consider making (4) configurable. It's tempting to just have one option, like \pset jsonformat nullcolumns=omit;inlinevalues=json,array;rowformat=object;resultcontainer=array simply because adding a ton of new options just for this isn't very appealing. But looking at how long that is, it's probably not a great idea. So I guess separate options is probably better? -- Robert Haas EDB: http://www.enterprisedb.com
Re: commitfest.postgresql.org is no longer fit for purpose
On 5/17/24 09:08, Peter Eisentraut wrote: On 17.05.24 14:42, Joe Conway wrote: Namely, the week before commitfest I don't actually know if I will have the time during that month, but I will make sure my patch is in the commitfest just in case I get a few clear days to work on it. Because if it isn't there, I can't take advantage of those "found" hours. A solution to both of these issues (yours and mine) would be to allow things to be added *during* the CF month. What is the point of having a "freeze" before every CF anyway? Especially if they start out clean. If something is ready for review on day 8 of the CF, why not let it be added for review? Maybe this all indicates that the idea of bimonthly commitfests has run its course. The original idea might have been, if we have like 50 patches, we can process all of them within a month. We're clearly not doing that anymore. How would the development process look like if we just had one commitfest per dev cycle that runs from July 1st to March 31st? What's old is new again? ;-) I agree with you. Before commitfests were a thing, we had no structure at all as I recall. The dates for the dev cycle were more fluid as I recall, and we had no CF app to track things. Maybe retaining the structure but going back to the continuous development would be just the thing, with the CF app tracking just the currently (as of this week/month/???) active stuff. -- Joe Conway PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: psql: Allow editing query results with \gedit
On Fri, May 17, 2024 at 9:24 AM Christoph Berg wrote:
> Tom> The stated complaint was "it's too hard to build UPDATE commands",
> Tom> which I can sympathize with.
>
> ... which this would perfectly address - it's building the commands.
>
> The editor will have a bit more clutter (the UPDATE SET WHERE
> boilerplate), and the fields are somewhat out of order (the key at the
> end), but SQL commands are what people understand, and there is
> absolutely no ambiguity on what is going to be executed since the
> commands are exactly what is leaving the editor.

A point to consider is that the user can also do this in the query
itself, if desired. It'd just be a matter of assembling the query string
with appropriate calls to quote_literal() and quote_ident(), which is
not actually all that hard and I suspect many of us have done that at
one point or another. And then you know that you'll get the right set of
key columns and update the right table (as opposed to, say, a view over
the right table, or the wrong one out of several tables that you
joined).

Now you might say, well, that's not terribly convenient, which is
probably true, but this might be a case of -- convenient, reliable,
works with arbitrary queries -- pick two. I don't know. I wouldn't be
all that surprised if there's something clever and useful we could do in
this area, but I sure don't know what it is.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
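To make the "assemble the UPDATE yourself" approach concrete, here is a rough Python sketch. The `quote_ident`/`quote_literal` helpers below are deliberately simplistic stand-ins for PostgreSQL's server-side functions of the same names (in real use you would let the server do the quoting, e.g. via format() in SQL); the table and column names are made up for illustration:

```python
def quote_ident(name):
    # Simplified stand-in for PostgreSQL's quote_ident(): double any
    # embedded double quotes and wrap the identifier in double quotes.
    return '"' + name.replace('"', '""') + '"'

def quote_literal(value):
    # Simplified stand-in for quote_literal(): NULL passes through
    # unquoted; embedded single quotes are doubled.
    if value is None:
        return "NULL"
    return "'" + str(value).replace("'", "''") + "'"

def build_update(table, key, row):
    # Assemble an UPDATE for one row dict, keyed on a single key column.
    assignments = ", ".join(
        f"{quote_ident(col)} = {quote_literal(val)}"
        for col, val in row.items() if col != key
    )
    return (f"UPDATE {quote_ident(table)} SET {assignments} "
            f"WHERE {quote_ident(key)} = {quote_literal(row[key])};")

print(build_update("users", "id", {"id": 7, "name": "O'Brien", "note": None}))
```

The point of the sketch is the one Robert makes: when you build the command yourself, you choose the key columns and the target table explicitly, rather than relying on heuristics to infer them.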
Re: psql JSON output format
Re: Robert Haas
> IMHO, the big problem here is that different people want different
> corner-case behaviors and it's not clear what to do about that. I
> don't think there's a single vote for "don't do this at all". So if
> there is a desire to take this work forward, the goal probably ought
> to be to try to either (a) figure out one behavior that everyone can
> live with or (b) figure out a short list of options that can be used
> to customize the behavior to a degree that lets everyone get something
> reasonably close to what they want. For instance, "what to do if you
> find a SQL null" and "whether to include json values as strings or
> json objects" seem like they could potentially be customizable. That's
> probably not a silver bullet because (1) that's more work and (2)
> there might be more behaviors than we want to code, or maintain the
> code for, and (3) if it gets too complicated that can itself become a
> source of objections. But it's an idea.

Thanks for summarizing the thread.

Things mentioned in the thread:

1) rendering of SQL NULLs - include or omit the column

2) rendering of JSON values - both "quoted string" and "inline as JSON"
   make sense

3) not quoting numeric values and booleans

4) no special treatment of other datatypes like arrays or compound
   values, just quote them

5) row format: JSON object or array (array would be close to CSV format)

6) overall format: array of rows, or simply print each row separately
   ("JSON Lines" format, https://jsonlines.org/)

I think 1, 2 and perhaps 6 make sense to have configurable. Two or three
\pset options (or one option with a list of flags) don't sound too bad
complexity-wise.

Or maybe just default to "omit NULL columns" and "inline JSON" (and
render null as NULL).

Thoughts?

Christoph
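To make options (5) and (6) above concrete, here is a small Python sketch rendering the same toy result set both as one JSON array of row objects and as JSON Lines. This is purely illustrative of the two output shapes under discussion, not of psql's actual implementation; the column names and values are invented:

```python
import json

# A toy query result: column names plus rows, standing in for what
# psql would receive from the server.
columns = ["id", "name", "price"]
rows = [(1, "apple", 0.5), (2, "banana", None)]

def row_object(row):
    # Row format "object": one JSON object per row, keyed by column name.
    # (SQL NULL becomes JSON null here, i.e. the column is included.)
    return dict(zip(columns, row))

# Overall format "array": one JSON document containing all rows.
array_output = json.dumps([row_object(r) for r in rows])

# Overall format "JSON Lines": one self-contained JSON document per line.
jsonl_output = "\n".join(json.dumps(row_object(r)) for r in rows)

print(array_output)
print(jsonl_output)
```

The JSON Lines form can be consumed incrementally, one row at a time, which is part of its appeal for large result sets; the array form is a single valid JSON document, which is what most JSON tooling expects.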
Re: First draft of PG 17 release notes
Bruce Momjian wrote:

> I have committed the first draft of the PG 17 release notes; you can
> see the results here:
>
> https://momjian.us/pgsql_docs/release-17.html

About the changes in collations:

  Create a "builtin" collation provider similar to libc's C locale
  (Jeff Davis)
  It uses a "C" locale which is identical but independent of libc, but
  it allows the use of non-"C" collations like "en_US" and "C.UTF-8"
  with the "C" locale, which libc does not. MORE?

The new builtin provider has two collations:

* ucs_basic, which is 100% identical to "C". It was introduced several
  versions ago, and the v17 novelty is simply to change its
  pg_collation.collprovider from 'c' to 'b'.

* pg_c_utf8, which sorts like "C" but is Unicode-aware for the rest,
  which makes it quite different from "C". It's also different from the
  other UTF-8 collations that could be used up to v17 in that it does
  not depend on an external library, making it free from the collation
  OS-upgrade risks.

The part that is concretely of interest to users is the introduction of
pg_c_utf8. As described in [1]:

  pg_c_utf8

  This collation sorts by Unicode code point values rather than natural
  language order. For the functions lower, initcap, and upper, it uses
  Unicode simple case mapping. For pattern matching (including regular
  expressions), it uses the POSIX Compatible variant of Unicode
  Compatibility Properties. Behavior is efficient and stable within a
  Postgres major version. This collation is only available for encoding
  UTF8.

I'd suggest that the relnote entry should be more like a condensed
version of that description, without mentioning en_US or C.UTF-8, whose
existence and semantics are OS-dependent, contrary to pg_c_utf8.

[1] https://www.postgresql.org/docs/devel/collation.html

Best regards,
-- 
Daniel Vérité
https://postgresql.verite.pro/
Twitter: @DanielVerite
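To illustrate the code-point sort order that pg_c_utf8 shares with "C": in plain Python (not PostgreSQL), default string comparison also happens to be by Unicode code point, so it reproduces the same ordering described in the quoted documentation. All uppercase ASCII letters sort before all lowercase ones, and non-ASCII letters like É sort after both, unlike a natural-language collation such as en_US, which would interleave them:

```python
# Sorting by Unicode code point, the "C"-like order that pg_c_utf8 uses.
words = ["banana", "Apple", "cherry", "Éclair"]
by_code_point = sorted(words)  # Python compares strings code point by code point
print(by_code_point)

# ord() shows why: 'A' (65) < 'b' (98) < 'c' (99) < 'É' (201).
print([ord(w[0]) for w in by_code_point])
```

A natural-language collation would instead yield Apple, banana, cherry, Éclair only by coincidence here; swap in lowercase "apple" and the two orders diverge immediately, since code-point order puts "Éclair" and every lowercase word after every capitalized one.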
Re: psql: Allow editing query results with \gedit
Re: Robert Haas
> Based on these comments and the one from David Johnston, I think there
> is a consensus that we do not want this patch, so I'm marking it as
> Rejected in the CommitFest application. If I've misunderstood the
> situation, then please feel free to change the status accordingly.

Hi Robert,

thanks for looking at the patch.

> I feel slightly bad about rejecting this not only because rejecting
> patches that people have put work into sucks but also because (1) I do
> understand why it could be useful to have something like this and (2)
> I think in many ways the patch is quite well-considered, e.g. it has
> options like table and key to work around cases where the naive logic
> doesn't get the right answer. But I also do understand why the
> reactions thus far have been skeptical: there's a lot of pretty
> magical stuff in this patch. That's a reliability concern: when you
> type \gedit and it works, that's cool, but sometimes it isn't going to
> work, and you're not always going to understand why, and you can
> probably fix a lot of those cases by using the "table" or "key"
> options, but you have to know they exist, and you have to realize that
> they're needed, and the whole thing is suddenly a lot less convenient.
> I think if we add this feature, a bunch of people won't notice, but
> among those who do, I bet there will be a pretty good chance of people
> complaining about cases that don't work, and perhaps not understanding
> why they don't work, and I suspect fixing some of those complaints may
> require something pretty close to solving the halting problem. :-(

That's a good summary of the situation, thanks.
I still think the feature would be cool to have, but admittedly, in the
meantime I've had cases myself where the automatism went in the wrong
direction (updating key columns results in DELETE-INSERT cycles that
don't do the right thing if you didn't select all columns originally),
so I now understand the objections and agree people were right about
that. This could be fixed by feeding the resulting commands through
another editor round, but that just adds complexity instead of removing
confusion.

I think there is a pretty straightforward way to address the problems,
though: instead of letting the user edit JSON, format the query result
in the form of UPDATE commands and let the user edit them. As Tom said
upthread:

Tom> The stated complaint was "it's too hard to build UPDATE commands",
Tom> which I can sympathize with.

... which this would perfectly address - it's building the commands.

The editor will have a bit more clutter (the UPDATE SET WHERE
boilerplate), and the fields are somewhat out of order (the key at the
end), but SQL commands are what people understand, and there is
absolutely no ambiguity about what is going to be executed, since the
commands are exactly what is leaving the editor.

> Now maybe that's all wrong and we should adopt this patch with
> enthusiasm, but if so, we need the patch to have significantly more +1
> votes than -1 votes, and right now the reverse seems to be the case.

I'll update the patch and ask here. Thanks!

Christoph
Re: First draft of PG 17 release notes
On Thu, May 9, 2024 at 12:04 PM Bruce Momjian wrote:
>
> I have committed the first draft of the PG 17 release notes; you can
> see the results here:
>
> https://momjian.us/pgsql_docs/release-17.html
>
> It will be improved until the final release. The item count is 188,
> which is similar to recent releases.

This thread mentioned performance. Actually, this commit [1] refactored
some interval aggregation functions, which makes the two aggregate
functions avg(interval) and sum(interval) run faster, especially
avg(interval); see [2]. Well, I guess this is a kind of niche
performance improvement to be mentioned separately.

These 3 items need to be removed, because of
https://git.postgresql.org/cgit/postgresql.git/commit/?id=8aee330af55d8a759b2b73f5a771d9d34a7b887f

>> Add stratnum GiST support function (Paul A. Jungwirth)

>> Allow foreign keys to reference WITHOUT OVERLAPS primary keys (Paul A.
>> Jungwirth)
>> The keyword PERIOD is used for this purpose.

>> Allow PRIMARY KEY and UNIQUE constraints to use WITHOUT OVERLAPS for
>> non-overlapping exclusion constraints (Paul A. Jungwirth)

[1] https://git.postgresql.org/cgit/postgresql.git/commit/?id=519fc1bd9e9d7b408903e44f55f83f6db30742b7
[2] https://www.postgresql.org/message-id/CAEZATCUJ0xjyQUL7SHKxJ5a%2BDm5pjoq-WO3NtkDLi6c76rh58w%40mail.gmail.com
Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs
On Thu, May 16, 2024 at 1:39 PM Jacob Burroughs wrote:
> As currently implemented [1], the client sends the server the list of
> all compression algorithms it is willing to accept, and the server can
> use one of them. If the server knows what `_pq_.compression` means
> but doesn't actually support any compression, it will both send the
> client its empty list of supported algorithms and just never send any
> compressed messages, and everyone involved will be (relatively) happy.
> There is a libpq function that a client can use to check what
> compression is in use if a client *really* doesn't want to continue
> with the conversation without compression, but 99% of the time I can't
> see why a client wouldn't prefer to continue using a connection with
> whatever compression the server supports (or even none) without more
> explicit negotiation. (Unlike TLS, where automagically picking
> between using and not using TLS has strange security implications and
> effects, compression is a convenience feature for everyone involved.)

This all seems sensible to me.

> As the protocol layer is currently designed [1], it explicitly makes
> it very easy to change/restart compression streams, specifically for
> this use case (and in particular for the general connection pooler use
> case). Compressed data is already framed in a `CompressedData`
> message, and that message has a header byte that corresponds to an
> enum value for which algorithm is currently in use. Any time the
> compression stream was restarted by the sender, the first
> `CompressedData` message will set that byte, and then the client will
> restart its decompression stream with the chosen algorithm from that
> point. For `CompressedData` messages that continue using the
> already-established stream, the byte is simply set to 0.
> (This is also how the "each side sends a list" form of negotiation is
> able to work without additional round trips, as the `CompressedData`
> framing itself communicates which compression algorithm has been
> selected.)

OK, so you made it so that compressed data is fully self-identifying.
Hence, there's no need to worry if something gets changed: the receiver,
if properly implemented, can't help but notice. The only downside that I
can see to this design is that you only have one byte to identify the
compression algorithm, but that doesn't actually seem like a real
problem at all, because I expect the number of supported compression
algorithms to grow very slowly. I think it would take centuries,
possibly millennia, before we started to get short of identifiers. So,
cool.

But, in your system, how does the client ask the server to switch to a
different compression algorithm, or to restart the compression stream?

-- 
Robert Haas
EDB: http://www.enterprisedb.com
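A toy sketch of the framing described above, using zlib as the one compressed stream. The message layout here (one algorithm byte followed by the payload) is my reading of the description in this thread, not the actual patch's wire format, and the identifier values are invented; it just shows how a nonzero first byte (re)starts the receiver's decompression stream while 0 means "continue the established stream":

```python
import zlib

# Hypothetical algorithm identifiers; 0 is reserved to mean "continue
# the already-established stream".
ALG_CONTINUE = 0
ALG_ZLIB = 2

def frame_stream(chunks):
    """Compress chunks as one stream, framed as (alg_byte + payload) messages.

    Only the first message names the algorithm; later messages set the
    byte to 0, meaning "same stream, keep decompressing"."""
    comp = zlib.compressobj()
    messages = []
    for i, chunk in enumerate(chunks):
        # Z_SYNC_FLUSH makes each message independently decodable by an
        # ongoing decompression stream, without ending the stream.
        payload = comp.compress(chunk) + comp.flush(zlib.Z_SYNC_FLUSH)
        alg = ALG_ZLIB if i == 0 else ALG_CONTINUE
        messages.append(bytes([alg]) + payload)
    return messages

def read_stream(messages):
    """Decompress framed messages, restarting on any nonzero algorithm byte."""
    decomp = None
    out = b""
    for msg in messages:
        alg, payload = msg[0], msg[1:]
        if alg != ALG_CONTINUE:
            # Nonzero byte: (re)start the decompression stream.
            decomp = zlib.decompressobj()
        out += decomp.decompress(payload)
    return out

msgs = frame_stream([b"hello ", b"world"])
print(read_stream(msgs))
```

This also shows why the receiver "can't help but notice" a restart: the restart is carried in-band in the very message that begins the new stream, so no separate negotiation round trip is needed.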
remove Todo item: Allow infinite intervals just like infinite timestamps
hi.

https://wiki.postgresql.org/wiki/Todo
Dates and Times[edit]
Allow infinite intervals just like infinite timestamps
https://www.postgresql.org/message-id/4eb095c8.1050...@agliodbs.com

This was done at
https://git.postgresql.org/cgit/postgresql.git/commit/?id=519fc1bd9e9d7b408903e44f55f83f6db30742b7

Should we remove this item?
Re: commitfest.postgresql.org is no longer fit for purpose
> On 17 May 2024, at 15:05, Robert Haas wrote:
>
> On Fri, May 17, 2024 at 9:02 AM Peter Eisentraut wrote:
>> We already have the annotations feature, which is kind of this.
>
> I didn't realize we had that feature. When was it added, and how is it
> supposed to be used?

A while back.

    commit 27cba025a501c9dbcfb08da0c4db95dc6111d647
    Author: Magnus Hagander
    Date:   Sat Feb 14 13:07:48 2015 +0100

        Implement simple message annotations

        This feature makes it possible to "pull in" a message in a
        thread and highlight it with an annotation (free text format).
        This will list the message in a table along with the annotation
        and who made it.

        Annotations have to be attached to a specific message - for a
        "generic" one it makes sense to attach it to the latest message
        available, as that will put it at the correct place in time.

Magnus' commit message explains it well. The way I've used it (albeit
infrequently) is to point to a specific mail in the thread where a
significant change was proposed, like the patch changing direction or
something similar.

-- 
Daniel Gustafsson
Re: commitfest.postgresql.org is no longer fit for purpose
On 5/17/24 14:51, Andrey M. Borodin wrote:
>
>> On 17 May 2024, at 16:39, Robert Haas wrote:
>>
>> I think Andrey Borodin's nearby suggestion of having a separate CfM
>> for each section of the CommitFest does a good job revealing just how
>> bad the current situation is. I agree with him: that would actually
>> work. Asking somebody, for a one-month period, to be responsible for
>> shepherding one-tenth or one-twentieth of the entries in the
>> CommitFest would be a reasonable amount of work for somebody. But we
>> will not find 10 or 20 highly motivated, well-qualified volunteers
>> every other month to do that work;
>
> Why do you think so? Let's just try to find more CFMs for July.
> When I felt that I was overwhelmed, I asked for help and Alexander
> Alekseev promptly agreed. That helped a lot.
> If I were in that position again, I would just ask 10 times on the
> 1st day :)
>
>> it's a struggle to find one or two
>> highly motivated, well-qualified CommitFest managers, let alone ten or
>> twenty.
>
> Because we are looking for one person to do a job for 10.

Yes. It's probably easier to find more CF managers each doing much less
work.

>> So I think the right interpretation of his comment is that
>> managing the CommitFest has become about an order of magnitude more
>> difficult than what it needs to be for the task to be done well.
>
> Let's scale the process. Reduce the responsibility area of a CFM, and
> define it more clearly.
> And maybe even explicitly ask the CFM to summarize the patch status of
> each entry at least once per CF.

Should it even be up to the CFM to write the summary, or should he/she
be able to request an update from the patch author? Or at least have
the choice to do so.

I think we'll always struggle with the massive threads, because it's
really difficult to find the right balance between brevity and including
all the relevant details. Or rather impossible.
I did try writing such summaries for a couple of my long-running
patches, and while it might have helped, the challenge was to also
explain why stuff was *not* done in some alternative way, which is one
of the things usually discussed. But then the summary gets very long,
because there are many alternatives.

> Can I do a small poll among those who are on this thread? Would you
> volunteer to summarize the status of 20 patches in July's CF? 5 each
> week or so. One per day.

Not sure. For many patches it'll be trivial. And for a bunch it'll be
very, very time-consuming.

regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: commitfest.postgresql.org is no longer fit for purpose
On 17.05.24 14:42, Joe Conway wrote:
> Namely, the week before commitfest I don't actually know if I will
> have the time during that month, but I will make sure my patch is in
> the commitfest just in case I get a few clear days to work on it.
> Because if it isn't there, I can't take advantage of those "found"
> hours.
>
> A solution to both of these issues (yours and mine) would be to allow
> things to be added *during* the CF month. What is the point of having
> a "freeze" before every CF anyway? Especially if they start out clean.
> If something is ready for review on day 8 of the CF, why not let it be
> added for review?

Maybe this all indicates that the idea of bimonthly commitfests has run
its course. The original idea might have been, if we have like 50
patches, we can process all of them within a month. We're clearly not
doing that anymore. How would the development process look like if we
just had one commitfest per dev cycle that runs from July 1st to March
31st?