Re: making relfilenodes 56 bits
On Tue, Sep 20, 2022 at 7:46 PM Amul Sul wrote:
> Thanks for the review
> Here are a few minor suggestions I came across while reading this
> patch, might be useful:
>
> +#ifdef USE_ASSERT_CHECKING
> +
> + {
>
> Unnecessary space after USE_ASSERT_CHECKING.

Changed

> + return InvalidRelFileNumber;    /* placate compiler */
>
> I don't think we need this after the error on the latest branches.

Changed

> + LWLockAcquire(RelFileNumberGenLock, LW_SHARED);
> + if (shutdown)
> +     checkPoint.nextRelFileNumber = ShmemVariableCache->nextRelFileNumber;
> + else
> +     checkPoint.nextRelFileNumber = ShmemVariableCache->loggedRelFileNumber;
> +
> + LWLockRelease(RelFileNumberGenLock);
>
> This is done for a good reason, I think; it should have a comment
> describing why the two different checkPoint.nextRelFileNumber
> assignments are needed, from a crash-recovery perspective.

Done

> +#define SizeOfRelFileLocatorBackend \
> +     (offsetof(RelFileLocatorBackend, backend) + sizeof(BackendId))
>
> Could we append empty parentheses "()" to the macro name, so it looks
> like a function call at the use site, or change the macro name to
> uppercase?

Yeah, we could, but SizeOfXXX macros are a general practice I see used
everywhere in the Postgres code, so left as is.

> + if (val < 0 || val > MAX_RELFILENUMBER)
> ..
> if ((relfilenumber) < 0 || (relfilenumber) > MAX_RELFILENUMBER) \
>
> How about adding a macro for this condition as RelFileNumberIsValid()?
> We can replace all the checks referring to MAX_RELFILENUMBER with this.

Actually, RelFileNumberIsValid is used just to check whether the value is
InvalidRelFileNumber, i.e. 0. Maybe for this we could introduce
RelFileNumberInValidRange(), but I am not sure whether it would be cleaner
than what we have now, so left as is for now.

-- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: binary version of pg_current_wal_insert_lsn and pg_walfile_name functions
On Fri, Sep 23, 2022 at 6:05 AM Bharath Rupireddy wrote:
>
> On Thu, Sep 22, 2022 at 10:25 PM Ashutosh Sharma wrote:
> >
> > PFA that enhances pg_waldump to show the latest LSN and the
> > corresponding WAL file when the -l or --lastLSN option is passed as an
> > argument to pg_waldump. Below is an example:
>
> Thanks for the patch. I have some quick thoughts about it.
>
> > When the user passes the '-l' command line option along with the data
> > directory path to pg_waldump, it reads the control file from the data
> > directory.
>
> I don't think we need a new option for data directory -D. pg_waldump's
> option 'p' can be used, please see the comments around
> identify_target_directory().

-p is the path to the WAL directory. It doesn't necessarily have to be a
data directory; the user can specify the data directory path here as well,
from which the path to the WAL directory can be recognized, but as I said
it doesn't mean -p will always represent the data directory.

> > From the control file, it gets information like the redo
> > pointer and current timeline id.
>
> Is there any reason for not using get_control_file() from
> src/common/controldata_utils.c, but defining the exact same function
> in pg_waldump.c?

Will give it a thought later. If possible, will try to reuse it.

> > The redo pointer is considered to be
> > the start pointer from where pg_waldump starts reading WAL data
> > until end-of-WAL to find the last LSN. For details please check the
> > attached patch.
>
> Making it dependent on the controlfile limits the usability of this
> feature. Imagine using this feature on an archive location or
> pg_receivewal target directory where there are WAL files but no
> controlfile. I think we can choose the appropriate combinations of
> existing pg_waldump options, for instance, let users enter the start
> WAL segment via startseg and/or start LSN via --start and the new
> option for end WAL segment and end LSN.
I have written this patch assuming that the end user is not aware of any
LSN or other WAL data and wants to know the last LSN. So all they can do
is take the help of the control data to find the redo LSN and use that as
a reference point (start pointer) to find the last LSN. And whatever the
WAL directory is (be it an archive location, WAL collected via
pg_receivewal, or the pg_wal directory), we will consider the redo pointer
as the start pointer. Now, it's possible that the WAL corresponding to the
start pointer is not available at all in a WAL directory like an archive
location or pg_receivewal directory, in which case this cannot be used,
but that is very unlikely.

> > Please note that for compressed and .partial WAL files this doesn't work.
>
> Looking forward to the above capability because it expands the
> usability of this feature.

This is a different task altogether. We will probably need to work on it
separately.

-- With Regards, Ashutosh Sharma.
Re: Perform streaming logical transactions by background workers and parallel apply
On Thu, Sep 22, 2022 at 4:50 PM kuroda.hay...@fujitsu.com wrote:
>
> Hi Amit,
>
> > > I checked other flags that are set by signal handlers; their datatype
> > > seemed to be sig_atomic_t.
> > > Is there any reason that you use a normal bool? It should be changed
> > > if not.
> >
> > It follows logic similar to ParallelMessagePending. Do you see any
> > problem with it?
>
> Hmm, one consideration is:
> what will happen if the signal handler HandleParallelApplyMessageInterrupt()
> is fired during "ParallelApplyMessagePending = false;"?
> IIUC sig_atomic_t has been needed to avoid writing to the same data at
> the same time.

But we do call HOLD_INTERRUPTS() before we do
"ParallelApplyMessagePending = false;", so that should not happen.
However, I think it would be better to use sig_atomic_t here for the sake
of consistency. I think you can start a separate thread to check if we can
change ParallelMessagePending to make it consistent with other such
variables.

-- With Regards, Amit Kapila.
Re: CI and test improvements
On Fri, Sep 23, 2022 at 9:07 AM Justin Pryzby wrote:
> On Sun, Aug 28, 2022 at 02:28:02PM -0700, Andres Freund wrote:
> > > > > - # Workaround around performance issues due to 32KB block size
> > > > > - repartition_script: src/tools/ci/gcp_freebsd_repartition.sh
> > > > >   create_user_script: |
> > > > >     pw useradd postgres
> > > > >     chown -R postgres:postgres .
> > > >
> > > > What's the story there - at some point that was important for
> > > > performance because of the native block size triggering significant
> > > > read-modify-write cycles with postgres' writes. You didn't comment
> > > > on it in the commit message.
> > >
> > > Well, I don't know the history, but it seems to be unneeded now.
> >
> > It's possible it was mainly needed for testing with aio + dio. But also
> > possible that an upgrade improved the situation since.

Yeah, it is very important for direct I/O (patches soon...), because every
8KB random write becomes a read-32KB/wait/write-32KB without it. It's
slightly less important for buffered I/O, because the kernel caches hide
that, but it still triggers I/O bandwidth amplification, and we definitely
saw positive effects on the CI system back on the previous generation.
FWIW I am planning to see about getting the FreeBSD installer to create
the root file system the way we want, instead of this ugliness.

> Maybe freebsd got faster as a result of the TAU CPUs?
> [data]

Very interesting. It would be good to find the exact instance types +
storage types to see if there has been a massive IOPS boost, perhaps via
local SSD. The newer times are getting closer to a local developer machine.
Re: Removing unused parameter in SnapBuildGetOrBuildSnapshot
On Wed, Sep 21, 2022 at 8:11 PM Zhang Mingli wrote: > > On Sep 21, 2022, 22:22 +0800, Melih Mutlu , wrote: > > Hi hackers, > > Sharing a small patch to remove an unused parameter in > SnapBuildGetOrBuildSnapshot function in snapbuild.c > > With commit 6c2003f8a1bbc7c192a2e83ec51581c018aa162f, SnapBuildBuildSnapshot > no longer needs transaction id. This also makes the xid parameter in > SnapBuildGetOrBuildSnapshot useless. > I couldn't see a reason to keep it and decided to remove it. > > Regards, > Melih > > +1, Good Catch. > LGTM. -- With Regards, Amit Kapila.
Re: libpq error message refactoring
Andres Freund writes: > Heh, figured it out. It's inside #ifdef ENABLE_NLS. So it fails on all > platforms without NLS enabled. Argh, how simple! The question about va_start/va_end placement still stands, though. regards, tom lane
Re: libpq error message refactoring
Hi, On 2022-09-22 19:27:27 -0700, Andres Freund wrote: > I just noticed it when trying to understand the linker failure - which I > still don't... Heh, figured it out. It's inside #ifdef ENABLE_NLS. So it fails on all platforms without NLS enabled. Greetings, Andres Freund
Re: libpq error message refactoring
Andres Freund writes: > I suspect the appendPQExpBufferVA is orthogonal - most (all?) of the other > functions in pqexpbuffer.h are visible, so it feels weird/confusing to not > make appendPQExpBufferVA() available. I thought the same to start with, but if I'm right in my nearby reply that we'll have to make callers loop around appendPQExpBufferVA, then it seems like a good idea to keep it closely held. More than zero commentary about that would be a good thing, too. regards, tom lane
Re: libpq error message refactoring
Peter Eisentraut writes:
> On 22.09.22 17:42, Andres Freund wrote:
>> It's not the cause of this failure, I think, but doesn't appendPQExpBufferVA
>> need to be added to exports.txt?
> I don't want to make that function available to users of libpq, just use
> it inside libpq across .c files. Is there no visibility level for that?

Should "just work", I should think. I agree with Andres that that's not
the cause of the build failure. I wonder if somehow the failing links are
picking up the wrong libpq.a.

Separately from that: is it really okay to delegate use of a va_list
argument like that? The other call paths of appendPQExpBufferVA[_internal]
write va_start/va_end directly around it, not somewhere else in the call
chain. I'm too tired to language-lawyer out what happens when you do it
like this, but I'm suspecting that it's not well-defined portable
behavior. I think what you probably need to do is export
appendPQExpBufferVA as-is and require libpq_append_error to provide the
error loop.

regards, tom lane
Re: libpq error message refactoring
Hi,

On 2022-09-22 22:00:00 -0400, Peter Eisentraut wrote:
> On 22.09.22 17:42, Andres Freund wrote:
> > This patch has been failing for a while:
> > https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/39/3854
> >
> > Interestingly, previously the error only happened when targeting
> > windows, but meson also shows it on freebsd.
> >
> > It's not the cause of this failure, I think, but doesn't
> > appendPQExpBufferVA need to be added to exports.txt?
>
> I don't want to make that function available to users of libpq, just use
> it inside libpq across .c files. Is there no visibility level for that?
> Is that also the problem in the freebsd build?

I suspect the appendPQExpBufferVA question is orthogonal - most (all?) of
the other functions in pqexpbuffer.h are visible, so it feels
weird/confusing to not make appendPQExpBufferVA() available. I just
noticed it when trying to understand the linker failure - which I still
don't...

Greetings, Andres Freund
Re: libpq error message refactoring
On 22.09.22 17:42, Andres Freund wrote:
> This patch has been failing for a while:
> https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/39/3854
>
> Interestingly, previously the error only happened when targeting windows,
> but meson also shows it on freebsd.
>
> It's not the cause of this failure, I think, but doesn't appendPQExpBufferVA
> need to be added to exports.txt?

I don't want to make that function available to users of libpq, just use
it inside libpq across .c files. Is there no visibility level for that?
Is that also the problem in the freebsd build?
Re: binary version of pg_current_wal_insert_lsn and pg_walfile_name functions
On Thu, Sep 22, 2022 at 10:25 PM Ashutosh Sharma wrote:
>
> PFA that enhances pg_waldump to show the latest LSN and the
> corresponding WAL file when the -l or --lastLSN option is passed as an
> argument to pg_waldump. Below is an example:

Thanks for the patch. I have some quick thoughts about it.

> When the user passes the '-l' command line option along with the data
> directory path to pg_waldump, it reads the control file from the data
> directory.

I don't think we need a new option for data directory -D. pg_waldump's
option 'p' can be used, please see the comments around
identify_target_directory().

> From the control file, it gets information like the redo
> pointer and current timeline id.

Is there any reason for not using get_control_file() from
src/common/controldata_utils.c, but defining the exact same function in
pg_waldump.c?

> The redo pointer is considered to be
> the start pointer from where pg_waldump starts reading WAL data
> until end-of-WAL to find the last LSN. For details please check the
> attached patch.

Making it dependent on the controlfile limits the usability of this
feature. Imagine using this feature on an archive location or
pg_receivewal target directory where there are WAL files but no
controlfile. I think we can choose the appropriate combinations of
existing pg_waldump options, for instance, let users enter the start WAL
segment via startseg and/or start LSN via --start and the new option for
end WAL segment and end LSN.

> Please note that for compressed and .partial WAL files this doesn't work.

Looking forward to the above capability because it expands the usability
of this feature.

-- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Refactor backup related code (was: Is it correct to say, "invalid data in file \"%s\"", BACKUP_LABEL_FILE in do_pg_backup_stop?)
On Thu, Sep 22, 2022 at 3:17 PM Fujii Masao wrote:
>
> + MemSet(backup_state, 0, sizeof(BackupState));
> + MemSet(backup_state->name, '\0', sizeof(backup_state->name));
>
> The latter MemSet() is not necessary because the former already
> resets that with zero, is it?

Yes; earlier, the name was a palloc'd string, but now it is a fixed array.
Removed.

> + pfree(tablespace_map);
> + tablespace_map = NULL;
> + }
> +
> + tablespace_map = makeStringInfo();
>
> tablespace_map doesn't need to be reset to NULL here.
>
> + pfree(tablespace_map);
> + tablespace_map = NULL;
> + pfree(backup_state);
> + backup_state = NULL;
>
> It's harmless to set tablespace_map and backup_state to NULL after pfree(),
> but it's also unnecessary at least here.

Yes. But we can retain it for the sake of consistency with the other
places and to avoid dangling pointers; if any new code gets added in
between, it will be useful.

> /* Free structures allocated in TopMemoryContext */
> - pfree(label_file->data);
> - pfree(label_file);
>
> + pfree(backup_label->data);
> + pfree(backup_label);
> + backup_label = NULL;
>
> This source comment is a bit misleading, isn't it? Because the memory
> for backup_label is allocated under a memory context other than
> TopMemoryContext.

Yes, we can just say /* Deallocate backup-related variables. */. The
pg_backup_start() comments have the info about the variables being
allocated in TopMemoryContext.

> +#include "access/xlog.h"
> +#include "access/xlog_internal.h"
> +#include "access/xlogbackup.h"
> +#include "utils/memutils.h"
>
> Seems "utils/memutils.h" doesn't need to be included.

Yes, removed now.
> + XLByteToSeg(state->startpoint, stopsegno, wal_segment_size);
> + XLogFileName(stopxlogfile, state->starttli, stopsegno, wal_segment_size);
> + appendStringInfo(result, "STOP WAL LOCATION: %X/%X (file %s)\n",
> +                  LSN_FORMAT_ARGS(state->startpoint), stopxlogfile);
>
> state->stoppoint and state->stoptli should be used instead of
> state->startpoint and state->starttli?

Nice catch! Corrected.

PSA v12 patch with the above review comments addressed.

-- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com

From 3caa5f392e50bdf3f60b4af9710da0004f11ff39 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy
Date: Thu, 22 Sep 2022 23:44:16 +
Subject: [PATCH v12] Refactor backup related code

Refactor backup related code; the advantages of doing so are the
following:

1) Backup state is more structured now - all in a single structure;
callers can create backup_label contents whenever required, either during
pg_backup_start or pg_backup_stop or in between.
2) No parsing of backup_label file lines now, no error checking for
invalid parsing.
3) backup_label and history file contents have most things in common;
they can now be created within a single function.
4) Makes backup related code extensible and readable.

This introduces new source files xlogbackup.c/.h for backup related code
and adds the new code there. The xlog.c file has already grown to ~9000
LOC (as of this time). Eventually, we would want to move all the backup
related code from xlog.c, xlogfuncs.c, and elsewhere to here.

Author: Bharath Rupireddy
Reviewed-by: Michael Paquier
Reviewed-by: Fujii Masao
Discussion: https://www.postgresql.org/message-id/CALj2ACVqNUEXkaMKyHHOdvScfN9E4LuCWsX_R-YRNfzQ727CdA%40mail.gmail.com
---
 src/backend/access/transam/Makefile     |   1 +
 src/backend/access/transam/meson.build  |   1 +
 src/backend/access/transam/xlog.c       | 224
 src/backend/access/transam/xlogbackup.c |  80 +
 src/backend/access/transam/xlogfuncs.c  |  97 ++
 src/backend/backup/basebackup.c         |  44 +++--
 src/include/access/xlog.h               |  10 +-
 src/include/access/xlogbackup.h         |  42 +
 8 files changed, 296 insertions(+), 203 deletions(-)
 create mode 100644 src/backend/access/transam/xlogbackup.c
 create mode 100644 src/include/access/xlogbackup.h

diff --git a/src/backend/access/transam/Makefile b/src/backend/access/transam/Makefile
index 3e5444a6f7..661c55a9db 100644
--- a/src/backend/access/transam/Makefile
+++ b/src/backend/access/transam/Makefile
@@ -29,6 +29,7 @@ OBJS = \
 	xact.o \
 	xlog.o \
 	xlogarchive.o \
+	xlogbackup.o \
 	xlogfuncs.o \
 	xloginsert.o \
 	xlogprefetcher.o \
diff --git a/src/backend/access/transam/meson.build b/src/backend/access/transam/meson.build
index c32169bd2c..63d17b85ee 100644
--- a/src/backend/access/transam/meson.build
+++ b/src/backend/access/transam/meson.build
@@ -15,6 +15,7 @@ backend_sources += files(
 	'xact.c',
 	'xlog.c',
 	'xlogarchive.c',
+	'xlogbackup.c',
 	'xlogfuncs.c',
 	'xloginsert.c',
Re: Refactor backup related code (was: Is it correct to say, "invalid data in file \"%s\"", BACKUP_LABEL_FILE in do_pg_backup_stop?)
On Thu, Sep 22, 2022 at 8:55 PM Andres Freund wrote: > > Due to the merge of the meson based build this patch needs some > adjustment. See > https://cirrus-ci.com/build/6146162607521792 > Looks like it just requires adding xlogbackup.c to > src/backend/access/transam/meson.build. Thanks! I will post a new patch with that change. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: pg_basebackup --create-slot-if-not-exists?
On Wed, Sep 21, 2022 at 5:34 PM David G. Johnston <david.g.johns...@gmail.com> wrote:
> On Wednesday, September 21, 2022, Ashwin Agrawal wrote:
>
>> Currently, pg_basebackup has a
>> --create-slot option to create the slot if it does not already exist, or
>> --slot to use an existing slot.
>>
>> This means it needs to know whether the slot with the given name already
>> exists before invoking the command. If the pg_basebackup --create-slot <>
>> command fails for some reason after creating the slot, re-triggering the
>> same command fails with an ERROR that the slot already exists. One then
>> either needs to delete the slot and retrigger, or add a check before
>> retriggering the command to see whether the slot exists and alter the
>> command accordingly to avoid passing the --create-slot option. This is
>> inconvenient when automating on top of pg_basebackup, as checking for
>> slot presence before invoking pg_basebackup unnecessarily involves
>> issuing separate SQL commands. It would be really helpful for such
>> scenarios if, similar to CREATE TABLE, pg_basebackup could have IF NOT
>> EXISTS kind of semantics. (The limitation most likely comes from the
>> CREATE REPLICATION SLOT protocol itself.) Thoughts?
>
> What's the use case for automating pg_basebackup with a named replication
> slot created by the pg_basebackup command?

Greenplum runs N (some hundred) PostgreSQL instances to form a sharded
database cluster. Hence, automation/scripts are in place to create
replicas and handle failover/failback for these N instances. As Michael
said, for predictable management and monitoring of the slot across these
many instances, specifically named replication slots are used. These named
replication slots are used both for pg_basebackup and the subsequent
streaming replication.

> Why can you not leverage a temporary replication slot (i.e., omit --slot)?
> ISTM the create option is basically obsolete now.
We would be more than happy to use a temporary replication slot if it
provided full functionality. It might be a gap in my understanding, but I
believe a temporary replication slot only protects against WAL deletion
for the duration of pg_basebackup. It doesn't protect the window between
pg_basebackup completing and streaming replication starting. With the
--write-recovery-conf option, "primary_slot_name" only gets written to
postgresql.auto.conf if a named replication slot is provided, which
ensures the same slot is used for pg_basebackup and streaming replication,
and hence keeps the WAL around until the streaming replica connects after
pg_basebackup. How does one avoid this window with a temp slot?

-- Ashwin Agrawal (VMware)
Re: [PoC] Let libpq reject unexpected authentication requests
On Thu, Sep 22, 2022 at 4:52 AM Peter Eisentraut wrote:
> On 22.09.22 01:37, Jacob Champion wrote:
> > I think this is potentially
> > dangerous, but it mirrors the current behavior of libpq and I'm not
> > sure that we should change it as part of this patch.
>
> It might be worth reviewing that behavior for other reasons, but I think
> the semantics of your patch are correct.

Sounds good. v9 removes the TODO and adds a better explanation.

> > If you're okay with those [GSS] limitations, I will rip out the code. The
> > reason I'm not too worried about it is, I don't think it makes much
> > sense to be strict about your authentication requirements while at the
> > same time leaving the choice of transport encryption up to the server.
>
> The way I understand what you explained here is that it would be more
> sensible to leave that code in. I would be okay with that.

I've added a comment there explaining the gssencmode interaction. That
leaves no TODOs inside the code itself. I removed the commit message note
about not being able to prevent unexpected client cert requests or GSS
encryption, since we've decided to handle those cases outside of
require_auth.

I'm not able to test SSPI easily at the moment; if anyone is able to try
it out, that'd be really helpful. There's also the question of SASL
forwards compatibility -- if someone adds a new SASL mechanism, the code
will treat it like scram-sha-256 until it's changed, and there will be no
test to catch it. Should we leave that to the future mechanism implementer
to fix, or add a mechanism check now so the client is safe even if they
forget?

Thanks!
--Jacob

commit e13f21d596bc5670156a441dc6eec5228864b4b0
Author: Jacob Champion
Date: Thu Sep 22 16:39:34 2022 -0700

    squash! libpq: let client reject unexpected auth methods

    Remove TODOs, and document why the code remains as-is.
diff --git a/src/interfaces/libpq/fe-auth.c b/src/interfaces/libpq/fe-auth.c
index 565e44fcf6..793888d30f 100644
--- a/src/interfaces/libpq/fe-auth.c
+++ b/src/interfaces/libpq/fe-auth.c
@@ -921,8 +921,14 @@ check_expected_areq(AuthRequest areq, PGconn *conn)
 				 * else that the user has allowed an authentication-less
 				 * connection).
 				 *
-				 * TODO: how should !auth_required interact with an incomplete
-				 * SCRAM exchange?
+				 * If the user has allowed both SCRAM and unauthenticated
+				 * (trust) connections, then this check will silently accept
+				 * partial SCRAM exchanges, where a misbehaving server does not
+				 * provide its verifier before sending an OK. This is consistent
+				 * with historical behavior, but it may be a point to revisit in
+				 * the future, since it could allow a server that doesn't know
+				 * the user's password to silently harvest material for a brute
+				 * force attack.
 				 */
 				if (!conn->auth_required || conn->client_finished_auth)
 					break;
@@ -940,10 +946,9 @@ check_expected_areq(AuthRequest areq, PGconn *conn)
 				/*
 				 * If implicit GSS auth has already been performed via GSS
 				 * encryption, we don't need to have performed an
-				 * AUTH_REQ_GSS exchange.
-				 *
-				 * TODO: check this assumption. What mutual auth guarantees
-				 * are made in this case?
+				 * AUTH_REQ_GSS exchange. This allows require_auth=gss to be
+				 * combined with gssencmode, since there won't be an
+				 * explicit authentication request in that case.
 				 */
 			}
 			else

From f88abcb4b22499466758e566b956c487877118bb Mon Sep 17 00:00:00 2001
From: Jacob Champion
Date: Mon, 28 Feb 2022 09:40:43 -0800
Subject: [PATCH v9 1/2] libpq: let client reject unexpected auth methods

The require_auth connection option allows the client to choose a list of
acceptable authentication types for use with the server. There is no
negotiation: if the server does not present one of the allowed
authentication requests, the connection fails. Additionally, all methods
in the list may be negated, e.g.
'!password', in which case the server must NOT use the listed authentication type. The special method "none" allows/disallows the use of
Re: cfbot vs. changes in the set of CI tasks
Andres Freund writes: > On 2022-09-22 17:44:56 -0400, Tom Lane wrote: >> I was confused about how come the new patches I'd just posted in >> the 3848 CF item (write/read support for raw parse trees) are >> showing a mix of passes and fails in the cfbot. I eventually >> realized that the fail results are all old and stale, because >> (for example) there's no longer a "FreeBSD - 13" task to run, >> just "FreeBSD - 13 - Meson". This seems tremendously confusing. >> Can we get the cfbot to not display no-longer-applicable results? > Somewhat tangentially - it seemed the right thing at the time [TM], but now I > wonder if tacking on the buildsystem like I did is the best way? As long as we > have both I think we need it in some form... Yeah, I think those names are fine for now. regards, tom lane
Re: Making C function declaration parameter names consistent with corresponding definition names
On Thu, Sep 22, 2022 at 3:20 PM Tom Lane wrote: > WFM. Okay, pushed a minimally invasive commit to fix the inconsistencies in pg_dump related code just now. That's the last of them. Now the only remaining clang-tidy warnings about inconsistent parameter names are those that seem practically impossible to fix (these are mostly just cases involving flex/bison). It still seems like a good idea to formally create a new coding standard around C function parameter names. We really need a simple clang-tidy workflow to be able to do that. I'll try to get to that soon. Part of the difficulty there will be finding a way to ignore the warnings that we really can't do anything about. Thanks -- Peter Geoghegan
Re: cfbot vs. changes in the set of CI tasks
Hi, On 2022-09-22 17:44:56 -0400, Tom Lane wrote: > I was confused about how come the new patches I'd just posted in > the 3848 CF item (write/read support for raw parse trees) are > showing a mix of passes and fails in the cfbot. I eventually > realized that the fail results are all old and stale, because > (for example) there's no longer a "FreeBSD - 13" task to run, > just "FreeBSD - 13 - Meson". This seems tremendously confusing. > Can we get the cfbot to not display no-longer-applicable results? Somewhat tangentially - it seemed the right thing at the time [TM], but now I wonder if tacking on the buildsystem like I did is the best way? As long as we have both I think we need it in some form... Greetings, Andres Freund
Re: cfbot vs. changes in the set of CI tasks
Thomas Munro writes: > On Fri, Sep 23, 2022 at 10:17 AM Thomas Munro wrote: >> Ah, right. Morning here and I've just spotted that too after the >> first results of the Meson era rolled in overnight. It shows the >> latest result for each distinct task name, which now includes extinct >> tasks (until they eventually get garbage collected due to age in a few >> days). I clearly didn't anticipate tasks going away. Perhaps I >> should figure out how to show only results corresponding to a single >> commit ID... looking into that... > Done, and looking more sane now. Looks better to me too. Thanks! regards, tom lane
Re: cfbot vs. changes in the set of CI tasks
On Fri, Sep 23, 2022 at 10:17 AM Thomas Munro wrote:
> On Fri, Sep 23, 2022 at 9:44 AM Tom Lane wrote:
> > I was confused about how come the new patches I'd just posted in
> > the 3848 CF item (write/read support for raw parse trees) are
> > showing a mix of passes and fails in the cfbot. I eventually
> > realized that the fail results are all old and stale, because
> > (for example) there's no longer a "FreeBSD - 13" task to run,
> > just "FreeBSD - 13 - Meson". This seems tremendously confusing.
> > Can we get the cfbot to not display no-longer-applicable results?
> >
> > As a quick-fix hack it might do to just flush all the pre-meson-commit
> > results, but I doubt this'll be the last time we make such changes.
> > I think an actual filter comparing the result's time to the time of the
> > allegedly-being-tested patch would be prudent.
>
> Ah, right. Morning here and I've just spotted that too after the
> first results of the Meson era rolled in overnight. It shows the
> latest result for each distinct task name, which now includes extinct
> tasks (until they eventually get garbage collected due to age in a few
> days). I clearly didn't anticipate tasks going away. Perhaps I
> should figure out how to show only results corresponding to a single
> commit ID... looking into that...

Done, and looking more sane now.
Re: [RFC] building postgres with meson - v13
On Thu, Sep 22, 2022 at 01:28:09PM -0700, Andres Freund wrote:
> On 2022-09-22 13:05:33 -0700, Nathan Bossart wrote:
> > * I'm using an Ubuntu-based distribution, and the version of meson that
> > apt installed was not new enough for Postgres. I ended up cloning meson
> > [0] and using the newest tag. This is no big deal.
>
> I assume this is 20.04 LTS? If so, we're missing it by one version of
> meson currently. There's unfortunately a few features that'd be a bit
> painful to not have.

Yes. I imagine I'll upgrade to 22.04 LTS soon, which appears to provide a
new enough version of meson.

> > * The installed binaries were unable to locate libraries like libpq. I
> > ended up setting the extra_lib_dirs option to the directory where these
> > libraries were installed to fix this. This one is probably worth
> > investigating further.
>
> I think that should be "fixed" in a later commit in the meson tree - any
> chance you could try that?

Yup, after cherry-picking 9bc60bc, this is fixed.

-- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Making C function declaration parameter names consistent with corresponding definition names
Peter Geoghegan writes: > That makes it easy, then. I'll just take the least invasive approach > possible with pg_dump: treat the names from function definitions as > authoritative, and mechanically adjust the function declarations as > needed to make everything agree. WFM. regards, tom lane
Re: cfbot vs. changes in the set of CI tasks
On Fri, Sep 23, 2022 at 9:44 AM Tom Lane wrote: > I was confused about how come the new patches I'd just posted in > the 3848 CF item (write/read support for raw parse trees) are > showing a mix of passes and fails in the cfbot. I eventually > realized that the fail results are all old and stale, because > (for example) there's no longer a "FreeBSD - 13" task to run, > just "FreeBSD - 13 - Meson". This seems tremendously confusing. > Can we get the cfbot to not display no-longer-applicable results? > > As a quick-fix hack it might do to just flush all the pre-meson-commit > results, but I doubt this'll be the last time we make such changes. > I think an actual filter comparing the result's time to the time of the > allegedly-being-tested patch would be prudent. Ah, right. Morning here and I've just spotted that too after the first results of the Meson era rolled in overnight. It shows the latest result for each distinct task name, which now includes extinct tasks (until they eventually get garbage collected due to age in a few days). I clearly didn't anticipate tasks going away. Perhaps I should figure out how to show only results corresponding to a single commit ID... looking into that...
Re: Making C function declaration parameter names consistent with corresponding definition names
On Thu, Sep 22, 2022 at 2:55 PM Tom Lane wrote: > Yeah. I'm not much on board with the AHX->A and AH->A changes you made; > those seem extremely invasive and it's not real clear that they add a > lot of value. That makes it easy, then. I'll just take the least invasive approach possible with pg_dump: treat the names from function definitions as authoritative, and mechanically adjust the function declarations as needed to make everything agree. The commit message for this will note in passing that the inconsistency that this creates at the header file level is a known issue. Thanks -- Peter Geoghegan
Re: [RFC] building postgres with meson - v13
On Thu, Sep 22, 2022 at 2:50 PM Andres Freund wrote: > I'm likely the most biased person on this, but for me the reliable incremental > builds and the readability of the test output are big enough wins that the > answer is pretty clear... Doesn't hurt that running all tests is faster too. It's nice that things are much more discoverable now. For example, if you want to run some random test on its own then you just...do it in the obvious, discoverable way. It took me about 2 minutes to figure out how to do that, without reading any documentation. OTOH doing the same thing with the old autoconf-based build system requires the user to know the exact magical incantation for Postgres tests. You just have to know to run the 2 or 3 tests that are undocumented (or poorly documented) dependencies first. That seems like an enormous usability improvement, especially for those of us that haven't been working on Postgres for years.

> time to run all tests (cassert, -Og), in a fully built tree:
> time make -j48 -s -Otarget check-world PROVE_FLAGS='-j4'
> real    1m1.577s
> user    7m32.579s
> sys     2m17.767s
> time meson test
> real    0m42.178s
> user    7m8.533s
> sys     2m17.711s

Sold! -- Peter Geoghegan
Re: Making C function declaration parameter names consistent with corresponding definition names
Peter Geoghegan writes: > I would like to give another 24 hours for anybody to lodge final > objections to what I've done in this patch. It seems possible that > there will be concerns about how this might affect backpatching, or > something like that. This patch goes relatively far in the direction > of refactoring to make things consistent at the module level -- unlike > most of the patches, which largely consisted of mechanical adjustments > that were obviously correct, both locally and at the whole-module level. Yeah. I'm not much on board with the AHX->A and AH->A changes you made; those seem extremely invasive and it's not real clear that they add a lot of value. I've never thought that the Archive vs. ArchiveHandle separation in pg_dump was very well thought out. I could perhaps get behind a patch to eliminate that bit of "abstraction"; but I'd still try to avoid wholesale changes in local-variable names from it. I don't think that would buy anything that's worth the back-patching pain. Just accepting that Archive[Handle] variables might be named either AH or AHX depending on historical accident does not seem that bad to me. We have lots more and worse naming inconsistencies in our tree. regards, tom lane
Re: [EXTERNAL] Re: [PoC] Federated Authn/z with OAUTHBEARER
On 9/21/22 21:55, Andrey Chudnovsky wrote: > First, My message from corp email wasn't displayed in the thread, I see it on the public archives [1]. Your client is choosing some pretty confusing quoting tactics, though, which you may want to adjust. :D I have what I'll call some "skeptical curiosity" here -- you don't need to defend your use cases to me by any means, but I'd love to understand more about them. > Yes, passing a token as a new auth method won't make much sense in > isolation. However: > 1. Since OAUTHBEARER is supported in the ecosystem, passing a token as > a way to authenticate with OAUTHBEARER is more consistent (IMO), than > passing it as a password. Agreed. It's probably not a very strong argument for the new mechanism, though, especially if you're not using the most expensive code inside it. > 2. Validation on the backend side doesn't depend on whether the token > is obtained by libpq or transparently passed by the upstream client. Sure. > 3. Single OAUTH auth method on the server side for both scenarios, > would allow both enterprise clients with their own Token acquisition > and community clients using libpq flows to connect as the same PG > users/roles. Okay, this is a stronger argument. With that in mind, I want to revisit your examples and maybe provide some counterproposals: >> Libpq passing a token directly from an upstream client is useful in other >> scenarios: >> 1. Enterprise clients, built with .Net / Java and using provider-specific >> authentication libraries, like MSAL for AAD. Those can also support more >> advanced provider-specific token acquisition flows. I can see that providing a token directly would help you work around limitations in libpq's "standard" OAuth flows, whether we use iddawc or not. And it's cheap in terms of implementation. 
But I have a feeling it would fall apart rapidly with error cases, where the server is giving libpq information via the OAUTHBEARER mechanism, but libpq can only communicate to your wrapper through human-readable error messages on stderr. This seems like clear motivation for client-side SASL plugins (which were also discussed on Samay's proposal thread). That's a lot more expensive to implement in libpq, but if it were hypothetically available, wouldn't you rather your provider-specific code be able to speak OAUTHBEARER directly with the server? >> 2. Resource-tight (like IoT) clients. Those can be compiled without the >> optional libpq flag not including the iddawc or other dependency. I want to dig into this much more; resource-constrained systems are near and dear to me. I can see two cases here: Case 1: The device is an IoT client that wants to connect on its own behalf. Why would you want to use OAuth in that case? And how would the IoT device get its Bearer token to begin with? I'm much more used to architectures that provision high-entropy secrets for this, whether they're incredibly long passwords per device (in which case, channel-bound SCRAM should be a fairly strong choice?) or client certs (which can be better decentralized, but make for a lot of bookkeeping). If the answer to that is, "we want an IoT client to be able to connect using the same role as a person", then I think that illustrates a clear need for SASL negotiation. That would let the IoT client choose SCRAM-*-PLUS or EXTERNAL, and the person at the keyboard can choose OAUTHBEARER. Then we have incredible flexibility, because you don't have to engineer one mechanism to handle them all. Case 2: The constrained device is being used as a jump point. So there's an actual person at a keyboard, trying to get into a backend server (maybe behind a firewall layer, etc.), and the middlebox is either not web-connected or is incredibly tiny for some reason. 
That might be a good use case for a copy-pasted Bearer token, but is there actual demand for that use case? What motivation would you (or your end user) have for choosing a fairly heavy, web-centric authentication method in such a constrained environment? Are there other resource-constrained use cases I've missed? Thanks, --Jacob [1] https://www.postgresql.org/message-id/MN0PR21MB31694BAC193ECE1807FD45358F4F9%40MN0PR21MB3169.namprd21.prod.outlook.com
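As background for the mechanism under discussion: whichever side acquires the token, the client ultimately has to emit the OAUTHBEARER initial response defined by RFC 7628, i.e. a GS2 header followed by \x01-delimited key/value pairs. A rough sketch (the token and host values below are placeholders, not real credentials):

```python
def oauthbearer_client_first(token, authzid=None, host=None, port=None):
    """Build the OAUTHBEARER initial client response per RFC 7628:
    gs2-header, kvsep (\x01), zero or more kvpairs each terminated by
    \x01, then a final \x01."""
    gs2 = "n," + ("a=" + authzid if authzid else "") + ","
    kvpairs = ""
    if host:
        kvpairs += "host=%s\x01" % host
    if port:
        kvpairs += "port=%d\x01" % port
    kvpairs += "auth=Bearer %s\x01" % token
    return gs2 + "\x01" + kvpairs + "\x01"

# Placeholder token/host, for illustration only.
msg = oauthbearer_client_first("vF9dft4qmTc2Nvb3RlckB=", host="pg.example.com")
```

A provider-specific wrapper that already holds a token would only need to produce this one message, which is part of why a client-side SASL plugin interface looks attractive.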
Re: CI and test improvements
Hi, On 2022-09-22 16:07:02 -0500, Justin Pryzby wrote: > On Sun, Aug 28, 2022 at 02:28:02PM -0700, Andres Freund wrote: > > > > > @@ -71,8 +69,6 @@ task: > > > > > fingerprint_key: ccache/freebsd > > > > > reupload_on_changes: true > > > > > > > > > > - # Workaround around performance issues due to 32KB block size > > > > > - repartition_script: src/tools/ci/gcp_freebsd_repartition.sh > > > > >create_user_script: | > > > > > pw useradd postgres > > > > > chown -R postgres:postgres . > > > > > -- > > > > > > > > What's the story there - at some point that was important for > > > > performance > > > > because of the native block size triggering significant > > > > read-modify-write > > > > cycles with postgres' writes. You didn't comment on it in the commit > > > > message. > > > > > > Well, I don't know the history, but it seems to be unneeded now. > > > > It's possible it was mainly needed for testing with aio + dio. But also > > possible that an upgrade improved the situation since. > > Maybe freebsd got faster as a result of the TAU CPUs? > https://mobile.twitter.com/cirrus_labs/status/1534982111568052240 > > I noticed because it's been *slower* the last ~24h since cirrusci > disabled TAU, as Thomas' commit mentioned. > https://twitter.com/cirrus_labs/status/1572657320093712384 Yea, I noticed that as well. It's entirely possible that something in the "hardware" stack improved sufficiently to avoid problems. > I have no idea if the TAU CPUs eliminate/mitigate the original > performance issue you had with AIO. But they have such a large effect > on freebsd that it could now be the fastest task, if given more than 2 > CPUs. I'm planning to rebase early next week and try that out. Greetings, Andres Freund
Re: [RFC] building postgres with meson - v13
Hi, On 2022-09-22 13:21:28 -0700, Peter Geoghegan wrote: > Is it generally recommended that individual hackers mostly switch over > to Meson for their day to day work soon? I'm guessing that this > question doesn't really have a clear answer yet, but thought I'd ask, > just in case. It'll probably depend on who you ask ;) I'm likely the most biased person on this, but for me the reliable incremental builds and the readability of the test output are big enough wins that the answer is pretty clear... Doesn't hurt that running all tests is faster too. The currently existing limitations are imo mostly around making it usable for production, particularly on windows.

time to run all tests (cassert, -Og), in a fully built tree:

make:
time make -j48 -s -Otarget check-world
real    2m44.206s
user    6m29.121s
sys     1m54.069s

time make -j48 -s -Otarget check-world PROVE_FLAGS='-j4'
real    1m1.577s
user    7m32.579s
sys     2m17.767s

meson:
time meson test
real    0m42.178s
user    7m8.533s
sys     2m17.711s

FWIW, I just rebased my older patch to cache and copy initdb during the tests. The %user saved is impressive enough to pursue it again...

time make -j48 -s -Otarget check-world PROVE_FLAGS='-j4'
real    0m52.655s
user    2m19.504s
sys     1m26.264s

time meson test:
real    0m36.370s
user    2m14.748s
sys     1m36.741s

Greetings, Andres Freund
cfbot vs. changes in the set of CI tasks
I was confused about how come the new patches I'd just posted in the 3848 CF item (write/read support for raw parse trees) are showing a mix of passes and fails in the cfbot. I eventually realized that the fail results are all old and stale, because (for example) there's no longer a "FreeBSD - 13" task to run, just "FreeBSD - 13 - Meson". This seems tremendously confusing. Can we get the cfbot to not display no-longer-applicable results? As a quick-fix hack it might do to just flush all the pre-meson-commit results, but I doubt this'll be the last time we make such changes. I think an actual filter comparing the result's time to the time of the allegedly-being-tested patch would be prudent. regards, tom lane
Re: Making C function declaration parameter names consistent with corresponding definition names
On Wed, Sep 21, 2022 at 6:58 PM Peter Geoghegan wrote: > Attached revision shows where I'm at with this. Would be nice to get > it all out of the way before too long. Attached is v6, which now consists of only one single patch, which fixes things up in pg_dump. (This is almost though not quite identical to the same patch from v5.) I would like to give another 24 hours for anybody to lodge final objections to what I've done in this patch. It seems possible that there will be concerns about how this might affect backpatching, or something like that. This patch goes relatively far in the direction of refactoring to make things consistent at the module level -- unlike most of the patches, which largely consisted of mechanical adjustments that were obviously correct, both locally and at the whole-module level. BTW, I notice that meson seems to have built-in support for running scan-build, a tool that performs static analysis using clang. I'm pretty sure that it's possible to use scan-build to run clang-tidy checks (though I've just been using run-clang-tidy myself). Perhaps it would make sense to use meson's support for scan-build to make it easy for everybody to run the clang-tidy checks locally. -- Peter Geoghegan From ae97a00cb4c69b3fd247bc6606509165912bee29 Mon Sep 17 00:00:00 2001 From: Peter Geoghegan Date: Wed, 21 Sep 2022 14:51:29 -0700 Subject: [PATCH v6] Harmonize parameter names in pg_dump/pg_dumpall. Make sure that function declarations use names that exactly match the corresponding names from function definitions. Having parameter names that are reliably consistent in this way will make it easier to reason about groups of related C functions from the same translation unit as a module. It will also make certain refactoring tasks easier. Like other recent commits that cleaned up function parameter names, this commit was written with help from clang-tidy. 
Author: Peter Geoghegan
Reviewed-By: David Rowley
Discussion: https://postgr.es/m/CAH2-WznJt9CMM9KJTMjJh_zbL5hD9oX44qdJ4aqZtjFi-zA3Tg@mail.gmail.com
---
 src/bin/pg_dump/common.c              |   2 +-
 src/bin/pg_dump/parallel.c            |  14 +-
 src/bin/pg_dump/pg_backup.h           |  36 ++--
 src/bin/pg_dump/pg_backup_archiver.c  |  74 +++
 src/bin/pg_dump/pg_backup_archiver.h  |  10 +-
 src/bin/pg_dump/pg_backup_custom.c    |   2 +-
 src/bin/pg_dump/pg_backup_db.c        |  40 ++--
 src/bin/pg_dump/pg_backup_db.h        |  12 +-
 src/bin/pg_dump/pg_backup_directory.c |   2 +-
 src/bin/pg_dump/pg_backup_null.c      |   8 +-
 src/bin/pg_dump/pg_backup_tar.c       |   4 +-
 src/bin/pg_dump/pg_dump.c             | 290 +-
 src/bin/pg_dump/pg_dump.h             |   6 +-
 src/bin/pg_dump/pg_dumpall.c          |   6 +-
 14 files changed, 254 insertions(+), 252 deletions(-)

diff --git a/src/bin/pg_dump/common.c b/src/bin/pg_dump/common.c
index 395f817fa..44fa52cc5 100644
--- a/src/bin/pg_dump/common.c
+++ b/src/bin/pg_dump/common.c
@@ -79,7 +79,7 @@ typedef struct _catalogIdMapEntry
 
 static catalogid_hash *catalogIdHash = NULL;
 
-static void flagInhTables(Archive *fout, TableInfo *tbinfo, int numTables,
+static void flagInhTables(Archive *fout, TableInfo *tblinfo, int numTables,
 						  InhInfo *inhinfo, int numInherits);
 static void flagInhIndexes(Archive *fout, TableInfo *tblinfo, int numTables);
 static void flagInhAttrs(DumpOptions *dopt, TableInfo *tblinfo, int numTables);
diff --git a/src/bin/pg_dump/parallel.c b/src/bin/pg_dump/parallel.c
index c8a70d9bc..ba6df9e0c 100644
--- a/src/bin/pg_dump/parallel.c
+++ b/src/bin/pg_dump/parallel.c
@@ -146,7 +146,7 @@ static int pgpipe(int handles[2]);
 typedef struct ShutdownInformation
 {
 	ParallelState *pstate;
-	Archive    *AHX;
+	Archive    *A;
 } ShutdownInformation;
 
 static ShutdownInformation shutdown_info;
@@ -325,9 +325,9 @@ getThreadLocalPQExpBuffer(void)
  * as soon as they've created the ArchiveHandle.
 */
 void
-on_exit_close_archive(Archive *AHX)
+on_exit_close_archive(Archive *A)
 {
-	shutdown_info.AHX = AHX;
+	shutdown_info.A = A;
 	on_exit_nicely(archive_close_connection, &shutdown_info);
 }
@@ -353,8 +353,8 @@ archive_close_connection(int code, void *arg)
 		 */
 		ShutdownWorkersHard(si->pstate);
 
-		if (si->AHX)
-			DisconnectDatabase(si->AHX);
+		if (si->A)
+			DisconnectDatabase(si->A);
 	}
 	else
 	{
@@ -378,8 +378,8 @@
 	else
 	{
 		/* Non-parallel operation: just kill the leader DB connection */
-		if (si->AHX)
-			DisconnectDatabase(si->AHX);
+		if (si->A)
+			DisconnectDatabase(si->A);
 	}
 }
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index fcc5f6bd0..9dc441902 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -270,33 +270,33 @@
  * Function pointer prototypes for assorted callback methods.
  */
 
-typedef int (*DataDumperPtr) (Archive *AH, const void *userArg);
+typedef int (*DataDumperPtr) (Archive *A,
Re: Extending outfuncs support to utility statements
I wrote: > I left off the last one because it fails check-world: we now > get through the core regression tests okay, but then the pg_dump > tests fail on the new SQL function. To fix that, we would have > to extend ruleutils.c's get_utility_query_def() to be able to > fully reconstruct any legal utility query ... which seems like > a pretty dauntingly large amount of tedious manual effort to > start with, and then also a nontrivial additional requirement > on any future patch that adds new utility syntax. Are we sure > it's worth going there? Thinking about that some more, I wondered if we'd even wish to build such code, compared to just saving the original source text for utility statements and printing that. Obviously, this loses all the benefits of new-style SQL functions compared to old-style ... except that those benefits would be illusory anyway, since by definition we have not done parse analysis on a utility statement. So we *cannot* offer any useful guarantees about being search_path change proof, following renames of referenced objects, preventing drops of referenced objects, etc etc. This makes me wonder if this is a feature we even want. If we put it in, we'd have to add a bunch of disclaimers about how utility statements behave entirely differently from DML statements. Perhaps an interesting alternative is to allow a command along the lines of EXECUTE string-expression (of course that name is already taken) where we'd parse-analyze the string-expression at function creation, but then the computed string is executed as a SQL command in the runtime environment. This would make it fairly clear which things you have guarantees of and which you don't. It'd also offer a feature that the PLs have but SQL functions traditionally haven't, ie execution of dynamically-computed SQL. Anyway, this is a bit far afield from the stated topic of this thread. 
I think we should commit something approximately like what I posted and then start a new thread specifically about what we'd like to do about utility commands in new-style SQL functions. regards, tom lane
Re: Consider parallel for lateral subqueries with limit
On Mon, Sep 19, 2022 at 4:29 PM Robert Haas wrote: > > On Mon, Sep 19, 2022 at 3:58 PM James Coleman wrote: > > But in the case where there's correlation via LATERAL we already don't > > guarantee unique executions for a given set of params into the lateral > > subquery execution, right? For example, suppose we have:
> >
> > select *
> > from foo
> > left join lateral (
> >   select n
> >   from bar
> >   where bar.a = foo.a
> >   limit 1
> > ) on true
> >
> > and suppose that foo.a (in the table scan) returns these values in > > order: 1, 2, 1. In that case we'll execute the lateral subquery 3 > > separate times rather than attempting to order the results of foo.a > > such that we can re-execute the subquery only when the param changes > > to a new unique value (and we definitely don't cache the results to > > guarantee unique subquery executions). > I think this is true, but I don't really understand why we should > focus on LATERAL here. What we really need, and I feel like we've > talked about this before, is a way to reason about where parameters > are set and used. Yes, over in the thread "Parallelize correlated subqueries that execute within each worker" [1] which was meant to build on top of this work (though is technically separate). Your bringing it up here too is making me wonder if we can combine the two and instead of always allowing subqueries with LIMIT instead (like the other patch does) delay final determination of parallel safety of rels and paths (perhaps, as that thread is discussing, until gather/gather merge path creation). Side note: I'd kinda like a way (maybe a development GUC or even a compile time option) to have EXPLAIN show where params are set. If something like that exists, egg on my face I guess, but I'd love to know about it. 
> Your sample query gets a plan like this:
>
> Nested Loop Left Join  (cost=0.00..1700245.00 rows=1 width=8)
>   ->  Seq Scan on foo  (cost=0.00..145.00 rows=1 width=4)
>   ->  Limit  (cost=0.00..170.00 rows=1 width=4)
>         ->  Seq Scan on bar  (cost=0.00..170.00 rows=1 width=4)
>               Filter: (foo.a = a)
>
> If this were to occur inside a larger plan tree someplace, it would be > OK to insert a Gather node above the Nested Loop node without doing > anything further, because then the parameter that stores foo.a would > be both set and used in the worker. If you wanted to insert a Gather > at any other place in this plan, things get more complicated. But just > because you have LATERAL doesn't mean that you have this problem, > because if you delete the "limit 1" then the subqueries get flattened > together and the parameter disappears,

For future reference in this email thread when you remove the "limit 1" this is the plan you get:

Merge Right Join  (cost=372.18..815.71 rows=28815 width=8)
  Merge Cond: (bar.a = foo.a)
  ->  Sort  (cost=158.51..164.16 rows=2260 width=8)
        Sort Key: bar.a
        ->  Seq Scan on bar  (cost=0.00..32.60 rows=2260 width=8)
  ->  Sort  (cost=179.78..186.16 rows=2550 width=4)
        Sort Key: foo.a
        ->  Seq Scan on foo  (cost=0.00..35.50 rows=2550 width=4)

Just to make sure I'm following: by "doesn't mean that you have this problem" you mean "doesn't mean you have this limitation on parallel query"? The "flattening out" (conversion to join between two table scans executed a single time each) is an interesting case because I consider that to be "just" a performance optimization, and therefore I don't think anything about the guarantees a user expects should change. 
But interestingly: it does end up providing stronger guarantees about the query results than it would if the conversion didn't happen (the subquery results in only a single table scan whereas without the change a scan per outer row means it's *possible* to get different results across different outer rows even with the same join key value). > and if you delete the lateral > reference (i.e. WHERE foo.a = bar.a) then there's still a subquery but > it no longer refers to an outer parameter. And on the flip side just > because you don't have LATERAL doesn't mean that you don't have this > problem. e.g. the query could instead be: > > select *, (select n from bar where bar.a = foo.a limit 1) from foo; > > ...which I think is pretty much equivalent to your formulation and has > the same problem as far as parallel query as your formulation but does > not involve the LATERAL keyword. Yes, that's a good point too. I need to play with these examples and confirm whether lateral_relids gets set in that case. IIRC when that's set isn't exactly the same as whether or not the LATERAL keyword is used, and I should clarify that my claims here are meant to be about when we execute it that way regardless of the keyword usage. The keyword usage I'd assumed just made it easier to talk about, but maybe you're implying that it's actually generating confusion. James Coleman 1:
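To illustrate the variability being discussed: with LIMIT 1 the row returned for a given parameter value depends on the physical scan order, so two executions with the same key need not agree. A toy Python sketch (not PostgreSQL code, just a model of per-outer-row subquery execution; the changing scan order stands in for effects like synchronized sequential scans):

```python
def lateral_limit1(key, bar, scan_order):
    """Simulate: (select n from bar where bar.a = key limit 1),
    scanning bar in whatever physical order the executor happens to use."""
    for i in scan_order:
        a, n = bar[i]
        if a == key:
            return n
    return None

bar = [(1, "x"), (1, "y"), (2, "z")]

# foo.a yields 1, 2, 1; suppose the third scan of bar starts at a
# different tuple, changing which row LIMIT 1 happens to pick.
orders = [[0, 1, 2], [0, 1, 2], [1, 2, 0]]
results = [lateral_limit1(a, bar, o) for a, o in zip([1, 2, 1], orders)]
print(results)  # → ['x', 'z', 'y']: the two key-1 executions disagree
```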
Re: CI and test improvements
On Sun, Aug 28, 2022 at 02:28:02PM -0700, Andres Freund wrote: > > > > @@ -71,8 +69,6 @@ task: > > > > fingerprint_key: ccache/freebsd > > > > reupload_on_changes: true > > > > > > > > - # Workaround around performance issues due to 32KB block size > > > > - repartition_script: src/tools/ci/gcp_freebsd_repartition.sh > > > >create_user_script: | > > > > pw useradd postgres > > > > chown -R postgres:postgres . > > > > -- > > > > > > What's the story there - at some point that was important for performance > > > because of the native block size triggering significant read-modify-write > > > cycles with postgres' writes. You didn't comment on it in the commit > > > message. > > > > Well, I don't know the history, but it seems to be unneeded now. > > It's possible it was mainly needed for testing with aio + dio. But also > possible that an upgrade improved the situation since.

Maybe freebsd got faster as a result of the TAU CPUs? https://mobile.twitter.com/cirrus_labs/status/1534982111568052240

I noticed because it's been *slower* the last ~24h since cirrusci disabled TAU, as Thomas' commit mentioned. https://twitter.com/cirrus_labs/status/1572657320093712384

For example this CF entry: https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/39/3736
https://cirrus-ci.com/task/4670794365140992 5m36s - 4 days ago
https://cirrus-ci.com/task/4974926233862144 5m25s - 3 days ago
https://cirrus-ci.com/task/5561409034518528 5m29s - 2 days ago
https://cirrus-ci.com/task/6432442008469504 9m19s - yesterday

CF_BOT's latest tasks seem to be fast again, since 1-2h ago. 
https://cirrus-ci.com/build/5178906041909248 9m1s
https://cirrus-ci.com/build/4593160281128960 5m8s
https://cirrus-ci.com/build/4539845644124160 5m22s

The logs for July show when freebsd started "being fast": https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/38/3708
https://cirrus-ci.com/task/6316073015312384 10m25s Jul 13
https://cirrus-ci.com/task/5662878987452416 5m48s Jul 15

Maybe that changed in July rather than June because the TAU CPUs were still not available in every region/zone (?) https://cloud.google.com/compute/docs/regions-zones/ I have no idea if the TAU CPUs eliminate/mitigate the original performance issue you had with AIO. But they have such a large effect on freebsd that it could now be the fastest task, if given more than 2 CPUs. -- Justin
Re: Extending outfuncs support to utility statements
Andres Freund writes: > On 2022-09-22 12:48:47 -0400, Tom Lane wrote: >> After staring at the code a bit, I think we don't need to touch >> pg_strtok() per se. I propose that this can be resolved with changes >> at the next higher level. Let's make outToken print NULL as <> as >> it always has, but print an empty string as "" (two double quotes). >> If the raw input string is two double quotes, print it as \"" to >> disambiguate. This'd require a catversion bump when committed, >> but I don't think there are any showstopper problems otherwise. > Makes sense to me. Here is a version of all-but-the-last patch in Peter's series. I left off the last one because it fails check-world: we now get through the core regression tests okay, but then the pg_dump tests fail on the new SQL function. To fix that, we would have to extend ruleutils.c's get_utility_query_def() to be able to fully reconstruct any legal utility query ... which seems like a pretty dauntingly large amount of tedious manual effort to start with, and then also a nontrivial additional requirement on any future patch that adds new utility syntax. Are we sure it's worth going there? But I think it's probably worth committing what we have here just on testability grounds. Some notes: 0001, 0002 not changed. I tweaked 0003 a bit, mainly because I think it's probably not very safe to apply strncmp to a string we don't know the length of. It might be difficult to fall off the end of memory that way, but I wouldn't bet it's impossible. Also, adding the length checks gets rid of the need for a grotty order dependency in _readA_Expr(). 0004 fixes the empty-string problem as per above. I did not like what you'd done about imprecise floats one bit. I think we ought to do it as in 0005 instead: drop all the hard-wired precision assumptions and just print per Ryu. 0006, 0007, 0008 are basically the same as your previous 0004, 0005, 0006, except for getting rid of the float hacking in 0005. 
If you're good with this approach to the float issue, I think this set is committable (minus 0006 of course, and don't forget the catversion bump). regards, tom lane

From 89def573ced57fc320ad191613dac62c8992c27a Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Fri, 12 Aug 2022 21:15:40 +0200
Subject: [PATCH v2 1/9] Fix reading of most-negative integer value nodes

The main parser checks whether a literal fits into an int when deciding
whether it should be put into an Integer or Float node. The parser
processes integer literals without signs. So a most-negative integer
literal will not fit into Integer and will end up as a Float node.

The node tokenizer did this differently. It included the sign when
checking whether the literal fit into int. So a most-negative integer
would indeed fit that way and end up as an Integer node.

In order to preserve the node structure correctly, we need the node
tokenizer to also analyze integer literals without sign.

There are a number of test cases in the regression tests that have a
most-negative integer argument of some utility statement, so this issue
is easily reproduced under WRITE_READ_PARSE_PLAN_TREES.
---
 src/backend/nodes/read.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/nodes/read.c b/src/backend/nodes/read.c
index 4a54996b63..a9cb81b129 100644
--- a/src/backend/nodes/read.c
+++ b/src/backend/nodes/read.c
@@ -267,7 +267,7 @@ nodeTokenType(const char *token, int length)
 		char	   *endptr;
 
 		errno = 0;
-		(void) strtoint(token, &endptr, 10);
+		(void) strtoint(numptr, &endptr, 10);
 		if (endptr != token + length || errno == ERANGE)
 			return T_Float;
 		return T_Integer;
-- 
2.37.1

From 8a469d0a7195be4ecc65155b82c5836dfb13b920 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Fri, 12 Aug 2022 21:16:26 +0200
Subject: [PATCH v2 2/9] Fix reading of BitString nodes

The node tokenizer went out of its way to store BitString node values
without the leading 'b'. But everything else in the system stores the
leading 'b'. This would break if a BitString node is read-printed-read.

Also, the node tokenizer didn't know that BitString node tokens could
also start with 'x'.
---
 src/backend/nodes/read.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/src/backend/nodes/read.c b/src/backend/nodes/read.c
index a9cb81b129..fe84f140ee 100644
--- a/src/backend/nodes/read.c
+++ b/src/backend/nodes/read.c
@@ -288,7 +288,7 @@ nodeTokenType(const char *token, int length)
 		retval = T_Boolean;
 	else if (*token == '"' && length > 1 && token[length - 1] == '"')
 		retval = T_String;
-	else if (*token == 'b')
+	else if (*token == 'b' || *token == 'x')
 		retval = T_BitString;
 	else
 		retval = OTHER_TOKEN;
@@ -471,11 +471,10 @@ nodeRead(const char *token, int tok_len)
 			break;
 		case T_BitString:
 			{
-				char	   *val = palloc(tok_len);
+				char	   *val = palloc(tok_len + 1);
 
-				/* skip leading 'b' */
-				memcpy(val, token + 1,
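The NULL vs. empty-string token rule Tom proposes upthread (NULL printed as <>, an empty string as "", and a literal two-double-quote input escaped as \"" to disambiguate) round-trips cleanly. A Python sketch of just that disambiguation, ignoring pg_strtok's other quoting and escaping rules:

```python
def out_token(s):
    """Print a token per the proposed rule: NULL as <>, empty string
    as "" (two double quotes), and a raw input of two double quotes
    as \\"" so the reader can tell the cases apart."""
    if s is None:
        return "<>"
    if s == "":
        return '""'
    if s == '""':
        return '\\""'
    return s

def read_token(tok):
    """Inverse of out_token for the three special cases."""
    if tok == "<>":
        return None
    if tok == '""':
        return ""
    if tok == '\\""':
        return '""'
    return tok

# Every value, including NULL and the two ambiguous strings, survives
# a print-then-read cycle unchanged.
for value in [None, "", '""', "plain"]:
    assert read_token(out_token(value)) == value
```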
Re: [RFC] building postgres with meson - v13
Hi, On 2022-09-22 13:05:33 -0700, Nathan Bossart wrote: > I gave the meson build system a try, and it seems to work nicely. It > didn't take long at all to adapt my workflow. > > A few notes from my experience: > > * I'm using an Ubuntu-based distribution, and the version of meson that apt > installed was not new enough for Postgres. I ended up cloning meson [0] > and using the newest tag. This is no big deal. I assume this is 20.04 LTS? If so, we're missing it by one version of meson currently. There's unfortunately a few features that'd be a bit painful to not have. > * The installed binaries were unable to locate libraries like libpq. I > ended up setting the extra_lib_dirs option to the directory where these > libraries were installed to fix this. This one is probably worth > investigating further. I think that should be "fixed" in a later commit in the meson tree - any chance you could try that? https://github.com/anarazel/postgres/tree/meson > * meson really doesn't like it when there are things leftover from > configure/make. Whenever I switch from make to meson, I have to run 'make > maintainer-clean'. Yes. I recommend building out-of-tree with autoconf as well. > Otherwise, all of my usual build options, ccache, etc. are working just > like before. Nice work! Cool! Thanks for testing, Andres Freund
Re: [RFC] building postgres with meson - v13
On Thu, Sep 22, 2022 at 1:05 PM Nathan Bossart wrote: > Otherwise, all of my usual build options, ccache, etc. are working just > like before. Nice work! +1 Is it generally recommended that individual hackers mostly switch over to Meson for their day to day work soon? I'm guessing that this question doesn't really have a clear answer yet, but thought I'd ask, just in case. -- Peter Geoghegan
Re: [RFC] building postgres with meson - v13
I gave the meson build system a try, and it seems to work nicely. It didn't take long at all to adapt my workflow. A few notes from my experience: * I'm using an Ubuntu-based distribution, and the version of meson that apt installed was not new enough for Postgres. I ended up cloning meson [0] and using the newest tag. This is no big deal. * The installed binaries were unable to locate libraries like libpq. I ended up setting the extra_lib_dirs option to the directory where these libraries were installed to fix this. This one is probably worth investigating further. * meson really doesn't like it when there are things leftover from configure/make. Whenever I switch from make to meson, I have to run 'make maintainer-clean'. Otherwise, all of my usual build options, ccache, etc. are working just like before. Nice work! [0] https://github.com/mesonbuild/meson -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Pruning never visible changes
On Thu, 22 Sept 2022 at 15:16, Alvaro Herrera wrote:
>
> On 2022-Sep-22, Simon Riggs wrote:
>
> > On Mon, 19 Sept 2022 at 00:16, Greg Stark wrote:
>
> > > > VACUUM was willing to remove a committed-dead tuple immediately if it was
> > > > deleted by the same transaction that inserted it. The idea is that such a
> > > > tuple could never have been visible to any other transaction, so we don't
> > > > need to keep it around to satisfy MVCC snapshots. However, there was
> > > > already an exception for tuples that are part of an update chain, and this
> > > > exception created a problem: we might remove TOAST tuples (which are never
> > > > part of an update chain) while their parent tuple stayed around (if it was
> > > > part of an update chain). This didn't pose a problem for most things,
> > > > since the parent tuple is indeed dead: no snapshot will ever consider it
> > > > visible. But MVCC-safe CLUSTER had a problem, since it will try to copy
> > > > RECENTLY_DEAD tuples to the new table. It then has to copy their TOAST
> > > > data too, and would fail if VACUUM had already removed the toast tuples.
>
> > Good research Greg, thank you. Only took 10 years for me to notice it
> > was gone ;-)
>
> But this begs the question: is the proposed change safe, given that
> ancient consideration? I don't think TOAST issues have been mentioned
> in this thread so far, so I wonder if there is a test case that verifies
> that this problem doesn't occur for some reason.

Oh, completely agreed. I will submit a modified patch that adds a test case and just a comment to explain why we can't remove such rows.

--
Simon Riggs
http://www.EnterpriseDB.com/
Re: archive modules
On Wed, Sep 14, 2022 at 09:33:46PM +0200, Peter Eisentraut wrote: > Another question on this feature: Currently, if archive_library is set, > archive_command is ignored. I think if both are set, it should be an error. > Compare for example what happens if you set multiple recovery_target_xxx > settings. I don't think silently turning off one setting by setting another > is a good behavior. Peter, would you like to proceed with something like [0] to resolve this? If so, I will work on cleaning the patch up. [0] https://postgr.es/m/20220914222736.GA3042279%40nathanxps13 -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: ICU for global collation
On 2022-09-21 17:53, Peter Eisentraut wrote:
> Committed with that test, thanks. I think that covers all the ICU issues
> you reported for PG15 for now?

I thought about the order of the ICU checks - if it is OK to check that the selected encoding is supported by ICU after printing all the locale & encoding information, why not move almost all the ICU checks there? Examples of the work of the attached patch:

1. ICU locale vs supported encoding:

1.1.
$ initdb --encoding sql-ascii --locale-provider icu hoge
...
initdb: error: encoding mismatch
initdb: detail: The encoding you selected (SQL_ASCII) is not supported with the ICU provider.
initdb: hint: Rerun initdb and either do not specify an encoding explicitly, or choose a matching combination.

1.2. (like before)
$ initdb --encoding sql-ascii --icu-locale en-US hoge
initdb: error: --icu-locale cannot be specified unless locale provider "icu" is chosen
$ createdb --encoding sql-ascii --icu-locale en-US hoge
createdb: error: database creation failed: ERROR: ICU locale cannot be specified unless locale provider is ICU

2. For builds without ICU:

2.1.
$ initdb --locale-provider icu hoge
...
initdb: error: ICU is not supported in this build
$ createdb --locale-provider icu hoge
createdb: error: database creation failed: ERROR: ICU is not supported in this build

2.2. (like before)
$ initdb --icu-locale en-US hoge
initdb: error: --icu-locale cannot be specified unless locale provider "icu" is chosen
$ createdb --icu-locale en-US hoge
createdb: error: database creation failed: ERROR: ICU locale cannot be specified unless locale provider is ICU

2.3.
$ createdb --locale-provider icu --icu-locale en-US --encoding sql-ascii hoge
createdb: error: database creation failed: ERROR: ICU is not supported in this build

4. About errors in initdb:

4.1. If icu_locale is not specified, but it is required, then we get this:

$ initdb --locale-provider icu hoge
The files belonging to this database system will be owned by user "marina".
This user must also own the server process. The database cluster will be initialized with this locale configuration: provider:icu LC_COLLATE: en_US.UTF-8 LC_CTYPE:en_US.UTF-8 LC_MESSAGES: en_US.UTF-8 LC_MONETARY: ru_RU.UTF-8 LC_NUMERIC: ru_RU.UTF-8 LC_TIME: ru_RU.UTF-8 The default database encoding has been set to "UTF8". initdb: error: ICU locale must be specified Almost the same if ICU is not supported in this build: $ initdb --locale-provider icu hoge The files belonging to this database system will be owned by user "marina". This user must also own the server process. The database cluster will be initialized with this locale configuration: provider:icu LC_COLLATE: en_US.UTF-8 LC_CTYPE:en_US.UTF-8 LC_MESSAGES: en_US.UTF-8 LC_MONETARY: ru_RU.UTF-8 LC_NUMERIC: ru_RU.UTF-8 LC_TIME: ru_RU.UTF-8 The default database encoding has been set to "UTF8". initdb: error: ICU is not supported in this build 4.2. If icu_locale is specified for the wrong provider, the error will be at the beginning of the program start as before: $ initdb --icu-locale en-US hoge initdb: error: --icu-locale cannot be specified unless locale provider "icu" is chosen -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Companydiff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c index e0753c1badcc299e2bac45f3bdd2f23f59d70cbc..2589a2523e07c9543c99c7d7b446438d62382b89 100644 --- a/src/backend/commands/dbcommands.c +++ b/src/backend/commands/dbcommands.c @@ -1030,6 +1030,11 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt) if (dblocprovider == COLLPROVIDER_ICU) { +#ifndef USE_ICU + ereport(ERROR, +(errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("ICU is not supported in this build"))); +#else if (!(is_encoding_supported_by_icu(encoding))) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), @@ -1046,6 +1051,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt) errmsg("ICU locale must be specified"))); 
check_icu_locale(dbiculocale); +#endif } else { diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c index 2b42d9ccd8b751bcb5cda3a1f4c7803a68bc0a4a..743f11e1d1fd62d500fc2846976472d13e08046b 100644 --- a/src/backend/utils/adt/pg_locale.c +++ b/src/backend/utils/adt/pg_locale.c @@ -1945,15 +1945,12 @@ icu_set_collation_attributes(UCollator *collator, const char *loc) } } -#endif /* USE_ICU */ - /* * Check if the given locale ID is valid, and ereport(ERROR) if it isn't. */ void check_icu_locale(const char *icu_locale) { -#ifdef USE_ICU UCollator *collator; UErrorCode status; @@ -1967,13 +1964,10 @@ check_icu_locale(const char *icu_locale) if (U_ICU_VERSION_MAJOR_NUM < 54) icu_set_collation_attributes(collator, icu_locale); ucol_close(collator); -#else - ereport(ERROR, -
Re: Mingw task for Cirrus CI
Hi Melih, On 2022-09-05 16:52:17 -0700, Andres Freund wrote: > I think we can convert this to meson soon, and that seems a *lot* faster at > configure than autoconf on mingw. Not even close to as fast as on a modern-ish > linux, but not that painful. Now that meson has been merged, could you try to adjust this patch so it builds with meson? Regards, Andres
Re: Extending outfuncs support to utility statements
On 2022-09-22 12:48:47 -0400, Tom Lane wrote: > I wrote: > > I think this is the issue Peter mentioned about needing to distinguish > > between empty strings and NULL strings. We're going to need to rethink > > the behavior of pg_strtok() a bit to fix that. > > After staring at the code a bit, I think we don't need to touch > pg_strtok() per se. I propose that this can be resolved with changes > at the next higher level. Let's make outToken print NULL as <> as > it always has, but print an empty string as "" (two double quotes). > If the raw input string is two double quotes, print it as \"" to > disambiguate. This'd require a catversion bump when committed, > but I don't think there are any showstopper problems otherwise. Makes sense to me.
Re: binary version of pg_current_wal_insert_lsn and pg_walfile_name functions
On Thu, Sep 22, 2022 at 7:41 AM Bharath Rupireddy wrote:
>
> On Wed, Sep 21, 2022 at 9:53 PM Ashutosh Sharma wrote:
> >
> > Yeah, we can either add this functionality to pg_waldump or maybe add
> > a new binary itself that would return this information.
>
> IMV, a separate tool isn't the way, since pg_waldump already reads WAL
> files and decodes WAL records, what's proposed here can be an
> additional functionality of pg_waldump.
>
> It will be great if an initial patch is posted here.

PFA a patch that enhances pg_waldump to show the latest LSN and the corresponding WAL file when the -l or --lastLSN option is passed as an argument to pg_waldump. Below is an example:

ashu@92893de650ed:~/pgsql$ pg_waldump -l -D ./data-dir
Latest LSN: 0/148A45F8
Latest WAL filename: 00010014

How has it been coded? When the user passes the '-l' command line option along with the data directory path to pg_waldump, it reads the control file from the data directory. From the control file, it gets information such as the redo pointer and the current timeline id. The redo pointer is taken as the start pointer from which pg_waldump reads WAL data until end-of-WAL to find the last LSN. For details please check the attached patch. Please note that this doesn't work for compressed and .partial WAL files.

--
With Regards,
Ashutosh Sharma.

0001-Enhance-pg_waldump-to-show-latest-LSN-and-the-WAL-fi.patch
Description: Binary data
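For readers following along, the mapping from an LSN to its WAL file name (what the patch prints as "Latest WAL filename") follows the XLogFileName() formatting: the segment number is the LSN divided by the segment size, printed as the timeline plus the high and low halves of the segment number. A rough Python sketch, assuming timeline 1 and the default 16MB segment size:

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # default; configurable at initdb time

def walfile_name(tli: int, lsn: int, segsize: int = WAL_SEGMENT_SIZE) -> str:
    """Mirror the XLogFileName() formatting: timeline, then the segment
    number split into two halves, each printed as 8 hex digits."""
    segs_per_xlogid = 0x100000000 // segsize
    segno = lsn // segsize
    return "%08X%08X%08X" % (tli, segno // segs_per_xlogid, segno % segs_per_xlogid)

def parse_lsn(text: str) -> int:
    """Parse the textual X/Y LSN form into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

# The "Latest LSN" from the example maps to its WAL file:
print(walfile_name(1, parse_lsn("0/148A45F8")))  # 000000010000000000000014
```

This is only a sketch of the naming math; the actual patch of course gets the timeline from the control file rather than assuming it.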
Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
On Wed, Sep 21, 2022 at 9:21 PM Nathan Bossart wrote:
> I wouldn't mind giving this a try.

Definitely seems worth experimenting with. Many WAL records generated during VACUUM (and during opportunistic pruning/index tuple deletion) have offset number arrays that can be assumed to be both sorted and unique. My guess is that these WAL record types are the most amenable to compression at the WAL record level.

> Yeah, it seems likely that we could pack offsets in single bytes in many
> cases. A more sophisticated approach could even choose how many bits to
> use per offset based on the maximum in the array. Furthermore, we might be
> able to make use of SIMD instructions to mitigate any performance penalty.

I guess I'd start with some kind of differential compression that relies on the arrays being both sorted and unique. While it might be important to be able to compose together multiple different techniques (something that is more than the sum of its parts can be very useful), it seems most important to quickly validate the basic idea first.

One obvious thing that still seems worth pointing out: you may not need to decompress anything. All that you really need to do is to get the logic from some routine like PageIndexMultiDelete() to be executed by a REDO routine. Perhaps it'll make sense to come up with a representation that can just be passed to the routine directly. (I really don't know how likely this is, but it's something to consider.)

> I'm tempted to start by just using single-byte offsets when possible since
> that should be relatively simple while still yielding a decent improvement
> for many workloads. What do you think?

The really big wins for WAL size will come from applying high-level insights about what is really needed, in all likelihood. The main overhead is very often the generic WAL record header overhead. When it is, it'll be hard to really move the needle just by compressing the payload.
To do that you'd need to find a way to use fewer WAL records to do the same amount of work. The thing that I think will really make the biggest difference is merging PRUNE records with FREEZE records (and maybe even make the merged record do the work of a VISIBLE record when that's possible). Just because now you have 1 WAL record instead of 2 (or even 3) WAL records. Obviously that's a complicated project, but it's another case where it feels like the more efficient approach might also be simpler. We often write a PRUNE record with only one or two items in the array, in which case it's practically free to do some freezing, at least in terms of WAL space overhead (as long as you can do it with the same WAL record). Plus freezing is already inescapably tied to pruning -- we always prune a page that we're going to try to freeze in VACUUM (we can't safely freeze dead tuples, so there is more or less a dependency already). Not that you shouldn't pursue compressing the payload from WAL records as a project -- maybe that'll work very well. I'm just pointing out that there is a bigger picture, that may or may not end up mattering here. For the patch on this thread there certainly is a bigger picture about costs over time. Something like that could be true for this other patch too. It's definitely worth considering the size of the WAL records when there are only one or two items, how common that may be in each individual case, etc. For example, FREEZE records have a minimum size of 64 bytes in practice, due to WAL record alignment overhead (the payload itself doesn't usually have to be aligned, but the WAL header still does). It may be possible to avoid going over the critical threshold that makes the WAL size one MAXALIGN() quantum larger in the event of having only a few tuples to freeze, a scenario where negative compression is likely. 
Negative compression is always a potential problem, but maybe you can deal with it very effectively by thinking about the problem holistically. If you're "wasting" space that was just going to be alignment padding anyway, does it really matter at all? (Maybe there is some reason to care, but offhand I can't think of one.)

--
Peter Geoghegan
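To make the differential-compression idea concrete, here is a rough sketch (Python for illustration only; a real patch would be C in the WAL insert/redo paths) of delta-encoding a sorted, unique offset array into single bytes, with a fallback to the raw two-byte form so negative compression can't happen:

```python
def encode_offsets(offsets):
    """Delta-encode a sorted, unique array of line-pointer offsets.

    Returns (compressed?, payload).  Uncompressed, each offset takes
    2 bytes; when every gap fits in one byte, each item after the first
    takes only 1 byte.  Any larger gap triggers the raw fallback.
    """
    assert offsets == sorted(set(offsets))
    deltas = [b - a for a, b in zip(offsets, offsets[1:])]
    if offsets and all(1 <= d <= 255 for d in deltas):
        payload = offsets[0].to_bytes(2, "little") + bytes(deltas)
        return True, payload
    return False, b"".join(o.to_bytes(2, "little") for o in offsets)

def decode_offsets(compressed, payload):
    """Inverse of encode_offsets(); a REDO routine might instead consume
    the encoded form directly, as suggested above."""
    if not compressed:
        return [int.from_bytes(payload[i:i + 2], "little")
                for i in range(0, len(payload), 2)]
    out = [int.from_bytes(payload[:2], "little")]
    for d in payload[2:]:
        out.append(out[-1] + d)
    return out

offs = [3, 4, 7, 19, 211]
flag, data = encode_offsets(offs)
print(flag, len(data))  # True 6  -- vs 10 bytes uncompressed
```

The one-bit "compressed?" flag is the hedge against negative compression: the worst case is the existing format plus a single flag bit somewhere in the record header.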
Re: Extending outfuncs support to utility statements
I wrote: > I think this is the issue Peter mentioned about needing to distinguish > between empty strings and NULL strings. We're going to need to rethink > the behavior of pg_strtok() a bit to fix that. After staring at the code a bit, I think we don't need to touch pg_strtok() per se. I propose that this can be resolved with changes at the next higher level. Let's make outToken print NULL as <> as it always has, but print an empty string as "" (two double quotes). If the raw input string is two double quotes, print it as \"" to disambiguate. This'd require a catversion bump when committed, but I don't think there are any showstopper problems otherwise. I'll work on fleshing that idea out. regards, tom lane
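For clarity, the proposed disambiguation rules can be sketched like this (a hedged Python illustration; the real outToken()/pg_strtok() code does additional escaping of special characters and whitespace that is elided here):

```python
def out_token(value):
    """Print side of the proposed rule set (sketch only)."""
    if value is None:
        return "<>"      # NULL pointer, as outToken has always printed it
    if value == "":
        return '""'      # empty string: two double quotes
    if value == '""':
        return '\\""'    # a raw two-double-quote string, escaped to disambiguate
    return value         # real code also escapes specials/whitespace

def read_token(token):
    """Read side, inverting out_token()."""
    if token == "<>":
        return None
    if token == '""':
        return ""
    if token == '\\""':
        return '""'
    return token

for v in (None, "", '""', "tablespace"):
    assert read_token(out_token(v)) == v
```

The point of the extra backslash case is exactly the round-trip property the loop checks: without it, an input string that happens to be two double quotes would read back as the empty string.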
Re: Extending outfuncs support to utility statements
Andres Freund writes: > On 2022-08-24 17:25:31 +0200, Peter Eisentraut wrote: >> Here are patches for that. > These patches have been failing since they were posted, afaict: > https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/39/3848 > I assume that's known? I think this is the issue Peter mentioned about needing to distinguish between empty strings and NULL strings. We're going to need to rethink the behavior of pg_strtok() a bit to fix that. regards, tom lane
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Hi Wang Wei,

> 1. In the function GetCheapestReplicaIdentityFullPath.
> + if (rel->pathlist == NIL)
> + {
> + /*
> +* A sequential scan could have been dominated by by an index scan
> +* during make_one_rel(). We should always have a sequential scan
> +* before set_cheapest().
> +*/
> + Path *seqScanPath = create_seqscan_path(root, rel, NULL, 0);
> +
> + add_path(rel, seqScanPath);
> + }
>
> This is a question I'm not sure about:
> Do we need this part to add sequential scan?
>
> I think in our case, the sequential scan seems to have been added by the
> function make_one_rel (see function set_plain_rel_pathlist).

Yes, the sequential scan is added during make_one_rel.

> If I am missing
> something, please let me know. BTW, there is a typo in above comment: `by by`.

As the comment mentions, the sequential scan could have been dominated & removed by an index scan, see add_path():

> *We also remove from the rel's pathlist any old paths that are dominated
> * by new_path --- that is, new_path is cheaper, at least as well ordered,
> * generates no more rows, requires no outer rels not required by the old
> * path, and is no less parallel-safe.

Still, I agree that the comment could be improved, which I pushed.

> 2. In the file execReplication.c.
> +#ifdef USE_ASSERT_CHECKING
> +#include "catalog/index.h"
> +#endif
> #include "commands/trigger.h"
> #include "executor/executor.h"
> #include "executor/nodeModifyTable.h"
> #include "nodes/nodeFuncs.h"
> #include "parser/parse_relation.h"
> #include "parser/parsetree.h"
> +#ifdef USE_ASSERT_CHECKING
> +#include "replication/logicalrelation.h"
> +#endif
>
> I think it's fine to only add `logicalrelation.h` here, because `index.h`
> has been added by `logicalrelation.h`.

Makes sense, removed, thanks. Attached v13.

v13_0001_use_index_on_subs_when_pub_rep_ident_full.patch
Description: Binary data
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Hi Peter,

Kuroda kuroda.hay...@fujitsu.com wrote on Wed, 21 Sep 2022 at 04:21:

> > One last thing - do you think there is any need to mention this
> > behaviour in the pgdocs, or is OK just to be a hidden performance
> > improvement?
>
> FYI - I put my opinion.
> We have following sentence in the logical-replication.sgml:
>
> ```
> ...
> If the table does not have any suitable key, then it can be set
> to replica identity full, which means the entire row becomes
> the key. This, however, is very inefficient and should only be used as a
> fallback if no other solution is possible.
> ...
> ```
>
> Here the word "very inefficient" may mean that sequential scans will be
> executed every time.
> I think some descriptions can be added around here.

Making a small edit in that file makes sense. I'll attach v13 in the next email, which also includes this change.

Also, do you think this is a good time for me to mark the patch "Ready for committer" in the commit fest? I'm not sure when and by whom the state should be changed, but it seems I can change it. I couldn't find any documentation on how that process should work.

Thanks!
Re: Summary function for pg_buffercache
Hi,

On 2022-09-22 18:22:44 +0300, Melih Mutlu wrote:
> Since header locks are removed again, I put some doc changes and comments
> back.

Due to the merge of the meson build system, this needs to adjust meson.build as well.

> --- a/contrib/pg_buffercache/expected/pg_buffercache.out
> +++ b/contrib/pg_buffercache/expected/pg_buffercache.out
> @@ -8,3 +8,12 @@ from pg_buffercache;
> t
> (1 row)
>
> +select buffers_used + buffers_unused > 0,
> +buffers_dirty < buffers_used,
> +buffers_pinned < buffers_used

Don't these have to be "<=" instead of "<"?

> + for (int i = 0; i < NBuffers; i++)
> + {
> + BufferDesc *bufHdr;
> + uint32 buf_state;
> +
> + /*
> + * No need to get locks on buffer headers as we don't rely on the
> + * results in detail. Therefore, we don't get a consistent snapshot
> + * across all buffers and it is not guaranteed that the information of
> + * each buffer is self-consistent as opposed to pg_buffercache_pages.
> + */

I think the "consistent snapshot" bit is misleading - even taking buffer header locks wouldn't give you that.

> + if (buffers_used != 0)
> + usagecount_avg = usagecount_avg / buffers_used;

Perhaps the average should be NULL in the buffers_used == 0 case?

> +
> + pg_buffercache_pages function
> + returns a set of records, plus a view
> pg_buffercache that wraps the function for
> + convenient use is provided.
> +
> +
> +
> + pg_buffercache_summary function returns a table with
> a single row
> + that contains summarized and aggregated information about shared buffer
> caches.
>

I think these sentences are missing a "The " at the start? "shared buffer caches" isn't right - I think I'd just drop the "caches".
Unlike pg_buffercache_pages function > + pg_buffercache_summary doesn't take buffer headers > locks > + either, thus the result is not consistent. This is intentional. The > purpose > + of this function is to provide a general idea about the state of shared > + buffers as fast as possible. Additionally, > pg_buffercache_summary > + allocates much less memory. > + > + I don't think this mentioning of buffer header locks is useful for users - nor is it I think correct. Acquiring the buffer header locks wouldn't add *any* additional consistency. Greetings, Andres Freund
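The overall shape being reviewed here — a single unlocked pass over the buffer headers, with the average reported as NULL when no buffers are in use — can be sketched as follows (a Python stand-in for the C loop; the toy BufState type is invented for the illustration):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class BufState:          # toy stand-in for a buffer header's state word
    valid: bool          # roughly BM_VALID
    dirty: bool          # roughly BM_DIRTY
    refcount: int        # pin count
    usagecount: int

def buffercache_summary(buffers):
    """One pass, no locks: each state read may be slightly stale, which
    is acceptable because only aggregates are reported."""
    used = unused = dirty = pinned = 0
    usage_total = 0
    for b in buffers:
        if not b.valid:
            unused += 1
            continue
        used += 1
        usage_total += b.usagecount
        if b.dirty:
            dirty += 1
        if b.refcount > 0:
            pinned += 1
    # Average is None (SQL NULL) when nothing is in use, avoiding 0/0.
    avg: Optional[float] = usage_total / used if used else None
    return used, unused, dirty, pinned, avg

bufs = [BufState(True, True, 0, 3), BufState(True, False, 1, 5),
        BufState(False, False, 0, 0)]
print(buffercache_summary(bufs))  # (2, 1, 1, 1, 4.0)
```

This also shows why the regression test's invariants need "<=": a dirty or pinned count can legitimately equal the used count.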
Re: libpq error message refactoring
Hi, On 2022-08-25 16:34:26 +0200, Peter Eisentraut wrote: > libpq now contains a mix of error message strings that end with newlines and > don't end with newlines, due to some newer code paths with new ways of > passing errors around. This has now gotten me confused a few too many times > both during development and translation. So I looked into whether we can > unify this, similar to how we have done elsewhere (e.g., pg_upgrade). I > came up with the attached patch. It's not complete, but it shows the idea > and it looks like a nice simplification to me. Thoughts on this approach? This patch has been failing for a while: https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/39/3854 Interestingly, previously the error only happened when targetting windows, but meson also shows it on freebsd. It's not the cause of this failure, I think, but doesn't appendPQExpBufferVA need to be added to exports.txt? Greetings, Andres Freund
Re: Temporary file access API
I’m a latecomer to the discussion, but as a word of introduction, I’m working with Stephen, and I have started looking over the temp file proposal with the idea of helping it move along. I’ll start by summarizing the temp file proposal and its goals.

From a high level, the proposed code:
* Creates an fread/fwrite replacement (BufFileStream) for buffering data to a single file.
* Updates BufFile, which reads/writes a set of files, to use BufFileStream internally.
* Does not impact the normal (page cached) database I/O.
* Updates all the other places where fread/fwrite and read/write are used.
* Creates and removes transient files.

I see the following goals:
* Unify all the “other” file accesses into a single, consistent API.
* Integrate with VFDs.
* Integrate transient files with transactions and tablespaces.
* Create a consolidated framework where features like encryption and compression can be more easily added.
* Maintain good streaming performance.

Is this a fair description of the proposal? For myself, I’d like to map out how features like compression and encryption would fit into the framework, more as a sanity check than anything else, and I’d like to look closer at some of the implementation details. But at the moment, I want to make sure I have the proper high-level view of the temp file proposal.

From: Robert Haas
Date: Wednesday, September 21, 2022 at 11:54 AM
To: Antonin Houska
Cc: PostgreSQL Hackers , Peter Eisentraut , Stephen Frost
Subject: Re: Temporary file access API

On Mon, Aug 8, 2022 at 2:26 PM Antonin Houska wrote:
> > I don't think that (3) is a separate advantage from (1) and (2), so I
> > don't have anything separate to say about it.
> > I thought that the uncontrollable buffer size is one of the things you > complaint about in > > https://www.postgresql.org/message-id/CA+TgmoYGjN_f=fcerx49bzjhng+gocty+a+xhnrwcvvdy8u...@mail.gmail.com Well, I think that email is mostly complaining about there being no buffering at all in a situation where it would be advantageous to do some buffering. But that particular code I believe is gone now because of the shared-memory stats collector, and when looking through the patch set, I didn't see that any of the cases that it touched were similar to that one. -- Robert Haas EDB: http://www.enterprisedb.com
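For concreteness, the core of what an fread/fwrite replacement like BufFileStream has to do — accumulate writes in an explicit buffer and hit the underlying file once per buffer-full — can be sketched like this (a minimal Python illustration; the class name matches the proposal but the API and buffer size are my own invention):

```python
import io

class BufFileStream:
    """Minimal buffered-write stream over a single file-like object.

    Sketch only: a real implementation would integrate with VFDs,
    handle reads and seeks, and be the natural seam where compression
    or encryption hooks in.
    """
    BUFSIZE = 8192

    def __init__(self, file):
        self.file = file
        self.buf = bytearray()
        self.flushes = 0          # instrumentation for the example

    def write(self, data: bytes) -> None:
        self.buf += data
        if len(self.buf) >= self.BUFSIZE:
            self.flush()

    def flush(self) -> None:
        if self.buf:
            self.file.write(bytes(self.buf))  # one backend write per buffer
            self.flushes += 1
            self.buf.clear()

backing = io.BytesIO()
s = BufFileStream(backing)
for _ in range(3):
    s.write(b"x" * 4096)          # three caller writes...
s.flush()
print(len(backing.getvalue()), s.flushes)  # 12288 2
```

Three 4KB caller writes collapse into two backend writes, which is the whole point of a controllable, explicit buffer rather than relying on whatever libc's stdio happens to do.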
Re: [PATCH] Introduce array_shuffle() and array_sample()
Am 22.09.22 um 17:23 schrieb Andres Freund: Hi, On 2022-08-04 07:46:10 +0200, Martin Kalcher wrote: Patch update without merge conflicts. Due to the merge of the meson based build, this patch needs to be adjusted. See https://cirrus-ci.com/build/6580671765282816 Looks like it'd just be adding user_prng.c to src/backend/utils/adt/meson.build's list. Greetings, Andres Freund Hi Andres, thanks for the heads up. Adjusted patch is attached. - MartinFrom 3c4a939abf8d29bcf666e49ecb042ade42b36c2c Mon Sep 17 00:00:00 2001 From: Martin Kalcher Date: Sun, 17 Jul 2022 18:06:04 +0200 Subject: [PATCH v4] Introduce array_shuffle() and array_sample() * array_shuffle() shuffles the elements of an array. * array_sample() chooses n elements from an array by random. The new functions share its prng state with random() and thus interoperate with setseed(). --- doc/src/sgml/func.sgml | 40 +- src/backend/utils/adt/Makefile | 1 + src/backend/utils/adt/array_userfuncs.c | 176 src/backend/utils/adt/float.c | 37 + src/backend/utils/adt/meson.build | 1 + src/backend/utils/adt/user_prng.c | 87 src/include/catalog/pg_proc.dat | 6 + src/include/utils/user_prng.h | 19 +++ src/test/regress/expected/arrays.out| 66 + src/test/regress/sql/arrays.sql | 16 +++ 10 files changed, 414 insertions(+), 35 deletions(-) create mode 100644 src/backend/utils/adt/user_prng.c create mode 100644 src/include/utils/user_prng.h diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index e1fe4fec57..2a96fc323a 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -1820,7 +1820,8 @@ repeat('Pg', 4) PgPgPgPg void -Sets the seed for subsequent random() calls; +Sets the seed for subsequent calls to random functions, like +random() or array_shuffle(); argument must be between -1.0 and 1.0, inclusive @@ -1838,7 +1839,8 @@ repeat('Pg', 4) PgPgPgPg applications; see the module for a more secure alternative. 
If setseed() is called, the series of results of - subsequent random() calls in the current session + subsequent calls to random functions, like random() or + array_shuffle(), in the current session can be repeated by re-issuing setseed() with the same argument. Without any prior setseed() call in the same @@ -18468,6 +18470,40 @@ SELECT NULLIF(value, '(none)') ... + + + + array_sample + +array_sample ( array anyarray, n integer ) +anyarray + + +Returns n randomly chosen elements from array in selection order. + + +array_sample(ARRAY[[1,2],[3,4],[5,6]], 2) +{{5,6},{1,2}} + + + + + + + array_shuffle + +array_shuffle ( anyarray ) +anyarray + + +Shuffles the first dimension of the array. + + +array_shuffle(ARRAY[[1,2],[3,4],[5,6]]) +{{5,6},{1,2},{3,4}} + + + diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile index 0de0bbb1b8..17b57a5569 100644 --- a/src/backend/utils/adt/Makefile +++ b/src/backend/utils/adt/Makefile @@ -110,6 +110,7 @@ OBJS = \ tsvector.o \ tsvector_op.o \ tsvector_parser.o \ + user_prng.o \ uuid.o \ varbit.o \ varchar.o \ diff --git a/src/backend/utils/adt/array_userfuncs.c b/src/backend/utils/adt/array_userfuncs.c index ca70590d7d..c4a2117df7 100644 --- a/src/backend/utils/adt/array_userfuncs.c +++ b/src/backend/utils/adt/array_userfuncs.c @@ -17,6 +17,7 @@ #include "utils/array.h" #include "utils/builtins.h" #include "utils/lsyscache.h" +#include "utils/user_prng.h" #include "utils/typcache.h" @@ -902,3 +903,178 @@ array_positions(PG_FUNCTION_ARGS) PG_RETURN_DATUM(makeArrayResult(astate, CurrentMemoryContext)); } + +/* + * Produce array with n randomly chosen items from given array in random order. 
+ * + * array: array object to examine (must not be NULL) + * n: number of items (must not be greater than the size of the arrays first dimension) + * elmtyp, elmlen, elmbyval, elmalign: info for the datatype of the items + * + * NOTE: it would be cleaner to look up the elmlen/elmbval/elmalign info + * from the system catalogs, given the elmtyp. However, the caller is + * in a better position to cache this info across multiple uses, or even + * to hard-wire values if the element type is hard-wired. + */ +static ArrayType * +array_shuffle_n(ArrayType *array, int n, +Oid elmtyp, int elmlen, bool elmbyval, char elmalign) +{ + ArrayType *result; + int ndim, + *dims, + *lbs, +rdims[MAXDIM], +nelm, +nitem, +i, +j, +k; + Datum elm, + *elms, + *ielms, + *jelms; + bool nul, +
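The user-visible semantics of the two functions — Fisher-Yates shuffling, and sampling without replacement, over the array's first dimension only, driven by a seedable PRNG so they interoperate with setseed() — can be mimicked like this (a Python analogy only, not the C implementation in the patch, which uses the backend's shared PRNG state):

```python
import random

prng = random.Random()        # shared state, analogous to setseed()/random()

def array_shuffle(arr):
    """Fisher-Yates shuffle of the first dimension; subarrays stay intact."""
    result = list(arr)
    for i in range(len(result) - 1, 0, -1):
        j = prng.randrange(i + 1)
        result[i], result[j] = result[j], result[i]
    return result

def array_sample(arr, n):
    """n items chosen without replacement, in selection order (a partial
    Fisher-Yates); n must not exceed the first dimension's length."""
    assert 0 <= n <= len(arr)
    pool = list(arr)
    out = []
    for i in range(n):
        j = prng.randrange(i, len(pool))
        pool[i], pool[j] = pool[j], pool[i]
        out.append(pool[i])
    return out

prng.seed(0.5)                # setseed(0.5) analogue: results become repeatable
shuffled = array_shuffle([[1, 2], [3, 4], [5, 6]])
sampled = array_sample([[1, 2], [3, 4], [5, 6]], 2)
print(len(shuffled), len(sampled))  # 3 2
```

The key property mirrored here is that whole subarrays move as units, and that reseeding makes the sequence of results reproducible, just as the patch's setseed() interoperation promises.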
Re: Extending outfuncs support to utility statements
Hi, On 2022-08-24 17:25:31 +0200, Peter Eisentraut wrote: > Here are patches for that. These patches have been failing since they were posted, afaict: https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/39/3848 I assume that's known? Most of the failures seem to be things like diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/tablespace.out /tmp/cirrus-ci-build/build/testrun/main/regress/results/tablespace.out --- /tmp/cirrus-ci-build/src/test/regress/expected/tablespace.out 2022-09-22 12:30:07.340655000 + +++ /tmp/cirrus-ci-build/build/testrun/main/regress/results/tablespace.out 2022-09-22 12:35:15.075825000 + @@ -3,6 +3,8 @@ ERROR: tablespace location must be an absolute path -- empty tablespace locations are not usually allowed CREATE TABLESPACE regress_tblspace LOCATION ''; -- fail +WARNING: outfuncs/readfuncs failed to produce an equal raw parse tree +WARNING: outfuncs/readfuncs failed to produce an equal rewritten parse tree ERROR: tablespace location must be an absolute path -- as a special developer-only option to allow us to use tablespaces -- with streaming replication on the same server, an empty location Greetings, Andres Freund
Re: Add last_vacuum_index_scans in pg_stat_all_tables
Hi,

On 2022-09-16 13:23:06 +0900, Ken Kato wrote:
> Thank you for the review!
> I fixed it and resubmitting the patch.

cfbot flags that the docs aren't valid:
https://cirrus-ci.com/task/5309377937670144?logs=docs_build#L295

[15:05:39.683] monitoring.sgml:4574: parser error : Opening and ending tag mismatch: entry line 4567 and row
[15:05:39.683]
[15:05:39.683]^

The problem is that you're not closing the <entry> element.

Greetings,

Andres Freund
Re: Refactor backup related code (was: Is it correct to say, "invalid data in file \"%s\"", BACKUP_LABEL_FILE in do_pg_backup_stop?)
Hi, On 2022-09-22 16:43:19 +0900, Michael Paquier wrote: > I have put my hands on 0001, and finished with the attached, that > includes many fixes and tweaks. Due to the merge of the meson based build this patch needs some adjustment. See https://cirrus-ci.com/build/6146162607521792 Looks like it just requires adding xlogbackup.c to src/backend/access/transam/meson.build. Greetings, Andres Freund
Re: [PATCH] Introduce array_shuffle() and array_sample()
Hi, On 2022-08-04 07:46:10 +0200, Martin Kalcher wrote: > Patch update without merge conflicts. Due to the merge of the meson based build, this patch needs to be adjusted. See https://cirrus-ci.com/build/6580671765282816 Looks like it'd just be adding user_prng.c to src/backend/utils/adt/meson.build's list. Greetings, Andres Freund
Re: Summary function for pg_buffercache
Hi, Since header locks are removed again, I put some doc changes and comments back. Thanks, Melih v10-0001-Added-pg_buffercache_summary-function.patch Description: Binary data
Re: [RFC] building postgres with meson - v13
On 2022-Sep-22, Tom Lane wrote: > Ah, right, the joys of maintaining multiple build systems. I wonder > if there's any way to avoid that by scraping file lists from one > group to the other. Or maybe we could have a file common to both, say OBJS, which both scrape in their own way. That could be easier than one scraping the other. > We got a little spoiled perhaps by the MSVC scripts managing to do > that in most cases. Right ... and it was so annoying in the cases it couldn't. -- Álvaro Herrera — Breisgau, Deutschland — https://www.EnterpriseDB.com/ "Learning without thinking is useless; thinking without learning, perilous" (Confucius)
Re: Amcheck verification of GiST and GIN
Hi, On 2022-08-17 17:28:02 +0500, Andrey Borodin wrote: > Here's v13. Changes: > 1. Fixed passing through downlink in GIN index > 2. Fixed GIN tests (one test case was not working) > > Thanks to Vitaliy Kukharik for trying this patches. Due to the merge of the meson based build, this patch needs to be adjusted. See https://cirrus-ci.com/build/6637154947301376 The changes should be fairly simple, just mirroring the Makefile ones. Greetings, Andres Freund
Re: pg_waldump: add test for coverage
Hi, On 2022-08-23 10:50:08 +0900, Dong Wook Lee wrote: > I wrote a test for coverage. Unfortunately the test doesn't seem to pass on windows, and hasn't ever done so: https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/39/3834 Due to the merge of the meson patchset, you should also add 001_basic.pl to the list of tests in meson.build Greetings, Andres Freund
Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.
Hi, On 2022-05-20 17:46:38 +0400, Pavel Borisov wrote: > CFbot says v12 patch does not apply. > Rebased. PFA v13. > Your reviews are very much welcome! Due to the merge of the meson based build this patch needs to be adjusted: https://cirrus-ci.com/build/6350479973154816 Looks like you need to add amcheck--1.3--1.4.sql to the list of files to be installed and t/004_verify_nbtree_unique.pl to the tests. Greetings, Andres Freund
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Thu, Sep 22, 2022 at 11:46 AM John Naylor wrote: > One thing I want to try soon is storing fewer than 16/32 etc entries, so that the whole node fits comfortably inside a power-of-two allocation. That would allow us to use aset without wasting space for the smaller nodes, which would be faster and possibly would solve the fragmentation problem Andres referred to in > https://www.postgresql.org/message-id/20220704220038.at2ane5xkymzzssb%40awork3.anarazel.de While calculating node sizes that fit within a power-of-two size, I noticed the current base node is a bit wasteful, taking up 8 bytes. The node kind only has a small number of values, so it doesn't really make sense to use an enum here in the struct (in fact, Andres' prototype used a uint8 for node_kind). We could use a bitfield for the count and kind: uint16 -- kind and count bitfield uint8 shift; uint8 chunk; That's only 4 bytes. Plus, if the kind is ever encoded in a pointer tag, the bitfield can just go back to being count only. Here are the v6 node kinds: node4: 8 + 4 +(4)+ 4*8 = 48 bytes node16: 8 + 16 + 16*8 = 152 node32: 8 + 32 + 32*8 = 296 node128: 8 + 256 + 128/8 + 128*8 = 1304 node256: 8 + 256/8 + 256*8 = 2088 And here are the possible ways we could optimize nodes for space using aset allocation. Parentheses are padding bytes. Even if my math has mistakes, the numbers shouldn't be too far off: node3: 4 + 3 +(1)+ 3*8 = 32 bytes node6: 4 + 6 +(6)+ 6*8 = 64 node13: 4 + 13 +(7)+ 13*8 = 128 node28: 4 + 28 + 28*8 = 256 node31: 4 + 256 + 32/8 + 31*8 = 512 (XXX not good) node94: 4 + 256 + 96/8 + 94*8 = 1024 node220: 4 + 256 + 224/8 + 220*8 = 2048 node256: = 4096 The main disadvantage is that node256 would balloon in size. -- John Naylor EDB: http://www.enterprisedb.com
Re: proposal: possibility to read dumped table's name from file
Hi, On 2022-09-12 09:58:37 +0200, Daniel Gustafsson wrote: > > On 9 Sep 2022, at 11:00, Andrew Dunstan wrote: > > > >> On Sep 9, 2022, at 5:53 PM, John Naylor > >> wrote: > >> > >> Note that the grammar has shift-reduce conflicts. > > > Looks like the last rule for Filters should not be there. > > Correct, fixed in the attached. Due to the merge of the meson build, this patch now needs to adjust the relevant meson.build. This is the cause of the failures at: https://cirrus-ci.com/build/5788292678418432 See e.g. src/bin/pgbench/meson.build Greetings, Andres Freund
Re: Tab complete for CREATE SUBSCRIPTION ... CONECTION does not work
On Tue, 20 Sep 2022 at 00:19, Japin Li wrote: > Hi hacker, > > As $subject detailed, the tab-complete cannot work such as: > >CREATE SUBSCRIPTION sub CONNECTION 'dbname=postgres port=6543' \t > > It seems that get_previous_words() could not parse the single quote. > > OTOH, it works for CREATE SUBSCRIPTION sub CONNECTION xx \t, should we fix it? The attached patch fixes this problem. Any thoughts? -- Regards, Japin Li. ChengDu WenWu Information Technology Co.,Ltd. >From 40b3ef8decd305bdaefddd797872da05bb5b65b4 Mon Sep 17 00:00:00 2001 From: Japin Li Date: Thu, 22 Sep 2022 22:39:00 +0800 Subject: [PATCH v1] Fix tab-completion for CREATE SUBSCRIPTION --- src/bin/psql/tab-complete.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c index 820f47d23a..73c34aed83 100644 --- a/src/bin/psql/tab-complete.c +++ b/src/bin/psql/tab-complete.c @@ -6068,7 +6068,7 @@ get_previous_words(int point, char **buffer, int *nwords) */ for (start = end; start > 0; start--) { - if (buf[start] == '"') + if (buf[start] == '"' || buf[start] == '\'') inquotes = !inquotes; if (!inquotes) { -- 2.25.1
Re: [RFC] building postgres with meson - v13
Hi, On 2022-09-22 04:29:15 -0400, Andrew Dunstan wrote: > Great. Now I'll start on buildfarm support. Given my current > commitments, this will take me a while, but I hope to have a working > client by about the beginning of November. Great! Let me know if there's something I can do to help. Greetings, Andres Freund
Re: [RFC] building postgres with meson - v13
Hi, On 2022-09-22 16:56:57 +0200, Alvaro Herrera wrote: > On 2022-Sep-22, Tom Lane wrote: > > Initial reports from the cfbot are mostly promising, but there are a few > > patches where all the meson builds fail while all the autoconf ones pass, > > so there's something for you to look at. So far CF entries 3464, 3733, > > 3771, 3808 look that way. > > Hmm, but those patches add files, which means they're now outdated: they > need to add these files to the respective meson.build file. Yea, I looked through all of these and they all need need a simple addition of a file to be built or installed. Greetings, Andres Freund
Re: [RFC] building postgres with meson - v13
Alvaro Herrera writes: > On 2022-Sep-22, Tom Lane wrote: >> Initial reports from the cfbot are mostly promising, but there are a few >> patches where all the meson builds fail while all the autoconf ones pass, >> so there's something for you to look at. So far CF entries 3464, 3733, >> 3771, 3808 look that way. > Hmm, but those patches add files, which means they're now outdated: they > need to add these files to the respective meson.build file. Ah, right, the joys of maintaining multiple build systems. I wonder if there's any way to avoid that by scraping file lists from one group to the other. We got a little spoiled perhaps by the MSVC scripts managing to do that in most cases. regards, tom lane
Re: [RFC] building postgres with meson - v13
On 2022-Sep-22, Tom Lane wrote: > Initial reports from the cfbot are mostly promising, but there are a few > patches where all the meson builds fail while all the autoconf ones pass, > so there's something for you to look at. So far CF entries 3464, 3733, > 3771, 3808 look that way. Hmm, but those patches add files, which means they're now outdated: they need to add these files to the respective meson.build file. -- Álvaro Herrera — Breisgau, Deutschland — https://www.EnterpriseDB.com/ "Experience tells us that man has peeled potatoes millions of times, but we were forced to admit the possibility that in one case in a million, the potatoes would peel the man" (Ijon Tichy)
Re: [RFC] building postgres with meson - v13
Andres Freund writes: > On 2022-09-21 09:46:30 -0700, Andres Freund wrote: >> I'm planning to commit this today, unless somebody wants to argue against >> that. > And done! Yay! Initial reports from the cfbot are mostly promising, but there are a few patches where all the meson builds fail while all the autoconf ones pass, so there's something for you to look at. So far CF entries 3464, 3733, 3771, 3808 look that way. regards, tom lane
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Thu, Sep 22, 2022 at 7:52 PM John Naylor wrote: > > > On Thu, Sep 22, 2022 at 1:26 PM Masahiko Sawada wrote: > > Good point. While keeping the chunks in the small nodes in sorted > > order is useful for visiting all keys in sorted order, additional > > branches and memmove calls could be slow. > > Right, the ordering is a property that some users will need, so best to keep it. Although the node128 doesn't have that property -- too slow to do so, I think. Nevermind, I must have been mixing up keys and values there... -- John Naylor EDB: http://www.enterprisedb.com
Re: Pruning never visible changes
On 2022-Sep-22, Simon Riggs wrote: > On Mon, 19 Sept 2022 at 00:16, Greg Stark wrote: > > VACUUM was willing to remove a committed-dead tuple immediately if it > > was > > deleted by the same transaction that inserted it. The idea is that > > such a > > tuple could never have been visible to any other transaction, so we > > don't > > need to keep it around to satisfy MVCC snapshots. However, there was > > already an exception for tuples that are part of an update chain, and > > this > > exception created a problem: we might remove TOAST tuples (which are > > never > > part of an update chain) while their parent tuple stayed around (if it > > was > > part of an update chain). This didn't pose a problem for most things, > > since the parent tuple is indeed dead: no snapshot will ever consider it > > visible. But MVCC-safe CLUSTER had a problem, since it will try to copy > > RECENTLY_DEAD tuples to the new table. It then has to copy their TOAST > > data too, and would fail if VACUUM had already removed the toast tuples. > Good research Greg, thank you. Only took 10 years for me to notice it > was gone ;-) But this begs the question: is the proposed change safe, given that ancient consideration? I don't think TOAST issues have been mentioned in this thread so far, so I wonder if there is a test case that verifies that this problem doesn't occur for some reason. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Re: RFC: Logging plan of the running query
On 2022-09-21 17:30, Alena Rybakina wrote: Thanks for your reply! I also noticed it. However, I also discovered that the above function declarations are applied to the EXPLAIN command and used to print details of the executed query. We have a similar task: to print the plan of an interrupted process handling a request for a specific pid. In short, I think this task is different, and I added this comment to separate these parts. I'm not sure I understand your comment correctly; do you mean HandleLogQueryPlanInterrupt() should not be placed in explain.c? It may be so. However, given that ProcessLogMemoryContextInterrupt(), which similarly handles interrupts for pg_log_backend_memory_contexts(), is located in mcxt.c, I also think the current location might be acceptable. -- Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
Re: why can't a table be part of the same publication as its schema
> On Sep 22, 2022, at 8:02 AM, Alvaro Herrera wrote: > > FWIW I put this to CI: > https://cirrus-ci.com/build/5823276948652032 (master) > > and everything appears to be OK. If anybody has reservations about this > grammar change, please speak up soon, as there's not much time before RC1. > > The one for 15 just started running: > https://cirrus-ci.com/build/4735322423558144 [personal hat, not RMT] Looks like it passed. No objections. Jonathan
Re: Fix snapshot name for SET TRANSACTION documentation
On Thu, 22 Sep 2022 at 12:00, Fujii Masao wrote: > On 2022/09/21 14:40, Fujii Masao wrote: >> On 2022/09/21 12:01, Japin Li wrote: >>> >>> Hi hackers, >>> >>> In 6c2003f8a1bbc7c192a2e83ec51581c018aa162f, we changed the snapshot name >>> when exporting a snapshot; however, there is one place where we missed updating the >>> snapshot name in the documentation. Attached is a patch to fix it. >> Thanks for the patch! Looks good to me. > > Pushed. Thanks! > Thanks! -- Regards, Japin Li. ChengDu WenWu Information Technology Co.,Ltd.
Re: Pruning never visible changes
On Mon, 19 Sept 2022 at 00:16, Greg Stark wrote: > > On Fri, 16 Sept 2022 at 10:27, Tom Lane wrote: > > > > Simon Riggs writes: > > > A user asked me whether we prune never visible changes, such as > > > BEGIN; > > > INSERT... > > > UPDATE.. (same row) > > > COMMIT; > > > > Didn't we just have this discussion in another thread? > > Well. not "just" :) > > commit 44e4bbf75d56e643b6afefd5cdcffccb68cce414 > Author: Tom Lane > Date: Fri Apr 29 16:29:42 2011 -0400 > > Remove special case for xmin == xmax in HeapTupleSatisfiesVacuum(). > > VACUUM was willing to remove a committed-dead tuple immediately if it was > deleted by the same transaction that inserted it. The idea is that such a > tuple could never have been visible to any other transaction, so we don't > need to keep it around to satisfy MVCC snapshots. However, there was > already an exception for tuples that are part of an update chain, and this > exception created a problem: we might remove TOAST tuples (which are never > part of an update chain) while their parent tuple stayed around (if it was > part of an update chain). This didn't pose a problem for most things, > since the parent tuple is indeed dead: no snapshot will ever consider it > visible. But MVCC-safe CLUSTER had a problem, since it will try to copy > RECENTLY_DEAD tuples to the new table. It then has to copy their TOAST > data too, and would fail if VACUUM had already removed the toast tuples. > > Easiest fix is to get rid of the special case for xmin == xmax. This may > delay reclaiming dead space for a little bit in some cases, but it's by > far > the most reliable way to fix the issue. > > Per bug #5998 from Mark Reid. Back-patch to 8.3, which is the oldest > version with MVCC-safe CLUSTER. Good research Greg, thank you. Only took 10 years for me to notice it was gone ;-) -- Simon Riggshttp://www.EnterpriseDB.com/
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Thu, Sep 22, 2022 at 1:26 PM Masahiko Sawada wrote: > > On Thu, Sep 22, 2022 at 1:46 PM John Naylor > wrote: > > While on the subject, I wonder how important it is to keep the chunks in the small nodes in sorted order. That adds branches and memmove calls, and is the whole reason for the recent "pg_lfind_ge" function. > > Good point. While keeping the chunks in the small nodes in sorted > order is useful for visiting all keys in sorted order, additional > branches and memmove calls could be slow. Right, the ordering is a property that some users will need, so best to keep it. Although the node128 doesn't have that property -- too slow to do so, I think. -- John Naylor EDB: http://www.enterprisedb.com
Re: why can't a table be part of the same publication as its schema
FWIW I put this to CI: https://cirrus-ci.com/build/5823276948652032 (master) and everything appears to be OK. If anybody has reservations about this grammar change, please speak up soon, as there's not much time before RC1. The one for 15 just started running: https://cirrus-ci.com/build/4735322423558144 -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Early and provident fear is the mother of safety" (E. Burke)
Re: archive modules
On 17.09.22 11:49, Peter Eisentraut wrote: On 14.09.22 23:09, Nathan Bossart wrote: On Wed, Sep 14, 2022 at 09:31:04PM +0200, Peter Eisentraut wrote: Here is a patch that addresses this. My intent was to present archive_command as the built-in archive library, but I can see how this might cause confusion, so this change seems reasonable to me. While working on this, I noticed that in master this conflicts with commit 3cabe45a819f8a2a282d9d57e45f259c84e97c3f. I have posted a message in that thread looking for a resolution. I have received clarification there, so I went ahead with this patch here after some adjustments in master around that other patch.
Re: [PoC] Let libpq reject unexpected authentication requests
On 22.09.22 01:37, Jacob Champion wrote: On Wed, Sep 21, 2022 at 3:36 PM Peter Eisentraut wrote: So let's look at the two TODO comments you have: * TODO: how should !auth_required interact with an incomplete * SCRAM exchange? What specific combination of events are you thinking of here? Let's say the client is using `require_auth=!password`. If the server starts a SCRAM exchange but doesn't finish it, the connection will still succeed with the current implementation (because it satisfies the "none" case). This is also true for a client setting of `require_auth=scram-sha-256,none`. I think this is potentially dangerous, but it mirrors the current behavior of libpq and I'm not sure that we should change it as part of this patch. It might be worth reviewing that behavior for other reasons, but I think the semantics of your patch are correct. In any case, it seems to me that it would be safer to *not* make this assumption at first and then have someone more knowledgeable make the argument that it would be safe. I think I'm okay with that, regardless. Here's one of the wrinkles: right now, both of the following connstrings work: require_auth=gss gssencmode=require require_auth=gss gssencmode=prefer If we don't treat gssencmode as providing GSS auth, then the first case will always fail, because there will be no GSS authentication packet over an encrypted connection. Likewise, the second case will almost always fail, unless the server doesn't support gssencmode at all (so why are you using prefer?). If you're okay with those limitations, I will rip out the code. The reason I'm not too worried about it is, I don't think it makes much sense to be strict about your authentication requirements while at the same time leaving the choice of transport encryption up to the server. The way I understand what you explained here is that it would be more sensible to leave that code in. I would be okay with that.
Re: pg_basebackup's --gzip switch misbehaves
On Thu, Sep 22, 2022 at 12:47:34AM -0400, Tom Lane wrote: > Sure. I'd say we have 48 hours to choose whether to put this in v15. > But not more than that. I have a window to be able to look at the buildfarm today, tomorrow being harder, so I have adjusted that now on both HEAD and REL_15_STABLE for consistency. -- Michael signature.asc Description: PGP signature
RE: Perform streaming logical transactions by background workers and parallel apply
Hi Amit, > > I checked other flags that are set by signal handlers; their datatype > > seemed to be sig_atomic_t. > > Is there any reason that you use a normal bool? It should be changed if not. > > > > It follows logic similar to ParallelMessagePending. Do you see any > problem with it? Hmm, one consideration is: what will happen if the signal handler HandleParallelApplyMessageInterrupt() fires while the main code is executing "ParallelApplyMessagePending = false;"? IIUC sig_atomic_t is needed to avoid the handler and the main code writing to the same data at the same time. According to the C99 standard (I grepped the draft version [1]), the behavior seems to be undefined if the signal handler refers to data that is not declared "volatile sig_atomic_t". ...But I'm not sure whether this is really problematic in the current system, sorry... ``` If the signal occurs other than as the result of calling the abort or raise function, the behavior is undefined if the signal handler refers to any object with static storage duration other than by assigning a value to an object declared as volatile sig_atomic_t, or the signal handler calls any function in the standard library other than the abort function, the _Exit function, or the signal function with the first argument equal to the signal number corresponding to the signal that caused the invocation of the handler. ``` > > a. > > I was not sure when the cell should be cleaned. Currently we clean up > ParallelApplyWorkersList() only in parallel_apply_start_worker(), > > but we have chances to remove such a cell in HandleParallelApplyMessages() > or HandleParallelApplyMessage(). What do you think? > > > > Note that HandleParallelApply* are invoked during interrupt handling, > so it may not be advisable to remove it there. > > > > > 12. ConfigureNamesInt > > > > ``` > + { > + {"max_parallel_apply_workers_per_subscription", > + PGC_SIGHUP, > + REPLICATION_SUBSCRIBERS, > + gettext_noop("Maximum number of parallel apply > workers per subscription."), > + NULL, > + }, > + _parallel_apply_workers_per_subscription, > + 2, 0, MAX_BACKENDS, > + NULL, NULL, NULL > + }, > ``` > > This parameter can be changed by pg_ctl reload, so the following corner case > may occur. > Should we add an assign hook to handle this? Or, can we ignore it? > > > > I think we can ignore this as it will eventually start respecting the > threshold. What both of you said sounds reasonable to me. [1]: https://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf Best Regards, Hayato Kuroda FUJITSU LIMITED
Re: cannot to compile master in llvm_resolve_symbols
On Thu, 22 Sep 2022 at 12:54, Thomas Munro wrote: > On Thu, Sep 22, 2022 at 10:43 PM Pavel Stehule > wrote: > > Today I found new bug > > > > -o llvmjit_wrap.o llvmjit_wrap.cpp -MMD -MP -MF .deps/llvmjit_wrap.Po > > llvmjit.c: In function ‘llvm_resolve_symbols’: > > llvmjit.c:1115:57: error: ‘LLVMJITCSymbolMapPair’ undeclared (first use in this function); did you mean ‘LLVMOrcCSymbolMapPair’? > > 1115 | LLVMOrcCSymbolMapPairs symbols = > palloc0(sizeof(LLVMJITCSymbolMapPair) * LookupSetSize); > > | > ^ > > | > LLVMOrcCSymbolMapPair > > Hi Pavel, > > Some changes are needed for LLVM 15. I'm working on a patch, but it's > not quite ready yet. Use LLVM 14 for now. There are a few > superficial changes like that that are very easy to fix (that struct's > name changed), but the real problem is that in LLVM 15 you have to do > extra work to track the type of pointers and pass them into API calls > that we have a lot of. https://llvm.org/docs/OpaquePointers.html Thank you for the info Regards Pavel
Re: cannot to compile master in llvm_resolve_symbols
On Thu, Sep 22, 2022 at 10:43 PM Pavel Stehule wrote: > Today I found new bug > > -o llvmjit_wrap.o llvmjit_wrap.cpp -MMD -MP -MF .deps/llvmjit_wrap.Po > llvmjit.c: In function ‘llvm_resolve_symbols’: > llvmjit.c:1115:57: error: ‘LLVMJITCSymbolMapPair’ undeclared (first use in > this function); did you mean ‘LLVMOrcCSymbolMapPair’? > 1115 | LLVMOrcCSymbolMapPairs symbols = > palloc0(sizeof(LLVMJITCSymbolMapPair) * LookupSetSize); > | > ^ > | > LLVMOrcCSymbolMapPair Hi Pavel, Some changes are needed for LLVM 15. I'm working on a patch, but it's not quite ready yet. Use LLVM 14 for now. There are a few superficial changes like that that are very easy to fix (that struct's name changed), but the real problem is that in LLVM 15 you have to do extra work to track the type of pointers and pass them into API calls that we have a lot of. https://llvm.org/docs/OpaquePointers.html
cannot to compile master in llvm_resolve_symbols
Hi Today I found new bug -o llvmjit_wrap.o llvmjit_wrap.cpp -MMD -MP -MF .deps/llvmjit_wrap.Po llvmjit.c: In function ‘llvm_resolve_symbols’: llvmjit.c:1115:57: error: ‘LLVMJITCSymbolMapPair’ undeclared (first use in this function); did you mean ‘LLVMOrcCSymbolMapPair’? 1115 | LLVMOrcCSymbolMapPairs symbols = palloc0(sizeof(LLVMJITCSymbolMapPair) * LookupSetSize); | ^ | LLVMOrcCSymbolMapPair llvmjit.c:1115:57: note: each undeclared identifier is reported only once for each function it appears in llvmjit.c: In function ‘llvm_create_jit_instance’: llvmjit.c:1233:19: error: too few arguments to function ‘LLVMOrcCreateCustomCAPIDefinitionGenerator’ 1233 | ref_gen = LLVMOrcCreateCustomCAPIDefinitionGenerator(llvm_resolve_symbols, NULL); | ^~ In file included from llvmjit.c:22: /usr/include/llvm-c/Orc.h:997:31: note: declared here 997 | LLVMOrcDefinitionGeneratorRef LLVMOrcCreateCustomCAPIDefinitionGenerator( | ^~ I have fedora 37 Regards Pavel
Re: Perform streaming logical transactions by background workers and parallel apply
On Thu, Sep 22, 2022 at 8:59 AM wangw.f...@fujitsu.com wrote: > Few comments on v33-0001 === 1. + else if (data->streaming == SUBSTREAM_PARALLEL && + data->protocol_version < LOGICALREP_PROTO_STREAM_PARALLEL_VERSION_NUM) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("requested proto_version=%d does not support streaming=parallel mode, need %d or higher", + data->protocol_version, LOGICALREP_PROTO_STREAM_PARALLEL_VERSION_NUM))); I think we can improve this error message as: "requested proto_version=%d does not support parallel streaming mode, need %d or higher". 2. --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -3184,7 +3184,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i OID of the relation that the worker is synchronizing; null for the - main apply worker + main apply worker and the apply parallel worker This and other changes in monitoring.sgml refer to the workers as "apply parallel worker". Isn't it better to use parallel apply worker, as we do at other places in the patch? But, I have another question: do we really need to display entries for parallel apply workers in pg_stat_subscription if they don't have any meaningful information? I think we can easily avoid it in pg_stat_get_subscription by checking apply_leader_pid. 3. ApplyWorkerMain() { ... ... + + if (server_version >= 16 && + MySubscription->stream == SUBSTREAM_PARALLEL) + options.proto.logical.streaming = pstrdup("parallel"); After deciding here whether the parallel streaming mode is enabled or not, we recheck the same thing in apply_handle_stream_abort() and parallel_apply_can_start(). In parallel_apply_can_start(), we do it via two different checks. How about storing this information, say in the MyLogicalRepWorker structure, in ApplyWorkerMain() and then using it at other places? -- With Regards, Amit Kapila.
Re: Refactor backup related code (was: Is it correct to say, "invalid data in file \"%s\"", BACKUP_LABEL_FILE in do_pg_backup_stop?)
On 2022/09/22 16:43, Michael Paquier wrote: Added that part before pg_backup_stop() now where it makes sense with the refactoring. I have put my hands on 0001, and finished with the attached, that includes many fixes and tweaks. Some of the variable renames felt out of place, while some comments were overly verbose for their purpose, though for the last part we did not lose any information in the last version proposed. Thanks for updating the patch! This looks better to me. + MemSet(backup_state, 0, sizeof(BackupState)); + MemSet(backup_state->name, '\0', sizeof(backup_state->name)); The latter MemSet() is not necessary because the former already zeroes that field, is it? + pfree(tablespace_map); + tablespace_map = NULL; + } + + tablespace_map = makeStringInfo(); tablespace_map doesn't need to be reset to NULL here. /* Free structures allocated in TopMemoryContext */ - pfree(label_file->data); - pfree(label_file); + pfree(backup_label->data); + pfree(backup_label); + backup_label = NULL; This source comment is a bit misleading, isn't it? The memory for backup_label is allocated under a memory context other than TopMemoryContext. +#include "access/xlog.h" +#include "access/xlog_internal.h" +#include "access/xlogbackup.h" +#include "utils/memutils.h" It seems "utils/memutils.h" doesn't need to be included. + XLByteToSeg(state->startpoint, stopsegno, wal_segment_size); + XLogFileName(stopxlogfile, state->starttli, stopsegno, wal_segment_size); + appendStringInfo(result, "STOP WAL LOCATION: %X/%X (file %s)\n", + LSN_FORMAT_ARGS(state->startpoint), stopxlogfile); state->stoppoint and state->stoptli should be used instead of state->startpoint and state->starttli? + pfree(tablespace_map); + tablespace_map = NULL; + pfree(backup_state); + backup_state = NULL; It's harmless to set tablespace_map and backup_state to NULL after pfree(), but it's also unnecessary, at least here.
Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: Perform streaming logical transactions by background workers and parallel apply
On Thu, Sep 22, 2022 at 1:37 PM kuroda.hay...@fujitsu.com wrote: > > === > applyparallelworker.c > > 03. declaration > > ``` > +/* > + * Is there a message pending in parallel apply worker which we need to > + * receive? > + */ > +volatile bool ParallelApplyMessagePending = false; > ``` > > I checked other flags that are set by signal handlers; their datatype seemed > to be sig_atomic_t. > Is there any reason that you use a normal bool? It should be changed if not. > It follows logic similar to ParallelMessagePending. Do you see any problem with it? > 04. HandleParallelApplyMessages() > > ``` > + if (winfo->error_mq_handle == NULL) > + continue; > ``` > > a. > I was not sure when the cell should be cleaned. Currently we clean up > ParallelApplyWorkersList() only in parallel_apply_start_worker(), > but we have chances to remove such a cell in HandleParallelApplyMessages() > or HandleParallelApplyMessage(). What do you think? > Note that HandleParallelApply* are invoked during interrupt handling, so it may not be advisable to remove it there. > > 12. ConfigureNamesInt > > ``` > + { > + {"max_parallel_apply_workers_per_subscription", > + PGC_SIGHUP, > + REPLICATION_SUBSCRIBERS, > + gettext_noop("Maximum number of parallel apply > workers per subscription."), > + NULL, > + }, > + _parallel_apply_workers_per_subscription, > + 2, 0, MAX_BACKENDS, > + NULL, NULL, NULL > + }, > ``` > > This parameter can be changed by pg_ctl reload, so the following corner case > may occur. > Should we add an assign hook to handle this? Or, can we ignore it? > I think we can ignore this as it will eventually start respecting the threshold. -- With Regards, Amit Kapila.
RE: Perform streaming logical transactions by background workers and parallel apply
On Thursday, September 22, 2022 4:08 PM Kuroda, Hayato/黒田 隼人 wrote: > > Thanks for updating the patch! The following are comments for v33-0001. Thanks for the comments. > 04. HandleParallelApplyMessages() > > ``` > + if (winfo->error_mq_handle == NULL) > + continue; > ``` > > a. > I was not sure when the cell should be cleaned. Currently we clean up > ParallelApplyWorkersList() only in parallel_apply_start_worker(), but we > have chances to remove such a cell in HandleParallelApplyMessages() or > HandleParallelApplyMessage(). What do you think? The HandleParallelApplyxx functions are signal callback functions, so I think it is unsafe for them to clean up list cells that may still be in use at the point the signal callback is entered. > > 05. parallel_apply_setup_worker() > > ``` > + if (launched) > + { > + ParallelApplyWorkersList = lappend(ParallelApplyWorkersList, > winfo); > + } > ``` > > {} should be removed. I think this style is fine, and it was also suggested to keep it consistent with the else{} part. > > 06. parallel_apply_wait_for_xact_finish() > > ``` > + /* If any workers have died, we have failed. */ > ``` > > This function only checks a single parallel apply worker, so the comment > should be "if worker has..."? The comment seems clear to me, as it's a general comment. Best regards, Hou zj
Re: [RFC] building postgres with meson - v13
On 2022-09-22 Th 01:57, Andres Freund wrote: > Hi, > > On 2022-09-21 09:46:30 -0700, Andres Freund wrote: >> I'm planning to commit this today, unless somebody wants to argue against >> that. > And done! > > Changes: > - fixed a few typos (thanks Thomas) > - less duplication in the CI tasks > - removed an incomplete implementation of the target for abbrevs.txt - do we > even want to have that? > - plenty hand wringing on my part > > > I also rebased my meson git tree, which still has plenty additional test > platforms (netbsd, openbsd, debian sid, fedora rawhide, centos 8, centos 7, > opensuse tumbleweed), but without the autoconf versions of those targets. I > also added a commit that translates most of the CompilerWarnings task to > meson. Still need to add a headerscheck / cpluspluscheck target. > Great. Now I'll start on buildfarm support. Given my current commitments, this will take me a while, but I hope to have a working client by about the beginning of November. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
RE: Perform streaming logical transactions by background workers and parallel apply
Dear Wang, Thanks for updating the patch! The following are comments for v33-0001. === libpqwalreceiver.c 01. inclusion ``` +#include "catalog/pg_subscription.h" ``` We don't have to include it because the analysis of parameters is done at the caller. === launcher.c 02. logicalrep_worker_launch() ``` + /* +* Return silently if the number of parallel apply workers reached the +* limit per subscription. +*/ + if (is_subworker && nparallelapplyworkers >= max_parallel_apply_workers_per_subscription) ``` a. I felt that it might be helpful if we output some debug messages. b. The if statement seems to be more than 80 characters. You can move to a new line around "nparallelapplyworkers >= ...". === applyparallelworker.c 03. declaration ``` +/* + * Is there a message pending in parallel apply worker which we need to + * receive? + */ +volatile bool ParallelApplyMessagePending = false; ``` I checked other flags that are set by signal handlers, and their datatype seemed to be sig_atomic_t. Is there any reason you use a normal bool? It should be changed if not. 04. HandleParallelApplyMessages() ``` + if (winfo->error_mq_handle == NULL) + continue; ``` a. I was not sure when the cell should be cleaned. Currently we clean up ParallelApplyWorkersList() only in the parallel_apply_start_worker(), but we have chances to remove such a cell like HandleParallelApplyMessages() or HandleParallelApplyMessage(). What do you think? b. Comments should be added even if we keep this, like "exited worker, skipped". ``` + else + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), +errmsg("lost connection to the leader apply worker"))); ``` c. This function is called on the leader apply worker, so the hint should be "lost connection to the parallel apply worker". 05. parallel_apply_setup_worker() ``` + if (launched) + { + ParallelApplyWorkersList = lappend(ParallelApplyWorkersList, winfo); + } ``` {} should be removed. 06.
parallel_apply_wait_for_xact_finish() ``` + /* If any workers have died, we have failed. */ ``` This function checks only one parallel apply worker, so the comment should be "if worker has..."? === worker.c 07. handle_streamed_transaction() ``` + * For non-streamed transactions, returns false; ``` "returns false;" -> "returns false" apply_handle_commit_prepared(), apply_handle_abort_prepared() These functions are not expected to be called by a parallel worker, so I think Assert() should be added. 08. UpdateWorkerStats() ``` -static void +void UpdateWorkerStats(XLogRecPtr last_lsn, TimestampTz send_time, bool reply) ``` This function is called only in worker.c, so it should be static. 09. subscription_change_cb() ``` -static void +void subscription_change_cb(Datum arg, int cacheid, uint32 hashvalue) ``` This function is called only in worker.c, so it should be static. 10. InitializeApplyWorker() ``` +/* + * Initialize the database connection, in-memory subscription and necessary + * config options. + */ void -ApplyWorkerMain(Datum main_arg) +InitializeApplyWorker(void) ``` Some comments should be added noting that this is a common part of the leader and parallel apply workers. === logicalrepworker.h 11. declaration ``` extern PGDLLIMPORT volatile bool ParallelApplyMessagePending; ``` Please refer to the above comment. === guc_tables.c 12. ConfigureNamesInt ``` + { + {"max_parallel_apply_workers_per_subscription", + PGC_SIGHUP, + REPLICATION_SUBSCRIBERS, + gettext_noop("Maximum number of parallel apply workers per subscription."), + NULL, + }, + &max_parallel_apply_workers_per_subscription, + 2, 0, MAX_BACKENDS, + NULL, NULL, NULL + }, ``` This parameter can be changed by pg_ctl reload, so the following corner case may occur. Should we add an assign hook to handle this? Or, can we ignore it? 1. set max_parallel_apply_workers_per_subscription to 4. 2. start replicating two streaming transactions. 3. commit transactions === Two parallel workers will remain === 4.
change max_parallel_apply_workers_per_subscription to 3 5. We expected that only one worker would remain, but two parallel workers remained. They will not be stopped until another streamed transaction is started and committed. Best Regards, Hayato Kuroda FUJITSU LIMITED
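[Editor's note: the reload corner case above can be illustrated with a toy model. This is not the patch's code; it is a hedged sketch of the behavior Amit describes elsewhere in the thread, namely that lowering the limit does not stop already-running workers, it is only enforced when a new launch is attempted.]

```c
#include <stdbool.h>

/* Toy model of a reloadable per-subscription worker limit. */
static int max_workers = 4;     /* stands in for the reloadable GUC */
static int running = 0;         /* workers currently alive */

/* Launch refuses silently when the current limit is reached,
 * mirroring the launcher-side check quoted in the review. */
static bool
try_launch_worker(void)
{
    if (running >= max_workers)
        return false;
    running++;
    return true;
}

static void
worker_exit(void)
{
    running--;
}
```

Running workers above the new limit linger until they exit on their own; subsequent launch attempts respect the new value, so the pool eventually converges, which is the argument for skipping an assign hook.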
RE: [EXTERNAL] Re: [PoC] Federated Authn/z with OAUTHBEARER
We can support both passing the token from an upstream client and libpq implementing the OAUTH2 protocol to obtain one. Libpq implementing OAUTHBEARER is needed for community/3rd-party tools to have a user-friendly authentication experience: 1. For community client tools, like pg_admin, psql etc. Example experience: pg_admin would be able to open a popup dialog to authenticate the customer and keep a refresh token to avoid asking the user frequently. 2. For 3rd-party connectors supporting generic OAUTH with any provider. Useful for data viz clients, like Tableau or ETL tools. Those can support both user and client OAUTH flows. Libpq passing the token directly from an upstream client is useful in other scenarios: 1. Enterprise clients, built with .Net / Java and using provider-specific authentication libraries, like MSAL for AAD. Those can also support more advanced provider-specific token acquisition flows. 2. Resource-constrained (like IoT) clients. Those can be compiled with the optional libpq flag off, not including the iddawc or other dependency. Thanks! Andrey. -Original Message- From: Jacob Champion Sent: Wednesday, September 21, 2022 9:03 AM To: mahendrakar s Cc: pgsql-hack...@postgresql.org; smilingsa...@gmail.com; and...@anarazel.de; Andrey Chudnovskiy ; Mahendrakar Srinivasarao Subject: [EXTERNAL] Re: [PoC] Federated Authn/z with OAUTHBEARER On Tue, Sep 20, 2022 at 4:19 PM Jacob Champion wrote: > > 2. Add support to pass on the OAuth bearer token. In this > > obtaining the bearer token is left to 3rd party application or user. > > > > ./psql -U -d 'dbname=postgres > > oauth_client_id= oauth_bearer_token= > > This hurts, but I think people are definitely going to ask for it, > given the frightening practice of copy-pasting these (incredibly > sensitive > secret) tokens all over the place...
After some further thought -- in this case, you already have an opaque Bearer token (and therefore you already know, out of band, which provider needs to be used), you're willing to copy-paste it from whatever service you got it from, and you have an extension plugged into Postgres on the backend that verifies this Bearer blob using some procedure that Postgres knows nothing about. Why do you need the OAUTHBEARER mechanism logic at that point? Isn't that identical to a custom password scheme? It seems like that could be handled completely by Samay's pluggable auth proposal. --Jacob
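[Editor's note: for context on what the OAUTHBEARER mechanism adds beyond a plain password exchange, the SASL client initial response defined in RFC 7628 is just a GS2 header plus 0x01-delimited key/value pairs wrapping the opaque Bearer token. A hedged sketch follows; the identifiers and token are placeholders, not values from this thread, and this is the wire framing only, not libpq code.]

```c
#include <stdio.h>
#include <string.h>

/*
 * Build an OAUTHBEARER client initial response per RFC 7628:
 * gs2-header, then key=value pairs, each terminated by 0x01 (^A),
 * with one extra trailing 0x01.  Returns the number of bytes that
 * would have been written (snprintf semantics).
 */
static int
build_oauthbearer_initial(char *buf, size_t buflen,
                          const char *authzid, const char *token)
{
    /* Adjacent string literals keep the \x01 escapes unambiguous. */
    return snprintf(buf, buflen,
                    "n,a=%s," "\x01" "auth=Bearer %s" "\x01" "\x01",
                    authzid, token);
}
```

The server never interprets the token itself; it hands the blob to a validator, which is why Jacob's point above holds: with a pre-obtained opaque token, the framing adds little over a custom password scheme.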
Re: Refactor backup related code (was: Is it correct to say, "invalid data in file \"%s\"", BACKUP_LABEL_FILE in do_pg_backup_stop?)
On Wed, Sep 21, 2022 at 05:10:39PM +0530, Bharath Rupireddy wrote: >> deallocate_backup_variables() is the only part of xlogbackup.c that >> includes references of the tablespace map and backup_label >> StringInfos. I would be tempted to fully decouple that from >> xlogbackup.c/h for the time being. > > There's no problem with it IMO, after all, they are backup related > variables. And that function reduces a bit of duplicate code. Hmm. I'd like to disagree with this statement :) > Added that part before pg_backup_stop() now where it makes sense with > the refactoring. I have put my hands on 0001, and finished with the attached, which includes many fixes and tweaks. Some of the variable renames felt out of place, while some comments were overly verbose for their purpose, though for the last part we did not lose any information from the last version proposed. As I suspected, the deallocate routine felt unnecessary, as xlogbackup.c/h have no idea what these are. The remark is particularly true for the StringInfo of the backup_label file: for basebackup.c we need to build it when sending base.tar, and in xlogfuncs.c we need it only at the backup stop phase. The code was actually a bit wrong, because freeing a StringInfo requires freeing its ->data and then the main object (stringinfo.h explains that). My tweaks have shaved off something like 10%~15% of the patch, while making it IMO more readable. A second issue I had was with the build function, and again it seemed much cleaner to let the routine do the makeStringInfo() and return the result. This is not the most popular routine ever, but this reduces the workload of the caller of build_backup_content(). So, opinions?
-- Michael From 22216c4b6b75607d45e49620264b1af606396bd4 Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Thu, 22 Sep 2022 16:41:55 +0900 Subject: [PATCH v11] Refactor backup related code Refactor backup related code; the advantages of doing so are as follows: 1) backup state is more structured now - all in a single structure, callers can create backup_label contents whenever required, either during the pg_backup_start or the pg_backup_stop or in between. 2) no parsing of backup_label file lines now, no error checking for invalid parsing. 3) backup_label and history file contents have most of the things in common, they can now be created within a single function. 4) makes backup related code extensible and readable. This introduces new source files xlogbackup.c/.h for backup related code and adds the new code in there. The xlog.c file has already grown to ~9000 LOC (as of this time). Eventually, we would want to move all the backup related code from xlog.c, xlogfuncs.c, elsewhere to here.
Author: Bharath Rupireddy Reviewed-by: Michael Paquier Reviewed-by: Fujii Masao Discussion: https://www.postgresql.org/message-id/CALj2ACVqNUEXkaMKyHHOdvScfN9E4LuCWsX_R-YRNfzQ727CdA%40mail.gmail.com --- src/include/access/xlog.h | 10 +- src/include/access/xlogbackup.h | 42 + src/backend/access/transam/Makefile | 1 + src/backend/access/transam/xlog.c | 224 src/backend/access/transam/xlogbackup.c | 81 + src/backend/access/transam/xlogfuncs.c | 96 ++ src/backend/backup/basebackup.c | 44 +++-- 7 files changed, 298 insertions(+), 200 deletions(-) create mode 100644 src/include/access/xlogbackup.h create mode 100644 src/backend/access/transam/xlogbackup.c diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h index 3dbfa6b593..dce265098e 100644 --- a/src/include/access/xlog.h +++ b/src/include/access/xlog.h @@ -11,6 +11,7 @@ #ifndef XLOG_H #define XLOG_H +#include "access/xlogbackup.h" #include "access/xlogdefs.h" #include "access/xlogreader.h" #include "datatype/timestamp.h" @@ -277,11 +278,10 @@ typedef enum SessionBackupState SESSION_BACKUP_RUNNING, } SessionBackupState; -extern XLogRecPtr do_pg_backup_start(const char *backupidstr, bool fast, - TimeLineID *starttli_p, StringInfo labelfile, - List **tablespaces, StringInfo tblspcmapfile); -extern XLogRecPtr do_pg_backup_stop(char *labelfile, bool waitforarchive, - TimeLineID *stoptli_p); +extern void do_pg_backup_start(const char *backupidstr, bool fast, + List **tablespaces, BackupState *state, + StringInfo tblspcmapfile); +extern void do_pg_backup_stop(BackupState *state, bool waitforarchive); extern void do_pg_abort_backup(int code, Datum arg); extern void register_persistent_abort_backup_handler(void); extern SessionBackupState get_backup_status(void); diff --git a/src/include/access/xlogbackup.h b/src/include/access/xlogbackup.h new file mode 100644 index 00..ccdfefe117 --- /dev/null +++ b/src/include/access/xlogbackup.h @@ -0,0 +1,42 @@ +/*- + * + * xlogbackup.h + * Definitions for internals of base 
backups. + * + * Portions Copyright (c)
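[Editor's note: the destruction-order point in Michael's message, that a StringInfo's ->data buffer must be freed before the object itself, can be sketched with a minimal stand-in struct. This is illustrative only, not the real stringinfo.h API.]

```c
#include <stdlib.h>

/*
 * Minimal stand-in for a StringInfo-like object: a struct that owns a
 * separately allocated buffer.  Freeing only the struct would leak the
 * buffer, so ->data must be released first, then the object, which is
 * the ordering stringinfo.h documents for the real type.
 */
typedef struct DemoStringInfo
{
    char   *data;
    int     len;
    int     maxlen;
} DemoStringInfo;

static DemoStringInfo *
make_demo_string_info(void)
{
    DemoStringInfo *si = malloc(sizeof(DemoStringInfo));

    si->maxlen = 1024;
    si->data = malloc(si->maxlen);
    si->data[0] = '\0';
    si->len = 0;
    return si;
}

static void
destroy_demo_string_info(DemoStringInfo *si)
{
    free(si->data);             /* the buffer first */
    free(si);                   /* then the object itself */
}
```

Having the build routine allocate and return the object (as with build_backup_content() returning a makeStringInfo() result) keeps this pairing in one place instead of spreading it across callers.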
Re: Multi-insert related comment in CopyFrom()
On Wed, Sep 21, 2022 at 4:39 PM Etsuro Fujita wrote: > While working on the “Fast COPY FROM based on batch insert” patch, I > noticed this: > > else if (proute != NULL && resultRelInfo->ri_TrigDesc != NULL && > resultRelInfo->ri_TrigDesc->trig_insert_new_table) > { > /* > * For partitioned tables we can't support multi-inserts when there > * are any statement level insert triggers. It might be possible to > * allow partitioned tables with such triggers in the future, but for > * now, CopyMultiInsertInfoFlush expects that any before row insert > * and statement level insert triggers are on the same relation. > */ > insertMethod = CIM_SINGLE; > } > > I think there is a thinko in the comment; “before” should be after. > Patch attached. Pushed. Best regards, Etsuro Fujita
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Thu, Sep 22, 2022 at 1:46 PM John Naylor wrote: > > > On Thu, Sep 22, 2022 at 1:01 AM Nathan Bossart > wrote: > > > > On Wed, Sep 21, 2022 at 01:17:21PM +0700, John Naylor wrote: > > > > > In short, this code needs to be lower level so that we still have full > > > control while being portable. I will work on this, and also the related > > > code for node dispatch. > > > > Is it possible to use approach #2 here, too? AFAICT space is allocated for > > all of the chunks, so there wouldn't be any danger in searching all them > > and discarding any results >= node->count. > > Sure, the caller could pass the maximum node capacity, and then check if the > returned index is within the range of the node count. > > > Granted, we're depending on the > > number of chunks always being a multiple of elements-per-vector in order to > > avoid the tail path, but that seems like a reasonably safe assumption that > > can be covered with comments. > > Actually, we don't need to depend on that at all. When I said "junk" above, > that can be any bytes, as long as we're not reading off the end of allocated > memory. We'll never do that here, since the child pointers/values follow. In > that case, the caller can hard-code the size (it would even happen to work > now to multiply rt_node_kind by 16, to be sneaky). One thing I want to try > soon is storing fewer than 16/32 etc entries, so that the whole node fits > comfortably inside a power-of-two allocation. That would allow us to use aset > without wasting space for the smaller nodes, which would be faster and > possibly would solve the fragmentation problem Andres referred to in > > https://www.postgresql.org/message-id/20220704220038.at2ane5xkymzzssb%40awork3.anarazel.de > > While on the subject, I wonder how important it is to keep the chunks in the > small nodes in sorted order. That adds branches and memmove calls, and is the > whole reason for the recent "pg_lfind_ge" function. Good point. 
While keeping the chunks in the small nodes in sorted order is useful for visiting all keys in sorted order, additional branches and memmove calls could be slow. Regards, -- Masahiko Sawada PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
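[Editor's note: the fixed-capacity search strategy discussed above, scan every slot and then discard hits at or beyond node->count, can be sketched in scalar C; a SIMD version would replace the loop body but keep the same bounds logic. Type and field names are illustrative, loosely modeled on the thread, not taken from the patch.]

```c
#include <stdint.h>

#define NODE_CAPACITY 16

/*
 * Illustrative small radix-tree node: only the first 'count' chunk
 * slots hold valid entries; the rest may contain stale "junk" bytes,
 * which is harmless as long as we never read past the allocation.
 */
typedef struct DemoNode16
{
    uint8_t count;
    uint8_t chunks[NODE_CAPACITY];
} DemoNode16;

/*
 * Scan all NODE_CAPACITY slots (a fixed trip count with no data-
 * dependent early exit is what makes this amenable to vectorization),
 * then reject any match that falls at or beyond node->count.  With
 * unsorted insertion, valid entries always precede junk, so the first
 * hit decides the result.
 */
static int
demo_node_16_search(const DemoNode16 *node, uint8_t chunk)
{
    for (int i = 0; i < NODE_CAPACITY; i++)
    {
        if (node->chunks[i] == chunk)
            return (i < node->count) ? i : -1;
    }
    return -1;
}
```

Dropping the sorted-order invariant removes the memmove on insert and the need for a "find first greater-or-equal" primitive, at the cost of losing cheap in-order traversal within a node, which is the trade-off weighed above.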