Re: making relfilenodes 56 bits
On Tue, Sep 20, 2022 at 7:46 PM Amul Sul wrote:
> Thanks for the review
> Here are a few minor suggestions I came across while reading this
> patch, might be useful:
>
> +#ifdef USE_ASSERT_CHECKING
> +
> + {
>
> Unnecessary space after USE_ASSERT_CHECKING.

Changed

> + return InvalidRelFileNumber;    /* placate compiler */
>
> I don't think we need this after the error on the latest branches.

Changed

> + LWLockAcquire(RelFileNumberGenLock, LW_SHARED);
> + if (shutdown)
> +     checkPoint.nextRelFileNumber = ShmemVariableCache->nextRelFileNumber;
> + else
> +     checkPoint.nextRelFileNumber = ShmemVariableCache->loggedRelFileNumber;
> +
> + LWLockRelease(RelFileNumberGenLock);
>
> This is done for a good reason, I think; it should have a comment
> describing why the two different checkPoint.nextRelFileNumber
> assignments are needed, from a crash-recovery perspective.

Done

> +#define SizeOfRelFileLocatorBackend \
> +     (offsetof(RelFileLocatorBackend, backend) + sizeof(BackendId))
>
> Could we append empty parentheses "()" to the macro name, so it looks
> like a function call at the use site, or change the macro name to
> uppercase?

Yeah, we could, but SizeOfXXX macros are a general practice I see used
everywhere in the Postgres code, so left as is.

> + if (val < 0 || val > MAX_RELFILENUMBER)
> ..
> if ((relfilenumber) < 0 || (relfilenumber) > MAX_RELFILENUMBER) \
>
> How about adding a macro for this condition as RelFileNumberIsValid()?
> We can replace all the checks referring to MAX_RELFILENUMBER with this.

Actually, RelFileNumberIsValid is used just to check whether the value is
InvalidRelFileNumber, i.e. 0. Maybe for this we could introduce
RelFileNumberInValidRange(), but I am not sure whether it would be cleaner
than what we have now, so left as is for now.

-- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: binary version of pg_current_wal_insert_lsn and pg_walfile_name functions
On Fri, Sep 23, 2022 at 6:05 AM Bharath Rupireddy wrote:
>
> On Thu, Sep 22, 2022 at 10:25 PM Ashutosh Sharma wrote:
> >
> > PFA that enhances pg_waldump to show the latest LSN and the
> > corresponding WAL file when the -l or --lastLSN option is passed as an
> > argument to pg_waldump. Below is an example:
>
> Thanks for the patch. I have some quick thoughts about it.
>
> > When the user passes the '-l' command line option along with the data
> > directory path to pg_waldump, it reads the control file from the data
> > directory.
>
> I don't think we need a new option for data directory -D. pg_waldump's
> option 'p' can be used, please see the comments around
> identify_target_directory().

-p is the path to the WAL directory. It doesn't necessarily have to be a
data directory; the user can specify the data directory path here as well,
from which the path to the WAL directory can be recognized, but as I said
it doesn't mean -p will always represent the data directory.

> > From the control file, it gets information like the redo
> > pointer and current timeline id.
>
> Is there any reason for not using get_control_file() from
> src/common/controldata_utils.c, but defining the exact same function
> in pg_waldump.c?

Will give it a thought later. If possible, will try to reuse it.

> > The redo pointer is considered to be
> > the start pointer from where pg_waldump starts reading WAL data
> > until end-of-WAL to find the last LSN. For details please check the
> > attached patch.
>
> Making it dependent on the controlfile limits the usability of this
> feature. Imagine using this feature on an archive location or
> pg_receivewal target directory where there are WAL files but no
> controlfile. I think we can choose the appropriate combinations of
> existing pg_waldump options, for instance, let users enter the start
> WAL segment via startseg and/or start LSN via --start and the new
> option for end WAL segment and end LSN.
I have written this patch assuming that the end user is not aware of any
LSN or other WAL data and wants to know the last LSN. So all they can do
is take the help of the control data to find the redo LSN and use that as
a reference point (start pointer) to find the last LSN. And whatever the
WAL directory is (be it an archive location, WAL collected via
pg_receivewal, or the pg_wal directory), we will consider the redo pointer
as the start pointer. Now, it's possible that the WAL corresponding to the
start pointer is not available at all in a WAL directory like an archive
location or pg_receivewal directory, in which case this cannot be used,
but that is very unlikely.

> > Please note that for compressed and .partial WAL files this doesn't work.
>
> Looking forward to the above capability because it expands the
> usability of this feature.

This is a different task altogether. We will probably need to work on it
separately.

-- With Regards, Ashutosh Sharma.
Re: Perform streaming logical transactions by background workers and parallel apply
On Thu, Sep 22, 2022 at 4:50 PM kuroda.hay...@fujitsu.com wrote:
>
> Hi Amit,
>
> > > I checked other flags that are set by signal handlers; their datatype
> > > seemed to be sig_atomic_t.
> > > Is there any reason that you use a normal bool? It should be changed
> > > if not.
> >
> > It follows logic similar to ParallelMessagePending. Do you see any
> > problem with it?
>
> Hmm, one consideration is:
> what will happen if the signal handler HandleParallelApplyMessageInterrupt()
> is fired during "ParallelApplyMessagePending = false;"?
> IIUC sig_atomic_t has been needed to avoid writing to the same data at
> the same time.

But we do call HOLD_INTERRUPTS() before we do
"ParallelApplyMessagePending = false;", so that should not happen.
However, I think it would be better to use sig_atomic_t here for the sake
of consistency. I think you can start a separate thread to check if we can
change ParallelMessagePending to make it consistent with other such
variables.

-- With Regards, Amit Kapila.
Re: CI and test improvements
On Fri, Sep 23, 2022 at 9:07 AM Justin Pryzby wrote:
> On Sun, Aug 28, 2022 at 02:28:02PM -0700, Andres Freund wrote:
> > > > > - # Workaround around performance issues due to 32KB block size
> > > > > - repartition_script: src/tools/ci/gcp_freebsd_repartition.sh
> > > > >   create_user_script: |
> > > > >     pw useradd postgres
> > > > >     chown -R postgres:postgres .
> > > >
> > > > What's the story there - at some point that was important for
> > > > performance because of the native block size triggering significant
> > > > read-modify-write cycles with postgres' writes. You didn't comment
> > > > on it in the commit message.
> > >
> > > Well, I don't know the history, but it seems to be unneeded now.
> >
> > It's possible it was mainly needed for testing with aio + dio. But also
> > possible that an upgrade improved the situation since.

Yeah, it is very important for direct I/O (patches soon...), because every
8KB random write becomes a read-32KB/wait/write-32KB without it. It's
slightly less important for buffered I/O, because the kernel caches hide
that, but it still triggers I/O bandwidth amplification, and we definitely
saw positive effects on the CI system back on the previous generation.
FWIW I am planning to see about getting the FreeBSD installer to create
the root file system the way we want, instead of this ugliness.

> Maybe freebsd got faster as a result of the TAU CPUs?
> [data]

Very interesting. It would be good to find the exact instance types +
storage types to see if there has been a massive IOPS boost, perhaps via
local SSD. The newer times are getting closer to a local developer machine.
Re: Removing unused parameter in SnapBuildGetOrBuildSnapshot
On Wed, Sep 21, 2022 at 8:11 PM Zhang Mingli wrote: > > On Sep 21, 2022, 22:22 +0800, Melih Mutlu , wrote: > > Hi hackers, > > Sharing a small patch to remove an unused parameter in > SnapBuildGetOrBuildSnapshot function in snapbuild.c > > With commit 6c2003f8a1bbc7c192a2e83ec51581c018aa162f, SnapBuildBuildSnapshot > no longer needs transaction id. This also makes the xid parameter in > SnapBuildGetOrBuildSnapshot useless. > I couldn't see a reason to keep it and decided to remove it. > > Regards, > Melih > > +1, Good Catch. > LGTM. -- With Regards, Amit Kapila.
Re: libpq error message refactoring
Andres Freund writes: > Heh, figured it out. It's inside #ifdef ENABLE_NLS. So it fails on all > platforms without NLS enabled. Argh, how simple! The question about va_start/va_end placement still stands, though. regards, tom lane
Re: libpq error message refactoring
Hi, On 2022-09-22 19:27:27 -0700, Andres Freund wrote: > I just noticed it when trying to understand the linker failure - which I > still don't... Heh, figured it out. It's inside #ifdef ENABLE_NLS. So it fails on all platforms without NLS enabled. Greetings, Andres Freund
Re: libpq error message refactoring
Andres Freund writes: > I suspect the appendPQExpBufferVA is orthogonal - most (all?) of the other > functions in pqexpbuffer.h are visible, so it feels weird/confusing to not > make appendPQExpBufferVA() available. I thought the same to start with, but if I'm right in my nearby reply that we'll have to make callers loop around appendPQExpBufferVA, then it seems like a good idea to keep it closely held. More than zero commentary about that would be a good thing, too. regards, tom lane
Re: libpq error message refactoring
Peter Eisentraut writes:
> On 22.09.22 17:42, Andres Freund wrote:
>> It's not the cause of this failure, I think, but doesn't appendPQExpBufferVA
>> need to be added to exports.txt?
> I don't want to make that function available to users of libpq, just use
> it inside libpq across .c files. Is there no visibility level for that?

Should "just work", I should think. I agree with Andres that that's not
the cause of the build failure. I wonder if somehow the failing links are
picking up the wrong libpq.a.

Separately from that: is it really okay to delegate use of a va_list
argument like that? The other call paths of appendPQExpBufferVA[_internal]
write va_start/va_end directly around it, not somewhere else in the call
chain. I'm too tired to language-lawyer out what happens when you do it
like this, but I'm suspecting that it's not well-defined portable
behavior. I think what you probably need to do is export
appendPQExpBufferVA as-is and require libpq_append_error to provide the
error loop.

regards, tom lane
Re: libpq error message refactoring
Hi,

On 2022-09-22 22:00:00 -0400, Peter Eisentraut wrote:
> On 22.09.22 17:42, Andres Freund wrote:
> > This patch has been failing for a while:
> > https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/39/3854
> >
> > Interestingly, previously the error only happened when targeting
> > windows, but meson also shows it on freebsd.
> >
> > It's not the cause of this failure, I think, but doesn't
> > appendPQExpBufferVA need to be added to exports.txt?
>
> I don't want to make that function available to users of libpq, just use
> it inside libpq across .c files. Is there no visibility level for that?
> Is that also the problem in the freebsd build?

I suspect the appendPQExpBufferVA question is orthogonal - most (all?) of
the other functions in pqexpbuffer.h are visible, so it feels
weird/confusing to not make appendPQExpBufferVA() available. I just
noticed it when trying to understand the linker failure - which I still
don't...

Greetings, Andres Freund
Re: libpq error message refactoring
On 22.09.22 17:42, Andres Freund wrote:
> This patch has been failing for a while:
> https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/39/3854
>
> Interestingly, previously the error only happened when targeting windows,
> but meson also shows it on freebsd.
>
> It's not the cause of this failure, I think, but doesn't appendPQExpBufferVA
> need to be added to exports.txt?

I don't want to make that function available to users of libpq, just use
it inside libpq across .c files. Is there no visibility level for that?
Is that also the problem in the freebsd build?
Re: binary version of pg_current_wal_insert_lsn and pg_walfile_name functions
On Thu, Sep 22, 2022 at 10:25 PM Ashutosh Sharma wrote:
>
> PFA that enhances pg_waldump to show the latest LSN and the
> corresponding WAL file when the -l or --lastLSN option is passed as an
> argument to pg_waldump. Below is an example:

Thanks for the patch. I have some quick thoughts about it.

> When the user passes the '-l' command line option along with the data
> directory path to pg_waldump, it reads the control file from the data
> directory.

I don't think we need a new option for data directory -D. pg_waldump's
option 'p' can be used, please see the comments around
identify_target_directory().

> From the control file, it gets information like the redo
> pointer and current timeline id.

Is there any reason for not using get_control_file() from
src/common/controldata_utils.c, but defining the exact same function in
pg_waldump.c?

> The redo pointer is considered to be
> the start pointer from where pg_waldump starts reading WAL data
> until end-of-WAL to find the last LSN. For details please check the
> attached patch.

Making it dependent on the controlfile limits the usability of this
feature. Imagine using this feature on an archive location or
pg_receivewal target directory where there are WAL files but no
controlfile. I think we can choose the appropriate combinations of
existing pg_waldump options, for instance, let users enter the start WAL
segment via startseg and/or start LSN via --start and the new option for
end WAL segment and end LSN.

> Please note that for compressed and .partial WAL files this doesn't work.

Looking forward to the above capability because it expands the usability
of this feature.

-- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Refactor backup related code (was: Is it correct to say, "invalid data in file \"%s\"", BACKUP_LABEL_FILE in do_pg_backup_stop?)
On Thu, Sep 22, 2022 at 3:17 PM Fujii Masao wrote:
>
> + MemSet(backup_state, 0, sizeof(BackupState));
> + MemSet(backup_state->name, '\0', sizeof(backup_state->name));
>
> The latter MemSet() is not necessary because the former already
> resets that with zero, is it?

Yes; earlier, the name was a palloc'd string, but now it is a fixed array.
Removed.

> + pfree(tablespace_map);
> + tablespace_map = NULL;
> + }
> +
> + tablespace_map = makeStringInfo();
>
> tablespace_map doesn't need to be reset to NULL here.
>
> + pfree(tablespace_map);
> + tablespace_map = NULL;
> + pfree(backup_state);
> + backup_state = NULL;
>
> It's harmless to set tablespace_map and backup_state to NULL after pfree(),
> but it's also unnecessary at least here.

Yes. But we can retain it for the sake of consistency with the other
places and to avoid dangling pointers; if any new code gets added in
between, it will be useful.

> /* Free structures allocated in TopMemoryContext */
> - pfree(label_file->data);
> - pfree(label_file);
>
> + pfree(backup_label->data);
> + pfree(backup_label);
> + backup_label = NULL;
>
> This source comment is a bit misleading, isn't it? Because the memory
> for backup_label is allocated under a memory context other than
> TopMemoryContext.

Yes, we can just say /* Deallocate backup-related variables. */. The
pg_backup_start() comments have the info about the variables being
allocated in TopMemoryContext.

> +#include "access/xlog.h"
> +#include "access/xlog_internal.h"
> +#include "access/xlogbackup.h"
> +#include "utils/memutils.h"
>
> Seems "utils/memutils.h" doesn't need to be included.

Yes, removed now.
> + XLByteToSeg(state->startpoint, stopsegno, wal_segment_size);
> + XLogFileName(stopxlogfile, state->starttli, stopsegno, wal_segment_size);
> + appendStringInfo(result, "STOP WAL LOCATION: %X/%X (file %s)\n",
> +                  LSN_FORMAT_ARGS(state->startpoint), stopxlogfile);
>
> state->stoppoint and state->stoptli should be used instead of
> state->startpoint and state->starttli?

Nice catch! Corrected.

PSA v12 patch with the above review comments addressed.

-- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com

From 3caa5f392e50bdf3f60b4af9710da0004f11ff39 Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy
Date: Thu, 22 Sep 2022 23:44:16 +
Subject: [PATCH v12] Refactor backup related code

Refactor backup related code; the advantages of doing so are the
following:

1) Backup state is more structured now - all in a single structure;
callers can create backup_label contents whenever required, either during
pg_backup_start or pg_backup_stop or in between.
2) No parsing of backup_label file lines now, no error checking for
invalid parsing.
3) backup_label and history file contents have most things in common;
they can now be created within a single function.
4) Makes backup related code extensible and readable.

This introduces new source files xlogbackup.c/.h for backup related code
and adds the new code there. The xlog.c file has already grown to ~9000
LOC (as of this time). Eventually, we would want to move all the backup
related code from xlog.c, xlogfuncs.c, and elsewhere to here.

Author: Bharath Rupireddy
Reviewed-by: Michael Paquier
Reviewed-by: Fujii Masao
Discussion: https://www.postgresql.org/message-id/CALj2ACVqNUEXkaMKyHHOdvScfN9E4LuCWsX_R-YRNfzQ727CdA%40mail.gmail.com
---
 src/backend/access/transam/Makefile     |   1 +
 src/backend/access/transam/meson.build  |   1 +
 src/backend/access/transam/xlog.c       | 224
 src/backend/access/transam/xlogbackup.c |  80 +
 src/backend/access/transam/xlogfuncs.c  |  97 ++
 src/backend/backup/basebackup.c         |  44 +++--
 src/include/access/xlog.h               |  10 +-
 src/include/access/xlogbackup.h         |  42 +
 8 files changed, 296 insertions(+), 203 deletions(-)
 create mode 100644 src/backend/access/transam/xlogbackup.c
 create mode 100644 src/include/access/xlogbackup.h

diff --git a/src/backend/access/transam/Makefile b/src/backend/access/transam/Makefile
index 3e5444a6f7..661c55a9db 100644
--- a/src/backend/access/transam/Makefile
+++ b/src/backend/access/transam/Makefile
@@ -29,6 +29,7 @@ OBJS = \
 	xact.o \
 	xlog.o \
 	xlogarchive.o \
+	xlogbackup.o \
 	xlogfuncs.o \
 	xloginsert.o \
 	xlogprefetcher.o \
diff --git a/src/backend/access/transam/meson.build b/src/backend/access/transam/meson.build
index c32169bd2c..63d17b85ee 100644
--- a/src/backend/access/transam/meson.build
+++ b/src/backend/access/transam/meson.build
@@ -15,6 +15,7 @@ backend_sources += files(
 	'xact.c',
 	'xlog.c',
 	'xlogarchive.c',
+	'xlogbackup.c',
 	'xlogfuncs.c',
 	'xloginsert.c',
Re: Refactor backup related code (was: Is it correct to say, "invalid data in file \"%s\"", BACKUP_LABEL_FILE in do_pg_backup_stop?)
On Thu, Sep 22, 2022 at 8:55 PM Andres Freund wrote: > > Due to the merge of the meson based build this patch needs some > adjustment. See > https://cirrus-ci.com/build/6146162607521792 > Looks like it just requires adding xlogbackup.c to > src/backend/access/transam/meson.build. Thanks! I will post a new patch with that change. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: pg_basebackup --create-slot-if-not-exists?
On Wed, Sep 21, 2022 at 5:34 PM David G. Johnston <david.g.johns...@gmail.com> wrote:
> On Wednesday, September 21, 2022, Ashwin Agrawal wrote:
>
>> Currently, pg_basebackup has a
>> --create-slot option to create the slot if it does not already exist, or
>> --slot to use an existing slot.
>>
>> This means it needs to know whether the slot with the given name already
>> exists before invoking the command. If the pg_basebackup --create-slot <>
>> command fails for some reason after creating the slot, re-triggering the
>> same command fails with an ERROR that the slot already exists. One then
>> either needs to delete the slot and retrigger, or add a check before
>> retriggering the command to see whether the slot exists and alter the
>> command accordingly to avoid passing the --create-slot option. This is
>> inconvenient when automating on top of pg_basebackup, as checking for
>> slot presence before invoking pg_basebackup unnecessarily involves
>> issuing separate SQL commands. It would be really helpful for such
>> scenarios if, similar to CREATE TABLE, pg_basebackup could have IF NOT
>> EXISTS kind of semantics. (The limitation most likely comes from the
>> CREATE REPLICATION SLOT protocol itself.) Thoughts?
>
> What's the use case for automating pg_basebackup with a named replication
> slot created by the pg_basebackup command?

Greenplum runs N (some hundred) PostgreSQL instances to form a sharded
database cluster. Hence, automation/scripts are in place to create
replicas and handle failover/failback for these N instances. As Michael
said, for predictable management and monitoring of the slot across these
many instances, specifically named replication slots are used. These named
replication slots are used both for pg_basebackup and the subsequent
streaming replication.

> Why can you not leverage a temporary replication slot (i.e., omit --slot)?
> ISTM the create option is basically obsolete now.
We would be more than happy to use a temporary replication slot if it
provided full functionality. It might be a gap in my understanding, but I
believe a temporary replication slot only protects against WAL deletion
for the duration of pg_basebackup. It doesn't protect the window between
pg_basebackup completing and streaming replication starting. With the
--write-recovery-conf option, "primary_slot_name" only gets written to
postgresql.auto.conf if a named replication slot is provided, which
ensures the same slot is used for pg_basebackup and streaming replication,
and hence keeps the WAL around until the streaming replica connects after
pg_basebackup. How does one avoid this window with a temp slot?

-- Ashwin Agrawal (VMware)
Re: [PoC] Let libpq reject unexpected authentication requests
On Thu, Sep 22, 2022 at 4:52 AM Peter Eisentraut wrote:
> On 22.09.22 01:37, Jacob Champion wrote:
> > I think this is potentially
> > dangerous, but it mirrors the current behavior of libpq and I'm not
> > sure that we should change it as part of this patch.
>
> It might be worth reviewing that behavior for other reasons, but I think
> the semantics of your patch are correct.

Sounds good. v9 removes the TODO and adds a better explanation.

> > If you're okay with those [GSS] limitations, I will rip out the code. The
> > reason I'm not too worried about it is, I don't think it makes much
> > sense to be strict about your authentication requirements while at the
> > same time leaving the choice of transport encryption up to the server.
>
> The way I understand what you explained here is that it would be more
> sensible to leave that code in. I would be okay with that.

I've added a comment there explaining the gssencmode interaction. That
leaves no TODOs inside the code itself. I removed the commit message note
about not being able to prevent unexpected client cert requests or GSS
encryption, since we've decided to handle those cases outside of
require_auth.

I'm not able to test SSPI easily at the moment; if anyone is able to try
it out, that'd be really helpful. There's also the question of SASL
forwards compatibility -- if someone adds a new SASL mechanism, the code
will treat it like scram-sha-256 until it's changed, and there will be no
test to catch it. Should we leave that to the future mechanism implementer
to fix, or add a mechanism check now so the client is safe even if they
forget?

Thanks!
--Jacob

commit e13f21d596bc5670156a441dc6eec5228864b4b0
Author: Jacob Champion
Date: Thu Sep 22 16:39:34 2022 -0700

    squash! libpq: let client reject unexpected auth methods

    Remove TODOs, and document why the code remains as-is.
diff --git a/src/interfaces/libpq/fe-auth.c b/src/interfaces/libpq/fe-auth.c
index 565e44fcf6..793888d30f 100644
--- a/src/interfaces/libpq/fe-auth.c
+++ b/src/interfaces/libpq/fe-auth.c
@@ -921,8 +921,14 @@ check_expected_areq(AuthRequest areq, PGconn *conn)
 				 * else that the user has allowed an authentication-less
 				 * connection).
 				 *
-				 * TODO: how should !auth_required interact with an incomplete
-				 * SCRAM exchange?
+				 * If the user has allowed both SCRAM and unauthenticated
+				 * (trust) connections, then this check will silently accept
+				 * partial SCRAM exchanges, where a misbehaving server does not
+				 * provide its verifier before sending an OK. This is consistent
+				 * with historical behavior, but it may be a point to revisit in
+				 * the future, since it could allow a server that doesn't know
+				 * the user's password to silently harvest material for a brute
+				 * force attack.
 				 */
 				if (!conn->auth_required || conn->client_finished_auth)
 					break;
@@ -940,10 +946,9 @@ check_expected_areq(AuthRequest areq, PGconn *conn)
 				/*
 				 * If implicit GSS auth has already been performed via GSS
 				 * encryption, we don't need to have performed an
-				 * AUTH_REQ_GSS exchange.
-				 *
-				 * TODO: check this assumption. What mutual auth guarantees
-				 * are made in this case?
+				 * AUTH_REQ_GSS exchange. This allows require_auth=gss to be
+				 * combined with gssencmode, since there won't be an
+				 * explicit authentication request in that case.
 				 */
 			}
 			else

From f88abcb4b22499466758e566b956c487877118bb Mon Sep 17 00:00:00 2001
From: Jacob Champion
Date: Mon, 28 Feb 2022 09:40:43 -0800
Subject: [PATCH v9 1/2] libpq: let client reject unexpected auth methods

The require_auth connection option allows the client to choose a list of
acceptable authentication types for use with the server. There is no
negotiation: if the server does not present one of the allowed
authentication requests, the connection fails. Additionally, all methods
in the list may be negated, e.g.
'!password', in which case the server must NOT use the listed authentication type. The special method "none" allows/disallows the use of
Re: cfbot vs. changes in the set of CI tasks
Andres Freund writes: > On 2022-09-22 17:44:56 -0400, Tom Lane wrote: >> I was confused about how come the new patches I'd just posted in >> the 3848 CF item (write/read support for raw parse trees) are >> showing a mix of passes and fails in the cfbot. I eventually >> realized that the fail results are all old and stale, because >> (for example) there's no longer a "FreeBSD - 13" task to run, >> just "FreeBSD - 13 - Meson". This seems tremendously confusing. >> Can we get the cfbot to not display no-longer-applicable results? > Somewhat tangentially - it seemed the right thing at the time [TM], but now I > wonder if tacking on the buildsystem like I did is the best way? As long as we > have both I think we need it in some form... Yeah, I think those names are fine for now. regards, tom lane
Re: Making C function declaration parameter names consistent with corresponding definition names
On Thu, Sep 22, 2022 at 3:20 PM Tom Lane wrote: > WFM. Okay, pushed a minimally invasive commit to fix the inconsistencies in pg_dump related code just now. That's the last of them. Now the only remaining clang-tidy warnings about inconsistent parameter names are those that seem practically impossible to fix (these are mostly just cases involving flex/bison). It still seems like a good idea to formally create a new coding standard around C function parameter names. We really need a simple clang-tidy workflow to be able to do that. I'll try to get to that soon. Part of the difficulty there will be finding a way to ignore the warnings that we really can't do anything about. Thanks -- Peter Geoghegan
Re: cfbot vs. changes in the set of CI tasks
Hi, On 2022-09-22 17:44:56 -0400, Tom Lane wrote: > I was confused about how come the new patches I'd just posted in > the 3848 CF item (write/read support for raw parse trees) are > showing a mix of passes and fails in the cfbot. I eventually > realized that the fail results are all old and stale, because > (for example) there's no longer a "FreeBSD - 13" task to run, > just "FreeBSD - 13 - Meson". This seems tremendously confusing. > Can we get the cfbot to not display no-longer-applicable results? Somewhat tangentially - it seemed the right thing at the time [TM], but now I wonder if tacking on the buildsystem like I did is the best way? As long as we have both I think we need it in some form... Greetings, Andres Freund
Re: cfbot vs. changes in the set of CI tasks
Thomas Munro writes: > On Fri, Sep 23, 2022 at 10:17 AM Thomas Munro wrote: >> Ah, right. Morning here and I've just spotted that too after the >> first results of the Meson era rolled in overnight. It shows the >> latest result for each distinct task name, which now includes extinct >> tasks (until they eventually get garbage collected due to age in a few >> days). I clearly didn't anticipate tasks going away. Perhaps I >> should figure out how to show only results corresponding to a single >> commit ID... looking into that... > Done, and looking more sane now. Looks better to me too. Thanks! regards, tom lane
Re: cfbot vs. changes in the set of CI tasks
On Fri, Sep 23, 2022 at 10:17 AM Thomas Munro wrote:
> On Fri, Sep 23, 2022 at 9:44 AM Tom Lane wrote:
> > I was confused about how come the new patches I'd just posted in
> > the 3848 CF item (write/read support for raw parse trees) are
> > showing a mix of passes and fails in the cfbot. I eventually
> > realized that the fail results are all old and stale, because
> > (for example) there's no longer a "FreeBSD - 13" task to run,
> > just "FreeBSD - 13 - Meson". This seems tremendously confusing.
> > Can we get the cfbot to not display no-longer-applicable results?
> >
> > As a quick-fix hack it might do to just flush all the pre-meson-commit
> > results, but I doubt this'll be the last time we make such changes.
> > I think an actual filter comparing the result's time to the time of the
> > allegedly-being-tested patch would be prudent.
>
> Ah, right. Morning here and I've just spotted that too after the
> first results of the Meson era rolled in overnight. It shows the
> latest result for each distinct task name, which now includes extinct
> tasks (until they eventually get garbage collected due to age in a few
> days). I clearly didn't anticipate tasks going away. Perhaps I
> should figure out how to show only results corresponding to a single
> commit ID... looking into that...

Done, and looking more sane now.
Re: [RFC] building postgres with meson - v13
On Thu, Sep 22, 2022 at 01:28:09PM -0700, Andres Freund wrote:
> On 2022-09-22 13:05:33 -0700, Nathan Bossart wrote:
> > * I'm using an Ubuntu-based distribution, and the version of meson that
> > apt installed was not new enough for Postgres. I ended up cloning meson
> > [0] and using the newest tag. This is no big deal.
>
> I assume this is 20.04 LTS? If so, we're missing it by one version of
> meson currently. There's unfortunately a few features that'd be a bit
> painful to not have.

Yes. I imagine I'll upgrade to 22.04 LTS soon, which appears to provide a
new enough version of meson.

> > * The installed binaries were unable to locate libraries like libpq. I
> > ended up setting the extra_lib_dirs option to the directory where these
> > libraries were installed to fix this. This one is probably worth
> > investigating further.
>
> I think that should be "fixed" in a later commit in the meson tree - any
> chance you could try that?

Yup, after cherry-picking 9bc60bc, this is fixed.

-- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Making C function declaration parameter names consistent with corresponding definition names
Peter Geoghegan writes: > That makes it easy, then. I'll just take the least invasive approach > possible with pg_dump: treat the names from function definitions as > authoritative, and mechanically adjust the function declarations as > needed to make everything agree. WFM. regards, tom lane
Re: cfbot vs. changes in the set of CI tasks
On Fri, Sep 23, 2022 at 9:44 AM Tom Lane wrote: > I was confused about how come the new patches I'd just posted in > the 3848 CF item (write/read support for raw parse trees) are > showing a mix of passes and fails in the cfbot. I eventually > realized that the fail results are all old and stale, because > (for example) there's no longer a "FreeBSD - 13" task to run, > just "FreeBSD - 13 - Meson". This seems tremendously confusing. > Can we get the cfbot to not display no-longer-applicable results? > > As a quick-fix hack it might do to just flush all the pre-meson-commit > results, but I doubt this'll be the last time we make such changes. > I think an actual filter comparing the result's time to the time of the > allegedly-being-tested patch would be prudent. Ah, right. Morning here and I've just spotted that too after the first results of the Meson era rolled in overnight. It shows the latest result for each distinct task name, which now includes extinct tasks (until they eventually get garbage collected due to age in a few days). I clearly didn't anticipate tasks going away. Perhaps I should figure out how to show only results corresponding to a single commit ID... looking into that...
Re: Making C function declaration parameter names consistent with corresponding definition names
On Thu, Sep 22, 2022 at 2:55 PM Tom Lane wrote: > Yeah. I'm not much on board with the AHX->A and AH->A changes you made; > those seem extremely invasive and it's not real clear that they add a > lot of value. That makes it easy, then. I'll just take the least invasive approach possible with pg_dump: treat the names from function definitions as authoritative, and mechanically adjust the function declarations as needed to make everything agree. The commit message for this will note in passing that the inconsistency that this creates at the header file level is a known issue. Thanks -- Peter Geoghegan
Re: [RFC] building postgres with meson - v13
On Thu, Sep 22, 2022 at 2:50 PM Andres Freund wrote: > I'm likely the most biased person on this, but for me the reliable incremental > builds and the readability of the test output are big enough wins that the > answer is pretty clear... Doesn't hurt that running all tests is faster too. It's nice that things are much more discoverable now. For example, if you want to run some random test on its own then you just...do it in the obvious, discoverable way. It took me about 2 minutes to figure out how to do that, without reading any documentation. OTOH doing the same thing with the old autoconf-based build system requires the user to know the exact magical incantation for Postgres tests. You just have to know to run the 2 or 3 tests that are undocumented (or poorly documented) dependencies first. That seems like an enormous usability improvement, especially for those of us that haven't been working on Postgres for years.

> time to run all tests (cassert, -Og), in a fully built tree:
> time make -j48 -s -Otarget check-world PROVE_FLAGS='-j4'
> real    1m1.577s
> user    7m32.579s
> sys     2m17.767s
> time meson test
> real    0m42.178s
> user    7m8.533s
> sys     2m17.711s

Sold! -- Peter Geoghegan
Re: Making C function declaration parameter names consistent with corresponding definition names
Peter Geoghegan writes: > I would like to give another 24 hours for anybody to lodge final > objections to what I've done in this patch. It seems possible that > there will be concerns about how this might affect backpatching, or > something like that. This patch goes relatively far in the direction > of refactoring to make things consistent at the module level -- unlike > most of the patches, which largely consisted of mechanical adjustments > that were obviously correct, both locally and at the whole-module level. Yeah. I'm not much on board with the AHX->A and AH->A changes you made; those seem extremely invasive and it's not real clear that they add a lot of value. I've never thought that the Archive vs. ArchiveHandle separation in pg_dump was very well thought out. I could perhaps get behind a patch to eliminate that bit of "abstraction"; but I'd still try to avoid wholesale changes in local-variable names from it. I don't think that would buy anything that's worth the back-patching pain. Just accepting that Archive[Handle] variables might be named either AH or AHX depending on historical accident does not seem that bad to me. We have lots more and worse naming inconsistencies in our tree. regards, tom lane
Re: [EXTERNAL] Re: [PoC] Federated Authn/z with OAUTHBEARER
On 9/21/22 21:55, Andrey Chudnovsky wrote: > First, My message from corp email wasn't displayed in the thread, I see it on the public archives [1]. Your client is choosing some pretty confusing quoting tactics, though, which you may want to adjust. :D I have what I'll call some "skeptical curiosity" here -- you don't need to defend your use cases to me by any means, but I'd love to understand more about them. > Yes, passing a token as a new auth method won't make much sense in > isolation. However: > 1. Since OAUTHBEARER is supported in the ecosystem, passing a token as > a way to authenticate with OAUTHBEARER is more consistent (IMO), than > passing it as a password. Agreed. It's probably not a very strong argument for the new mechanism, though, especially if you're not using the most expensive code inside it. > 2. Validation on the backend side doesn't depend on whether the token > is obtained by libpq or transparently passed by the upstream client. Sure. > 3. Single OAUTH auth method on the server side for both scenarios, > would allow both enterprise clients with their own Token acquisition > and community clients using libpq flows to connect as the same PG > users/roles. Okay, this is a stronger argument. With that in mind, I want to revisit your examples and maybe provide some counterproposals: >> Libpq passing a token directly from an upstream client is useful in other >> scenarios: >> 1. Enterprise clients, built with .Net / Java and using provider-specific >> authentication libraries, like MSAL for AAD. Those can also support more >> advanced provider-specific token acquisition flows. I can see that providing a token directly would help you work around limitations in libpq's "standard" OAuth flows, whether we use iddawc or not. And it's cheap in terms of implementation. 
But I have a feeling it would fall apart rapidly with error cases, where the server is giving libpq information via the OAUTHBEARER mechanism, but libpq can only communicate to your wrapper through human-readable error messages on stderr. This seems like clear motivation for client-side SASL plugins (which were also discussed on Samay's proposal thread). That's a lot more expensive to implement in libpq, but if it were hypothetically available, wouldn't you rather your provider-specific code be able to speak OAUTHBEARER directly with the server? >> 2. Resource-tight (like IoT) clients. Those can be compiled without the >> optional libpq flag not including the iddawc or other dependency. I want to dig into this much more; resource-constrained systems are near and dear to me. I can see two cases here: Case 1: The device is an IoT client that wants to connect on its own behalf. Why would you want to use OAuth in that case? And how would the IoT device get its Bearer token to begin with? I'm much more used to architectures that provision high-entropy secrets for this, whether they're incredibly long passwords per device (in which case, channel-bound SCRAM should be a fairly strong choice?) or client certs (which can be better decentralized, but make for a lot of bookkeeping). If the answer to that is, "we want an IoT client to be able to connect using the same role as a person", then I think that illustrates a clear need for SASL negotiation. That would let the IoT client choose SCRAM-*-PLUS or EXTERNAL, and the person at the keyboard can choose OAUTHBEARER. Then we have incredible flexibility, because you don't have to engineer one mechanism to handle them all. Case 2: The constrained device is being used as a jump point. So there's an actual person at a keyboard, trying to get into a backend server (maybe behind a firewall layer, etc.), and the middlebox is either not web-connected or is incredibly tiny for some reason. 
That might be a good use case for a copy-pasted Bearer token, but is there actual demand for that use case? What motivation would you (or your end user) have for choosing a fairly heavy, web-centric authentication method in such a constrained environment? Are there other resource-constrained use cases I've missed? Thanks, --Jacob [1] https://www.postgresql.org/message-id/MN0PR21MB31694BAC193ECE1807FD45358F4F9%40MN0PR21MB3169.namprd21.prod.outlook.com
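As background for the mechanism under discussion: whichever side acquires the token, the client ultimately has to emit the OAUTHBEARER initial response defined by RFC 7628, i.e. a GS2 header followed by \x01-delimited key/value pairs. A rough sketch (the token and host values below are placeholders, not real credentials):

```python
def oauthbearer_client_first(token, authzid=None, host=None, port=None):
    """Build the OAUTHBEARER initial client response per RFC 7628:
    gs2-header, kvsep (\x01), zero or more kvpairs each terminated by
    \x01, then a final \x01."""
    gs2 = "n," + ("a=" + authzid if authzid else "") + ","
    kvpairs = ""
    if host:
        kvpairs += "host=%s\x01" % host
    if port:
        kvpairs += "port=%d\x01" % port
    kvpairs += "auth=Bearer %s\x01" % token
    return gs2 + "\x01" + kvpairs + "\x01"

# Placeholder token/host, for illustration only.
msg = oauthbearer_client_first("vF9dft4qmTc2Nvb3RlckB=", host="pg.example.com")
```

A provider-specific wrapper that already holds a token would only need to produce this one message, which is part of why a client-side SASL plugin interface looks attractive.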
Re: CI and test improvements
Hi, On 2022-09-22 16:07:02 -0500, Justin Pryzby wrote: > On Sun, Aug 28, 2022 at 02:28:02PM -0700, Andres Freund wrote: > > > > > @@ -71,8 +69,6 @@ task: > > > > > fingerprint_key: ccache/freebsd > > > > > reupload_on_changes: true > > > > > > > > > > - # Workaround around performance issues due to 32KB block size > > > > > - repartition_script: src/tools/ci/gcp_freebsd_repartition.sh > > > > >create_user_script: | > > > > > pw useradd postgres > > > > > chown -R postgres:postgres . > > > > > -- > > > > > > > > What's the story there - at some point that was important for > > > > performance > > > > because of the native block size triggering significant > > > > read-modify-write > > > > cycles with postgres' writes. You didn't comment on it in the commit > > > > message. > > > > > > Well, I don't know the history, but it seems to be unneeded now. > > > > It's possible it was mainly needed for testing with aio + dio. But also > > possible that an upgrade improved the situation since. > > Maybe freebsd got faster as a result of the TAU CPUs? > https://mobile.twitter.com/cirrus_labs/status/1534982111568052240 > > I noticed because it's been *slower* the last ~24h since cirrusci > disabled TAU, as Thomas' commit mentioned. > https://twitter.com/cirrus_labs/status/1572657320093712384 Yea, I noticed that as well. It's entirely possible that something in the "hardware" stack improved sufficiently to avoid problems. > I have no idea if the TAU CPUs eliminate/mitigate the original > performance issue you had with AIO. But they have such a large effect > on freebsd that it could now be the fastest task, if given more than 2 > CPUs. I'm planning to rebase early next week and try that out. Greetings, Andres Freund
Re: [RFC] building postgres with meson - v13
Hi, On 2022-09-22 13:21:28 -0700, Peter Geoghegan wrote: > Is it generally recommended that individual hackers mostly switch over > to Meson for their day to day work soon? I'm guessing that this > question doesn't really have a clear answer yet, but thought I'd ask, > just in case. It'll probably depend on who you ask ;) I'm likely the most biased person on this, but for me the reliable incremental builds and the readability of the test output are big enough wins that the answer is pretty clear... Doesn't hurt that running all tests is faster too. The currently existing limitations are imo mostly around making it usable for production, particularly on windows.

time to run all tests (cassert, -Og), in a fully built tree:

make:
time make -j48 -s -Otarget check-world
real    2m44.206s
user    6m29.121s
sys     1m54.069s

time make -j48 -s -Otarget check-world PROVE_FLAGS='-j4'
real    1m1.577s
user    7m32.579s
sys     2m17.767s

meson:
time meson test
real    0m42.178s
user    7m8.533s
sys     2m17.711s

FWIW, I just rebased my older patch to cache and copy initdb during the tests. The %user saved is impressive enough to pursue it again...

time make -j48 -s -Otarget check-world PROVE_FLAGS='-j4'
real    0m52.655s
user    2m19.504s
sys     1m26.264s

time meson test:
real    0m36.370s
user    2m14.748s
sys     1m36.741s

Greetings, Andres Freund
cfbot vs. changes in the set of CI tasks
I was confused about how come the new patches I'd just posted in the 3848 CF item (write/read support for raw parse trees) are showing a mix of passes and fails in the cfbot. I eventually realized that the fail results are all old and stale, because (for example) there's no longer a "FreeBSD - 13" task to run, just "FreeBSD - 13 - Meson". This seems tremendously confusing. Can we get the cfbot to not display no-longer-applicable results? As a quick-fix hack it might do to just flush all the pre-meson-commit results, but I doubt this'll be the last time we make such changes. I think an actual filter comparing the result's time to the time of the allegedly-being-tested patch would be prudent. regards, tom lane
Re: Making C function declaration parameter names consistent with corresponding definition names
On Wed, Sep 21, 2022 at 6:58 PM Peter Geoghegan wrote: > Attached revision shows where I'm at with this. Would be nice to get > it all out of the way before too long. Attached is v6, which now consists of only one single patch, which fixes things up in pg_dump. (This is almost though not quite identical to the same patch from v5.) I would like to give another 24 hours for anybody to lodge final objections to what I've done in this patch. It seems possible that there will be concerns about how this might affect backpatching, or something like that. This patch goes relatively far in the direction of refactoring to make things consistent at the module level -- unlike most of the patches, which largely consisted of mechanical adjustments that were obviously correct, both locally and at the whole-module level. BTW, I notice that meson seems to have built-in support for running scan-build, a tool that performs static analysis using clang. I'm pretty sure that it's possible to use scan-build to run clang-tidy checks (though I've just been using run-clang-tidy myself). Perhaps it would make sense to use meson's support for scan-build to make it easy for everybody to run the clang-tidy checks locally. -- Peter Geoghegan From ae97a00cb4c69b3fd247bc6606509165912bee29 Mon Sep 17 00:00:00 2001 From: Peter Geoghegan Date: Wed, 21 Sep 2022 14:51:29 -0700 Subject: [PATCH v6] Harmonize parameter names in pg_dump/pg_dumpall. Make sure that function declarations use names that exactly match the corresponding names from function definitions. Having parameter names that are reliably consistent in this way will make it easier to reason about groups of related C functions from the same translation unit as a module. It will also make certain refactoring tasks easier. Like other recent commits that cleaned up function parameter names, this commit was written with help from clang-tidy. 
Author: Peter Geoghegan
Reviewed-By: David Rowley
Discussion: https://postgr.es/m/CAH2-WznJt9CMM9KJTMjJh_zbL5hD9oX44qdJ4aqZtjFi-zA3Tg@mail.gmail.com
---
 src/bin/pg_dump/common.c              |   2 +-
 src/bin/pg_dump/parallel.c            |  14 +-
 src/bin/pg_dump/pg_backup.h           |  36 ++--
 src/bin/pg_dump/pg_backup_archiver.c  |  74 +++
 src/bin/pg_dump/pg_backup_archiver.h  |  10 +-
 src/bin/pg_dump/pg_backup_custom.c    |   2 +-
 src/bin/pg_dump/pg_backup_db.c        |  40 ++--
 src/bin/pg_dump/pg_backup_db.h        |  12 +-
 src/bin/pg_dump/pg_backup_directory.c |   2 +-
 src/bin/pg_dump/pg_backup_null.c      |   8 +-
 src/bin/pg_dump/pg_backup_tar.c       |   4 +-
 src/bin/pg_dump/pg_dump.c             | 290 +-
 src/bin/pg_dump/pg_dump.h             |   6 +-
 src/bin/pg_dump/pg_dumpall.c          |   6 +-
 14 files changed, 254 insertions(+), 252 deletions(-)

diff --git a/src/bin/pg_dump/common.c b/src/bin/pg_dump/common.c
index 395f817fa..44fa52cc5 100644
--- a/src/bin/pg_dump/common.c
+++ b/src/bin/pg_dump/common.c
@@ -79,7 +79,7 @@ typedef struct _catalogIdMapEntry
 
 static catalogid_hash *catalogIdHash = NULL;
 
-static void flagInhTables(Archive *fout, TableInfo *tbinfo, int numTables,
+static void flagInhTables(Archive *fout, TableInfo *tblinfo, int numTables,
 						  InhInfo *inhinfo, int numInherits);
 static void flagInhIndexes(Archive *fout, TableInfo *tblinfo, int numTables);
 static void flagInhAttrs(DumpOptions *dopt, TableInfo *tblinfo, int numTables);
diff --git a/src/bin/pg_dump/parallel.c b/src/bin/pg_dump/parallel.c
index c8a70d9bc..ba6df9e0c 100644
--- a/src/bin/pg_dump/parallel.c
+++ b/src/bin/pg_dump/parallel.c
@@ -146,7 +146,7 @@ static int pgpipe(int handles[2]);
 typedef struct ShutdownInformation
 {
 	ParallelState *pstate;
-	Archive    *AHX;
+	Archive    *A;
 } ShutdownInformation;
 
 static ShutdownInformation shutdown_info;
@@ -325,9 +325,9 @@ getThreadLocalPQExpBuffer(void)
  * as soon as they've created the ArchiveHandle.
 */
 void
-on_exit_close_archive(Archive *AHX)
+on_exit_close_archive(Archive *A)
 {
-	shutdown_info.AHX = AHX;
+	shutdown_info.A = A;
 	on_exit_nicely(archive_close_connection, &shutdown_info);
 }
@@ -353,8 +353,8 @@ archive_close_connection(int code, void *arg)
 		 */
 		ShutdownWorkersHard(si->pstate);
 
-		if (si->AHX)
-			DisconnectDatabase(si->AHX);
+		if (si->A)
+			DisconnectDatabase(si->A);
 	}
 	else
 	{
@@ -378,8 +378,8 @@
 	else
 	{
 		/* Non-parallel operation: just kill the leader DB connection */
-		if (si->AHX)
-			DisconnectDatabase(si->AHX);
+		if (si->A)
+			DisconnectDatabase(si->A);
 	}
 }
diff --git a/src/bin/pg_dump/pg_backup.h b/src/bin/pg_dump/pg_backup.h
index fcc5f6bd0..9dc441902 100644
--- a/src/bin/pg_dump/pg_backup.h
+++ b/src/bin/pg_dump/pg_backup.h
@@ -270,33 +270,33 @@
  * Function pointer prototypes for assorted callback methods.
  */
 
-typedef int (*DataDumperPtr) (Archive *AH, const void *userArg);
+typedef int (*DataDumperPtr) (Archive *A,
Re: Extending outfuncs support to utility statements
I wrote: > I left off the last one because it fails check-world: we now > get through the core regression tests okay, but then the pg_dump > tests fail on the new SQL function. To fix that, we would have > to extend ruleutils.c's get_utility_query_def() to be able to > fully reconstruct any legal utility query ... which seems like > a pretty dauntingly large amount of tedious manual effort to > start with, and then also a nontrivial additional requirement > on any future patch that adds new utility syntax. Are we sure > it's worth going there? Thinking about that some more, I wondered if we'd even wish to build such code, compared to just saving the original source text for utility statements and printing that. Obviously, this loses all the benefits of new-style SQL functions compared to old-style ... except that those benefits would be illusory anyway, since by definition we have not done parse analysis on a utility statement. So we *cannot* offer any useful guarantees about being search_path change proof, following renames of referenced objects, preventing drops of referenced objects, etc etc. This makes me wonder if this is a feature we even want. If we put it in, we'd have to add a bunch of disclaimers about how utility statements behave entirely differently from DML statements. Perhaps an interesting alternative is to allow a command along the lines of EXECUTE string-expression (of course that name is already taken) where we'd parse-analyze the string-expression at function creation, but then the computed string is executed as a SQL command in the runtime environment. This would make it fairly clear which things you have guarantees of and which you don't. It'd also offer a feature that the PLs have but SQL functions traditionally haven't, ie execution of dynamically-computed SQL. Anyway, this is a bit far afield from the stated topic of this thread. 
I think we should commit something approximately like what I posted and then start a new thread specifically about what we'd like to do about utility commands in new-style SQL functions. regards, tom lane
Re: Consider parallel for lateral subqueries with limit
On Mon, Sep 19, 2022 at 4:29 PM Robert Haas wrote: > > On Mon, Sep 19, 2022 at 3:58 PM James Coleman wrote: > > But in the case where there's correlation via LATERAL we already don't > > guarantee unique executions for a given set of params into the lateral > > subquery execution, right? For example, suppose we have:
> >
> > select *
> > from foo
> > left join lateral (
> >   select n
> >   from bar
> >   where bar.a = foo.a
> >   limit 1
> > ) on true
> >
> > and suppose that foo.a (in the table scan) returns these values in > > order: 1, 2, 1. In that case we'll execute the lateral subquery 3 > > separate times rather than attempting to order the results of foo.a > > such that we can re-execute the subquery only when the param changes > > to a new unique value (and we definitely don't cache the results to > > guarantee unique subquery executions). > I think this is true, but I don't really understand why we should > focus on LATERAL here. What we really need, and I feel like we've > talked about this before, is a way to reason about where parameters > are set and used. Yes, over in the thread "Parallelize correlated subqueries that execute within each worker" [1] which was meant to build on top of this work (though is technically separate). Your bringing it up here too is making me wonder if we can combine the two and instead of always allowing subqueries with LIMIT instead (like the other patch does) delay final determination of parallel safety of rels and paths (perhaps, as that thread is discussing, until gather/gather merge path creation). Side note: I'd kinda like a way (maybe a development GUC or even a compile time option) to have EXPLAIN show where params are set. If something like that exists, egg on my face I guess, but I'd love to know about it. 
> Your sample query gets a plan like this:
>
> Nested Loop Left Join  (cost=0.00..1700245.00 rows=1 width=8)
>   ->  Seq Scan on foo  (cost=0.00..145.00 rows=1 width=4)
>   ->  Limit  (cost=0.00..170.00 rows=1 width=4)
>         ->  Seq Scan on bar  (cost=0.00..170.00 rows=1 width=4)
>               Filter: (foo.a = a)
>
> If this were to occur inside a larger plan tree someplace, it would be > OK to insert a Gather node above the Nested Loop node without doing > anything further, because then the parameter that stores foo.a would > be both set and used in the worker. If you wanted to insert a Gather > at any other place in this plan, things get more complicated. But just > because you have LATERAL doesn't mean that you have this problem, > because if you delete the "limit 1" then the subqueries get flattened > together and the parameter disappears,

For future reference in this email thread when you remove the "limit 1" this is the plan you get:

Merge Right Join  (cost=372.18..815.71 rows=28815 width=8)
  Merge Cond: (bar.a = foo.a)
  ->  Sort  (cost=158.51..164.16 rows=2260 width=8)
        Sort Key: bar.a
        ->  Seq Scan on bar  (cost=0.00..32.60 rows=2260 width=8)
  ->  Sort  (cost=179.78..186.16 rows=2550 width=4)
        Sort Key: foo.a
        ->  Seq Scan on foo  (cost=0.00..35.50 rows=2550 width=4)

Just to make sure I'm following: by "doesn't mean that you have this problem" you mean "doesn't mean you have this limitation on parallel query"? The "flattening out" (conversion to join between two table scans executed a single time each) is an interesting case because I consider that to be "just" a performance optimization, and therefore I don't think anything about the guarantees a user expects should change. 
But interestingly: it does end up providing stronger guarantees about the query results than it would if the conversion didn't happen (the subquery results in only a single table scan whereas without the change a scan per outer row means it's *possible* to get different results across different outer rows even with the same join key value). > and if you delete the lateral > reference (i.e. WHERE foo.a = bar.a) then there's still a subquery but > it no longer refers to an outer parameter. And on the flip side just > because you don't have LATERAL doesn't mean that you don't have this > problem. e.g. the query could instead be: > > select *, (select n from bar where bar.a = foo.a limit 1) from foo; > > ...which I think is pretty much equivalent to your formulation and has > the same problem as far as parallel query as your formulation but does > not involve the LATERAL keyword. Yes, that's a good point too. I need to play with these examples and confirm whether lateral_relids gets set in that case. IIRC when that's set isn't exactly the same as whether or not the LATERAL keyword is used, and I should clarify that my claims here are meant to be about when we execute it that way regardless of the keyword usage. The keyword usage I'd assumed just made it easier to talk about, but maybe you're implying that it's actually generating confusion. James Coleman 1:
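To illustrate the variability being discussed: with LIMIT 1 the row returned for a given parameter value depends on the physical scan order, so two executions with the same key need not agree. A toy Python sketch (not PostgreSQL code, just a model of per-outer-row subquery execution; the changing scan order stands in for effects like synchronized sequential scans):

```python
def lateral_limit1(key, bar, scan_order):
    """Simulate: (select n from bar where bar.a = key limit 1),
    scanning bar in whatever physical order the executor happens to use."""
    for i in scan_order:
        a, n = bar[i]
        if a == key:
            return n
    return None

bar = [(1, "x"), (1, "y"), (2, "z")]

# foo.a yields 1, 2, 1; suppose the third scan of bar starts at a
# different tuple, changing which row LIMIT 1 happens to pick.
orders = [[0, 1, 2], [0, 1, 2], [1, 2, 0]]
results = [lateral_limit1(a, bar, o) for a, o in zip([1, 2, 1], orders)]
print(results)  # → ['x', 'z', 'y']: the two key-1 executions disagree
```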
Re: CI and test improvements
On Sun, Aug 28, 2022 at 02:28:02PM -0700, Andres Freund wrote: > > > > @@ -71,8 +69,6 @@ task: > > > > fingerprint_key: ccache/freebsd > > > > reupload_on_changes: true > > > > > > > > - # Workaround around performance issues due to 32KB block size > > > > - repartition_script: src/tools/ci/gcp_freebsd_repartition.sh > > > >create_user_script: | > > > > pw useradd postgres > > > > chown -R postgres:postgres . > > > > -- > > > > > > What's the story there - at some point that was important for performance > > > because of the native block size triggering significant read-modify-write > > > cycles with postgres' writes. You didn't comment on it in the commit > > > message. > > > > Well, I don't know the history, but it seems to be unneeded now. > > It's possible it was mainly needed for testing with aio + dio. But also > possible that an upgrade improved the situation since.

Maybe freebsd got faster as a result of the TAU CPUs? https://mobile.twitter.com/cirrus_labs/status/1534982111568052240

I noticed because it's been *slower* the last ~24h since cirrusci disabled TAU, as Thomas' commit mentioned. https://twitter.com/cirrus_labs/status/1572657320093712384

For example this CF entry: https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/39/3736
https://cirrus-ci.com/task/4670794365140992 5m36s - 4 days ago
https://cirrus-ci.com/task/4974926233862144 5m25s - 3 days ago
https://cirrus-ci.com/task/5561409034518528 5m29s - 2 days ago
https://cirrus-ci.com/task/6432442008469504 9m19s - yesterday

CF_BOT's latest tasks seem to be fast again, since 1-2h ago. 
https://cirrus-ci.com/build/5178906041909248 9m1s
https://cirrus-ci.com/build/4593160281128960 5m8s
https://cirrus-ci.com/build/4539845644124160 5m22s

The logs for July show when freebsd started "being fast": https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/38/3708
https://cirrus-ci.com/task/6316073015312384 10m25s Jul 13
https://cirrus-ci.com/task/5662878987452416 5m48s Jul 15

Maybe that changed in July rather than June because the TAU CPUs were still not available in every region/zone (?) https://cloud.google.com/compute/docs/regions-zones/ I have no idea if the TAU CPUs eliminate/mitigate the original performance issue you had with AIO. But they have such a large effect on freebsd that it could now be the fastest task, if given more than 2 CPUs. -- Justin
Re: Extending outfuncs support to utility statements
Andres Freund writes: > On 2022-09-22 12:48:47 -0400, Tom Lane wrote: >> After staring at the code a bit, I think we don't need to touch >> pg_strtok() per se. I propose that this can be resolved with changes >> at the next higher level. Let's make outToken print NULL as <> as >> it always has, but print an empty string as "" (two double quotes). >> If the raw input string is two double quotes, print it as \"" to >> disambiguate. This'd require a catversion bump when committed, >> but I don't think there are any showstopper problems otherwise. > Makes sense to me. Here is a version of all-but-the-last patch in Peter's series. I left off the last one because it fails check-world: we now get through the core regression tests okay, but then the pg_dump tests fail on the new SQL function. To fix that, we would have to extend ruleutils.c's get_utility_query_def() to be able to fully reconstruct any legal utility query ... which seems like a pretty dauntingly large amount of tedious manual effort to start with, and then also a nontrivial additional requirement on any future patch that adds new utility syntax. Are we sure it's worth going there? But I think it's probably worth committing what we have here just on testability grounds. Some notes: 0001, 0002 not changed. I tweaked 0003 a bit, mainly because I think it's probably not very safe to apply strncmp to a string we don't know the length of. It might be difficult to fall off the end of memory that way, but I wouldn't bet it's impossible. Also, adding the length checks gets rid of the need for a grotty order dependency in _readA_Expr(). 0004 fixes the empty-string problem as per above. I did not like what you'd done about imprecise floats one bit. I think we ought to do it as in 0005 instead: drop all the hard-wired precision assumptions and just print per Ryu. 0006, 0007, 0008 are basically the same as your previous 0004, 0005, 0006, except for getting rid of the float hacking in 0005. 
If you're good with this approach to the float issue, I think this set is committable (minus 0006 of course, and don't forget the catversion bump). regards, tom lane

From 89def573ced57fc320ad191613dac62c8992c27a Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Fri, 12 Aug 2022 21:15:40 +0200
Subject: [PATCH v2 1/9] Fix reading of most-negative integer value nodes

The main parser checks whether a literal fits into an int when deciding
whether it should be put into an Integer or Float node. The parser
processes integer literals without signs. So a most-negative integer
literal will not fit into Integer and will end up as a Float node.

The node tokenizer did this differently. It included the sign when
checking whether the literal fit into int. So a most-negative integer
would indeed fit that way and end up as an Integer node.

In order to preserve the node structure correctly, we need the node
tokenizer to also analyze integer literals without sign.

There are a number of test cases in the regression tests that have a
most-negative integer argument of some utility statement, so this issue
is easily reproduced under WRITE_READ_PARSE_PLAN_TREES.
---
 src/backend/nodes/read.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/nodes/read.c b/src/backend/nodes/read.c
index 4a54996b63..a9cb81b129 100644
--- a/src/backend/nodes/read.c
+++ b/src/backend/nodes/read.c
@@ -267,7 +267,7 @@ nodeTokenType(const char *token, int length)
 		char	   *endptr;
 
 		errno = 0;
-		(void) strtoint(token, &endptr, 10);
+		(void) strtoint(numptr, &endptr, 10);
 		if (endptr != token + length || errno == ERANGE)
 			return T_Float;
 		return T_Integer;
-- 
2.37.1

From 8a469d0a7195be4ecc65155b82c5836dfb13b920 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Fri, 12 Aug 2022 21:16:26 +0200
Subject: [PATCH v2 2/9] Fix reading of BitString nodes

The node tokenizer went out of its way to store BitString node values
without the leading 'b'. But everything else in the system stores the
leading 'b'. This would break if a BitString node is read-printed-read.

Also, the node tokenizer didn't know that BitString node tokens could
also start with 'x'.
---
 src/backend/nodes/read.c | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/src/backend/nodes/read.c b/src/backend/nodes/read.c
index a9cb81b129..fe84f140ee 100644
--- a/src/backend/nodes/read.c
+++ b/src/backend/nodes/read.c
@@ -288,7 +288,7 @@ nodeTokenType(const char *token, int length)
 		retval = T_Boolean;
 	else if (*token == '"' && length > 1 && token[length - 1] == '"')
 		retval = T_String;
-	else if (*token == 'b')
+	else if (*token == 'b' || *token == 'x')
 		retval = T_BitString;
 	else
 		retval = OTHER_TOKEN;
@@ -471,11 +471,10 @@ nodeRead(const char *token, int tok_len)
 			break;
 		case T_BitString:
 			{
-				char	   *val = palloc(tok_len);
+				char	   *val = palloc(tok_len + 1);
 
-				/* skip leading 'b' */
-				memcpy(val, token + 1,
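The NULL vs. empty-string token rule Tom proposes upthread (NULL printed as <>, an empty string as "", and a literal two-double-quote input escaped as \"" to disambiguate) round-trips cleanly. A Python sketch of just that disambiguation, ignoring pg_strtok's other quoting and escaping rules:

```python
def out_token(s):
    """Print a token per the proposed rule: NULL as <>, empty string
    as "" (two double quotes), and a raw input of two double quotes
    as \\"" so the reader can tell the cases apart."""
    if s is None:
        return "<>"
    if s == "":
        return '""'
    if s == '""':
        return '\\""'
    return s

def read_token(tok):
    """Inverse of out_token for the three special cases."""
    if tok == "<>":
        return None
    if tok == '""':
        return ""
    if tok == '\\""':
        return '""'
    return tok

# Every value, including NULL and the two ambiguous strings, survives
# a print-then-read cycle unchanged.
for value in [None, "", '""', "plain"]:
    assert read_token(out_token(value)) == value
```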
Re: [RFC] building postgres with meson - v13
Hi, On 2022-09-22 13:05:33 -0700, Nathan Bossart wrote: > I gave the meson build system a try, and it seems to work nicely. It > didn't take long at all to adapt my workflow. > > A few notes from my experience: > > * I'm using an Ubuntu-based distribution, and the version of meson that apt > installed was not new enough for Postgres. I ended up cloning meson [0] > and using the newest tag. This is no big deal. I assume this is 20.04 LTS? If so, we're missing it by one version of meson currently. There's unfortunately a few features that'd be a bit painful to not have. > * The installed binaries were unable to locate libraries like libpq. I > ended up setting the extra_lib_dirs option to the directory where these > libraries were installed to fix this. This one is probably worth > investigating further. I think that should be "fixed" in a later commit in the meson tree - any chance you could try that? https://github.com/anarazel/postgres/tree/meson > * meson really doesn't like it when there are things leftover from > configure/make. Whenever I switch from make to meson, I have to run 'make > maintainer-clean'. Yes. I recommend building out-of-tree with autoconf as well. > Otherwise, all of my usual build options, ccache, etc. are working just > like before. Nice work! Cool! Thanks for testing, Andres Freund
Re: [RFC] building postgres with meson - v13
On Thu, Sep 22, 2022 at 1:05 PM Nathan Bossart wrote: > Otherwise, all of my usual build options, ccache, etc. are working just > like before. Nice work! +1 Is it generally recommended that individual hackers mostly switch over to Meson for their day to day work soon? I'm guessing that this question doesn't really have a clear answer yet, but thought I'd ask, just in case. -- Peter Geoghegan
Re: [RFC] building postgres with meson - v13
I gave the meson build system a try, and it seems to work nicely. It didn't take long at all to adapt my workflow. A few notes from my experience: * I'm using an Ubuntu-based distribution, and the version of meson that apt installed was not new enough for Postgres. I ended up cloning meson [0] and using the newest tag. This is no big deal. * The installed binaries were unable to locate libraries like libpq. I ended up setting the extra_lib_dirs option to the directory where these libraries were installed to fix this. This one is probably worth investigating further. * meson really doesn't like it when there are things leftover from configure/make. Whenever I switch from make to meson, I have to run 'make maintainer-clean'. Otherwise, all of my usual build options, ccache, etc. are working just like before. Nice work! [0] https://github.com/mesonbuild/meson -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: Pruning never visible changes
On Thu, 22 Sept 2022 at 15:16, Alvaro Herrera wrote:
>
> On 2022-Sep-22, Simon Riggs wrote:
>
> > On Mon, 19 Sept 2022 at 00:16, Greg Stark wrote:
>
> > > > VACUUM was willing to remove a committed-dead tuple immediately if it was
> > > > deleted by the same transaction that inserted it. The idea is that such a
> > > > tuple could never have been visible to any other transaction, so we don't
> > > > need to keep it around to satisfy MVCC snapshots. However, there was
> > > > already an exception for tuples that are part of an update chain, and this
> > > > exception created a problem: we might remove TOAST tuples (which are never
> > > > part of an update chain) while their parent tuple stayed around (if it was
> > > > part of an update chain). This didn't pose a problem for most things,
> > > > since the parent tuple is indeed dead: no snapshot will ever consider it
> > > > visible. But MVCC-safe CLUSTER had a problem, since it will try to copy
> > > > RECENTLY_DEAD tuples to the new table. It then has to copy their TOAST
> > > > data too, and would fail if VACUUM had already removed the toast tuples.
>
> > Good research Greg, thank you. Only took 10 years for me to notice it
> > was gone ;-)
>
> But this begs the question: is the proposed change safe, given that
> ancient consideration? I don't think TOAST issues have been mentioned
> in this thread so far, so I wonder if there is a test case that verifies
> that this problem doesn't occur for some reason.

Oh, completely agreed. I will submit a modified patch that adds a test case and just a comment to explain why we can't remove such rows.

--
Simon Riggs
http://www.EnterpriseDB.com/
Re: archive modules
On Wed, Sep 14, 2022 at 09:33:46PM +0200, Peter Eisentraut wrote: > Another question on this feature: Currently, if archive_library is set, > archive_command is ignored. I think if both are set, it should be an error. > Compare for example what happens if you set multiple recovery_target_xxx > settings. I don't think silently turning off one setting by setting another > is a good behavior. Peter, would you like to proceed with something like [0] to resolve this? If so, I will work on cleaning the patch up. [0] https://postgr.es/m/20220914222736.GA3042279%40nathanxps13 -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Re: ICU for global collation
On 2022-09-21 17:53, Peter Eisentraut wrote:
> Committed with that test, thanks. I think that covers all the ICU issues
> you reported for PG15 for now?

I thought about the order of the ICU checks - if it is OK to check that the selected encoding is supported by ICU after printing all the locale & encoding information, why not move almost all the ICU checks there? Examples of the work of the attached patch:

1. ICU locale vs supported encoding:

1.1.
$ initdb --encoding sql-ascii --locale-provider icu hoge
...
initdb: error: encoding mismatch
initdb: detail: The encoding you selected (SQL_ASCII) is not supported with the ICU provider.
initdb: hint: Rerun initdb and either do not specify an encoding explicitly, or choose a matching combination.

1.2. (like before)
$ initdb --encoding sql-ascii --icu-locale en-US hoge
initdb: error: --icu-locale cannot be specified unless locale provider "icu" is chosen
$ createdb --encoding sql-ascii --icu-locale en-US hoge
createdb: error: database creation failed: ERROR: ICU locale cannot be specified unless locale provider is ICU

2. For builds without ICU:

2.1.
$ initdb --locale-provider icu hoge
...
initdb: error: ICU is not supported in this build
$ createdb --locale-provider icu hoge
createdb: error: database creation failed: ERROR: ICU is not supported in this build

2.2. (like before)
$ initdb --icu-locale en-US hoge
initdb: error: --icu-locale cannot be specified unless locale provider "icu" is chosen
$ createdb --icu-locale en-US hoge
createdb: error: database creation failed: ERROR: ICU locale cannot be specified unless locale provider is ICU

2.3.
$ createdb --locale-provider icu --icu-locale en-US --encoding sql-ascii hoge
createdb: error: database creation failed: ERROR: ICU is not supported in this build

4. About errors in initdb:

4.1. If icu_locale is not specified, but it is required, then we get this:

$ initdb --locale-provider icu hoge
The files belonging to this database system will be owned by user "marina".
This user must also own the server process. The database cluster will be initialized with this locale configuration: provider:icu LC_COLLATE: en_US.UTF-8 LC_CTYPE:en_US.UTF-8 LC_MESSAGES: en_US.UTF-8 LC_MONETARY: ru_RU.UTF-8 LC_NUMERIC: ru_RU.UTF-8 LC_TIME: ru_RU.UTF-8 The default database encoding has been set to "UTF8". initdb: error: ICU locale must be specified Almost the same if ICU is not supported in this build: $ initdb --locale-provider icu hoge The files belonging to this database system will be owned by user "marina". This user must also own the server process. The database cluster will be initialized with this locale configuration: provider:icu LC_COLLATE: en_US.UTF-8 LC_CTYPE:en_US.UTF-8 LC_MESSAGES: en_US.UTF-8 LC_MONETARY: ru_RU.UTF-8 LC_NUMERIC: ru_RU.UTF-8 LC_TIME: ru_RU.UTF-8 The default database encoding has been set to "UTF8". initdb: error: ICU is not supported in this build 4.2. If icu_locale is specified for the wrong provider, the error will be at the beginning of the program start as before: $ initdb --icu-locale en-US hoge initdb: error: --icu-locale cannot be specified unless locale provider "icu" is chosen -- Marina Polyakova Postgres Professional: http://www.postgrespro.com The Russian Postgres Companydiff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c index e0753c1badcc299e2bac45f3bdd2f23f59d70cbc..2589a2523e07c9543c99c7d7b446438d62382b89 100644 --- a/src/backend/commands/dbcommands.c +++ b/src/backend/commands/dbcommands.c @@ -1030,6 +1030,11 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt) if (dblocprovider == COLLPROVIDER_ICU) { +#ifndef USE_ICU + ereport(ERROR, +(errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("ICU is not supported in this build"))); +#else if (!(is_encoding_supported_by_icu(encoding))) ereport(ERROR, (errcode(ERRCODE_INVALID_PARAMETER_VALUE), @@ -1046,6 +1051,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt) errmsg("ICU locale must be specified"))); 
check_icu_locale(dbiculocale); +#endif } else { diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c index 2b42d9ccd8b751bcb5cda3a1f4c7803a68bc0a4a..743f11e1d1fd62d500fc2846976472d13e08046b 100644 --- a/src/backend/utils/adt/pg_locale.c +++ b/src/backend/utils/adt/pg_locale.c @@ -1945,15 +1945,12 @@ icu_set_collation_attributes(UCollator *collator, const char *loc) } } -#endif /* USE_ICU */ - /* * Check if the given locale ID is valid, and ereport(ERROR) if it isn't. */ void check_icu_locale(const char *icu_locale) { -#ifdef USE_ICU UCollator *collator; UErrorCode status; @@ -1967,13 +1964,10 @@ check_icu_locale(const char *icu_locale) if (U_ICU_VERSION_MAJOR_NUM < 54) icu_set_collation_attributes(collator, icu_locale); ucol_close(collator); -#else - ereport(ERROR, -
Re: Mingw task for Cirrus CI
Hi Melih, On 2022-09-05 16:52:17 -0700, Andres Freund wrote: > I think we can convert this to meson soon, and that seems a *lot* faster at > configure than autoconf on mingw. Not even close to as fast as on a modern-ish > linux, but not that painful. Now that meson has been merged, could you try to adjust this patch so it builds with meson? Regards, Andres
Re: Extending outfuncs support to utility statements
On 2022-09-22 12:48:47 -0400, Tom Lane wrote: > I wrote: > > I think this is the issue Peter mentioned about needing to distinguish > > between empty strings and NULL strings. We're going to need to rethink > > the behavior of pg_strtok() a bit to fix that. > > After staring at the code a bit, I think we don't need to touch > pg_strtok() per se. I propose that this can be resolved with changes > at the next higher level. Let's make outToken print NULL as <> as > it always has, but print an empty string as "" (two double quotes). > If the raw input string is two double quotes, print it as \"" to > disambiguate. This'd require a catversion bump when committed, > but I don't think there are any showstopper problems otherwise. Makes sense to me.
Re: binary version of pg_current_wal_insert_lsn and pg_walfile_name functions
On Thu, Sep 22, 2022 at 7:41 AM Bharath Rupireddy wrote:
>
> On Wed, Sep 21, 2022 at 9:53 PM Ashutosh Sharma wrote:
> >
> > Yeah, we can either add this functionality to pg_waldump or maybe add
> > a new binary itself that would return this information.
>
> IMV, a separate tool isn't the way, since pg_waldump already reads WAL
> files and decodes WAL records, what's proposed here can be an
> additional functionality of pg_waldump.
>
> It will be great if an initial patch is posted here.

PFA a patch that enhances pg_waldump to show the latest LSN and the corresponding WAL file when the -l or --lastLSN option is passed as an argument to pg_waldump. Below is an example:

ashu@92893de650ed:~/pgsql$ pg_waldump -l -D ./data-dir
Latest LSN: 0/148A45F8
Latest WAL filename: 00010014

How has it been coded? When the user passes the '-l' command line option along with the data directory path to pg_waldump, it reads the control file from the data directory. From the control file, it gets information such as the redo pointer and the current timeline id. The redo pointer is taken as the start pointer from which pg_waldump reads WAL data until end-of-WAL to find the last LSN. For details please check the attached patch. Please note that this doesn't work for compressed and .partial WAL files.

--
With Regards,
Ashutosh Sharma.

0001-Enhance-pg_waldump-to-show-latest-LSN-and-the-WAL-fi.patch
Description: Binary data
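For readers following along, the mapping from an LSN to its WAL file name (what the patch prints as "Latest WAL filename") follows the XLogFileName() formatting: the segment number is the LSN divided by the segment size, printed as the timeline plus the high and low halves of the segment number. A rough Python sketch, assuming timeline 1 and the default 16MB segment size:

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # default; configurable at initdb time

def walfile_name(tli: int, lsn: int, segsize: int = WAL_SEGMENT_SIZE) -> str:
    """Mirror the XLogFileName() formatting: timeline, then the segment
    number split into two halves, each printed as 8 hex digits."""
    segs_per_xlogid = 0x100000000 // segsize
    segno = lsn // segsize
    return "%08X%08X%08X" % (tli, segno // segs_per_xlogid, segno % segs_per_xlogid)

def parse_lsn(text: str) -> int:
    """Parse the textual X/Y LSN form into a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)

# The "Latest LSN" from the example maps to its WAL file:
print(walfile_name(1, parse_lsn("0/148A45F8")))  # 000000010000000000000014
```

This is only a sketch of the naming math; the actual patch of course gets the timeline from the control file rather than assuming it.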
Re: Reducing the WAL overhead of freezing in VACUUM by deduplicating per-tuple freeze plans
On Wed, Sep 21, 2022 at 9:21 PM Nathan Bossart wrote:
> I wouldn't mind giving this a try.

Definitely seems worth experimenting with. Many WAL records generated during VACUUM (and during opportunistic pruning/index tuple deletion) have offset number arrays that can be assumed to be both sorted and unique. My guess is that these WAL record types are the most amenable to compression at the WAL record level.

> Yeah, it seems likely that we could pack offsets in single bytes in many
> cases. A more sophisticated approach could even choose how many bits to
> use per offset based on the maximum in the array. Furthermore, we might be
> able to make use of SIMD instructions to mitigate any performance penalty.

I guess I'd start with some kind of differential compression that relies on the arrays being both sorted and unique. While it might be important to be able to compose together multiple different techniques (something that is more than the sum of its parts can be very useful), it seems most important to quickly validate the basic idea first.

One obvious thing that still seems worth pointing out: you may not need to decompress anything. All that you really need to do is to get the logic from some routine like PageIndexMultiDelete() to be executed by a REDO routine. Perhaps it'll make sense to come up with a representation that can just be passed to the routine directly. (I really don't know how likely this is, but it's something to consider.)

> I'm tempted to start by just using single-byte offsets when possible since
> that should be relatively simple while still yielding a decent improvement
> for many workloads. What do you think?

The really big wins for WAL size will come from applying high-level insights about what is really needed, in all likelihood. The main overhead is very often the generic WAL record header overhead. When it is, it'll be hard to really move the needle just by compressing the payload.
To do that you'd need to find a way to use fewer WAL records to do the same amount of work. The thing that I think will really make the biggest difference is merging PRUNE records with FREEZE records (and maybe even make the merged record do the work of a VISIBLE record when that's possible). Just because now you have 1 WAL record instead of 2 (or even 3) WAL records. Obviously that's a complicated project, but it's another case where it feels like the more efficient approach might also be simpler. We often write a PRUNE record with only one or two items in the array, in which case it's practically free to do some freezing, at least in terms of WAL space overhead (as long as you can do it with the same WAL record). Plus freezing is already inescapably tied to pruning -- we always prune a page that we're going to try to freeze in VACUUM (we can't safely freeze dead tuples, so there is more or less a dependency already). Not that you shouldn't pursue compressing the payload from WAL records as a project -- maybe that'll work very well. I'm just pointing out that there is a bigger picture, that may or may not end up mattering here. For the patch on this thread there certainly is a bigger picture about costs over time. Something like that could be true for this other patch too. It's definitely worth considering the size of the WAL records when there are only one or two items, how common that may be in each individual case, etc. For example, FREEZE records have a minimum size of 64 bytes in practice, due to WAL record alignment overhead (the payload itself doesn't usually have to be aligned, but the WAL header still does). It may be possible to avoid going over the critical threshold that makes the WAL size one MAXALIGN() quantum larger in the event of having only a few tuples to freeze, a scenario where negative compression is likely. 
Negative compression is always a potential problem, but maybe you can deal with it very effectively by thinking about the problem holistically. If you're "wasting" space that was just going to be alignment padding anyway, does it really matter at all? (Maybe there is some reason to care, but offhand I can't think of one.)

--
Peter Geoghegan
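To make the differential-compression idea concrete, here is a rough sketch (Python for illustration only; a real patch would be C in the WAL insert/redo paths) of delta-encoding a sorted, unique offset array into single bytes, with a fallback to the raw two-byte form so negative compression can't happen:

```python
def encode_offsets(offsets):
    """Delta-encode a sorted, unique array of line-pointer offsets.

    Returns (compressed?, payload).  Uncompressed, each offset takes
    2 bytes; when every gap fits in one byte, each item after the first
    takes only 1 byte.  Any larger gap triggers the raw fallback.
    """
    assert offsets == sorted(set(offsets))
    deltas = [b - a for a, b in zip(offsets, offsets[1:])]
    if offsets and all(1 <= d <= 255 for d in deltas):
        payload = offsets[0].to_bytes(2, "little") + bytes(deltas)
        return True, payload
    return False, b"".join(o.to_bytes(2, "little") for o in offsets)

def decode_offsets(compressed, payload):
    """Inverse of encode_offsets(); a REDO routine might instead consume
    the encoded form directly, as suggested above."""
    if not compressed:
        return [int.from_bytes(payload[i:i + 2], "little")
                for i in range(0, len(payload), 2)]
    out = [int.from_bytes(payload[:2], "little")]
    for d in payload[2:]:
        out.append(out[-1] + d)
    return out

offs = [3, 4, 7, 19, 211]
flag, data = encode_offsets(offs)
print(flag, len(data))  # True 6  -- vs 10 bytes uncompressed
```

The one-bit "compressed?" flag is the hedge against negative compression: the worst case is the existing format plus a single flag bit somewhere in the record header.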
Re: Extending outfuncs support to utility statements
I wrote: > I think this is the issue Peter mentioned about needing to distinguish > between empty strings and NULL strings. We're going to need to rethink > the behavior of pg_strtok() a bit to fix that. After staring at the code a bit, I think we don't need to touch pg_strtok() per se. I propose that this can be resolved with changes at the next higher level. Let's make outToken print NULL as <> as it always has, but print an empty string as "" (two double quotes). If the raw input string is two double quotes, print it as \"" to disambiguate. This'd require a catversion bump when committed, but I don't think there are any showstopper problems otherwise. I'll work on fleshing that idea out. regards, tom lane
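For clarity, the proposed disambiguation rules can be sketched like this (a hedged Python illustration; the real outToken()/pg_strtok() code does additional escaping of special characters and whitespace that is elided here):

```python
def out_token(value):
    """Print side of the proposed rule set (sketch only)."""
    if value is None:
        return "<>"      # NULL pointer, as outToken has always printed it
    if value == "":
        return '""'      # empty string: two double quotes
    if value == '""':
        return '\\""'    # a raw two-double-quote string, escaped to disambiguate
    return value         # real code also escapes specials/whitespace

def read_token(token):
    """Read side, inverting out_token()."""
    if token == "<>":
        return None
    if token == '""':
        return ""
    if token == '\\""':
        return '""'
    return token

for v in (None, "", '""', "tablespace"):
    assert read_token(out_token(v)) == v
```

The point of the extra backslash case is exactly the round-trip property the loop checks: without it, an input string that happens to be two double quotes would read back as the empty string.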
Re: Extending outfuncs support to utility statements
Andres Freund writes: > On 2022-08-24 17:25:31 +0200, Peter Eisentraut wrote: >> Here are patches for that. > These patches have been failing since they were posted, afaict: > https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/39/3848 > I assume that's known? I think this is the issue Peter mentioned about needing to distinguish between empty strings and NULL strings. We're going to need to rethink the behavior of pg_strtok() a bit to fix that. regards, tom lane
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Hi Wang Wei,

> 1. In the function GetCheapestReplicaIdentityFullPath.
> + if (rel->pathlist == NIL)
> + {
> + /*
> +* A sequential scan could have been dominated by by an index scan
> +* during make_one_rel(). We should always have a sequential scan
> +* before set_cheapest().
> +*/
> + Path *seqScanPath = create_seqscan_path(root, rel, NULL, 0);
> +
> + add_path(rel, seqScanPath);
> + }
>
> This is a question I'm not sure about:
> Do we need this part to add sequential scan?
>
> I think in our case, the sequential scan seems to have been added by the
> function make_one_rel (see function set_plain_rel_pathlist).

Yes, the sequential scan is added during make_one_rel.

> If I am missing
> something, please let me know. BTW, there is a typo in above comment: `by by`.

As the comment mentions, the sequential scan could have been dominated & removed by an index scan, see add_path():

> *We also remove from the rel's pathlist any old paths that are dominated
> * by new_path --- that is, new_path is cheaper, at least as well ordered,
> * generates no more rows, requires no outer rels not required by the old
> * path, and is no less parallel-safe.

Still, I agree that the comment could be improved, which I pushed.

> 2. In the file execReplication.c.
> +#ifdef USE_ASSERT_CHECKING
> +#include "catalog/index.h"
> +#endif
> #include "commands/trigger.h"
> #include "executor/executor.h"
> #include "executor/nodeModifyTable.h"
> #include "nodes/nodeFuncs.h"
> #include "parser/parse_relation.h"
> #include "parser/parsetree.h"
> +#ifdef USE_ASSERT_CHECKING
> +#include "replication/logicalrelation.h"
> +#endif
>
> I think it's fine to only add `logicalrelation.h` here, because `index.h`
> has been added by `logicalrelation.h`.

Makes sense, removed, thanks. Attached v13.

v13_0001_use_index_on_subs_when_pub_rep_ident_full.patch
Description: Binary data
Re: [PATCH] Use indexes on the subscriber when REPLICA IDENTITY is full on the publisher
Hi Peter,

Kuroda kuroda.hay...@fujitsu.com wrote on Wed, 21 Sep 2022 at 04:21:

> > One last thing - do you think there is any need to mention this
> > behaviour in the pgdocs, or is OK just to be a hidden performance
> > improvement?
>
> FYI - I put my opinion.
> We have following sentence in the logical-replication.sgml:
>
> ```
> ...
> If the table does not have any suitable key, then it can be set
> to replica identity full, which means the entire row becomes
> the key. This, however, is very inefficient and should only be used as a
> fallback if no other solution is possible.
> ...
> ```
>
> Here the word "very inefficient" may mean that sequential scans will be
> executed every time.
> I think some descriptions can be added around here.

Making a small edit in that file makes sense. I'll attach v13 in the next email, which also includes this change.

Also, do you think this is a good time for me to mark the patch "Ready for committer" in the commit fest? I'm not sure when and by whom the state should be changed, but it seems I can change it. I couldn't find any documentation on how that process should work.

Thanks!
Re: Summary function for pg_buffercache
Hi,

On 2022-09-22 18:22:44 +0300, Melih Mutlu wrote:
> Since header locks are removed again, I put some doc changes and comments
> back.

Due to the merge of the meson build system, this needs to adjust meson.build as well.

> --- a/contrib/pg_buffercache/expected/pg_buffercache.out
> +++ b/contrib/pg_buffercache/expected/pg_buffercache.out
> @@ -8,3 +8,12 @@ from pg_buffercache;
> t
> (1 row)
>
> +select buffers_used + buffers_unused > 0,
> +buffers_dirty < buffers_used,
> +buffers_pinned < buffers_used

Don't these have to be "<=" instead of "<"?

> + for (int i = 0; i < NBuffers; i++)
> + {
> + BufferDesc *bufHdr;
> + uint32 buf_state;
> +
> + /*
> + * No need to get locks on buffer headers as we don't rely on the
> + * results in detail. Therefore, we don't get a consistent snapshot
> + * across all buffers and it is not guaranteed that the information of
> + * each buffer is self-consistent as opposed to pg_buffercache_pages.
> + */

I think the "consistent snapshot" bit is misleading - even taking buffer header locks wouldn't give you that.

> + if (buffers_used != 0)
> + usagecount_avg = usagecount_avg / buffers_used;

Perhaps the average should be NULL in the buffers_used == 0 case?

> +
> + pg_buffercache_pages function
> + returns a set of records, plus a view
> pg_buffercache that wraps the function for
> + convenient use is provided.
> +
> +
> +
> + pg_buffercache_summary function returns a table with
> a single row
> + that contains summarized and aggregated information about shared buffer
> caches.
>

I think these sentences are missing a "The " at the start? "shared buffer caches" isn't right - I think I'd just drop the "caches".
Unlike pg_buffercache_pages function > + pg_buffercache_summary doesn't take buffer headers > locks > + either, thus the result is not consistent. This is intentional. The > purpose > + of this function is to provide a general idea about the state of shared > + buffers as fast as possible. Additionally, > pg_buffercache_summary > + allocates much less memory. > + > + I don't think this mentioning of buffer header locks is useful for users - nor is it I think correct. Acquiring the buffer header locks wouldn't add *any* additional consistency. Greetings, Andres Freund
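The overall shape being reviewed here — a single unlocked pass over the buffer headers, with the average reported as NULL when no buffers are in use — can be sketched as follows (a Python stand-in for the C loop; the toy BufState type is invented for the illustration):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class BufState:          # toy stand-in for a buffer header's state word
    valid: bool          # roughly BM_VALID
    dirty: bool          # roughly BM_DIRTY
    refcount: int        # pin count
    usagecount: int

def buffercache_summary(buffers):
    """One pass, no locks: each state read may be slightly stale, which
    is acceptable because only aggregates are reported."""
    used = unused = dirty = pinned = 0
    usage_total = 0
    for b in buffers:
        if not b.valid:
            unused += 1
            continue
        used += 1
        usage_total += b.usagecount
        if b.dirty:
            dirty += 1
        if b.refcount > 0:
            pinned += 1
    # Average is None (SQL NULL) when nothing is in use, avoiding 0/0.
    avg: Optional[float] = usage_total / used if used else None
    return used, unused, dirty, pinned, avg

bufs = [BufState(True, True, 0, 3), BufState(True, False, 1, 5),
        BufState(False, False, 0, 0)]
print(buffercache_summary(bufs))  # (2, 1, 1, 1, 4.0)
```

This also shows why the regression test's invariants need "<=": a dirty or pinned count can legitimately equal the used count.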
Re: libpq error message refactoring
Hi, On 2022-08-25 16:34:26 +0200, Peter Eisentraut wrote: > libpq now contains a mix of error message strings that end with newlines and > don't end with newlines, due to some newer code paths with new ways of > passing errors around. This has now gotten me confused a few too many times > both during development and translation. So I looked into whether we can > unify this, similar to how we have done elsewhere (e.g., pg_upgrade). I > came up with the attached patch. It's not complete, but it shows the idea > and it looks like a nice simplification to me. Thoughts on this approach? This patch has been failing for a while: https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/39/3854 Interestingly, previously the error only happened when targetting windows, but meson also shows it on freebsd. It's not the cause of this failure, I think, but doesn't appendPQExpBufferVA need to be added to exports.txt? Greetings, Andres Freund
Re: Temporary file access API
I’m a latecomer to the discussion, but as a word of introduction, I’m working with Stephen, and I have started looking over the temp file proposal with the idea of helping it move along. I’ll start by summarizing the temp file proposal and its goals.

From a high level, the proposed code:
* Creates an fread/fwrite replacement (BufFileStream) for buffering data to a single file.
* Updates BufFile, which reads/writes a set of files, to use BufFileStream internally.
* Does not impact the normal (page cached) database I/O.
* Updates all the other places where fread/fwrite and read/write are used.
* Creates and removes transient files.

I see the following goals:
* Unify all the “other” file accesses into a single, consistent API.
* Integrate with VFDs.
* Integrate transient files with transactions and tablespaces.
* Create a consolidated framework where features like encryption and compression can be more easily added.
* Maintain good streaming performance.

Is this a fair description of the proposal? For myself, I’d like to map out how features like compression and encryption would fit into the framework, more as a sanity check than anything else, and I’d like to look closer at some of the implementation details. But at the moment, I want to make sure I have the proper high-level view of the temp file proposal.

From: Robert Haas
Date: Wednesday, September 21, 2022 at 11:54 AM
To: Antonin Houska
Cc: PostgreSQL Hackers , Peter Eisentraut , Stephen Frost
Subject: Re: Temporary file access API

On Mon, Aug 8, 2022 at 2:26 PM Antonin Houska wrote:
> > I don't think that (3) is a separate advantage from (1) and (2), so I
> > don't have anything separate to say about it.
> > I thought that the uncontrollable buffer size is one of the things you > complaint about in > > https://www.postgresql.org/message-id/CA+TgmoYGjN_f=fcerx49bzjhng+gocty+a+xhnrwcvvdy8u...@mail.gmail.com Well, I think that email is mostly complaining about there being no buffering at all in a situation where it would be advantageous to do some buffering. But that particular code I believe is gone now because of the shared-memory stats collector, and when looking through the patch set, I didn't see that any of the cases that it touched were similar to that one. -- Robert Haas EDB: http://www.enterprisedb.com
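For concreteness, the core of what an fread/fwrite replacement like BufFileStream has to do — accumulate writes in an explicit buffer and hit the underlying file once per buffer-full — can be sketched like this (a minimal Python illustration; the class name matches the proposal but the API and buffer size are my own invention):

```python
import io

class BufFileStream:
    """Minimal buffered-write stream over a single file-like object.

    Sketch only: a real implementation would integrate with VFDs,
    handle reads and seeks, and be the natural seam where compression
    or encryption hooks in.
    """
    BUFSIZE = 8192

    def __init__(self, file):
        self.file = file
        self.buf = bytearray()
        self.flushes = 0          # instrumentation for the example

    def write(self, data: bytes) -> None:
        self.buf += data
        if len(self.buf) >= self.BUFSIZE:
            self.flush()

    def flush(self) -> None:
        if self.buf:
            self.file.write(bytes(self.buf))  # one backend write per buffer
            self.flushes += 1
            self.buf.clear()

backing = io.BytesIO()
s = BufFileStream(backing)
for _ in range(3):
    s.write(b"x" * 4096)          # three caller writes...
s.flush()
print(len(backing.getvalue()), s.flushes)  # 12288 2
```

Three 4KB caller writes collapse into two backend writes, which is the whole point of a controllable, explicit buffer rather than relying on whatever libc's stdio happens to do.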
Re: [PATCH] Introduce array_shuffle() and array_sample()
Am 22.09.22 um 17:23 schrieb Andres Freund: Hi, On 2022-08-04 07:46:10 +0200, Martin Kalcher wrote: Patch update without merge conflicts. Due to the merge of the meson based build, this patch needs to be adjusted. See https://cirrus-ci.com/build/6580671765282816 Looks like it'd just be adding user_prng.c to src/backend/utils/adt/meson.build's list. Greetings, Andres Freund Hi Andres, thanks for the heads up. Adjusted patch is attached. - MartinFrom 3c4a939abf8d29bcf666e49ecb042ade42b36c2c Mon Sep 17 00:00:00 2001 From: Martin Kalcher Date: Sun, 17 Jul 2022 18:06:04 +0200 Subject: [PATCH v4] Introduce array_shuffle() and array_sample() * array_shuffle() shuffles the elements of an array. * array_sample() chooses n elements from an array by random. The new functions share its prng state with random() and thus interoperate with setseed(). --- doc/src/sgml/func.sgml | 40 +- src/backend/utils/adt/Makefile | 1 + src/backend/utils/adt/array_userfuncs.c | 176 src/backend/utils/adt/float.c | 37 + src/backend/utils/adt/meson.build | 1 + src/backend/utils/adt/user_prng.c | 87 src/include/catalog/pg_proc.dat | 6 + src/include/utils/user_prng.h | 19 +++ src/test/regress/expected/arrays.out| 66 + src/test/regress/sql/arrays.sql | 16 +++ 10 files changed, 414 insertions(+), 35 deletions(-) create mode 100644 src/backend/utils/adt/user_prng.c create mode 100644 src/include/utils/user_prng.h diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index e1fe4fec57..2a96fc323a 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -1820,7 +1820,8 @@ repeat('Pg', 4) PgPgPgPg void -Sets the seed for subsequent random() calls; +Sets the seed for subsequent calls to random functions, like +random() or array_shuffle(); argument must be between -1.0 and 1.0, inclusive @@ -1838,7 +1839,8 @@ repeat('Pg', 4) PgPgPgPg applications; see the module for a more secure alternative. 
If setseed() is called, the series of results of - subsequent random() calls in the current session + subsequent calls to random functions, like random() or + array_shuffle(), in the current session can be repeated by re-issuing setseed() with the same argument. Without any prior setseed() call in the same @@ -18468,6 +18470,40 @@ SELECT NULLIF(value, '(none)') ... + + + + array_sample + +array_sample ( array anyarray, n integer ) +anyarray + + +Returns n randomly chosen elements from array in selection order. + + +array_sample(ARRAY[[1,2],[3,4],[5,6]], 2) +{{5,6},{1,2}} + + + + + + + array_shuffle + +array_shuffle ( anyarray ) +anyarray + + +Shuffles the first dimension of the array. + + +array_shuffle(ARRAY[[1,2],[3,4],[5,6]]) +{{5,6},{1,2},{3,4}} + + + diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile index 0de0bbb1b8..17b57a5569 100644 --- a/src/backend/utils/adt/Makefile +++ b/src/backend/utils/adt/Makefile @@ -110,6 +110,7 @@ OBJS = \ tsvector.o \ tsvector_op.o \ tsvector_parser.o \ + user_prng.o \ uuid.o \ varbit.o \ varchar.o \ diff --git a/src/backend/utils/adt/array_userfuncs.c b/src/backend/utils/adt/array_userfuncs.c index ca70590d7d..c4a2117df7 100644 --- a/src/backend/utils/adt/array_userfuncs.c +++ b/src/backend/utils/adt/array_userfuncs.c @@ -17,6 +17,7 @@ #include "utils/array.h" #include "utils/builtins.h" #include "utils/lsyscache.h" +#include "utils/user_prng.h" #include "utils/typcache.h" @@ -902,3 +903,178 @@ array_positions(PG_FUNCTION_ARGS) PG_RETURN_DATUM(makeArrayResult(astate, CurrentMemoryContext)); } + +/* + * Produce array with n randomly chosen items from given array in random order. 
+ * + * array: array object to examine (must not be NULL) + * n: number of items (must not be greater than the size of the arrays first dimension) + * elmtyp, elmlen, elmbyval, elmalign: info for the datatype of the items + * + * NOTE: it would be cleaner to look up the elmlen/elmbval/elmalign info + * from the system catalogs, given the elmtyp. However, the caller is + * in a better position to cache this info across multiple uses, or even + * to hard-wire values if the element type is hard-wired. + */ +static ArrayType * +array_shuffle_n(ArrayType *array, int n, +Oid elmtyp, int elmlen, bool elmbyval, char elmalign) +{ + ArrayType *result; + int ndim, + *dims, + *lbs, +rdims[MAXDIM], +nelm, +nitem, +i, +j, +k; + Datum elm, + *elms, + *ielms, + *jelms; + bool nul, +
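The user-visible semantics of the two functions — Fisher-Yates shuffling, and sampling without replacement, over the array's first dimension only, driven by a seedable PRNG so they interoperate with setseed() — can be mimicked like this (a Python analogy only, not the C implementation in the patch, which uses the backend's shared PRNG state):

```python
import random

prng = random.Random()        # shared state, analogous to setseed()/random()

def array_shuffle(arr):
    """Fisher-Yates shuffle of the first dimension; subarrays stay intact."""
    result = list(arr)
    for i in range(len(result) - 1, 0, -1):
        j = prng.randrange(i + 1)
        result[i], result[j] = result[j], result[i]
    return result

def array_sample(arr, n):
    """n items chosen without replacement, in selection order (a partial
    Fisher-Yates); n must not exceed the first dimension's length."""
    assert 0 <= n <= len(arr)
    pool = list(arr)
    out = []
    for i in range(n):
        j = prng.randrange(i, len(pool))
        pool[i], pool[j] = pool[j], pool[i]
        out.append(pool[i])
    return out

prng.seed(0.5)                # setseed(0.5) analogue: results become repeatable
shuffled = array_shuffle([[1, 2], [3, 4], [5, 6]])
sampled = array_sample([[1, 2], [3, 4], [5, 6]], 2)
print(len(shuffled), len(sampled))  # 3 2
```

The key property mirrored here is that whole subarrays move as units, and that reseeding makes the sequence of results reproducible, just as the patch's setseed() interoperation promises.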
Re: Extending outfuncs support to utility statements
Hi, On 2022-08-24 17:25:31 +0200, Peter Eisentraut wrote: > Here are patches for that. These patches have been failing since they were posted, afaict: https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/39/3848 I assume that's known? Most of the failures seem to be things like diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/tablespace.out /tmp/cirrus-ci-build/build/testrun/main/regress/results/tablespace.out --- /tmp/cirrus-ci-build/src/test/regress/expected/tablespace.out 2022-09-22 12:30:07.340655000 + +++ /tmp/cirrus-ci-build/build/testrun/main/regress/results/tablespace.out 2022-09-22 12:35:15.075825000 + @@ -3,6 +3,8 @@ ERROR: tablespace location must be an absolute path -- empty tablespace locations are not usually allowed CREATE TABLESPACE regress_tblspace LOCATION ''; -- fail +WARNING: outfuncs/readfuncs failed to produce an equal raw parse tree +WARNING: outfuncs/readfuncs failed to produce an equal rewritten parse tree ERROR: tablespace location must be an absolute path -- as a special developer-only option to allow us to use tablespaces -- with streaming replication on the same server, an empty location Greetings, Andres Freund
Re: Add last_vacuum_index_scans in pg_stat_all_tables
Hi,

On 2022-09-16 13:23:06 +0900, Ken Kato wrote:
> Thank you for the review!
> I fixed it and resubmitting the patch.

cfbot flags that the docs aren't valid:
https://cirrus-ci.com/task/5309377937670144?logs=docs_build#L295

[15:05:39.683] monitoring.sgml:4574: parser error : Opening and ending tag mismatch: entry line 4567 and row
[15:05:39.683]
[15:05:39.683]^

The problem is that you're not closing the <entry> element.

Greetings,

Andres Freund
Re: Refactor backup related code (was: Is it correct to say, "invalid data in file \"%s\"", BACKUP_LABEL_FILE in do_pg_backup_stop?)
Hi, On 2022-09-22 16:43:19 +0900, Michael Paquier wrote: > I have put my hands on 0001, and finished with the attached, that > includes many fixes and tweaks. Due to the merge of the meson based build this patch needs some adjustment. See https://cirrus-ci.com/build/6146162607521792 Looks like it just requires adding xlogbackup.c to src/backend/access/transam/meson.build. Greetings, Andres Freund
Re: [PATCH] Introduce array_shuffle() and array_sample()
Hi, On 2022-08-04 07:46:10 +0200, Martin Kalcher wrote: > Patch update without merge conflicts. Due to the merge of the meson based build, this patch needs to be adjusted. See https://cirrus-ci.com/build/6580671765282816 Looks like it'd just be adding user_prng.c to src/backend/utils/adt/meson.build's list. Greetings, Andres Freund
Re: Summary function for pg_buffercache
Hi, Since header locks are removed again, I put some doc changes and comments back. Thanks, Melih v10-0001-Added-pg_buffercache_summary-function.patch Description: Binary data
Re: [RFC] building postgres with meson - v13
On 2022-Sep-22, Tom Lane wrote: > Ah, right, the joys of maintaining multiple build systems. I wonder > if there's any way to avoid that by scraping file lists from one > group to the other. Or maybe we could have a file common to both, say OBJS, which both scrape in their own way. That could be easier than one scraping the other. > We got a little spoiled perhaps by the MSVC scripts managing to do > that in most cases. Right ... and it was so annoying in the cases it couldn't. -- Álvaro Herrera — Breisgau, Deutschland — https://www.EnterpriseDB.com/ "Learning without thinking is useless; thinking without learning, perilous" (Confucius)
Re: Amcheck verification of GiST and GIN
Hi, On 2022-08-17 17:28:02 +0500, Andrey Borodin wrote: > Here's v13. Changes: > 1. Fixed passing through downlink in GIN index > 2. Fixed GIN tests (one test case was not working) > > Thanks to Vitaliy Kukharik for trying this patches. Due to the merge of the meson based build, this patch needs to be adjusted. See https://cirrus-ci.com/build/6637154947301376 The changes should be fairly simple, just mirroring the Makefile ones. Greetings, Andres Freund
Re: pg_waldump: add test for coverage
Hi, On 2022-08-23 10:50:08 +0900, Dong Wook Lee wrote: > I wrote a test for coverage. Unfortunately the test doesn't seem to pass on windows, and hasn't ever done so: https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/39/3834 Due to the merge of the meson patchset, you should also add 001_basic.pl to the list of tests in meson.build Greetings, Andres Freund
Re: [PATCH] Improve amcheck to also check UNIQUE constraint in btree index.
Hi, On 2022-05-20 17:46:38 +0400, Pavel Borisov wrote: > CFbot says v12 patch does not apply. > Rebased. PFA v13. > Your reviews are very much welcome! Due to the merge of the meson based build this patch needs to be adjusted: https://cirrus-ci.com/build/6350479973154816 Looks like you need to add amcheck--1.3--1.4.sql to the list of files to be installed and t/004_verify_nbtree_unique.pl to the tests. Greetings, Andres Freund
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Thu, Sep 22, 2022 at 11:46 AM John Naylor wrote: > One thing I want to try soon is storing fewer than 16/32 etc entries, so that the whole node fits comfortably inside a power-of-two allocation. That would allow us to use aset without wasting space for the smaller nodes, which would be faster and possibly would solve the fragmentation problem Andres referred to in > https://www.postgresql.org/message-id/20220704220038.at2ane5xkymzzssb%40awork3.anarazel.de While calculating node sizes that fit within a power-of-two size, I noticed the current base node is a bit wasteful, taking up 8 bytes. The node kind only has a small number of values, so it doesn't really make sense to use an enum here in the struct (in fact, Andres' prototype used a uint8 for node_kind). We could use a bitfield for the count and kind: uint16 -- kind and count bitfield uint8 shift; uint8 chunk; That's only 4 bytes. Plus, if the kind is ever encoded in a pointer tag, the bitfield can just go back to being count only. Here are the v6 node kinds: node4: 8 + 4 +(4)+ 4*8 = 48 bytes node16: 8 + 16 + 16*8 = 152 node32: 8 + 32 + 32*8 = 296 node128: 8 + 256 + 128/8 + 128*8 = 1304 node256: 8 + 256/8 + 256*8 = 2088 And here are the possible ways we could optimize nodes for space using aset allocation. Parentheses are padding bytes. Even if my math has mistakes, the numbers shouldn't be too far off: node3: 4 + 3 +(1)+ 3*8 = 32 bytes node6: 4 + 6 +(6)+ 6*8 = 64 node13: 4 + 13 +(7)+ 13*8 = 128 node28: 4 + 28 + 28*8 = 256 node31: 4 + 256 + 32/8 + 31*8 = 512 (XXX not good) node94: 4 + 256 + 96/8 + 94*8 = 1024 node220: 4 + 256 + 224/8 + 220*8 = 2048 node256: = 4096 The main disadvantage is that node256 would balloon in size. -- John Naylor EDB: http://www.enterprisedb.com
Re: proposal: possibility to read dumped table's name from file
Hi, On 2022-09-12 09:58:37 +0200, Daniel Gustafsson wrote: > > On 9 Sep 2022, at 11:00, Andrew Dunstan wrote: > > > >> On Sep 9, 2022, at 5:53 PM, John Naylor > >> wrote: > >> > >> Note that the grammar has shift-reduce conflicts. > > > Looks like the last rule for Filters should not be there. > > Correct, fixed in the attached. Due to the merge of the meson build, this patch now needs to adjust the relevant meson.build. This is the cause of the failures at: https://cirrus-ci.com/build/5788292678418432 See e.g. src/bin/pgbench/meson.build Greetings, Andres Freund
Re: Tab complete for CREATE SUBSCRIPTION ... CONECTION does not work
On Tue, 20 Sep 2022 at 00:19, Japin Li wrote: > Hi hacker, > > As $subject detailed, the tab-complete cannot work such as: > >CREATE SUBSCRIPTION sub CONNECTION 'dbname=postgres port=6543' \t > > It seems that get_previous_words() could not parse the single quote. > > OTOH, it works for CREATE SUBSCRIPTION sub CONNECTION xx \t, should we fix it? The attached patch fixes this problem. Any thoughts? -- Regards, Japin Li. ChengDu WenWu Information Technology Co.,Ltd. >From 40b3ef8decd305bdaefddd797872da05bb5b65b4 Mon Sep 17 00:00:00 2001 From: Japin Li Date: Thu, 22 Sep 2022 22:39:00 +0800 Subject: [PATCH v1] Fix tab-completion for CREATE SUBSCRIPTION --- src/bin/psql/tab-complete.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c index 820f47d23a..73c34aed83 100644 --- a/src/bin/psql/tab-complete.c +++ b/src/bin/psql/tab-complete.c @@ -6068,7 +6068,7 @@ get_previous_words(int point, char **buffer, int *nwords) */ for (start = end; start > 0; start--) { - if (buf[start] == '"') + if (buf[start] == '"' || buf[start] == '\'') inquotes = !inquotes; if (!inquotes) { -- 2.25.1
Re: [RFC] building postgres with meson - v13
Hi, On 2022-09-22 04:29:15 -0400, Andrew Dunstan wrote: > Great. Now I'll start on buildfarm support. Given my current > commitments, this will take me a while, but I hope to have a working > client by about the beginning of November. Great! Let me know if there's something I can do to help. Greetings, Andres Freund
Re: [RFC] building postgres with meson - v13
Hi, On 2022-09-22 16:56:57 +0200, Alvaro Herrera wrote: > On 2022-Sep-22, Tom Lane wrote: > > Initial reports from the cfbot are mostly promising, but there are a few > > patches where all the meson builds fail while all the autoconf ones pass, > > so there's something for you to look at. So far CF entries 3464, 3733, > > 3771, 3808 look that way. > > Hmm, but those patches add files, which means they're now outdated: they > need to add these files to the respective meson.build file. Yea, I looked through all of these and they all need need a simple addition of a file to be built or installed. Greetings, Andres Freund
Re: [RFC] building postgres with meson - v13
Alvaro Herrera writes: > On 2022-Sep-22, Tom Lane wrote: >> Initial reports from the cfbot are mostly promising, but there are a few >> patches where all the meson builds fail while all the autoconf ones pass, >> so there's something for you to look at. So far CF entries 3464, 3733, >> 3771, 3808 look that way. > Hmm, but those patches add files, which means they're now outdated: they > need to add these files to the respective meson.build file. Ah, right, the joys of maintaining multiple build systems. I wonder if there's any way to avoid that by scraping file lists from one group to the other. We got a little spoiled perhaps by the MSVC scripts managing to do that in most cases. regards, tom lane
Re: [RFC] building postgres with meson - v13
On 2022-Sep-22, Tom Lane wrote: > Initial reports from the cfbot are mostly promising, but there are a few > patches where all the meson builds fail while all the autoconf ones pass, > so there's something for you to look at. So far CF entries 3464, 3733, > 3771, 3808 look that way. Hmm, but those patches add files, which means they're now outdated: they need to add these files to the respective meson.build file. -- Álvaro Herrera — Breisgau, Deutschland — https://www.EnterpriseDB.com/ "Experience tells us that man has peeled potatoes millions of times, but we were forced to admit the possibility that in one case in a million, the potatoes would peel the man" (Ijon Tichy)
Re: [RFC] building postgres with meson - v13
Andres Freund writes: > On 2022-09-21 09:46:30 -0700, Andres Freund wrote: >> I'm planning to commit this today, unless somebody wants to argue against >> that. > And done! Yay! Initial reports from the cfbot are mostly promising, but there are a few patches where all the meson builds fail while all the autoconf ones pass, so there's something for you to look at. So far CF entries 3464, 3733, 3771, 3808 look that way. regards, tom lane
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Thu, Sep 22, 2022 at 7:52 PM John Naylor wrote: > > > On Thu, Sep 22, 2022 at 1:26 PM Masahiko Sawada wrote: > > Good point. While keeping the chunks in the small nodes in sorted > > order is useful for visiting all keys in sorted order, additional > > branches and memmove calls could be slow. > > Right, the ordering is a property that some users will need, so best to keep it. Although the node128 doesn't have that property -- too slow to do so, I think. Nevermind, I must have been mixing up keys and values there... -- John Naylor EDB: http://www.enterprisedb.com
Re: Pruning never visible changes
On 2022-Sep-22, Simon Riggs wrote: > On Mon, 19 Sept 2022 at 00:16, Greg Stark wrote: > > VACUUM was willing to remove a committed-dead tuple immediately if it > > was > > deleted by the same transaction that inserted it. The idea is that > > such a > > tuple could never have been visible to any other transaction, so we > > don't > > need to keep it around to satisfy MVCC snapshots. However, there was > > already an exception for tuples that are part of an update chain, and > > this > > exception created a problem: we might remove TOAST tuples (which are > > never > > part of an update chain) while their parent tuple stayed around (if it > > was > > part of an update chain). This didn't pose a problem for most things, > > since the parent tuple is indeed dead: no snapshot will ever consider it > > visible. But MVCC-safe CLUSTER had a problem, since it will try to copy > > RECENTLY_DEAD tuples to the new table. It then has to copy their TOAST > > data too, and would fail if VACUUM had already removed the toast tuples. > Good research Greg, thank you. Only took 10 years for me to notice it > was gone ;-) But this begs the question: is the proposed change safe, given that ancient consideration? I don't think TOAST issues have been mentioned in this thread so far, so I wonder if there is a test case that verifies that this problem doesn't occur for some reason. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Re: RFC: Logging plan of the running query
On 2022-09-21 17:30, Alena Rybakina wrote: Thanks for your reply! I also noticed it. However, I also discovered that the above function declarations are applied to the EXPLAIN command and used to print details of the executed query. We have a similar task: to print the plan of an interrupted process handling a request for a specific pid. In short, I think this task is different, and I added this comment to separate these parts. I'm not sure I understand your comment correctly; do you mean HandleLogQueryPlanInterrupt() should not be placed in explain.c? It may be so. However, given that ProcessLogMemoryContextInterrupt(), which similarly handles interrupts for pg_log_backend_memory_contexts(), is located in mcxt.c, I also think the current location might be acceptable. -- Regards, -- Atsushi Torikoshi NTT DATA CORPORATION
Re: why can't a table be part of the same publication as its schema
> On Sep 22, 2022, at 8:02 AM, Alvaro Herrera wrote: > > FWIW I put this to CI: > https://cirrus-ci.com/build/5823276948652032 (master) > > and everything appears to be OK. If anybody has reservations about this > grammar change, please speak up soon, as there's not much time before RC1. > > The one for 15 just started running: > https://cirrus-ci.com/build/4735322423558144 [personal hat, not RMT] Looks like it passed. No objections. Jonathan
Re: Fix snapshot name for SET TRANSACTION documentation
On Thu, 22 Sep 2022 at 12:00, Fujii Masao wrote: > On 2022/09/21 14:40, Fujii Masao wrote: >> On 2022/09/21 12:01, Japin Li wrote: >>> >>> Hi hackers, >>> >>> In 6c2003f8a1bbc7c192a2e83ec51581c018aa162f, we changed the snapshot name >>> when exporting a snapshot; however, there is one place where we missed updating the >>> snapshot name in the documentation. Attached is a patch to fix it. >> Thanks for the patch! Looks good to me. > > Pushed. Thanks! > Thanks! -- Regards, Japin Li. ChengDu WenWu Information Technology Co.,Ltd.
Re: Pruning never visible changes
On Mon, 19 Sept 2022 at 00:16, Greg Stark wrote: > > On Fri, 16 Sept 2022 at 10:27, Tom Lane wrote: > > > > Simon Riggs writes: > > > A user asked me whether we prune never visible changes, such as > > > BEGIN; > > > INSERT... > > > UPDATE.. (same row) > > > COMMIT; > > > > Didn't we just have this discussion in another thread? > > Well. not "just" :) > > commit 44e4bbf75d56e643b6afefd5cdcffccb68cce414 > Author: Tom Lane > Date: Fri Apr 29 16:29:42 2011 -0400 > > Remove special case for xmin == xmax in HeapTupleSatisfiesVacuum(). > > VACUUM was willing to remove a committed-dead tuple immediately if it was > deleted by the same transaction that inserted it. The idea is that such a > tuple could never have been visible to any other transaction, so we don't > need to keep it around to satisfy MVCC snapshots. However, there was > already an exception for tuples that are part of an update chain, and this > exception created a problem: we might remove TOAST tuples (which are never > part of an update chain) while their parent tuple stayed around (if it was > part of an update chain). This didn't pose a problem for most things, > since the parent tuple is indeed dead: no snapshot will ever consider it > visible. But MVCC-safe CLUSTER had a problem, since it will try to copy > RECENTLY_DEAD tuples to the new table. It then has to copy their TOAST > data too, and would fail if VACUUM had already removed the toast tuples. > > Easiest fix is to get rid of the special case for xmin == xmax. This may > delay reclaiming dead space for a little bit in some cases, but it's by > far > the most reliable way to fix the issue. > > Per bug #5998 from Mark Reid. Back-patch to 8.3, which is the oldest > version with MVCC-safe CLUSTER. Good research Greg, thank you. Only took 10 years for me to notice it was gone ;-) -- Simon Riggshttp://www.EnterpriseDB.com/
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Thu, Sep 22, 2022 at 1:26 PM Masahiko Sawada wrote: > > On Thu, Sep 22, 2022 at 1:46 PM John Naylor > wrote: > > While on the subject, I wonder how important it is to keep the chunks in the small nodes in sorted order. That adds branches and memmove calls, and is the whole reason for the recent "pg_lfind_ge" function. > > Good point. While keeping the chunks in the small nodes in sorted > order is useful for visiting all keys in sorted order, additional > branches and memmove calls could be slow. Right, the ordering is a property that some users will need, so best to keep it. Although the node128 doesn't have that property -- too slow to do so, I think. -- John Naylor EDB: http://www.enterprisedb.com
Re: why can't a table be part of the same publication as its schema
FWIW I put this to CI: https://cirrus-ci.com/build/5823276948652032 (master) and everything appears to be OK. If anybody has reservations about this grammar change, please speak up soon, as there's not much time before RC1. The one for 15 just started running: https://cirrus-ci.com/build/4735322423558144 -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Early and provident fear is the mother of safety" (E. Burke)
Re: archive modules
On 17.09.22 11:49, Peter Eisentraut wrote: On 14.09.22 23:09, Nathan Bossart wrote: On Wed, Sep 14, 2022 at 09:31:04PM +0200, Peter Eisentraut wrote: Here is a patch that addresses this. My intent was to present archive_command as the built-in archive library, but I can see how this might cause confusion, so this change seems reasonable to me. While working on this, I noticed that in master this conflicts with commit 3cabe45a819f8a2a282d9d57e45f259c84e97c3f. I have posted a message in that thread looking for a resolution. I have received clarification there, so I went ahead with this patch here after some adjustments in master around that other patch.
Re: [PoC] Let libpq reject unexpected authentication requests
On 22.09.22 01:37, Jacob Champion wrote: On Wed, Sep 21, 2022 at 3:36 PM Peter Eisentraut wrote: So let's look at the two TODO comments you have: * TODO: how should !auth_required interact with an incomplete * SCRAM exchange? What specific combination of events are you thinking of here? Let's say the client is using `require_auth=!password`. If the server starts a SCRAM exchange but doesn't finish it, the connection will still succeed with the current implementation (because it satisfies the "none" case). This is also true for a client setting of `require_auth=scram-sha-256,none`. I think this is potentially dangerous, but it mirrors the current behavior of libpq and I'm not sure that we should change it as part of this patch. It might be worth reviewing that behavior for other reasons, but I think the semantics of your patch are correct. In any case, it seems to me that it would be safer to *not* make this assumption at first and then have someone more knowledgeable make the argument that it would be safe. I think I'm okay with that, regardless. Here's one of the wrinkles: right now, both of the following connstrings work: require_auth=gss gssencmode=require require_auth=gss gssencmode=prefer If we don't treat gssencmode as providing GSS auth, then the first case will always fail, because there will be no GSS authentication packet over an encrypted connection. Likewise, the second case will almost always fail, unless the server doesn't support gssencmode at all (so why are you using prefer?). If you're okay with those limitations, I will rip out the code. The reason I'm not too worried about it is, I don't think it makes much sense to be strict about your authentication requirements while at the same time leaving the choice of transport encryption up to the server. The way I understand what you explained here is that it would be more sensible to leave that code in. I would be okay with that.
Re: pg_basebackup's --gzip switch misbehaves
On Thu, Sep 22, 2022 at 12:47:34AM -0400, Tom Lane wrote: > Sure. I'd say we have 48 hours to choose whether to put this in v15. > But not more than that. I have a window to be able to look at the buildfarm today, tomorrow being harder, so I have adjusted that now on both HEAD and REL_15_STABLE for consistency. -- Michael signature.asc Description: PGP signature
RE: Perform streaming logical transactions by background workers and parallel apply
Hi Amit, > > I checked other flags that are set by signal handlers; their datatype > > seemed to be sig_atomic_t. > > Is there any reason that you use a normal bool? It should be changed if not. > > > > It follows logic similar to ParallelMessagePending. Do you see any > problem with it? Hmm, one consideration is: what will happen if the signal handler HandleParallelApplyMessageInterrupt() fires while the main code is executing "ParallelApplyMessagePending = false;"? IIUC sig_atomic_t is needed to avoid the handler and the main code writing to the same data at the same time. According to the C99 standard (I grepped the draft version [1]), the behavior seems to be undefined if the signal handler refers to data that is not declared "volatile sig_atomic_t". ...But I'm not sure whether this is really problematic in the current system, sorry... ``` If the signal occurs other than as the result of calling the abort or raise function, the behavior is undefined if the signal handler refers to any object with static storage duration other than by assigning a value to an object declared as volatile sig_atomic_t, or the signal handler calls any function in the standard library other than the abort function, the _Exit function, or the signal function with the first argument equal to the signal number corresponding to the signal that caused the invocation of the handler. ``` > > a. > > I was not sure when the cell should be cleaned. Currently we clean up > ParallelApplyWorkersList() only in parallel_apply_start_worker(), > > but we have chances to remove such a cell in HandleParallelApplyMessages() > or HandleParallelApplyMessage(). What do you think? > > > > Note that HandleParallelApply* are invoked during interrupt handling, > so it may not be advisable to remove it there. > > > > > 12. ConfigureNamesInt > > > > ``` > + { > + {"max_parallel_apply_workers_per_subscription", > + PGC_SIGHUP, > + REPLICATION_SUBSCRIBERS, > + gettext_noop("Maximum number of parallel apply > workers per subscription."), > + NULL, > + }, > + _parallel_apply_workers_per_subscription, > + 2, 0, MAX_BACKENDS, > + NULL, NULL, NULL > + }, > ``` > > This parameter can be changed by pg_ctl reload, so the following corner case > may occur. > Should we add an assign hook to handle this? Or, can we ignore it? > > > > I think we can ignore this as it will eventually start respecting the > threshold. What both of you said sounds reasonable to me. [1]: https://www.open-std.org/JTC1/SC22/WG14/www/docs/n1256.pdf Best Regards, Hayato Kuroda FUJITSU LIMITED
Re: cannot to compile master in llvm_resolve_symbols
On Thu, 22 Sep 2022 at 12:54, Thomas Munro wrote: > On Thu, Sep 22, 2022 at 10:43 PM Pavel Stehule > wrote: > > Today I found new bug > > > > -o llvmjit_wrap.o llvmjit_wrap.cpp -MMD -MP -MF .deps/llvmjit_wrap.Po > > llvmjit.c: In function ‘llvm_resolve_symbols’: > > llvmjit.c:1115:57: error: ‘LLVMJITCSymbolMapPair’ undeclared (first use in this function); did you mean ‘LLVMOrcCSymbolMapPair’? > > 1115 | LLVMOrcCSymbolMapPairs symbols = > palloc0(sizeof(LLVMJITCSymbolMapPair) * LookupSetSize); > > | > ^ > > | > LLVMOrcCSymbolMapPair > > Hi Pavel, > > Some changes are needed for LLVM 15. I'm working on a patch, but it's > not quite ready yet. Use LLVM 14 for now. There are a few > superficial changes like that that are very easy to fix (that struct's > name changed), but the real problem is that in LLVM 15 you have to do > extra work to track the type of pointers and pass them into API calls > that we have a lot of. https://llvm.org/docs/OpaquePointers.html Thank you for the info Regards Pavel
Re: cannot to compile master in llvm_resolve_symbols
On Thu, Sep 22, 2022 at 10:43 PM Pavel Stehule wrote: > Today I found new bug > > -o llvmjit_wrap.o llvmjit_wrap.cpp -MMD -MP -MF .deps/llvmjit_wrap.Po > llvmjit.c: In function ‘llvm_resolve_symbols’: > llvmjit.c:1115:57: error: ‘LLVMJITCSymbolMapPair’ undeclared (first use in > this function); did you mean ‘LLVMOrcCSymbolMapPair’? > 1115 | LLVMOrcCSymbolMapPairs symbols = > palloc0(sizeof(LLVMJITCSymbolMapPair) * LookupSetSize); > | > ^ > | > LLVMOrcCSymbolMapPair Hi Pavel, Some changes are needed for LLVM 15. I'm working on a patch, but it's not quite ready yet. Use LLVM 14 for now. There are a few superficial changes like that that are very easy to fix (that struct's name changed), but the real problem is that in LLVM 15 you have to do extra work to track the type of pointers and pass them into API calls that we have a lot of. https://llvm.org/docs/OpaquePointers.html
cannot to compile master in llvm_resolve_symbols
Hi Today I found new bug -o llvmjit_wrap.o llvmjit_wrap.cpp -MMD -MP -MF .deps/llvmjit_wrap.Po llvmjit.c: In function ‘llvm_resolve_symbols’: llvmjit.c:1115:57: error: ‘LLVMJITCSymbolMapPair’ undeclared (first use in this function); did you mean ‘LLVMOrcCSymbolMapPair’? 1115 | LLVMOrcCSymbolMapPairs symbols = palloc0(sizeof(LLVMJITCSymbolMapPair) * LookupSetSize); | ^ | LLVMOrcCSymbolMapPair llvmjit.c:1115:57: note: each undeclared identifier is reported only once for each function it appears in llvmjit.c: In function ‘llvm_create_jit_instance’: llvmjit.c:1233:19: error: too few arguments to function ‘LLVMOrcCreateCustomCAPIDefinitionGenerator’ 1233 | ref_gen = LLVMOrcCreateCustomCAPIDefinitionGenerator(llvm_resolve_symbols, NULL); | ^~ In file included from llvmjit.c:22: /usr/include/llvm-c/Orc.h:997:31: note: declared here 997 | LLVMOrcDefinitionGeneratorRef LLVMOrcCreateCustomCAPIDefinitionGenerator( | ^~ I have fedora 37 Regards Pavel
Re: Perform streaming logical transactions by background workers and parallel apply
On Thu, Sep 22, 2022 at 8:59 AM wangw.f...@fujitsu.com wrote: > Few comments on v33-0001 === 1. + else if (data->streaming == SUBSTREAM_PARALLEL && + data->protocol_version < LOGICALREP_PROTO_STREAM_PARALLEL_VERSION_NUM) + ereport(ERROR, + (errcode(ERRCODE_INVALID_PARAMETER_VALUE), + errmsg("requested proto_version=%d does not support streaming=parallel mode, need %d or higher", + data->protocol_version, LOGICALREP_PROTO_STREAM_PARALLEL_VERSION_NUM))); I think we can improve this error message as: "requested proto_version=%d does not support parallel streaming mode, need %d or higher". 2. --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -3184,7 +3184,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i OID of the relation that the worker is synchronizing; null for the - main apply worker + main apply worker and the apply parallel worker This and other changes in monitoring.sgml refer to the workers as "apply parallel worker". Isn't it better to use parallel apply worker, as we do at other places in the patch? But, I have another question: do we really need to display entries for parallel apply workers in pg_stat_subscription if they don't have any meaningful information? I think we can easily avoid it in pg_stat_get_subscription by checking apply_leader_pid. 3. ApplyWorkerMain() { ... ... + + if (server_version >= 16 && + MySubscription->stream == SUBSTREAM_PARALLEL) + options.proto.logical.streaming = pstrdup("parallel"); After deciding here whether the parallel streaming mode is enabled or not, we recheck the same thing in apply_handle_stream_abort() and parallel_apply_can_start(). In parallel_apply_can_start(), we do it via two different checks. How about storing this information, say in the MyLogicalRepWorker structure, in ApplyWorkerMain() and then using it at other places? -- With Regards, Amit Kapila.
Re: Refactor backup related code (was: Is it correct to say, "invalid data in file \"%s\"", BACKUP_LABEL_FILE in do_pg_backup_stop?)
On 2022/09/22 16:43, Michael Paquier wrote: Added that part before pg_backup_stop() now where it makes sense with the refactoring. I have put my hands on 0001, and finished with the attached, that includes many fixes and tweaks. Some of the variable renames felt out of place, while some comments were overly verbose for their purpose, though for the last part we did not lose any information in the last version proposed. Thanks for updating the patch! This looks better to me. + MemSet(backup_state, 0, sizeof(BackupState)); + MemSet(backup_state->name, '\0', sizeof(backup_state->name)); The latter MemSet() is not necessary because the former already zeroes that field, is it? + pfree(tablespace_map); + tablespace_map = NULL; + } + + tablespace_map = makeStringInfo(); tablespace_map doesn't need to be reset to NULL here. /* Free structures allocated in TopMemoryContext */ - pfree(label_file->data); - pfree(label_file); + pfree(backup_label->data); + pfree(backup_label); + backup_label = NULL; This source comment is a bit misleading, isn't it? The memory for backup_label is allocated under a memory context other than TopMemoryContext. +#include "access/xlog.h" +#include "access/xlog_internal.h" +#include "access/xlogbackup.h" +#include "utils/memutils.h" It seems "utils/memutils.h" doesn't need to be included. + XLByteToSeg(state->startpoint, stopsegno, wal_segment_size); + XLogFileName(stopxlogfile, state->starttli, stopsegno, wal_segment_size); + appendStringInfo(result, "STOP WAL LOCATION: %X/%X (file %s)\n", + LSN_FORMAT_ARGS(state->startpoint), stopxlogfile); state->stoppoint and state->stoptli should be used instead of state->startpoint and state->starttli? + pfree(tablespace_map); + tablespace_map = NULL; + pfree(backup_state); + backup_state = NULL; It's harmless to set tablespace_map and backup_state to NULL after pfree(), but it's also unnecessary, at least here.
Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: Perform streaming logical transactions by background workers and parallel apply
On Thu, Sep 22, 2022 at 1:37 PM kuroda.hay...@fujitsu.com wrote: > > === > applyparallelworker.c > > 03. declaration > > ``` > +/* > + * Is there a message pending in parallel apply worker which we need to > + * receive? > + */ > +volatile bool ParallelApplyMessagePending = false; > ``` > > I checked other flags that are set by signal handlers; their datatype seemed > to be sig_atomic_t. > Is there any reason that you use a normal bool? It should be changed if not. > It follows logic similar to ParallelMessagePending. Do you see any problem with it? > 04. HandleParallelApplyMessages() > > ``` > + if (winfo->error_mq_handle == NULL) > + continue; > ``` > > a. > I was not sure when the cell should be cleaned. Currently we clean up > ParallelApplyWorkersList() only in parallel_apply_start_worker(), > but we have chances to remove such a cell in HandleParallelApplyMessages() > or HandleParallelApplyMessage(). What do you think? > Note that HandleParallelApply* are invoked during interrupt handling, so it may not be advisable to remove it there. > > 12. ConfigureNamesInt > > ``` > + { > + {"max_parallel_apply_workers_per_subscription", > + PGC_SIGHUP, > + REPLICATION_SUBSCRIBERS, > + gettext_noop("Maximum number of parallel apply > workers per subscription."), > + NULL, > + }, > + _parallel_apply_workers_per_subscription, > + 2, 0, MAX_BACKENDS, > + NULL, NULL, NULL > + }, > ``` > > This parameter can be changed by pg_ctl reload, so the following corner case > may occur. > Should we add an assign hook to handle this? Or, can we ignore it? > I think we can ignore this as it will eventually start respecting the threshold. -- With Regards, Amit Kapila.
RE: Perform streaming logical transactions by background workers and parallel apply
On Thursday, September 22, 2022 4:08 PM Kuroda, Hayato/黒田 隼人 wrote: > > Thanks for updating the patch! The following are comments for v33-0001. Thanks for the comments. > 04. HandleParallelApplyMessages() > > ``` > + if (winfo->error_mq_handle == NULL) > + continue; > ``` > > a. > I was not sure when the cell should be cleaned. Currently we clean up > ParallelApplyWorkersList() only in parallel_apply_start_worker(), but we > have chances to remove such a cell in HandleParallelApplyMessages() or > HandleParallelApplyMessage(). What do you think? The HandleParallelApplyxx functions are signal callback functions, so I think it is unsafe for them to clean up list cells that may still be in use at the point the signal callback is entered. > > 05. parallel_apply_setup_worker() > > ``` > + if (launched) > + { > + ParallelApplyWorkersList = lappend(ParallelApplyWorkersList, > winfo); > + } > ``` > > {} should be removed. I think this style is fine, and it was also suggested to keep it consistent with the else{} part. > > 06. parallel_apply_wait_for_xact_finish() > > ``` > + /* If any workers have died, we have failed. */ > ``` > > This function only checks a single parallel apply worker, so the comment > should be "if worker has..."? The comment seems clear to me, as it's a general comment. Best regards, Hou zj
Re: [RFC] building postgres with meson - v13
On 2022-09-22 Th 01:57, Andres Freund wrote: > Hi, > > On 2022-09-21 09:46:30 -0700, Andres Freund wrote: >> I'm planning to commit this today, unless somebody wants to argue against >> that. > And done! > > Changes: > - fixed a few typos (thanks Thomas) > - less duplication in the CI tasks > - removed an incomplete implementation of the target for abbrevs.txt - do we > even want to have that? > - plenty hand wringing on my part > > > I also rebased my meson git tree, which still has plenty additional test > platforms (netbsd, openbsd, debian sid, fedora rawhide, centos 8, centos 7, > opensuse tumbleweed), but without the autoconf versions of those targets. I > also added a commit that translates most of the CompilerWarnings task to > meson. Still need to add a headerscheck / cpluspluscheck target. > Great. Now I'll start on buildfarm support. Given my current commitments, this will take me a while, but I hope to have a working client by about the beginning of November. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
RE: Perform streaming logical transactions by background workers and parallel apply
Dear Wang, Thanks for updating the patch! The following are comments for v33-0001. === libpqwalreceiver.c 01. inclusion ``` +#include "catalog/pg_subscription.h" ``` We don't have to include it because the analysis of parameters is done at the caller. === launcher.c 02. logicalrep_worker_launch() ``` + /* +* Return silently if the number of parallel apply workers reached the +* limit per subscription. +*/ + if (is_subworker && nparallelapplyworkers >= max_parallel_apply_workers_per_subscription) ``` a. I felt that it might be helpful if we output some debug messages. b. The if statement seems to be more than 80 characters. You can move to a new line around "nparallelapplyworkers >= ...". === applyparallelworker.c 03. declaration ``` +/* + * Is there a message pending in parallel apply worker which we need to + * receive? + */ +volatile bool ParallelApplyMessagePending = false; ``` I checked other flags that are set by signal handlers, and their datatype seemed to be sig_atomic_t. Is there any reason you use a normal bool? It should be changed if not. 04. HandleParallelApplyMessages() ``` + if (winfo->error_mq_handle == NULL) + continue; ``` a. I was not sure when the cell should be cleaned. Currently we clean up ParallelApplyWorkersList() only in the parallel_apply_start_worker(), but we have chances to remove such a cell like HandleParallelApplyMessages() or HandleParallelApplyMessage(). What do you think? b. Comments should be added even if we keep this, like "exited worker, skipped". ``` + else + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), +errmsg("lost connection to the leader apply worker"))); ``` c. This function is called on the leader apply worker, so the hint should be "lost connection to the parallel apply worker". 05. parallel_apply_setup_worker() ``` + if (launched) + { + ParallelApplyWorkersList = lappend(ParallelApplyWorkersList, winfo); + } ``` {} should be removed. 06.
parallel_apply_wait_for_xact_finish() ``` + /* If any workers have died, we have failed. */ ``` This function checks only one parallel apply worker, so the comment should be "if worker has..."? === worker.c 07. handle_streamed_transaction() ``` + * For non-streamed transactions, returns false; ``` "returns false;" -> "returns false" apply_handle_commit_prepared(), apply_handle_abort_prepared() These functions are not expected to be called by a parallel worker, so I think Assert() should be added. 08. UpdateWorkerStats() ``` -static void +void UpdateWorkerStats(XLogRecPtr last_lsn, TimestampTz send_time, bool reply) ``` This function is called only in worker.c, so it should be static. 09. subscription_change_cb() ``` -static void +void subscription_change_cb(Datum arg, int cacheid, uint32 hashvalue) ``` This function is called only in worker.c, so it should be static. 10. InitializeApplyWorker() ``` +/* + * Initialize the database connection, in-memory subscription and necessary + * config options. + */ void -ApplyWorkerMain(Datum main_arg) +InitializeApplyWorker(void) ``` Some comments should be added noting that this is a common part of the leader and parallel apply workers. === logicalrepworker.h 11. declaration ``` extern PGDLLIMPORT volatile bool ParallelApplyMessagePending; ``` Please refer to the above comment. === guc_tables.c 12. ConfigureNamesInt ``` + { + {"max_parallel_apply_workers_per_subscription", + PGC_SIGHUP, + REPLICATION_SUBSCRIBERS, + gettext_noop("Maximum number of parallel apply workers per subscription."), + NULL, + }, + &max_parallel_apply_workers_per_subscription, + 2, 0, MAX_BACKENDS, + NULL, NULL, NULL + }, ``` This parameter can be changed by pg_ctl reload, so the following corner case may occur. Should we add an assign hook to handle this? Or, can we ignore it? 1. set max_parallel_apply_workers_per_subscription to 4. 2. start replicating two streaming transactions. 3. commit transactions === Two parallel workers will remain === 4.
change max_parallel_apply_workers_per_subscription to 3 5. We expected that only one worker would remain, but two parallel workers remained. They will not be stopped until another streamed transaction is started and committed. Best Regards, Hayato Kuroda FUJITSU LIMITED
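[Editor's note: the reload corner case above can be illustrated with a toy model. This is not the patch's code; it is a hedged sketch of the behavior Amit describes elsewhere in the thread, namely that lowering the limit does not stop already-running workers, it is only enforced when a new launch is attempted.]

```c
#include <stdbool.h>

/* Toy model of a reloadable per-subscription worker limit. */
static int max_workers = 4;     /* stands in for the reloadable GUC */
static int running = 0;         /* workers currently alive */

/* Launch refuses silently when the current limit is reached,
 * mirroring the launcher-side check quoted in the review. */
static bool
try_launch_worker(void)
{
    if (running >= max_workers)
        return false;
    running++;
    return true;
}

static void
worker_exit(void)
{
    running--;
}
```

Running workers above the new limit linger until they exit on their own; subsequent launch attempts respect the new value, so the pool eventually converges, which is the argument for skipping an assign hook.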
RE: [EXTERNAL] Re: [PoC] Federated Authn/z with OAUTHBEARER
We can support both passing the token from an upstream client and libpq implementing the OAUTH2 protocol to obtain one. Libpq implementing OAUTHBEARER is needed for community/3rd-party tools to have a user-friendly authentication experience: 1. For community client tools, like pg_admin, psql etc. Example experience: pg_admin would be able to open a popup dialog to authenticate the customer and keep a refresh token to avoid asking the user frequently. 2. For 3rd-party connectors supporting generic OAUTH with any provider. Useful for data viz clients, like Tableau or ETL tools. Those can support both user and client OAUTH flows. Libpq passing the token directly from an upstream client is useful in other scenarios: 1. Enterprise clients, built with .Net / Java and using provider-specific authentication libraries, like MSAL for AAD. Those can also support more advanced provider-specific token acquisition flows. 2. Resource-constrained (like IoT) clients. Those can be compiled with the optional libpq flag off, not including the iddawc or other dependency. Thanks! Andrey. -Original Message- From: Jacob Champion Sent: Wednesday, September 21, 2022 9:03 AM To: mahendrakar s Cc: pgsql-hack...@postgresql.org; smilingsa...@gmail.com; and...@anarazel.de; Andrey Chudnovskiy ; Mahendrakar Srinivasarao Subject: [EXTERNAL] Re: [PoC] Federated Authn/z with OAUTHBEARER On Tue, Sep 20, 2022 at 4:19 PM Jacob Champion wrote: > > 2. Add support to pass on the OAuth bearer token. In this > > obtaining the bearer token is left to 3rd party application or user. > > > > ./psql -U -d 'dbname=postgres > > oauth_client_id= oauth_bearer_token= > > This hurts, but I think people are definitely going to ask for it, > given the frightening practice of copy-pasting these (incredibly > sensitive > secret) tokens all over the place...
After some further thought -- in this case, you already have an opaque Bearer token (and therefore you already know, out of band, which provider needs to be used), you're willing to copy-paste it from whatever service you got it from, and you have an extension plugged into Postgres on the backend that verifies this Bearer blob using some procedure that Postgres knows nothing about. Why do you need the OAUTHBEARER mechanism logic at that point? Isn't that identical to a custom password scheme? It seems like that could be handled completely by Samay's pluggable auth proposal. --Jacob
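[Editor's note: for context on what the OAUTHBEARER mechanism adds beyond a plain password exchange, the SASL client initial response defined in RFC 7628 is just a GS2 header plus 0x01-delimited key/value pairs wrapping the opaque Bearer token. A hedged sketch follows; the identifiers and token are placeholders, not values from this thread, and this is the wire framing only, not libpq code.]

```c
#include <stdio.h>
#include <string.h>

/*
 * Build an OAUTHBEARER client initial response per RFC 7628:
 * gs2-header, then key=value pairs, each terminated by 0x01 (^A),
 * with one extra trailing 0x01.  Returns the number of bytes that
 * would have been written (snprintf semantics).
 */
static int
build_oauthbearer_initial(char *buf, size_t buflen,
                          const char *authzid, const char *token)
{
    /* Adjacent string literals keep the \x01 escapes unambiguous. */
    return snprintf(buf, buflen,
                    "n,a=%s," "\x01" "auth=Bearer %s" "\x01" "\x01",
                    authzid, token);
}
```

The server never interprets the token itself; it hands the blob to a validator, which is why Jacob's point above holds: with a pre-obtained opaque token, the framing adds little over a custom password scheme.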
Re: Refactor backup related code (was: Is it correct to say, "invalid data in file \"%s\"", BACKUP_LABEL_FILE in do_pg_backup_stop?)
On Wed, Sep 21, 2022 at 05:10:39PM +0530, Bharath Rupireddy wrote: >> deallocate_backup_variables() is the only part of xlogbackup.c that >> includes references of the tablespace map and backup_label >> StringInfos. I would be tempted to fully decouple that from >> xlogbackup.c/h for the time being. > > There's no problem with it IMO, after all, they are backup related > variables. And that function reduces a bit of duplicate code. Hmm. I'd like to disagree with this statement :) > Added that part before pg_backup_stop() now where it makes sense with > the refactoring. I have put my hands on 0001, and finished with the attached, which includes many fixes and tweaks. Some of the variable renames felt out of place, while some comments were overly verbose for their purpose, though for the last part we did not lose any information from the last version proposed. As I suspected, the deallocate routine felt unnecessary, as xlogbackup.c/h have no idea what these are. The remark is particularly true for the StringInfo of the backup_label file: for basebackup.c we need to build it when sending base.tar, and in xlogfuncs.c we need it only at the backup stop phase. The code was actually a bit wrong, because freeing a StringInfo requires freeing its ->data and then the main object (stringinfo.h explains that). My tweaks have shaved off something like 10%~15% of the patch, while making it IMO more readable. A second issue I had was with the build function, and again it seemed much cleaner to let the routine do the makeStringInfo() and return the result. This is not the most popular routine ever, but this reduces the workload of the caller of build_backup_content(). So, opinions?
-- Michael From 22216c4b6b75607d45e49620264b1af606396bd4 Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Thu, 22 Sep 2022 16:41:55 +0900 Subject: [PATCH v11] Refactor backup related code Refactor backup related code; the advantages of doing so are as follows: 1) backup state is more structured now - all in a single structure, callers can create backup_label contents whenever required, either during the pg_backup_start or the pg_backup_stop or in between. 2) no parsing of backup_label file lines now, no error checking for invalid parsing. 3) backup_label and history file contents have most of the things in common, they can now be created within a single function. 4) makes backup related code extensible and readable. This introduces new source files xlogbackup.c/.h for backup related code and adds the new code in there. The xlog.c file has already grown to ~9000 LOC (as of this time). Eventually, we would want to move all the backup related code from xlog.c, xlogfuncs.c, elsewhere to here.
Author: Bharath Rupireddy Reviewed-by: Michael Paquier Reviewed-by: Fujii Masao Discussion: https://www.postgresql.org/message-id/CALj2ACVqNUEXkaMKyHHOdvScfN9E4LuCWsX_R-YRNfzQ727CdA%40mail.gmail.com --- src/include/access/xlog.h | 10 +- src/include/access/xlogbackup.h | 42 + src/backend/access/transam/Makefile | 1 + src/backend/access/transam/xlog.c | 224 src/backend/access/transam/xlogbackup.c | 81 + src/backend/access/transam/xlogfuncs.c | 96 ++ src/backend/backup/basebackup.c | 44 +++-- 7 files changed, 298 insertions(+), 200 deletions(-) create mode 100644 src/include/access/xlogbackup.h create mode 100644 src/backend/access/transam/xlogbackup.c diff --git a/src/include/access/xlog.h b/src/include/access/xlog.h index 3dbfa6b593..dce265098e 100644 --- a/src/include/access/xlog.h +++ b/src/include/access/xlog.h @@ -11,6 +11,7 @@ #ifndef XLOG_H #define XLOG_H +#include "access/xlogbackup.h" #include "access/xlogdefs.h" #include "access/xlogreader.h" #include "datatype/timestamp.h" @@ -277,11 +278,10 @@ typedef enum SessionBackupState SESSION_BACKUP_RUNNING, } SessionBackupState; -extern XLogRecPtr do_pg_backup_start(const char *backupidstr, bool fast, - TimeLineID *starttli_p, StringInfo labelfile, - List **tablespaces, StringInfo tblspcmapfile); -extern XLogRecPtr do_pg_backup_stop(char *labelfile, bool waitforarchive, - TimeLineID *stoptli_p); +extern void do_pg_backup_start(const char *backupidstr, bool fast, + List **tablespaces, BackupState *state, + StringInfo tblspcmapfile); +extern void do_pg_backup_stop(BackupState *state, bool waitforarchive); extern void do_pg_abort_backup(int code, Datum arg); extern void register_persistent_abort_backup_handler(void); extern SessionBackupState get_backup_status(void); diff --git a/src/include/access/xlogbackup.h b/src/include/access/xlogbackup.h new file mode 100644 index 00..ccdfefe117 --- /dev/null +++ b/src/include/access/xlogbackup.h @@ -0,0 +1,42 @@ +/*- + * + * xlogbackup.h + * Definitions for internals of base 
backups. + * + * Portions Copyright (c)
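[Editor's note: the destruction-order point in Michael's message, that a StringInfo's ->data buffer must be freed before the object itself, can be sketched with a minimal stand-in struct. This is illustrative only, not the real stringinfo.h API.]

```c
#include <stdlib.h>

/*
 * Minimal stand-in for a StringInfo-like object: a struct that owns a
 * separately allocated buffer.  Freeing only the struct would leak the
 * buffer, so ->data must be released first, then the object, which is
 * the ordering stringinfo.h documents for the real type.
 */
typedef struct DemoStringInfo
{
    char   *data;
    int     len;
    int     maxlen;
} DemoStringInfo;

static DemoStringInfo *
make_demo_string_info(void)
{
    DemoStringInfo *si = malloc(sizeof(DemoStringInfo));

    si->maxlen = 1024;
    si->data = malloc(si->maxlen);
    si->data[0] = '\0';
    si->len = 0;
    return si;
}

static void
destroy_demo_string_info(DemoStringInfo *si)
{
    free(si->data);             /* the buffer first */
    free(si);                   /* then the object itself */
}
```

Having the build routine allocate and return the object (as with build_backup_content() returning a makeStringInfo() result) keeps this pairing in one place instead of spreading it across callers.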
Re: Multi-insert related comment in CopyFrom()
On Wed, Sep 21, 2022 at 4:39 PM Etsuro Fujita wrote: > While working on the “Fast COPY FROM based on batch insert” patch, I > noticed this: > > else if (proute != NULL && resultRelInfo->ri_TrigDesc != NULL && > resultRelInfo->ri_TrigDesc->trig_insert_new_table) > { > /* > * For partitioned tables we can't support multi-inserts when there > * are any statement level insert triggers. It might be possible to > * allow partitioned tables with such triggers in the future, but for > * now, CopyMultiInsertInfoFlush expects that any before row insert > * and statement level insert triggers are on the same relation. > */ > insertMethod = CIM_SINGLE; > } > > I think there is a thinko in the comment; “before” should be after. > Patch attached. Pushed. Best regards, Etsuro Fujita
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Thu, Sep 22, 2022 at 1:46 PM John Naylor wrote: > > > On Thu, Sep 22, 2022 at 1:01 AM Nathan Bossart > wrote: > > > > On Wed, Sep 21, 2022 at 01:17:21PM +0700, John Naylor wrote: > > > > > In short, this code needs to be lower level so that we still have full > > > control while being portable. I will work on this, and also the related > > > code for node dispatch. > > > > Is it possible to use approach #2 here, too? AFAICT space is allocated for > > all of the chunks, so there wouldn't be any danger in searching all them > > and discarding any results >= node->count. > > Sure, the caller could pass the maximum node capacity, and then check if the > returned index is within the range of the node count. > > > Granted, we're depending on the > > number of chunks always being a multiple of elements-per-vector in order to > > avoid the tail path, but that seems like a reasonably safe assumption that > > can be covered with comments. > > Actually, we don't need to depend on that at all. When I said "junk" above, > that can be any bytes, as long as we're not reading off the end of allocated > memory. We'll never do that here, since the child pointers/values follow. In > that case, the caller can hard-code the size (it would even happen to work > now to multiply rt_node_kind by 16, to be sneaky). One thing I want to try > soon is storing fewer than 16/32 etc entries, so that the whole node fits > comfortably inside a power-of-two allocation. That would allow us to use aset > without wasting space for the smaller nodes, which would be faster and > possibly would solve the fragmentation problem Andres referred to in > > https://www.postgresql.org/message-id/20220704220038.at2ane5xkymzzssb%40awork3.anarazel.de > > While on the subject, I wonder how important it is to keep the chunks in the > small nodes in sorted order. That adds branches and memmove calls, and is the > whole reason for the recent "pg_lfind_ge" function. Good point. 
While keeping the chunks in the small nodes in sorted order is useful for visiting all keys in sorted order, additional branches and memmove calls could be slow. Regards, -- Masahiko Sawada PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
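[Editor's note: the fixed-capacity search strategy discussed above, scan every slot and then discard hits at or beyond node->count, can be sketched in scalar C; a SIMD version would replace the loop body but keep the same bounds logic. Type and field names are illustrative, loosely modeled on the thread, not taken from the patch.]

```c
#include <stdint.h>

#define NODE_CAPACITY 16

/*
 * Illustrative small radix-tree node: only the first 'count' chunk
 * slots hold valid entries; the rest may contain stale "junk" bytes,
 * which is harmless as long as we never read past the allocation.
 */
typedef struct DemoNode16
{
    uint8_t count;
    uint8_t chunks[NODE_CAPACITY];
} DemoNode16;

/*
 * Scan all NODE_CAPACITY slots (a fixed trip count with no data-
 * dependent early exit is what makes this amenable to vectorization),
 * then reject any match that falls at or beyond node->count.  With
 * unsorted insertion, valid entries always precede junk, so the first
 * hit decides the result.
 */
static int
demo_node_16_search(const DemoNode16 *node, uint8_t chunk)
{
    for (int i = 0; i < NODE_CAPACITY; i++)
    {
        if (node->chunks[i] == chunk)
            return (i < node->count) ? i : -1;
    }
    return -1;
}
```

Dropping the sorted-order invariant removes the memmove on insert and the need for a "find first greater-or-equal" primitive, at the cost of losing cheap in-order traversal within a node, which is the trade-off weighed above.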