Re: unnecessary #include "pg_getopt.h"?
On Tue, May 23, 2023 at 06:37:59PM -0700, Andres Freund wrote:
> This feels more like a matter of taste to me than anything.

Yup, it is.

> At least some of the files touched in the patch use optarg, opterr
> etc. - which are declared in pg_getopt.h. Making it reasonable to
> directly include pg_getopt.h.

getopt_long.h is included in 21 places in src/bin/, with all of them touching optarg, while only four of them include pg_getopt.h. So removing these includes as suggested makes sense. I agree that this is a matter of taste as well; still, there is also a matter of consistency.
-- 
Michael
Re: ALTER TABLE SET ACCESS METHOD on partitioned tables
On Mon, Apr 24, 2023 at 07:18:54PM -0500, Justin Pryzby wrote:
> On Thu, Mar 30, 2023 at 12:07:58AM -0500, Justin Pryzby wrote:
>> Actually .. I think it'd be a mistake if the relam needed to be
>> interpreted differently for partitioned tables than other relkinds.
>>
>> I updated the patch to allow intermediate partitioned tables to inherit
>> relam from their parent.
>
> Michael ?

Sorry for dropping the subject for so long. I have spent some time looking at the patch. Here are a few comments.

pg_class.h includes the following:

/*
 * Relation kinds that support tablespaces: All relation kinds with storage
 * support tablespaces, except that we don't support moving sequences around
 * into different tablespaces. Partitioned tables and indexes don't have
 * physical storage, but they have a tablespace settings so that their
 * children can inherit it.
 */
#define RELKIND_HAS_TABLESPACE(relkind) \
	((RELKIND_HAS_STORAGE(relkind) || RELKIND_HAS_PARTITIONS(relkind)) \
	 && (relkind) != RELKIND_SEQUENCE)
[...]
/*
 * Relation kinds with a table access method (rd_tableam). Although sequences
 * use the heap table AM, they are enough of a special case in most uses that
 * they are not included here.
 */
#define RELKIND_HAS_TABLE_AM(relkind) \
	((relkind) == RELKIND_RELATION || \
	 (relkind) == RELKIND_TOASTVALUE || \
	 (relkind) == RELKIND_MATVIEW)

It would look much more consistent with the tablespace case if we included partitioned tables here, but this refers to rd_tableam for the relcache, which we surely don't want to fill for partitioned tables. I guess that at minimum a comment is in order? RELKIND_HAS_TABLE_AM() is much more widespread than RELKIND_HAS_TABLESPACE().

 	 * No need to add an explicit dependency for the toast table, as the
 	 * main table depends on it.
 	 */
-	if (RELKIND_HAS_TABLE_AM(relkind) && relkind != RELKIND_TOASTVALUE)
+	if ((RELKIND_HAS_TABLE_AM(relkind) && relkind != RELKIND_TOASTVALUE) ||
+		relkind == RELKIND_PARTITIONED_TABLE)

The comment at the top of this code block needs an update.

 	if (stmt->accessMethod != NULL)
+		accessMethodId = get_table_am_oid(stmt->accessMethod, false);
 	else if (stmt->partbound &&
+			 (RELKIND_HAS_TABLE_AM(relkind) || relkind == RELKIND_PARTITIONED_TABLE))
 	{
+		/*
+		 * For partitions, if no access method is specified, default to the AM
+		 * of the parent table.
+		 */
+		Assert(list_length(inheritOids) == 1);
+		accessMethodId = get_rel_relam(linitial_oid(inheritOids));
+		if (!OidIsValid(accessMethodId))
+			accessMethodId = get_table_am_oid(default_table_access_method, false);
 	}
+	else if (RELKIND_HAS_TABLE_AM(relkind) || relkind == RELKIND_PARTITIONED_TABLE)
+		accessMethodId = get_table_am_oid(default_table_access_method, false);

This structure seems a bit weird. Could it be cleaner to group the second and third blocks together? I would imagine:

if (accessMethod != NULL)
{
    // Extract the AM defined in the statement
}
else
{
    // This is a relkind that can use a default table AM.
    if (RELKIND_HAS_TABLE_AM(relkind) || relkind == RELKIND_PARTITIONED_TABLE)
    {
        if (stmt->partbound)
        {
            // This is a partition, so look at what its partitioned
            // table holds.
        }
        else
        {
            // No partition, grab the default.
        }
    }
}

+	/*
+	 * Only do this for partitioned tables, for which this is just a
+	 * catalog change. Tables with storage are handled by Phase 3.
+	 */
+	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+		ATExecSetAccessMethodNoStorage(rel, tab->newAccessMethod);

Okay, there is a parallel with tablespaces in this logic. Speaking of which, ATExecSetAccessMethodNoStorage() does a catalog update even if ALTER TABLE is defined to use the same table AM as what is currently set. There is no need to update the relation's pg_class entry in this case. Be careful that InvokeObjectPostAlterHook() still needs to be checked in this case. Perhaps some tests should be added in test_oat_hooks.sql? I would be tempted to add a new SQL file for that.

+	else if (relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+	{
+		/* Do nothing: it's a catalog settings for partitions to inherit */
+	}

Actually, we could add an assertion checking that rd_rel->relam will always be valid.

-	if (RELKIND_HAS_TABLE_AM(tbinfo->relkind))
+	if (RELKIND_HAS_TABLE_AM(tbinfo->relkind) ||
+		tbinfo->relkind == RELKIND_PARTITIONED_TABLE)
 		tableam = tbinfo->amname;

I have spent some time pondering this particular change, concluding that it should be OK. It passes my tests as well.

+-- partition hierarchies
+-- upon ALTER, new children will inherit the new am, whereas the existing
+-- children will remain untouched
CREATE TABLE
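The regrouped branch structure suggested above can be sketched as a standalone mock. Everything here is illustrative: choose_access_method, the stub lookups, and the OID values are stand-ins, not the actual tablecmds.c code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for PostgreSQL's relkind codes and OID type. */
typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)
#define RELKIND_RELATION          'r'
#define RELKIND_PARTITIONED_TABLE 'p'
#define RELKIND_TOASTVALUE        't'
#define RELKIND_MATVIEW           'm'
#define RELKIND_HAS_TABLE_AM(relkind) \
	((relkind) == RELKIND_RELATION || \
	 (relkind) == RELKIND_TOASTVALUE || \
	 (relkind) == RELKIND_MATVIEW)

/* Mocked catalog lookups: 2 = "heap", 3 = AM set on a parent table. */
static Oid get_table_am_oid(const char *amname)
{
	return strcmp(amname, "heap") == 0 ? 2 : 4;
}
static Oid get_parent_relam(void)
{
	return 3;
}

/*
 * Regrouped structure: one outer branch for an explicit AM in the
 * statement, one for all relkinds that can take a default table AM.
 */
static Oid
choose_access_method(const char *stmt_am, bool partbound, char relkind)
{
	Oid accessMethodId = InvalidOid;

	if (stmt_am != NULL)
		accessMethodId = get_table_am_oid(stmt_am);	/* AM in the statement */
	else if (RELKIND_HAS_TABLE_AM(relkind) || relkind == RELKIND_PARTITIONED_TABLE)
	{
		if (partbound)
		{
			/* Partition: inherit from the partitioned table when set. */
			accessMethodId = get_parent_relam();
			if (accessMethodId == InvalidOid)
				accessMethodId = get_table_am_oid("heap");
		}
		else
			accessMethodId = get_table_am_oid("heap");	/* default AM */
	}
	return accessMethodId;
}
```

With this shape, the partition-vs-default decision only ever happens for relkinds that can have a table AM at all, which is the consistency point raised above.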
Re: Wrong results due to missing quals
On Thu, May 25, 2023 at 5:28 AM Tom Lane wrote: > I tried this and it seems to work all right: it fixes the example > you showed while not causing any new failures. (Doesn't address > the broken join-removal logic you showed in the other thread, > though.) > > While at it, I also changed make_restrictinfo to treat has_clone > and is_clone as first-class citizens, to fix the dubious coding in > equivclass.c that I mentioned at [1]. The "incompatible_relids" idea is a stroke of genius. I reviewed the patch and did not find any problem. So big +1 to the patch. Thanks Richard
Re: PG 16 draft release notes ready
On Thu, 25 May 2023 at 05:45, Bruce Momjian wrote: > > On Wed, May 24, 2023 at 01:43:50PM -0400, Bruce Momjian wrote: > > > * Reduce palloc() memory overhead for all memory allocations down to 8 > > > bytes on all platforms. (Andres Freund, David Rowley) > > > > > > This allows more efficient use of memory and is especially useful in > > > queries which perform operations (such as sorting or hashing) that > > > require more than work_mem. > > > > Well, this would go in the source code section, but it seems too > > internal and global to mention. > > What was the previous memory allocation overhead? On 64-bit builds, it was 16 bytes for AllocSet contexts, 24 bytes for generation contexts and 16 bytes for slab contexts. David
Re: PostgreSQL 16 Beta 1 release announcement draft
Hi, On 2023-05-24 23:30:58 -0400, Jonathan S. Katz wrote: > > Ah, OK, that's why I didn't grok it. I read through the first message > > in[1] and definitely agree it should be in the announcement. How about: > > > > "PostgreSQL 16 also shows up to a 300% improvement when concurrently > > loading data with `COPY`" > > I currently have it as the below in the release announcement. If it you send > any suggested updates, I can try to put them in before release: > > PostgreSQL 16 can also improve the performance of concurrent bulk loading of > data using [`COPY`](https://www.postgresql.org/docs/16/sql-copy.html) up to > a 300%. It also speeds up concurrent loading when not using COPY, just to a lesser degree. But I can't come up with a concise phrasing for that right now... Greetings, Andres Freund
Re: PostgreSQL 16 Beta 1 release announcement draft
On 5/24/23 11:30 PM, Jonathan S. Katz wrote: On 5/24/23 9:20 PM, Jonathan S. Katz wrote: I currently have it as the below in the release announcement. If you send any suggested updates, I can try to put them in before release: PostgreSQL 16 can also improve the performance of concurrent bulk loading of data using [`COPY`](https://www.postgresql.org/docs/16/sql-copy.html) up to a 300%. (without the "a 300%" typo). Jonathan
Re: PostgreSQL 16 Beta 1 release announcement draft
On 5/24/23 9:20 PM, Jonathan S. Katz wrote: On 5/24/23 8:04 PM, Andres Freund wrote: Hi, On 2023-05-24 19:57:39 -0400, Jonathan S. Katz wrote: On 5/24/23 5:28 PM, Andres Freund wrote: I think the relation extension improvements ought to be mentioned here as well? Up to 3x faster concurrent data load with COPY seems practically relevant. I missed that -- not sure I'm finding it in the release notes with a quick grep -- which commit/thread is this? It was split over quite a few commits, the one improving COPY most significantly is commit 00d1e02be24987180115e371abaeb84738257ae2 Author: Andres Freund Date: 2023-04-06 16:35:21 -0700 hio: Use ExtendBufferedRelBy() to extend tables more efficiently Relevant thread: https://postgr.es/m/20221029025420.eplyow6k7tgu6...@awork3.anarazel.de It's in the release notes as: Allow more efficient addition of heap and index pages (Andres Freund) Ah, OK, that's why I didn't grok it. I read through the first message in[1] and definitely agree it should be in the announcement. How about: "PostgreSQL 16 also shows up to a 300% improvement when concurrently loading data with `COPY`" I currently have it as the below in the release announcement. If you send any suggested updates, I can try to put them in before release: PostgreSQL 16 can also improve the performance of concurrent bulk loading of data using [`COPY`](https://www.postgresql.org/docs/16/sql-copy.html) up to a 300%. Jonathan
Re: Proposal: Removing 32 bit support starting from PG17++
On Thu, May 25, 2023 at 12:34 PM Tom Lane wrote: > Dunno about antique MIPS. I think there's still some interest in > not-antique 32-bit MIPS; I have some current-production routers > with such CPUs. (Sadly, they don't have enough storage to do > anything useful with, or I'd think about repurposing one for > buildfarm.) FWIW "development of the MIPS architecture has ceased"[1]. (Clearly there are living ISAs that continue either its spirit or its ... instructions, but they aren't calling themselves MIPS.) [1] https://en.wikipedia.org/wiki/MIPS_architecture
Re: PG 16 draft release notes ready
On Thu, May 25, 2023 at 08:31:29AM +0700, John Naylor wrote:
> On Wed, May 24, 2023 at 8:58 PM Bruce Momjian wrote:
>> Okay, items split into sections and several merged. I left the
>> CPU-specific parts in Source Code, and moved the rest into a merged item
>> in General Performance, but moved the JSON item to Data Types.
>
> It looks like it got moved to Functions actually?
>
>>> The last one refers to new internal functions, so it could stay in source code.
>>> (Either way, we don't want to imply that arrays of SQL types are accelerated
>>> this way, it's so far only for internal arrays.)
>>
>> Good point. I called them "C arrays" but put it into the General
>> Performance item.
>
> Looks good to me, although...
>
>> Allow xid/subxid searches and ASCII string detection to use vector operations
>> (Nathan Bossart)
>
> Nathan wrote the former, I did the latter.
>
> Thanks for working on this!

Ugh, I have to remember to merge authors when I merge items --- fixed.
-- 
Bruce Momjian https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Re: PG 16 draft release notes ready
On Wed, May 24, 2023 at 8:58 PM Bruce Momjian wrote:
> Okay, items split into sections and several merged. I left the
> CPU-specific parts in Source Code, and moved the rest into a merged item
> in General Performance, but moved the JSON item to Data Types.

It looks like it got moved to Functions actually?

>> The last one refers to new internal functions, so it could stay in source code.
>> (Either way, we don't want to imply that arrays of SQL types are accelerated
>> this way, it's so far only for internal arrays.)
>
> Good point. I called them "C arrays" but put it into the General
> Performance item.

Looks good to me, although...

> Allow xid/subxid searches and ASCII string detection to use vector operations
> (Nathan Bossart)

Nathan wrote the former, I did the latter.

Thanks for working on this!
-- 
John Naylor EDB: http://www.enterprisedb.com
Re: Proposal: Removing 32 bit support starting from PG17++
Hi, On 2023-05-24 20:34:38 -0400, Tom Lane wrote: > Thomas Munro writes: > > On Thu, May 25, 2023 at 11:51 AM Tom Lane wrote: > >> You'll no doubt be glad to hear that I'll be retiring chickadee > >> in the very near future. > > > . o O { I guess chickadee might have been OK anyway, along with e.g. > > antique low-end SGI MIPS gear etc of "workstation"/"desktop" form that > > any collector is likely to have still running, because they only had > > one CPU (unlike their Vogon-spaceship-sized siblings). As long as > > they had 64 bit load/store instructions, those couldn't be 'divided' > > by an interrupt, so scheduler switches shouldn't be able to tear them, > > AFAIK? } > > PA-RISC can probably do tear-free 8-byte reads, but Andres also > wanted to raise the bar enough to include 32-bit atomic instructions, > which PA-RISC hasn't got; the one such instruction it has is > limited enough that you can't do much beyond building a spinlock. Yes, I looked at some point, and it didn't seem viable to do more. > I think there's still some interest in not-antique 32-bit MIPS; I have some > current-production routers with such CPUs. (Sadly, they don't have enough > storage to do anything useful with, or I'd think about repurposing one for > buildfarm.) After spending a bunch more time staring at various reference manuals, it looks to me like 32bit MIPS supports 64 bit atomics these days, via LLWP / SCWP. https://s3-eu-west-1.amazonaws.com/downloads-mips/documents/MD00086-2B-MIPS32BIS-AFP-6.06.pdf documents them as having been added to MIPS32 Release 6, from 2014. Greetings, Andres Freund
Re: PostgreSQL 16 Beta 1 release announcement draft
On 5/24/23 8:04 PM, Andres Freund wrote: Hi, On 2023-05-24 19:57:39 -0400, Jonathan S. Katz wrote: On 5/24/23 5:28 PM, Andres Freund wrote: I think the relation extension improvements ought to be mentioned here as well? Up to 3x faster concurrent data load with COPY seems practically relevant. I missed that -- not sure I'm finding it in the release notes with a quick grep -- which commit/thread is this? It was split over quite a few commits, the one improving COPY most significantly is commit 00d1e02be24987180115e371abaeb84738257ae2 Author: Andres Freund Date: 2023-04-06 16:35:21 -0700 hio: Use ExtendBufferedRelBy() to extend tables more efficiently Relevant thread: https://postgr.es/m/20221029025420.eplyow6k7tgu6...@awork3.anarazel.de It's in the release notes as: Allow more efficient addition of heap and index pages (Andres Freund) Ah, OK, that's why I didn't grok it. I read through the first message in[1] and definitely agree it should be in the announcement. How about: "PostgreSQL 16 also shows up to a 300% improvement when concurrently loading data with `COPY`" Thanks, Jonathan [1] https://postgr.es/m/20221029025420.eplyow6k7tgu6...@awork3.anarazel.de
Re: Proposal: Removing 32 bit support starting from PG17++
Andres Freund writes: > On 2023-05-24 19:51:22 -0400, Tom Lane wrote: >> So dropping PA-RISC altogether should probably happen for v17, maybe even >> v16. > Definitely for 17 - not sure if we have much to gain by doing it in 16. I'm just thinking that we'll have no way to test it. I wouldn't advocate such a change in released branches; but I think it'd be within policy still for v16, and that would give us one less year of claimed support for an arch we can't test. regards, tom lane
Re: Proposal: Removing 32 bit support starting from PG17++
Hi, On 2023-05-24 19:51:22 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2023-05-24 17:44:36 -0400, Tom Lane wrote: > >> Hmm, can we really expect atomic 8-byte reads on "relevant" 32-bit > >> platforms? I'd be on board with this if so, but it sounds a bit > >> optimistic. > > > ... > > > So it looks like the only certain problem is PA-RISC - which I personally > > wouldn't include in "relevant" :), with some evaluation needed for 32bit > > mips > > and old arms. > > You'll no doubt be glad to hear that I'll be retiring chickadee > in the very near future. Heh, I have to admit, I am. > So dropping PA-RISC altogether should probably happen for v17, maybe even > v16. Definitely for 17 - not sure if we have much to gain by doing it in 16. The likelihood that somebody will find a PA-RISC specific bug, after we dropped support for 15, is pretty low, I think. > Seems like we should poke into ARM more closely, though. Looks like the necessary support was added in armv6k and armv7a: https://developer.arm.com/documentation/dui0489/i/arm-and-thumb-instructions/strex ARM STREXB, STREXD, and STREXH are available in ARMv6K and above. All these 32-bit Thumb instructions are available in ARMv6T2 and above, except that STREXD is not available in the ARMv7-M architecture. ARMv7-M is for microcontrollers without an MMU, so it's not going to be used for postgres. The arm buildfarm machines (using the architecture names from there) we have are: armv6l: chipmunk ARMv7: mereswine, gull, grison arm64: dikkop, sifaka, indri turako says it's armv7l, but it seems actually to target aarch64. At least for type of machine available on the buildfarm, it looks like we actually should be good. They all appear to already be supporting 64bit atomics, without needing further compiler flags. Greetings, Andres Freund
Re: Proposal: Removing 32 bit support starting from PG17++
Thomas Munro writes: > On Thu, May 25, 2023 at 11:51 AM Tom Lane wrote: >> You'll no doubt be glad to hear that I'll be retiring chickadee >> in the very near future. > . o O { I guess chickadee might have been OK anyway, along with e.g. > antique low-end SGI MIPS gear etc of "workstation"/"desktop" form that > any collector is likely to have still running, because they only had > one CPU (unlike their Vogon-spaceship-sized siblings). As long as > they had 64 bit load/store instructions, those couldn't be 'divided' > by an interrupt, so scheduler switches shouldn't be able to tear them, > AFAIK? } PA-RISC can probably do tear-free 8-byte reads, but Andres also wanted to raise the bar enough to include 32-bit atomic instructions, which PA-RISC hasn't got; the one such instruction it has is limited enough that you can't do much beyond building a spinlock. Dunno about antique MIPS. I think there's still some interest in not-antique 32-bit MIPS; I have some current-production routers with such CPUs. (Sadly, they don't have enough storage to do anything useful with, or I'd think about repurposing one for buildfarm.) regards, tom lane
Re: Question about error message in auth.c
On Wed, May 24, 2023 at 02:12:23PM -0400, Dave Cramer wrote:
> The last piece of information is the encryption state. However when an SSL
> connection fails to authenticate the state is not encrypted.
>
> When would it ever be encrypted if authentication fails ?

I am not sure I follow. Under an SSL connection things should be encrypted. Or perhaps that's something related to hostssl and/or hostnossl? Back to the point, the SSL handshake happens before any authentication attempt and any HBA resolution, so a connection could be encrypted before authentication gets rejected. The error path you are pointing at would happen after the SSL handshake is done. For instance, take an HBA entry like that:

# TYPE  DATABASE  USER  ADDRESS       METHOD
host    all       all   127.0.0.1/32  reject

Then, attempting a connection with sslmode=prefer, one can see the difference:

$ psql -h 127.0.0.1 -d postgres -U postgres
psql: error: connection to server at "127.0.0.1", port 5432 failed: FATAL:  pg_hba.conf rejects connection for host "127.0.0.1", user "postgres", database "postgres", SSL encryption
connection to server at "127.0.0.1", port 5432 failed: FATAL:  pg_hba.conf rejects connection for host "127.0.0.1", user "postgres", database "postgres", no encryption
-- 
Michael
Re: Proposal: Removing 32 bit support starting from PG17++
On Thu, May 25, 2023 at 11:51 AM Tom Lane wrote: > Andres Freund writes: > > On 2023-05-24 17:44:36 -0400, Tom Lane wrote: > > So it looks like the only certain problem is PA-RISC - which I personally > > wouldn't include in "relevant" :), with some evaluation needed for 32bit > > mips > > and old arms. > > You'll no doubt be glad to hear that I'll be retiring chickadee > in the very near future. . o O { I guess chickadee might have been OK anyway, along with e.g. antique low-end SGI MIPS gear etc of "workstation"/"desktop" form that any collector is likely to have still running, because they only had one CPU (unlike their Vogon-spaceship-sized siblings). As long as they had 64 bit load/store instructions, those couldn't be 'divided' by an interrupt, so scheduler switches shouldn't be able to tear them, AFAIK? }
Re: PostgreSQL 16 Beta 1 release announcement draft
Hi, On 2023-05-24 19:57:39 -0400, Jonathan S. Katz wrote: > On 5/24/23 5:28 PM, Andres Freund wrote: > > > > I think the relation extension improvements ought to be mentioned here as > > well? Up to 3x faster concurrent data load with COPY seems practically > > relevant. > > I missed that -- not sure I'm finding it in the release notes with a quick > grep -- which commit/thread is this? It was split over quite a few commits, the one improving COPY most significantly is commit 00d1e02be24987180115e371abaeb84738257ae2 Author: Andres Freund Date: 2023-04-06 16:35:21 -0700 hio: Use ExtendBufferedRelBy() to extend tables more efficiently Relevant thread: https://postgr.es/m/20221029025420.eplyow6k7tgu6...@awork3.anarazel.de It's in the release notes as: Allow more efficient addition of heap and index pages (Andres Freund) Greetings, Andres Freund
Re: PostgreSQL 16 Beta 1 release announcement draft
On 5/24/23 5:28 PM, Andres Freund wrote: I think the relation extension improvements ought to be mentioned here as well? Up to 3x faster concurrent data load with COPY seems practically relevant. I missed that -- not sure I'm finding it in the release notes with a quick grep -- which commit/thread is this? But yes this does sound like something that should be included, I just want to read up on it. Thanks, Jonathan
Re: Proposal: Removing 32 bit support starting from PG17++
Andres Freund writes: > On 2023-05-24 17:44:36 -0400, Tom Lane wrote: >> Hmm, can we really expect atomic 8-byte reads on "relevant" 32-bit >> platforms? I'd be on board with this if so, but it sounds a bit >> optimistic. > ... > So it looks like the only certain problem is PA-RISC - which I personally > wouldn't include in "relevant" :), with some evaluation needed for 32bit mips > and old arms. You'll no doubt be glad to hear that I'll be retiring chickadee in the very near future. (I'm moving/downsizing, and that machine isn't making the cut.) So dropping PA-RISC altogether should probably happen for v17, maybe even v16. Seems like we should poke into ARM more closely, though. regards, tom lane
Re: Proposal: Removing 32 bit support starting from PG17++
Hi, On 2023-05-24 17:44:36 -0400, Tom Lane wrote: > Andres Freund writes: > > Dropping CPUs without native atomic operations / without a way to do > > tear-free > > 8 byte reads would make several substantial performance improvements easier, > > while not really dropping any relevant platform. > > Hmm, can we really expect atomic 8-byte reads on "relevant" 32-bit > platforms? I'd be on board with this if so, but it sounds a bit > optimistic. Combining https://wiki.postgresql.org/wiki/Atomics with our docs: > In general, PostgreSQL can be expected to work on these CPU architectures: > x86, PowerPC, S/390, SPARC, ARM, MIPS, RISC-V, and PA-RISC, including > big-endian, little-endian, 32-bit, and 64-bit variants where applicable. It > is often possible to build on an unsupported CPU type by configuring with > --disable-spinlocks, but performance will be poor. On x86 8 byte atomics are supported since the 586 - released in 1993. I don't think we need to care about i386 / i486? PPC always had it from what I can tell (the docs don't mention a minimum version). Sparc has supported single copy atomicity for 8 byte values since sparc v9, so ~1995 (according to wikipedia there was one V8 chip released in 1996, there's also "LEON", a bit later, but that's "V8E", which includes the atomicity guarantees from v9). On s390 sufficient instructions to make 64bit atomics work natively are supported (just using cmpxchg). On older arm it's supported via kernel emulation - which afaict is better than falling back to a semaphore, our current fallback. I don't currently have access to armv7*, so I can't run a benchmark. So the only problematic platforms are 32 bit MIPS and PA-RISC. I'm not entirely sure whether my determination about 32 bit MIPS from back then is actually true - I might have read the docs too narrowly back then or they were updated since. 
In a newer version of the manual I see: > The paired instructions, Load Linked and Store Conditional, can be used to > perform an atomic read-modify-write of word or doubleword cached memory > locations. The word width is referenced to be 32bit earlier on the same page. And it's documented to be true for mips32, not just mips64. With that one can implement a 64bit cmpxchg, and with 64bit cmpxchg one can implement a, not too efficient, tear-free read. What gave me pause for a moment is that both clang and gcc generate calls to external functions on 32 bit mips - but they do provide libatomic and looking at the disassembly, there does appear to be a some non-emulated execution. But tbh, there are so many ABIs and options that I am not sure what is what. Reading the https://en.wikipedia.org/wiki/MIPS_architecture history part gives me a headache: "During the mid-1990s, many new 32-bit MIPS processors for embedded systems were MIPS II implementations because the introduction of the 64-bit MIPS III architecture in 1991 left MIPS II as the newest 32-bit MIPS architecture until MIPS32 was introduced in 1999.[3]: 19 " My rough understanding is that it's always doable in 64 mode, and has been available in 32bit mode for a long time, but that it depends on the the ABI used. So it looks like the only certain problem is PA-RISC - which I personally wouldn't include in "relevant" :), with some evaluation needed for 32bit mips and old arms. Greetings, Andres Freund
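The step described above — turning a 64-bit cmpxchg into a tear-free 64-bit read — can be sketched with the GCC/Clang __atomic builtins. This is just the idea, not PostgreSQL's actual port code; on LL/SC-only targets such as MIPS32r6 the compiler lowers the compare-exchange to an LL/SC (or LLWP/SCWP) loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Tear-free 64-bit read built from nothing but a 64-bit compare-and-swap:
 * CAS the location against a guess. If the CAS "fails", it atomically
 * hands back the full current 8-byte value in 'expected'; if it
 * "succeeds", the value was equal to the guess anyway. Either way we
 * observed the whole doubleword atomically, with no possibility of a
 * torn read across the two 32-bit halves.
 */
static uint64_t
atomic_read_u64_via_cas(uint64_t *ptr)
{
	uint64_t expected = 0;

	/* Desired value equals the guess, so a success stores no change. */
	__atomic_compare_exchange_n(ptr, &expected, expected, false,
								__ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
	return expected;
}
```

As the message says, this is not an efficient read (it is a full read-modify-write on the cache line), but it is correct on any target that can do a 64-bit CAS.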
Re: Atomic ops for unlogged LSN
On Wed, May 24, 2023 at 02:49:58PM -0700, Andres Freund wrote:
> I was a bit confused by Michael's comment as well, due to the section of code
> quoted. But he does have a point: pg_atomic_read_u32() does indeed *not* have
> barrier semantics (it'd be way more expensive), and the patch does contain
> this hunk:

Thanks for the correction. The part about _add was incorrect.

> So we indeed lose some "barrier strength" - but I think that's fine. For one,
> it's a debugging-only value. But more importantly, I don't see what reordering
> the barrier could prevent - a barrier is useful for things like sequencing two
> memory accesses to happen in the intended order - but unloggedLSN doesn't
> interact with another variable that's accessed within the ControlFileLock
> afaict.

This stuff is usually tricky enough that I am never completely sure whether it is fine without reading README.barrier again, for the place where unloggedLSN is saved in the control file under its LWLock. Something that I find confusing in the patch is that it does not document the reason why this is OK.
-- 
Michael
Re: SyncRepWaitForLSN waits for XLogFlush?
Hi Andres, Thanks for your reply. If I understand you correctly, you're saying that the walsender *waits* for xlogflush, but does not *cause* it. This means that for a sync_commit=off transaction, the xlog records won't get shipped out to standbys until the walwriter flushes in-memory xlog contents to disk. And furthermore, am I correct in assuming that this behaviour is different from the buffer manager and the slru which both *cause* xlog flush up to a certain lsn before they proceed with flushing a page to disk? The reason I'm asking this is that I'm working on modifying the transaction manager for my thesis project, and the pg_walinspect test is failing when I run make check-world. So, I'm just trying to understand things to identify the cause of this issue. Regards, Tej On Wed, 24 May 2023 at 17:33, Andres Freund wrote: > Hi, > > On 2023-05-24 10:53:51 -0400, Tejasvi Kashi wrote: > > I was wondering if waiting for an LSN in SyncRepWaitForLSN ensures that > the > > XLOG has been flushed locally up to that location before the record is > > shipped off to standbys? > > No, SyncRepWaitForLSN() doesn't directly ensure that. The callers have to > (and > do) call XLogFlush() separately. See e.g. the XLogFlush() call in > RecordTransactionCommit(). > > Note that calling SyncRepWaitForLSN() for an LSN that is not yet flushed > would > not lead for data to be prematurely sent out - walsender won't send data > that > hasn't yet been flushed. So a backend with such a spurious > SyncRepWaitForLSN() > would just wait until the LSN is actually flushed and *then* replicated. > > Any specific reason for that question? > > Greetings, > > Andres Freund >
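The ordering discussed in this exchange can be sketched as a toy model. All names here are illustrative stand-ins: the real interplay is between XLogFlush(), SyncRepWaitForLSN(), and walsender, and the point is simply that walsender never ships WAL beyond what is flushed locally.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

static XLogRecPtr insertedLSN = 0;	/* WAL generated so far */
static XLogRecPtr flushedLSN = 0;	/* WAL durably on local disk */

/* A backend inserts some WAL (e.g. a commit record). */
static void mock_xlog_insert(uint64_t nbytes)
{
	insertedLSN += nbytes;
}

/* The commit path (or the walwriter, for synchronous_commit=off)
 * flushes WAL up to a target LSN. */
static void mock_xlog_flush(XLogRecPtr upto)
{
	if (upto > flushedLSN)
		flushedLSN = upto;
}

/* The mock walsender: sends only up to the locally flushed point, so a
 * wait on an unflushed LSN cannot cause data to be shipped early - it
 * just waits until the flush happens, and *then* the data is sent. */
static XLogRecPtr mock_walsender_send_upto(void)
{
	return flushedLSN;
}
```

In this model, a sync_commit=off transaction's records sit in the inserted-but-not-flushed window until a flush advances flushedLSN, matching the behavior described above.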
Re: Proposal: Removing 32 bit support starting from PG17++
On Wed, May 24, 2023 at 10:44:11AM -0400, Tom Lane wrote:
> Hans Buschmann writes:
>> This inspired me to propose dropping 32bit support for PostgreSQL starting
>> with PG17.
>
> I don't think this is a great idea. Even if Intel isn't interested,
> there'll be plenty of 32-bit left in the lower end of the market
> (think ARM, IoT, and so on).

A few examples of that are the first models of the Raspberry Pi, which are still produced (until 2026 actually for the first model). These rely on ARMv6 if I recall correctly, which is 32-bit.
-- 
Michael
Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?
On Wed, May 24, 2023 at 01:03:04PM +0200, Daniel Gustafsson wrote:
> When we moved the goalposts to 1.0.1 (commit 7b283d0e1d1) we referred to RHEL6
> using 1.0.1, and RHEL6 goes out of ELS in late June 2024, so it seems possible
> to drop 1.0.1 support during v17. I haven't studied the patch yet but I'll
> have a look at it.

Great, thanks for the help.
-- 
Michael
Re: Atomic ops for unlogged LSN
Hi, On 2023-05-24 08:22:08 -0400, Robert Haas wrote: > On Tue, May 23, 2023 at 6:26 PM Michael Paquier wrote: > > Spinlocks provide a full memory barrier, which may not the case with > > add_u64() or read_u64(). Could memory ordering be a problem in these > > code paths? > > It could be a huge problem if what you say were true, but unless I'm > missing something, the comments in atomics.h say that it isn't. The > comments for the 64-bit functions say to look at the 32-bit functions, > and there it says: > > /* > * pg_atomic_add_fetch_u32 - atomically add to variable > * > * Returns the value of ptr after the arithmetic operation. > * > * Full barrier semantics. > */ > > Which is probably a good thing, because I'm not sure what good it > would be to have an operation that both reads and modifies an atomic > variable but has no barrier semantics. I was a bit confused by Michael's comment as well, due to the section of code quoted. But he does have a point: pg_atomic_read_u32() does indeed *not* have barrier semantics (it'd be way more expensive), and the patch does contain this hunk: > @@ -6784,9 +6775,7 @@ CreateCheckPoint(int flags) >* unused on non-shutdown checkpoints, but seems useful to store it > always >* for debugging purposes. >*/ > - SpinLockAcquire(&XLogCtl->ulsn_lck); > - ControlFile->unloggedLSN = XLogCtl->unloggedLSN; > - SpinLockRelease(&XLogCtl->ulsn_lck); > + ControlFile->unloggedLSN = pg_atomic_read_u64(&XLogCtl->unloggedLSN); > > UpdateControlFile(); > LWLockRelease(ControlFileLock); So we indeed loose some "barrier strength" - but I think that's fine. For one, it's a debugging-only value. But more importantly, I don't see what reordering the barrier could prevent - a barrier is useful for things like sequencing two memory accesses to happen in the intended order - but unloggedLSN doesn't interact with another variable that's accessed within the ControlFileLock afaict. > That's not to say that I entirely understand the point of this patch. 
> It seems like a generally reasonable thing to do AFAICT but somehow I > wonder if there's more to the story here, because it doesn't feel like > it would be easy to measure the benefit of this patch in isolation. > That's not a reason to reject it, though, just something that makes me > curious. I doubt it's a meaningful, or even measurable, win. But removing atomic ops and reducing shared memory space isn't a bad thing, even if there's no immediate benefit... Greetings, Andres Freund
Re: Proposal: Removing 32 bit support starting from PG17++
Andres Freund writes: > Dropping CPUs without native atomic operations / without a way to do tear-free > 8 byte reads would make several substantial performance improvements easier, > while not really dropping any relevant platform. Hmm, can we really expect atomic 8-byte reads on "relevant" 32-bit platforms? I'd be on board with this if so, but it sounds a bit optimistic. regards, tom lane
Re: Proposal: Removing 32 bit support starting from PG17++
Hi, On 2023-05-24 14:33:06 +, Hans Buschmann wrote: > I recently stumbled over the following Intel proposal for dropping 32bit > support in x86 processors. [1] It's a proposal for something in the future. Which, even if implemented as is, will affect future hardware, several years down the line. I don't think that's a good reason for removing 32 bit support in postgres. And postgres is used on non-x86 architectures... > This inspired me to propose dropping 32bit support for PostgreSQL starting > with PG17. > ... > Even if I am not a postgres hacker I suppose this could simplify things quite > a lot. There's some simplification, but I don't think it'd be that much. I do think there are code removals and simplifications that would be bigger than dropping 32bit support. Dropping support for effectively-obsolete compilers like sun studio (requires random environment variables to be exported to not run out of memory) and AIX's xlc (requires a lot of extra compiler flags to be passed in for a sane build) would remove a fair bit of code. Dropping CPUs without native atomic operations / without a way to do tear-free 8 byte reads would make several substantial performance improvements easier, while not really dropping any relevant platform. Etc. Greetings, Andres Freund
Re: testing dist tarballs
Andres Freund writes: > First thing I noticed that 'make dist' doesn't work in a vpath, failing in a > somewhat obscure way (likely because in a vpath build the the copy from the > source dir doesn't include GNUMakefile). Do we expect it to work? Don't see how it could possibly be useful in a vpath, because you'd have the real source files and the generated files in different trees. regards, tom lane
Re: SyncRepWaitForLSN waits for XLogFlush?
Hi, On 2023-05-24 10:53:51 -0400, Tejasvi Kashi wrote: > I was wondering if waiting for an LSN in SyncRepWaitForLSN ensures that the > XLOG has been flushed locally up to that location before the record is > shipped off to standbys? No, SyncRepWaitForLSN() doesn't directly ensure that. The callers have to (and do) call XLogFlush() separately. See e.g. the XLogFlush() call in RecordTransactionCommit(). Note that calling SyncRepWaitForLSN() for an LSN that is not yet flushed would not lead to data being sent out prematurely - walsender won't send data that hasn't yet been flushed. So a backend with such a spurious SyncRepWaitForLSN() would just wait until the LSN is actually flushed and *then* replicated. Any specific reason for that question? Greetings, Andres Freund
Re: Wrong results due to missing quals
I wrote: > ... Another idea is that maybe we need another > RestrictInfo field that's directly a set of OJ relids that this clause > can't be applied above. That'd reduce clause_is_computable_at to > basically a bms_intersect test which would be nice speed-wise. The > space consumption could be annoying, but I'm thinking that we might > only have to populate the field in clone clauses, which would > alleviate that issue. I tried this and it seems to work all right: it fixes the example you showed while not causing any new failures. (Doesn't address the broken join-removal logic you showed in the other thread, though.) While at it, I also changed make_restrictinfo to treat has_clone and is_clone as first-class citizens, to fix the dubious coding in equivclass.c that I mentioned at [1]. regards, tom lane [1] https://www.postgresql.org/message-id/395264.1684698283%40sss.pgh.pa.us diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c index 428ea3810f..c5cada55fb 100644 --- a/contrib/postgres_fdw/postgres_fdw.c +++ b/contrib/postgres_fdw/postgres_fdw.c @@ -6521,8 +6521,11 @@ foreign_grouping_ok(PlannerInfo *root, RelOptInfo *grouped_rel, expr, true, false, + false, + false, root->qual_security_level, grouped_rel->relids, + NULL, NULL); if (is_foreign_expr(root, grouped_rel, expr)) fpinfo->remote_conds = lappend(fpinfo->remote_conds, rinfo); diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README index f1a96b8584..2ab4f3dbf3 100644 --- a/src/backend/optimizer/README +++ b/src/backend/optimizer/README @@ -539,14 +539,13 @@ set includes the A/B join relid which is not in the input. However, if we form B/C after A/B, then both forms of the clause are applicable so far as that test can tell. We have to look more closely to notice that the "Pbc" clause form refers to relation B which is no longer -directly accessible. While this check is straightforward, it's not -especially cheap (see clause_is_computable_at()). 
To avoid doing it -unnecessarily, we mark the variant versions of a redundant clause as -either "has_clone" or "is_clone". When considering a clone clause, -we must check clause_is_computable_at() to disentangle which version -to apply at the current join level. (In debug builds, we also Assert -that non-clone clauses are validly computable at the current level; -but that seems too expensive for production usage.) +directly accessible. While such a check could be performed using the +per-relation RelOptInfo.nulling_relids data, it would be annoyingly +expensive to do over and over as we consider different join paths. +To make this simple and reliable, we compute an "incompatible_relids" +set for each variant version (clone) of a redundant clause. A clone +clause should not be applied if any of the outer-join relids listed in +incompatible_relids has already been computed below the current join. Optimizer Functions diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c index 2db1bf6448..7fa502d6e2 100644 --- a/src/backend/optimizer/path/equivclass.c +++ b/src/backend/optimizer/path/equivclass.c @@ -197,9 +197,12 @@ process_equivalence(PlannerInfo *root, make_restrictinfo(root, (Expr *) ntest, restrictinfo->is_pushed_down, + restrictinfo->has_clone, + restrictinfo->is_clone, restrictinfo->pseudoconstant, restrictinfo->security_level, NULL, + restrictinfo->incompatible_relids, restrictinfo->outer_relids); } return false; @@ -1972,7 +1975,8 @@ create_join_clause(PlannerInfo *root, * clause into the regular processing, because otherwise the join will be * seen as a clauseless join and avoided during join order searching. * We handle this by generating a constant-TRUE clause that is marked with - * required_relids that make it a join between the correct relations. + * the same required_relids etc as the removed outer-join clause, thus + * making it a join clause between the correct relations. 
*/ void reconsider_outer_join_clauses(PlannerInfo *root) @@ -2001,10 +2005,13 @@ reconsider_outer_join_clauses(PlannerInfo *root) /* throw back a dummy replacement clause (see notes above) */ rinfo = make_restrictinfo(root, (Expr *) makeBoolConst(true, false), - true, /* is_pushed_down */ + rinfo->is_pushed_down, + rinfo->has_clone, + rinfo->is_clone, false, /* pseudoconstant */ 0, /* security_level */ rinfo->required_relids, + rinfo->incompatible_relids, rinfo->outer_relids); distribute_restrictinfo_to_rels(root, rinfo); } @@ -2026,10 +2033,13 @@ reconsider_outer_join_clauses(PlannerInfo *root) /* throw back a dummy replacement
Re: PostgreSQL 16 Beta 1 release announcement draft
Hi, On 2023-05-24 13:06:30 -0400, Jonathan S. Katz wrote: > PostgreSQL 16 Feature Highlights > > > ### Performance > > PostgreSQL 16 includes performance improvements in query execution. This > release > adds more query parallelism, including allowing `FULL` and `RIGHT` joins to > execute in parallel, and parallel execution of the `string_agg` and > `array_agg` > aggregate functions. Additionally, PostgreSQL 16 can use incremental sorts in > `SELECT DISTINCT` queries. There are also several optimizations for > [window > queries](https://www.postgresql.org/docs/16/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS), > improvements in lookups for `RANGE` and `LIST` partitions, and support for > "anti-joins" in `RIGHT` and `OUTER` queries. > > This release also introduces support for CPU acceleration using SIMD for both > x86 and ARM architectures, including optimizations for processing ASCII and > JSON > strings, and array and subtransaction searches. Additionally, PostgreSQL 16 > introduces [load > balancing](https://www.postgresql.org/docs/16/libpq-connect.html#LIBPQ-CONNECT-LOAD-BALANCE-HOSTS) > to libpq, the client library for PostgreSQL. I think the relation extension improvements ought to be mentioned here as well? Up to 3x faster concurrent data load with COPY seems practically relevant. Greetings, Andres Freund
testing dist tarballs
Hi, On 2023-05-23 14:51:03 -0400, Tom Lane wrote: > Andres Freund writes: > > I guess I need to go and check how long the "release" tarball generation > > takes... > > It's quick except for the documentation-generating steps. Maybe > we could test that part only once? First thing I noticed is that 'make dist' doesn't work in a vpath, failing in a somewhat obscure way (likely because in a vpath build the copy from the source dir doesn't include GNUMakefile). Do we expect it to work? Besides docs, the slowest part appears to be gzip --best and then bzip2, as those run serially and take 11 and 13 seconds respectively here... The first thing I tried was: make -j8 dist GZIP=pigz BZIP2=pbzip2 unfortunately that results in pigz: abort: cannot provide files in GZIP environment variable echo GZIP=pigz >> src/Makefile.custom echo BZIP2=pbzip2 >> src/Makefile.custom reduces that to real 1m6.472s user 1m28.316s sys 0m5.340s real 0m54.811s user 1m42.078s sys 0m6.183s still not great... OTOH, we currently already build the docs as part of the CompilerWarnings test. I don't think there's a reason to test that twice? For me make distcheck currently fails: In file included from ../../src/include/postgres.h:46, from hashfn.c:24: ../../src/include/utils/elog.h:79:10: fatal error: utils/errcodes.h: No such file or directory 79 | #include "utils/errcodes.h" | ^~ compilation terminated. make[3]: *** [: hashfn.o] Error 1 at first I thought it was due to my use of -j8 - but it doesn't even work without that. That's due to MAKELEVEL: submake-generated-headers: ifndef NO_GENERATED_HEADERS ifeq ($(MAKELEVEL),0) $(MAKE) -C $(top_builddir)/src/backend generated-headers endif endif So the distcheck target needs to reset MAKELEVEL=0 - unless somebody has a better idea? Separately, it's somewhat confusing that we include errcodes.h etc in src/backend/utils, rather than their final location, in src/include/utils. 
It works, even without perl, because copying the file doesn't require perl - only generating it does... Greetings, Andres Freund
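The MAKELEVEL workaround could look roughly like this (a hypothetical sketch only, not the actual GNUmakefile recipe - the target name and directory variable are invented for illustration):

```make
# Hypothetical sketch: reset MAKELEVEL for the sub-make run inside the
# unpacked tarball, so that the generated-headers rule (guarded by
# "ifeq ($(MAKELEVEL),0)") still fires and errcodes.h etc. get built.
distcheck-build:
	$(MAKE) -C $(distdir) MAKELEVEL=0 all
```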
Re: [PATCH] psql: \dn+ to show size of each schema (and \dA+ for AMs)
I added documentation for the SQL functions in 001. And updated to say 17 I'm planning to set this patch as ready - it has not changed significantly in 18 months. Not for the first time, I've implemented a workaround at a higher layer. -- Justin >From 917e5e36d14018b6de457ba9eafe8936c0e88c3c Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Tue, 13 Jul 2021 21:25:48 -0500 Subject: [PATCH 1/4] Add pg_am_size(), pg_namespace_size() .. See also: 358a897fa, 528ac10c7 --- doc/src/sgml/func.sgml | 39 ++ src/backend/utils/adt/dbsize.c | 132 src/include/catalog/pg_proc.dat | 19 + 3 files changed, 190 insertions(+) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 5a47ce43434..6cbf4e9aa56 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -27762,6 +27762,45 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset + + + + pg_namespace_size + +pg_namespace_size ( name ) +bigint + + +pg_namespace_size ( oid ) +bigint + + +Computes the total disk space used by relations in the namespace (schema) +with the specified name or OID. To use this function, you must +have CREATE privilege on the specified namespace +or have privileges of the pg_read_all_stats role, +unless it is the default namespace for the current database. + + + + + + + pg_am_size + +pg_am_size ( name ) +bigint + + +pg_am_size ( oid ) +bigint + + +Computes the total disk space used by relations using the access method +with the specified name or OID. 
+ + + diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c index e5c0f1c45b6..af0955d1790 100644 --- a/src/backend/utils/adt/dbsize.c +++ b/src/backend/utils/adt/dbsize.c @@ -13,19 +13,25 @@ #include +#include "access/genam.h" #include "access/htup_details.h" #include "access/relation.h" +#include "access/table.h" #include "catalog/catalog.h" #include "catalog/namespace.h" #include "catalog/pg_authid.h" #include "catalog/pg_database.h" +#include "catalog/pg_namespace.h" #include "catalog/pg_tablespace.h" #include "commands/dbcommands.h" +#include "commands/defrem.h" #include "commands/tablespace.h" #include "miscadmin.h" #include "storage/fd.h" #include "utils/acl.h" #include "utils/builtins.h" +#include "utils/fmgroids.h" +#include "utils/lsyscache.h" #include "utils/numeric.h" #include "utils/rel.h" #include "utils/relfilenumbermap.h" @@ -858,6 +864,132 @@ pg_size_bytes(PG_FUNCTION_ARGS) PG_RETURN_INT64(result); } +/* + * Return the sum of size of relations for which the given attribute of + * pg_class matches the specified OID value. 
+ */ +static int64 +calculate_size_attvalue(AttrNumber attnum, Oid attval) +{ + int64 totalsize = 0; + ScanKeyData skey; + Relation pg_class; + SysScanDesc scan; + HeapTuple tuple; + + ScanKeyInit(&skey, attnum, +BTEqualStrategyNumber, F_OIDEQ, attval); + + pg_class = table_open(RelationRelationId, AccessShareLock); + scan = systable_beginscan(pg_class, InvalidOid, false, NULL, 1, &skey); + while (HeapTupleIsValid(tuple = systable_getnext(scan))) + { + Form_pg_class classtuple = (Form_pg_class) GETSTRUCT(tuple); + Relation rel; + + rel = try_relation_open(classtuple->oid, AccessShareLock); + if (!rel) + continue; + + for (ForkNumber forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++) + totalsize += calculate_relation_size(&(rel->rd_locator), rel->rd_backend, forkNum); + + relation_close(rel, AccessShareLock); + } + + systable_endscan(scan); + table_close(pg_class, AccessShareLock); + return totalsize; +} + +/* Compute the size of relations in a schema (namespace) */ +static int64 +calculate_namespace_size(Oid nspOid) +{ + /* + * User must be a member of pg_read_all_stats or have CREATE privilege for + * target namespace. + */ + if (!is_member_of_role(GetUserId(), ROLE_PG_READ_ALL_STATS)) + { + AclResult aclresult; + + aclresult = object_aclcheck(NamespaceRelationId, nspOid, GetUserId(), ACL_CREATE); + if (aclresult != ACLCHECK_OK) + aclcheck_error(aclresult, OBJECT_SCHEMA, + get_namespace_name(nspOid)); + } + + return calculate_size_attvalue(Anum_pg_class_relnamespace, nspOid); +} + +Datum +pg_namespace_size_oid(PG_FUNCTION_ARGS) +{ + Oid nspOid = PG_GETARG_OID(0); + int64 size; + + size = calculate_namespace_size(nspOid); + + if (size < 0) + PG_RETURN_NULL(); + + PG_RETURN_INT64(size); +} + +Datum +pg_namespace_size_name(PG_FUNCTION_ARGS) +{ + Name nspName = PG_GETARG_NAME(0); + Oid nspOid = get_namespace_oid(NameStr(*nspName), false); + int64 size; + + size = calculate_namespace_size(nspOid); + + if (size < 0) + PG_RETURN_NULL(); + + PG_RETURN_I
Re: walsender performance regression due to logical decoding on standby changes
Hi, On 2023-05-24 05:53:51 +, Zhijie Hou (Fujitsu) wrote: > On Tuesday, May 23, 2023 1:53 AM Andres Freund wrote: > > On 2023-05-22 12:15:07 +, Zhijie Hou (Fujitsu) wrote: > > > About "a backend doing logical decoding", do you mean the case when a > > user > > > start a backend and invoke pg_logical_slot_get_changes() to do the logical > > > decoding ? If so, it seems the logical decoding in a backend won't be > > > waked > > up > > > by startup process because the backend won't be registered as a walsender > > so > > > the backend won't be found in WalSndWakeup(). > > > > I meant logical decoding happening inside a walsender instance. > > > > > > > Or do you mean the deadlock between the real logical walsender and startup > > > process ? (I might miss something) I think the logical decoding doesn't > > > lock > > > the target user relation when decoding because it normally can get the > > needed > > > information from WAL. > > > > It does lock catalog tables briefly. There's no guarantee that such locks > > are > > released immediately. I forgot the details, but IIRC there's some outfuncs > > (enum?) that intentionally delay releasing locks till transaction commit. > > Thanks for the explanation ! > > I understand that the startup process can take lock on the catalog(when > replaying record) which may conflict with the lock in walsender. > > But in walsender, I think we only start transaction after entering > ReorderBufferProcessTXN(), and the transaction started here will be released > soon after processing and outputting the decoded transaction's data(as the > comment in ReorderBufferProcessTXN() says:" all locks acquired in here to be > released, not reassigned to the parent and we do not want any database access > have persistent effects."). It's possible that there's no immediate danger - but I wouldn't want to bet on it staying that way. Think about things like streaming out large transactions etc. 
There also are potential dangers outside of plain things like locks - there could be a recovery conflict on the snapshot or such (although that normally should be prevented via hot_standby_feedback or such). Even if there's no hard deadlock issue - not waking up a walsender-in-logical-decoding that's currently waiting because (replay_count % N) != 0 means that the walsender might be delayed for quite a while - there might not be any further records from the primary in the near future. I don't think any approach purely based on record counts has any chance to work well. You could combine it with something else, e.g. with always waking up in XLogPageRead() and WaitForWALToBecomeAvailable(). But then you end up with something relatively complicated again. Greetings, Andres Freund
Re: memory leak in trigger handling (since PG12)
Hi, On 2023-05-24 21:56:22 +0200, Tomas Vondra wrote: > >> The really hard thing was determining what causes the memory leak - the > >> simple instrumentation doesn't help with that at all. It tells you there > >> might be a leak, but you don't know where did the allocations came from. > >> > >> What I ended up doing is a simple gdb script that sets breakpoints on > >> all palloc/pfree variants, and prints info (including the backtrace) for > >> each call on ExecutorState. And then a script that aggregate those to > >> identify which backtraces allocated most chunks that were not freed. > > > > FWIW, for things like this I found "heaptrack" to be extremely helpful. > > > > E.g. for a reproducer of the problem here, it gave me the attach "flame > > graph" > > of the peak memory usage - after attaching to a running backend and running > > an > > UPDATE triggering the leak.. > > > > Because the view I screenshotted shows the stacks contributing to peak > > memory > > usage, it works nicely to find "temporary leaks", even if memory is actually > > freed after all etc. > > > > That's a nice visualization, but isn't that useful only once you > determine there's a memory leak? Which I think is the hard problem. So is your gdb approach, unless I am misunderstanding? The view I screenshotted shows the "peak" allocated memory, if you have a potential leak, you can see where most of the allocated memory was allocated. Which at least provides you with a good idea of where to look for a problem in more detail. > >>> Hm. Somehow this doesn't seem quite right. Shouldn't we try to use a > >>> shorter > >>> lived memory context instead? Otherwise we'll just end up with the same > >>> problem in a few years. > >>> > >> > >> I agree using a shorter lived memory context would be more elegant, and > >> more in line with how we do things. But it's not clear to me why we'd > >> end up with the same problem in a few years with what the patch does. 
> > > > Because it sets up the pattern of manual memory management and continues to > > run the relevant code within a query-lifetime context. > > > > Oh, you mean someone might add new allocations to this code (or into one > of the functions executed from it), and that'd leak again? Yeah, true. Yes. It's certainly not obvious this far down that we are called in a semi-long-lived memory context. Greetings, Andres Freund
Re: memory leak in trigger handling (since PG12)
Hi, On 2023-05-24 21:49:13 +0200, Tomas Vondra wrote: > >> and then have to pass updatedCols elsewhere. It's tricky to just switch > >> to the context (e.g. in ExecASUpdateTriggers/ExecARUpdateTriggers), as > >> AfterTriggerSaveEvent allocates other bits of memory too (in a longer > >> lived context). > > > > Hm - on a quick look the allocations in trigger.c itself are done with > > MemoryContextAlloc(). > > > > I did find a problematic path, namely that ExecGetChildToRootMap() ends up > > building resultRelInfo->ri_ChildToRootMap in CurrentMemoryContext. > > > > That seems like a flat out bug to me - we can't just store data in a > > ResultRelInfo without ensuring the memory is lives long enough. Nearby > > places > > like ExecGetRootToChildMap() do make sure to switch to es_query_cxt. > > > > > > Did you see other bits of memory getting allocated in CurrentMemoryContext? > > > > No, I simply tried to do the context switch and then gave up when it > crashed on the ExecGetRootToChildMap info. I haven't looked much > further, but you may be right it's the only bit. > > It didn't occur to me it might be a bug - I think the code simply > assumes it gets called with suitable memory context, just like we do in > various other places. Maybe it should document the assumption. I think architecturally, code storing stuff in a ResultRelInfo - which has query lifetime - ought to be careful to allocate memory with such lifetime. Note that the nearby ExecGetRootToChildMap() actually is careful to do so. > >> So we'd have to do another switch again. Not sure how > >> backpatch-friendly would that be. > > > > Yea, that's a valid concern. I think it might be reasonable to use something > > like ExecGetAllUpdatedColsCtx() in the backbranches, and switch to a > > short-lived context for the trigger invocations in >= 16. Hm - stepping back a bit, why are we doing the work in ExecGetAllUpdatedCols() over and over? Unless I am missing something, the result doesn't change across rows. 
And it doesn't look that cheap to compute, leaving aside the allocation that bms_union() does. It's visible in profiles, not as a top entry, but still. Perhaps the easiest-to-backpatch fix is to just avoid recomputing the value? But perhaps it'd be just as problematic, because callers might modify ExecGetAllUpdatedCols()'s return value in place... Greetings, Andres Freund
Re: memory leak in trigger handling (since PG12)
On 5/24/23 20:14, Andres Freund wrote: > Hi, > > On 2023-05-23 23:26:42 +0200, Tomas Vondra wrote: >> On 5/23/23 19:14, Andres Freund wrote: >>> Hi, >>> >>> On 2023-05-23 18:23:00 +0200, Tomas Vondra wrote: This means that for an UPDATE with triggers, we may end up calling this for each row, possibly multiple bitmaps. And those bitmaps are allocated in ExecutorState, so won't be freed until the end of the query :-( >>> >>> Ugh. >>> >>> >>> I've wondered about some form of instrumentation to detect such issues >>> before. It's obviously a problem that we can have fairly large leaks, like >>> the >>> one you just discovered, without detecting it for a couple years. It's kinda >>> made worse by the memory context infrastructure, because it hides such >>> issues. >>> >>> Could it help to have a mode where the executor shutdown hook checks how >>> much >>> memory is allocated in ExecutorState and warns if its too much? There's >>> IIRC a >>> few places that allocate large things directly in it, but most of those >>> probably should be dedicated contexts anyway. Something similar could be >>> useful for some other long-lived contexts. >>> >> >> Not sure such simple instrumentation would help much, unfortunately :-( >> >> We only discovered this because the user reported OOM crashes, which >> means the executor didn't get to the shutdown hook at all. Yeah, maybe >> if we had such instrumentation it'd get triggered for milder cases, but >> I'd bet the amount of noise is going to be significant. >> >> For example, there's a nearby thread about hashjoin allocating buffiles >> etc. in ExecutorState - we ended up moving that to a separate context, >> but surely there are more such cases (not just for ExecutorState). > > Yes, that's why I said that we would have to more of those into dedicated > contexts - which is a good idea independent of this issue. > Yeah, I think that's a good idea in principle. 
> >> The really hard thing was determining what causes the memory leak - the >> simple instrumentation doesn't help with that at all. It tells you there >> might be a leak, but you don't know where did the allocations came from. >> >> What I ended up doing is a simple gdb script that sets breakpoints on >> all palloc/pfree variants, and prints info (including the backtrace) for >> each call on ExecutorState. And then a script that aggregate those to >> identify which backtraces allocated most chunks that were not freed. > > FWIW, for things like this I found "heaptrack" to be extremely helpful. > > E.g. for a reproducer of the problem here, it gave me the attach "flame graph" > of the peak memory usage - after attaching to a running backend and running an > UPDATE triggering the leak.. > > Because the view I screenshotted shows the stacks contributing to peak memory > usage, it works nicely to find "temporary leaks", even if memory is actually > freed after all etc. > That's a nice visualization, but isn't that useful only once you determine there's a memory leak? Which I think is the hard problem. > > >>> Hm. Somehow this doesn't seem quite right. Shouldn't we try to use a shorter >>> lived memory context instead? Otherwise we'll just end up with the same >>> problem in a few years. >>> >> >> I agree using a shorter lived memory context would be more elegant, and >> more in line with how we do things. But it's not clear to me why we'd >> end up with the same problem in a few years with what the patch does. > > Because it sets up the pattern of manual memory management and continues to > run the relevant code within a query-lifetime context. > Oh, you mean someone might add new allocations to this code (or into one of the functions executed from it), and that'd leak again? Yeah, true. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: memory leak in trigger handling (since PG12)
Hi, On 2023-05-23 18:23:00 +0200, Tomas Vondra wrote: > While looking for other places allocating stuff in ExecutorState (for > the UPDATE case) and leaving it there, I found two more cases: For a bit I thought there was a similar problem in ExecWithCheckOptions() - but we error out immediately afterwards, so there's no danger of accumulating memory. I find it quite depressing that we have at least four copies of: /* * If the tuple has been routed, it's been converted to the partition's * rowtype, which might differ from the root table's. We must convert it * back to the root table's rowtype so that val_desc in the error message * matches the input tuple. */ if (resultRelInfo->ri_RootResultRelInfo) { ResultRelInfo *rootrel = resultRelInfo->ri_RootResultRelInfo; TupleDesc old_tupdesc; AttrMap*map; root_relid = RelationGetRelid(rootrel->ri_RelationDesc); tupdesc = RelationGetDescr(rootrel->ri_RelationDesc); old_tupdesc = RelationGetDescr(resultRelInfo->ri_RelationDesc); /* a reverse map */ map = build_attrmap_by_name_if_req(old_tupdesc, tupdesc, false); /* * Partition-specific slot's tupdesc can't be changed, so allocate a * new one. */ if (map != NULL) slot = execute_attr_map_slot(map, slot, MakeTupleTableSlot(tupdesc, &TTSOpsVirtual)); modifiedCols = bms_union(ExecGetInsertedCols(rootrel, estate), ExecGetUpdatedCols(rootrel, estate)); } Once in ExecPartitionCheckEmitError(), *twice* in ExecConstraints(), ExecWithCheckOptions(). Some of the partitioning stuff has been added in a really myopic way. Greetings, Andres Freund
Re: memory leak in trigger handling (since PG12)
On 5/24/23 20:55, Andres Freund wrote: > Hi, > > On 2023-05-24 15:38:41 +0200, Tomas Vondra wrote: >> I looked at this again, and I think GetPerTupleMemoryContext(estate) >> might do the trick, see the 0002 part. > > Yea, that seems like the right thing here. > > >> Unfortunately it's not much >> smaller/simpler than just freeing the chunks, because we end up doing >> >> oldcxt = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate)); >> updatedCols = ExecGetAllUpdatedCols(relinfo, estate); >> MemoryContextSwitchTo(oldcxt); > > We could add a variant of ExecGetAllUpdatedCols that switches the context. > Yeah, we could do that. I was thinking about backpatching, and modifying ExecGetAllUpdatedCols signature would be ABI break, but adding a variant should be fine. > >> and then have to pass updatedCols elsewhere. It's tricky to just switch >> to the context (e.g. in ExecASUpdateTriggers/ExecARUpdateTriggers), as >> AfterTriggerSaveEvent allocates other bits of memory too (in a longer >> lived context). > > Hm - on a quick look the allocations in trigger.c itself are done with > MemoryContextAlloc(). > > I did find a problematic path, namely that ExecGetChildToRootMap() ends up > building resultRelInfo->ri_ChildToRootMap in CurrentMemoryContext. > > That seems like a flat out bug to me - we can't just store data in a > ResultRelInfo without ensuring the memory is lives long enough. Nearby places > like ExecGetRootToChildMap() do make sure to switch to es_query_cxt. > > > Did you see other bits of memory getting allocated in CurrentMemoryContext? > No, I simply tried to do the context switch and then gave up when it crashed on the ExecGetRootToChildMap info. I haven't looked much further, but you may be right it's the only bit. It didn't occur to me it might be a bug - I think the code simply assumes it gets called with suitable memory context, just like we do in various other places. Maybe it should document the assumption. 
> >> So we'd have to do another switch again. Not sure how >> backpatch-friendly would that be. > > Yea, that's a valid concern. I think it might be reasonable to use something > like ExecGetAllUpdatedColsCtx() in the backbranches, and switch to a > short-lived context for the trigger invocations in >= 16. > WFM regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: memory leak in trigger handling (since PG12)
Hi, On 2023-05-24 15:38:41 +0200, Tomas Vondra wrote: > I looked at this again, and I think GetPerTupleMemoryContext(estate) > might do the trick, see the 0002 part. Yea, that seems like the right thing here. > Unfortunately it's not much > smaller/simpler than just freeing the chunks, because we end up doing > > oldcxt = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate)); > updatedCols = ExecGetAllUpdatedCols(relinfo, estate); > MemoryContextSwitchTo(oldcxt); We could add a variant of ExecGetAllUpdatedCols that switches the context. > and then have to pass updatedCols elsewhere. It's tricky to just switch > to the context (e.g. in ExecASUpdateTriggers/ExecARUpdateTriggers), as > AfterTriggerSaveEvent allocates other bits of memory too (in a longer > lived context). Hm - on a quick look the allocations in trigger.c itself are done with MemoryContextAlloc(). I did find a problematic path, namely that ExecGetChildToRootMap() ends up building resultRelInfo->ri_ChildToRootMap in CurrentMemoryContext. That seems like a flat out bug to me - we can't just store data in a ResultRelInfo without ensuring the memory lives long enough. Nearby places like ExecGetRootToChildMap() do make sure to switch to es_query_cxt. Did you see other bits of memory getting allocated in CurrentMemoryContext? > So we'd have to do another switch again. Not sure how > backpatch-friendly would that be. Yea, that's a valid concern. I think it might be reasonable to use something like ExecGetAllUpdatedColsCtx() in the backbranches, and switch to a short-lived context for the trigger invocations in >= 16. Greetings, Andres Freund
Re: memory leak in trigger handling (since PG12)
Hi,

On 2023-05-23 23:26:42 +0200, Tomas Vondra wrote:
> On 5/23/23 19:14, Andres Freund wrote:
>> Hi,
>>
>> On 2023-05-23 18:23:00 +0200, Tomas Vondra wrote:
>>> This means that for an UPDATE with triggers, we may end up calling this
>>> for each row, possibly multiple bitmaps. And those bitmaps are allocated
>>> in ExecutorState, so won't be freed until the end of the query :-(
>>
>> Ugh.
>>
>> I've wondered about some form of instrumentation to detect such issues
>> before. It's obviously a problem that we can have fairly large leaks, like
>> the one you just discovered, without detecting it for a couple years. It's
>> kinda made worse by the memory context infrastructure, because it hides
>> such issues.
>>
>> Could it help to have a mode where the executor shutdown hook checks how
>> much memory is allocated in ExecutorState and warns if it's too much?
>> There's IIRC a few places that allocate large things directly in it, but
>> most of those probably should be dedicated contexts anyway. Something
>> similar could be useful for some other long-lived contexts.
>
> Not sure such simple instrumentation would help much, unfortunately :-(
>
> We only discovered this because the user reported OOM crashes, which
> means the executor didn't get to the shutdown hook at all. Yeah, maybe
> if we had such instrumentation it'd get triggered for milder cases, but
> I'd bet the amount of noise is going to be significant.
>
> For example, there's a nearby thread about hashjoin allocating buffiles
> etc. in ExecutorState - we ended up moving that to a separate context,
> but surely there are more such cases (not just for ExecutorState).

Yes, that's why I said that we would have to move more of those into dedicated contexts - which is a good idea independent of this issue.

> The really hard thing was determining what causes the memory leak - the
> simple instrumentation doesn't help with that at all.
> It tells you there might be a leak, but you don't know where the
> allocations came from.
>
> What I ended up doing is a simple gdb script that sets breakpoints on
> all palloc/pfree variants, and prints info (including the backtrace) for
> each call on ExecutorState. And then a script that aggregates those to
> identify which backtraces allocated most chunks that were not freed.

FWIW, for things like this I found "heaptrack" to be extremely helpful. E.g. for a reproducer of the problem here, it gave me the attached "flame graph" of the peak memory usage - after attaching to a running backend and running an UPDATE triggering the leak. Because the view I screenshotted shows the stacks contributing to peak memory usage, it works nicely to find "temporary leaks", even if memory is actually freed after all.

>> Hm. Somehow this doesn't seem quite right. Shouldn't we try to use a
>> shorter lived memory context instead? Otherwise we'll just end up with
>> the same problem in a few years.
>
> I agree using a shorter lived memory context would be more elegant, and
> more in line with how we do things. But it's not clear to me why we'd
> end up with the same problem in a few years with what the patch does.

Because it sets up the pattern of manual memory management and continues to run the relevant code within a query-lifetime context.

Greetings,

Andres Freund
Question about error message in auth.c
Greetings,

In https://github.com/postgres/postgres/blob/5c2c59ba0b5f723b067a6fa8bf8452d41fbb2125/src/backend/libpq/auth.c#L463 the last piece of information is the encryption state. However, when an SSL connection fails to authenticate, the state is not encrypted. When would it ever be encrypted if authentication fails?

Dave Cramer
Re: memory leak in trigger handling (since PG12)
On 5/24/23 17:37, Tom Lane wrote:
> Tomas Vondra writes:
>> While looking for other places allocating stuff in ExecutorState (for
>> the UPDATE case) and leaving it there, I found two more cases:
>>
>> 1) copy_plpgsql_datums
>>
>> 2) make_expanded_record_from_tupdesc
>>    make_expanded_record_from_exprecord
>>
>> All of this is calls from plpgsql_exec_trigger.
>
> Can you show a test case in which this happens? I added some
> instrumentation and verified at least within our regression tests,
> copy_plpgsql_datums' CurrentMemoryContext is always plpgsql's
> "SPI Proc" context, so I do not see how there can be a query-lifespan
> leak there, nor how your 0003 would fix it if there is.

Interesting. I tried to reproduce it, but without success, and it passes even with an assert on the context name. The only explanation I have is that the gdb script I used might have been a bit broken - it used conditional breakpoints like this one:

  break AllocSetAlloc if strcmp(((MemoryContext) $rdi)->name, \
                                "ExecutorState") == 0
  commands
    bt
    cont
  end

but I just noticed gdb sometimes complains about this:

  Error in testing breakpoint condition:
  '__strcmp_avx2' has unknown return type; cast the call to its declared return type

The breakpoint still fires all the commands, which is pretty surprising behavior, but that might explain why I saw copy_plpgsql_data as another culprit. And I suspect the make_expanded_record calls might be caused by the same thing. I'll check deeper tomorrow, when I get access to the original script etc. We can ignore these cases until then.

Sorry for the confusion :-/

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
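[Editor's note] The gdb complaint quoted above also spells out the fix: cast the call so the condition has a known return type. A sketch of the adjusted script (untested here; it assumes the same x86-64 calling convention as the original, where $rdi holds the first function argument):

```
# Cast strcmp so gdb knows the return type of the optimized
# __strcmp_avx2 symbol; without the cast the condition can
# silently misbehave while the breakpoint commands still fire.
break AllocSetAlloc if (int) strcmp(((MemoryContext) $rdi)->name, "ExecutorState") == 0
commands
  bt
  cont
end
```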
Re: PG 16 draft release notes ready
On Wed, May 24, 2023 at 01:43:50PM -0400, Bruce Momjian wrote: > > * Reduce palloc() memory overhead for all memory allocations down to 8 > > bytes on all platforms. (Andres Freund, David Rowley) > > > > This allows more efficient use of memory and is especially useful in > > queries which perform operations (such as sorting or hashing) that > > require more than work_mem. > > Well, this would go in the source code section, but it seems too > internal and global to mention. What was the previous memory allocation overhead? -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Re: PG 16 draft release notes ready
On Wed, May 24, 2023 at 08:48:56AM +1200, David Rowley wrote: > On Tue, 23 May 2023 at 06:04, Bruce Momjian wrote: > > > > On Mon, May 22, 2023 at 10:59:36AM -0700, Andres Freund wrote: > > > And here it's not just performance, but also memory usage, including > > > steady > > > state memory usage. > > > > Understood. I continue to need help determining which items to include. > > Can you suggest some text? This? > > > > Improve efficiency of memory usage to allow for better scaling > > Maybe something like: > > * Reduce palloc() memory overhead for all memory allocations down to 8 > bytes on all platforms. (Andres Freund, David Rowley) > > This allows more efficient use of memory and is especially useful in > queries which perform operations (such as sorting or hashing) that > require more than work_mem. Well, this would go in the source code section, but it seems too internal and global to mention. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Re: PostgreSQL 16 Beta 1 release announcement draft
On 5/22/23 3:23 PM, Erik Rijkers wrote:
> On 5/21/23 at 19:07, Jonathan S. Katz wrote:
>> On 5/19/23 12:17 AM, Jonathan S. Katz wrote:
>>> Hi, Attached is a draft of the release announcement for PostgreSQL 16
>>> Beta 1. Please provide feedback no later than May 24, 0:00 AoE. This
>>> will give

Thanks everyone for your feedback. Here is the updated text.

> 'substransaction' should be 'subtransaction'

Fixed.

> 'use thousands separators' perhaps is better:
> 'use underscore as digit-separator, as in `5_432` and `1_00_000`'

I looked at how other languages document this, and they do use the term "thousands separators." I left that in, but explicitly called out the underscore.

> 'instrcut' should be 'instruct'

Fixed.

Attached is the (hopefully) final draft.

Thanks,

Jonathan

The PostgreSQL Global Development Group announces that the first beta release of PostgreSQL 16 is now [available for download](https://www.postgresql.org/download/). This release contains previews of all features that will be available when PostgreSQL 16 is made generally available, though some details of the release can change during the beta period.

You can find information about all of the features and changes found in PostgreSQL 16 in the [release notes](https://www.postgresql.org/docs/16/release-16.html): [https://www.postgresql.org/docs/16/release-16.html](https://www.postgresql.org/docs/16/release-16.html)

In the spirit of the open source PostgreSQL community, we strongly encourage you to test the new features of PostgreSQL 16 on your systems to help us eliminate bugs or other issues that may exist. While we do not advise you to run PostgreSQL 16 Beta 1 in production environments, we encourage you to find ways to run your typical application workloads against this beta release. Your testing and feedback will help the community ensure that the PostgreSQL 16 release upholds our standards of delivering a stable, reliable release of the world's most advanced open source relational database.
Please read more about our [beta testing process](https://www.postgresql.org/developer/beta/) and how you can contribute: [https://www.postgresql.org/developer/beta/](https://www.postgresql.org/developer/beta/)

PostgreSQL 16 Feature Highlights

### Performance

PostgreSQL 16 includes performance improvements in query execution. This release adds more query parallelism, including allowing `FULL` and `RIGHT` joins to execute in parallel, and parallel execution of the `string_agg` and `array_agg` aggregate functions. Additionally, PostgreSQL 16 can use incremental sorts in `SELECT DISTINCT` queries. There are also several optimizations for [window queries](https://www.postgresql.org/docs/16/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS), improvements in lookups for `RANGE` and `LIST` partitions, and support for "anti-joins" in `RIGHT` and `OUTER` queries.

This release also introduces support for CPU acceleration using SIMD for both x86 and ARM architectures, including optimizations for processing ASCII and JSON strings, and array and subtransaction searches. Additionally, PostgreSQL 16 introduces [load balancing](https://www.postgresql.org/docs/16/libpq-connect.html#LIBPQ-CONNECT-LOAD-BALANCE-HOSTS) to libpq, the client library for PostgreSQL.

### Logical Replication Enhancements

Logical replication lets PostgreSQL users stream data in real-time to other PostgreSQL or other external systems that implement the logical protocol. Until PostgreSQL 16, users could only create logical replication publishers on primary instances. PostgreSQL 16 adds the ability to perform logical decoding on a standby instance, giving users more options to distribute their workload, for example, use a standby that's less busy than a primary to logically replicate changes.

PostgreSQL 16 also includes several performance improvements to logical replication.
This includes allowing the subscriber to apply large transactions in parallel, use indexes other than the `PRIMARY KEY` to perform lookups during `UPDATE` or `DELETE` operations, and allow for tables to be copied using binary format during initialization.

### Developer Experience

PostgreSQL 16 continues to implement the [SQL/JSON](https://www.postgresql.org/docs/16/functions-json.html) standard for manipulating [JSON](https://www.postgresql.org/docs/16/datatype-json.html) data, including support for SQL/JSON constructors (e.g. `JSON_ARRAY()`, `JSON_ARRAYAGG()` et al), and identity functions (`IS JSON`). This release also adds the SQL standard [`ANY_VALUE`](https://www.postgresql.org/docs/16/functions-aggregate.html#id-1.5.8.27.5.2.4.1.1.1.1) aggregate function, which returns any arbitrary value from the aggregate set.

For convenience, PostgreSQL 16 now lets you specify non-decimal integer literals, such as `0xff`, `0o777`, and `0b101010`, and use underscores as thousands separators, such as `5_432`.

This release adds support for the extended query protocol to the
Re: PG 16 draft release notes ready
On 5/24/23 12:13 AM, David Rowley wrote:
> On Wed, 24 May 2023 at 15:54, Bruce Momjian wrote:
>> On Wed, May 24, 2023 at 08:37:45AM +1200, David Rowley wrote:
>>> On Mon, 22 May 2023 at 07:05, Jonathan S. Katz wrote:
>>>> * Parallel execution of queries that use `FULL` and `OUTER` joins
>>>
>>> I think this should be `RIGHT` joins rather than `OUTER` joins. LEFT
>>> joins have been parallelizable I think for a long time now.
>>
>> Well, since we can swap left/right easily, why would we not have just
>> swapped the tables and done the join in the past? I think there are two
>> things missing in my description. First, I need to mention parallel
>> _hash_ join. Second, I think this item is saying that the _inner_ side
>> of a parallel hash join can be an OUTER or FULL join. How about?
>>
>> Allow hash joins to be parallelized where the inner side is processed
>> as an OUTER or FULL join (Melanie Plageman, Thomas Munro)
>
> In this case, the inner side is the hashed side. I think Jonathan's text
> is safe to swap OUTER to RIGHT as it mentions "execution".

I made this swap in the release announcement. Thanks!

Jonathan

OpenPGP_signature Description: OpenPGP digital signature
Re: Order changes in PG16 since ICU introduction
On 5/24/23 11:39, Jeff Davis wrote:
> On Mon, 2023-05-22 at 22:09 +0200, Daniel Verite wrote:
>> In practice we're probably getting the "und" ICU locale whereas "fr"
>> would be appropriate.
>
> This is a good point and illustrates that ICU is not a drop-in
> replacement for libc in all cases. I don't see a solution here that
> doesn't involve some rough edges, though. "Locale" is a generic term,
> and if we continue to insist that it really means a libc locale, then
> ICU will never be on an equal footing with libc, let alone the
> preferred provider.

Huge +1

IMHO the experience should be unified to the degree possible.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: PG 16 draft release notes ready
On Wed, May 24, 2023 at 04:57:59PM +0200, Erik Rijkers wrote: > Op 5/24/23 om 15:58 schreef Bruce Momjian: > > On Wed, May 24, 2023 at 12:23:02PM +0700, John Naylor wrote: > > > > > > On Wed, May 24, 2023 at 11:19 AM Bruce Momjian wrote: > > Typos: > > 'from standbys servers' should be > 'from standby servers' > > 'reindexedb' should be > 'reindexdb' > (2x: the next line mentions, erroneously, 'reindexedb --system') > > 'created only created' should be > 'only created' > (I think) > > 'could could' should be > 'could' > > 'are now require the role' should be > 'now require the role' > > 'values is' should be > 'value is' > > 'to marked' should be > 'to be marked' All good, patch attached and applied. Updated docs are at: https://momjian.us/pgsql_docs/release-16.html -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you. diff --git a/doc/src/sgml/release-16.sgml b/doc/src/sgml/release-16.sgml index bb92fe5cf9..88d6514ad7 100644 --- a/doc/src/sgml/release-16.sgml +++ b/doc/src/sgml/release-16.sgml @@ -27,7 +27,7 @@ - Allow logical replication from standbys servers + Allow logical replication from standby servers @@ -126,11 +126,11 @@ Author: Michael Paquier -Change REINDEX DATABASE and reindexedb to not process indexes on system catalogs (Simon Riggs) +Change REINDEX DATABASE and reindexdb to not process indexes on system catalogs (Simon Riggs) -Processing such indexes is still possible using REINDEX SYSTEM and reindexedb --system. +Processing such indexes is still possible using REINDEX SYSTEM and reindexdb --system. @@ -593,7 +593,7 @@ Create subscription statistics entries at subscription creation time so stats_re -Previously entries were created only created when the first statistics were reported. +Previously entries were created only when the first statistics were reported. 
@@ -777,7 +777,7 @@ Simplify permissions for LOCK TABLE (Jeff Davis) -Previously the ability to perform LOCK TABLE at various lock levels was bound to specific query-type permissions. For example, UPDATE could could perform all lock levels except ACCESS SHARE, which +Previously the ability to perform LOCK TABLE at various lock levels was bound to specific query-type permissions. For example, UPDATE could perform all lock levels except ACCESS SHARE, which required SELECT permissions. Now UPDATE can issue all lock levels. MORE? @@ -808,7 +808,7 @@ Restrict the privileges of CREATEROLE roles (Robert Haas) -Previously roles with CREATEROLE privileges could change many aspects of any non-superuser role. Such changes, including adding members, are now require the role requesting the change to have ADMIN OPTION +Previously roles with CREATEROLE privileges could change many aspects of any non-superuser role. Such changes, including adding members, now require the role requesting the change to have ADMIN OPTION permission. @@ -946,7 +946,7 @@ Add dependency tracking of grantors for GRANT records (Robert Haas) -This will guarantee that pg_auth_members.grantor values is always valid. +This guarantees that pg_auth_members.grantor values are always valid. @@ -3017,7 +3017,7 @@ Author: Tom Lane -Allow required extensions to marked as non-relocatable using "no_relocate" (Regina Obe) +Allow required extensions to be marked as non-relocatable using "no_relocate" (Regina Obe)
Re: memory leak in trigger handling (since PG12)
Tomas Vondra writes:
> On 5/23/23 23:39, Tom Lane wrote:
>> FWIW, I've had some success localizing palloc memory leaks with valgrind's
>> leak detection mode. The trick is to ask it for a report before the
>> context gets destroyed. Beats writing your own infrastructure ...
>
> I haven't tried valgrind, so can't compare.
> Would it be possible to filter which memory contexts to track? Say we
> know the leak is in ExecutorState, so we don't need to track allocations
> in other contexts. That was a huge speedup for me, maybe it'd help
> valgrind too.

I don't think valgrind has a way to do that, but this'd be something you set up specially in any case.

> Also, how did you ask for the report before context gets destroyed?

There are several valgrind client requests that could be helpful:

  /* Do a full memory leak check (like --leak-check=full) mid-execution. */
  #define VALGRIND_DO_LEAK_CHECK                                  \
      VALGRIND_DO_CLIENT_REQUEST_STMT(VG_USERREQ__DO_LEAK_CHECK,  \
                                      0, 0, 0, 0, 0)

  /* Same as VALGRIND_DO_LEAK_CHECK but only showing the entries for
     which there was an increase in leaked bytes or leaked nr of blocks
     since the previous leak search. */
  #define VALGRIND_DO_ADDED_LEAK_CHECK                            \
      VALGRIND_DO_CLIENT_REQUEST_STMT(VG_USERREQ__DO_LEAK_CHECK,  \
                                      0, 1, 0, 0, 0)

  /* Return number of leaked, dubious, reachable and suppressed bytes found
     by all previous leak checks. They must be lvalues. */
  #define VALGRIND_COUNT_LEAK_BLOCKS(leaked, dubious, reachable, suppressed) \

Putting VALGRIND_DO_ADDED_LEAK_CHECK someplace in the executor loop would help narrow things down pretty quickly, assuming you had a self-contained example demonstrating the leak. I don't recall exactly how I used these but it was something along that line.

>> Yeah, it's not clear whether we could make the still-hypothetical check
>> sensitive enough to find leaks using small test cases without getting an
>> unworkable number of false positives. Still, might be worth trying.
>
> I'm not against experimenting with that.
> Were you thinking about something that'd be cheap enough to just be
> enabled always/everywhere, or something we'd enable during testing?

We seem to have already paid the overhead of counting all palloc allocations, so I don't think it'd be too expensive to occasionally check the ExecutorState's mem_allocated and see if it's growing (especially if we only do so in assert-enabled builds). The trick is to define the rules for what's worth reporting.

>> It might be an acceptable tradeoff to have stricter rules for what can
>> be allocated in ExecutorState in order to make this sort of problem
>> more easily detectable.
>
> Would these rules be just customary, or defined/enforced in the code
> somehow? I can't quite imagine how that would work, TBH.

If the check bleated "WARNING: possible executor memory leak" during regression testing, people would soon become conditioned to doing whatever they have to do to avoid it ;-)

regards, tom lane
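[Editor's note] The check Tom sketches — periodically comparing a context's allocation counter against its value at the previous checkpoint — can be modeled in miniature. This is a standalone toy, not executor code; the struct, names, and slack threshold are invented for illustration (in PostgreSQL terms, the counter would be ExecutorState's mem_allocated, polled in assert-enabled builds):

```c
#include <stddef.h>

/* Toy leak detector: remember the context's byte counter at the last
 * checkpoint and flag the context when it has grown past an allowed
 * slack since then. */
typedef struct LeakWatch
{
    size_t last_allocated;  /* counter value at the previous checkpoint */
    size_t slack;           /* growth tolerated before warning */
} LeakWatch;

/* Returns 1 when the counter grew by more than the slack since the
 * previous call, 0 otherwise.  Always advances the checkpoint, so a
 * context that jumps once and then stays flat is flagged only once. */
int leakwatch_check(LeakWatch *w, size_t mem_allocated)
{
    int growing = mem_allocated > w->last_allocated + w->slack;

    w->last_allocated = mem_allocated;
    return growing;
}
```

The interesting design question is exactly the one raised in the thread: what slack avoids drowning the buildfarm in false positives while still catching per-row growth like the trigger leak.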
Re: Order changes in PG16 since ICU introduction
On Mon, 2023-05-22 at 22:09 +0200, Daniel Verite wrote:
> While I agree that the LOCALE option in CREATE DATABASE is
> counter-intuitive, I think it's more than that.

As Andrew Gierth pointed out:

  $ initdb --locale=fr_FR
  ...
  ICU locale: en-US
  ...

is more than just counter-intuitive. I don't think we can ship 16 that way.

> I find it questionable that blending ICU
> and libc locales into it helps that much with the user experience.

Thank you for going through some examples here. I agree that it's not perfect, but we need some path to a reasonable ICU user experience, and I think we'll have to accept some rough edges to avoid the worst cases, like above.

> initdb:
>
> Using default ICU locale "fr".
> Using language tag "fr" for ICU locale "fr".
> ...
>
> #1
>
> postgres=# create database test1 locale='fr_FR.UTF-8';
> NOTICE: using standard form "fr-FR" for ICU locale "fr_FR.UTF-8"
> ERROR: new ICU locale (fr-FR) is incompatible with the ICU locale of

I don't see a problem here. If you specify LOCALE to CREATE DATABASE, you should either be using "TEMPLATE template0", or you should be expecting an error if the LOCALE doesn't match exactly. What would you like to see happen here?

> #2
>
> postgres=# create database test2 locale='C.UTF-8' template='template0';
> NOTICE: using standard form "en-US-u-va-posix" for ICU locale "C.UTF-8"
> CREATE DATABASE
>
> en-US-u-va-posix does not sort like C.UTF-8 in glibc 2.35, so
> this interpretation is arguably not what a user would expect.

As you pointed out, this is not settled in libc either:

https://www.postgresql.org/message-id/8a3dc06f-9b9d-4ed7-9a12-2070d8b0165f%40manitou-mail.org

We really can't expect a particular order for a particular locale name, unless we handle it specially like "C" or "POSIX". If we pass it to the provider, we have to trust the provider to match our conceptual expectations for that locale (and ideally version it properly).
> I would expect the ICU warning or error (icu_validation_level) to kick
> in instead of that transliteration.

Earlier versions of ICU (<= 63) do this transformation automatically, and I don't see a reason to throw an error if ICU considers it valid. The language tag en-US-u-va-posix will be stored in the catalog, and that will be considered valid in later versions of ICU.

Later versions of ICU (>= 64) consider locales with a language name of "C" to be obsolete and no longer valid. I added code to do the transformation without error in these later versions, but I think we have agreement to remove it.

If a user specifies the locale as "C.UTF-8", we can either pass it to ICU and see whether that version accepts it or not (and if not, throw a warning/error); or if we decide that "C.UTF-8" really means "C", we can handle it in the memcmp() path like C and never send it to ICU.

> #3
>
> $ grep french /etc/locale.alias
> french fr_FR.ISO-8859-1
>
> postgres=# create database test3 locale='french' template='template0'
> encoding='LATIN1';
> WARNING: ICU locale "french" has unknown language "french"
> HINT: To disable ICU locale validation, set parameter icu_validation_level
> to DISABLED.
> CREATE DATABASE
>
> In practice we're probably getting the "und" ICU locale whereas "fr"
> would be appropriate.

This is a good point and illustrates that ICU is not a drop-in replacement for libc in all cases. I don't see a solution here that doesn't involve some rough edges, though. "Locale" is a generic term, and if we continue to insist that it really means a libc locale, then ICU will never be on an equal footing with libc, let alone the preferred provider.

Regards,

Jeff Davis
Re: memory leak in trigger handling (since PG12)
Tomas Vondra writes: > While looking for other places allocating stuff in ExecutorState (for > the UPDATE case) and leaving it there, I found two more cases: > 1) copy_plpgsql_datums > 2) make_expanded_record_from_tupdesc >make_expanded_record_from_exprecord > All of this is calls from plpgsql_exec_trigger. Can you show a test case in which this happens? I added some instrumentation and verified at least within our regression tests, copy_plpgsql_datums' CurrentMemoryContext is always plpgsql's "SPI Proc" context, so I do not see how there can be a query-lifespan leak there, nor how your 0003 would fix it if there is. regards, tom lane
Re: Wrong results due to missing quals
Richard Guo writes: > So the qual 't2.a = t5.a' is missing. Ugh. > I looked into it and found that both clones of this joinqual are > rejected by clause_is_computable_at, because their required_relids do > not include the outer join of t2/(t3/t4), and meanwhile include nullable > rels of this outer join. > I think the root cause is that, as Tom pointed out in [1], we're not > maintaining required_relids very accurately. In b9c755a2, we make > clause_is_computable_at test required_relids for clone clauses. I think > this is how this issue sneaks in. Yeah. I'm starting to think that b9c755a2 took the wrong approach. Really, required_relids is about making sure that a qual isn't evaluated "too low", before all necessary joins have been formed. But clause_is_computable_at is charged with making sure we don't evaluate it "too high", after some incompatible join has been formed. There's no really good reason to suppose that required_relids can serve both purposes, even if it were computed perfectly accurately (and what is perfect, anyway?). So right now I'm playing with the idea of reverting the change in clause_is_computable_at and seeing how else we can fix the previous bug. Don't have anything to show yet, but one thought is that maybe deconstruct_distribute_oj_quals needs to set up clause_relids for clone clauses differently. Another idea is that maybe we need another RestrictInfo field that's directly a set of OJ relids that this clause can't be applied above. That'd reduce clause_is_computable_at to basically a bms_intersect test which would be nice speed-wise. The space consumption could be annoying, but I'm thinking that we might only have to populate the field in clone clauses, which would alleviate that issue. regards, tom lane
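[Editor's note] Tom's idea of a dedicated "OJ relids this clause can't be applied above" field reduces the check to a set-intersection test on relid bitmaps. A minimal standalone sketch of that shape (this is not PostgreSQL's multi-word Bitmapset or the actual RestrictInfo field — a single 64-bit word and the names are invented for illustration):

```c
#include <stdint.h>

/* Toy relid set: bit i set means the relation or outer join with
 * id i is a member of the set. */
typedef uint64_t RelidSet;

RelidSet relid_set(int id)
{
    return (RelidSet) 1 << id;
}

/* With a hypothetical per-clause "incompatible OJ relids" set, the
 * "not too high" check becomes: the relids of the join level being
 * formed must not overlap the clause's forbidden outer-join relids.
 * Returns 1 when the clause may be evaluated there. */
int clause_computable_at(RelidSet incompatible_relids, RelidSet eval_relids)
{
    return (incompatible_relids & eval_relids) == 0;
}
```

The speed appeal mentioned in the mail is visible here: the whole test is one AND and one comparison per bitmap word, with no need to reason about required_relids at all.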
Re: PG 16 draft release notes ready
On 5/24/23 at 15:58, Bruce Momjian wrote:
> On Wed, May 24, 2023 at 12:23:02PM +0700, John Naylor wrote:
>> On Wed, May 24, 2023 at 11:19 AM Bruce Momjian wrote:

Typos:

'from standbys servers' should be
'from standby servers'

'reindexedb' should be
'reindexdb'
(2x: the next line mentions, erroneously, 'reindexedb --system')

'created only created' should be
'only created'
(I think)

'could could' should be
'could'

'are now require the role' should be
'now require the role'

'values is' should be
'value is'

'to marked' should be
'to be marked'

thanks,

Erik
SyncRepWaitForLSN waits for XLogFlush?
Hi everyone, I was wondering if waiting for an LSN in SyncRepWaitForLSN ensures that the XLOG has been flushed locally up to that location before the record is shipped off to standbys? Regards, Tej
Re: Proposal: Removing 32 bit support starting from PG17++
Hans Buschmann writes: > This inspired me to propose dropping 32bit support for PostgreSQL starting > with PG17. I don't think this is a great idea. Even if Intel isn't interested, there'll be plenty of 32-bit left in the lower end of the market (think ARM, IoT, and so on). regards, tom lane
Proposal: Removing 32 bit support starting from PG17++
I recently stumbled over the following Intel proposal for dropping 32bit support in x86 processors. [1] This inspired me to propose dropping 32bit support for PostgreSQL starting with PG17. It seems obvious that production systems mostly won't use newly installed 32 bit native code in late 2024 and beyond. A quick inspection of the buildfarm only showed a very limited number of 32bit systems. This proposal follows the practice for Windows which already (practically) dropped 32bit support. 32bit systems may get continuing support in the backbranches until PG16 retires in 2028. Even if I am not a postgres hacker I suppose this could simplify things quite a lot. Any thoughts for discussion are welcome! Hans Buschmann [1] https://www.phoronix.com/news/Intel-X86-S-64-bit-Only
Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?
Daniel Gustafsson writes:
> It would be nice if the OpenSSL project could grant us an LTS license for a
> buildfarm animal to ensure compatibility but I have no idea how realistic that
> is (or how much the LTS version of 1.0.2 has diverged from the last available
> public 1.0.2 version).

Surely the answer must be "not much". The entire point of an LTS version is to not have to change dusty calling applications. We had definitely better have some animals still using 1.0.2, but I don't see much reason to think that the last public release wouldn't be good enough.

regards, tom lane
Re: Make pgbench exit on SIGINT more reliably
On Tue May 23, 2023 at 7:31 PM CDT, Michael Paquier wrote: > On Mon, May 22, 2023 at 10:02:02AM -0500, Tristan Partin wrote: > > The way that pgbench handled SIGINT changed in > > 1d468b9ad81b9139b4a0b16b416c3597925af4b0. Unfortunately this had a > > couple of unintended consequences, at least from what I can tell[1]. > > > > - CTRL-C no longer stops the program unless the right point in pgbench > > execution is hit > > - pgbench no longer exits with a non-zero exit code > > > > An easy reproduction of these problems is to run with a large scale > > factor like: pgbench -i -s 50. Then try to CTRL-C the program. > > This comes from the code path where the data is generated client-side, > and where the current CancelRequested may not be that responsive, > isn't it? It comes from the fact that CancelRequested is only checked within the generation of the pgbench_accounts table, but with a large enough scale factor or enough latency, say cross-continent communication, generating the other tables can be time consuming too. But, yes you are more likely to run into this issue when generating client-side data. -- Tristan Partin Neon (https://neon.tech)
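[Editor's note] The failure mode described here — a cancel flag that is only polled inside one of several long-running loops — can be modeled with a tiny standalone toy (this is not pgbench's actual code; the function and the simulated-signal parameter are invented for illustration):

```c
/* Toy model of cooperative cancellation: a SIGINT handler would set
 * this flag, and every long-running generation loop must poll it.
 * If only one table's loop polls the flag, Ctrl-C during the other
 * tables' generation appears to do nothing. */
volatile int cancel_requested = 0;

/* Generate nrows rows; cancel_after simulates a SIGINT arriving
 * before that iteration (pass a negative value for "no signal").
 * Returns the number of rows generated before completion or cancel. */
long generate_rows(long nrows, long cancel_after)
{
    long done = 0;

    for (long i = 0; i < nrows; i++)
    {
        if (i == cancel_after)
            cancel_requested = 1;   /* stands in for the signal handler */
        if (cancel_requested)
            break;                  /* the per-iteration check each loop needs */
        done++;
    }
    return done;
}
```

The point of the sketch is simply that responsiveness is bounded by how often the flag is checked: a loop that never polls it runs to completion no matter when the "signal" arrives.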
Re: PG 16 draft release notes ready
On Wed, May 24, 2023 at 12:23:02PM +0700, John Naylor wrote:
> On Wed, May 24, 2023 at 11:19 AM Bruce Momjian wrote:
>> Second, you might be correct that the section is wrong. I thought of
>> CPU instructions as something tied to the compiler, so part of the build
>> process or source code, but the point we should make is that we have
>> these accelerations, not how it is implemented. We can move the entire
>> group to the "General Performance" section, or we can split it out:
>
> Splitting out like that seems like a good idea to me.

Okay, items split into sections and several merged. I left the CPU-specific parts in Source Code, and moved the rest into a merged item in General Performance, but moved the JSON item to Data Types. Patch attached, and you can see the results at:

https://momjian.us/pgsql_docs/release-16.html

> The last one refers to new internal functions, so it could stay in source
> code.
> (Either way, we don't want to imply that arrays of SQL types are accelerated
> this way, it's so far only for internal arrays.)

Good point. I called them "C arrays" but put it into the General Performance item.

--
Bruce Momjian https://momjian.us
EDB https://enterprisedb.com

Only you can decide what is important to you.

commit ad5406246b
Author: Bruce Momjian
Date: Wed May 24 09:54:34 2023 -0400

doc: PG 16 relnotes, merge and move vector items

Reported-by: John Naylor
Discussion: https://postgr.es/m/CAFBsxsEPg8L2MmGqavc8JByC=wf_mnkhn-kknfpkcqh0hyd...@mail.gmail.com

diff --git a/doc/src/sgml/release-16.sgml b/doc/src/sgml/release-16.sgml index c30c530065..bb92fe5cf9 100644 --- a/doc/src/sgml/release-16.sgml +++ b/doc/src/sgml/release-16.sgml @@ -472,6 +472,28 @@ Author: David Rowley Improve the speed of updating the process title (David Rowley) + + + + + + +Allow xid/subxid searches and ASCII string detection to use vector operations (Nathan Bossart) + + + +ASCII detection is particularly useful for COPY FROM.
Vector operations are also used for some C array searches. + + @@ -1781,6 +1803,17 @@ The IS JSON checks include checks for values, arrays, objects, scalars, and uniq + + + + +Allow JSON string parsing to use vector operations (John Naylor) + + + - - - -Allow ASCII string detection to use vector operations (John Naylor) - - - - - - - -Allow JSON string parsing to use vector operations (John Naylor) - - - -ARM? - - - - - - - -Allow array searches to use vector operations (John Naylor) - - - - - - - -Allow xid/subxid searches to use vector operations (Nathan Bossart) - - -
Re: memory leak in trigger handling (since PG12)
On 5/23/23 22:57, Tomas Vondra wrote: > > > On 5/23/23 18:39, Tom Lane wrote: >> Tomas Vondra writes: >>> it seems there's a fairly annoying memory leak in trigger code, >>> introduced by >>> ... >>> Attached is a patch, restoring the pre-12 behavior for me. >> >>> While looking for other places allocating stuff in ExecutorState (for >>> the UPDATE case) and leaving it there, I found two more cases: >> >>> 1) copy_plpgsql_datums >> >>> 2) make_expanded_record_from_tupdesc >>>make_expanded_record_from_exprecord >> >>> All of these are calls from plpgsql_exec_trigger. >> >> Not sure about the expanded-record case, but both of your other two >> fixes feel like poor substitutes for pushing the memory into a >> shorter-lived context. In particular I'm quite surprised that >> plpgsql isn't already allocating that workspace in the "procedure" >> memory context. >> > > I don't disagree, but which memory context should this use and > when/where should we switch to it? > > I haven't seen any obvious memory context candidate in the code > calling ExecGetAllUpdatedCols, so I guess we'd have to pass it from > above. Is that a good idea for backbranches ... > I looked at this again, and I think GetPerTupleMemoryContext(estate) might do the trick, see the 0002 part. Unfortunately it's not much smaller/simpler than just freeing the chunks, because we end up doing

oldcxt = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
updatedCols = ExecGetAllUpdatedCols(relinfo, estate);
MemoryContextSwitchTo(oldcxt);

and then have to pass updatedCols elsewhere. It's tricky to just switch to the context (e.g. in ExecASUpdateTriggers/ExecARUpdateTriggers), as AfterTriggerSaveEvent allocates other bits of memory too (in a longer-lived context). So we'd have to do another switch again. Not sure how backpatch-friendly that would be.
regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL CompanyFrom 85439e2587f035040c82e7303b96b887e650b01f Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Tue, 23 May 2023 17:45:47 +0200 Subject: [PATCH 1/3] Free updatedCols bitmaps --- src/backend/commands/trigger.c | 22 -- src/backend/executor/execMain.c | 9 +++-- 2 files changed, 27 insertions(+), 4 deletions(-) diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c index 4b295f8da5e..fb11203c473 100644 --- a/src/backend/commands/trigger.c +++ b/src/backend/commands/trigger.c @@ -2916,6 +2916,8 @@ ExecBSUpdateTriggers(EState *estate, ResultRelInfo *relinfo) (errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED), errmsg("BEFORE STATEMENT trigger cannot return a value"))); } + + bms_free(updatedCols); } void @@ -2928,12 +2930,18 @@ ExecASUpdateTriggers(EState *estate, ResultRelInfo *relinfo, Assert(relinfo->ri_RootResultRelInfo == NULL); if (trigdesc && trigdesc->trig_update_after_statement) + { + Bitmapset *updatedCols = ExecGetAllUpdatedCols(relinfo, estate); + AfterTriggerSaveEvent(estate, relinfo, NULL, NULL, TRIGGER_EVENT_UPDATE, false, NULL, NULL, NIL, - ExecGetAllUpdatedCols(relinfo, estate), + updatedCols, transition_capture, false); + + bms_free(updatedCols); + } } bool @@ -3043,6 +3051,9 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate, heap_freetuple(trigtuple); if (should_free_new) heap_freetuple(oldtuple); + + bms_free(updatedCols); + return false; /* "do nothing" */ } else if (newtuple != oldtuple) @@ -3068,6 +3079,8 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate, if (should_free_trig) heap_freetuple(trigtuple); + bms_free(updatedCols); + return true; } @@ -3107,6 +3120,7 @@ ExecARUpdateTriggers(EState *estate, ResultRelInfo *relinfo, */ TupleTableSlot *oldslot; ResultRelInfo *tupsrc; + Bitmapset *updatedCols; Assert((src_partinfo != NULL && dst_partinfo != NULL) || !is_crosspart_update); @@ -3129,14 +3143,18 @@ 
ExecARUpdateTriggers(EState *estate, ResultRelInfo *relinfo, else ExecClearTuple(oldslot); + updatedCols = ExecGetAllUpdatedCols(relinfo, estate); + AfterTriggerSaveEvent(estate, relinfo, src_partinfo, dst_partinfo, TRIGGER_EVENT_UPDATE, true, oldslot, newslot, recheckIndexes, - ExecGetAllUpdatedCols(relinfo, estate), + updatedCols, transition_capture, is_crosspart_update); + + bms_free(updatedCols); } } diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c index c76fdf59ec4..98502b956c2 100644 --- a/src/backend/executor/execMain.c +++ b/src/backend/executor/execMain.c @@ -2367,6 +2367,7 @@ ExecUpdateLockMode(EState *estate, ResultRelInfo *relinfo) { Bitmapset *keyCols; Bitmapset *updatedCols; + LockTupleMode lockMode; /* * Compute lock mode to use. If columns that are part of the key have not @@ -2378,9 +2379
Re: memory leak in trigger handling (since PG12)
On 5/24/23 10:19, Jakub Wartak wrote: > Hi, just two cents: > > On Tue, May 23, 2023 at 8:01 PM Andres Freund wrote: >> >> Hi, >> >> On 2023-05-23 13:28:30 -0400, Tom Lane wrote: >>> Andres Freund writes: Could it help to have a mode where the executor shutdown hook checks how much memory is allocated in ExecutorState and warns if its too much? >>> >>> It'd be very hard to set a limit for what's "too much", since the amount >>> of stuff created initially will depend on the plan size. >> >> I was thinking of some limit that should really never be reached outside of a >> leak or work_mem based allocations, say 2GB or so. > > RE: instrumentation subthread: > if that helps then below technique can work somewhat good on normal > binaries for end users (given there are debug symbols installed), so > maybe we don't need that much infrastructure added in to see the hot > code path: > > perf probe -x /path/to/postgres 'palloc' 'size=%di:u64' # RDI on > x86_64(palloc size arg0) > perf record -avg --call-graph dwarf -e probe_postgres:palloc -aR -p > sleep 3 # cannot be longer, huge overhead (~3s=~2GB) > > it produces: > 50.27% (563d0e380670) size=24 > | > ---palloc >bms_copy >ExecUpdateLockMode >ExecBRUpdateTriggers >ExecUpdate > [..] > > 49.73% (563d0e380670) size=16 > | > ---palloc >bms_copy >RelationGetIndexAttrBitmap >ExecUpdateLockMode >ExecBRUpdateTriggers >ExecUpdate > [..] > > Now we know that those small palloc() are guilty, but we didn't know > at the time with Tomas. The problem here is that we do not know in > palloc() - via its own arguments for which MemoryContext this is going > to be allocated for. This is problematic for perf, because on RHEL8, I > was not able to generate an uprobe that was capable of reaching a > global variable (CurrentMemoryContext) at that time. > I think there are a couple even bigger issues: (a) There are other methods that allocate memory - repalloc, palloc0, MemoryContextAlloc, ... and so on. 
But presumably we can trace all of them at once? We'd have to trace reset/deletes too. (b) This doesn't say if/when the allocated chunks get freed - even with a fix, we'll still do exactly the same number of allocations, except that we'll free the memory shortly after that. I wonder if we could print a bit more info for each probe, to match the alloc/free requests. > Additionally what was even more frustrating on diagnosing that case on > the customer end system, was that such OOMs were crashing other > PostgreSQL clusters on the same OS. Even knowing the exact guilty > statement it was impossible to limit RSS memory usage of that > particular backend. So, what you are proposing makes a lot of sense. > Also it got me thinking of implementing safety-memory-net-GUC > debug_query_limit_backend_memory=X MB that would inject > setrlimit(RLIMIT_DATA) through external extension via hook(s) and > un-set it later, but the man page states it works for mmap() only > after Linux 4.7+ so it is future proof but won't work e.g. on RHEL7 - > maybe that's still good enough?; Or, well maybe try to hack a palloc() > a little, but that has probably too big overhead, right? (just > thinking loud). > Not sure about setting a hard limit - that seems pretty tricky and may easily backfire. It's already possible to set such memory limit using e.g. cgroups, or even better use VMs to isolate the instances. I certainly agree it's annoying that when OOM hits, we end up with no information about what used the memory. Maybe we could have a threshold triggering a call to MemoryContextStats? So that we have at least some memory usage info in the log in case the OOM killer intervenes. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Large files for relations
On Wed, May 24, 2023 at 2:18 AM Peter Eisentraut wrote: > > What I'm hearing is that something simple like this might be more > > acceptable: > > > > * initdb --rel-segsize (cf --wal-segsize), default unchanged > > makes sense +1. > > * pg_upgrade would convert if source and target don't match > > This would be good, but it could also be an optional or later feature. +1. I think that would be nice to have, but not absolutely required. IMHO it's best not to overcomplicate these projects. Not everything needs to be part of the initial commit. If the initial commit happens 2 months from now and then stuff like this gets added over the next 8, that's strictly better than trying to land the whole patch set next March. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Atomic ops for unlogged LSN
On Tue, May 23, 2023 at 6:26 PM Michael Paquier wrote: > Spinlocks provide a full memory barrier, which may not be the case with > add_u64() or read_u64(). Could memory ordering be a problem in these > code paths? It could be a huge problem if what you say were true, but unless I'm missing something, the comments in atomics.h say that it isn't. The comments for the 64-bit functions say to look at the 32-bit functions, and there it says:

/*
 * pg_atomic_add_fetch_u32 - atomically add to variable
 *
 * Returns the value of ptr after the arithmetic operation.
 *
 * Full barrier semantics.
 */

Which is probably a good thing, because I'm not sure what good it would be to have an operation that both reads and modifies an atomic variable but has no barrier semantics. That's not to say that I entirely understand the point of this patch. It seems like a generally reasonable thing to do AFAICT but somehow I wonder if there's more to the story here, because it doesn't feel like it would be easy to measure the benefit of this patch in isolation. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Docs: Encourage strong server verification with SCRAM
> On 23 May 2023, at 23:02, Stephen Frost wrote: > * Jacob Champion (jchamp...@timescale.com) wrote: >> - low iteration counts accepted by the client make it easier than it >> probably should be for a MITM to brute-force passwords (note that >> PG16's scram_iterations GUC, being server-side, does not mitigate >> this) > > This would be good to improve on. The mechanics of this are quite straightforward, the problem IMHO lies in how to inform and educate users what a reasonable iteration count is, not to mention what an iteration count is in the first place. > Perhaps more succinctly- maybe we should be making adjustments to the > current language instead of just adding a new paragraph. +1 -- Daniel Gustafsson
Re: pgsql: TAP test for logical decoding on standby
Hi, On 5/23/23 5:15 PM, Robert Haas wrote: On Sat, Apr 8, 2023 at 5:26 AM Andres Freund wrote: TAP test for logical decoding on standby Small nitpicks: 1. The test names generated by check_slots_conflicting_status() start with a capital letter, while most other test names start with a lower-case letter. Yeah, not sure that would deserve a fix of its own, but if we address 2. then let's do 1. too. 2. The function is called 7 times, 6 with a true argument and 1 with a false argument, but the test name only depends on whether the argument is true or false, so we get the same test name 6 times. Maybe there's not a reasonable way to do better, I'm not sure, but it's not ideal. I agree that's not ideal (but one could still figure out which one is failing, if any, by looking at the Perl script). If we want to "improve" this, what about passing a second argument that would provide more context in the test name? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Wrong results due to missing quals
Testing with SQLancer reports a wrong results issue on master and I reduced it to the repro query below. create table t (a int, b int); explain (costs off) select * from t t1 left join (t t2 left join t t3 full join t t4 on false on false) left join t t5 on t2.a = t5.a on t2.b = 1; QUERY PLAN -- Nested Loop Left Join -> Seq Scan on t t1 -> Materialize -> Nested Loop Left Join -> Nested Loop Left Join Join Filter: false -> Seq Scan on t t2 Filter: (b = 1) -> Result One-Time Filter: false -> Materialize -> Seq Scan on t t5 (12 rows) So the qual 't2.a = t5.a' is missing. I looked into it and found that both clones of this joinqual are rejected by clause_is_computable_at, because their required_relids do not include the outer join of t2/(t3/t4), and meanwhile include nullable rels of this outer join. I think the root cause is that, as Tom pointed out in [1], we're not maintaining required_relids very accurately. In b9c755a2, we make clause_is_computable_at test required_relids for clone clauses. I think this is how this issue sneaks in. To fix it, it seems to me that the ideal way would be to always compute accurate required_relids. But I'm not sure how difficult it is. [1] https://www.postgresql.org/message-id/395264.1684698283%40sss.pgh.pa.us Thanks Richard
Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?
> On 24 May 2023, at 11:52, Michael Paquier wrote: > > On Wed, May 24, 2023 at 11:36:56AM +0200, Daniel Gustafsson wrote: >> 1.0.2 is also an LTS version available commercially for premium support >> customers of OpenSSL (1.1.1 will become an LTS version as well), with 1.0.2zh >> slated for release next week. This raises the likelihood of Postgres >> installations using 1.0.2 in production still, and for some time to come. > > Good point. Indeed, that makes it pretty clear that not dropping > 1.0.2 would be the best option for the time being, so 0001 would be > enough. I think that's the right move re 1.0.2 support. 1.0.2 is also the version in RHEL7, which is in ELS until 2026. When we moved the goalposts to 1.0.1 (commit 7b283d0e1d1) we referred to RHEL6 using 1.0.1, and since RHEL6 goes out of ELS in late June 2024 it seems possible to drop 1.0.1 support during v17. I haven't studied the patch yet but I'll have a look at it. > I am wondering if we should worry about having a buildfarm member that > could test these binaries, though, in case they have compatibility > issues.. But it would be harder to debug without the code at hand, as > well. It would be nice if the OpenSSL project could grant us an LTS license for a buildfarm animal to ensure compatibility but I have no idea how realistic that is (or how much the LTS version of 1.0.2 has diverged from the last available public 1.0.2 version). -- Daniel Gustafsson
Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?
On Wed, May 24, 2023 at 11:36:56AM +0200, Daniel Gustafsson wrote: > 1.0.2 is also an LTS version available commercially for premium support > customers of OpenSSL (1.1.1 will become an LTS version as well), with 1.0.2zh > slated for release next week. This raises the likelihood of Postgres > installations using 1.0.2 in production still, and for some time to come. Good point. Indeed, that makes it pretty clear that not dropping 1.0.2 would be the best option for the time being, so 0001 would be enough. I am wondering if we should worry about having a buildfarm member that could test these binaries, though, in case they have compatibility issues.. But it would be harder to debug without the code at hand, as well. -- Michael signature.asc Description: PGP signature
Re: RFI: Extending the TOAST Pointer
On Tue, 23 May 2023 at 18:34, Robert Haas wrote: > > On Thu, May 18, 2023 at 8:06 AM Matthias van de Meent > wrote: > > This enum still has many options to go before it exceeds the maximum > > of the uint8 va_tag field. Therefore, I don't think we have no disk > > representations left, nor do I think we'll need to add another option > > to the ToastCompressionId enum. > > As an example, we can add another VARTAG option for dictionary-enabled > > external toast; like what the pluggable toast patch worked on. I think > > we can salvage some ideas from that patch, even if the main idea got > > stuck. > > Adding another VARTAG option is somewhat different from adding another > ToastCompressionId. I think that right now we have embedded in various > places the idea that VARTAG_EXTERNAL is the only thing that shows up > on disk, and we'd need to track down all such places and adjust them > if we add other VARTAG types in the future. Depending on how it is to > be used, adding a new ToastCompressionId might be less work. However, > I don't think we can use the possibility of adding a new VARTAG value > as a reason why it's OK to use up the last possible ToastCompressionId > value for something non-extensible. I think you might not have picked up on what I was arguing for, but I agree with what you just said. My comment on not needing to invent a new ToastCompressionId was on the topic of adding capabilities^ to toast pointers that do things differently than the current TOAST and need more info than just sizes, 2x 32-bit ID and a compression algorithm. ^ capabilities such as compression dictionaries (which would need to store a dictionary ID in the pointer), TOAST IDs that are larger than 32 bits, and other such advances. Kind regards, Matthias van de Meent Neon, Inc.
Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?
> On 24 May 2023, at 10:22, Michael Paquier wrote: > The removal of 1.0.2 would be nice, but the tweaks > needed for LibreSSL make it less appealing. 1.0.2 is also an LTS version available commercially for premium support customers of OpenSSL (1.1.1 will become an LTS version as well), with 1.0.2zh slated for release next week. This raises the likelihood of Postgres installations using 1.0.2 in production still, and for some time to come. -- Daniel Gustafsson
Re: RFI: Extending the TOAST Pointer
Hi! I've made a WIP patch that uses 64-bit TOAST value ID instead of 32-bit, and sent it as a part of discussion, but there was no feedback on such a solution. There was a link to that discussion at the top of this thread. Also, I have to note that, based on our work on Pluggable TOAST - extending TOAST pointer with additional structures would require review of the logical replication engine, currently it is not suitable for any custom TOAST pointers. Currently we have no final solution for problems with logical replication for custom TOAST pointers. -- Regards, Nikita Malakhov Postgres Professional The Russian Postgres Company https://postgrespro.ru/
Re: memory leak in trigger handling (since PG12)
On 5/23/23 23:39, Tom Lane wrote: > Tomas Vondra writes: >> The really hard thing was determining what causes the memory leak - the >> simple instrumentation doesn't help with that at all. It tells you there >> might be a leak, but you don't know where the allocations came from. > >> What I ended up doing is a simple gdb script that sets breakpoints on >> all palloc/pfree variants, and prints info (including the backtrace) for >> each call on ExecutorState. And then a script that aggregates those to >> identify which backtraces allocated most chunks that were not freed. > > FWIW, I've had some success localizing palloc memory leaks with valgrind's > leak detection mode. The trick is to ask it for a report before the > context gets destroyed. Beats writing your own infrastructure ... > I haven't tried valgrind, so can't compare. Would it be possible to filter which memory contexts to track? Say we know the leak is in ExecutorState, so we don't need to track allocations in other contexts. That was a huge speedup for me, maybe it'd help valgrind too. Also, how did you ask for the report before the context gets destroyed? >> Would require testing with more data, though. I doubt we'd find much >> with our regression tests, which have tiny data sets. > > Yeah, it's not clear whether we could make the still-hypothetical check > sensitive enough to find leaks using small test cases without getting an > unworkable number of false positives. Still, might be worth trying. I'm not against experimenting with that. Were you thinking about something that'd be cheap enough to just be enabled always/everywhere, or something we'd enable during testing? This reminded me of a Strange Loop talk [1][2] about the Scalene memory profiler from UMass. That's for Python, but they did some smart tricks to reduce the cost of profiling - maybe we could do something similar, possibly by extending the memory contexts a bit.
[1] https://youtu.be/vVUnCXKuNOg?t=1405 [2] https://youtu.be/vVUnCXKuNOg?t=1706 > It might be an acceptable tradeoff to have stricter rules for what can > be allocated in ExecutorState in order to make this sort of problem > more easily detectable. > Would these rules be just customary, or defined/enforced in the code somehow? I can't quite imagine how would that work, TBH. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?
Hi all, $Subject says it all. Based on an analysis of the code, I can note the following when it comes to the removal of 1.0.1: - A lot of code related to channel binding is now simplified as X509_get_signature_nid() always exists, mostly around its CFLAGS. - This removes the comments related to 1.0.1e introduced by 74242c2. Then for 1.0.2, the following flags can be gone: HAVE_ASN1_STRING_GET0_DATA HAVE_BIO_GET_DATA HAVE_BIO_METH_NEW HAVE_HMAC_CTX_FREE HAVE_HMAC_CTX_NEW It would be nice to remove CRYPTO_lock(), but from what I can see the function still exists in LibreSSL, meaning that libpq locking wouldn't be thread-safe if these functions' paths are removed. Another related question is how much we care about builds with LibreSSL with MSVC? This patch set takes the simple path of assuming that this has never been really tested, and if you look at the MSVC scripts on HEAD we rely on a version number from OpenSSL, which is not something LibreSSL copes nicely with already, as documented in configure.ac. OpenSSL 1.0.2 has been EOLd at the end of 2019, and 1.0.1 had its last minor release in September 2019, so with Postgres 17 targeted in September/October 2024, we would be five years behind that. Last comes the buildfarm, and I suspect that a few animals are building with 1.0.2. Among what I have spotted: - wobbegong and vulpes, on Fedora 27, though I could not find references about the version used there. - bonito, on Fedora 29. - SUSE 12 SP{4,5} have 1.0.2 as their newer version. - butterflyfish may not like that, if I recall correctly, as it should have 1.0.2. So, it seems to me that 1.0.1 would be a rather silent move for the buildfarm, and 1.0.2 could lead to some noise. Note as well that 1.1.0 support has been stopped by upstream at the same time as 1.0.1, with a last minor release in September 2019, though that feels like a huge jump at this stage. On a code basis, removing 1.0.1 leads to the most cleanup.
The removal of 1.0.2 would be nice, but the tweaks needed for LibreSSL make it less appealing. Attached are two patches to remove support for 1.0.1 and 1.0.2 for now, kept separated for clarity, to be considered as something to do at the beginning of the v17 cycle. 0001 is in rather good shape seen from here. Now, for 0002 and the removal of 1.0.2, I am seeing two things once OPENSSL_API_COMPAT is bumped from 0x10002000L to 0x10100000L: - be-secure-openssl.c requires an extra openssl/bn.h, which is not a big deal, from what I get. - Much more interesting: OpenSSL_add_all_algorithms() has two macros that get removed, see include/openssl/evp.h:

# ifdef OPENSSL_LOAD_CONF
#  define OpenSSL_add_all_algorithms() \
    OPENSSL_init_crypto(OPENSSL_INIT_ADD_ALL_CIPHERS \
                        | OPENSSL_INIT_ADD_ALL_DIGESTS \
                        | OPENSSL_INIT_LOAD_CONFIG, NULL)
# else
#  define OpenSSL_add_all_algorithms() \
    OPENSSL_init_crypto(OPENSSL_INIT_ADD_ALL_CIPHERS \
                        | OPENSSL_INIT_ADD_ALL_DIGESTS, NULL)
# endif

The part where I am not quite sure of what to do is about OPENSSL_LOAD_CONF. We could call directly OPENSSL_init_crypto() and add OPENSSL_INIT_LOAD_CONFIG if building with OPENSSL_LOAD_CONF, but it feels a bit ugly to copy-paste this code from OpenSSL itself. Note that patch 0002 still has OPENSSL_API_COMPAT at 0x10002000L. OPENSSL_init_crypto() looks to be in LibreSSL, and it is new in OpenSSL 1.1.0, so switching the minimum to OpenSSL 1.1.0 should not require new CFLAGS for this one. Thoughts? -- Michael From 553b9eb7e24250a92d7551cb28a301d9e63ba7ad Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Wed, 24 May 2023 16:22:02 +0900 Subject: [PATCH v1 1/2] Remove support for OpenSSL 1.0.1 A few notes about that: - This simplifies a lot of code areas related to channel binding, as X509_get_signature_nid() always exists. - This removes the comments related to 1.0.1e introduced by 74242c2.
- Do we need to care about the case of building the Postgres code with LibreSSL on Windows for the MSVC scripts, leading to the removal of the check with HAVE_SSL_CTX_SET_CERT_CB? --- src/include/libpq/libpq-be.h | 6 +--- src/include/pg_config.h.in | 3 -- src/backend/libpq/auth-scram.c | 20 ++-- src/backend/libpq/be-secure-openssl.c| 4 --- src/interfaces/libpq/fe-auth-scram.c | 8 ++--- src/interfaces/libpq/fe-auth.c | 2 +- src/interfaces/libpq/fe-secure-openssl.c | 4 --- src/interfaces/libpq/libpq-int.h | 6 +--- src/test/ssl/t/002_scram.pl | 41 doc/src/sgml/installation.sgml | 2 +- configure| 16 - configure.ac | 9 +++--- meson.build | 5 ++- src/tools/msvc/Solution.pm | 10 +- 14 files changed, 38 insertions(+), 98 deletions(-) diff --git a/src/include/libpq/libpq-be.h b/src/include/libpq
Re: memory leak in trigger handling (since PG12)
Hi, just two cents: On Tue, May 23, 2023 at 8:01 PM Andres Freund wrote: > > Hi, > > On 2023-05-23 13:28:30 -0400, Tom Lane wrote: > > Andres Freund writes: > > > Could it help to have a mode where the executor shutdown hook checks how > > > much > > > memory is allocated in ExecutorState and warns if its too much? > > > > It'd be very hard to set a limit for what's "too much", since the amount > > of stuff created initially will depend on the plan size. > > I was thinking of some limit that should really never be reached outside of a > leak or work_mem based allocations, say 2GB or so. RE: instrumentation subthread: if that helps then below technique can work somewhat good on normal binaries for end users (given there are debug symbols installed), so maybe we don't need that much infrastructure added in to see the hot code path: perf probe -x /path/to/postgres 'palloc' 'size=%di:u64' # RDI on x86_64(palloc size arg0) perf record -avg --call-graph dwarf -e probe_postgres:palloc -aR -p sleep 3 # cannot be longer, huge overhead (~3s=~2GB) it produces: 50.27% (563d0e380670) size=24 | ---palloc bms_copy ExecUpdateLockMode ExecBRUpdateTriggers ExecUpdate [..] 49.73% (563d0e380670) size=16 | ---palloc bms_copy RelationGetIndexAttrBitmap ExecUpdateLockMode ExecBRUpdateTriggers ExecUpdate [..] Now we know that those small palloc() are guilty, but we didn't know at the time with Tomas. The problem here is that we do not know in palloc() - via its own arguments for which MemoryContext this is going to be allocated for. This is problematic for perf, because on RHEL8, I was not able to generate an uprobe that was capable of reaching a global variable (CurrentMemoryContext) at that time. Additionally what was even more frustrating on diagnosing that case on the customer end system, was that such OOMs were crashing other PostgreSQL clusters on the same OS. Even knowing the exact guilty statement it was impossible to limit RSS memory usage of that particular backend. 
So, what you are proposing makes a lot of sense. Also it got me thinking of implementing safety-memory-net-GUC debug_query_limit_backend_memory=X MB that would inject setrlimit(RLIMIT_DATA) through external extension via hook(s) and un-set it later, but the man page states it works for mmap() only after Linux 4.7+ so it is future proof but won't work e.g. on RHEL7 - maybe that's still good enough?; Or, well maybe try to hack a palloc() a little, but that has probably too big overhead, right? (just thinking loud). -Jakub Wartak.