Re: unnecessary #include "pg_getopt.h"?
On Tue, May 23, 2023 at 06:37:59PM -0700, Andres Freund wrote:
> This feels more like a matter of taste to me than anything.

Yup, it is.

> At least some of the files touched in the patch use optarg, opterr
> etc. - which are declared in pg_getopt.h. Making it reasonable to
> directly include pg_getopt.h.

getopt_long.h is included in 21 places in src/bin/, with all of them touching optarg, while only four of them include pg_getopt.h. So removing these includes as suggested makes sense. I agree that this is a matter of taste as well; still, there is also a matter of consistency.
-- 
Michael
Re: ALTER TABLE SET ACCESS METHOD on partitioned tables
On Mon, Apr 24, 2023 at 07:18:54PM -0500, Justin Pryzby wrote:
> On Thu, Mar 30, 2023 at 12:07:58AM -0500, Justin Pryzby wrote:
>> Actually .. I think it'd be a mistake if the relam needed to be
>> interpreted differently for partitioned tables than other relkinds.
>>
>> I updated the patch to allow intermediate partitioned tables to inherit
>> relam from their parent.
>
> Michael ?

Sorry for dropping the subject for so long. I have spent some time looking at the patch. Here are a few comments.

pg_class.h includes the following:

/*
 * Relation kinds that support tablespaces: All relation kinds with storage
 * support tablespaces, except that we don't support moving sequences around
 * into different tablespaces. Partitioned tables and indexes don't have
 * physical storage, but they have a tablespace settings so that their
 * children can inherit it.
 */
#define RELKIND_HAS_TABLESPACE(relkind) \
	((RELKIND_HAS_STORAGE(relkind) || RELKIND_HAS_PARTITIONS(relkind)) \
	 && (relkind) != RELKIND_SEQUENCE)
[...]
/*
 * Relation kinds with a table access method (rd_tableam). Although sequences
 * use the heap table AM, they are enough of a special case in most uses that
 * they are not included here.
 */
#define RELKIND_HAS_TABLE_AM(relkind) \
	((relkind) == RELKIND_RELATION || \
	 (relkind) == RELKIND_TOASTVALUE || \
	 (relkind) == RELKIND_MATVIEW)

It would look much more consistent with the tablespace case if we included partitioned tables here, but this refers to rd_tableam for the relcache, which we surely don't want to fill for partitioned tables. I guess that at minimum a comment is in order? RELKIND_HAS_TABLE_AM() is much more widespread than RELKIND_HAS_TABLESPACE().

 	 * No need to add an explicit dependency for the toast table, as the
 	 * main table depends on it.
 	 */
-	if (RELKIND_HAS_TABLE_AM(relkind) && relkind != RELKIND_TOASTVALUE)
+	if ((RELKIND_HAS_TABLE_AM(relkind) && relkind != RELKIND_TOASTVALUE) ||
+		relkind == RELKIND_PARTITIONED_TABLE)

The comment at the top of this code block needs an update.

 	if (stmt->accessMethod != NULL)
+		accessMethodId = get_table_am_oid(stmt->accessMethod, false);
 	else if (stmt->partbound &&
+			 (RELKIND_HAS_TABLE_AM(relkind) || relkind == RELKIND_PARTITIONED_TABLE))
 	{
+		/*
+		 * For partitions, if no access method is specified, default to the AM
+		 * of the parent table.
+		 */
+		Assert(list_length(inheritOids) == 1);
+		accessMethodId = get_rel_relam(linitial_oid(inheritOids));
+		if (!OidIsValid(accessMethodId))
+			accessMethodId = get_table_am_oid(default_table_access_method, false);
 	}
+	else if (RELKIND_HAS_TABLE_AM(relkind) || relkind == RELKIND_PARTITIONED_TABLE)
+		accessMethodId = get_table_am_oid(default_table_access_method, false);

This structure seems a bit weird. Could it be cleaner to group the second and third blocks together? I would imagine:

if (accessMethod != NULL)
{
    // Extract the AM defined in the statement
}
else
{
    // This is a relkind that can use a default table AM.
    if (RELKIND_HAS_TABLE_AM(relkind) || relkind == RELKIND_PARTITIONED_TABLE)
    {
        if (stmt->partbound)
        {
            // This is a partition, so look at what its partitioned
            // table holds.
        }
        else
        {
            // No partition, grab the default.
        }
    }
}

+	/*
+	 * Only do this for partitioned tables, for which this is just a
+	 * catalog change. Tables with storage are handled by Phase 3.
+	 */
+	if (rel->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+		ATExecSetAccessMethodNoStorage(rel, tab->newAccessMethod);

Okay, there is a parallel with tablespaces in this logic. Speaking of which, ATExecSetAccessMethodNoStorage() does a catalog update even if ALTER TABLE is defined to use the same table AM as what is currently set. There is no need to update the relation's pg_class entry in this case. Be careful that InvokeObjectPostAlterHook() still needs to be checked in this case. Perhaps some tests should be added in test_oat_hooks.sql? I would be tempted to add a new SQL file for that.

+	else if (relation->rd_rel->relkind == RELKIND_PARTITIONED_TABLE)
+	{
+		/* Do nothing: it's a catalog settings for partitions to inherit */
+	}

Actually, we could add an assertion checking that rd_rel->relam will always be valid.

-	if (RELKIND_HAS_TABLE_AM(tbinfo->relkind))
+	if (RELKIND_HAS_TABLE_AM(tbinfo->relkind) ||
+		tbinfo->relkind == RELKIND_PARTITIONED_TABLE)
 		tableam = tbinfo->amname;

I have spent some time pondering this particular change, concluding that it should be OK. It passes my tests as well.

+-- partition hierarchies
+-- upon ALTER, new children will inherit the new am, whereas the existing
+-- children will remain untouched
CREATE TABLE
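The regrouped branch structure suggested above can be sketched as a standalone mock. Everything here is illustrative: choose_access_method, the stub lookups, and the OID values are stand-ins, not the actual tablecmds.c code.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-ins for PostgreSQL's relkind codes and OID type. */
typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)
#define RELKIND_RELATION          'r'
#define RELKIND_PARTITIONED_TABLE 'p'
#define RELKIND_TOASTVALUE        't'
#define RELKIND_MATVIEW           'm'
#define RELKIND_HAS_TABLE_AM(relkind) \
	((relkind) == RELKIND_RELATION || \
	 (relkind) == RELKIND_TOASTVALUE || \
	 (relkind) == RELKIND_MATVIEW)

/* Mocked catalog lookups: 2 = "heap", 3 = AM set on a parent table. */
static Oid get_table_am_oid(const char *amname)
{
	return strcmp(amname, "heap") == 0 ? 2 : 4;
}
static Oid get_parent_relam(void)
{
	return 3;
}

/*
 * Regrouped structure: one outer branch for an explicit AM in the
 * statement, one for all relkinds that can take a default table AM.
 */
static Oid
choose_access_method(const char *stmt_am, bool partbound, char relkind)
{
	Oid accessMethodId = InvalidOid;

	if (stmt_am != NULL)
		accessMethodId = get_table_am_oid(stmt_am);	/* AM in the statement */
	else if (RELKIND_HAS_TABLE_AM(relkind) || relkind == RELKIND_PARTITIONED_TABLE)
	{
		if (partbound)
		{
			/* Partition: inherit from the partitioned table when set. */
			accessMethodId = get_parent_relam();
			if (accessMethodId == InvalidOid)
				accessMethodId = get_table_am_oid("heap");
		}
		else
			accessMethodId = get_table_am_oid("heap");	/* default AM */
	}
	return accessMethodId;
}
```

With this shape, the partition-vs-default decision only ever happens for relkinds that can have a table AM at all, which is the consistency point raised above.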
Re: Wrong results due to missing quals
On Thu, May 25, 2023 at 5:28 AM Tom Lane wrote: > I tried this and it seems to work all right: it fixes the example > you showed while not causing any new failures. (Doesn't address > the broken join-removal logic you showed in the other thread, > though.) > > While at it, I also changed make_restrictinfo to treat has_clone > and is_clone as first-class citizens, to fix the dubious coding in > equivclass.c that I mentioned at [1]. The "incompatible_relids" idea is a stroke of genius. I reviewed the patch and did not find any problem. So big +1 to the patch. Thanks Richard
Re: PG 16 draft release notes ready
On Thu, 25 May 2023 at 05:45, Bruce Momjian wrote: > > On Wed, May 24, 2023 at 01:43:50PM -0400, Bruce Momjian wrote: > > > * Reduce palloc() memory overhead for all memory allocations down to 8 > > > bytes on all platforms. (Andres Freund, David Rowley) > > > > > > This allows more efficient use of memory and is especially useful in > > > queries which perform operations (such as sorting or hashing) that > > > require more than work_mem. > > > > Well, this would go in the source code section, but it seems too > > internal and global to mention. > > What was the previous memory allocation overhead? On 64-bit builds, it was 16 bytes for AllocSet contexts, 24 bytes for generation contexts and 16 bytes for slab contexts. David
Re: PostgreSQL 16 Beta 1 release announcement draft
Hi, On 2023-05-24 23:30:58 -0400, Jonathan S. Katz wrote: > > Ah, OK, that's why I didn't grok it. I read through the first message > > in[1] and definitely agree it should be in the announcement. How about: > > > > "PostgreSQL 16 also shows up to a 300% improvement when concurrently > > loading data with `COPY`" > > I currently have it as the below in the release announcement. If it you send > any suggested updates, I can try to put them in before release: > > PostgreSQL 16 can also improve the performance of concurrent bulk loading of > data using [`COPY`](https://www.postgresql.org/docs/16/sql-copy.html) up to > a 300%. It also speeds up concurrent loading when not using COPY, just to a lesser degree. But I can't come up with a concise phrasing for that right now... Greetings, Andres Freund
Re: PostgreSQL 16 Beta 1 release announcement draft
On 5/24/23 11:30 PM, Jonathan S. Katz wrote: On 5/24/23 9:20 PM, Jonathan S. Katz wrote: I currently have it as the below in the release announcement. If you send any suggested updates, I can try to put them in before release: PostgreSQL 16 can also improve the performance of concurrent bulk loading of data using [`COPY`](https://www.postgresql.org/docs/16/sql-copy.html) up to a 300%. (without the "a 300%" typo). Jonathan
Re: PostgreSQL 16 Beta 1 release announcement draft
On 5/24/23 9:20 PM, Jonathan S. Katz wrote: On 5/24/23 8:04 PM, Andres Freund wrote: Hi, On 2023-05-24 19:57:39 -0400, Jonathan S. Katz wrote: On 5/24/23 5:28 PM, Andres Freund wrote: I think the relation extension improvements ought to be mentioned here as well? Up to 3x faster concurrent data load with COPY seems practically relevant. I missed that -- not sure I'm finding it in the release notes with a quick grep -- which commit/thread is this? It was split over quite a few commits, the one improving COPY most significantly is commit 00d1e02be24987180115e371abaeb84738257ae2 Author: Andres Freund Date: 2023-04-06 16:35:21 -0700 hio: Use ExtendBufferedRelBy() to extend tables more efficiently Relevant thread: https://postgr.es/m/20221029025420.eplyow6k7tgu6...@awork3.anarazel.de It's in the release notes as: Allow more efficient addition of heap and index pages (Andres Freund) Ah, OK, that's why I didn't grok it. I read through the first message in[1] and definitely agree it should be in the announcement. How about: "PostgreSQL 16 also shows up to a 300% improvement when concurrently loading data with `COPY`" I currently have it as the below in the release announcement. If you send any suggested updates, I can try to put them in before release: PostgreSQL 16 can also improve the performance of concurrent bulk loading of data using [`COPY`](https://www.postgresql.org/docs/16/sql-copy.html) up to a 300%. Jonathan
Re: Proposal: Removing 32 bit support starting from PG17++
On Thu, May 25, 2023 at 12:34 PM Tom Lane wrote: > Dunno about antique MIPS. I think there's still some interest in > not-antique 32-bit MIPS; I have some current-production routers > with such CPUs. (Sadly, they don't have enough storage to do > anything useful with, or I'd think about repurposing one for > buildfarm.) FWIW "development of the MIPS architecture has ceased"[1]. (Clearly there are living ISAs that continue either its spirit or its ... instructions, but they aren't calling themselves MIPS.) [1] https://en.wikipedia.org/wiki/MIPS_architecture
Re: PG 16 draft release notes ready
On Thu, May 25, 2023 at 08:31:29AM +0700, John Naylor wrote:
> On Wed, May 24, 2023 at 8:58 PM Bruce Momjian wrote:
>> Okay, items split into sections and several merged. I left the
>> CPU-specific parts in Source Code, and moved the rest into a merged item
>> in General Performance, but moved the JSON item to Data Types.
>
> It looks like it got moved to Functions actually?
>
>>> The last one refers to new internal functions, so it could stay in source code.
>>> (Either way, we don't want to imply that arrays of SQL types are accelerated
>>> this way, it's so far only for internal arrays.)
>>
>> Good point. I called them "C arrays" but put it into the General
>> Performance item.
>
> Looks good to me, although...
>
>> Allow xid/subxid searches and ASCII string detection to use vector operations
>> (Nathan Bossart)
>
> Nathan wrote the former, I did the latter.
>
> Thanks for working on this!

Ugh, I have to remember to merge authors when I merge items --- fixed.
-- 
Bruce Momjian https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Re: PG 16 draft release notes ready
On Wed, May 24, 2023 at 8:58 PM Bruce Momjian wrote:
> Okay, items split into sections and several merged. I left the
> CPU-specific parts in Source Code, and moved the rest into a merged item
> in General Performance, but moved the JSON item to Data Types.

It looks like it got moved to Functions actually?

>> The last one refers to new internal functions, so it could stay in source code.
>> (Either way, we don't want to imply that arrays of SQL types are accelerated
>> this way, it's so far only for internal arrays.)
>
> Good point. I called them "C arrays" but put it into the General
> Performance item.

Looks good to me, although...

> Allow xid/subxid searches and ASCII string detection to use vector operations
> (Nathan Bossart)

Nathan wrote the former, I did the latter.

Thanks for working on this!
-- 
John Naylor EDB: http://www.enterprisedb.com
Re: Proposal: Removing 32 bit support starting from PG17++
Hi, On 2023-05-24 20:34:38 -0400, Tom Lane wrote: > Thomas Munro writes: > > On Thu, May 25, 2023 at 11:51 AM Tom Lane wrote: > >> You'll no doubt be glad to hear that I'll be retiring chickadee > >> in the very near future. > > > . o O { I guess chickadee might have been OK anyway, along with e.g. > > antique low-end SGI MIPS gear etc of "workstation"/"desktop" form that > > any collector is likely to have still running, because they only had > > one CPU (unlike their Vogon-spaceship-sized siblings). As long as > > they had 64 bit load/store instructions, those couldn't be 'divided' > > by an interrupt, so scheduler switches shouldn't be able to tear them, > > AFAIK? } > > PA-RISC can probably do tear-free 8-byte reads, but Andres also > wanted to raise the bar enough to include 32-bit atomic instructions, > which PA-RISC hasn't got; the one such instruction it has is > limited enough that you can't do much beyond building a spinlock. Yes, I looked at some point, and it didn't seem viable to do more. > I think there's still some interest in not-antique 32-bit MIPS; I have some > current-production routers with such CPUs. (Sadly, they don't have enough > storage to do anything useful with, or I'd think about repurposing one for > buildfarm.) After spending a bunch more time staring at various reference manuals, it looks to me like 32bit MIPS supports 64 bit atomics these days, via LLWP / SCWP. https://s3-eu-west-1.amazonaws.com/downloads-mips/documents/MD00086-2B-MIPS32BIS-AFP-6.06.pdf documents them as having been added to MIPS32 Release 6, from 2014. Greetings, Andres Freund
Re: PostgreSQL 16 Beta 1 release announcement draft
On 5/24/23 8:04 PM, Andres Freund wrote: Hi, On 2023-05-24 19:57:39 -0400, Jonathan S. Katz wrote: On 5/24/23 5:28 PM, Andres Freund wrote: I think the relation extension improvements ought to be mentioned here as well? Up to 3x faster concurrent data load with COPY seems practically relevant. I missed that -- not sure I'm finding it in the release notes with a quick grep -- which commit/thread is this? It was split over quite a few commits, the one improving COPY most significantly is commit 00d1e02be24987180115e371abaeb84738257ae2 Author: Andres Freund Date: 2023-04-06 16:35:21 -0700 hio: Use ExtendBufferedRelBy() to extend tables more efficiently Relevant thread: https://postgr.es/m/20221029025420.eplyow6k7tgu6...@awork3.anarazel.de It's in the release notes as: Allow more efficient addition of heap and index pages (Andres Freund) Ah, OK, that's why I didn't grok it. I read through the first message in[1] and definitely agree it should be in the announcement. How about: "PostgreSQL 16 also shows up to a 300% improvement when concurrently loading data with `COPY`" Thanks, Jonathan [1] https://postgr.es/m/20221029025420.eplyow6k7tgu6...@awork3.anarazel.de
Re: Proposal: Removing 32 bit support starting from PG17++
Andres Freund writes: > On 2023-05-24 19:51:22 -0400, Tom Lane wrote: >> So dropping PA-RISC altogether should probably happen for v17, maybe even >> v16. > Definitely for 17 - not sure if we have much to gain by doing it in 16. I'm just thinking that we'll have no way to test it. I wouldn't advocate such a change in released branches; but I think it'd be within policy still for v16, and that would give us one less year of claimed support for an arch we can't test. regards, tom lane
Re: Proposal: Removing 32 bit support starting from PG17++
Hi, On 2023-05-24 19:51:22 -0400, Tom Lane wrote: > Andres Freund writes: > > On 2023-05-24 17:44:36 -0400, Tom Lane wrote: > >> Hmm, can we really expect atomic 8-byte reads on "relevant" 32-bit > >> platforms? I'd be on board with this if so, but it sounds a bit > >> optimistic. > > > ... > > > So it looks like the only certain problem is PA-RISC - which I personally > > wouldn't include in "relevant" :), with some evaluation needed for 32bit > > mips > > and old arms. > > You'll no doubt be glad to hear that I'll be retiring chickadee > in the very near future. Heh, I have to admit, I am. > So dropping PA-RISC altogether should probably happen for v17, maybe even > v16. Definitely for 17 - not sure if we have much to gain by doing it in 16. The likelihood that somebody will find a PA-RISC specific bug, after we dropped support for 15, is pretty low, I think. > Seems like we should poke into ARM more closely, though. Looks like the necessary support was added in armv6k and armv7a: https://developer.arm.com/documentation/dui0489/i/arm-and-thumb-instructions/strex ARM STREXB, STREXD, and STREXH are available in ARMv6K and above. All these 32-bit Thumb instructions are available in ARMv6T2 and above, except that STREXD is not available in the ARMv7-M architecture. ARMv7-M is for microcontrollers without an MMU, so it's not going to be used for postgres. The arm buildfarm machines (using the architecture names from there) we have are: armv6l: chipmunk ARMv7: mereswine, gull, grison arm64: dikkop, sifaka, indri turako says it's armv7l, but it seems actually to target aarch64. At least for type of machine available on the buildfarm, it looks like we actually should be good. They all appear to already be supporting 64bit atomics, without needing further compiler flags. Greetings, Andres Freund
Re: Proposal: Removing 32 bit support starting from PG17++
Thomas Munro writes: > On Thu, May 25, 2023 at 11:51 AM Tom Lane wrote: >> You'll no doubt be glad to hear that I'll be retiring chickadee >> in the very near future. > . o O { I guess chickadee might have been OK anyway, along with e.g. > antique low-end SGI MIPS gear etc of "workstation"/"desktop" form that > any collector is likely to have still running, because they only had > one CPU (unlike their Vogon-spaceship-sized siblings). As long as > they had 64 bit load/store instructions, those couldn't be 'divided' > by an interrupt, so scheduler switches shouldn't be able to tear them, > AFAIK? } PA-RISC can probably do tear-free 8-byte reads, but Andres also wanted to raise the bar enough to include 32-bit atomic instructions, which PA-RISC hasn't got; the one such instruction it has is limited enough that you can't do much beyond building a spinlock. Dunno about antique MIPS. I think there's still some interest in not-antique 32-bit MIPS; I have some current-production routers with such CPUs. (Sadly, they don't have enough storage to do anything useful with, or I'd think about repurposing one for buildfarm.) regards, tom lane
Re: Question about error message in auth.c
On Wed, May 24, 2023 at 02:12:23PM -0400, Dave Cramer wrote:
> The last piece of information is the encryption state. However when an SSL
> connection fails to authenticate the state is not encrypted.
>
> When would it ever be encrypted if authentication fails ?

I am not sure I follow. Under an SSL connection things should be encrypted. Or perhaps that's something related to hostssl and/or hostnossl? Back to the point, the SSL handshake happens before any authentication attempt and any HBA resolution, so a connection could be encrypted before authentication gets rejected. The error path you are pointing at would happen after the SSL handshake is done. For instance, take an HBA entry like that:

# TYPE  DATABASE  USER  ADDRESS       METHOD
host    all       all   127.0.0.1/32  reject

Then, attempting a connection with sslmode=prefer, one can see the difference:

$ psql -h 127.0.0.1 -d postgres -U postgres
psql: error: connection to server at "127.0.0.1", port 5432 failed: FATAL:  pg_hba.conf rejects connection for host "127.0.0.1", user "postgres", database "postgres", SSL encryption
connection to server at "127.0.0.1", port 5432 failed: FATAL:  pg_hba.conf rejects connection for host "127.0.0.1", user "postgres", database "postgres", no encryption
-- 
Michael
Re: Proposal: Removing 32 bit support starting from PG17++
On Thu, May 25, 2023 at 11:51 AM Tom Lane wrote: > Andres Freund writes: > > On 2023-05-24 17:44:36 -0400, Tom Lane wrote: > > So it looks like the only certain problem is PA-RISC - which I personally > > wouldn't include in "relevant" :), with some evaluation needed for 32bit > > mips > > and old arms. > > You'll no doubt be glad to hear that I'll be retiring chickadee > in the very near future. . o O { I guess chickadee might have been OK anyway, along with e.g. antique low-end SGI MIPS gear etc of "workstation"/"desktop" form that any collector is likely to have still running, because they only had one CPU (unlike their Vogon-spaceship-sized siblings). As long as they had 64 bit load/store instructions, those couldn't be 'divided' by an interrupt, so scheduler switches shouldn't be able to tear them, AFAIK? }
Re: PostgreSQL 16 Beta 1 release announcement draft
Hi, On 2023-05-24 19:57:39 -0400, Jonathan S. Katz wrote: > On 5/24/23 5:28 PM, Andres Freund wrote: > > > > I think the relation extension improvements ought to be mentioned here as > > well? Up to 3x faster concurrent data load with COPY seems practically > > relevant. > > I missed that -- not sure I'm finding it in the release notes with a quick > grep -- which commit/thread is this? It was split over quite a few commits, the one improving COPY most significantly is commit 00d1e02be24987180115e371abaeb84738257ae2 Author: Andres Freund Date: 2023-04-06 16:35:21 -0700 hio: Use ExtendBufferedRelBy() to extend tables more efficiently Relevant thread: https://postgr.es/m/20221029025420.eplyow6k7tgu6...@awork3.anarazel.de It's in the release notes as: Allow more efficient addition of heap and index pages (Andres Freund) Greetings, Andres Freund
Re: PostgreSQL 16 Beta 1 release announcement draft
On 5/24/23 5:28 PM, Andres Freund wrote: I think the relation extension improvements ought to be mentioned here as well? Up to 3x faster concurrent data load with COPY seems practically relevant. I missed that -- not sure I'm finding it in the release notes with a quick grep -- which commit/thread is this? But yes this does sound like something that should be included, I just want to read up on it. Thanks, Jonathan
Re: Proposal: Removing 32 bit support starting from PG17++
Andres Freund writes: > On 2023-05-24 17:44:36 -0400, Tom Lane wrote: >> Hmm, can we really expect atomic 8-byte reads on "relevant" 32-bit >> platforms? I'd be on board with this if so, but it sounds a bit >> optimistic. > ... > So it looks like the only certain problem is PA-RISC - which I personally > wouldn't include in "relevant" :), with some evaluation needed for 32bit mips > and old arms. You'll no doubt be glad to hear that I'll be retiring chickadee in the very near future. (I'm moving/downsizing, and that machine isn't making the cut.) So dropping PA-RISC altogether should probably happen for v17, maybe even v16. Seems like we should poke into ARM more closely, though. regards, tom lane
Re: Proposal: Removing 32 bit support starting from PG17++
Hi, On 2023-05-24 17:44:36 -0400, Tom Lane wrote: > Andres Freund writes: > > Dropping CPUs without native atomic operations / without a way to do > > tear-free > > 8 byte reads would make several substantial performance improvements easier, > > while not really dropping any relevant platform. > > Hmm, can we really expect atomic 8-byte reads on "relevant" 32-bit > platforms? I'd be on board with this if so, but it sounds a bit > optimistic. Combining https://wiki.postgresql.org/wiki/Atomics with our docs: > In general, PostgreSQL can be expected to work on these CPU architectures: > x86, PowerPC, S/390, SPARC, ARM, MIPS, RISC-V, and PA-RISC, including > big-endian, little-endian, 32-bit, and 64-bit variants where applicable. It > is often possible to build on an unsupported CPU type by configuring with > --disable-spinlocks, but performance will be poor. On x86 8 byte atomics are supported since the 586 - released in 1993. I don't think we need to care about i386 / i486? PPC always had it from what I can tell (the docs don't mention a minimum version). Sparc has supported single copy atomicity for 8 byte values since sparc v9, so ~1995 (according to wikipedia there was one V8 chip released in 1996, there's also "LEON", a bit later, but that's "V8E", which includes the atomicity guarantees from v9). On s390 sufficient instructions to make 64bit atomics work natively are supported (just using cmpxchg). On older arm it's supported via kernel emulation - which afaict is better than falling back to a semaphore, our current fallback. I don't currently have access to armv7*, so I can't run a benchmark. So the only problematic platforms are 32 bit MIPS and PA-RISC. I'm not entirely sure whether my determination about 32 bit MIPS from back then is actually true - I might have read the docs too narrowly back then or they were updated since. 
In a newer version of the manual I see: > The paired instructions, Load Linked and Store Conditional, can be used to > perform an atomic read-modify-write of word or doubleword cached memory > locations. The word width is referenced to be 32bit earlier on the same page. And it's documented to be true for mips32, not just mips64. With that one can implement a 64bit cmpxchg, and with 64bit cmpxchg one can implement a, not too efficient, tear-free read. What gave me pause for a moment is that both clang and gcc generate calls to external functions on 32 bit mips - but they do provide libatomic and looking at the disassembly, there does appear to be a some non-emulated execution. But tbh, there are so many ABIs and options that I am not sure what is what. Reading the https://en.wikipedia.org/wiki/MIPS_architecture history part gives me a headache: "During the mid-1990s, many new 32-bit MIPS processors for embedded systems were MIPS II implementations because the introduction of the 64-bit MIPS III architecture in 1991 left MIPS II as the newest 32-bit MIPS architecture until MIPS32 was introduced in 1999.[3]: 19 " My rough understanding is that it's always doable in 64 mode, and has been available in 32bit mode for a long time, but that it depends on the the ABI used. So it looks like the only certain problem is PA-RISC - which I personally wouldn't include in "relevant" :), with some evaluation needed for 32bit mips and old arms. Greetings, Andres Freund
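The step described above — turning a 64-bit cmpxchg into a tear-free 64-bit read — can be sketched with the GCC/Clang __atomic builtins. This is just the idea, not PostgreSQL's actual port code; on LL/SC-only targets such as MIPS32r6 the compiler lowers the compare-exchange to an LL/SC (or LLWP/SCWP) loop.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Tear-free 64-bit read built from nothing but a 64-bit compare-and-swap:
 * CAS the location against a guess. If the CAS "fails", it atomically
 * hands back the full current 8-byte value in 'expected'; if it
 * "succeeds", the value was equal to the guess anyway. Either way we
 * observed the whole doubleword atomically, with no possibility of a
 * torn read across the two 32-bit halves.
 */
static uint64_t
atomic_read_u64_via_cas(uint64_t *ptr)
{
	uint64_t expected = 0;

	/* Desired value equals the guess, so a success stores no change. */
	__atomic_compare_exchange_n(ptr, &expected, expected, false,
								__ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
	return expected;
}
```

As the message says, this is not an efficient read (it is a full read-modify-write on the cache line), but it is correct on any target that can do a 64-bit CAS.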
Re: Atomic ops for unlogged LSN
On Wed, May 24, 2023 at 02:49:58PM -0700, Andres Freund wrote:
> I was a bit confused by Michael's comment as well, due to the section of code
> quoted. But he does have a point: pg_atomic_read_u32() does indeed *not* have
> barrier semantics (it'd be way more expensive), and the patch does contain
> this hunk:

Thanks for the correction. The part about _add was incorrect.

> So we indeed lose some "barrier strength" - but I think that's fine. For one,
> it's a debugging-only value. But more importantly, I don't see what reordering
> the barrier could prevent - a barrier is useful for things like sequencing two
> memory accesses to happen in the intended order - but unloggedLSN doesn't
> interact with another variable that's accessed within the ControlFileLock
> afaict.

This stuff is usually tricky enough that I am never completely sure whether it is fine without reading README.barrier again, for the place where unloggedLSN is saved in the control file under its LWLock. Something that I find confusing in the patch is that it does not document the reason why this is OK.
-- 
Michael
Re: SyncRepWaitForLSN waits for XLogFlush?
Hi Andres, Thanks for your reply. If I understand you correctly, you're saying that the walsender *waits* for xlogflush, but does not *cause* it. This means that for a sync_commit=off transaction, the xlog records won't get shipped out to standbys until the walwriter flushes in-memory xlog contents to disk. And furthermore, am I correct in assuming that this behaviour is different from the buffer manager and the slru which both *cause* xlog flush up to a certain lsn before they proceed with flushing a page to disk? The reason I'm asking this is that I'm working on modifying the transaction manager for my thesis project, and the pg_walinspect test is failing when I run make check-world. So, I'm just trying to understand things to identify the cause of this issue. Regards, Tej On Wed, 24 May 2023 at 17:33, Andres Freund wrote: > Hi, > > On 2023-05-24 10:53:51 -0400, Tejasvi Kashi wrote: > > I was wondering if waiting for an LSN in SyncRepWaitForLSN ensures that > the > > XLOG has been flushed locally up to that location before the record is > > shipped off to standbys? > > No, SyncRepWaitForLSN() doesn't directly ensure that. The callers have to > (and > do) call XLogFlush() separately. See e.g. the XLogFlush() call in > RecordTransactionCommit(). > > Note that calling SyncRepWaitForLSN() for an LSN that is not yet flushed > would > not lead for data to be prematurely sent out - walsender won't send data > that > hasn't yet been flushed. So a backend with such a spurious > SyncRepWaitForLSN() > would just wait until the LSN is actually flushed and *then* replicated. > > Any specific reason for that question? > > Greetings, > > Andres Freund >
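The ordering discussed in this exchange can be sketched as a toy model. All names here are illustrative stand-ins: the real interplay is between XLogFlush(), SyncRepWaitForLSN(), and walsender, and the point is simply that walsender never ships WAL beyond what is flushed locally.

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t XLogRecPtr;

static XLogRecPtr insertedLSN = 0;	/* WAL generated so far */
static XLogRecPtr flushedLSN = 0;	/* WAL durably on local disk */

/* A backend inserts some WAL (e.g. a commit record). */
static void mock_xlog_insert(uint64_t nbytes)
{
	insertedLSN += nbytes;
}

/* The commit path (or the walwriter, for synchronous_commit=off)
 * flushes WAL up to a target LSN. */
static void mock_xlog_flush(XLogRecPtr upto)
{
	if (upto > flushedLSN)
		flushedLSN = upto;
}

/* The mock walsender: sends only up to the locally flushed point, so a
 * wait on an unflushed LSN cannot cause data to be shipped early - it
 * just waits until the flush happens, and *then* the data is sent. */
static XLogRecPtr mock_walsender_send_upto(void)
{
	return flushedLSN;
}
```

In this model, a sync_commit=off transaction's records sit in the inserted-but-not-flushed window until a flush advances flushedLSN, matching the behavior described above.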
Re: Proposal: Removing 32 bit support starting from PG17++
On Wed, May 24, 2023 at 10:44:11AM -0400, Tom Lane wrote:
> Hans Buschmann writes:
>> This inspired me to propose dropping 32bit support for PostgreSQL starting
>> with PG17.
>
> I don't think this is a great idea. Even if Intel isn't interested,
> there'll be plenty of 32-bit left in the lower end of the market
> (think ARM, IoT, and so on).

A few examples of that are the first models of the Raspberry Pi, which are still produced (until 2026 actually for the first model). These rely on ARMv6 if I recall correctly, which is 32-bit.
-- 
Michael
Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?
On Wed, May 24, 2023 at 01:03:04PM +0200, Daniel Gustafsson wrote:
> When we moved the goalposts to 1.0.1 (commit 7b283d0e1d1) we referred to RHEL6
> using 1.0.1, and RHEL6 goes out of ELS in late June 2024, so it seems possible
> to drop 1.0.1 support during v17. I haven't studied the patch yet but I'll
> have a look at it.

Great, thanks for the help.
-- 
Michael
Re: Atomic ops for unlogged LSN
Hi, On 2023-05-24 08:22:08 -0400, Robert Haas wrote: > On Tue, May 23, 2023 at 6:26 PM Michael Paquier wrote: > > Spinlocks provide a full memory barrier, which may not the case with > > add_u64() or read_u64(). Could memory ordering be a problem in these > > code paths? > > It could be a huge problem if what you say were true, but unless I'm > missing something, the comments in atomics.h say that it isn't. The > comments for the 64-bit functions say to look at the 32-bit functions, > and there it says: > > /* > * pg_atomic_add_fetch_u32 - atomically add to variable > * > * Returns the value of ptr after the arithmetic operation. > * > * Full barrier semantics. > */ > > Which is probably a good thing, because I'm not sure what good it > would be to have an operation that both reads and modifies an atomic > variable but has no barrier semantics. I was a bit confused by Michael's comment as well, due to the section of code quoted. But he does have a point: pg_atomic_read_u32() does indeed *not* have barrier semantics (it'd be way more expensive), and the patch does contain this hunk: > @@ -6784,9 +6775,7 @@ CreateCheckPoint(int flags) >* unused on non-shutdown checkpoints, but seems useful to store it > always >* for debugging purposes. >*/ > - SpinLockAcquire(&XLogCtl->ulsn_lck); > - ControlFile->unloggedLSN = XLogCtl->unloggedLSN; > - SpinLockRelease(&XLogCtl->ulsn_lck); > + ControlFile->unloggedLSN = pg_atomic_read_u64(&XLogCtl->unloggedLSN); > > UpdateControlFile(); > LWLockRelease(ControlFileLock); So we indeed loose some "barrier strength" - but I think that's fine. For one, it's a debugging-only value. But more importantly, I don't see what reordering the barrier could prevent - a barrier is useful for things like sequencing two memory accesses to happen in the intended order - but unloggedLSN doesn't interact with another variable that's accessed within the ControlFileLock afaict. > That's not to say that I entirely understand the point of this patch. 
> It seems like a generally reasonable thing to do AFAICT but somehow I > wonder if there's more to the story here, because it doesn't feel like > it would be easy to measure the benefit of this patch in isolation. > That's not a reason to reject it, though, just something that makes me > curious. I doubt it's a meaningful, or even measurable, win. But removing atomic ops and reducing shared memory space isn't a bad thing, even if there's no immediate benefit... Greetings, Andres Freund
Re: Proposal: Removing 32 bit support starting from PG17++
Andres Freund writes: > Dropping CPUs without native atomic operations / without a way to do tear-free > 8 byte reads would make several substantial performance improvements easier, > while not really dropping any relevant platform. Hmm, can we really expect atomic 8-byte reads on "relevant" 32-bit platforms? I'd be on board with this if so, but it sounds a bit optimistic. regards, tom lane
Re: Proposal: Removing 32 bit support starting from PG17++
Hi, On 2023-05-24 14:33:06 +, Hans Buschmann wrote: > I recently stumbled over the following Intel proposal for dropping 32bit > support in x86 processors. [1] It's a proposal for something in the future. Which, even if implemented as is, will affect future hardware, several years down the line. I don't think that's a good reason for removing 32 bit support in postgres. And postgres is used on non-x86 architectures... > This inspired me to propose dropping 32bit support for PostgreSQL starting > with PG17. > ... > Even if I am not a postgres hacker I suppose this could simplify things quite > a lot. There's some simplification, but I don't think it'd be that much. I do think there are code removals and simplifications that would be bigger than dropping 32bit support. Dropping support for effectively-obsolete compilers like sun studio (requires random environment variables to be exported to not run out of memory) and AIX's xlc (requires a lot of extra compiler flags to be passed in for a sane build) would remove a fair bit of code. Dropping CPUs without native atomic operations / without a way to do tear-free 8 byte reads would make several substantial performance improvements easier, while not really dropping any relevant platform. Etc. Greetings, Andres Freund
Re: testing dist tarballs
Andres Freund writes: > First thing I noticed that 'make dist' doesn't work in a vpath, failing in a > somewhat obscure way (likely because in a vpath build the the copy from the > source dir doesn't include GNUMakefile). Do we expect it to work? Don't see how it could possibly be useful in a vpath, because you'd have the real source files and the generated files in different trees. regards, tom lane
Re: SyncRepWaitForLSN waits for XLogFlush?
Hi, On 2023-05-24 10:53:51 -0400, Tejasvi Kashi wrote: > I was wondering if waiting for an LSN in SyncRepWaitForLSN ensures that the > XLOG has been flushed locally up to that location before the record is > shipped off to standbys? No, SyncRepWaitForLSN() doesn't directly ensure that. The callers have to (and do) call XLogFlush() separately. See e.g. the XLogFlush() call in RecordTransactionCommit(). Note that calling SyncRepWaitForLSN() for an LSN that is not yet flushed would not lead to data being sent out prematurely - walsender won't send data that hasn't yet been flushed. So a backend with such a spurious SyncRepWaitForLSN() would just wait until the LSN is actually flushed and *then* replicated. Any specific reason for that question? Greetings, Andres Freund
Re: Wrong results due to missing quals
I wrote: > ... Another idea is that maybe we need another > RestrictInfo field that's directly a set of OJ relids that this clause > can't be applied above. That'd reduce clause_is_computable_at to > basically a bms_intersect test which would be nice speed-wise. The > space consumption could be annoying, but I'm thinking that we might > only have to populate the field in clone clauses, which would > alleviate that issue. I tried this and it seems to work all right: it fixes the example you showed while not causing any new failures. (Doesn't address the broken join-removal logic you showed in the other thread, though.) While at it, I also changed make_restrictinfo to treat has_clone and is_clone as first-class citizens, to fix the dubious coding in equivclass.c that I mentioned at [1]. regards, tom lane [1] https://www.postgresql.org/message-id/395264.1684698283%40sss.pgh.pa.us diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c index 428ea3810f..c5cada55fb 100644 --- a/contrib/postgres_fdw/postgres_fdw.c +++ b/contrib/postgres_fdw/postgres_fdw.c @@ -6521,8 +6521,11 @@ foreign_grouping_ok(PlannerInfo *root, RelOptInfo *grouped_rel, expr, true, false, + false, + false, root->qual_security_level, grouped_rel->relids, + NULL, NULL); if (is_foreign_expr(root, grouped_rel, expr)) fpinfo->remote_conds = lappend(fpinfo->remote_conds, rinfo); diff --git a/src/backend/optimizer/README b/src/backend/optimizer/README index f1a96b8584..2ab4f3dbf3 100644 --- a/src/backend/optimizer/README +++ b/src/backend/optimizer/README @@ -539,14 +539,13 @@ set includes the A/B join relid which is not in the input. However, if we form B/C after A/B, then both forms of the clause are applicable so far as that test can tell. We have to look more closely to notice that the "Pbc" clause form refers to relation B which is no longer -directly accessible. While this check is straightforward, it's not -especially cheap (see clause_is_computable_at()). 
To avoid doing it -unnecessarily, we mark the variant versions of a redundant clause as -either "has_clone" or "is_clone". When considering a clone clause, -we must check clause_is_computable_at() to disentangle which version -to apply at the current join level. (In debug builds, we also Assert -that non-clone clauses are validly computable at the current level; -but that seems too expensive for production usage.) +directly accessible. While such a check could be performed using the +per-relation RelOptInfo.nulling_relids data, it would be annoyingly +expensive to do over and over as we consider different join paths. +To make this simple and reliable, we compute an "incompatible_relids" +set for each variant version (clone) of a redundant clause. A clone +clause should not be applied if any of the outer-join relids listed in +incompatible_relids has already been computed below the current join. Optimizer Functions diff --git a/src/backend/optimizer/path/equivclass.c b/src/backend/optimizer/path/equivclass.c index 2db1bf6448..7fa502d6e2 100644 --- a/src/backend/optimizer/path/equivclass.c +++ b/src/backend/optimizer/path/equivclass.c @@ -197,9 +197,12 @@ process_equivalence(PlannerInfo *root, make_restrictinfo(root, (Expr *) ntest, restrictinfo->is_pushed_down, + restrictinfo->has_clone, + restrictinfo->is_clone, restrictinfo->pseudoconstant, restrictinfo->security_level, NULL, + restrictinfo->incompatible_relids, restrictinfo->outer_relids); } return false; @@ -1972,7 +1975,8 @@ create_join_clause(PlannerInfo *root, * clause into the regular processing, because otherwise the join will be * seen as a clauseless join and avoided during join order searching. * We handle this by generating a constant-TRUE clause that is marked with - * required_relids that make it a join between the correct relations. + * the same required_relids etc as the removed outer-join clause, thus + * making it a join clause between the correct relations. 
*/ void reconsider_outer_join_clauses(PlannerInfo *root) @@ -2001,10 +2005,13 @@ reconsider_outer_join_clauses(PlannerInfo *root) /* throw back a dummy replacement clause (see notes above) */ rinfo = make_restrictinfo(root, (Expr *) makeBoolConst(true, false), - true, /* is_pushed_down */ + rinfo->is_pushed_down, + rinfo->has_clone, + rinfo->is_clone, false, /* pseudoconstant */ 0, /* security_level */ rinfo->required_relids, + rinfo->incompatible_relids, rinfo->outer_relids); distribute_restrictinfo_to_rels(root, rinfo); } @@ -2026,10 +2033,13 @@ reconsider_outer_join_clauses(PlannerInfo *root) /* throw back a dummy replacement
Re: PostgreSQL 16 Beta 1 release announcement draft
Hi, On 2023-05-24 13:06:30 -0400, Jonathan S. Katz wrote: > PostgreSQL 16 Feature Highlights > > > ### Performance > > PostgreSQL 16 includes performance improvements in query execution. This > release > adds more query parallelism, including allowing `FULL` and `RIGHT` joins to > execute in parallel, and parallel execution of the `string_agg` and > `array_agg` > aggregate functions. Additionally, PostgreSQL 16 can use incremental sorts in > `SELECT DISTINCT` queries. There are also several optimizations for > [window > queries](https://www.postgresql.org/docs/16/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS), > improvements in lookups for `RANGE` and `LIST` partitions, and support for > "anti-joins" in `RIGHT` and `OUTER` queries. > > This release also introduces support for CPU acceleration using SIMD for both > x86 and ARM architectures, including optimizations for processing ASCII and > JSON > strings, and array and subtransaction searches. Additionally, PostgreSQL 16 > introduces [load > balancing](https://www.postgresql.org/docs/16/libpq-connect.html#LIBPQ-CONNECT-LOAD-BALANCE-HOSTS) > to libpq, the client library for PostgreSQL. I think the relation extension improvements ought to be mentioned here as well? Up to 3x faster concurrent data load with COPY seems practically relevant. Greetings, Andres Freund
testing dist tarballs
Hi, On 2023-05-23 14:51:03 -0400, Tom Lane wrote: > Andres Freund writes: > > I guess I need to go and check how long the "release" tarball generation > > takes... > > It's quick except for the documentation-generating steps. Maybe > we could test that part only once? First thing I noticed is that 'make dist' doesn't work in a vpath, failing in a somewhat obscure way (likely because in a vpath build the copy from the source dir doesn't include GNUMakefile). Do we expect it to work? Besides docs, the slowest part appears to be gzip --best and then bzip2, as those run serially and take 11 and 13 seconds respectively here... The first thing I tried was: make -j8 dist GZIP=pigz BZIP2=pbzip2 unfortunately that results in pigz: abort: cannot provide files in GZIP environment variable echo GZIP=pigz >> src/Makefile.custom echo BZIP2=pbzip2 >> src/Makefile.custom reduces that to real 1m6.472s user 1m28.316s sys 0m5.340s real 0m54.811s user 1m42.078s sys 0m6.183s still not great... OTOH, we currently already build the docs as part of the CompilerWarnings test. I don't think there's a reason to test that twice? For me make distcheck currently fails: In file included from ../../src/include/postgres.h:46, from hashfn.c:24: ../../src/include/utils/elog.h:79:10: fatal error: utils/errcodes.h: No such file or directory 79 | #include "utils/errcodes.h" | ^~ compilation terminated. make[3]: *** [: hashfn.o] Error 1 at first I thought it was due to my use of -j8 - but it doesn't even work without that. That's due to MAKELEVEL: submake-generated-headers: ifndef NO_GENERATED_HEADERS ifeq ($(MAKELEVEL),0) $(MAKE) -C $(top_builddir)/src/backend generated-headers endif endif So the distcheck target needs to reset MAKELEVEL=0 - unless somebody has a better idea? Separately, it's somewhat confusing that we include errcodes.h etc in src/backend/utils, rather than their final location, in src/include/utils. 
It works, even without perl, because copying the file doesn't require perl - only generating it does... Greetings, Andres Freund
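The MAKELEVEL workaround could look roughly like this (a hypothetical sketch only, not the actual GNUmakefile recipe - the target name and directory variable are invented for illustration):

```make
# Hypothetical sketch: reset MAKELEVEL for the sub-make run inside the
# unpacked tarball, so that the generated-headers rule (guarded by
# "ifeq ($(MAKELEVEL),0)") still fires and errcodes.h etc. get built.
distcheck-build:
	$(MAKE) -C $(distdir) MAKELEVEL=0 all
```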
Re: [PATCH] psql: \dn+ to show size of each schema (and \dA+ for AMs)
I added documentation for the SQL functions in 001. And updated to say 17 I'm planning to set this patch as ready - it has not changed significantly in 18 months. Not for the first time, I've implemented a workaround at a higher layer. -- Justin >From 917e5e36d14018b6de457ba9eafe8936c0e88c3c Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Tue, 13 Jul 2021 21:25:48 -0500 Subject: [PATCH 1/4] Add pg_am_size(), pg_namespace_size() .. See also: 358a897fa, 528ac10c7 --- doc/src/sgml/func.sgml | 39 ++ src/backend/utils/adt/dbsize.c | 132 src/include/catalog/pg_proc.dat | 19 + 3 files changed, 190 insertions(+) diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 5a47ce43434..6cbf4e9aa56 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -27762,6 +27762,45 @@ postgres=# SELECT '0/0'::pg_lsn + pd.segment_number * ps.setting::int + :offset + + + + pg_namespace_size + +pg_namespace_size ( name ) +bigint + + +pg_namespace_size ( oid ) +bigint + + +Computes the total disk space used by relations in the namespace (schema) +with the specified name or OID. To use this function, you must +have CREATE privilege on the specified namespace +or have privileges of the pg_read_all_stats role, +unless it is the default namespace for the current database. + + + + + + + pg_am_size + +pg_am_size ( name ) +bigint + + +pg_am_size ( oid ) +bigint + + +Computes the total disk space used by relations using the access method +with the specified name or OID. 
+ + + diff --git a/src/backend/utils/adt/dbsize.c b/src/backend/utils/adt/dbsize.c index e5c0f1c45b6..af0955d1790 100644 --- a/src/backend/utils/adt/dbsize.c +++ b/src/backend/utils/adt/dbsize.c @@ -13,19 +13,25 @@ #include +#include "access/genam.h" #include "access/htup_details.h" #include "access/relation.h" +#include "access/table.h" #include "catalog/catalog.h" #include "catalog/namespace.h" #include "catalog/pg_authid.h" #include "catalog/pg_database.h" +#include "catalog/pg_namespace.h" #include "catalog/pg_tablespace.h" #include "commands/dbcommands.h" +#include "commands/defrem.h" #include "commands/tablespace.h" #include "miscadmin.h" #include "storage/fd.h" #include "utils/acl.h" #include "utils/builtins.h" +#include "utils/fmgroids.h" +#include "utils/lsyscache.h" #include "utils/numeric.h" #include "utils/rel.h" #include "utils/relfilenumbermap.h" @@ -858,6 +864,132 @@ pg_size_bytes(PG_FUNCTION_ARGS) PG_RETURN_INT64(result); } +/* + * Return the sum of size of relations for which the given attribute of + * pg_class matches the specified OID value. 
+ */ +static int64 +calculate_size_attvalue(AttrNumber attnum, Oid attval) +{ + int64 totalsize = 0; + ScanKeyData skey; + Relation pg_class; + SysScanDesc scan; + HeapTuple tuple; + + ScanKeyInit(&skey, attnum, +BTEqualStrategyNumber, F_OIDEQ, attval); + + pg_class = table_open(RelationRelationId, AccessShareLock); + scan = systable_beginscan(pg_class, InvalidOid, false, NULL, 1, &skey); + while (HeapTupleIsValid(tuple = systable_getnext(scan))) + { + Form_pg_class classtuple = (Form_pg_class) GETSTRUCT(tuple); + Relation rel; + + rel = try_relation_open(classtuple->oid, AccessShareLock); + if (!rel) + continue; + + for (ForkNumber forkNum = 0; forkNum <= MAX_FORKNUM; forkNum++) + totalsize += calculate_relation_size(&(rel->rd_locator), rel->rd_backend, forkNum); + + relation_close(rel, AccessShareLock); + } + + systable_endscan(scan); + table_close(pg_class, AccessShareLock); + return totalsize; +} + +/* Compute the size of relations in a schema (namespace) */ +static int64 +calculate_namespace_size(Oid nspOid) +{ + /* + * User must be a member of pg_read_all_stats or have CREATE privilege for + * target namespace. + */ + if (!is_member_of_role(GetUserId(), ROLE_PG_READ_ALL_STATS)) + { + AclResult aclresult; + + aclresult = object_aclcheck(NamespaceRelationId, nspOid, GetUserId(), ACL_CREATE); + if (aclresult != ACLCHECK_OK) + aclcheck_error(aclresult, OBJECT_SCHEMA, + get_namespace_name(nspOid)); + } + + return calculate_size_attvalue(Anum_pg_class_relnamespace, nspOid); +} + +Datum +pg_namespace_size_oid(PG_FUNCTION_ARGS) +{ + Oid nspOid = PG_GETARG_OID(0); + int64 size; + + size = calculate_namespace_size(nspOid); + + if (size < 0) + PG_RETURN_NULL(); + + PG_RETURN_INT64(size); +} + +Datum +pg_namespace_size_name(PG_FUNCTION_ARGS) +{ + Name nspName = PG_GETARG_NAME(0); + Oid nspOid = get_namespace_oid(NameStr(*nspName), false); + int64 size; + + size = calculate_namespace_size(nspOid); + + if (size < 0) + PG_RETURN_NULL(); + + PG_RETURN_I
Re: walsender performance regression due to logical decoding on standby changes
Hi, On 2023-05-24 05:53:51 +, Zhijie Hou (Fujitsu) wrote: > On Tuesday, May 23, 2023 1:53 AM Andres Freund wrote: > > On 2023-05-22 12:15:07 +, Zhijie Hou (Fujitsu) wrote: > > > About "a backend doing logical decoding", do you mean the case when a > > user > > > start a backend and invoke pg_logical_slot_get_changes() to do the logical > > > decoding ? If so, it seems the logical decoding in a backend won't be > > > waked > > up > > > by startup process because the backend won't be registered as a walsender > > so > > > the backend won't be found in WalSndWakeup(). > > > > I meant logical decoding happening inside a walsender instance. > > > > > > > Or do you mean the deadlock between the real logical walsender and startup > > > process ? (I might miss something) I think the logical decoding doesn't > > > lock > > > the target user relation when decoding because it normally can get the > > needed > > > information from WAL. > > > > It does lock catalog tables briefly. There's no guarantee that such locks > > are > > released immediately. I forgot the details, but IIRC there's some outfuncs > > (enum?) that intentionally delay releasing locks till transaction commit. > > Thanks for the explanation ! > > I understand that the startup process can take lock on the catalog(when > replaying record) which may conflict with the lock in walsender. > > But in walsender, I think we only start transaction after entering > ReorderBufferProcessTXN(), and the transaction started here will be released > soon after processing and outputting the decoded transaction's data(as the > comment in ReorderBufferProcessTXN() says:" all locks acquired in here to be > released, not reassigned to the parent and we do not want any database access > have persistent effects."). It's possible that there's no immediate danger - but I wouldn't want to bet on it staying that way. Think about things like streaming out large transactions etc. 
There also are potential dangers outside of plain things like locks - there could be a recovery conflict on the snapshot or such (although that normally should be prevented via hot_standby_feedback or such). Even if there's no hard deadlock issue - not waking up a walsender-in-logical-decoding that's currently waiting because (replay_count % N) != 0 means that the walsender might be delayed for quite a while - there might not be any further records from the primary in the near future. I don't think any approach purely based on record counts has any chance to work well. You could combine it with something else, e.g. with always waking up in XLogPageRead() and WaitForWALToBecomeAvailable(). But then you end up with something relatively complicated again. Greetings, Andres Freund
Re: memory leak in trigger handling (since PG12)
Hi, On 2023-05-24 21:56:22 +0200, Tomas Vondra wrote: > >> The really hard thing was determining what causes the memory leak - the > >> simple instrumentation doesn't help with that at all. It tells you there > >> might be a leak, but you don't know where did the allocations came from. > >> > >> What I ended up doing is a simple gdb script that sets breakpoints on > >> all palloc/pfree variants, and prints info (including the backtrace) for > >> each call on ExecutorState. And then a script that aggregate those to > >> identify which backtraces allocated most chunks that were not freed. > > > > FWIW, for things like this I found "heaptrack" to be extremely helpful. > > > > E.g. for a reproducer of the problem here, it gave me the attach "flame > > graph" > > of the peak memory usage - after attaching to a running backend and running > > an > > UPDATE triggering the leak.. > > > > Because the view I screenshotted shows the stacks contributing to peak > > memory > > usage, it works nicely to find "temporary leaks", even if memory is actually > > freed after all etc. > > > > That's a nice visualization, but isn't that useful only once you > determine there's a memory leak? Which I think is the hard problem. So is your gdb approach, unless I am misunderstanding? The view I screenshotted shows the "peak" allocated memory, if you have a potential leak, you can see where most of the allocated memory was allocated. Which at least provides you with a good idea of where to look for a problem in more detail. > >>> Hm. Somehow this doesn't seem quite right. Shouldn't we try to use a > >>> shorter > >>> lived memory context instead? Otherwise we'll just end up with the same > >>> problem in a few years. > >>> > >> > >> I agree using a shorter lived memory context would be more elegant, and > >> more in line with how we do things. But it's not clear to me why we'd > >> end up with the same problem in a few years with what the patch does. 
> > > > Because it sets up the pattern of manual memory management and continues to > > run the relevant code within a query-lifetime context. > > > > Oh, you mean someone might add new allocations to this code (or into one > of the functions executed from it), and that'd leak again? Yeah, true. Yes. It's certainly not obvious this far down that we are called in a semi-long-lived memory context. Greetings, Andres Freund
Re: memory leak in trigger handling (since PG12)
Hi, On 2023-05-24 21:49:13 +0200, Tomas Vondra wrote: > >> and then have to pass updatedCols elsewhere. It's tricky to just switch > >> to the context (e.g. in ExecASUpdateTriggers/ExecARUpdateTriggers), as > >> AfterTriggerSaveEvent allocates other bits of memory too (in a longer > >> lived context). > > > > Hm - on a quick look the allocations in trigger.c itself are done with > > MemoryContextAlloc(). > > > > I did find a problematic path, namely that ExecGetChildToRootMap() ends up > > building resultRelInfo->ri_ChildToRootMap in CurrentMemoryContext. > > > > That seems like a flat out bug to me - we can't just store data in a > > ResultRelInfo without ensuring the memory is lives long enough. Nearby > > places > > like ExecGetRootToChildMap() do make sure to switch to es_query_cxt. > > > > > > Did you see other bits of memory getting allocated in CurrentMemoryContext? > > > > No, I simply tried to do the context switch and then gave up when it > crashed on the ExecGetRootToChildMap info. I haven't looked much > further, but you may be right it's the only bit. > > It didn't occur to me it might be a bug - I think the code simply > assumes it gets called with suitable memory context, just like we do in > various other places. Maybe it should document the assumption. I think architecturally, code storing stuff in a ResultRelInfo - which has query lifetime - ought to be careful to allocate memory with such lifetime. Note that the nearby ExecGetRootToChildMap() actually is careful to do so. > >> So we'd have to do another switch again. Not sure how > >> backpatch-friendly would that be. > > > > Yea, that's a valid concern. I think it might be reasonable to use something > > like ExecGetAllUpdatedColsCtx() in the backbranches, and switch to a > > short-lived context for the trigger invocations in >= 16. Hm - stepping back a bit, why are we doing the work in ExecGetAllUpdatedCols() over and over? Unless I am missing something, the result doesn't change across rows. 
And it doesn't look that cheap to compute, leaving aside the allocation that bms_union() does. It's visible in profiles, not as a top entry, but still. Perhaps the easiest-to-backpatch fix is to just avoid recomputing the value? But perhaps it'd be just as problematic, because callers might modify ExecGetAllUpdatedCols()'s return value in place... Greetings, Andres Freund
Re: memory leak in trigger handling (since PG12)
On 5/24/23 20:14, Andres Freund wrote: > Hi, > > On 2023-05-23 23:26:42 +0200, Tomas Vondra wrote: >> On 5/23/23 19:14, Andres Freund wrote: >>> Hi, >>> >>> On 2023-05-23 18:23:00 +0200, Tomas Vondra wrote: This means that for an UPDATE with triggers, we may end up calling this for each row, possibly multiple bitmaps. And those bitmaps are allocated in ExecutorState, so won't be freed until the end of the query :-( >>> >>> Ugh. >>> >>> >>> I've wondered about some form of instrumentation to detect such issues >>> before. It's obviously a problem that we can have fairly large leaks, like >>> the >>> one you just discovered, without detecting it for a couple years. It's kinda >>> made worse by the memory context infrastructure, because it hides such >>> issues. >>> >>> Could it help to have a mode where the executor shutdown hook checks how >>> much >>> memory is allocated in ExecutorState and warns if its too much? There's >>> IIRC a >>> few places that allocate large things directly in it, but most of those >>> probably should be dedicated contexts anyway. Something similar could be >>> useful for some other long-lived contexts. >>> >> >> Not sure such simple instrumentation would help much, unfortunately :-( >> >> We only discovered this because the user reported OOM crashes, which >> means the executor didn't get to the shutdown hook at all. Yeah, maybe >> if we had such instrumentation it'd get triggered for milder cases, but >> I'd bet the amount of noise is going to be significant. >> >> For example, there's a nearby thread about hashjoin allocating buffiles >> etc. in ExecutorState - we ended up moving that to a separate context, >> but surely there are more such cases (not just for ExecutorState). > > Yes, that's why I said that we would have to more of those into dedicated > contexts - which is a good idea independent of this issue. > Yeah, I think that's a good idea in principle. 
> >> The really hard thing was determining what causes the memory leak - the >> simple instrumentation doesn't help with that at all. It tells you there >> might be a leak, but you don't know where did the allocations came from. >> >> What I ended up doing is a simple gdb script that sets breakpoints on >> all palloc/pfree variants, and prints info (including the backtrace) for >> each call on ExecutorState. And then a script that aggregate those to >> identify which backtraces allocated most chunks that were not freed. > > FWIW, for things like this I found "heaptrack" to be extremely helpful. > > E.g. for a reproducer of the problem here, it gave me the attach "flame graph" > of the peak memory usage - after attaching to a running backend and running an > UPDATE triggering the leak.. > > Because the view I screenshotted shows the stacks contributing to peak memory > usage, it works nicely to find "temporary leaks", even if memory is actually > freed after all etc. > That's a nice visualization, but isn't that useful only once you determine there's a memory leak? Which I think is the hard problem. > > >>> Hm. Somehow this doesn't seem quite right. Shouldn't we try to use a shorter >>> lived memory context instead? Otherwise we'll just end up with the same >>> problem in a few years. >>> >> >> I agree using a shorter lived memory context would be more elegant, and >> more in line with how we do things. But it's not clear to me why we'd >> end up with the same problem in a few years with what the patch does. > > Because it sets up the pattern of manual memory management and continues to > run the relevant code within a query-lifetime context. > Oh, you mean someone might add new allocations to this code (or into one of the functions executed from it), and that'd leak again? Yeah, true. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: memory leak in trigger handling (since PG12)
Hi, On 2023-05-23 18:23:00 +0200, Tomas Vondra wrote: > While looking for other places allocating stuff in ExecutorState (for > the UPDATE case) and leaving it there, I found two more cases: For a bit I thought there was a similar problem in ExecWithCheckOptions() - but we error out immediately afterwards, so there's no danger of accumulating memory. I find it quite depressing that we have at least four copies of: /* * If the tuple has been routed, it's been converted to the partition's * rowtype, which might differ from the root table's. We must convert it * back to the root table's rowtype so that val_desc in the error message * matches the input tuple. */ if (resultRelInfo->ri_RootResultRelInfo) { ResultRelInfo *rootrel = resultRelInfo->ri_RootResultRelInfo; TupleDesc old_tupdesc; AttrMap*map; root_relid = RelationGetRelid(rootrel->ri_RelationDesc); tupdesc = RelationGetDescr(rootrel->ri_RelationDesc); old_tupdesc = RelationGetDescr(resultRelInfo->ri_RelationDesc); /* a reverse map */ map = build_attrmap_by_name_if_req(old_tupdesc, tupdesc, false); /* * Partition-specific slot's tupdesc can't be changed, so allocate a * new one. */ if (map != NULL) slot = execute_attr_map_slot(map, slot, MakeTupleTableSlot(tupdesc, &TTSOpsVirtual)); modifiedCols = bms_union(ExecGetInsertedCols(rootrel, estate), ExecGetUpdatedCols(rootrel, estate)); } Once in ExecPartitionCheckEmitError(), *twice* in ExecConstraints(), ExecWithCheckOptions(). Some of the partitioning stuff has been added in a really myopic way. Greetings, Andres Freund
Re: memory leak in trigger handling (since PG12)
On 5/24/23 20:55, Andres Freund wrote: > Hi, > > On 2023-05-24 15:38:41 +0200, Tomas Vondra wrote: >> I looked at this again, and I think GetPerTupleMemoryContext(estate) >> might do the trick, see the 0002 part. > > Yea, that seems like the right thing here. > > >> Unfortunately it's not much >> smaller/simpler than just freeing the chunks, because we end up doing >> >> oldcxt = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate)); >> updatedCols = ExecGetAllUpdatedCols(relinfo, estate); >> MemoryContextSwitchTo(oldcxt); > > We could add a variant of ExecGetAllUpdatedCols that switches the context. > Yeah, we could do that. I was thinking about backpatching, and modifying ExecGetAllUpdatedCols signature would be ABI break, but adding a variant should be fine. > >> and then have to pass updatedCols elsewhere. It's tricky to just switch >> to the context (e.g. in ExecASUpdateTriggers/ExecARUpdateTriggers), as >> AfterTriggerSaveEvent allocates other bits of memory too (in a longer >> lived context). > > Hm - on a quick look the allocations in trigger.c itself are done with > MemoryContextAlloc(). > > I did find a problematic path, namely that ExecGetChildToRootMap() ends up > building resultRelInfo->ri_ChildToRootMap in CurrentMemoryContext. > > That seems like a flat out bug to me - we can't just store data in a > ResultRelInfo without ensuring the memory is lives long enough. Nearby places > like ExecGetRootToChildMap() do make sure to switch to es_query_cxt. > > > Did you see other bits of memory getting allocated in CurrentMemoryContext? > No, I simply tried to do the context switch and then gave up when it crashed on the ExecGetRootToChildMap info. I haven't looked much further, but you may be right it's the only bit. It didn't occur to me it might be a bug - I think the code simply assumes it gets called with suitable memory context, just like we do in various other places. Maybe it should document the assumption. 
> >> So we'd have to do another switch again. Not sure how >> backpatch-friendly would that be. > > Yea, that's a valid concern. I think it might be reasonable to use something > like ExecGetAllUpdatedColsCtx() in the backbranches, and switch to a > short-lived context for the trigger invocations in >= 16. > WFM regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: memory leak in trigger handling (since PG12)
Hi, On 2023-05-24 15:38:41 +0200, Tomas Vondra wrote: > I looked at this again, and I think GetPerTupleMemoryContext(estate) > might do the trick, see the 0002 part. Yea, that seems like the right thing here. > Unfortunately it's not much > smaller/simpler than just freeing the chunks, because we end up doing > > oldcxt = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate)); > updatedCols = ExecGetAllUpdatedCols(relinfo, estate); > MemoryContextSwitchTo(oldcxt); We could add a variant of ExecGetAllUpdatedCols that switches the context. > and then have to pass updatedCols elsewhere. It's tricky to just switch > to the context (e.g. in ExecASUpdateTriggers/ExecARUpdateTriggers), as > AfterTriggerSaveEvent allocates other bits of memory too (in a longer > lived context). Hm - on a quick look the allocations in trigger.c itself are done with MemoryContextAlloc(). I did find a problematic path, namely that ExecGetChildToRootMap() ends up building resultRelInfo->ri_ChildToRootMap in CurrentMemoryContext. That seems like a flat out bug to me - we can't just store data in a ResultRelInfo without ensuring the memory lives long enough. Nearby places like ExecGetRootToChildMap() do make sure to switch to es_query_cxt. Did you see other bits of memory getting allocated in CurrentMemoryContext? > So we'd have to do another switch again. Not sure how > backpatch-friendly would that be. Yea, that's a valid concern. I think it might be reasonable to use something like ExecGetAllUpdatedColsCtx() in the backbranches, and switch to a short-lived context for the trigger invocations in >= 16. Greetings, Andres Freund
Re: memory leak in trigger handling (since PG12)
Hi,

On 2023-05-23 23:26:42 +0200, Tomas Vondra wrote:
> On 5/23/23 19:14, Andres Freund wrote:
>> Hi,
>>
>> On 2023-05-23 18:23:00 +0200, Tomas Vondra wrote:
>>> This means that for an UPDATE with triggers, we may end up calling this
>>> for each row, possibly multiple bitmaps. And those bitmaps are allocated
>>> in ExecutorState, so won't be freed until the end of the query :-(
>>
>> Ugh.
>>
>> I've wondered about some form of instrumentation to detect such issues
>> before. It's obviously a problem that we can have fairly large leaks, like
>> the one you just discovered, without detecting it for a couple years. It's
>> kinda made worse by the memory context infrastructure, because it hides
>> such issues.
>>
>> Could it help to have a mode where the executor shutdown hook checks how
>> much memory is allocated in ExecutorState and warns if it's too much?
>> There's IIRC a few places that allocate large things directly in it, but
>> most of those probably should be dedicated contexts anyway. Something
>> similar could be useful for some other long-lived contexts.
>
> Not sure such simple instrumentation would help much, unfortunately :-(
>
> We only discovered this because the user reported OOM crashes, which
> means the executor didn't get to the shutdown hook at all. Yeah, maybe
> if we had such instrumentation it'd get triggered for milder cases, but
> I'd bet the amount of noise is going to be significant.
>
> For example, there's a nearby thread about hashjoin allocating buffiles
> etc. in ExecutorState - we ended up moving that to a separate context,
> but surely there are more such cases (not just for ExecutorState).

Yes, that's why I said that we would have to move more of those into dedicated contexts - which is a good idea independent of this issue.

> The really hard thing was determining what causes the memory leak - the
> simple instrumentation doesn't help with that at all.
> It tells you there might be a leak, but you don't know where the
> allocations came from.
>
> What I ended up doing is a simple gdb script that sets breakpoints on
> all palloc/pfree variants, and prints info (including the backtrace) for
> each call on ExecutorState. And then a script that aggregates those to
> identify which backtraces allocated most chunks that were not freed.

FWIW, for things like this I found "heaptrack" to be extremely helpful. E.g. for a reproducer of the problem here, it gave me the attached "flame graph" of the peak memory usage - after attaching to a running backend and running an UPDATE triggering the leak. Because the view I screenshotted shows the stacks contributing to peak memory usage, it works nicely to find "temporary leaks", even if memory is actually freed after all.

>> Hm. Somehow this doesn't seem quite right. Shouldn't we try to use a
>> shorter lived memory context instead? Otherwise we'll just end up with
>> the same problem in a few years.
>
> I agree using a shorter lived memory context would be more elegant, and
> more in line with how we do things. But it's not clear to me why we'd
> end up with the same problem in a few years with what the patch does.

Because it sets up the pattern of manual memory management and continues to run the relevant code within a query-lifetime context.

Greetings,

Andres Freund
Question about error message in auth.c
Greetings,

In https://github.com/postgres/postgres/blob/5c2c59ba0b5f723b067a6fa8bf8452d41fbb2125/src/backend/libpq/auth.c#L463 the last piece of information is the encryption state. However, when an SSL connection fails to authenticate, the state is not encrypted. When would it ever be encrypted if authentication fails?

Dave Cramer
Re: memory leak in trigger handling (since PG12)
On 5/24/23 17:37, Tom Lane wrote:
> Tomas Vondra writes:
>> While looking for other places allocating stuff in ExecutorState (for
>> the UPDATE case) and leaving it there, I found two more cases:
>>
>> 1) copy_plpgsql_datums
>>
>> 2) make_expanded_record_from_tupdesc
>>    make_expanded_record_from_exprecord
>>
>> All of this is calls from plpgsql_exec_trigger.
>
> Can you show a test case in which this happens? I added some
> instrumentation and verified at least within our regression tests,
> copy_plpgsql_datums' CurrentMemoryContext is always plpgsql's
> "SPI Proc" context, so I do not see how there can be a query-lifespan
> leak there, nor how your 0003 would fix it if there is.

Interesting. I tried to reproduce it, but without success, and it passes even with an assert on the context name. The only explanation I have is that the gdb script I used might have been a bit broken - it used conditional breakpoints like this one:

  break AllocSetAlloc if strcmp(((MemoryContext) $rdi)->name, \
                                "ExecutorState") == 0
  commands
    bt
    cont
  end

but I just noticed gdb sometimes complains about this:

  Error in testing breakpoint condition:
  '__strcmp_avx2' has unknown return type; cast the call to its declared return type

The breakpoint still fires all the commands, which is pretty surprising behavior, but that might explain why I saw copy_plpgsql_data as another culprit. And I suspect the make_expanded_record calls might be caused by the same thing. I'll check deeper tomorrow, when I get access to the original script etc. We can ignore these cases until then.

Sorry for the confusion :-/

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
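[Editor's note] The gdb complaint quoted above also spells out the fix: cast the call so the condition has a known return type. A sketch of the adjusted script (untested here; it assumes the same x86-64 calling convention as the original, where $rdi holds the first function argument):

```
# Cast strcmp so gdb knows the return type of the optimized
# __strcmp_avx2 symbol; without the cast the condition can
# silently misbehave while the breakpoint commands still fire.
break AllocSetAlloc if (int) strcmp(((MemoryContext) $rdi)->name, "ExecutorState") == 0
commands
  bt
  cont
end
```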
Re: PG 16 draft release notes ready
On Wed, May 24, 2023 at 01:43:50PM -0400, Bruce Momjian wrote: > > * Reduce palloc() memory overhead for all memory allocations down to 8 > > bytes on all platforms. (Andres Freund, David Rowley) > > > > This allows more efficient use of memory and is especially useful in > > queries which perform operations (such as sorting or hashing) that > > require more than work_mem. > > Well, this would go in the source code section, but it seems too > internal and global to mention. What was the previous memory allocation overhead? -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Re: PG 16 draft release notes ready
On Wed, May 24, 2023 at 08:48:56AM +1200, David Rowley wrote: > On Tue, 23 May 2023 at 06:04, Bruce Momjian wrote: > > > > On Mon, May 22, 2023 at 10:59:36AM -0700, Andres Freund wrote: > > > And here it's not just performance, but also memory usage, including > > > steady > > > state memory usage. > > > > Understood. I continue to need help determining which items to include. > > Can you suggest some text? This? > > > > Improve efficiency of memory usage to allow for better scaling > > Maybe something like: > > * Reduce palloc() memory overhead for all memory allocations down to 8 > bytes on all platforms. (Andres Freund, David Rowley) > > This allows more efficient use of memory and is especially useful in > queries which perform operations (such as sorting or hashing) that > require more than work_mem. Well, this would go in the source code section, but it seems too internal and global to mention. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you.
Re: PostgreSQL 16 Beta 1 release announcement draft
On 5/22/23 3:23 PM, Erik Rijkers wrote:
> On 5/21/23 at 19:07, Jonathan S. Katz wrote:
>> On 5/19/23 12:17 AM, Jonathan S. Katz wrote:
>>> Hi, Attached is a draft of the release announcement for PostgreSQL 16
>>> Beta 1. Please provide feedback no later than May 24, 0:00 AoE. This
>>> will give

Thanks everyone for your feedback. Here is the updated text.

> 'substransaction' should be 'subtransaction'

Fixed.

> 'use thousands separators' perhaps is better:
> 'use underscore as digit-separator, as in `5_432` and `1_00_000`'

I looked at how other languages document this, and they do use the term "thousands separators." I left that in, but explicitly called out the underscore.

> 'instrcut' should be 'instruct'

Fixed.

Attached is the (hopefully) final draft.

Thanks,

Jonathan

The PostgreSQL Global Development Group announces that the first beta release of PostgreSQL 16 is now [available for download](https://www.postgresql.org/download/). This release contains previews of all features that will be available when PostgreSQL 16 is made generally available, though some details of the release can change during the beta period.

You can find information about all of the features and changes found in PostgreSQL 16 in the [release notes](https://www.postgresql.org/docs/16/release-16.html): [https://www.postgresql.org/docs/16/release-16.html](https://www.postgresql.org/docs/16/release-16.html)

In the spirit of the open source PostgreSQL community, we strongly encourage you to test the new features of PostgreSQL 16 on your systems to help us eliminate bugs or other issues that may exist. While we do not advise you to run PostgreSQL 16 Beta 1 in production environments, we encourage you to find ways to run your typical application workloads against this beta release. Your testing and feedback will help the community ensure that the PostgreSQL 16 release upholds our standards of delivering a stable, reliable release of the world's most advanced open source relational database.
Please read more about our [beta testing process](https://www.postgresql.org/developer/beta/) and how you can contribute: [https://www.postgresql.org/developer/beta/](https://www.postgresql.org/developer/beta/)

PostgreSQL 16 Feature Highlights

### Performance

PostgreSQL 16 includes performance improvements in query execution. This release adds more query parallelism, including allowing `FULL` and `RIGHT` joins to execute in parallel, and parallel execution of the `string_agg` and `array_agg` aggregate functions. Additionally, PostgreSQL 16 can use incremental sorts in `SELECT DISTINCT` queries. There are also several optimizations for [window queries](https://www.postgresql.org/docs/16/sql-expressions.html#SYNTAX-WINDOW-FUNCTIONS), improvements in lookups for `RANGE` and `LIST` partitions, and support for "anti-joins" in `RIGHT` and `OUTER` queries.

This release also introduces support for CPU acceleration using SIMD for both x86 and ARM architectures, including optimizations for processing ASCII and JSON strings, and array and subtransaction searches. Additionally, PostgreSQL 16 introduces [load balancing](https://www.postgresql.org/docs/16/libpq-connect.html#LIBPQ-CONNECT-LOAD-BALANCE-HOSTS) to libpq, the client library for PostgreSQL.

### Logical Replication Enhancements

Logical replication lets PostgreSQL users stream data in real-time to other PostgreSQL or other external systems that implement the logical protocol. Until PostgreSQL 16, users could only create logical replication publishers on primary instances. PostgreSQL 16 adds the ability to perform logical decoding on a standby instance, giving users more options to distribute their workload, for example, use a standby that's less busy than a primary to logically replicate changes.

PostgreSQL 16 also includes several performance improvements to logical replication.
This includes allowing the subscriber to apply large transactions in parallel, use indexes other than the `PRIMARY KEY` to perform lookups during `UPDATE` or `DELETE` operations, and allow for tables to be copied using binary format during initialization.

### Developer Experience

PostgreSQL 16 continues to implement the [SQL/JSON](https://www.postgresql.org/docs/16/functions-json.html) standard for manipulating [JSON](https://www.postgresql.org/docs/16/datatype-json.html) data, including support for SQL/JSON constructors (e.g. `JSON_ARRAY()`, `JSON_ARRAYAGG()` et al), and identity functions (`IS JSON`). This release also adds the SQL standard [`ANY_VALUE`](https://www.postgresql.org/docs/16/functions-aggregate.html#id-1.5.8.27.5.2.4.1.1.1.1) aggregate function, which returns any arbitrary value from the aggregate set.

For convenience, PostgreSQL 16 now lets you specify non-decimal integer literals, such as `0xff`, `0o777`, and `0b101010`, and use underscores as thousands separators, such as `5_432`.

This release adds support for the extended query protocol to the
Re: PG 16 draft release notes ready
On 5/24/23 12:13 AM, David Rowley wrote:
> On Wed, 24 May 2023 at 15:54, Bruce Momjian wrote:
>> On Wed, May 24, 2023 at 08:37:45AM +1200, David Rowley wrote:
>>> On Mon, 22 May 2023 at 07:05, Jonathan S. Katz wrote:
>>>> * Parallel execution of queries that use `FULL` and `OUTER` joins
>>>
>>> I think this should be `RIGHT` joins rather than `OUTER` joins. LEFT
>>> joins have been parallelizable I think for a long time now.
>>
>> Well, since we can swap left/right easily, why would we not have just
>> swapped the tables and done the join in the past? I think there are two
>> things missing in my description. First, I need to mention parallel
>> _hash_ join. Second, I think this item is saying that the _inner_ side
>> of a parallel hash join can be an OUTER or FULL join. How about?
>>
>> Allow hash joins to be parallelized where the inner side is processed
>> as an OUTER or FULL join (Melanie Plageman, Thomas Munro)
>
> In this case, the inner side is the hashed side. I think Jonathan's text
> is safe to swap OUTER to RIGHT as it mentions "execution".

I made this swap in the release announcement. Thanks!

Jonathan

OpenPGP_signature Description: OpenPGP digital signature
Re: Order changes in PG16 since ICU introduction
On 5/24/23 11:39, Jeff Davis wrote:
> On Mon, 2023-05-22 at 22:09 +0200, Daniel Verite wrote:
>> In practice we're probably getting the "und" ICU locale whereas "fr"
>> would be appropriate.
>
> This is a good point and illustrates that ICU is not a drop-in
> replacement for libc in all cases. I don't see a solution here that
> doesn't involve some rough edges, though. "Locale" is a generic term,
> and if we continue to insist that it really means a libc locale, then
> ICU will never be on an equal footing with libc, let alone the
> preferred provider.

Huge +1

IMHO the experience should be unified to the degree possible.

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com
Re: PG 16 draft release notes ready
On Wed, May 24, 2023 at 04:57:59PM +0200, Erik Rijkers wrote: > Op 5/24/23 om 15:58 schreef Bruce Momjian: > > On Wed, May 24, 2023 at 12:23:02PM +0700, John Naylor wrote: > > > > > > On Wed, May 24, 2023 at 11:19 AM Bruce Momjian wrote: > > Typos: > > 'from standbys servers' should be > 'from standby servers' > > 'reindexedb' should be > 'reindexdb' > (2x: the next line mentions, erroneously, 'reindexedb --system') > > 'created only created' should be > 'only created' > (I think) > > 'could could' should be > 'could' > > 'are now require the role' should be > 'now require the role' > > 'values is' should be > 'value is' > > 'to marked' should be > 'to be marked' All good, patch attached and applied. Updated docs are at: https://momjian.us/pgsql_docs/release-16.html -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Only you can decide what is important to you. diff --git a/doc/src/sgml/release-16.sgml b/doc/src/sgml/release-16.sgml index bb92fe5cf9..88d6514ad7 100644 --- a/doc/src/sgml/release-16.sgml +++ b/doc/src/sgml/release-16.sgml @@ -27,7 +27,7 @@ - Allow logical replication from standbys servers + Allow logical replication from standby servers @@ -126,11 +126,11 @@ Author: Michael Paquier -Change REINDEX DATABASE and reindexedb to not process indexes on system catalogs (Simon Riggs) +Change REINDEX DATABASE and reindexdb to not process indexes on system catalogs (Simon Riggs) -Processing such indexes is still possible using REINDEX SYSTEM and reindexedb --system. +Processing such indexes is still possible using REINDEX SYSTEM and reindexdb --system. @@ -593,7 +593,7 @@ Create subscription statistics entries at subscription creation time so stats_re -Previously entries were created only created when the first statistics were reported. +Previously entries were created only when the first statistics were reported. 
@@ -777,7 +777,7 @@ Simplify permissions for LOCK TABLE (Jeff Davis) -Previously the ability to perform LOCK TABLE at various lock levels was bound to specific query-type permissions. For example, UPDATE could could perform all lock levels except ACCESS SHARE, which +Previously the ability to perform LOCK TABLE at various lock levels was bound to specific query-type permissions. For example, UPDATE could perform all lock levels except ACCESS SHARE, which required SELECT permissions. Now UPDATE can issue all lock levels. MORE? @@ -808,7 +808,7 @@ Restrict the privileges of CREATEROLE roles (Robert Haas) -Previously roles with CREATEROLE privileges could change many aspects of any non-superuser role. Such changes, including adding members, are now require the role requesting the change to have ADMIN OPTION +Previously roles with CREATEROLE privileges could change many aspects of any non-superuser role. Such changes, including adding members, now require the role requesting the change to have ADMIN OPTION permission. @@ -946,7 +946,7 @@ Add dependency tracking of grantors for GRANT records (Robert Haas) -This will guarantee that pg_auth_members.grantor values is always valid. +This guarantees that pg_auth_members.grantor values are always valid. @@ -3017,7 +3017,7 @@ Author: Tom Lane -Allow required extensions to marked as non-relocatable using "no_relocate" (Regina Obe) +Allow required extensions to be marked as non-relocatable using "no_relocate" (Regina Obe)
Re: memory leak in trigger handling (since PG12)
Tomas Vondra writes:
> On 5/23/23 23:39, Tom Lane wrote:
>> FWIW, I've had some success localizing palloc memory leaks with valgrind's
>> leak detection mode. The trick is to ask it for a report before the
>> context gets destroyed. Beats writing your own infrastructure ...
>
> I haven't tried valgrind, so can't compare.
> Would it be possible to filter which memory contexts to track? Say we
> know the leak is in ExecutorState, so we don't need to track allocations
> in other contexts. That was a huge speedup for me, maybe it'd help
> valgrind too.

I don't think valgrind has a way to do that, but this'd be something you set up specially in any case.

> Also, how did you ask for the report before context gets destroyed?

There are several valgrind client requests that could be helpful:

  /* Do a full memory leak check (like --leak-check=full) mid-execution. */
  #define VALGRIND_DO_LEAK_CHECK                                  \
      VALGRIND_DO_CLIENT_REQUEST_STMT(VG_USERREQ__DO_LEAK_CHECK,  \
                                      0, 0, 0, 0, 0)

  /* Same as VALGRIND_DO_LEAK_CHECK but only showing the entries for
     which there was an increase in leaked bytes or leaked nr of blocks
     since the previous leak search. */
  #define VALGRIND_DO_ADDED_LEAK_CHECK                            \
      VALGRIND_DO_CLIENT_REQUEST_STMT(VG_USERREQ__DO_LEAK_CHECK,  \
                                      0, 1, 0, 0, 0)

  /* Return number of leaked, dubious, reachable and suppressed bytes found
     by all previous leak checks. They must be lvalues. */
  #define VALGRIND_COUNT_LEAK_BLOCKS(leaked, dubious, reachable, suppressed) \

Putting VALGRIND_DO_ADDED_LEAK_CHECK someplace in the executor loop would help narrow things down pretty quickly, assuming you had a self-contained example demonstrating the leak. I don't recall exactly how I used these but it was something along that line.

>> Yeah, it's not clear whether we could make the still-hypothetical check
>> sensitive enough to find leaks using small test cases without getting an
>> unworkable number of false positives. Still, might be worth trying.
>
> I'm not against experimenting with that.
> Were you thinking about something that'd be cheap enough to just be
> enabled always/everywhere, or something we'd enable during testing?

We seem to have already paid the overhead of counting all palloc allocations, so I don't think it'd be too expensive to occasionally check the ExecutorState's mem_allocated and see if it's growing (especially if we only do so in assert-enabled builds). The trick is to define the rules for what's worth reporting.

>> It might be an acceptable tradeoff to have stricter rules for what can
>> be allocated in ExecutorState in order to make this sort of problem
>> more easily detectable.
>
> Would these rules be just customary, or defined/enforced in the code
> somehow? I can't quite imagine how that would work, TBH.

If the check bleated "WARNING: possible executor memory leak" during regression testing, people would soon become conditioned to doing whatever they have to do to avoid it ;-)

regards, tom lane
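[Editor's note] The check Tom sketches — periodically comparing a context's allocation counter against its value at the previous checkpoint — can be modeled in miniature. This is a standalone toy, not executor code; the struct, names, and slack threshold are invented for illustration (in PostgreSQL terms, the counter would be ExecutorState's mem_allocated, polled in assert-enabled builds):

```c
#include <stddef.h>

/* Toy leak detector: remember the context's byte counter at the last
 * checkpoint and flag the context when it has grown past an allowed
 * slack since then. */
typedef struct LeakWatch
{
    size_t last_allocated;  /* counter value at the previous checkpoint */
    size_t slack;           /* growth tolerated before warning */
} LeakWatch;

/* Returns 1 when the counter grew by more than the slack since the
 * previous call, 0 otherwise.  Always advances the checkpoint, so a
 * context that jumps once and then stays flat is flagged only once. */
int leakwatch_check(LeakWatch *w, size_t mem_allocated)
{
    int growing = mem_allocated > w->last_allocated + w->slack;

    w->last_allocated = mem_allocated;
    return growing;
}
```

The interesting design question is exactly the one raised in the thread: what slack avoids drowning the buildfarm in false positives while still catching per-row growth like the trigger leak.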
Re: Order changes in PG16 since ICU introduction
On Mon, 2023-05-22 at 22:09 +0200, Daniel Verite wrote:
> While I agree that the LOCALE option in CREATE DATABASE is
> counter-intuitive, I think it's more than that.

As Andrew Gierth pointed out:

  $ initdb --locale=fr_FR
  ...
  ICU locale: en-US
  ...

is more than just counter-intuitive. I don't think we can ship 16 that way.

> I find it questionable that blending ICU
> and libc locales into it helps that much with the user experience.

Thank you for going through some examples here. I agree that it's not perfect, but we need some path to a reasonable ICU user experience, and I think we'll have to accept some rough edges to avoid the worst cases, like above.

> initdb:
>
> Using default ICU locale "fr".
> Using language tag "fr" for ICU locale "fr".
> ...
>
> #1
>
> postgres=# create database test1 locale='fr_FR.UTF-8';
> NOTICE: using standard form "fr-FR" for ICU locale "fr_FR.UTF-8"
> ERROR: new ICU locale (fr-FR) is incompatible with the ICU locale of

I don't see a problem here. If you specify LOCALE to CREATE DATABASE, you should either be using "TEMPLATE template0", or you should be expecting an error if the LOCALE doesn't match exactly. What would you like to see happen here?

> #2
>
> postgres=# create database test2 locale='C.UTF-8' template='template0';
> NOTICE: using standard form "en-US-u-va-posix" for ICU locale "C.UTF-8"
> CREATE DATABASE
>
> en-US-u-va-posix does not sort like C.UTF-8 in glibc 2.35, so
> this interpretation is arguably not what a user would expect.

As you pointed out, this is not settled in libc either:

https://www.postgresql.org/message-id/8a3dc06f-9b9d-4ed7-9a12-2070d8b0165f%40manitou-mail.org

We really can't expect a particular order for a particular locale name, unless we handle it specially like "C" or "POSIX". If we pass it to the provider, we have to trust the provider to match our conceptual expectations for that locale (and ideally version it properly).
> I would expect the ICU warning or error (icu_validation_level) to kick
> in instead of that transliteration.

Earlier versions of ICU (<= 63) do this transformation automatically, and I don't see a reason to throw an error if ICU considers it valid. The language tag en-US-u-va-posix will be stored in the catalog, and that will be considered valid in later versions of ICU.

Later versions of ICU (>= 64) consider locales with a language name of "C" to be obsolete and no longer valid. I added code to do the transformation without error in these later versions, but I think we have agreement to remove it.

If a user specifies the locale as "C.UTF-8", we can either pass it to ICU and see whether that version accepts it or not (and if not, throw a warning/error); or if we decide that "C.UTF-8" really means "C", we can handle it in the memcmp() path like C and never send it to ICU.

> #3
>
> $ grep french /etc/locale.alias
> french fr_FR.ISO-8859-1
>
> postgres=# create database test3 locale='french' template='template0'
> encoding='LATIN1';
> WARNING: ICU locale "french" has unknown language "french"
> HINT: To disable ICU locale validation, set parameter icu_validation_level
> to DISABLED.
> CREATE DATABASE
>
> In practice we're probably getting the "und" ICU locale whereas "fr"
> would be appropriate.

This is a good point and illustrates that ICU is not a drop-in replacement for libc in all cases. I don't see a solution here that doesn't involve some rough edges, though. "Locale" is a generic term, and if we continue to insist that it really means a libc locale, then ICU will never be on an equal footing with libc, let alone the preferred provider.

Regards,

Jeff Davis
Re: memory leak in trigger handling (since PG12)
Tomas Vondra writes: > While looking for other places allocating stuff in ExecutorState (for > the UPDATE case) and leaving it there, I found two more cases: > 1) copy_plpgsql_datums > 2) make_expanded_record_from_tupdesc >make_expanded_record_from_exprecord > All of this is calls from plpgsql_exec_trigger. Can you show a test case in which this happens? I added some instrumentation and verified at least within our regression tests, copy_plpgsql_datums' CurrentMemoryContext is always plpgsql's "SPI Proc" context, so I do not see how there can be a query-lifespan leak there, nor how your 0003 would fix it if there is. regards, tom lane
Re: Wrong results due to missing quals
Richard Guo writes: > So the qual 't2.a = t5.a' is missing. Ugh. > I looked into it and found that both clones of this joinqual are > rejected by clause_is_computable_at, because their required_relids do > not include the outer join of t2/(t3/t4), and meanwhile include nullable > rels of this outer join. > I think the root cause is that, as Tom pointed out in [1], we're not > maintaining required_relids very accurately. In b9c755a2, we make > clause_is_computable_at test required_relids for clone clauses. I think > this is how this issue sneaks in. Yeah. I'm starting to think that b9c755a2 took the wrong approach. Really, required_relids is about making sure that a qual isn't evaluated "too low", before all necessary joins have been formed. But clause_is_computable_at is charged with making sure we don't evaluate it "too high", after some incompatible join has been formed. There's no really good reason to suppose that required_relids can serve both purposes, even if it were computed perfectly accurately (and what is perfect, anyway?). So right now I'm playing with the idea of reverting the change in clause_is_computable_at and seeing how else we can fix the previous bug. Don't have anything to show yet, but one thought is that maybe deconstruct_distribute_oj_quals needs to set up clause_relids for clone clauses differently. Another idea is that maybe we need another RestrictInfo field that's directly a set of OJ relids that this clause can't be applied above. That'd reduce clause_is_computable_at to basically a bms_intersect test which would be nice speed-wise. The space consumption could be annoying, but I'm thinking that we might only have to populate the field in clone clauses, which would alleviate that issue. regards, tom lane
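[Editor's note] Tom's idea of a dedicated "OJ relids this clause can't be applied above" field reduces the check to a set-intersection test on relid bitmaps. A minimal standalone sketch of that shape (this is not PostgreSQL's multi-word Bitmapset or the actual RestrictInfo field — a single 64-bit word and the names are invented for illustration):

```c
#include <stdint.h>

/* Toy relid set: bit i set means the relation or outer join with
 * id i is a member of the set. */
typedef uint64_t RelidSet;

RelidSet relid_set(int id)
{
    return (RelidSet) 1 << id;
}

/* With a hypothetical per-clause "incompatible OJ relids" set, the
 * "not too high" check becomes: the relids of the join level being
 * formed must not overlap the clause's forbidden outer-join relids.
 * Returns 1 when the clause may be evaluated there. */
int clause_computable_at(RelidSet incompatible_relids, RelidSet eval_relids)
{
    return (incompatible_relids & eval_relids) == 0;
}
```

The speed appeal mentioned in the mail is visible here: the whole test is one AND and one comparison per bitmap word, with no need to reason about required_relids at all.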
Re: PG 16 draft release notes ready
On 5/24/23 at 15:58, Bruce Momjian wrote:
> On Wed, May 24, 2023 at 12:23:02PM +0700, John Naylor wrote:
>> On Wed, May 24, 2023 at 11:19 AM Bruce Momjian wrote:

Typos:

'from standbys servers' should be
'from standby servers'

'reindexedb' should be
'reindexdb'
(2x: the next line mentions, erroneously, 'reindexedb --system')

'created only created' should be
'only created'
(I think)

'could could' should be
'could'

'are now require the role' should be
'now require the role'

'values is' should be
'value is'

'to marked' should be
'to be marked'

thanks,

Erik
SyncRepWaitForLSN waits for XLogFlush?
Hi everyone, I was wondering if waiting for an LSN in SyncRepWaitForLSN ensures that the XLOG has been flushed locally up to that location before the record is shipped off to standbys? Regards, Tej
Re: Proposal: Removing 32 bit support starting from PG17++
Hans Buschmann writes: > This inspired me to propose dropping 32bit support for PostgreSQL starting > with PG17. I don't think this is a great idea. Even if Intel isn't interested, there'll be plenty of 32-bit left in the lower end of the market (think ARM, IoT, and so on). regards, tom lane
Proposal: Removing 32 bit support starting from PG17++
I recently stumbled over the following Intel proposal for dropping 32bit support in x86 processors. [1] This inspired me to propose dropping 32bit support for PostgreSQL starting with PG17. It seems obvious that production systems mostly won't use newly installed 32 bit native code in late 2024 and beyond. A quick inspection of the buildfarm only showed a very limited number of 32bit systems. This proposal follows the practice for Windows which already (practically) dropped 32bit support. 32bit systems may get continuing support in the backbranches until PG16 retires in 2028. Even if I am not a postgres hacker I suppose this could simplify things quite a lot. Any thoughts for discussion are welcome! Hans Buschmann [1] https://www.phoronix.com/news/Intel-X86-S-64-bit-Only
Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?
Daniel Gustafsson writes:
> It would be nice if the OpenSSL project could grant us an LTS license for a
> buildfarm animal to ensure compatibility but I have no idea how realistic that
> is (or how much the LTS version of 1.0.2 has diverged from the last available
> public 1.0.2 version).

Surely the answer must be "not much". The entire point of an LTS version is to not have to change dusty calling applications. We had definitely better have some animals still using 1.0.2, but I don't see much reason to think that the last public release wouldn't be good enough.

regards, tom lane
Re: Make pgbench exit on SIGINT more reliably
On Tue May 23, 2023 at 7:31 PM CDT, Michael Paquier wrote: > On Mon, May 22, 2023 at 10:02:02AM -0500, Tristan Partin wrote: > > The way that pgbench handled SIGINT changed in > > 1d468b9ad81b9139b4a0b16b416c3597925af4b0. Unfortunately this had a > > couple of unintended consequences, at least from what I can tell[1]. > > > > - CTRL-C no longer stops the program unless the right point in pgbench > > execution is hit > > - pgbench no longer exits with a non-zero exit code > > > > An easy reproduction of these problems is to run with a large scale > > factor like: pgbench -i -s 50. Then try to CTRL-C the program. > > This comes from the code path where the data is generated client-side, > and where the current CancelRequested may not be that responsive, > isn't it? It comes from the fact that CancelRequested is only checked within the generation of the pgbench_accounts table, but with a large enough scale factor or enough latency, say cross-continent communication, generating the other tables can be time consuming too. But, yes you are more likely to run into this issue when generating client-side data. -- Tristan Partin Neon (https://neon.tech)
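[Editor's note] The failure mode described here — a cancel flag that is only polled inside one of several long-running loops — can be modeled with a tiny standalone toy (this is not pgbench's actual code; the function and the simulated-signal parameter are invented for illustration):

```c
/* Toy model of cooperative cancellation: a SIGINT handler would set
 * this flag, and every long-running generation loop must poll it.
 * If only one table's loop polls the flag, Ctrl-C during the other
 * tables' generation appears to do nothing. */
volatile int cancel_requested = 0;

/* Generate nrows rows; cancel_after simulates a SIGINT arriving
 * before that iteration (pass a negative value for "no signal").
 * Returns the number of rows generated before completion or cancel. */
long generate_rows(long nrows, long cancel_after)
{
    long done = 0;

    for (long i = 0; i < nrows; i++)
    {
        if (i == cancel_after)
            cancel_requested = 1;   /* stands in for the signal handler */
        if (cancel_requested)
            break;                  /* the per-iteration check each loop needs */
        done++;
    }
    return done;
}
```

The point of the sketch is simply that responsiveness is bounded by how often the flag is checked: a loop that never polls it runs to completion no matter when the "signal" arrives.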
Re: PG 16 draft release notes ready
On Wed, May 24, 2023 at 12:23:02PM +0700, John Naylor wrote:
> On Wed, May 24, 2023 at 11:19 AM Bruce Momjian wrote:
>> Second, you might be correct that the section is wrong. I thought of
>> CPU instructions as something tied to the compiler, so part of the build
>> process or source code, but the point we should make is that we have
>> these accelerations, not how it is implemented. We can move the entire
>> group to the "General Performance" section, or we can split it out:
>
> Splitting out like that seems like a good idea to me.

Okay, items split into sections and several merged. I left the CPU-specific parts in Source Code, and moved the rest into a merged item in General Performance, but moved the JSON item to Data Types. Patch attached, and you can see the results at:

https://momjian.us/pgsql_docs/release-16.html

> The last one refers to new internal functions, so it could stay in source
> code.
> (Either way, we don't want to imply that arrays of SQL types are accelerated
> this way, it's so far only for internal arrays.)

Good point. I called them "C arrays" but put it into the General Performance item.

--
Bruce Momjian https://momjian.us
EDB https://enterprisedb.com

Only you can decide what is important to you.

commit ad5406246b
Author: Bruce Momjian
Date: Wed May 24 09:54:34 2023 -0400

doc: PG 16 relnotes, merge and move vector items

Reported-by: John Naylor
Discussion: https://postgr.es/m/CAFBsxsEPg8L2MmGqavc8JByC=wf_mnkhn-kknfpkcqh0hyd...@mail.gmail.com

diff --git a/doc/src/sgml/release-16.sgml b/doc/src/sgml/release-16.sgml index c30c530065..bb92fe5cf9 100644 --- a/doc/src/sgml/release-16.sgml +++ b/doc/src/sgml/release-16.sgml @@ -472,6 +472,28 @@ Author: David Rowley Improve the speed of updating the process title (David Rowley) + + + + + + +Allow xid/subxid searches and ASCII string detection to use vector operations (Nathan Bossart) + + + +ASCII detection is particularly useful for COPY FROM.
Vector operations are also used for some C array searches. + + @@ -1781,6 +1803,17 @@ The IS JSON checks include checks for values, arrays, objects, scalars, and uniq + + + + +Allow JSON string parsing to use vector operations (John Naylor) + + + - - - -Allow ASCII string detection to use vector operations (John Naylor) - - - - - - - -Allow JSON string parsing to use vector operations (John Naylor) - - - -ARM? - - - - - - - -Allow array searches to use vector operations (John Naylor) - - - - - - - -Allow xid/subxid searches to use vector operations (Nathan Bossart) - - -
Re: memory leak in trigger handling (since PG12)
On 5/23/23 22:57, Tomas Vondra wrote: > > > On 5/23/23 18:39, Tom Lane wrote: >> Tomas Vondra writes: >>> it seems there's a fairly annoying memory leak in trigger code, >>> introduced by >>> ... >>> Attached is a patch, restoring the pre-12 behavior for me. >> >>> While looking for other places allocating stuff in ExecutorState (for >>> the UPDATE case) and leaving it there, I found two more cases: >> >>> 1) copy_plpgsql_datums >> >>> 2) make_expanded_record_from_tupdesc >>>make_expanded_record_from_exprecord >> >>> All of these are calls from plpgsql_exec_trigger. >> >> Not sure about the expanded-record case, but both of your other two >> fixes feel like poor substitutes for pushing the memory into a >> shorter-lived context. In particular I'm quite surprised that >> plpgsql isn't already allocating that workspace in the "procedure" >> memory context. >> > > I don't disagree, but which memory context should this use and > when/where should we switch to it? > > I haven't seen any obvious memory context candidate in the code > calling ExecGetAllUpdatedCols, so I guess we'd have to pass it from > above. Is that a good idea for backbranches ... > I looked at this again, and I think GetPerTupleMemoryContext(estate) might do the trick, see the 0002 part. Unfortunately it's not much smaller/simpler than just freeing the chunks, because we end up doing

oldcxt = MemoryContextSwitchTo(GetPerTupleMemoryContext(estate));
updatedCols = ExecGetAllUpdatedCols(relinfo, estate);
MemoryContextSwitchTo(oldcxt);

and then have to pass updatedCols elsewhere. It's tricky to just switch to the context (e.g. in ExecASUpdateTriggers/ExecARUpdateTriggers), as AfterTriggerSaveEvent allocates other bits of memory too (in a longer-lived context). So we'd have to do another switch again. Not sure how backpatch-friendly that would be.
regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL CompanyFrom 85439e2587f035040c82e7303b96b887e650b01f Mon Sep 17 00:00:00 2001 From: Tomas Vondra Date: Tue, 23 May 2023 17:45:47 +0200 Subject: [PATCH 1/3] Free updatedCols bitmaps --- src/backend/commands/trigger.c | 22 -- src/backend/executor/execMain.c | 9 +++-- 2 files changed, 27 insertions(+), 4 deletions(-) diff --git a/src/backend/commands/trigger.c b/src/backend/commands/trigger.c index 4b295f8da5e..fb11203c473 100644 --- a/src/backend/commands/trigger.c +++ b/src/backend/commands/trigger.c @@ -2916,6 +2916,8 @@ ExecBSUpdateTriggers(EState *estate, ResultRelInfo *relinfo) (errcode(ERRCODE_E_R_I_E_TRIGGER_PROTOCOL_VIOLATED), errmsg("BEFORE STATEMENT trigger cannot return a value"))); } + + bms_free(updatedCols); } void @@ -2928,12 +2930,18 @@ ExecASUpdateTriggers(EState *estate, ResultRelInfo *relinfo, Assert(relinfo->ri_RootResultRelInfo == NULL); if (trigdesc && trigdesc->trig_update_after_statement) + { + Bitmapset *updatedCols = ExecGetAllUpdatedCols(relinfo, estate); + AfterTriggerSaveEvent(estate, relinfo, NULL, NULL, TRIGGER_EVENT_UPDATE, false, NULL, NULL, NIL, - ExecGetAllUpdatedCols(relinfo, estate), + updatedCols, transition_capture, false); + + bms_free(updatedCols); + } } bool @@ -3043,6 +3051,9 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate, heap_freetuple(trigtuple); if (should_free_new) heap_freetuple(oldtuple); + + bms_free(updatedCols); + return false; /* "do nothing" */ } else if (newtuple != oldtuple) @@ -3068,6 +3079,8 @@ ExecBRUpdateTriggers(EState *estate, EPQState *epqstate, if (should_free_trig) heap_freetuple(trigtuple); + bms_free(updatedCols); + return true; } @@ -3107,6 +3120,7 @@ ExecARUpdateTriggers(EState *estate, ResultRelInfo *relinfo, */ TupleTableSlot *oldslot; ResultRelInfo *tupsrc; + Bitmapset *updatedCols; Assert((src_partinfo != NULL && dst_partinfo != NULL) || !is_crosspart_update); @@ -3129,14 +3143,18 @@ 
ExecARUpdateTriggers(EState *estate, ResultRelInfo *relinfo, else ExecClearTuple(oldslot); + updatedCols = ExecGetAllUpdatedCols(relinfo, estate); + AfterTriggerSaveEvent(estate, relinfo, src_partinfo, dst_partinfo, TRIGGER_EVENT_UPDATE, true, oldslot, newslot, recheckIndexes, - ExecGetAllUpdatedCols(relinfo, estate), + updatedCols, transition_capture, is_crosspart_update); + + bms_free(updatedCols); } } diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c index c76fdf59ec4..98502b956c2 100644 --- a/src/backend/executor/execMain.c +++ b/src/backend/executor/execMain.c @@ -2367,6 +2367,7 @@ ExecUpdateLockMode(EState *estate, ResultRelInfo *relinfo) { Bitmapset *keyCols; Bitmapset *updatedCols; + LockTupleMode lockMode; /* * Compute lock mode to use. If columns that are part of the key have not @@ -2378,9 +2379
Re: memory leak in trigger handling (since PG12)
On 5/24/23 10:19, Jakub Wartak wrote: > Hi, just two cents: > > On Tue, May 23, 2023 at 8:01 PM Andres Freund wrote: >> >> Hi, >> >> On 2023-05-23 13:28:30 -0400, Tom Lane wrote: >>> Andres Freund writes: Could it help to have a mode where the executor shutdown hook checks how much memory is allocated in ExecutorState and warns if its too much? >>> >>> It'd be very hard to set a limit for what's "too much", since the amount >>> of stuff created initially will depend on the plan size. >> >> I was thinking of some limit that should really never be reached outside of a >> leak or work_mem based allocations, say 2GB or so. > > RE: instrumentation subthread: > if that helps then below technique can work somewhat good on normal > binaries for end users (given there are debug symbols installed), so > maybe we don't need that much infrastructure added in to see the hot > code path: > > perf probe -x /path/to/postgres 'palloc' 'size=%di:u64' # RDI on > x86_64(palloc size arg0) > perf record -avg --call-graph dwarf -e probe_postgres:palloc -aR -p > sleep 3 # cannot be longer, huge overhead (~3s=~2GB) > > it produces: > 50.27% (563d0e380670) size=24 > | > ---palloc >bms_copy >ExecUpdateLockMode >ExecBRUpdateTriggers >ExecUpdate > [..] > > 49.73% (563d0e380670) size=16 > | > ---palloc >bms_copy >RelationGetIndexAttrBitmap >ExecUpdateLockMode >ExecBRUpdateTriggers >ExecUpdate > [..] > > Now we know that those small palloc() are guilty, but we didn't know > at the time with Tomas. The problem here is that we do not know in > palloc() - via its own arguments for which MemoryContext this is going > to be allocated for. This is problematic for perf, because on RHEL8, I > was not able to generate an uprobe that was capable of reaching a > global variable (CurrentMemoryContext) at that time. > I think there are a couple even bigger issues: (a) There are other methods that allocate memory - repalloc, palloc0, MemoryContextAlloc, ... and so on. 
But presumably we can trace all of them at once? We'd have to trace reset/deletes too. (b) This doesn't say if/when the allocated chunks get freed - even with a fix, we'll still do exactly the same number of allocations, except that we'll free the memory shortly after that. I wonder if we could print a bit more info for each probe, to match the alloc/free requests. > Additionally what was even more frustrating on diagnosing that case on > the customer end system, was that such OOMs were crashing other > PostgreSQL clusters on the same OS. Even knowing the exact guilty > statement it was impossible to limit RSS memory usage of that > particular backend. So, what you are proposing makes a lot of sense. > Also it got me thinking of implementing safety-memory-net-GUC > debug_query_limit_backend_memory=X MB that would inject > setrlimit(RLIMIT_DATA) through external extension via hook(s) and > un-set it later, but the man page states it works for mmap() only > after Linux 4.7+ so it is future proof but won't work e.g. on RHEL7 - > maybe that's still good enough?; Or, well maybe try to hack a palloc() > a little, but that has probably too big overhead, right? (just > thinking loud). > Not sure about setting a hard limit - that seems pretty tricky and may easily backfire. It's already possible to set such memory limit using e.g. cgroups, or even better use VMs to isolate the instances. I certainly agree it's annoying that when OOM hits, we end up with no information about what used the memory. Maybe we could have a threshold triggering a call to MemoryContextStats? So that we have at least some memory usage info in the log in case the OOM killer intervenes. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Large files for relations
On Wed, May 24, 2023 at 2:18 AM Peter Eisentraut wrote: > > What I'm hearing is that something simple like this might be more > > acceptable: > > > > * initdb --rel-segsize (cf --wal-segsize), default unchanged > > makes sense +1. > > * pg_upgrade would convert if source and target don't match > > This would be good, but it could also be an optional or later feature. +1. I think that would be nice to have, but not absolutely required. IMHO it's best not to overcomplicate these projects. Not everything needs to be part of the initial commit. If the initial commit happens 2 months from now and then stuff like this gets added over the next 8, that's strictly better than trying to land the whole patch set next March. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Atomic ops for unlogged LSN
On Tue, May 23, 2023 at 6:26 PM Michael Paquier wrote: > Spinlocks provide a full memory barrier, which may not be the case with > add_u64() or read_u64(). Could memory ordering be a problem in these > code paths? It could be a huge problem if what you say were true, but unless I'm missing something, the comments in atomics.h say that it isn't. The comments for the 64-bit functions say to look at the 32-bit functions, and there it says:

/*
 * pg_atomic_add_fetch_u32 - atomically add to variable
 *
 * Returns the value of ptr after the arithmetic operation.
 *
 * Full barrier semantics.
 */

Which is probably a good thing, because I'm not sure what good it would be to have an operation that both reads and modifies an atomic variable but has no barrier semantics. That's not to say that I entirely understand the point of this patch. It seems like a generally reasonable thing to do AFAICT but somehow I wonder if there's more to the story here, because it doesn't feel like it would be easy to measure the benefit of this patch in isolation. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Docs: Encourage strong server verification with SCRAM
> On 23 May 2023, at 23:02, Stephen Frost wrote: > * Jacob Champion (jchamp...@timescale.com) wrote: >> - low iteration counts accepted by the client make it easier than it >> probably should be for a MITM to brute-force passwords (note that >> PG16's scram_iterations GUC, being server-side, does not mitigate >> this) > > This would be good to improve on. The mechanics of this are quite straightforward, the problem IMHO lies in how to inform and educate users what a reasonable iteration count is, not to mention what an iteration count is in the first place. > Perhaps more succinctly- maybe we should be making adjustments to the > current language instead of just adding a new paragraph. +1 -- Daniel Gustafsson
Re: pgsql: TAP test for logical decoding on standby
Hi, On 5/23/23 5:15 PM, Robert Haas wrote: On Sat, Apr 8, 2023 at 5:26 AM Andres Freund wrote: TAP test for logical decoding on standby Small nitpicks: 1. The test names generated by check_slots_conflicting_status() start with a capital letter, while most other test names start with a lower-case letter. Yeah, not sure that would deserve a fix of its own, but if we address 2. then let's do 1. too. 2. The function is called 7 times, 6 with a true argument and 1 with a false argument, but the test name only depends on whether the argument is true or false, so we get the same test name 6 times. Maybe there's not a reasonable way to do better, I'm not sure, but it's not ideal. I agree that's not ideal (but one could still figure out which one is failing, if any, by looking at the Perl script). If we want to "improve" this, what about passing a second argument that would provide more context in the test name? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Wrong results due to missing quals
Testing with SQLancer reports a wrong results issue on master and I reduced it to the repro query below. create table t (a int, b int); explain (costs off) select * from t t1 left join (t t2 left join t t3 full join t t4 on false on false) left join t t5 on t2.a = t5.a on t2.b = 1; QUERY PLAN -- Nested Loop Left Join -> Seq Scan on t t1 -> Materialize -> Nested Loop Left Join -> Nested Loop Left Join Join Filter: false -> Seq Scan on t t2 Filter: (b = 1) -> Result One-Time Filter: false -> Materialize -> Seq Scan on t t5 (12 rows) So the qual 't2.a = t5.a' is missing. I looked into it and found that both clones of this joinqual are rejected by clause_is_computable_at, because their required_relids do not include the outer join of t2/(t3/t4), and meanwhile include nullable rels of this outer join. I think the root cause is that, as Tom pointed out in [1], we're not maintaining required_relids very accurately. In b9c755a2, we make clause_is_computable_at test required_relids for clone clauses. I think this is how this issue sneaks in. To fix it, it seems to me that the ideal way would be to always compute accurate required_relids. But I'm not sure how difficult it is. [1] https://www.postgresql.org/message-id/395264.1684698283%40sss.pgh.pa.us Thanks Richard
Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?
> On 24 May 2023, at 11:52, Michael Paquier wrote: > > On Wed, May 24, 2023 at 11:36:56AM +0200, Daniel Gustafsson wrote: >> 1.0.2 is also an LTS version available commercially for premium support >> customers of OpenSSL (1.1.1 will become an LTS version as well), with 1.0.2zh >> slated for release next week. This raises the likelihood of Postgres >> installations using 1.0.2 in production still, and for some time to come. > > Good point. Indeed, that makes it pretty clear that not dropping > 1.0.2 would be the best option for the time being, so 0001 would be > enough. I think that's the right move re 1.0.2 support. 1.0.2 is also the version in RHEL7, which is in ELS until 2026. When we moved the goalposts to 1.0.1 (commit 7b283d0e1d1) we referred to RHEL6 using 1.0.1, and since RHEL6 goes out of ELS in late June 2024 it seems possible to drop 1.0.1 support during v17. I haven't studied the patch yet but I'll have a look at it. > I am wondering if we should worry about having a buildfarm member that > could test these binaries, though, in case they have compatibility > issues.. But it would be harder to debug without the code at hand, as > well. It would be nice if the OpenSSL project could grant us an LTS license for a buildfarm animal to ensure compatibility but I have no idea how realistic that is (or how much the LTS version of 1.0.2 has diverged from the last available public 1.0.2 version). -- Daniel Gustafsson
Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?
On Wed, May 24, 2023 at 11:36:56AM +0200, Daniel Gustafsson wrote: > 1.0.2 is also an LTS version available commercially for premium support > customers of OpenSSL (1.1.1 will become an LTS version as well), with 1.0.2zh > slated for release next week. This raises the likelihood of Postgres > installations using 1.0.2 in production still, and for some time to come. Good point. Indeed, that makes it pretty clear that not dropping 1.0.2 would be the best option for the time being, so 0001 would be enough. I am wondering if we should worry about having a buildfarm member that could test these binaries, though, in case they have compatibility issues.. But it would be harder to debug without the code at hand, as well. -- Michael signature.asc Description: PGP signature
Re: RFI: Extending the TOAST Pointer
On Tue, 23 May 2023 at 18:34, Robert Haas wrote: > > On Thu, May 18, 2023 at 8:06 AM Matthias van de Meent > wrote: > > This enum still has many options to go before it exceeds the maximum > > of the uint8 va_tag field. Therefore, I don't think we have no disk > > representations left, nor do I think we'll need to add another option > > to the ToastCompressionId enum. > > As an example, we can add another VARTAG option for dictionary-enabled > > external toast; like what the pluggable toast patch worked on. I think > > we can salvage some ideas from that patch, even if the main idea got > > stuck. > > Adding another VARTAG option is somewhat different from adding another > ToastCompressionId. I think that right now we have embedded in various > places the idea that VARTAG_EXTERNAL is the only thing that shows up > on disk, and we'd need to track down all such places and adjust them > if we add other VARTAG types in the future. Depending on how it is to > be used, adding a new ToastCompressionId might be less work. However, > I don't think we can use the possibility of adding a new VARTAG value > as a reason why it's OK to use up the last possible ToastCompressionId > value for something non-extensible. I think you might not have picked up on what I was arguing for, but I agree with what you just said. My comment on not needing to invent a new ToastCompressionId was on the topic of adding capabilities^ to toast pointers that do things differently than the current TOAST and need more info than just sizes, 2x 32-bit ID and a compression algorithm. ^ capabilities such as compression dictionaries (which would need to store a dictionary ID in the pointer), TOAST IDs that are larger than 32 bits, and other such advances. Kind regards, Matthias van de Meent Neon, Inc.
Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?
> On 24 May 2023, at 10:22, Michael Paquier wrote: > The removal of 1.0.2 would be nice, but the tweaks > needed for LibreSSL make it less appealing. 1.0.2 is also an LTS version available commercially for premium support customers of OpenSSL (1.1.1 will become an LTS version as well), with 1.0.2zh slated for release next week. This raises the likelihood of Postgres installations using 1.0.2 in production still, and for some time to come. -- Daniel Gustafsson
Re: RFI: Extending the TOAST Pointer
Hi! I've made a WIP patch that uses 64-bit TOAST value ID instead of 32-bit, and sent it as a part of discussion, but there was no feedback on such a solution. There was a link to that discussion at the top of this thread. Also, I have to note that, based on our work on Pluggable TOAST - extending TOAST pointer with additional structures would require review of the logical replication engine, currently it is not suitable for any custom TOAST pointers. Currently we have no final solution for problems with logical replication for custom TOAST pointers. -- Regards, Nikita Malakhov Postgres Professional The Russian Postgres Company https://postgrespro.ru/
Re: memory leak in trigger handling (since PG12)
On 5/23/23 23:39, Tom Lane wrote: > Tomas Vondra writes: >> The really hard thing was determining what causes the memory leak - the >> simple instrumentation doesn't help with that at all. It tells you there >> might be a leak, but you don't know where the allocations came from. > >> What I ended up doing is a simple gdb script that sets breakpoints on >> all palloc/pfree variants, and prints info (including the backtrace) for >> each call on ExecutorState. And then a script that aggregates those to >> identify which backtraces allocated most chunks that were not freed. > > FWIW, I've had some success localizing palloc memory leaks with valgrind's > leak detection mode. The trick is to ask it for a report before the > context gets destroyed. Beats writing your own infrastructure ... > I haven't tried valgrind, so can't compare. Would it be possible to filter which memory contexts to track? Say we know the leak is in ExecutorState, so we don't need to track allocations in other contexts. That was a huge speedup for me, maybe it'd help valgrind too. Also, how did you ask for the report before the context gets destroyed? >> Would require testing with more data, though. I doubt we'd find much >> with our regression tests, which have tiny data sets. > > Yeah, it's not clear whether we could make the still-hypothetical check > sensitive enough to find leaks using small test cases without getting an > unworkable number of false positives. Still, might be worth trying. I'm not against experimenting with that. Were you thinking about something that'd be cheap enough to just be enabled always/everywhere, or something we'd enable during testing? This reminded me of a Strange Loop talk [1][2] about the Scalene memory profiler from UMass. That's for Python, but they did some smart tricks to reduce the cost of profiling - maybe we could do something similar, possibly by extending the memory contexts a bit.
[1] https://youtu.be/vVUnCXKuNOg?t=1405 [2] https://youtu.be/vVUnCXKuNOg?t=1706 > It might be an acceptable tradeoff to have stricter rules for what can > be allocated in ExecutorState in order to make this sort of problem > more easily detectable. > Would these rules be just customary, or defined/enforced in the code somehow? I can't quite imagine how would that work, TBH. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?
Hi all, $Subject says it all. Based on an analysis of the code, I can note the following when it comes to the removal of 1.0.1: - A lot of code related to channel binding is now simplified as X509_get_signature_nid() always exists, mostly around its CFLAGS. - This removes the comments related to 1.0.1e introduced by 74242c2. Then for 1.0.2, the following flags can be gone: HAVE_ASN1_STRING_GET0_DATA HAVE_BIO_GET_DATA HAVE_BIO_METH_NEW HAVE_HMAC_CTX_FREE HAVE_HMAC_CTX_NEW It would be nice to remove CRYPTO_lock(), but from what I can see the function still exists in LibreSSL, meaning that libpq locking wouldn't be thread-safe if these functions' paths are removed. Another related question is how much we care about builds with LibreSSL with MSVC? This patch set takes the simple path of assuming that this has never been really tested, and if you look at the MSVC scripts on HEAD we rely on a version number from OpenSSL, which is not something LibreSSL copes nicely with already, as documented in configure.ac. OpenSSL 1.0.2 has been EOLd at the end of 2019, and 1.0.1 had its last minor release in September 2019, so with Postgres 17 targeted in September/October 2024, we would be five years behind that. Last comes the buildfarm, and I suspect that a few animals are building with 1.0.2. Among what I have spotted: - wobbegong and vulpes, on Fedora 27, though I could not find references about the version used there. - bonito, on Fedora 29. - SUSE 12 SP{4,5} have 1.0.2 as their newer version. - butterflyfish may not like that, if I recall correctly, as it should have 1.0.2. So, it seems to me that 1.0.1 would be a rather silent move for the buildfarm, and 1.0.2 could lead to some noise. Note as well that 1.1.0 support has been stopped by upstream at the same time as 1.0.1, with a last minor release in September 2019, though that feels like a huge jump at this stage. On a code basis, removing 1.0.1 leads to the most cleanup.
The removal of 1.0.2 would be nice, but the tweaks needed for LibreSSL make it less appealing. Attached are two patches to remove support for 1.0.1 and 1.0.2 for now, kept separated for clarity, to be considered as something to do at the beginning of the v17 cycle. 0001 is in rather good shape seen from here. Now, for 0002 and the removal of 1.0.2, I am seeing two things once OPENSSL_API_COMPAT is bumped from 0x10002000L to 0x10100000L: - be-secure-openssl.c requires an extra openssl/bn.h, which is not a big deal, from what I get. - Much more interesting: OpenSSL_add_all_algorithms() has two macros that get removed, see include/openssl/evp.h:

# ifdef OPENSSL_LOAD_CONF
#  define OpenSSL_add_all_algorithms() \
    OPENSSL_init_crypto(OPENSSL_INIT_ADD_ALL_CIPHERS \
                        | OPENSSL_INIT_ADD_ALL_DIGESTS \
                        | OPENSSL_INIT_LOAD_CONFIG, NULL)
# else
#  define OpenSSL_add_all_algorithms() \
    OPENSSL_init_crypto(OPENSSL_INIT_ADD_ALL_CIPHERS \
                        | OPENSSL_INIT_ADD_ALL_DIGESTS, NULL)
# endif

The part where I am not quite sure of what to do is about OPENSSL_LOAD_CONF. We could call directly OPENSSL_init_crypto() and add OPENSSL_INIT_LOAD_CONFIG if building with OPENSSL_LOAD_CONF, but it feels a bit ugly to copy-paste this code from OpenSSL itself. Note that patch 0002 still has OPENSSL_API_COMPAT at 0x10002000L. OPENSSL_init_crypto() looks to be in LibreSSL, and it is new in OpenSSL 1.1.0, so switching the minimum to OpenSSL 1.1.0 should not require new CFLAGS for this one. Thoughts? -- Michael From 553b9eb7e24250a92d7551cb28a301d9e63ba7ad Mon Sep 17 00:00:00 2001 From: Michael Paquier Date: Wed, 24 May 2023 16:22:02 +0900 Subject: [PATCH v1 1/2] Remove support for OpenSSL 1.0.1 A few notes about that: - This simplifies a lot of code areas related to channel binding, as X509_get_signature_nid() always exists. - This removes the comments related to 1.0.1e introduced by 74242c2.
- Do we need to care about the case of building the Postgres code with LibreSSL on Windows for the MSVC scripts, leading to the removal of the check with HAVE_SSL_CTX_SET_CERT_CB? --- src/include/libpq/libpq-be.h | 6 +--- src/include/pg_config.h.in | 3 -- src/backend/libpq/auth-scram.c | 20 ++-- src/backend/libpq/be-secure-openssl.c| 4 --- src/interfaces/libpq/fe-auth-scram.c | 8 ++--- src/interfaces/libpq/fe-auth.c | 2 +- src/interfaces/libpq/fe-secure-openssl.c | 4 --- src/interfaces/libpq/libpq-int.h | 6 +--- src/test/ssl/t/002_scram.pl | 41 doc/src/sgml/installation.sgml | 2 +- configure| 16 - configure.ac | 9 +++--- meson.build | 5 ++- src/tools/msvc/Solution.pm | 10 +- 14 files changed, 38 insertions(+), 98 deletions(-) diff --git a/src/include/libpq/libpq-be.h b/src/include/libpq
Re: memory leak in trigger handling (since PG12)
Hi, just two cents: On Tue, May 23, 2023 at 8:01 PM Andres Freund wrote: > > Hi, > > On 2023-05-23 13:28:30 -0400, Tom Lane wrote: > > Andres Freund writes: > > > Could it help to have a mode where the executor shutdown hook checks how > > > much > > > memory is allocated in ExecutorState and warns if its too much? > > > > It'd be very hard to set a limit for what's "too much", since the amount > > of stuff created initially will depend on the plan size. > > I was thinking of some limit that should really never be reached outside of a > leak or work_mem based allocations, say 2GB or so. RE: instrumentation subthread: if that helps then below technique can work somewhat good on normal binaries for end users (given there are debug symbols installed), so maybe we don't need that much infrastructure added in to see the hot code path: perf probe -x /path/to/postgres 'palloc' 'size=%di:u64' # RDI on x86_64(palloc size arg0) perf record -avg --call-graph dwarf -e probe_postgres:palloc -aR -p sleep 3 # cannot be longer, huge overhead (~3s=~2GB) it produces: 50.27% (563d0e380670) size=24 | ---palloc bms_copy ExecUpdateLockMode ExecBRUpdateTriggers ExecUpdate [..] 49.73% (563d0e380670) size=16 | ---palloc bms_copy RelationGetIndexAttrBitmap ExecUpdateLockMode ExecBRUpdateTriggers ExecUpdate [..] Now we know that those small palloc() are guilty, but we didn't know at the time with Tomas. The problem here is that we do not know in palloc() - via its own arguments for which MemoryContext this is going to be allocated for. This is problematic for perf, because on RHEL8, I was not able to generate an uprobe that was capable of reaching a global variable (CurrentMemoryContext) at that time. Additionally what was even more frustrating on diagnosing that case on the customer end system, was that such OOMs were crashing other PostgreSQL clusters on the same OS. Even knowing the exact guilty statement it was impossible to limit RSS memory usage of that particular backend. 
So, what you are proposing makes a lot of sense. Also it got me thinking of implementing safety-memory-net-GUC debug_query_limit_backend_memory=X MB that would inject setrlimit(RLIMIT_DATA) through external extension via hook(s) and un-set it later, but the man page states it works for mmap() only after Linux 4.7+ so it is future proof but won't work e.g. on RHEL7 - maybe that's still good enough?; Or, well maybe try to hack a palloc() a little, but that has probably too big overhead, right? (just thinking loud). -Jakub Wartak.