pgsql: Optimize visibilitymap_count() with AVX-512 instructions.

2024-04-06 Thread Nathan Bossart
Optimize visibilitymap_count() with AVX-512 instructions.

Commit 792752af4e added infrastructure for using AVX-512 intrinsic
functions, and this commit uses that infrastructure to optimize
visibilitymap_count().  Specifically, a new pg_popcount_masked()
function is introduced that applies a bitmask to every byte in the
buffer prior to calculating the population count, which is used to
filter out the all-visible or all-frozen bits as needed.  Platforms
without AVX-512 support should also see a nice speedup due to the
reduced number of calls to a function pointer.
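The masking idea is easy to sketch in portable C. This is illustrative only: the names are made up here, and the real pg_popcount_masked() additionally dispatches to an AVX-512 implementation, none of which is shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative scalar sketch of a masked population count: AND a mask
 * into every byte before counting set bits.  Not the actual PostgreSQL
 * implementation.
 */
static uint64_t
popcount_masked(const unsigned char *buf, size_t len, unsigned char mask)
{
	uint64_t	cnt = 0;

	for (size_t i = 0; i < len; i++)
	{
		for (unsigned char b = buf[i] & mask; b != 0; b &= b - 1)
			cnt++;				/* clear lowest set bit, count it */
	}
	return cnt;
}
```

In the visibility map each heap page is represented by two bits, so a mask of alternating bits (e.g. 0x55 or 0xAA, assuming that matches the on-disk bit layout) selects just the all-visible or just the all-frozen bits across the whole buffer.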

Co-authored-by: Ants Aasma
Discussion: 
https://postgr.es/m/BL1PR11MB5304097DF7EA81D04C33F3D1DCA6A%40BL1PR11MB5304.namprd11.prod.outlook.com

Branch
--
master

Details
---
https://git.postgresql.org/pg/commitdiff/41c51f0c68b21b4603bd2a9c3d3ad017fdd22627

Modified Files
--
src/backend/access/heap/visibilitymap.c |  25 ++-
src/include/port/pg_bitutils.h  |  34 +
src/port/pg_bitutils.c  | 126 
src/port/pg_popcount_avx512.c   |  60 +++
4 files changed, 225 insertions(+), 20 deletions(-)



pgsql: Optimize pg_popcount() with AVX-512 instructions.

2024-04-06 Thread Nathan Bossart
Optimize pg_popcount() with AVX-512 instructions.

Presently, pg_popcount() processes data in 32-bit or 64-bit chunks
when possible.  Newer hardware that supports AVX-512 instructions
can use 512-bit chunks, which provides a nice speedup, especially
for larger buffers.  This commit introduces the infrastructure
required to detect compiler and CPU support for the required
AVX-512 intrinsic functions, and it adds a new pg_popcount()
implementation that uses these functions.  If CPU support for this
optimized implementation is detected at runtime, a function pointer
is updated so that it is used by subsequent calls to pg_popcount().

Most of the existing in-tree calls to pg_popcount() should benefit
from these instructions, and calls with smaller buffers should at
least not regress compared to v16.  The new infrastructure
introduced by this commit can also be used to optimize
visibilitymap_count(), but that is left for a follow-up commit.
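The runtime-dispatch pattern the commit describes can be sketched as follows. The "choose" function and the CPU probe are illustrative stand-ins, not PostgreSQL's actual code: the pointer initially targets a chooser that probes the CPU once, repoints the pointer at the best implementation, and forwards the call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint64_t popcount_portable(const unsigned char *buf, size_t len);
static uint64_t popcount_choose(const unsigned char *buf, size_t len);

/* All callers go through this pointer; first call runs the chooser. */
static uint64_t (*pg_popcount_ptr) (const unsigned char *, size_t) = popcount_choose;

static uint64_t
popcount_portable(const unsigned char *buf, size_t len)
{
	uint64_t	cnt = 0;

	for (size_t i = 0; i < len; i++)
		for (unsigned char b = buf[i]; b != 0; b &= b - 1)
			cnt++;
	return cnt;
}

static int
avx512_available(void)
{
	return 0;					/* stand-in for the real CPUID/XGETBV checks */
}

static uint64_t
popcount_choose(const unsigned char *buf, size_t len)
{
	/*
	 * First call: pick an implementation, then forward through the
	 * pointer.  Subsequent calls skip the probe entirely.  (Both arms
	 * are the portable version here, since this sketch has no AVX-512
	 * variant.)
	 */
	pg_popcount_ptr = avx512_available() ? popcount_portable : popcount_portable;
	return pg_popcount_ptr(buf, len);
}
```

After the first call the probe cost is gone, which is also why reducing the *number* of calls through the pointer (as the visibilitymap_count() follow-up does) helps even without AVX-512.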

Co-authored-by: Paul Amonson, Ants Aasma
Reviewed-by: Matthias van de Meent, Tom Lane, Noah Misch, Akash Shankaran, 
Alvaro Herrera, Andres Freund, David Rowley
Discussion: 
https://postgr.es/m/BL1PR11MB5304097DF7EA81D04C33F3D1DCA6A%40BL1PR11MB5304.namprd11.prod.outlook.com

Branch
--
master

Details
---
https://git.postgresql.org/pg/commitdiff/792752af4eb5cf7b5b8b0470dbf22901c5178fe5

Modified Files
--
config/c-compiler.m4 |  58 
configure| 252 +++
configure.ac |  51 +++
meson.build  |  87 
src/Makefile.global.in   |   5 +
src/include/pg_config.h.in   |  12 ++
src/include/port/pg_bitutils.h   |  11 ++
src/makefiles/meson.build|   4 +-
src/port/Makefile|  11 ++
src/port/meson.build |   6 +-
src/port/pg_bitutils.c   |   5 +
src/port/pg_popcount_avx512.c|  81 +++
src/port/pg_popcount_avx512_choose.c |  88 
src/test/regress/expected/bit.out|  24 
src/test/regress/sql/bit.sql |   4 +
15 files changed, 696 insertions(+), 3 deletions(-)



pgsql: Fix if/while thinko in read_stream.c edge case.

2024-04-06 Thread Thomas Munro
Fix if/while thinko in read_stream.c edge case.

When we determine that a wanted block can't be combined with the current
pending read, it's time to start that read to get it out of the way.  An
"if" in that code path should have been a "while", because it might take
more than one go in case of partial reads.  This was only broken for
smaller ranges, as the more common case of io_combine_limit-sized ranges
is handled earlier in the code and knows how to loop, hiding the bug for
a while.
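The shape of the fix can be illustrated with a plain read() loop (not read_stream.c's actual code): a partial operation must be retried with a "while", because a single call may not transfer everything.

```c
#include <assert.h>
#include <unistd.h>

/*
 * Read exactly len bytes, retrying on partial reads.  Guarding the
 * retry with "if" instead of "while" is precisely the class of thinko
 * the commit fixes.
 */
static ssize_t
read_fully(int fd, char *buf, size_t len)
{
	size_t		done = 0;

	while (done < len)			/* "if (done < len)" would be the bug */
	{
		ssize_t		n = read(fd, buf + done, len - done);

		if (n <= 0)
			return n;			/* error or unexpected EOF */
		done += n;
	}
	return (ssize_t) done;
}
```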

Discovered while testing large parallel sequential scans of partially
cached tables.  The ramp-up-and-down block allocator for parallel scans
could hit the problem case and skip some blocks near the end that should
have been streamed.

Defect in commit b5a9b18c.

Discussion: 
https://postgr.es/m/CA%2BhUKG%2Bh8Whpv0YsJqjMVkjYX%2B80fTVc6oi-V%2BzxJvykLpLHYQ%40mail.gmail.com

Branch
--
master

Details
---
https://git.postgresql.org/pg/commitdiff/158f5819236806b7c9cab323658c231e9371c458

Modified Files
--
src/backend/storage/aio/read_stream.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)



pgsql: Disable parallel query in psql error-with-FETCH_COUNT test.

2024-04-06 Thread Tom Lane
Disable parallel query in psql error-with-FETCH_COUNT test.

The buildfarm members using debug_parallel_query = regress are mostly
unhappy with this test.  I guess what is happening is that rows
generated by a parallel worker are buffered, and might or might not
get to the leader before the expected error occurs.  We did not see
any variability in the old version of this test because each FETCH
would succeed or fail atomically, leading to a predictable number of
rows emitted before failure.  I don't find this to be a bug, just
unspecified behavior, so let's disable parallel query for this one
test case to make the results stable.

Branch
--
master

Details
---
https://git.postgresql.org/pg/commitdiff/beb012b42f5c32f578661fc1b033ca25905b27d6

Modified Files
--
src/test/regress/expected/psql.out | 3 +++
src/test/regress/sql/psql.sql  | 3 +++
2 files changed, 6 insertions(+)



pgsql: Support retrieval of results in chunks with libpq.

2024-04-06 Thread Tom Lane
Support retrieval of results in chunks with libpq.

This patch generalizes libpq's existing single-row mode to allow
individual partial-result PGresults to contain up to N rows, rather
than always one row.  This reduces malloc overhead compared to plain
single-row mode, and it is very useful for psql's FETCH_COUNT feature,
since otherwise we'd have to add code (and cycles) to either merge
single-row PGresults into a bigger one or teach psql's
results-printing logic to accept arrays of PGresults.

To avoid API breakage, PQsetSingleRowMode() remains the same, and we
add a new function PQsetChunkedRowsMode() to invoke the more general
case.  Also, PGresults obtained the old way continue to carry the
PGRES_SINGLE_TUPLE status code, while if PQsetChunkedRowsMode() is
used then their status code is PGRES_TUPLES_CHUNK.  The underlying
logic is the same either way, though.
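A client-side sketch of the new API (not compiled here, since it needs libpq and a live connection). PQsetChunkedRowsMode() is called after sending the query, just like PQsetSingleRowMode(); each chunk then arrives as a PGresult with status PGRES_TUPLES_CHUNK holding up to the requested number of rows.

```c
#include <libpq-fe.h>

static void
fetch_in_chunks(PGconn *conn)
{
	PGresult   *res;

	if (!PQsendQuery(conn, "SELECT generate_series(1, 1000000)"))
		return;
	if (!PQsetChunkedRowsMode(conn, 1000))	/* chunks of up to 1000 rows */
		return;

	while ((res = PQgetResult(conn)) != NULL)
	{
		if (PQresultStatus(res) == PGRES_TUPLES_CHUNK)
		{
			for (int i = 0; i < PQntuples(res); i++)
				;				/* process PQgetvalue(res, i, 0) */
		}
		/* the final result carries PGRES_TUPLES_OK instead */
		PQclear(res);
	}
}
```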

Daniel Vérité, reviewed by Laurenz Albe and myself (and whacked
around a bit by me, so any remaining bugs are my fault)

Discussion: 
https://postgr.es/m/cakzirmxsvtko928cm+-advsmyepmu3l9dqca9nwqjvlpcee...@mail.gmail.com

Branch
--
master

Details
---
https://git.postgresql.org/pg/commitdiff/4643a2b265e967cc5f13ffa0c7c6912dbb3466d0

Modified Files
--
doc/src/sgml/libpq.sgml| 107 +++
.../libpqwalreceiver/libpqwalreceiver.c|   3 +-
src/bin/pg_amcheck/pg_amcheck.c|   1 +
src/interfaces/libpq/exports.txt   |   1 +
src/interfaces/libpq/fe-exec.c | 146 +
src/interfaces/libpq/fe-protocol3.c|   3 +-
src/interfaces/libpq/libpq-fe.h|   4 +-
src/interfaces/libpq/libpq-int.h   |  10 +-
src/test/modules/libpq_pipeline/libpq_pipeline.c   |  40 ++
.../modules/libpq_pipeline/traces/singlerow.trace  |  14 ++
10 files changed, 243 insertions(+), 86 deletions(-)



pgsql: Re-implement psql's FETCH_COUNT feature atop libpq's chunked mod

2024-04-06 Thread Tom Lane
Re-implement psql's FETCH_COUNT feature atop libpq's chunked mode.

Formerly this was done with a cursor, which is problematic since
not all result-set-returning query types can be put into a cursor.
The new implementation is better integrated into other psql
features, too.

Daniel Vérité, reviewed by Laurenz Albe and myself (and whacked
around a bit by me, so any remaining bugs are my fault)

Discussion: 
https://postgr.es/m/cakzirmxsvtko928cm+-advsmyepmu3l9dqca9nwqjvlpcee...@mail.gmail.com

Branch
--
master

Details
---
https://git.postgresql.org/pg/commitdiff/90f5178211cd63ac16fb8c8b2fe43d53d2854da1

Modified Files
--
src/bin/psql/common.c  | 522 -
src/bin/psql/t/001_basic.pl|   6 +-
src/test/regress/expected/psql.out |   4 +-
src/test/regress/sql/psql.sql  |   4 +-
4 files changed, 171 insertions(+), 365 deletions(-)



pgsql: Change BitmapAdjustPrefetchIterator to accept BlockNumber

2024-04-06 Thread Tomas Vondra
Change BitmapAdjustPrefetchIterator to accept BlockNumber

BitmapAdjustPrefetchIterator() only used the blockno member of the
passed in TBMIterateResult to ensure that the prefetch iterator and
regular iterator stay in sync. Pass it the BlockNumber only, so that we
can move away from using the TBMIterateResult outside of table AM
specific code.

Author: Melanie Plageman
Reviewed-by: Tomas Vondra, Andres Freund, Heikki Linnakangas
Discussion: 
https://postgr.es/m/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com

Branch
--
master

Details
---
https://git.postgresql.org/pg/commitdiff/92641d8d651e685b49a6e2842d306aa5fe7ba500

Modified Files
--
src/backend/executor/nodeBitmapHeapscan.c | 8 
1 file changed, 4 insertions(+), 4 deletions(-)



pgsql: BitmapHeapScan: Use correct recheck flag for skip_fetch

2024-04-06 Thread Tomas Vondra
BitmapHeapScan: Use correct recheck flag for skip_fetch

As of 7c70996ebf0949b142a9, BitmapPrefetch() used the recheck flag for
the current block to determine whether or not it should skip prefetching
the proposed prefetch block. As explained in the comment, this assumed
the index AM will report the same recheck value for the future page as
it did for the current page - but there's no guarantee.

This only affects prefetching - if the recheck flag changes, we may
prefetch blocks unnecessarily and not prefetch blocks that will be
needed. But we don't need to rely on that assumption - we know the
recheck flag for the block we're considering prefetching, so we can
use that.

The impact is very limited in practice - the opclass would need to
assign different recheck flags to different blocks, but none of the
built-in opclasses seems to do that.

Author: Melanie Plageman
Reviewed-by: Tomas Vondra, Andres Freund, Tom Lane
Discussion: https://postgr.es/m/1939305.1712415547%40sss.pgh.pa.us

Branch
--
master

Details
---
https://git.postgresql.org/pg/commitdiff/1fdb0ce9b10970a4b02f1ef0c269e2c1fbbecd25

Modified Files
--
src/backend/executor/nodeBitmapHeapscan.c | 9 ++---
1 file changed, 2 insertions(+), 7 deletions(-)



pgsql: BitmapHeapScan: Push skip_fetch optimization into table AM

2024-04-06 Thread Tomas Vondra
BitmapHeapScan: Push skip_fetch optimization into table AM

Commit 7c70996ebf0949b142 introduced an optimization to allow bitmap
scans to operate like index-only scans by not fetching a block from the
heap if none of the underlying data is needed and the block is marked
all visible in the visibility map.

With the introduction of table AMs, a FIXME was added to this code
indicating that the skip_fetch logic should be pushed into the table
AM-specific code, as not all table AMs may use a visibility map in the
same way.

This commit resolves this FIXME for the current block. The layering
violation is still present in BitmapHeapScan's prefetching code, which
uses the visibility map to decide whether or not to prefetch a block.
However, this can be addressed independently.

Author: Melanie Plageman
Reviewed-by: Andres Freund, Heikki Linnakangas, Tomas Vondra, Mark Dilger
Discussion: 
https://postgr.es/m/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com

Branch
--
master

Details
---
https://git.postgresql.org/pg/commitdiff/04e72ed617be354a53a076b76c6644e364ed80a3

Modified Files
--
src/backend/access/heap/heapam.c  |  15 
src/backend/access/heap/heapam_handler.c  |  29 +++
src/backend/executor/nodeBitmapHeapscan.c | 124 ++
src/include/access/heapam.h   |  10 +++
src/include/access/tableam.h  |  12 ++-
src/include/nodes/execnodes.h |   8 +-
6 files changed, 105 insertions(+), 93 deletions(-)



pgsql: Implement ALTER TABLE ... MERGE PARTITIONS ... command

2024-04-06 Thread Alexander Korotkov
Implement ALTER TABLE ... MERGE PARTITIONS ... command

This new DDL command merges several partitions into one partition of the
target table.  The target partition is created using the new
createPartitionTable() function with the parent partition as the template.

This commit comprises a quite naive implementation which works in a single
process and holds an ACCESS EXCLUSIVE lock on the parent table during all
the operations, including the tuple routing.  This is why this new DDL
command can't be recommended for large partitioned tables under high load.
However, this implementation comes in handy in certain cases even as is.
It could also be used as a foundation for future implementations with less
locking and possibly parallel execution.
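A hedged illustration of the syntax this commit adds (the alter_table.sgml changes in the diff are the authoritative form; table and partition names here are invented):

```sql
CREATE TABLE t (i int) PARTITION BY RANGE (i);
CREATE TABLE tp_0_1 PARTITION OF t FOR VALUES FROM (0) TO (1);
CREATE TABLE tp_1_2 PARTITION OF t FOR VALUES FROM (1) TO (2);

-- merge the two partitions into a single new partition
ALTER TABLE t MERGE PARTITIONS (tp_0_1, tp_1_2) INTO tp_0_2;
```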

Discussion: 
https://postgr.es/m/c73a1746-0cd0-6bdd-6b23-3ae0b7c0c582%40postgrespro.ru
Author: Dmitry Koval
Reviewed-by: Matthias van de Meent, Laurenz Albe, Zhihong Yu, Justin Pryzby
Reviewed-by: Alvaro Herrera, Robert Haas, Stephane Tachoires

Branch
--
master

Details
---
https://git.postgresql.org/pg/commitdiff/1adf16b8fba45f77056d91573cd7138ed9da4ebf

Modified Files
--
doc/src/sgml/ddl.sgml  |  19 +
doc/src/sgml/ref/alter_table.sgml  |  77 ++-
src/backend/commands/tablecmds.c   | 354 +-
src/backend/parser/gram.y  |  22 +-
src/backend/parser/parse_utilcmd.c |  89 +++
src/backend/partitioning/partbounds.c  | 207 ++
src/include/nodes/parsenodes.h |  14 +
src/include/parser/kwlist.h|   1 +
src/include/partitioning/partbounds.h  |   6 +
src/test/isolation/expected/partition-merge.out| 199 ++
src/test/isolation/isolation_schedule  |   1 +
src/test/isolation/specs/partition-merge.spec  |  54 ++
.../modules/test_ddl_deparse/test_ddl_deparse.c|   3 +
src/test/regress/expected/partition_merge.out  | 732 +
src/test/regress/parallel_schedule |   2 +-
src/test/regress/sql/partition_merge.sql   | 430 
src/tools/pgindent/typedefs.list   |   1 +
17 files changed, 2189 insertions(+), 22 deletions(-)



pgsql: Implement ALTER TABLE ... SPLIT PARTITION ... command

2024-04-06 Thread Alexander Korotkov
Implement ALTER TABLE ... SPLIT PARTITION ... command

This new DDL command splits a single partition into several partitions.
Just like the ALTER TABLE ... MERGE PARTITIONS ... command, new partitions
are created using the createPartitionTable() function with the parent
partition as the template.

This commit comprises a quite naive implementation which works in a single
process and holds an ACCESS EXCLUSIVE lock on the parent table during all
the operations, including the tuple routing.  This is why this new DDL
command can't be recommended for large partitioned tables under high load.
However, this implementation comes in handy in certain cases even as is.
It could also be used as a foundation for future implementations with less
locking and possibly parallel execution.
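A hedged illustration of the syntax this commit adds, the inverse of MERGE PARTITIONS (again, see the alter_table.sgml changes for the authoritative form; names are invented):

```sql
-- split one range partition back into two
ALTER TABLE t SPLIT PARTITION tp_0_2 INTO
  (PARTITION tp_0_1 FOR VALUES FROM (0) TO (1),
   PARTITION tp_1_2 FOR VALUES FROM (1) TO (2));
```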

Discussion: 
https://postgr.es/m/c73a1746-0cd0-6bdd-6b23-3ae0b7c0c582%40postgrespro.ru
Author: Dmitry Koval
Reviewed-by: Matthias van de Meent, Laurenz Albe, Zhihong Yu, Justin Pryzby
Reviewed-by: Alvaro Herrera, Robert Haas, Stephane Tachoires

Branch
--
master

Details
---
https://git.postgresql.org/pg/commitdiff/87c21bb9412c8ba2727dec5ebcd74d44c2232d11

Modified Files
--
doc/src/sgml/ddl.sgml  |   19 +
doc/src/sgml/ref/alter_table.sgml  |   65 +-
src/backend/commands/tablecmds.c   |  411 ++
src/backend/parser/gram.y  |   38 +-
src/backend/parser/parse_utilcmd.c |   62 +-
src/backend/partitioning/partbounds.c  |  657 +
src/backend/utils/adt/ruleutils.c  |   18 +
src/include/nodes/parsenodes.h |1 +
src/include/parser/kwlist.h|1 +
src/include/partitioning/partbounds.h  |5 +
src/include/utils/ruleutils.h  |2 +
src/test/isolation/expected/partition-split.out|  190 +++
src/test/isolation/isolation_schedule  |1 +
src/test/isolation/specs/partition-split.spec  |   54 +
.../modules/test_ddl_deparse/test_ddl_deparse.c|3 +
src/test/regress/expected/partition_split.out  | 1417 
src/test/regress/parallel_schedule |2 +-
src/test/regress/sql/partition_split.sql   |  833 
src/tools/pgindent/typedefs.list   |1 +
19 files changed, 3766 insertions(+), 14 deletions(-)



pgsql: BitmapHeapScan: postpone setting can_skip_fetch

2024-04-06 Thread Tomas Vondra
BitmapHeapScan: postpone setting can_skip_fetch

Set BitmapHeapScanState->can_skip_fetch in BitmapHeapNext() instead of
in ExecInitBitmapHeapScan(). This is a preliminary step to pushing the
skip fetch optimization into heap AM code.

Author: Melanie Plageman
Reviewed-by: Tomas Vondra, Andres Freund, Heikki Linnakangas
Discussion: 
https://postgr.es/m/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com

Branch
--
master

Details
---
https://git.postgresql.org/pg/commitdiff/fe1431e39cdde5f65cb52f068bc86a7490f8a4e3

Modified Files
--
src/backend/executor/nodeBitmapHeapscan.c | 21 +++--
1 file changed, 11 insertions(+), 10 deletions(-)



pgsql: Use an LWLock instead of a spinlock in waitlsn.c

2024-04-06 Thread Alexander Korotkov
Use an LWLock instead of a spinlock in waitlsn.c

This should prevent busy-waiting when the number of waiting processes is high.

Discussion: https://postgr.es/m/202404030658.hhj3vfxeyhft%40alvherre.pgsql
Author: Alvaro Herrera

Branch
--
master

Details
---
https://git.postgresql.org/pg/commitdiff/25f42429e2ff2acca35c9154fc2e36b75c79227a

Modified Files
--
src/backend/commands/waitlsn.c  | 15 +++
src/backend/utils/activity/wait_event_names.txt |  1 +
src/include/commands/waitlsn.h  |  5 +
src/include/storage/lwlocklist.h|  1 +
4 files changed, 10 insertions(+), 12 deletions(-)



pgsql: Call WaitLSNCleanup() in AbortTransaction()

2024-04-06 Thread Alexander Korotkov
Call WaitLSNCleanup() in AbortTransaction()

Even though waiting for the replay LSN happens without an explicit
transaction, AbortTransaction() is responsible for cleaning up the shared
memory if an error is thrown in a stored procedure.  So, we need to call
WaitLSNCleanup() there to clean up after an unexpected error that happened
while waiting for the replay LSN.

Discussion: https://postgr.es/m/202404051815.eri4u5q6oj26%40alvherre.pgsql
Author: Alvaro Herrera

Branch
--
master

Details
---
https://git.postgresql.org/pg/commitdiff/74eaf66f988c868deb0816bae4dd184eedae1448

Modified Files
--
src/backend/access/transam/xact.c | 6 ++
1 file changed, 6 insertions(+)



pgsql: Clarify what is protected by WaitLSNLock

2024-04-06 Thread Alexander Korotkov
Clarify what is protected by WaitLSNLock

Not just WaitLSNState.waitersHeap, but also WaitLSNState.procInfos and
updates of WaitLSNState.minWaitedLSN are protected by WaitLSNLock.  There
is one now-documented exception: the fast-path check of the
WaitLSNProcInfo.inHeap flag.

Discussion: https://postgr.es/m/202404030658.hhj3vfxeyhft%40alvherre.pgsql

Branch
--
master

Details
---
https://git.postgresql.org/pg/commitdiff/ee79928441e7e291532b833455ebfee27d7cab5c

Modified Files
--
src/backend/commands/waitlsn.c | 10 --
src/include/commands/waitlsn.h |  7 +--
2 files changed, 13 insertions(+), 4 deletions(-)



pgsql: BitmapHeapScan: begin scan after bitmap creation

2024-04-06 Thread Tomas Vondra
BitmapHeapScan: begin scan after bitmap creation

Change the order so that the table scan is initialized only after
initializing the index scan and building the bitmap.

This is mostly a cosmetic change for now, but later commits will need
to pass parameters to table_beginscan_bm() that are unavailable in
ExecInitBitmapHeapScan().

Author: Melanie Plageman
Reviewed-by: Tomas Vondra, Andres Freund, Heikki Linnakangas
Discussion: 
https://postgr.es/m/CAAKRu_ZwCwWFeL_H3ia26bP2e7HiKLWt0ZmGXPVwPO6uXq0vaA%40mail.gmail.com

Branch
--
master

Details
---
https://git.postgresql.org/pg/commitdiff/1577081e9614345534a018e788a2c0bab4da4dc5

Modified Files
--
src/backend/executor/nodeBitmapHeapscan.c | 27 ---
1 file changed, 20 insertions(+), 7 deletions(-)



pgsql: Backport IPC::Run optimization to src/test/perl.

2024-04-06 Thread Noah Misch
Backport IPC::Run optimization to src/test/perl.

This one-liner makes the TAP portion of "make check-world" 7% faster on
a non-Windows machine.

Discussion: https://postgr.es/m/20240331050310...@rfd.leadboat.com

Branch
--
master

Details
---
https://git.postgresql.org/pg/commitdiff/06558f49529553aecb6ad52a0470d63cb59d7df9

Modified Files
--
src/test/perl/PostgreSQL/Test/Utils.pm | 5 +
1 file changed, 5 insertions(+)



pgsql: Enhance nbtree ScalarArrayOp execution.

2024-04-06 Thread Peter Geoghegan
Enhance nbtree ScalarArrayOp execution.

Commit 9e8da0f7 taught nbtree to handle ScalarArrayOpExpr quals
natively.  This works by pushing down the full context (the array keys)
to the nbtree index AM, enabling it to execute multiple primitive index
scans that the planner treats as one continuous index scan/index path.
This earlier enhancement enabled nbtree ScalarArrayOp index-only scans.
It also allowed scans with ScalarArrayOp quals to return ordered results
(with some notable restrictions, described further down).

Take this general approach a lot further: teach nbtree SAOP index scans
to decide how to execute ScalarArrayOp scans (when and where to start
the next primitive index scan) based on physical index characteristics.
This can be far more efficient.  All SAOP scans will now reliably avoid
duplicative leaf page accesses (just like any other nbtree index scan).
SAOP scans whose array keys are naturally clustered together now require
far fewer index descents, since we'll reliably avoid starting a new
primitive scan just to get to a later offset from the same leaf page.

The scan's arrays now advance using binary searches for the array
element that best matches the next tuple's attribute value.  Required
scan key arrays (i.e. arrays from scan keys that can terminate the scan)
ratchet forward in lockstep with the index scan.  Non-required arrays
(i.e. arrays from scan keys that can only exclude non-matching tuples)
"advance" without the process ever rolling over to a higher-order array.

Naturally, only required SAOP scan keys trigger skipping over leaf pages
(non-required arrays cannot safely end or start primitive index scans).
Consequently, even index scans of a composite index with a high-order
inequality scan key (which we'll mark required) and a low-order SAOP
scan key (which we won't mark required) now avoid repeating leaf page
accesses -- that benefit isn't limited to simpler equality-only cases.
In general, all nbtree index scans now output tuples as if they were one
continuous index scan -- even scans that mix a high-order inequality
with lower-order SAOP equalities reliably output tuples in index order.
This allows us to remove a couple of special cases that were applied
when building index paths with SAOP clauses during planning.

Bugfix commit 807a40c5 taught the planner to avoid generating unsafe
path keys: path keys on a multicolumn index path, with a SAOP clause on
any attribute beyond the first/most significant attribute.  These cases
are now all safe, so we go back to generating path keys without regard
for the presence of SAOP clauses (just like with any other clause type).
Affected queries can now exploit scan output order in all the usual ways
(e.g., certain "ORDER BY ... LIMIT n" queries can now terminate early).

Also undo changes from follow-up bugfix commit a4523c5a, which taught
the planner to produce alternative index paths, with path keys, but
without low-order SAOP index quals (filter quals were used instead).
We'll no longer generate these alternative paths, since they can no
longer offer any meaningful advantages over standard index qual paths.
Affected queries thereby avoid all of the disadvantages that come from
using filter quals within index scan nodes.  They can avoid extra heap
page accesses from using filter quals to exclude non-matching tuples
(index quals will never have that problem).  They can also skip over
irrelevant sections of the index in more cases (though only when nbtree
determines that starting another primitive scan actually makes sense).

There is a theoretical risk that removing restrictions on SAOP index
paths from the planner will break compatibility with amcanorder-based
index AMs maintained as extensions.  Such an index AM could have the
same limitations around ordered SAOP scans as nbtree had up until now.
Adding a pro forma incompatibility item about the issue to the Postgres
17 release notes seems like a good idea.

Author: Peter Geoghegan 
Author: Matthias van de Meent 
Reviewed-By: Heikki Linnakangas 
Reviewed-By: Matthias van de Meent 
Reviewed-By: Tomas Vondra 
Discussion: 
https://postgr.es/m/CAH2-Wz=ksvn_sjcnd1+bt-wtifra5ok48adynq3pkkhxgmq...@mail.gmail.com

Branch
--
master

Details
---
https://git.postgresql.org/pg/commitdiff/5bf748b86bc6786a3fc57fc7ce296c37da6564b0

Modified Files
--
doc/src/sgml/indexam.sgml |   10 +-
doc/src/sgml/monitoring.sgml  |   13 +
src/backend/access/index/indexam.c|   10 +-
src/backend/access/nbtree/nbtree.c|  202 +-
src/backend/access/nbtree/nbtsearch.c |  249 ++-
src/backend/access/nbtree/nbtutils.c  | 2968 ++---
src/backend/executor/nodeIndexonlyscan.c  |2 +
src/backend/executor/nodeIndexscan.c  |2 +
src/backend/optimizer/path/indxpath.c |   90 +-
src/backend/utils/adt/selfuncs.c  |   83 +-
src/include/access/amapi.h|2 +-

pgsql: Remove obsolete comment in CopyReadLineText().

2024-04-06 Thread Tom Lane
Remove obsolete comment in CopyReadLineText().

When this bit of commentary was written, it was alluding to the
fact that we looked for newlines and EOD markers in the raw
(not yet encoding-converted) input data.  We don't do that anymore,
preferring to batch the conversion of larger chunks of input and
split it into lines later.  Hence there's no longer any need for
assumptions about the relevant characters being encoding-invariant,
and we should remove this comment saying we assume that.

Discussion: https://postgr.es/m/1461688.1712347...@sss.pgh.pa.us

Branch
--
master

Details
---
https://git.postgresql.org/pg/commitdiff/ddd9e43a92417dd0c2b60822d6e75862c73b139a

Modified Files
--
src/backend/commands/copyfromparse.c | 3 ---
1 file changed, 3 deletions(-)



pgsql: Teach fasthash_accum to use platform endianness for bytewise loa

2024-04-06 Thread John Naylor
Teach fasthash_accum to use platform endianness for bytewise loads

This function previously used a mix of word-wise loads and bytewise
loads. The bytewise loads happened to be little-endian regardless of
platform. This in itself is not a problem. However, a future commit
will require the same result whether A) the input is loaded as a
word with the relevant bytes masked off, or B) the input is loaded
one byte at a time.
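The property the commit needs can be checked with a portable sketch (names are illustrative; the real code lives in src/include/common/hashfn_unstable.h): loading the first n bytes of a word-sized chunk bytewise, in platform byte order, must give the same word as loading all 8 bytes and masking off the trailing ones.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Bytewise load of the first n bytes, in platform endianness. */
static uint64_t
load_bytes(const unsigned char *p, int n)
{
	uint64_t	w = 0;

	memcpy(&w, p, n);
	return w;
}

/* Full-word load (p must have 8 readable bytes) with trailing bytes masked. */
static uint64_t
load_word_masked(const unsigned char *p, int n)
{
	uint64_t	w;
	int			one = 1;

	memcpy(&w, p, sizeof(w));
	if (n == 8)
		return w;
	if (*(unsigned char *) &one)	/* little-endian: trailing bytes are high-order */
		return w & ((UINT64_C(1) << (8 * n)) - 1);
	return w & ~((UINT64_C(1) << (8 * (8 - n))) - 1);	/* big-endian */
}
```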

While at it, improve debuggability of the internal hash state.

Discussion: 
https://postgr.es/m/CANWCAZZpuV1mES1mtSpAq8tWJewbrv4gEz6R_k4gzNG8GZ5gag%40mail.gmail.com

Branch
--
master

Details
---
https://git.postgresql.org/pg/commitdiff/0c25fee35903ef08af6d6b0c0fdb90fc01e37fa1

Modified Files
--
src/include/common/hashfn_unstable.h | 44 
1 file changed, 39 insertions(+), 5 deletions(-)



pgsql: Speed up tail processing when hashing aligned C strings, take tw

2024-04-06 Thread John Naylor
Speed up tail processing when hashing aligned C strings, take two

After encountering the NUL terminator, the word-at-a-time loop exits
and we must hash the remaining bytes. Previously we calculated
the terminator's position and re-loaded the remaining bytes from
the input string. This was slower than the unaligned case for very
short strings. We already have all the data we need in a register,
so let's just mask off the bytes we need and hash them immediately.
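The "mask and hash immediately" idea can be sketched with the classic SWAR zero-byte detector. This is an illustrative stand-in, not the committed code, which also handles big-endian layouts and was tuned to keep valgrind happy; byte positions here follow little-endian conventions, though the arithmetic itself is endianness-independent.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Given a word known to be loaded from a C string, keep only the bytes
 * before the first NUL (counting from the low-order end) and zero the
 * rest, so the tail can be hashed without re-reading memory.
 */
static uint64_t
mask_before_nul(uint64_t word)
{
	/* classic SWAR test: high bit set in each byte that is zero */
	uint64_t	has_zero = (word - UINT64_C(0x0101010101010101)) &
		~word & UINT64_C(0x8080808080808080);
	int			idx;

	if (has_zero == 0)
		return word;			/* no terminator in this word */

	/* lowest marker bit locates the first zero byte (GCC/Clang builtin) */
	idx = __builtin_ctzll(has_zero) / 8;
	if (idx == 0)
		return 0;
	return word & ((UINT64_C(1) << (8 * idx)) - 1);
}
```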

In addition to endianness issues, the previous attempt upset valgrind
in the way it computed the mask. Whether by accident or by wisdom,
the author's proposed method passes locally with valgrind 3.22.

Ants Aasma, with cosmetic adjustments by me

Discussion: 
https://postgr.es/m/CANwKhkP7pCiW_5fAswLhs71-JKGEz1c1%2BPC0a_w1fwY4iGMqUA%40mail.gmail.com

Branch
--
master

Details
---
https://git.postgresql.org/pg/commitdiff/a365d9e2e8c1ead27203a4431211098292777d3b

Modified Files
--
src/include/common/hashfn_unstable.h | 46 
1 file changed, 36 insertions(+), 10 deletions(-)



pgsql: Allow BufferAccessStrategy to limit pin count.

2024-04-06 Thread Thomas Munro
Allow BufferAccessStrategy to limit pin count.

While pinning extra buffers to look ahead, users of strategies are in
danger of using too many buffers.  For some strategies, that means
"escaping" from the ring, and in others it means forcing dirty data to
disk very frequently with associated WAL flushing.  Since external code
has no insight into any of that, allow individual strategy types to
expose a clamp that should be applied when deciding how many buffers to
pin at once.

Reviewed-by: Andres Freund 
Reviewed-by: Melanie Plageman 
Discussion: 
https://postgr.es/m/CAAKRu_aJXnqsyZt6HwFLnxYEBgE17oypkxbKbT1t1geE_wvH2Q%40mail.gmail.com

Branch
--
master

Details
---
https://git.postgresql.org/pg/commitdiff/3bd8439ed628c7e9ac250b1a042d9044303c37e7

Modified Files
--
src/backend/storage/aio/read_stream.c |  5 +
src/backend/storage/buffer/freelist.c | 42 +++
src/include/storage/bufmgr.h  |  1 +
3 files changed, 48 insertions(+)



pgsql: Increase default vacuum_buffer_usage_limit to 2MB.

2024-04-06 Thread Thomas Munro
Increase default vacuum_buffer_usage_limit to 2MB.

The BAS_VACUUM ring size has been 256kB since commit d526575f introduced
the mechanism 17 years ago.  Commit 1cbbee03 recently made it
configurable but retained the traditional default.  The correct default
size has been debated for years, but 256kB is certainly very small.
VACUUM soon needs to write back data it dirtied only 32 blocks ago,
which usually requires flushing the WAL.  New experiments in prefetching
pages for VACUUM exacerbated the problem by crashing into dirty data
even sooner.  Let's make the default 2MB.  That's 1.6% of the default
toy buffer pool size, and 0.2% of 1GB, which would be considered a
small shared_buffers setting for a real system these days.  Users are
still free to set the GUC to a different value.

Reviewed-by: Andres Freund 
Discussion: 
https://postgr.es/m/20240403221257.md4gfki3z75cdyf6%40awork3.anarazel.de
Discussion: 
https://postgr.es/m/CA%2BhUKGLY4Q4ZY4f1rvnFtv6%2BPkjNf8MejdPkcju3Qii9DYqqcQ%40mail.gmail.com

Branch
--
master

Details
---
https://git.postgresql.org/pg/commitdiff/98f320eb2ef05072b6fe67fcdcdc26c226e6cea4

Modified Files
--
doc/src/sgml/config.sgml  | 2 +-
src/backend/storage/buffer/freelist.c | 2 +-
src/backend/utils/init/globals.c  | 2 +-
src/backend/utils/misc/guc_tables.c   | 2 +-
src/backend/utils/misc/postgresql.conf.sample | 2 +-
5 files changed, 5 insertions(+), 5 deletions(-)