Re: Testing autovacuum wraparound (including failsafe)

2023-09-02 Thread Noah Misch
On Wed, Jul 12, 2023 at 01:47:51PM +0200, Daniel Gustafsson wrote:
> > On 12 Jul 2023, at 09:52, Masahiko Sawada  wrote:
> > Agreed. The timeout can be set by manually setting
> > PG_TEST_TIMEOUT_DEFAULT, but I bumped it to 10 min by default. And it
> > now requires setting PG_TEST_EXTRA to run it.
> 
> +# bump the query timeout to avoid false negatives on slow test syetems.
> typo: s/syetems/systems/
> 
> 
> +# bump the query timeout to avoid false negatives on slow test syetems.
> +$ENV{PG_TEST_TIMEOUT_DEFAULT} = 600;
> Does this actually work?  Utils.pm reads the environment variable at compile
> time in the BEGIN block so this setting won't be seen?  A quick test program
> seems to confirm this but I might be missing something.

The correct way to get a longer timeout is "IPC::Run::timer(4 *
$PostgreSQL::Test::Utils::timeout_default);".  Even if changing env worked,
that would be removing the ability for even-slower systems to set timeouts
greater than 10min.




add (void) cast inside advance_aggregates for function ExecEvalExprSwitchContext

2023-09-02 Thread jian he
Hi.
In src/backend/executor/nodeAgg.c
817: advance_aggregates(AggState *aggstate)

Do we need to add "(void)" before ExecEvalExprSwitchContext?
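
For context, a minimal sketch of what such a change might look like (the
function body below is paraphrased from nodeAgg.c, so treat the surrounding
details as approximate):

/* Sketch only: paraphrased from advance_aggregates() in nodeAgg.c */
static void
advance_aggregates(AggState *aggstate)
{
    bool        dummynull;

    /* the result is deliberately ignored; the cast would make that explicit */
    (void) ExecEvalExprSwitchContext(aggstate->phase->evaltrans,
                                     aggstate->tmpcontext,
                                     &dummynull);
}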




Re: Query execution in Perl TAP tests needs work

2023-09-02 Thread Thomas Munro
On Sun, Sep 3, 2023 at 6:42 AM Andrew Dunstan  wrote:
> I guess the next thing would be to test it on a few more platforms and also 
> to see if we need to expand the coverage of libpq for the intended uses.

Nice.  It works fine on my FreeBSD battlestation after "sudo pkg
install p5-FFI-Platypus" and adjusting that lib path.  I wonder if
there is a nice way to extract those constants from our headers...

It's using https://sourceware.org/libffi/ under the covers (like most
other scripting language FFI things), and that knows calling
conventions for everything we care about including weird OSes and
architectures.  It might be a slight pain to build it on systems that
have no package manager, if cpan can't do it for you?  I guess AIX
would be the most painful?

(Huh, while contemplating trying that, I just noticed that the GCC
build farm's AIX 7.2 system seems to have given up the ghost a few
weeks ago.  I wonder if it'll come back online with the current
release, or if that's the end...  There is still the
overloaded-to-the-point-of-being-hard-to-interact-with AIX 7.1 (=EOL)
machine.)

> I confess I'm a little reluctant to impose this burden on buildfarm owners. 
> We should think about some sort of fallback in case this isn't supported on 
> some platform, either due to technological barriers or buildfarm owner 
> reluctance.

I guess you're thinking that it could be done in such a way that it is
automatically used for $node->safe_psql() and various other things if
Platypus is detected, but otherwise forking psql as now, for a
transition period?  Then we could nag build farm owners, and
eventually turn off the fallback stuff after N months.  After that it
would begin to be possible to use this in more interesting and
advanced ways.




Re: Row pattern recognition

2023-09-02 Thread Tatsuo Ishii
> Hi,
> 
> The patches compile & tests run fine but this statement from the
> documentation crashes an assert-enabled server:
> 
> SELECT company, tdate, price, max(price) OVER w FROM stock
> WINDOW w AS (
> PARTITION BY company
> ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
> AFTER MATCH SKIP PAST LAST ROW
> INITIAL
> PATTERN (LOWPRICE UP+ DOWN+)
> DEFINE
> LOWPRICE AS price <= 100,
> UP AS price > PREV(price),
> DOWN AS price < PREV(price)
> );
> server closed the connection unexpectedly
>   This probably means the server terminated abnormally
>   before or while processing the request.
> connection to server was lost

Thank you for the report. Currently the patch has an issue with
aggregate functions, including max. I have been working on aggregation
support in row pattern recognition, but it will take more time to
complete that part.

In the meantime, if you want to play with RPR, you can try window
functions. Examples can be found in src/test/regress/sql/rpr.sql.
Here is one of them:

-- the first row starts with a price less than or equal to 100
SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w
 FROM stock
 WINDOW w AS (
 PARTITION BY company
 ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
 INITIAL
 PATTERN (LOWPRICE UP+ DOWN+)
 DEFINE
  LOWPRICE AS price <= 100,
  UP AS price > PREV(price),
  DOWN AS price < PREV(price)
);

-- the second row rises to more than 120% of the previous price
SELECT company, tdate, price, first_value(price) OVER w, last_value(price) OVER w
 FROM stock
 WINDOW w AS (
 PARTITION BY company
 ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
 INITIAL
 PATTERN (LOWPRICE UP+ DOWN+)
 DEFINE
  LOWPRICE AS price <= 100,
  UP AS price > PREV(price) * 1.2,
  DOWN AS price < PREV(price)
);

Sorry for the inconvenience.

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp




Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)

2023-09-02 Thread Thomas Munro
I agree that the code lacks barriers.  I haven't been able to figure
out how any reordering could cause this hang, though, because in these
old branches procsignal_sigusr1_handler is used for latch wakeups, and
it also calls SetLatch(MyLatch) itself, right at the end.  That is,
SetLatch() gets called twice, first in the waker process and then
again in the awoken process, so it should be impossible for the latter
not to see MyLatch->is_set == true after procsignal_sigusr1_handler
completes.

That made me think the handler didn't run, which is consistent with
procstat -i showing it as pending ('P').  Which made me start to
suspect a kernel bug, unless we can explain what we did to block it...

But... perhaps I am confused about that and did something wrong when
looking into it.  It's hard to investigate when you aren't allowed to
take core files or connect a debugger (both will reliably trigger
EINTR).




Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)

2023-09-02 Thread Alexander Lakhin

Hello Robert,

01.09.2023 23:21, Robert Haas wrote:

On Fri, Sep 1, 2023 at 6:13 AM Alexander Lakhin  wrote:

(Placing "pg_compiler_barrier();" just after "waiting = true;" fixed the
issue for us.)

Maybe it'd be worth trying something stronger, like
pg_memory_barrier(). A compiler barrier doesn't prevent the CPU from
reordering loads and stores as it goes, and ARM64 has weak memory
ordering.


Indeed, thank you for the tip!
So maybe here we are dealing not with a compiler optimization, but with a CPU one.
The wider code fragment is:
  805c48: 52800028  mov w8, #1 // true
  805c4c: 52800319  mov w25, #24
  805c50: 5280073a  mov w26, #57
  805c54: fd446128  ldr d8, [x9, #2240]
  805c58: 9d7b  adrp    x27, 0x9b1000 
  805c5c: fd415949  ldr d9, [x10, #688]
  805c60: f9071d68  str x8, [x11, #3640] // waiting = true (x8 = w8)
  805c64: f90003f3  str x19, [sp]
  805c68: 1410  b   0x805ca8 

  805ca8: f9400a88  ldr x8, [x20, #16] // if (set->latch && 
set->latch->is_set)
  805cac: b468  cbz x8, 0x805cb8 
  805cb0: f9400108  ldr x8, [x8]
  805cb4: b5001248  cbnz    x8, 0x805efc 
  805cb8: f9401280  ldr x0, [x20, #32]

If that CPU can delay the write to the variable waiting
(str x8, [x11, #3640]), held in its internal form like
"store 1 to [address]", until 805cb0 or a later instruction, then we can get the
behavior discussed. Something like that is shown in the ARM documentation:
https://developer.arm.com/documentation/102336/0100/Memory-ordering?lang=en
I'll try to test this guess on the target machine...
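
To make the suspected reordering concrete, here is a small self-contained toy
model (not PostgreSQL code; the two flags merely mirror the roles of "waiting"
and "latch->is_set", and the fence marks the placement being discussed):

/* Toy model only, not PostgreSQL code. */
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool waiting = false;
static atomic_bool is_set = false;

/* waiter side, loosely corresponding to WaitEventSetWait() */
static bool
waiter_sees_latch(void)
{
    atomic_store_explicit(&waiting, true, memory_order_relaxed);

    /*
     * Without a full barrier here, a weakly ordered CPU such as ARM64 may
     * perform the load of is_set before the store to waiting becomes
     * visible, which is the reordering hypothesized above.
     */
    atomic_thread_fence(memory_order_seq_cst);

    return atomic_load_explicit(&is_set, memory_order_relaxed);
}

/* waker side, loosely corresponding to SetLatch() */
static void
waker(void)
{
    atomic_store_explicit(&is_set, true, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);

    if (atomic_load_explicit(&waiting, memory_order_relaxed))
    {
        /* send the wakeup signal (elided in this sketch) */
    }
}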

Best regards,
Alexander




Re: Initdb-time block size specification

2023-09-02 Thread Tomas Vondra



On 9/1/23 16:57, Robert Haas wrote:
> On Thu, Aug 31, 2023 at 2:32 PM David Christensen
>  wrote:
>> Here's a patch atop the series which converts to 16-bit uints and
>> passes regressions, but I don't consider well-vetted at this point.
> 
> For what it's worth, my gut reaction to this patch series is similar
> to that of Andres: I think it will be a disaster. If the disaster is
> not evident to us, that's far more likely to mean that we've failed to
> test the right things than it is to mean that there is no disaster.
> 

Perhaps. The block size certainly affects a lot of places - both in
terms of the actual value, and being known (constant) at compile time.

> I don't see that there is a lot of upside, either. I don't think we
> have a lot of evidence that changing the block size is really going to
> help performance.

I don't think that's quite true. We have plenty of empirical evidence
that smaller block sizes bring significant improvements for certain
workloads. And we also have theoretical explanations for why that is.

> In fact, my guess is that there are large amounts of
> code that are heavily optimized, without the authors even realizing
> it, for 8kB blocks, because that's what we've always had. If we had
> much larger or smaller blocks, the structure of heap pages or of the
> various index AMs used for blocks might no longer be optimal, or might
> be less optimal than they are for an 8kB block size. If you use really
> large blocks, your blocks may need more internal structure than we
> have today in order to avoid CPU inefficiencies. I suspect there's
> been so little testing of non-default block sizes that I wouldn't even
> count on the code to not be outright buggy.
> 

Sure, and there are even various places where the page size implies hard
limits (e.g. index key size for btree indexes).

But so what? If that matters for your workload, keep using 8kB ...

> If we could find a safe way to get rid of full page writes, I would
> certainly agree that that was worth considering. I'm not sure that
> anything in this thread adds up to that being a reasonable way to go,
> but the savings would be massive.
> 

That's true, that'd be great. But that's clearly just a next level of
the optimization. It doesn't mean that if you can't eliminate FPW for
whatever reason it's worthless.

> I feel like the proposal here is a bit like deciding to change the
> speed limit on all American highways from 65 mph or whatever it is to
> 130 mph or 32.5 mph and see which way works out best. The whole
> infrastructure has basically been designed around the current rules.
> The rate of curvature of the roads is appropriate for the speed that
> you're currently allowed to drive on them. The vehicles are optimized
> for long-term operation at about that speed. The people who drive the
> vehicles are accustomed to driving at that speed, and the people who
> maintain them are accustomed to the problems that happen when you
> drive them at that speed. Just changing the speed limit doesn't change
> all that other stuff, and changing all that other stuff is a truly
> massive undertaking. Maybe this example somewhat overstates the
> difficulties here, but I do think the difficulties are considerable.
> The fact that we have 8kB block sizes has affected the thinking of
> hundreds of developers over decades in making thousands or tens of
> thousands or hundreds of thousands of decisions about algorithm
> selection and page format and all kinds of stuff. Even if some other
> page size seems to work better in a certain context, it's pretty hard
> to believe that it has much chance of being better overall, even
> without the added overhead of run-time configuration.
> 

Except that no one is forcing you to actually go 130mph or 32mph, right?
You make it seem like this patch forces people to use some other page
size, but that's clearly not what it's doing - it gives you the option
to use smaller or larger blocks, if you choose to. Just like increasing
the speed limit to 130mph doesn't mean you can't keep going 65mph.

The thing is - we *already* allow using a different block size, except
that you have to do a custom build. This just makes it easier.

I don't have strong opinions on how the patch actually does that, and
there certainly can be negative effects of making it dynamic. And yes,
we will have to do more testing with non-default block sizes. But
frankly, that's a gap we probably need to address anyway, considering we
allow changing the block size.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Inefficiency in parallel pg_restore with many tables

2023-09-02 Thread Nathan Bossart
On Fri, Sep 01, 2023 at 01:52:48PM -0700, Nathan Bossart wrote:
> On Fri, Sep 01, 2023 at 04:00:44PM -0400, Robert Haas wrote:
>> In hindsight, I think that making binaryheap depend on Datum was a bad
>> idea. I think that was my idea, and I think it wasn't very smart.
>> Considering that people have coded to that decision up until now, it
>> might not be too easy to change at this point. But in principle I
>> guess you'd want to be able to make a heap out of any C data type,
>> rather than just Datum, or just Datum in the backend and just void *
>> in the frontend.
> 
> Yeah, something similar to simplehash for binary heaps could be nice.  That
> being said, I don't know if there's a strong reason to specialize the
> implementation for a given C data type in most cases.  I suspect many
> callers are just fine with dealing with pointers (e.g., I wouldn't store an
> entire TocEntry in the array), and smaller types like integers are already
> stored directly in the array thanks to the use of Datum.  However, it
> _would_ allow us to abandon this frontend/backend void */Datum kludge,
> which is something.

I ended up hacking together a (nowhere near committable) patch to see how
hard it would be to allow using any type with binaryheap.  It doesn't seem
too bad.
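
To give a feel for the resulting API, here is a rough usage sketch based on
the draft patch below (the generic signatures are taken from the
work-in-progress code and are provisional, so they may well change):

#include "postgres.h"
#include "lib/binaryheap.h"

/* the comparator now works on untyped pointers instead of Datum */
static int32
compare_ints(void *a, void *b, void *arg)
{
    int     x = *(int *) a;
    int     y = *(int *) b;

    return (x > y) - (x < y);
}

static void
example(void)
{
    binaryheap *heap;
    int         top;

    /* the element size is passed explicitly instead of assuming Datum */
    heap = binaryheap_allocate(16, sizeof(int), compare_ints, NULL);

    for (int i = 0; i < 10; i++)
        binaryheap_add_unordered(heap, &i);
    binaryheap_build(heap);

    /* values come back through out parameters rather than as Datum */
    binaryheap_first(heap, &top);
    binaryheap_remove_first(heap, &top);

    binaryheap_free(heap);
}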

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From fa51a6b48d1a7cdcffbd5ccbe4274f39dfb0c1b5 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Sat, 2 Sep 2023 11:48:36 -0700
Subject: [PATCH 1/1] allow binaryheap to use any type

---
 src/backend/executor/nodeGatherMerge.c| 20 +++---
 src/backend/executor/nodeMergeAppend.c| 20 +++---
 src/backend/lib/binaryheap.c  | 71 ++-
 src/backend/postmaster/pgarch.c   | 37 +-
 .../replication/logical/reorderbuffer.c   | 22 +++---
 src/backend/storage/buffer/bufmgr.c   | 21 +++---
 src/include/lib/binaryheap.h  | 17 ++---
 7 files changed, 112 insertions(+), 96 deletions(-)

diff --git a/src/backend/executor/nodeGatherMerge.c b/src/backend/executor/nodeGatherMerge.c
index 9d5e1a46e9..4c75d224c8 100644
--- a/src/backend/executor/nodeGatherMerge.c
+++ b/src/backend/executor/nodeGatherMerge.c
@@ -52,7 +52,7 @@ typedef struct GMReaderTupleBuffer
 } GMReaderTupleBuffer;
 
 static TupleTableSlot *ExecGatherMerge(PlanState *pstate);
-static int32 heap_compare_slots(Datum a, Datum b, void *arg);
+static int32 heap_compare_slots(void *a, void *b, void *arg);
 static TupleTableSlot *gather_merge_getnext(GatherMergeState *gm_state);
 static MinimalTuple gm_readnext_tuple(GatherMergeState *gm_state, int nreader,
 	  bool nowait, bool *done);
@@ -428,7 +428,7 @@ gather_merge_setup(GatherMergeState *gm_state)
 	}
 
 	/* Allocate the resources for the merge */
-	gm_state->gm_heap = binaryheap_allocate(nreaders + 1,
+	gm_state->gm_heap = binaryheap_allocate(nreaders + 1, sizeof(int),
 			heap_compare_slots,
 			gm_state);
 }
@@ -489,7 +489,7 @@ reread:
 /* Don't have a tuple yet, try to get one */
 if (gather_merge_readnext(gm_state, i, nowait))
 	binaryheap_add_unordered(gm_state->gm_heap,
-			 Int32GetDatum(i));
+			 );
 			}
 			else
 			{
@@ -565,14 +565,14 @@ gather_merge_getnext(GatherMergeState *gm_state)
 		 * the heap, because it might now compare differently against the
 		 * other elements of the heap.
 		 */
-		i = DatumGetInt32(binaryheap_first(gm_state->gm_heap));
+		binaryheap_first(gm_state->gm_heap, );
 
 		if (gather_merge_readnext(gm_state, i, false))
-			binaryheap_replace_first(gm_state->gm_heap, Int32GetDatum(i));
+			binaryheap_replace_first(gm_state->gm_heap, );
 		else
 		{
 			/* reader exhausted, remove it from heap */
-			(void) binaryheap_remove_first(gm_state->gm_heap);
+			binaryheap_remove_first(gm_state->gm_heap, );
 		}
 	}
 
@@ -585,7 +585,7 @@ gather_merge_getnext(GatherMergeState *gm_state)
 	else
 	{
 		/* Return next tuple from whichever participant has the leading one */
-		i = DatumGetInt32(binaryheap_first(gm_state->gm_heap));
+		binaryheap_first(gm_state->gm_heap, );
 		return gm_state->gm_slots[i];
 	}
 }
@@ -750,11 +750,11 @@ typedef int32 SlotNumber;
  * Compare the tuples in the two given slots.
  */
 static int32
-heap_compare_slots(Datum a, Datum b, void *arg)
+heap_compare_slots(void *a, void *b, void *arg)
 {
 	GatherMergeState *node = (GatherMergeState *) arg;
-	SlotNumber	slot1 = DatumGetInt32(a);
-	SlotNumber	slot2 = DatumGetInt32(b);
+	SlotNumber	slot1 = *((int *) a);
+	SlotNumber	slot2 = *((int *) b);
 
 	TupleTableSlot *s1 = node->gm_slots[slot1];
 	TupleTableSlot *s2 = node->gm_slots[slot2];
diff --git a/src/backend/executor/nodeMergeAppend.c b/src/backend/executor/nodeMergeAppend.c
index 21b5726e6e..928b4b3719 100644
--- a/src/backend/executor/nodeMergeAppend.c
+++ b/src/backend/executor/nodeMergeAppend.c
@@ -52,7 +52,7 @@
 typedef int32 SlotNumber;
 
 static TupleTableSlot *ExecMergeAppend(PlanState 

Re: Query execution in Perl TAP tests needs work

2023-09-02 Thread Andrew Dunstan


On 2023-08-30 We 21:29, Thomas Munro wrote:

On Thu, Aug 31, 2023 at 10:32 AM Andrew Dunstan  wrote:

#!/usr/bin/perl

use strict; use warnings;

use FFI::Platypus;

my $ffi = FFI::Platypus->new(api=>1);
$ffi->lib("inst/lib/libpq.so");


$ffi->type('opaque' => 'PGconn');
$ffi->attach(PQconnectdb => [ 'string' ] => 'PGconn');
$ffi->attach(PQfinish => [ 'PGconn' ] => 'void');

$ffi->type('opaque' => 'PGresult');
$ffi->attach(PQexec => [ 'PGconn', 'string' ] => 'PGresult');
$ffi->attach(PQgetvalue => [ 'PGresult', 'int', 'int' ] => 'string');

my $pgconn = PQconnectdb("dbname=postgres host=/tmp");
my $res = PQexec($pgconn, "select count(*) from pg_class");
my $count = PQgetvalue( $res, 0, 0);

print "count: $count\n";

PQfinish($pgconn);

It looks very promising so far.  How hard would it be for us to add
this dependency?  Mostly pinging build farm owners?

I'm still on the fence, but the more I know about IPC::Run, the better
the various let's-connect-directly-from-Perl options sound...



Here's some progress. I have put it all in a perl module, which I have 
tested on Windows (both mingw and MSVC) as well as Ubuntu. I think this 
is probably something worth having in itself. I wrapped a substantial 
portion of libpq, but left out things to do with large objects, async 
processing, pipelining, SSL and some other things. We can fill in the 
gaps in due course.


The test program now looks like this:

   use strict;
   use warnings;

   use lib ".";
   use PqFFI;

   PqFFI::setup("inst/lib");

   my $conn = PQconnectdb("dbname=postgres host=/tmp");
   my $res = PQexec($conn, 'select count(*) from pg_class');
   my $count = PQgetvalue($res,0,0);
   print "$count rows in pg_class\n";
   PQfinish($conn);

I guess the next thing would be to test it on a few more platforms and 
also to see if we need to expand the coverage of libpq for the intended 
uses.


I confess I'm a little reluctant to impose this burden on buildfarm 
owners. We should think about some sort of fallback in case this isn't 
supported on some platform, either due to technological barriers or 
buildfarm owner reluctance.



cheers


andrew


--
Andrew Dunstan
EDB:https://www.enterprisedb.com


PqFFI.pm
Description: Perl program


Re: Row pattern recognition

2023-09-02 Thread Erik Rijkers

On 9/2/23 08:52, Tatsuo Ishii wrote:


Attached is the v5 patch. Differences from previous patch include:



Hi,

The patches compile & tests run fine but this statement from the 
documentation crashes an assert-enabled server:


SELECT company, tdate, price, max(price) OVER w FROM stock
WINDOW w AS (
PARTITION BY company
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
AFTER MATCH SKIP PAST LAST ROW
INITIAL
PATTERN (LOWPRICE UP+ DOWN+)
DEFINE
LOWPRICE AS price <= 100,
UP AS price > PREV(price),
DOWN AS price < PREV(price)
);
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
connection to server was lost


Log file:

TRAP: failed Assert("aggregatedupto_nonrestarted <= 
winstate->aggregatedupto"), File: "nodeWindowAgg.c", Line: 1054, PID: 68975
postgres: 17_rpr_d0ec_gulo: aardvark testdb ::1(34808) 
SELECT(ExceptionalCondition+0x54)[0x9b0824]

postgres: 17_rpr_d0ec_gulo: aardvark testdb ::1(34808) SELECT[0x71ae8d]
postgres: 17_rpr_d0ec_gulo: aardvark testdb ::1(34808) 
SELECT(standard_ExecutorRun+0x13a)[0x6def9a]

/home/aardvark/pg_stuff/pg_installations/pgsql.rpr/lib/pg_stat_statements.so(+0x55e5)[0x7ff3798b95e5]
/home/aardvark/pg_stuff/pg_installations/pgsql.rpr/lib/auto_explain.so(+0x2680)[0x7ff3798ab680]
postgres: 17_rpr_d0ec_gulo: aardvark testdb ::1(34808) SELECT[0x88a4ff]
postgres: 17_rpr_d0ec_gulo: aardvark testdb ::1(34808) 
SELECT(PortalRun+0x240)[0x88bb50]

postgres: 17_rpr_d0ec_gulo: aardvark testdb ::1(34808) SELECT[0x887cca]
postgres: 17_rpr_d0ec_gulo: aardvark testdb ::1(34808) 
SELECT(PostgresMain+0x14dc)[0x88958c]

postgres: 17_rpr_d0ec_gulo: aardvark testdb ::1(34808) SELECT[0x7fb0da]
postgres: 17_rpr_d0ec_gulo: aardvark testdb ::1(34808) 
SELECT(PostmasterMain+0xd2d)[0x7fc01d]
postgres: 17_rpr_d0ec_gulo: aardvark testdb ::1(34808) 
SELECT(main+0x1e0)[0x5286d0]

/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xea)[0x7ff378e9dd0a]
postgres: 17_rpr_d0ec_gulo: aardvark testdb ::1(34808) 
SELECT(_start+0x2a)[0x5289aa]
2023-09-02 19:59:05.329 CEST 46723 LOG:  server process (PID 68975) was 
terminated by signal 6: Aborted
2023-09-02 19:59:05.329 CEST 46723 DETAIL:  Failed process was running: 
SELECT company, tdate, price, max(price) OVER w FROM stock

WINDOW w AS (
PARTITION BY company
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
AFTER MATCH SKIP PAST LAST ROW
INITIAL
PATTERN (LOWPRICE UP+ DOWN+)
DEFINE
LOWPRICE AS price <= 100,
UP AS price > PREV(price),
DOWN AS price < PREV(price)
);
2023-09-02 19:59:05.329 CEST 46723 LOG:  terminating any other active 
server processes




Erik Rijkers




Re: Add tracking of backend memory allocated to pg_stat_activity

2023-09-02 Thread Ted Yu
On Thu, Aug 31, 2023 at 9:19 AM John Morris 
wrote:

> Here is an updated version of the earlier work.
>
> This version:
>1) Tracks memory as requested by the backend.
>2) Includes allocations made during program startup.
>3) Optimizes the "fast path" to only update two local variables.
>4) Places a cluster wide limit on total memory allocated.
>
> The cluster wide limit is useful for multi-hosting. One greedy cluster
> doesn't starve
> the other clusters of memory.
>
> Note there isn't a good way to track actual memory used by a cluster.
> Ideally, we like to get the working set size of each memory segment along
> with
> the size of the associated kernel data structures.
> Gathering that info in a portable way is a "can of worms".
> Instead, we're managing memory as requested by the application.
> While not identical, the two approaches are strongly correlated.
>
>  The memory model used is
>1) Each process is assumed to use a certain amount of memory
>simply by existing.
>2) All pg memory allocations are counted, including those before
>the process is fully initialized.
> 3) Each process maintains its own local counters. These are the
> "truth".
>  4) Periodically,
> -  local counters are added into the global, shared memory
> counters.
>  - pgstats is updated
>  - total memory is checked.
>
> For efficiency, the global total is an approximation, not a precise number.
> It can be off by as much as 1 MB per process. Memory limiting
> doesn't need precision, just a consistent and reasonable approximation.
>
> Repeating the earlier benchmark test, there is no measurable loss of
> performance.
>

Hi,
In `InitProcGlobal`:

+elog(WARNING, "proc init: max_total=%d  result=%d\n",
max_total_bkend_mem, result);

Is the above log for debugging purposes? Maybe the log level should be
changed.

+
errmsg("max_total_backend_memory %dMB - shared_memory_size %dMB is <=
100MB",

The `<=` in the error message is inconsistent with the `max_total_bkend_mem
< result + 100` check.
Please modify one of them.

For update_global_allocation :

+
Assert((int64)pg_atomic_read_u64(>total_bkend_mem_bytes) >= 0);
+
Assert((int64)pg_atomic_read_u64(>global_dsm_allocation) >= 0);

Should the assertions be done earlier in the function?

For reserve_backend_memory:

+   return true;
+
+   /* CASE: the new allocation is within bounds. Take the fast path. */
+   else if (my_memory.allocated_bytes + size <= allocation_upper_bound)

The `else` can be omitted (the preceding if block returns).

For `atomic_add_within_bounds_i64`

+   newval = oldval + add;
+
+   /* check if we are out of bounds */
+   if (newval < lower_bound || newval > upper_bound ||
addition_overflow(oldval, add))

Since the summation is stored in `newval`, you can pass newval to
`addition_overflow` so that `addition_overflow` doesn't add them again.

There are debug statements, such as:

+   debug("done\n");

you can drop them in the next patch.
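
Regarding the atomic_add_within_bounds_i64 comment above, here is a minimal
sketch of the idea of computing the sum only once (pg_add_s64_overflow() is
the existing helper from src/include/common/int.h; the function itself is
hypothetical, not the patch's code):

#include "postgres.h"
#include "common/int.h"

/* hypothetical sketch: compute the sum once, reuse it for the bounds check */
static bool
add_within_bounds_i64(int64 oldval, int64 add,
                      int64 lower_bound, int64 upper_bound,
                      int64 *newval)
{
    /* pg_add_s64_overflow() returns true on overflow, else stores the sum */
    if (pg_add_s64_overflow(oldval, add, newval))
        return false;

    return *newval >= lower_bound && *newval <= upper_bound;
}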

Thanks


Re: Impact of checkpointer during pg_upgrade

2023-09-02 Thread Dilip Kumar
On Sat, Sep 2, 2023 at 10:09 AM Amit Kapila  wrote:
>
> During pg_upgrade, we start the server for the old cluster which can
> allow the checkpointer to remove the WAL files. It has been noticed
> that we do generate certain types of WAL records (e.g
> XLOG_RUNNING_XACTS, XLOG_CHECKPOINT_ONLINE, and XLOG_FPI_FOR_HINT)
> even during pg_upgrade for old cluster, so additional WAL records
> could let checkpointer decide that certain WAL segments can be removed
> (e.g. say wal size crosses max_slot_wal_keep_size_mb) and invalidate
> the slots. Currently, I can't see any problem with this but for future
> work where we want to migrate logical slots during an upgrade[1], we
> need to decide what to do for such cases. The initial idea we had was
> that if the old cluster has some invalid slots, we won't allow an
> upgrade unless the user removes such slots or uses some option like
> --exclude-slots. It is quite possible that slots got invalidated
> during pg_upgrade due to no user activity. Now, even though the
> possibility of the same is less I think it is worth considering what
> should be the behavior.

Right

> The other possibilities apart from not allowing an upgrade in such a
> case could be (a) Before starting the old cluster, we fetch the slots
> directly from the disk using some tool like [2] and make the decisions
> based on that state;

Okay, so IIUC along with dumping the slot data we also need to dump
the latest checkpoint LSN, because during upgrade we check that the
confirmed flush LSN of all the slots is the same as the latest
checkpoint.  Yeah, but I think we could work this out.

> (b) During the upgrade, we don't allow WAL to be
> removed if it can invalidate slots; (c) Copy/Migrate the invalid slots
> as well but for that, we need to expose an API to invalidate the
> slots;

> (d) somehow distinguish the slots that are invalidated during
> an upgrade and then simply copy such slots because anyway we ensure
> that all the WAL required by slot is sent before shutdown.

Yeah, this could also be an option, although we need to make sure the
mechanism for distinguishing those slots looks clean and fits well with
the rest of the architecture.

Alternatively, can't we just ignore all the invalidated slots and not
migrate them at all?  Such scenarios, where some of the segments get
dropped right during the upgrade time, and from the old cluster at that,
are very rare, so in such cases not migrating the invalidated slots
should be fine, no?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Avoid a possible null pointer (src/backend/utils/adt/pg_locale.c)

2023-09-02 Thread Ranier Vilela
On Fri, Sep 1, 2023 at 17:17, Robert Haas 
wrote:

> On Fri, Sep 1, 2023 at 11:47 AM Ranier Vilela  wrote:
> > If a null locale is reached in these paths.
> > elog will dereference a null pointer.
>
> True. That's sloppy coding.
>
> I don't know enough about this code to be sure whether the error
> messages that you propose are for the best.
>
I tried to keep the same behavior as before.
Note that if the locale provider equals COLLPROVIDER_LIBC,
the message to the user will be the same.

best regards,
Ranier Vilela


Re: Incremental View Maintenance, take 2

2023-09-02 Thread Tatsuo Ishii
> attached is my refactor. there is some whitespace errors in the
> patches, you need use
> git apply --reject --whitespace=fix
> basedon_v29_matview_c_refactor_update_set_clause.patch
> 
> Also you patch cannot use git apply, i finally found out bulk apply

I have no problem with applying Yugo's v29 patches using git apply, no
white space errors.

$ git apply ~/v29*

(the patches are saved under my home directory).

I suggest you check whether your email application correctly
saved the patch files for you.

FYI, here are results from sha256sum:

ffac37cb455788c1105ffc01c6b606de75f53321c2f235f7efa19f3f52d12b9e  
v29-0001-Add-a-syntax-to-create-Incrementally-Maintainabl.patch
f684485e7c9ac1b2990943a3c73fa49a9091a268917547d9e116baef5118cca7  
v29-0002-Add-relisivm-column-to-pg_class-system-catalog.patch
fcf5bc8ae562ed1c2ab397b499544ddab03ad2c3acb2263d0195a3ec799b131c  
v29-0003-Allow-to-prolong-life-span-of-transition-tables-.patch
a7a13ef8e73c4717166db079d5607f78d21199379de341a0e8175beef5ea1c1a  
v29-0004-Add-Incremental-View-Maintenance-support-to-pg_d.patch
a2aa51d035774867bfab1580ef14143998dc71c1b941bd1a3721dc019bc62649  
v29-0005-Add-Incremental-View-Maintenance-support-to-psql.patch
fe0225d761a08eb80082f1a2c039b9b8b20626169b03abaf649db9c74fe99194  
v29-0006-Add-Incremental-View-Maintenance-support.patch
68b007befedcf92fc83ab8c3347ac047a50816f061c77b69281e12d52944db82  
v29-0007-Add-DISTINCT-support-for-IVM.patch
2201241a22095f736a17383fc8b26d48a459ebf1c2f5cf120896cfc0ce5e03e4  
v29-0008-Add-aggregates-support-in-IVM.patch
6390117c559bf1585349c5a09b77b784e086ccc22eb530cd364ce78371c66741  
v29-0009-Add-support-for-min-max-aggregates-for-IVM.patch
7019a116c64127783bd9c682ddf1ee3792286d0e41c91a33010111e7be2c9459  
v29-0010-Add-regression-tests-for-Incremental-View-Mainte.patch
189afdc7da866bd958e2d554ba12adf93d7e6d0acb581290a48d72fcf640e243  
v29-0011-Add-documentations-about-Incremental-View-Mainte.patch

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp




Re: generate syscache info automatically

2023-09-02 Thread John Naylor
I wrote:

> + # XXX This one neither, but if I add it to @skip, PerfectHash will
> fail. (???)
> + #FIXME: AttributeRelationId
>
> I took a quick look at this, and attached is the least invasive way to
> get it working for now, which is to bump the table size slightly. The
> comment says doing this isn't reliable, but it happens to work in this
> case. Playing around with the functions is hit-or-miss, and when that
> fails, somehow the larger table saves the day.

To while away a rainy day, I poked at this a bit more and found the input
is pathological with our current methods. Even with a large-ish exhaustive
search, the two successes are strange in that they only succeeded by
accidentally bumping the table size up to where I got it to work before
(77):

With multipliers (5, 19), it recognizes that the initial table size (75) is
a multiple of 5, so increases the table size to 76, which is a multiple of
19, so it increases it again to 77 and succeeds.

Same with (3, 76): 75 is a multiple of 3, so up to 76, which of course
divides 76, so bumps it to 77 likewise.

Turning the loop into

a = (a ^ c) * 257;
b = (b ^ c) * X;

...seems to work very well.

In fact, now trying some powers-of-two for X before the primes works most
of the time with our inputs, even for some unicode normalization functions,
on the first seed iteration. That likely won't make any difference in
practice, but it's an interesting demo. I've attached these two draft ideas
as text.
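
For illustration, here is a self-contained sketch of the rough shape a
generated function takes under the XOR-then-multiply scheme; the multipliers,
table size, and table contents below are placeholders, not values the script
would actually emit:

#include <stddef.h>
#include <stdint.h>

/* Sketch only: PerfectHash.pm computes the real multipliers and table. */
static int
example_hash(const void *key, size_t keylen)
{
    static const int16_t h[77] = {0};   /* filled in by the generator */
    const unsigned char *k = (const unsigned char *) key;
    uint32_t    a = 0;
    uint32_t    b = 0;

    while (keylen--)
    {
        unsigned char c = *k++;

        /* combine with XOR first, then mix by multiplication */
        a = (a ^ c) * 257;
        b = (b ^ c) * 8192;     /* a power of two reduces to a shift */
    }
    return h[a % 77] + h[b % 77];
}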

--
John Naylor
EDB: http://www.enterprisedb.com
From c72fa746108ed853f0219b10e7368b6e833e5fc5 Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Sat, 2 Sep 2023 16:25:23 +0700
Subject: [PATCH v301 1/2] Use XOR for combining and do it before mixing

Previously, we mixed first and then combined the input
character via addition. This failed on a small set of
OIDs, per report from Peter Eisentraut.
---
 src/tools/PerfectHash.pm | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/tools/PerfectHash.pm b/src/tools/PerfectHash.pm
index e54905a3ef..0d6826141f 100644
--- a/src/tools/PerfectHash.pm
+++ b/src/tools/PerfectHash.pm
@@ -144,8 +144,8 @@ sub generate_hash_function
$f .= sprintf "\t\tunsigned char c = *k++";
$f .= sprintf " | 0x20" if $case_fold; # see comment 
below
$f .= sprintf ";\n\n";
-   $f .= sprintf "\t\ta = a * %d + c;\n", $hash_mult1;
-   $f .= sprintf "\t\tb = b * %d + c;\n", $hash_mult2;
+   $f .= sprintf "\t\ta = (a ^ c) * %d;\n", $hash_mult1;
+   $f .= sprintf "\t\tb = (b ^ c) * %d;\n", $hash_mult2;
$f .= sprintf "\t}\n";
$f .= sprintf "\treturn h[a %% %d] + h[b %% %d];\n", $nhash, $nhash;
$f .= sprintf "}\n";
@@ -171,7 +171,7 @@ sub _calc_hash
{
my $cn = ord($c);
$cn |= 0x20 if $case_fold;
-   $result = ($result * $mult + $cn) % 4294967296;
+   $result = (($result ^ $cn) * $mult) % 4294967296;
}
return $result;
 }
@@ -203,8 +203,8 @@ sub _construct_hash_table
my $nverts = 2 * $nedges + 1;# number of vertices
 
# However, it would be very bad if $nverts were exactly equal to either
-   # $hash_mult1 or $hash_mult2: effectively, that hash function would be
-   # sensitive to only the last byte of each key.  Cases where $nverts is a
+   # $hash_mult1 or $hash_mult2: the corresponding hash function would
+   # always have a modulus of zero.  Cases where $nverts is a
# multiple of either multiplier likewise lose information.  (But $nverts
# can't actually divide them, if they've been intelligently chosen as
# primes.)  We can avoid such problems by adjusting the table size.
-- 
2.41.0

From a407ee7138c597403be5e0521416e4bf5a9c2e46 Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Sat, 2 Sep 2023 16:07:53 +0700
Subject: [PATCH v301 2/2] Use powers of two when choosing multipliers for
 perfect hash functions

---
 src/tools/PerfectHash.pm | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/src/tools/PerfectHash.pm b/src/tools/PerfectHash.pm
index 0d6826141f..dc5a9b4427 100644
--- a/src/tools/PerfectHash.pm
+++ b/src/tools/PerfectHash.pm
@@ -77,8 +77,10 @@ sub generate_hash_function
$case_fold = $options{case_fold} || 0;
 
# Try different hash function parameters until we find a set that works
-   # for these keys.  The multipliers are chosen to be primes that are 
cheap
-   # to calculate via shift-and-add, so don't change them without care.
+   # for these keys.  The multipliers are chosen to be numbers that are 
cheap
+   # to calculate, so don't change them without care.
+   # Currently they are either powers-of-two (which reduce to a shift),
+   # or adjacent primes (which reduce to shift-and-add).
# (Commonly, random seeds are tried, but we want reproducible results
# from this program so we don't do that.)
my 

Re: Row pattern recognition

2023-09-02 Thread Tatsuo Ishii
Attached is the v5 patch. Differences from previous patch include:

* Resolve complaint from "PostgreSQL Patch Tester"
  https://commitfest.postgresql.org/44/4460/

- Change gram.y to use PATTERN_P instead of PATTERN. Using PATTERN seems
  to cause trouble with the Visual Studio build.

:
:
[10:07:57.853] FAILED: 
src/backend/parser/parser.a.p/meson-generated_.._gram.c.obj 
[10:07:57.853] "cl" "-Isrc\backend\parser\parser.a.p" "-Isrc\backend\parser" 
"-I..\src\backend\parser" "-Isrc\include" "-I..\src\include" 
"-Ic:\openssl\1.1\include" "-I..\src\include\port\win32" 
"-I..\src\include\port\win32_msvc" "/MDd" "/nologo" "/showIncludes" "/utf-8" 
"/W2" "/Od" "/Zi" "/DWIN32" "/DWINDOWS" "/D__WINDOWS__" "/D__WIN32__" 
"/D_CRT_SECURE_NO_DEPRECATE" "/D_CRT_NONSTDC_NO_DEPRECATE" "/wd4018" "/wd4244" 
"/wd4273" "/wd4101" "/wd4102" "/wd4090" "/wd4267" "-DBUILDING_DLL" 
"/Fdsrc\backend\parser\parser.a.p\meson-generated_.._gram.c.pdb" 
/Fosrc/backend/parser/parser.a.p/meson-generated_.._gram.c.obj "/c" 
src/backend/parser/gram.c
[10:07:57.860] c:\cirrus\build\src\backend\parser\gram.h(379): error C2365: 
'PATTERN': redefinition; previous definition was 'typedef'
[10:07:57.860] C:\Program Files (x86)\Windows 
Kits\10\include\10.0.20348.0\um\wingdi.h(1375): note: see declaration of 
'PATTERN'
[10:07:57.860] c:\cirrus\build\src\backend\parser\gram.h(379): error C2086: 
'yytokentype PATTERN': redefinition
[10:07:57.860] c:\cirrus\build\src\backend\parser\gram.h(379): note: see 
declaration of 'PATTERN'
[10:07:57.860] ninja: build stopped: subcommand failed.

* Resolve complaint from "make headerscheck"

- Change Windowapi.h and nodeWindowAgg.c to remove unnecessary extern
  and public functions.
  
Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp
>From 3e02bccbd3dc02304d6dc34f5ab6cc6dd2ee26d1 Mon Sep 17 00:00:00 2001
From: Tatsuo Ishii 
Date: Sat, 2 Sep 2023 15:32:49 +0900
Subject: [PATCH v5 1/7] Row pattern recognition patch for raw parser.

---
 src/backend/parser/gram.y   | 216 +---
 src/include/nodes/parsenodes.h  |  56 +
 src/include/parser/kwlist.h |   8 ++
 src/include/parser/parse_node.h |   1 +
 4 files changed, 267 insertions(+), 14 deletions(-)

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 7d2032885e..70409cdc9a 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -251,6 +251,8 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	DefElem	   *defelt;
 	SortBy	   *sortby;
 	WindowDef  *windef;
+	RPCommonSyntax	*rpcom;
+	RPSubsetItem	*rpsubset;
 	JoinExpr   *jexpr;
 	IndexElem  *ielem;
 	StatsElem  *selem;
@@ -453,8 +455,12 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 TriggerTransitions TriggerReferencing
 vacuum_relation_list opt_vacuum_relation_list
 drop_option_list pub_obj_list
-
-%type 	opt_routine_body
+row_pattern_measure_list row_pattern_definition_list
+opt_row_pattern_subset_clause
+row_pattern_subset_list row_pattern_subset_rhs
+row_pattern
+%type 	 row_pattern_subset_item
+%type 	opt_routine_body row_pattern_term
 %type  group_clause
 %type 	group_by_list
 %type 	group_by_item empty_grouping_set rollup_clause cube_clause
@@ -551,6 +557,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type 	relation_expr_opt_alias
 %type 	tablesample_clause opt_repeatable_clause
 %type 	target_el set_target insert_column_item
+row_pattern_measure_item row_pattern_definition
 
 %type 		generic_option_name
 %type 	generic_option_arg
@@ -633,6 +640,9 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 %type 	window_clause window_definition_list opt_partition_clause
 %type 	window_definition over_clause window_specification
 opt_frame_clause frame_extent frame_bound
+%type 	opt_row_pattern_common_syntax opt_row_pattern_skip_to
+%type 	opt_row_pattern_initial_or_seek
+%type 	opt_row_pattern_measures
 %type 	opt_window_exclusion_clause
 %type 		opt_existing_window_name
 %type  opt_if_not_exists
@@ -659,7 +669,6 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 json_object_constructor_null_clause_opt
 json_array_constructor_null_clause_opt
 
-
 /*
  * Non-keyword token types.  These are hard-wired into the "flex" lexer.
  * They must be listed first so that their numeric codes do not depend on
@@ -702,7 +711,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query);
 	CURRENT_TIME CURRENT_TIMESTAMP CURRENT_USER CURSOR CYCLE
 
 	DATA_P DATABASE DAY_P DEALLOCATE DEC DECIMAL_P DECLARE DEFAULT DEFAULTS
-	DEFERRABLE DEFERRED DEFINER DELETE_P DELIMITER DELIMITERS DEPENDS DEPTH DESC
+	DEFERRABLE DEFERRED DEFINE DEFINER DELETE_P DELIMITER DELIMITERS DEPENDS DEPTH DESC
 	DETACH DICTIONARY DISABLE_P DISCARD DISTINCT DO DOCUMENT_P