Re: partitioning and identity column

2024-01-23 Thread Ashutosh Bapat
On Wed, Jan 24, 2024 at 12:04 PM Ashutosh Bapat
 wrote:
> >
> > Ok.  Maybe just rephrase that comment somehow then?
>
> Please see refactoring patches attached to [1]. Refactoring that way
> makes it unnecessary to mention "regular inheritance" in each comment.
> Yet I have included a modified version of the comment in that patch
> set.

Sorry forgot to add the reference. Here it is.

[1] 
https://www.postgresql.org/message-id/CAExHW5vz7A-skzt05=4frFx9-VPjfjK4jKQZT7ufRNh4J7=x...@mail.gmail.com

-- 
Best Wishes,
Ashutosh Bapat




Re: [PoC] Improve dead tuple storage for lazy vacuum

2024-01-23 Thread John Naylor
On Tue, Jan 23, 2024 at 10:58 AM Masahiko Sawada  wrote:
>
> The new patches probably need to be polished but the VacDeadItemInfo
> idea looks good to me.

That idea looks good to me, too. Since you already likely know what
you'd like to polish, I don't have much to say except for a few
questions below. I also did a quick sweep through every patch, so some
of these comments are unrelated to recent changes:

v55-0003:

+size_t
+dsa_get_total_size(dsa_area *area)
+{
+ size_t size;
+
+ LWLockAcquire(DSA_AREA_LOCK(area), LW_SHARED);
+ size = area->control->total_segment_size;
+ LWLockRelease(DSA_AREA_LOCK(area));

I looked and found dsa.c doesn't already use shared locks in HEAD,
even dsa_dump. Not sure why that is...

+/*
+ * Calculate the slab blocksize so that we can allocate at least 32 chunks
+ * from the block.
+ */
+#define RT_SLAB_BLOCK_SIZE(size) \
+ Max((SLAB_DEFAULT_BLOCK_SIZE / (size)) * (size), (size) * 32)

The first parameter seems to be trying to make the block size exact,
but that's not right, because of the chunk header, and maybe
alignment. If the default block size is big enough to waste only a
tiny amount of space, let's just use that as-is. Also, I think all
block sizes in the code base have been a power of two, but I'm not
sure how much that matters.

+#ifdef RT_SHMEM
+ fprintf(stderr, "  [%d] chunk %x slot " DSA_POINTER_FORMAT "\n",
+ i, n4->chunks[i], n4->children[i]);
+#else
+ fprintf(stderr, "  [%d] chunk %x slot %p\n",
+ i, n4->chunks[i], n4->children[i]);
+#endif

Maybe we could invent a child pointer format, so we only #ifdef in one place.

--- /dev/null
+++ b/src/test/modules/test_radixtree/meson.build
@@ -0,0 +1,35 @@
+# FIXME: prevent install during main install, but not during test :/

Can you look into this?

test_radixtree.c:

+/*
+ * XXX: should we expose and use RT_SIZE_CLASS and RT_SIZE_CLASS_INFO?
+ */
+static int rt_node_class_fanouts[] = {
+ 4, /* RT_CLASS_3 */
+ 15, /* RT_CLASS_32_MIN */
+ 32, /* RT_CLASS_32_MAX */
+ 125, /* RT_CLASS_125 */
+ 256 /* RT_CLASS_256 */
+};

These numbers have been wrong a long time, too, but only matters for
figuring out where it went wrong when something is broken. And for the
XXX, instead of trying to use the largest number that should fit (it's
obviously not testing that the expected node can actually hold that
number anyway), it seems we can just use a "big enough" number to
cause growing into the desired size class.

As far as cleaning up the tests, I always wondered why these didn't
use EXPECT_TRUE, EXPECT_FALSE, etc. as in Andres's prototype where
where convenient, and leave comments above the tests. That seemed like
a good idea to me -- was there a reason to have hand-written branches
and elog messages everywhere?

--- a/src/tools/pginclude/cpluspluscheck
+++ b/src/tools/pginclude/cpluspluscheck
@@ -101,6 +101,12 @@ do
  test "$f" = src/include/nodes/nodetags.h && continue
  test "$f" = src/backend/nodes/nodetags.h && continue

+ # radixtree_*_impl.h cannot be included standalone: they are just
code fragments.
+ test "$f" = src/include/lib/radixtree_delete_impl.h && continue
+ test "$f" = src/include/lib/radixtree_insert_impl.h && continue
+ test "$f" = src/include/lib/radixtree_iter_impl.h && continue
+ test "$f" = src/include/lib/radixtree_search_impl.h && continue

Ha! I'd forgotten about these -- they're long outdated.

v55-0005:

- * The radix tree is locked in shared mode during the iteration, so
- * RT_END_ITERATE needs to be called when finished to release the lock.
+ * The caller needs to acquire a lock in shared mode during the iteration
+ * if necessary.

"need if necessary" is maybe better phrased as "is the caller's responsibility"

+ /*
+ * We can rely on DSA_AREA_LOCK to get the total amount of DSA memory.
+ */
  total = dsa_get_total_size(tree->dsa);

Maybe better to have a header comment for RT_MEMORY_USAGE that the
caller doesn't need to take a lock.

v55-0006:

"WIP: Not built, since some benchmarks have broken" -- I'll work on
this when I re-run some benchmarks.

v55-0007:

+ * Internally, a tid is encoded as a pair of 64-bit key and 64-bit value,
+ * and stored in the radix tree.

This hasn't been true for a few months now, and I thought we fixed
this in some earlier version?

+ * TODO: The caller must be certain that no other backend will attempt to
+ * access the TidStore before calling this function. Other backend must
+ * explicitly call TidStoreDetach() to free up backend-local memory associated
+ * with the TidStore. The backend that calls TidStoreDestroy() must not call
+ * TidStoreDetach().

Do we need to do anything now?

v55-0008:

-TidStoreAttach(dsa_area *area, TidStoreHandle handle)
+TidStoreAttach(dsa_area *area, dsa_pointer rt_dp)

"handle" seemed like a fine name. Is that not the case anymore? The
new one is kind of cryptic. The commit message just says "remove
control object" -- does that imply that we need to think of this
parameter differently, or is it unrelated? (Same with
dead_items_handl

Re: partitioning and identity column

2024-01-23 Thread Ashutosh Bapat
On Tue, Jan 23, 2024 at 12:29 AM Peter Eisentraut  wrote:
>
> On 22.01.24 13:23, Ashutosh Bapat wrote:
> >>   if (newdef->identity)
> >>   {
> >>   Assert(!is_partioning);
> >>   /*
> >>* Identity is never inherited.  The new column can have an
> >>* identity definition, so we always just take that one.
> >>*/
> >>   def->identity = newdef->identity;
> >>   }
> >>
> >> Thoughts?
> >
> > That code block already has Assert(!is_partition) at line 3085. I
> > thought that Assert is enough.
>
> Ok.  Maybe just rephrase that comment somehow then?

Please see refactoring patches attached to [1]. Refactoring that way
makes it unnecessary to mention "regular inheritance" in each comment.
Yet I have included a modified version of the comment in that patch
set.

>
> > There's another thing I found. The file isn't using
> > check_stack_depth() in the function which traverse inheritance
> > hierarchies. This isn't just a problem of the identity related
> > function but most of the functions in that file. Do you think it's
> > worth fixing it?
>
> I suppose the number of inheritance levels is usually not a problem for
> stack depth?
>

Practically it should not. I would rethink the application design if
it requires so many inheritance or partition levels. But functions in
optimizer like try_partitionwise_join() and set_append_rel_size() call

/* Guard against stack overflow due to overly deep inheritance tree. */
check_stack_depth();

I am fine if we want to skip this.

-- 
Best Wishes,
Ashutosh Bapat




Re: tablecmds.c/MergeAttributes() cleanup

2024-01-23 Thread Ashutosh Bapat
Hi Peter,

On Mon, Jan 22, 2024 at 6:13 PM Peter Eisentraut  wrote:
>
> On 06.12.23 09:23, Peter Eisentraut wrote:
> > The (now) second patch is also still of interest to me, but I have since
> > noticed that I think [0] should be fixed first, but to make that fix
> > simpler, I would like the first patch from here.
> >
> > [0]:
> > https://www.postgresql.org/message-id/flat/24656cec-d6ef-4d15-8b5b-e8dfc9c833a7%40eisentraut.org
>
> The remaining patch in this series needed a rebase and adjustment.
>
> The above precondition still applies.

While working on identity support and now while looking at the
compression problem you referred to, I found MergeAttribute() to be
hard to read. It's hard to follow high level logic in that function
since the function is not modular. I took a stab at modularising a
part of it. Attached is the resulting patch series.

0001 is your patch as is
0002 is pgindent fix and also fixing what I think is a typo/thinko
from 0001. If you are fine with the changes, 0002 should be merged
into 0003.
0003 separates the part of code merging a child attribute to the
corresponding inherited attribute into its own function.
0004 does the same for code merging inherited attributes incrementally.

I have kept 0003 and 0004 separate in case we pick one and not the
other. But they can be committed as a single commit.

The two new functions have some common code and some differences.
Common code is not surprising since merging attributes whether from
child definition or from inheritance parents will have common rules.
Differences are expected in cases when child attributes need to be
treated differently. But the differences may point us to some
yet-unknown bugs; compression being one of those differences. I think
the next step should combine these functions into one so that all the
logic to merge one attribute is at one place. I haven't attempted it;
wanted to propose the idea first.

I can see that this moduralization will lead to another and we will be
able to reduce MergeAttribute() to a set of function calls reflecting
its high level logic and push the detailed implementation into minion
functions like this.

-- 
Best Wishes,
Ashutosh Bapat
From 2553d082b16db2c54f2e1e4a66f1a57ecabf3c1e Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat 
Date: Wed, 24 Jan 2024 11:15:15 +0530
Subject: [PATCH 2/4] Mark NULL constraint in merged definition instead of new
 definition

This is a typo/thinko in previous commit.

Also fix pgindent issue.

Ashutosh Bapat
---
 src/backend/commands/tablecmds.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 04e674bbda..cde524083e 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -2726,8 +2726,8 @@ MergeAttributes(List *columns, const List *supers, char relpersistence,
 
 			/*
 			 * Regular inheritance children are independent enough not to
-			 * inherit identity columns.  But partitions are integral part
-			 * of a partitioned table and inherit identity column.
+			 * inherit identity columns.  But partitions are integral part of
+			 * a partitioned table and inherit identity column.
 			 */
 			if (is_partition)
 newdef->identity = attribute->attidentity;
@@ -2844,7 +2844,7 @@ MergeAttributes(List *columns, const List *supers, char relpersistence,
 if (bms_is_member(parent_attno, nncols) ||
 	bms_is_member(parent_attno - FirstLowInvalidHeapAttributeNumber,
   pkattrs))
-	newdef->is_not_null = true;
+	prevdef->is_not_null = true;
 
 /*
  * Check for GENERATED conflicts
-- 
2.25.1

From 092828a7d7274b48bf33c88863a97585bff80ebb Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat 
Date: Tue, 23 Jan 2024 17:00:43 +0530
Subject: [PATCH 4/4] Separate function to merge next parent attribute

Partition inherit from only a single parent.  The logic to merge an
attribute from the next parent into the corresponding attribute
inherited from previous parents in MergeAttribute() is only applicable
to regular inheritance children.  This code is isolated enough that it
can be separate into a function by itself.  This separation makes
MergeAttributes() more readable making it easy to follow high level
logic without getting entangled into details.

This separation revealed that the code handling NOT NULL constraints is
duplicated in blocks merging the attribute definition incrementally.
Deduplicate that code.

Author: Ashutosh Bapat
---
 src/backend/commands/tablecmds.c | 352 ---
 1 file changed, 178 insertions(+), 174 deletions(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index 843cc3bff6..3f0066b435 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -360,6 +360,8 @@ static List *MergeAttributes(List *columns, const List *supers, char relpersiste
 			 List **supnotnulls);
 static List *MergeCheckConstraint(List *co

Re: speed up a logical replica setup

2024-01-23 Thread Shubham Khanna
On Tue, Jan 23, 2024 at 7:41 AM Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear hackers,
>
> > We fixed some of the comments posted in the thread. We have created it
> > as top-up patch 0002 and 0003.
>
> I found that the CFbot raised an ERROR. Also, it may not work well in case
> of production build.
> PSA the fixed patch set.

 Segmentation fault was found after testing the given command(There is
an extra '/' between 'new_standby2' and '-P') '$ gdb --args
./pg_subscriber -D ../new_standby2 / -P "host=localhost
port=5432 dbname=postgres" -d postgres'
While executing the above command, I got the following error:
pg_subscriber: error: too many command-line arguments (first is "/")
pg_subscriber: hint: Try "pg_subscriber --help" for more information.

Program received signal SIGSEGV, Segmentation fault.
0x7e5b in cleanup_objects_atexit () at pg_subscriber.c:173
173if (perdb->made_subscription)
(gdb) p perdb
$1 = (LogicalRepPerdbInfo *) 0x0

Thanks and Regards,
Shubham Khanna.




Re: Synchronizing slots from primary to standby

2024-01-23 Thread Masahiko Sawada
On Wed, Jan 24, 2024 at 2:43 PM Amit Kapila  wrote:
>
> On Wed, Jan 24, 2024 at 10:41 AM Masahiko Sawada  
> wrote:
> >
> > On Mon, Jan 22, 2024 at 3:58 PM Amit Kapila  wrote:
> > >
> > > Can we think of using GetStandbyFlushRecPtr()? We probably need to
> > > expose this function, if this works for the required purpose.
> >
> > GetStandbyFlushRecPtr() seems good. But do we really want to raise an
> > ERROR in this case? IIUC this case could happen often when the slot
> > used by the standby is not listed in standby_slot_names.
> >
>
> or it can be due to some bug in the code as well.
>
> > I think we
> > can just skip such a slot to synchronize and check it the next time.
> >
>
> How about logging the message and then skipping the sync step? This
> will at least make users aware that they could be missing to set
> standby_slot_names.

+1

>
> > Here are random comments on slotsyncworker.c (v66):
> >
> > +/* GUC variable */
> > +bool   enable_syncslot = false;
> >
> > Is enable_syncslot a really good name? We use "enable" prefix only for
> > planner parameters such as enable_seqscan, and it seems to me that
> > "slot" is not specific. Other candidates are:
> >
> > * synchronize_replication_slots = on|off
> > * synchronize_failover_slots = on|off
> >
>
> I would prefer the second one. Would it be better to just say
> sync_failover_slots?

Works for me. But if we want to extend this option for non-failover
slots as well in the future, synchronize_replication_slots (or
sync_replication_slots) seems better. We can extend it by having an
enum later. For example, the values can be on, off, or failover etc.

Regards,

-- 
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com




Re: Make COPY format extendable: Extract COPY TO format implementations

2024-01-23 Thread Sutou Kouhei
Hi,

I've implemented custom COPY format feature based on the
current design discussion. See the attached patches for
details.

I also implemented a PoC COPY format handler for Apache
Arrow with this implementation and it worked.
https://github.com/kou/pg-copy-arrow

The patches implement not only custom COPY TO format feature
but also custom COPY FROM format feature.

0001-0004 is for COPY TO and 0005-0008 is for COPY FROM.

For COPY TO:

0001: This adds CopyToRoutine and use it for text/csv/binary
formats. No implementation change. This just move codes.

0002: This adds support for adding custom COPY TO format by
"CREATE FUNCTION ${FORMAT_NAME}". This uses the same
approach provided by Sawada-san[1] but this doesn't
introduce a wrapper CopyRoutine struct for
Copy{To,From}Routine. Because I noticed that a wrapper
CopyRoutine struct is needless. Copy handler can just return
CopyToRoutine or CopyFromRtouine because both of them have
NodeTag. We can distinct a returned struct by the NodeTag.

[1] 
https://www.postgresql.org/message-id/cad21aocunywhird3gapzwe6s9jg1wzxj3cr6vgn36ddhegj...@mail.gmail.com

0003: This exports CopyToStateData. No implementation change
except CopyDest enum values. I changed COPY_ prefix to
COPY_DEST_ to avoid name conflict with CopySource enum
values. This just moves codes.

0004: This adds CopyToState::opaque and exports
CopySendEndOfRow(). CopySendEndOfRow() is renamed to
CopyToStateFlush().

For COPY FROM:

0005: Same as 0001 but for COPY FROM. This adds
CopyFromRoutine and use it for text/csv/binary formats. No
implementation change. This just move codes.

0006: Same as 0002 but for COPY FROM. This adds support for
adding custom COPY FROM format by "CREATE FUNCTION
${FORMAT_NAME}".

0007: Same as 0003 but for COPY FROM. This exports
CopyFromStateData. No implementation change except
CopySource enum values. I changed COPY_ prefix to
COPY_SOURCE_ to align CopyDest changes in 0003. This just
moves codes.

0008: Same as 0004 but for COPY FROM. This adds
CopyFromState::opaque and exports
CopyReadBinaryData(). CopyReadBinaryData() is renamed to
CopyFromStateRead().


Thanks,
-- 
kou

In <20240115.152702.2011620917962812379@clear-code.com>
  "Re: Make COPY format extendable: Extract COPY TO format implementations" on 
Mon, 15 Jan 2024 15:27:02 +0900 (JST),
  Sutou Kouhei  wrote:

> Hi,
> 
> If there are no more comments for the current design, I'll
> start implementing this feature with the following
> approaches for "Discussing" items:
> 
>> 3.1 Should we use one function(internal) for COPY TO/FROM
>> handlers or two function(internal)s (one is for COPY TO
>> handler and another is for COPY FROM handler)?
>> [4]
> 
> I'll choose "one function(internal) for COPY TO/FROM handlers".
> 
>> 3.4 Should we export Copy{To,From}State? Or should we just
>> provide getters/setters to access Copy{To,From}State
>> internal?
>> [10]
> 
> I'll export Copy{To,From}State.
> 
> 
> Thanks,
> -- 
> kou
> 
> In <20240112.144615.157925223373344229@clear-code.com>
>   "Re: Make COPY format extendable: Extract COPY TO format implementations" 
> on Fri, 12 Jan 2024 14:46:15 +0900 (JST),
>   Sutou Kouhei  wrote:
> 
>> Hi,
>> 
>> Here is the current summary for a this discussion to make
>> COPY format extendable. It's for reaching consensus and
>> starting implementing the feature. (I'll start implementing
>> the feature once we reach consensus.) If you have any
>> opinion, please share it.
>> 
>> Confirmed:
>> 
>> 1.1 Making COPY format extendable will not reduce performance.
>> [1]
>> 
>> Decisions:
>> 
>> 2.1 Use separated handler for COPY TO and COPY FROM because
>> our COPY TO implementation (copyto.c) and COPY FROM
>> implementation (coypfrom.c) are separated.
>> [2]
>> 
>> 2.2 Don't use system catalog for COPY TO/FROM handlers. We can
>> just use a function(internal) that returns a handler instead.
>> [3]
>> 
>> 2.3 The implementation must include documentation.
>> [5]
>> 
>> 2.4 The implementation must include test.
>> [6]
>> 
>> 2.5 The implementation should be consist of small patches
>> for easy to review.
>> [6]
>> 
>> 2.7 Copy{To,From}State must have a opaque space for
>> handlers.
>> [8]
>> 
>> 2.8 Export CopySendData() and CopySendEndOfRow() for COPY TO
>> handlers.
>> [8]
>> 
>> 2.9 Make "format" in PgMsg_CopyOutResponse message
>> extendable.
>> [9]
>> 
>> 2.10 Make makeNode() call avoidable in function(internal)
>>  that returns COPY TO/FROM handler.
>>  [9]
>> 
>> 2.11 Custom COPY TO/FROM handlers must be able to parse
>>  their options.
>>  [11]
>> 
>> Discussing:
>> 
>> 3.1 Should we use one function(internal) for COPY TO/FROM
>> handlers or two function(internal)s (one is for COPY TO
>> handler and another is for COPY FROM handler)?
>> [4]
>> 
>> 3.2 If we use separated function(internal) for COPY TO/FROM
>> handlers, we need to define naming rule. For example,
>>

Re: Synchronizing slots from primary to standby

2024-01-23 Thread Amit Kapila
On Wed, Jan 24, 2024 at 10:41 AM Masahiko Sawada  wrote:
>
> On Mon, Jan 22, 2024 at 3:58 PM Amit Kapila  wrote:
> >
> > Can we think of using GetStandbyFlushRecPtr()? We probably need to
> > expose this function, if this works for the required purpose.
>
> GetStandbyFlushRecPtr() seems good. But do we really want to raise an
> ERROR in this case? IIUC this case could happen often when the slot
> used by the standby is not listed in standby_slot_names.
>

or it can be due to some bug in the code as well.

> I think we
> can just skip such a slot to synchronize and check it the next time.
>

How about logging the message and then skipping the sync step? This
will at least make users aware that they could be missing to set
standby_slot_names.

> Here are random comments on slotsyncworker.c (v66):
>
> +/* GUC variable */
> +bool   enable_syncslot = false;
>
> Is enable_syncslot a really good name? We use "enable" prefix only for
> planner parameters such as enable_seqscan, and it seems to me that
> "slot" is not specific. Other candidates are:
>
> * synchronize_replication_slots = on|off
> * synchronize_failover_slots = on|off
>

I would prefer the second one. Would it be better to just say
sync_failover_slots?

-- 
With Regards,
Amit Kapila.




Re: Documentation to upgrade logical replication cluster

2024-01-23 Thread vignesh C
On Mon, 15 Jan 2024 at 14:39, Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Vignesh,
>
> Thanks for updating the patch!
>
> > > 7.
> > > ```
> > > +
> > > +dba@node1:/opt/PostgreSQL/postgres/&majorversion;/bin$ pg_ctl -D
> > /opt/PostgreSQL/pub_data stop -l logfile
> > > +
> > > ```
> > >
> > > Hmm. I thought you did not have to show the current directory. You were 
> > > in the
> > > bin dir, but it is not our requirement, right?
> >
> > I kept this just to show the version being used
> >
>
> Hmm, but by default, the current directory is not set as PATH. So this example
> looks strange for me.

I have removed the paths shown to avoid confusion.

> Below lines are my comments for v2 patch.
>
> 01.
>
> ```
> +   
> +Upgrade logical replication clusters
> +
> +
> + Refer logical replication 
> upgrade section
> + for details on upgrading logical replication clusters.
> +
> +
> +   
> ```
>
> I think we do not have to write it as one of steps. I think we can move to
> "Usage" part and modify like:
>
> This page only focus on nodes which are not logical replication participant. 
> See
>  for upgrading such nodes.

I have removed it from usage and moved it to the description section.

> 02.
>
> ```
>with the primary.)  Only logical slots on the primary are copied to the
>new standby, but other slots on the old standby are not copied so must
>be recreated manually.
> ```
>
> A description for logical slots were remained. If you want to keep, we must
> say that it would be done for PG17+.

Mentioned as 17 or later.

> 03.
>
> I think the numbering seems bit confusing. sectX sgml tags should be used in
> this case. How about formatting like below?
>
> Upgrade (sect1)
> --- Prerequisites (sect2)
>   --- For upgrading a publisher node  (sect3)
>   --- For upgrading a subscriber node  (sect3)
> --- Examples (sect2)
>   --- Two-node logical replication cluster  (sect3)
>   --- Cascaded logical replication cluster  (sect3)
>   --- Two-node circular logical replication cluster (sect3)

I felt this is better and changed it like:
 30.11. Upgrade
 --- 30.11.1. Prepare for publisher upgrades
 --- 30.11.2. Prepare for subscriber upgrades
 --- 30.11.3. Upgrading logical replication clusters
  --- 30.11.3.1. Steps to upgrade a two-node logical replication cluster
  --- 30.11.3.2. Steps to upgrade a cascaded logical replication cluster
  --- 30.11.3.3. Steps to upgrade a two-node circular logical
replication cluster

> 04.
>
> Missing introduction in the head of this section. E.g.,
>
> Both publishers and subscribers can be upgraded, but there are some notes.
> Before reading this section, you should read  page.

Added it with slight changes

> 05.
>
> ```
> +   
> +Prepare for publisher upgrades
> ...
> ```
>
> Should we describe in this page that publications can be upgraded in any
> versions?

I felt that need not be mentioned, as these are being upgraded from
earlier versions too

> 06.
>
> ```
> +   
> +Prepare for subscriber upgrades ```
>
> Same as 5, should we describe in this page that subscriptions can be upgraded
> in any versions?

I felt that need not be mentioned, as these are being upgraded from
earlier versions too

> 07.
>
> Basic considerations should be described before describing concrete steps.

The steps clearly mention the order in which it should be upgraded,
I'm not sure if we should repeat it again.

> E.g., publishers must be upgraded first. Also: While upgrading a subscriber,
> publisher can accept changes from users.

I  have added this.

> 08.
>
> ```
> +   two subscriptions sub1_node1_node2 and sub2_node1_node2 which is
> +   subscribing the changes from node1.
> ```
>
> Both "sub1_node1_node2" and "sub2_node1_node2" must be rendered.

Modified

> 09.
>
> ```
> +   
> +
> + Initialize data1_upgraded instance by using the required newer
> + version.
> +
> ```
>
> Missing rendering. All similar paragraphs must be fixed.

Modified

> 10.
>
> ```
> + On node2, create any tables that were created in
> + the upgraded publisher node1 server between
> + 
> + when the subscriptions where disabled in 
> node2
> + and now, e.g.:
> ```
>
> a.
>
> I think the link is not correct, it should refer Step 6. Can we add the step 
> number?
> All similar paragraphs must be fixed.

I have kept it as step1 just in case any table is created before the
server is stopped in node1. So I felt it is better to refer to the
step of disabled subscription now.

> b.
>
> Not sure, but s/where disabled/were disabled/ ?
> All similar paragraphs must be fixed.

This is removed

> 11.
>
> ```
> +
> + Refresh the node2 subscription's publications 
> using
> +  linkend="sql-altersubscription-params-refresh-publication">ALTER 
> SUBSCRIPTION ... REFRESH PUBLICATION,
> + e.g.:
> +
> +node2=# ALTER SUBSCRIPTION sub1_node1_node2 REFRESH PUBLICATION;
> +ALTER S

Re: Documentation to upgrade logical replication cluster

2024-01-23 Thread vignesh C
On Mon, 15 Jan 2024 at 09:01, Peter Smith  wrote:
>
> Hi Vignesh, here are some review comments for patch v2-0001.
>
> ==
> doc/src/sgml/ref/pgupgrade.sgml
>
> 1.
> +   
> +Upgrade logical replication clusters
> +
> +
> + Refer logical
> replication upgrade section
> + for details on upgrading logical replication clusters.
> +
> +
> +   
> +
>
> This renders like:
> Refer logical replication upgrade section for details on upgrading
> logical replication clusters.
>
> ~
>
> IMO it would be better to use xref instead of link, which will render
> more normally like:
> See Section 30.11 for details on upgrading logical replication clusters.
>
> SUGGESTION
> 
>  See 
>  for details on upgrading logical replication clusters.
> 

Modified

> ==
> doc/src/sgml/logical-replication.sgml
>
> 2. GENERAL - blurb
>
> + 
> +  Upgrade
> +
> +  
> +   
> +Prepare for publisher upgrades
>
> I felt there should be a short (1 or 2 sentence) general blurb about
> pub/sub upgrade before jumping straight into:
>
> "1. Prepare for publisher upgrades"
> "2. Prepare for subscriber upgrades"
> "3. Upgrading logical replication cluster"

Added

> ~
>
> Specifically, at first, it looks strange that the HTML renders as
> steps 1,2,3 instead of sub-sections (30.11.1, 30.11.2, 30.11.3); Maybe
> "steps" are fine, but then at least there needs to be some intro
> sentence saying like "follow these steps:"
> ~~~

Modified

>
> 3.
> +   
> +Upgrading logical replication cluster
>
> /cluster/clusters/

Modified

> ~~~
>
> 4.
> +
> + The steps to upgrade the following logical replication clusters are
> + detailed below:
> + 
> +  
> +   
> +Two-node
> logical replication cluster.
> +   
> +  
> +  
> +   
> +Cascaded
> logical replication cluster.
> +   
> +  
> +  
> +   
> + linkend="steps-two-node-circular-logical-replication-cluster">Two-node
> circular logical replication cluster.
> +   
> +  
> + 
> +
>
> Isn't there a better way to accomplish this by using xref and
> 'xreflabel' so you don't have to type the link text here?

Modified

>
> //
> Steps to upgrade a two-node logical replication cluster
> //
>
> 5.
> +  
> +   Let's say publisher is in node1 and subscriber is
> +   in node2. The subscriber node2 
> has
> +   two subscriptions sub1_node1_node2 and sub2_node1_node2 which is
> +   subscribing the changes from node1.
> +  
>
> 5a
> Those subscription names should also be rendered as literals.

Modified

> ~
>
> 5b
> /which is/which are/

Modified

> ~~~
>
> 6.
> +   
> +
> + Initialize data1_upgraded instance by using the required newer
> + version.
> +
> +   
>
> data1_upgraded should be rendered as literal.

Modified

> ~~~
>
> 7.
> +
> +   
> +
> + Initialize data2_upgraded instance by using the required newer
> + version.
> +
> +   
>
> data2_upgraded should be rendered as literal.

Modified

> ~~~
>
> 8.
> +
> +   
> +
> + On node2, create any tables that were created in
> + the upgraded publisher node1 server between
> + 
> + when the subscriptions where disabled in
> node2
> + and now, e.g.:
> +
> +node2=# CREATE TABLE distributors (did integer PRIMARY KEY, name 
> varchar(40));
> +CREATE TABLE
> +
> +
> +   
>
> 8a.
> This link to the earlier step renders badly like:
> On node2, create any tables that were created in the upgraded
> publisher node1 server between when the subscriptions where disabled
> in node2 and now, e.g.:
>
> IMO this link should be like "Step N", not some words -- maybe it is
> another opportunity for using xreflabel?

Modified

> ~
>
> 8b.
> Also has typos "when the subscriptions where disabled" (??)

This is not required after using xref, removed it.

> //
> Steps to upgrade a cascaded logical replication clusters
> //
>
> 9.
> +
> + 
> +  Steps to upgrade a cascaded logical replication clusters
>
> The title has a strange mix of singular "a" and plural "clusters"

Changed it to keep it consistent

> ~~~
>
> 10.
> +  
> +   Let's say we have a cascaded logical replication setup
> +   
> node1->node2->node3.
> +   Here node2 is subscribing the changes from
> +   node1 and node3 is subscribing
> +   the changes from node2. The 
> node2
> +   has two subscriptions sub1_node1_node2 and sub2_node1_node2 which is
> +   subscribing the changes from node1. The
> +   node3 has two subscriptions sub1_node2_node3 and
> +   sub2_node2_node3 which is subscribing the changes from
> +   node2.
> +  
>
> 10a.
> Those subscription names should also be rendered as literals.

Modified

> ~
>
> 10b.
> /which is/which are/ (occurs 2x)

Modified

> ~~~
>
> 11.
> +
> +   
> +
> + Initialize data1_upgra

Re: Synchronizing slots from primary to standby

2024-01-23 Thread Masahiko Sawada
On Mon, Jan 22, 2024 at 3:58 PM Amit Kapila  wrote:
>
> On Fri, Jan 19, 2024 at 3:55 PM shveta malik  wrote:
> >
> > On Fri, Jan 19, 2024 at 10:35 AM Masahiko Sawada  
> > wrote:
> > >
> > >
> > > Thank you for updating the patch. I have some comments:
> > >
> > > ---
> > > +latestWalEnd = GetWalRcvLatestWalEnd();
> > > +if (remote_slot->confirmed_lsn > latestWalEnd)
> > > +{
> > > +elog(ERROR, "exiting from slot synchronization as the
> > > received slot sync"
> > > + " LSN %X/%X for slot \"%s\" is ahead of the
> > > standby position %X/%X",
> > > + LSN_FORMAT_ARGS(remote_slot->confirmed_lsn),
> > > + remote_slot->name,
> > > + LSN_FORMAT_ARGS(latestWalEnd));
> > > +}
> > >
> > > IIUC GetWalRcvLatestWalEnd () returns walrcv->latestWalEnd, which is
> > > typically the primary server's flush position and doesn't mean the LSN
> > > where the walreceiver received/flushed up to.
> >
> > yes. I think it makes more sense to use something which actually tells
> > flushed-position. I gave it a try by replacing GetWalRcvLatestWalEnd()
> > with GetWalRcvFlushRecPtr() but I see a problem here. Lets say I have
> > enabled the slot-sync feature in a running standby, in that case we
> > are all good (flushedUpto is the same as actual flush-position
> > indicated by LogstreamResult.Flush). But if I restart standby, then I
> > observed that the startup process sets flushedUpto to some value 'x'
> > (see [1]) while when the wal-receiver starts, it sets
> > 'LogstreamResult.Flush' to another value (see [2]) which is always
> > greater than 'x'. And we do not update flushedUpto with the
> > 'LogstreamResult.Flush' value in walreceiver until we actually do an
> > operation on primary. Performing a data change on primary sends WALs
> > to standby which then hits XLogWalRcvFlush() and updates flushedUpto
> > same as LogstreamResult.Flush. Until then we have a situation where
> > slots received on standby are ahead of flushedUpto and thus slotsync
> > worker keeps one erroring out. I am yet to find out why flushedUpto is
> > set to a lower value than 'LogstreamResult.Flush' at the start of
> > standby.  Or maybe am I using the wrong function
> > GetWalRcvFlushRecPtr() and should be using something else instead?
> >
>
> Can we think of using GetStandbyFlushRecPtr()? We probably need to
> expose this function, if this works for the required purpose.

GetStandbyFlushRecPtr() seems good. But do we really want to raise an
ERROR in this case? IIUC this case could happen often when the slot
used by the standby is not listed in standby_slot_names. I think we
can just skip such a slot to synchronize and check it the next time.

Here are random comments on slotsyncworker.c (v66):

---
The postmaster relaunches the slotsync worker without intervals. So if
a connection string in primary_conninfo is not correct, many errors
are emitted.

---
+/* GUC variable */
+bool   enable_syncslot = false;

Is enable_syncslot a really good name? We use "enable" prefix only for
planner parameters such as enable_seqscan, and it seems to me that
"slot" is not specific. Other candidates are:

* synchronize_replication_slots = on|off
* synchronize_failover_slots = on|off

---
+   elog(ERROR,
+"cannot synchronize local slot \"%s\" LSN(%X/%X)"
+" to remote slot's LSN(%X/%X) as synchronization"
+" would move it backwards", remote_slot->name,

Many error messages in slotsync.c are splitted into several lines, but
I think it would reduce the greppability when the user looks for the
error message in the source code.

---
+   SpinLockAcquire(&slot->mutex);
+   slot->data.database = get_database_oid(remote_slot->database, false);
+   namestrcpy(&slot->data.plugin, remote_slot->plugin);

We should not access syscaches while holding a spinlock.

---
+   SpinLockAcquire(&slot->mutex);
+   slot->data.database = get_database_oid(remote_slot->database, false);
+   namestrcpy(&slot->data.plugin, remote_slot->plugin);
+   SpinLockRelease(&slot->mutex);

Similarly, it's better to avoid calling namestrcpy() while holding a
spinlock, as we do in CreateInitDecodingContext().

---
+   SpinLockAcquire(&SlotSyncWorker->mutex);
+
+   SlotSyncWorker->stopSignaled = true;
+
+   if (SlotSyncWorker->pid == InvalidPid)
+   {
+   SpinLockRelease(&SlotSyncWorker->mutex);
+   return;
+   }
+
+   kill(SlotSyncWorker->pid, SIGINT);
+
+   SpinLockRelease(&SlotSyncWorker->mutex);

It's better to avoid calling a system call while holding a spin lock.

---
+   BackgroundWorkerUnblockSignals();

I think it's no longer necessary.

---
+   ereport(LOG,
+   /* translator: %s is a GUC variable name */
+   errmsg("bad configuration for slot synchronization"),
+   errhint("\"wal_level\" must be >= logical."));

There is 

Re: Synchronizing slots from primary to standby

2024-01-23 Thread Amit Kapila
On Wed, Jan 24, 2024 at 8:52 AM Peter Smith  wrote:
>
> Here are some comments for patch v66-0001.
>
> ==
> doc/src/sgml/catalogs.sgml
>
> 1.
> +  
> +   If true, the associated replication slots (i.e. the main slot and the
> +   table sync slots) in the upstream database are enabled to be
> +   synchronized to the physical standbys
> +  
>
> /physical standbys/physical standby/
>
> I wondered if it is better just to say singular "standby" instead of
> "standbys" in places like this; e.g. plural might imply cascading for
> some readers.
>

I don't think it is confusing as we used in a similar way in docs. We
can probably avoid using physical in places similar to above as that
is implied

>
>
> ==
> src/backend/replication/slotfuncs.c
>
> 11. create_physical_replication_slot
>
>   /* acquire replication slot, this will check for conflicting names */
>   ReplicationSlotCreate(name, false,
> -   temporary ? RS_TEMPORARY : RS_PERSISTENT, false);
> +   temporary ? RS_TEMPORARY : RS_PERSISTENT, false,
> +   false);
>
> Having an inline comment might be helpful here instead of passing 
> "false,false"
>
> SUGGESTION
> ReplicationSlotCreate(name, false,
>  temporary ? RS_TEMPORARY : RS_PERSISTENT, false,
>  false /* failover */);
>

I don't think we follow to use of inline comments. I feel that
sometimes makes code difficult to read considering when we have
multiple such parameters.

-- 
With Regards,
Amit Kapila.




Re: Synchronizing slots from primary to standby

2024-01-23 Thread Peter Smith
Here are some comments for patch v66-0001.

==
doc/src/sgml/catalogs.sgml

1.
+  
+   If true, the associated replication slots (i.e. the main slot and the
+   table sync slots) in the upstream database are enabled to be
+   synchronized to the physical standbys
+  

/physical standbys/physical standby/

I wondered if it is better just to say singular "standby" instead of
"standbys" in places like this; e.g. plural might imply cascading for
some readers.

There are a number of examples like this, so I've repeated the same
comment multiple times below. If you disagree, please just ignore all
of them.

==
doc/src/sgml/func.sgml

2.
 that the decoding of prepared transactions is enabled for this
-slot. A call to this function has the same effect as the replication
-protocol command CREATE_REPLICATION_SLOT ...
LOGICAL.
+slot. The optional fifth parameter,
+failover, when set to true,
+specifies that this slot is enabled to be synced to the
+physical standbys so that logical replication can be resumed
+after failover. A call to this function has the same effect as
+the replication protocol command
+CREATE_REPLICATION_SLOT ... LOGICAL.


(same as above)

/physical standbys/physical standby/

Also, I don't see anything else on this page using plural "standbys".

==
doc/src/sgml/protocol.sgml

3. CREATE_REPLICATION_SLOT

+   
+FAILOVER [ boolean ]
+
+ 
+  If true, the slot is enabled to be synced to the physical
+  standbys so that logical replication can be resumed after failover.
+  The default is false.
+ 
+
+   

(same as above)

/physical standbys/physical standby/

~~~

4. ALTER_REPLICATION_SLOT

+  
+   
+FAILOVER [ boolean ]
+
+ 
+  If true, the slot is enabled to be synced to the physical
+  standbys so that logical replication can be resumed after failover.
+ 
+
+   
+  

(same as above)

/physical standbys/physical standby/

==
doc/src/sgml/ref/create_subscription.sgml

5.
+   
+failover (boolean)
+
+ 
+  Specifies whether the replication slots associated with the
subscription
+  are enabled to be synced to the physical standbys so that logical
+  replication can be resumed from the new primary after failover.
+  The default is false.
+ 
+
+   

(same as above)

/physical standbys/physical standby/

==
doc/src/sgml/system-views.sgml

6.
+
+ 
+  
+   failover bool
+  
+  
+   True if this is a logical slot enabled to be synced to the physical
+   standbys so that logical replication can be resumed from the new primary
+   after failover. Always false for physical slots.
+  
+ 

(same as above)

/physical standbys/physical standby/

==
src/backend/commands/subscriptioncmds.c

7.
+ if (IsSet(opts.specified_opts, SUBOPT_FAILOVER))
+ {
+ if (!sub->slotname)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot set failover for a subscription that does not have a
slot name")));
+
+ /*
+ * Do not allow changing the failover state if the
+ * subscription is enabled. This is because the failover
+ * state of the slot on the publisher cannot be modified if
+ * the slot is currently acquired by the apply worker.
+ */
+ if (sub->enabled)
+ ereport(ERROR,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg("cannot set %s for enabled subscription",
+ "failover")));
+
+ values[Anum_pg_subscription_subfailover - 1] =
+ BoolGetDatum(opts.failover);
+ replaces[Anum_pg_subscription_subfailover - 1] = true;
+ }

The first message is not consistent with the second. The "failover"
option maybe should be extracted so it won't be translated.

SUGGESTION
errmsg("cannot set %s for a subscription that does not have a slot
name", "failover")

~~~

8. AlterSubscription

+ if (!wrconn)
+ ereport(ERROR,
+ (errcode(ERRCODE_CONNECTION_FAILURE),
+ errmsg("could not connect to the publisher: %s", err)));
+

Need to keep an eye on the patch proposed by Nisha [1] for messages
similar to this one, so in case that gets pushed this code should be
changed appropriately.

==
src/backend/replication/slot.c

9.
  * during getting changes, if the two_phase option is enabled it can skip
  * prepare because by that time start decoding point has been moved. So the
  * user will only get commit prepared.
+ * failover: If enabled, allows the slot to be synced to physical standbys so
+ * that logical replication can be resumed after failover.
  */

(same as earlier)

/physical standbys/physical standby/

~~~

10.
+ /*
+ * Do not allow users to alter slots to enable failover on the standby
+ * as we do not support sync to the cascading standby.
+ */
+ if (RecoveryInProgress() && failover)
+ ereport(ERR

Re: Add code indentation check to cirrus-ci (was Re: Add BF member koel-like indentation checks to SanityCheck CI)

2024-01-23 Thread Bharath Rupireddy
On Mon, Jan 22, 2024 at 8:48 PM Andrew Dunstan  wrote:
>
>
> On 2024-01-21 Su 22:19, Peter Smith wrote:
> > 2024-01 Commitfest.
> >
> > Hi, This patch has a CF status of "Needs Review" [1], but it seems
> > like there was some CFbot test failure last time it was run [2].
> > Please have a look and post an updated version if necessary.
> >
>
> I don't think there's a consensus that we want this. It should probably
> be returned with feedback.

I've withdrawn the CF entry as the idea is being discussed in a
separate thread -
https://www.postgresql.org/message-id/20231019044907.ph6dw637loqg3lqk%40awork3.anarazel.de

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Custom explain options

2024-01-23 Thread David G. Johnston
Came across this while looking for patches to review.  IMO this thread has
been hijacked to the point of being not useful for the subject.  I suggest
this discussion regarding prefetch move to its own thread and this thread
and commitfest entry be ended/returned with feedback.

Also IMO, the commitfest is not for early stage idea patches.  The stuff on
there that is ready for review should at least be thought of by the
original author as something they would be willing to commit.  I suggest
you post the most recent patch and a summary of the discussion to a new
thread that hopefully won't be hijacked.  Consistent and on-topic replies
will keep the topic front-and-center on the lists until a patch is ready
for consideration.

David J.


Re: speed up a logical replica setup

2024-01-23 Thread Euler Taveira
On Mon, Jan 22, 2024, at 6:30 AM, Shlok Kyal wrote:
> We fixed some of the comments posted in the thread. We have created it
> as top-up patch 0002 and 0003.

Cool.

> 0002 patch contains the following changes:
> * Add a timeout option for the recovery option, per [1]. The code was
> basically ported from pg_ctl.c.
> * Reject if the target server is not a standby, per [2]
> * Raise FATAL error if --subscriber-conninfo specifies non-local server, per 
> [3]
>   (not sure it is really needed, so feel free reject the part.)
> * Add check for max_replication_slots and wal_level; as per [4]
> * Add -u and -p options; as per [5]
> * Addressed comment except 5 and 8 in [6] and comment in [7]

My suggestion is that you create separate patches for each change. It helps
with review and alternative proposals. Some of these items conflict with what I
have in my local branch and removing one of them is time consuming. For this
one, I did the job but let's avoid rework.

> 0003 patch contains fix for bug reported in [8].

LGTM. As I said in the other email, I included it.

I'll post a new one soon.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: index prefetching

2024-01-23 Thread Melanie Plageman
On Tue, Jan 23, 2024 at 12:43 PM Tomas Vondra
 wrote:
>
> On 1/19/24 22:43, Melanie Plageman wrote:
>
> > We fill a queue with blocks from TIDs that we fetched from the index.
> > The queue is saved in a scan descriptor that is made available to the
> > streaming read callback. Once the queue is full, we invoke the table
> > AM specific index_fetch_tuple() function which calls
> > pg_streaming_read_buffer_get_next(). When the streaming read API
> > invokes the callback we registered, it simply dequeues a block number
> > for prefetching.
>
> So in a way there are two queues in IndexFetchTableData. One (blk_queue)
> is being filled from IndexNext, and then the queue in StreamingRead.

I've changed the name from blk_queue to tid_queue to fix the issue you
mention in your later remarks.
I suppose there are two queues. The tid_queue is just to pass the
block requests to the streaming read API. The prefetch distance will
be the smaller of the two sizes.

> > The only change to the streaming read API is that now, even if the
> > callback returns InvalidBlockNumber, we may not be finished, so make
> > it resumable.
>
> Hmm, not sure when can the callback return InvalidBlockNumber before
> reaching the end. Perhaps for the first index_fetch_heap call? Any
> reason not to fill the blk_queue before calling index_fetch_heap?

The callback will return InvalidBlockNumber whenever the queue is
empty. Let's say your queue size is 5 and your effective prefetch
distance is 10 (some combination of the PgStreamingReadRange sizes and
PgStreamingRead->max_ios). The first time you call index_fetch_heap(),
the callback returns InvalidBlockNumber. Then the tid_queue is filled
with 5 tids. Then index_fetch_heap() is called.
pg_streaming_read_look_ahead() will prefetch all 5 of these TID's
blocks, emptying the queue. Once all 5 have been dequeued, the
callback will return InvalidBlockNumber.
pg_streaming_read_buffer_get_next() will return one of the 5 blocks in
a buffer and save the associated TID in the per_buffer_data. Before
index_fetch_heap() is called again, we will see that the queue is not
full and fill it up again with 5 TIDs. So, the callback will return
InvalidBlockNumber 3 times in this scenario.

> > Structurally, this changes the timing of when the heap blocks are
> > prefetched. Your code would get a tid from the index and then prefetch
> > the heap block -- doing this until it filled a queue that had the
> > actual tids saved in it. With my approach and the streaming read API,
> > you fetch tids from the index until you've filled up a queue of block
> > numbers. Then the streaming read API will prefetch those heap blocks.
>
> And is that a good/desirable change? I'm not saying it's not, but maybe
> we should not be filling either queue in one go - we don't want to
> overload the prefetching.

We can focus on the prefetch distance algorithm maintained in the
streaming read API and then make sure that the tid_queue is larger
than the desired prefetch distance maintained by the streaming read
API.

> > I didn't actually implement the block queue -- I just saved a single
> > block number and pretended it was a block queue. I was imagining we
> > replace this with something like your IndexPrefetch->blockItems --
> > which has light deduplication. We'd probably have to flesh it out more
> > than that.
>
> I don't understand how this passes the TID to the index_fetch_heap.
> Isn't it working only by accident, due to blk_queue only having a single
> entry? Shouldn't the first queue (blk_queue) store TIDs instead?

Oh dear! Fixed in the attached v2. I've replaced the single
BlockNumber with a single ItemPointerData. I will work on implementing
an actual queue next week.

> > There are also table AM layering violations in my sketch which would
> > have to be worked out (not to mention some resource leakage I didn't
> > bother investigating [which causes it to fail tests]).
> >
> > 0001 is all of Thomas' streaming read API code that isn't yet in
> > master and 0002 is my rough sketch of index prefetching using the
> > streaming read API
> >
> > There are also numerous optimizations that your index prefetching
> > patch set does that would need to be added in some way. I haven't
> > thought much about it yet. I wanted to see what you thought of this
> > approach first. Basically, is it workable?
>
> It seems workable, yes. I'm not sure it's much simpler than my patch
> (considering a lot of the code is in the optimizations, which are
> missing from this patch).
>
> I think the question is where should the optimizations happen. I suppose
> some of them might/should happen in the StreamingRead API itself - like
> the detection of sequential patterns, recently prefetched blocks, ...

So, the streaming read API does detection of sequential patterns and
not prefetching things that are in shared buffers. It doesn't handle
avoiding prefetching recently prefetched blocks yet AFAIK. But I
daresay this would be relevant for other streamin

Re: core dumps in auto_prewarm, tests succeed

2024-01-23 Thread Andres Freund
Hi,

On 2024-01-23 22:00:01 +0300, Alexander Lakhin wrote:
> 23.01.2024 20:30, Andres Freund wrote:
> > I don't think that's viable and would cause more problems than it solves, 
> > it'd
> > make us think that we might have an old postgres process hanging around that
> > needs to be terminted before we can start up. And I simply don't see the 
> > point
> > - we already record whether we crashed in the control file, no?
> 
> With an Assert injected in walsender.c (as in [1]) and test
> 012_subtransactions.pl modified to finish just after the first
> "$node_primary->stop;", I see:
> pg_controldata -D 
> src/test/recovery/tmp_check/t_012_subtransactions_primary_data/pgdata/
> Database cluster state:   shut down
> 
> But the assertion undoubtedly failed:
> grep TRAP src/test/recovery/tmp_check/log/*
> src/test/recovery/tmp_check/log/012_subtransactions_primary.log:TRAP: failed
> Assert("0"), File: "walsender.c", Line: 2688, PID: 142201

Yea, because it's after checkpointer has changed the state to "shutdowned". I
think we could add additional states, to be set by postmaster, instead of
checkpointer, for this purpose.


> As to the need to terminate a process, which is supposedly hanging around,
> I think, this situation doesn't differ in general from what we have after
> kill -9...

So? Making it more likely for postgres failing to restart successfully,
because the pid has been reused, is bad.

Greetings,

Andres Freund




Re: Remove pthread_is_threaded_np() checks in postmaster

2024-01-23 Thread Andres Freund
Hi,

On 2024-01-23 17:26:19 -0600, Tristan Partin wrote:
> On Tue Jan 23, 2024 at 4:23 PM CST, Andres Freund wrote:
> > Hi,
> > 
> > On 2024-01-23 15:50:11 -0600, Tristan Partin wrote:
> > > What is keeping us from using pthread_sigmask(3) instead of 
> > > sigprocmask(2)?
> > 
> > We would need to make sure to compile with threading support everywhere. One
> > issue is that on some platforms things get slower once you actually start
> > using pthreads.
> 
> What are examples of these reduced performance platforms?

With some libc, including IIRC glibc, FILE* style io starts to use locking,
for example. Which we likely shouldn't use as heavily as we do, but we
currently do use it for things like COPY.


> From reading the meson.build files, it seems like building with threading
> enabled is the future, so should we just rip the band-aid off for 17?

Building with -pthreads and using threads are separate things...


> > > If an extension can guarantee that threads that get launched by it don't
> > > interact with anything Postgres-related, would that be enough to protect
> > > from any fork(2) related issues?
> > 
> > A fork() while threads are running is undefined behavior IIRC, and undefined
> > behavior isn't limited to a single thread. You'd definitely need to use
> > pthread_sigprocmask etc to address that aspect alone.
> 
> If you can find a resource that explains the UB, I would be very interested
> to read that. I found a SO[0] answer that made it seem like this actually
> wasn't the case.

I think there are safe ways to do it, but I don't think we currently reliably
do so. It certainly wouldn't be well defined to have a thread created in
postmaster, before backends are forked off ("the child process may only
execute async-signal-safe operations until such time as one of the exec
functions is called").

Greetings,

Andres Freund




Re: speed up a logical replica setup

2024-01-23 Thread Euler Taveira
On Mon, Jan 22, 2024, at 4:06 AM, Hayato Kuroda (Fujitsu) wrote:
> I analyzed and found a reason. This is because publications are invisible for 
> some transactions.
> 
> As the first place, below operations were executed in this case.
> Tuples were inserted after getting consistent_lsn, but before starting the 
> standby.
> After doing the workload, I confirmed again that the publication was created.
> 
> 1. on primary, logical replication slots were created.
> 2. on primary, another replication slot was created.
> 3. ===on primary, some tuples were inserted. ===
> 4. on standby, a server process was started
> 5. on standby, the process waited until all changes have come.
> 6. on primary, publications were created.
> 7. on standby, subscriptions were created.
> 8. on standby, a replication progress for each subscriptions was set to given 
> LSN (got at step2).
> =pg_subscriber finished here=
> 9. on standby, a server process was started again
> 10. on standby, subscriptions were enabled. They referred slots created at 
> step1.
> 11. on primary, decoding was started but ERROR was raised.

Good catch! It is a design flaw.

> In this case, tuples were inserted *before creating publication*.
> So I thought that the decoded transaction could not see the publication 
> because
> it was committed after insertions.
> 
> One solution is to create a publication before creating a consistent slot.
> Changes which came before creating the slot were surely replicated to the 
> standby,
> so upcoming transactions can see the object. We are planning to patch set to 
> fix
> the issue in this approach.

I'll include a similar code in the next patch and also explain why we should
create the publication earlier. (I'm renaming
create_all_logical_replication_slots to setup_publisher and calling
create_publication from there and also adding the proposed GUC checks in it.)


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: speed up a logical replica setup

2024-01-23 Thread Euler Taveira
On Mon, Jan 22, 2024, at 6:22 AM, Amit Kapila wrote:
> On Mon, Jan 22, 2024 at 2:38 PM Hayato Kuroda (Fujitsu)
>  wrote:
> > >
> > > > Yet other options could be
> > > > pg_buildsubscriber, pg_makesubscriber as 'build' or 'make' in the name
> > > > sounds like we are doing some work to create the subscriber which I
> > > > think is the case here.
> > >
> > > I see your point here.  pg_createsubscriber is not like createuser in
> > > that it just runs an SQL command.  It does something different than
> > > CREATE SUBSCRIBER.
> 
> Right.

Subscriber has a different meaning of subscription. Subscription is an SQL
object. Subscriber is the server (node in replication terminology) where the
subscription resides. Having said that pg_createsubscriber doesn't seem a bad
name because you are creating a new subscriber. (Indeed, you are transforming /
converting but "create" seems closer and users can infer that it is a tool to
build a new logical replica.

>   So a different verb would make that clearer.  Maybe
> > > something from here: https://www.thesaurus.com/browse/convert
> >
> > I read the link and found a good verb "switch". So, how about using 
> > "pg_switchsubscriber"?
> >
> 
> I also initially thought on these lines and came up with a name like
> pg_convertsubscriber but didn't feel strongly about it as that would
> have sounded meaningful if we use a name like
> pg_convertstandbytosubscriber. Now, that has become too long. Having
> said that, I am not opposed to it having a name on those lines. BTW,
> another option that occurred to me today is pg_preparesubscriber. We
> internally create slots and then wait for wal, etc. which makes me
> sound like adding 'prepare' in the name can also explain the purpose.

I think "convert" and "transform" fit for this case. However, "create",
"convert" and "transform" have 6, 7 and 9 characters,  respectively. I suggest
that we avoid long names (subscriber already has 10 characters). My preference
is pg_createsubscriber.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Support TZ format code in to_timestamp()

2024-01-23 Thread Daniel Gustafsson
> On 23 Jan 2024, at 23:33, Tom Lane  wrote:

> Anyway, v2-0001 below is the previous patch rebased up to current
> (only line numbers change), and then v2-0002 responds to your
> and Daniel's review comments.

LGTM.

--
Daniel Gustafsson





Re: [PATCH] Add native windows on arm64 support

2024-01-23 Thread Michael Paquier
On Tue, Jan 23, 2024 at 04:13:05PM -0500, Dave Cramer wrote:
> On Tue, 23 Jan 2024 at 08:46, Dave Cramer  wrote:
>> The attached patch works with v17. I will work on getting a buildfarm
>> animal up shortly

Thanks for doing that!

> I can build using meson manually, but for some reason building with the
> buildfarm fails.
> 
> I'm not sure which files to attach to this to help. Andrew, what files do
> you need ?

Probably build.ninja and the contents of meson-logs/ would offer
enough information?
--
Michael


signature.asc
Description: PGP signature


Re: Remove pthread_is_threaded_np() checks in postmaster

2024-01-23 Thread Tristan Partin

On Tue Jan 23, 2024 at 4:23 PM CST, Andres Freund wrote:

Hi,

On 2024-01-23 15:50:11 -0600, Tristan Partin wrote:
> What is keeping us from using pthread_sigmask(3) instead of sigprocmask(2)?

We would need to make sure to compile with threading support everywhere. One
issue is that on some platforms things get slower once you actually start
using pthreads.


What are examples of these reduced performance platforms?

From reading the meson.build files, it seems like building with 
threading enabled is the future, so should we just rip the band-aid off 
for 17?



> If an extension can guarantee that threads that get launched by it don't
> interact with anything Postgres-related, would that be enough to protect
> from any fork(2) related issues?

A fork() while threads are running is undefined behavior IIRC, and undefined
behavior isn't limited to a single thread. You'd definitely need to use
pthread_sigprocmask etc to address that aspect alone.


If you can find a resource that explains the UB, I would be very 
interested to read that. I found a SO[0] answer that made it seem like 
this actually wasn't the case.


[0]: https://stackoverflow.com/a/42679479/7572728

--
Tristan Partin
Neon (https://neon.tech)




Re: Support TZ format code in to_timestamp()

2024-01-23 Thread Tom Lane
Aleksander Alekseev  writes:
> I reviewed the patch and tested it on MacOS and generally concur with
> stated above. The only nitpick I have is the apparent lack of negative
> tests for to_timestamp(), e.g. when the string doesn't match the
> specified format.

That's an excellent suggestion indeed, because when I tried

SELECT to_timestamp('2011-12-18 11:38 JUNK', '-MM-DD HH12:MI TZ');  -- error

I got

ERROR:  invalid value "JU" for "TZ"
DETAIL:  Value must be an integer.

which seems pretty off-point.  In the attached I made it give an
error message about a bad zone abbreviation if the input starts
with a letter, but perhaps the dividing line between "probably
meant as a zone name" and "probably meant as numeric" should be
drawn differently?

Anyway, v2-0001 below is the previous patch rebased up to current
(only line numbers change), and then v2-0002 responds to your
and Daniel's review comments.

regards, tom lane

From 8361441cda3343e0f9286bb3b51ccd3041b46807 Mon Sep 17 00:00:00 2001
From: Tom Lane 
Date: Tue, 23 Jan 2024 16:58:19 -0500
Subject: [PATCH v2 1/2] Support timezone abbreviations in to_timestamp().

This patch allows the TZ format code to be used in to_timestamp().
It will accept the same possibilities that to_char() can produce
with the TZ format code, namely a zone abbreviation or a numeric
zone offset in the form [+-]HH or [+-]HH:MM.

While at it we may as well implement the OF format code too,
since it corresponds exactly to the numeric zone offset case
and indeed can share code.

A conceivable extension to this would be to accept timezone names
not just abbreviations.  However, to_char() never outputs those,
and there'd be a pretty serious garbage-input hazard because
pg_tzset() will accept just about anything as a POSIX zone name.

A useful side effect is that jsonpath's datetime() method will
now accept the common-in-JSON format "-mm-ddThh:mm:ssZ",
correctly interpreting the "Z" as signifying UTC time.  We can
reduce the number of format patterns that executeDateTimeMethod
has to try, too.
---
 doc/src/sgml/func.sgml   |  10 +-
 src/backend/utils/adt/datetime.c |  76 ++
 src/backend/utils/adt/formatting.c   | 151 ---
 src/backend/utils/adt/jsonpath_exec.c|  18 +--
 src/include/utils/datetime.h |   3 +
 src/test/regress/expected/horology.out   |  70 -
 src/test/regress/expected/jsonb_jsonpath.out |  12 ++
 src/test/regress/sql/horology.sql|  18 ++-
 src/test/regress/sql/jsonb_jsonpath.sql  |   2 +
 9 files changed, 287 insertions(+), 73 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 210c7c0b02..ad965432f3 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -8131,13 +8131,11 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');


 TZ
-upper case time-zone abbreviation
- (only supported in to_char)
+upper case time-zone abbreviation


 tz
-lower case time-zone abbreviation
- (only supported in to_char)
+lower case time-zone abbreviation


TZH
@@ -8149,8 +8147,8 @@ SELECT regexp_match('abc01234xyz', '(?:(.*?)(\d+)(.*)){1,1}');


 OF
-time-zone offset from UTC
- (only supported in to_char)
+time-zone offset from UTC (HH
+ or HH:MM)

   
  
diff --git a/src/backend/utils/adt/datetime.c b/src/backend/utils/adt/datetime.c
index 17b0248bf7..cccabb0c2a 100644
--- a/src/backend/utils/adt/datetime.c
+++ b/src/backend/utils/adt/datetime.c
@@ -3246,6 +3246,82 @@ DecodeTimezoneNameToTz(const char *tzname)
 	return result;
 }
 
+/* DecodeTimezoneAbbrevPrefix()
+ * Interpret prefix of string as a timezone abbreviation, if possible.
+ *
+ * This has roughly the same functionality as DecodeTimezoneAbbrev(),
+ * but the API is adapted to the needs of formatting.c.  Notably,
+ * we will match the longest possible prefix of the given string
+ * rather than insisting on a complete match, and downcasing is applied
+ * here rather than in the caller.
+ *
+ * Returns the length of the timezone abbreviation, or -1 if not recognized.
+ * On success, sets *offset to the GMT offset for the abbreviation if it
+ * is a fixed-offset abbreviation, or sets *tz to the pg_tz struct for
+ * a dynamic abbreviation.
+ */
+int
+DecodeTimezoneAbbrevPrefix(const char *str, int *offset, pg_tz **tz)
+{
+	char		lowtoken[TOKMAXLEN + 1];
+	int			len;
+
+	*offset = 0;/* avoid uninitialized vars on failure */
+	*tz = NULL;
+
+	if (!zoneabbrevtbl)
+		return -1;/* no abbrevs known, so fail immediately */
+
+	/* Downcase as much of the string as we could need */
+	for (len = 0; len < TOKMAXLEN; len++)
+	{
+		if (*str == '\0' || !isalpha((unsigned char) *str))
+			break;
+		lowtoken[len] = pg_tolower((unsigned char) *str++);
+	}
+	lo

Re: Remove pthread_is_threaded_np() checks in postmaster

2024-01-23 Thread Andres Freund
Hi,

On 2024-01-23 15:50:11 -0600, Tristan Partin wrote:
> What is keeping us from using pthread_sigmask(3) instead of sigprocmask(2)?

We would need to make sure to compile with threading support everywhere. One
issue is that on some platforms things get slower once you actually start
using pthreads.


> If an extension can guarantee that threads that get launched by it don't
> interact with anything Postgres-related, would that be enough to protect
> from any fork(2) related issues?

A fork() while threads are running is undefined behavior IIRC, and undefined
behavior isn't limited to a single thread. You'd definitely need to use
pthread_sigprocmask etc to address that aspect alone.

Greetings,

Andres Freund




Re: On login trigger: take three

2024-01-23 Thread Alexander Korotkov
On Tue, Jan 23, 2024 at 8:00 PM Alexander Lakhin  wrote:
> Please look at the following query, which triggers an assertion failure on
> updating the field dathasloginevt for an entry in pg_database:
> SELECT format('CREATE DATABASE db1 LOCALE_PROVIDER icu ICU_LOCALE en ENCODING 
> utf8
> ICU_RULES ''' || repeat(' ', 20) || ''' TEMPLATE template0;')
> \gexec
> \c db1 -
>
> CREATE FUNCTION on_login_proc() RETURNS event_trigger AS $$
> BEGIN
>RAISE NOTICE 'You are welcome!';
> END;
> $$ LANGUAGE plpgsql;
>
> CREATE EVENT TRIGGER on_login_trigger ON login EXECUTE PROCEDURE 
> on_login_proc();
> DROP EVENT TRIGGER on_login_trigger;
>
> \c
>
> \connect: connection to server on socket "/tmp/.s.PGSQL.5432" failed: server 
> closed the connection unexpectedly
>
> The stack trace of the assertion failure is:
> ...
> #5  0x55c8699b9b8d in ExceptionalCondition (
>  conditionName=conditionName@entry=0x55c869a1f7c0 
> "HaveRegisteredOrActiveSnapshot()",
>  fileName=fileName@entry=0x55c869a1f4c6 "toast_internals.c", 
> lineNumber=lineNumber@entry=669) at assert.c:66
> #6  0x55c86945df0a in init_toast_snapshot (...) at toast_internals.c:669
> #7  0x55c86945dfbe in toast_delete_datum (...) at toast_internals.c:429
> #8  0x55c8694fd1da in toast_tuple_cleanup (...) at toast_helper.c:309
> #9  0x55c8694b55a1 in heap_toast_insert_or_update (...) at heaptoast.c:333
> #10 0x55c8694a8c6c in heap_update (... at heapam.c:3604
> #11 0x55c8694a96cb in simple_heap_update (...) at heapam.c:4062
> #12 0x55c869555b7b in CatalogTupleUpdate (...) at indexing.c:322
> #13 0x55c8695f0725 in EventTriggerOnLogin () at event_trigger.c:957
> ...
>
> Funnily enough, when Tom Lane was wondering, whether pg_database's toast
> table could pose a risk [1], I came to the conclusion that it's impossible,
> but that was before the login triggers introduction...
>
> [1] https://www.postgresql.org/message-id/1284094.1695479962%40sss.pgh.pa.us

Thank you for reporting.  I'm looking into this...
I wonder if there is a way to avoid toast update given that we don't
touch the toasted field here.

--
Regards,
Alexander Korotkov




Re: Multiple startup messages over the same connection

2024-01-23 Thread Vladimir Churyukin
On Mon, Jan 22, 2024 at 11:43 PM Heikki Linnakangas  wrote:

> On 22/01/2024 21:58, Vladimir Churyukin wrote:
> > A question about protocol design - would it be possible to extend the
> > protocol, so it can handle multiple startup / authentication messages
> > over a single connection? Are there any serious obstacles? (possible
> > issues with re-initialization of backends, I guess?)
> > If that is possible, it could improve one important edge case - where
> > you have to talk to multiple databases on a single host currently, you
> > need to open a separate connection to each of them. In some cases
> > (multitenancy for example), you may have thousands of databases on a
> > host, which leads to inefficient connection utilization on clients (on
> > the db side too). A lot of other RDBMSes  don't have this limitation.
>
> The protocol and the startup message are the least of your problems.
> Yeah, it would be nice if you could switch between databases, but the
> assumption that one backend operates on one database is pretty deeply
> ingrained in the code.


Yes, I suspected that's the reason why it was not implemented so far,
but what's the main problem there?
Is the issue with the global data cleanup / re-initialization after the
database is changed?
Is it in 3rd party extensions that assume the same and may break?
Anything else?

-Vladimir Churyukin


Re: Remove pthread_is_threaded_np() checks in postmaster

2024-01-23 Thread Tristan Partin

On Tue Jan 23, 2024 at 2:10 PM CST, Tristan Partin wrote:

On Tue Jan 23, 2024 at 1:47 PM CST, Andres Freund wrote:
> Hi,
> On 2024-01-23 13:20:15 -0600, Tristan Partin wrote:
> > These checks are not effective for what they are trying to prevent. A recent
> > commit[0] in libcurl when used on macOS has been tripping the
> > pthread_is_threaded_np() check in postmaster.c for shared_preload_libraries
> > entries which use libcurl (like Neon). Under the hood, libcurl calls
> > SCDynamicStoreCopyProxies[1], which apparently causes the check to fail.
>
> Maybe I'm missing something, but isn't that indicating the exact opposite,
> namely that the check is precisely doing what it's intended?

The logic in the original commit seems sound:

> On Darwin, detect and report a multithreaded postmaster.
>
> Darwin --enable-nls builds use a substitute setlocale() that may start a
> thread.  Buildfarm member orangutan experienced BackendList corruption
> on account of different postmaster threads executing signal handlers
> simultaneously.  Furthermore, a multithreaded postmaster risks undefined
> behavior from sigprocmask() and fork().  Emit LOG messages about the
> problem and its workaround.  Back-patch to 9.0 (all supported versions).

Some reading from signal(7):

> A process-directed signal may be delivered to any one of the threads 
> that does not currently have the signal blocked. If more than one of 
> the threads has the signal unblocked, then the kernel chooses an 
> arbitrary thread to which to deliver the signal.


So it sounds like the commit is trying to protect from the last 
sentence.


And then forks only copy from their parent thread...


What is keeping us from using pthread_sigmask(3) instead of 
sigprocmask(2)? If an extension can guarantee that threads that get 
launched by it don't interact with anything Postgres-related, would that 
be enough to protect from any fork(2) related issues? In the OP example, 
is there any harm in the thread that libcurl inadvertantly launches from 
a Postgres perspective?


--
Tristan Partin
Neon (https://neon.tech)




Re: [PATCH] Add native windows on arm64 support

2024-01-23 Thread Dave Cramer
On Tue, 23 Jan 2024 at 08:46, Dave Cramer  wrote:

>
>
> On Tue, 19 Sept 2023 at 23:48, Michael Paquier 
> wrote:
>
>> On Tue, Sep 19, 2023 at 01:35:17PM +0100, Anthony Roberts wrote:
>> > Was there an explicit request for something there? I was under the
>> > impression that this was all just suggestion/theory at the moment.
>>
>> Yes.  The addition of a buildfarm member registered into the community
>> facilities, so as it is possible to provide back to developers dynamic
>> feedback to the new configuration setup you would like to see
>> supported in PostgreSQL.  This has been mentioned a few times on this
>> thread, around these places:
>>
>> https://www.postgresql.org/message-id/20220322103011.i6z2tuj4hle23wgx@jrouhaud
>>
>> https://www.postgresql.org/message-id/dbd80715-e56b-40eb-84aa-ace70198e...@yesql.se
>>
>> https://www.postgresql.org/message-id/6d1e2719-fa49-42a5-a6ff-0b0072bf6...@yesql.se
>> --
>> Michael
>>
>
> The attached patch works with v17. I will work on getting a buildfarm
> animal up shortly
>
>
I can build using meson manually, but for some reason building with the
buildfarm fails.

I'm not sure which files to attach to this to help. Andrew, what files do
you need ?

Dave

>
>


Re: pgsql: Add better handling of redundant IS [NOT] NULL quals

2024-01-23 Thread David Rowley
On Wed, 24 Jan 2024 at 08:15, Alvaro Herrera  wrote:
> (Similarly, allowing GROUP BY to ignore columns not in the GROUP BY,
> when a UNIQUE constraint exists and all columns are NOT NULL; currently
> we allow that for PRIMARY KEY, but if you have the NOT NULL constraint
> OIDs to cue the plan invalidation would let that case to be implemented
> as well.)

I recall some discussion about the GROUP BY case. I think at the time
there might have been some confusion with plan cache invalidation and
invalidating views that have been created with columns in the target
list which are functionally dependent on columns in the GROUP BY.

i.e, given:

create table ab (a int primary key, b int not null unique);

the following works:

create view v_ab1 as select a,b from ab group by a; -- works

but this one does not:

create view v_ab2 as select a,b from ab group by b; -- does not work
ERROR:  column "ab.a" must appear in the GROUP BY clause or be used in
an aggregate function
LINE 1: create view v_ab2 as select a,b from ab group by b;

I think thanks to your work on adding pg_constraint records for NOT
NULL conditions, the latter case could now be made to work.

As for the plan optimisation, I agree with Tom about the relcache
invalidation triggering a replan.  Maybe it's worth adding a test to
ensure the replan is done after a ALTER TABLE ... DROP NOT NULL,
however.

David




Re: logical decoding and replication of sequences, take 2

2024-01-23 Thread Robert Haas
On Thu, Jan 11, 2024 at 11:27 AM Tomas Vondra
 wrote:
> 1) desirability: We want a built-in way to handle sequences in logical
> replication. I think everyone agrees this is not a way to do distributed
> sequences in an active-active setups, but that there are other use cases
> that need this feature - typically upgrades / logical failover.

Yeah. I find it extremely hard to take seriously the idea that this
isn't a valuable feature. How else are you supposed to do a logical
failover without having your entire application break?

> 2) performance: There was concern about the performance impact, and that
> it affects everyone, including those who don't replicate sequences (as
> the overhead is mostly incurred before calls to output plugin etc.).
>
> The agreement was that the best way is to have a CREATE SUBSCRIPTION
> option that would instruct the upstream to decode sequences. By default
> this option is 'off' (because that's the no-overhead case), but it can
> be enabled for each subscription.

Seems reasonable, at least unless and until we come up with something better.

> 3) correctness: The last point is about making "transactional" flag
> correct when the snapshot state changes mid-transaction, originally
> pointed out by Dilip [4]. Per [5] this however happens to work
> correctly, because while we identify the change as 'non-transactional'
> (which is incorrect), we immediately throw it again (so we don't try to
> apply it, which would error-out).

I've said this before, but I still find this really scary. It's
unclear to me that we can simply classify updates as transactional or
non-transactional and expect things to work. If it's possible, I hope
we have a really good explanation somewhere of how and why it's
possible. If we do, can somebody point me to it so I can read it?

To be possibly slightly more clear about my concern, I think the scary
case is where we have transactional and non-transactional things
happening to the same sequence in close temporal proximity, either
within the same session or across two or more sessions.  If a
non-transactional change can get reordered ahead of some transactional
change upon which it logically depends, or behind some transactional
change that logically depends on it, then we have trouble. I also
wonder if there are any cases where the same operation is partly
transactional and partly non-transactional.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re:Bug report and fix about building historic snapshot

2024-01-23 Thread cca5507
This patch may be better, which only track catalog modified transactions.


--
Regards,
ChangAo Chen
-- Original --
From:   
 "cca5507"  
  

0001-Track-transactions-committed-in-BUILDING_SNAPSHOT.patch
Description: Binary data


Re: core dumps in auto_prewarm, tests succeed

2024-01-23 Thread Nathan Bossart
On Tue, Jan 23, 2024 at 06:33:25PM +0100, Alvaro Herrera wrote:
> On 2024-Jan-22, Nathan Bossart wrote:
> 
>> Here is a patch.
> 
> Looks reasonable.

Committed.  Thank you for the report and the reviews.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Optimizing nbtree ScalarArrayOp execution, allowing multi-column ordered scans, skip scan

2024-01-23 Thread Peter Geoghegan
On Mon, Jan 22, 2024 at 4:13 PM Matthias van de Meent
 wrote:
> On Fri, 19 Jan 2024 at 23:42, Peter Geoghegan  wrote:
> And this is where I disagree with your (percieved) implicit argument
> that this should be and always stay this way.

I never said that, and didn't intend to imply it. As I outlined to you
back in November, my general philosophy around APIs (such as the index
AM API) is that ambiguity about the limits and extent of an abstract
interface isn't necessarily a bad thing [1]. It can actually be a good
thing! Ever hear of Hyrum's Law? Abstractions are very often quite
leaky.

APIs like the index AM API inevitably make trade-offs. Trade-offs are
almost always political, in one way or another. Litigating every
possible question up-front requires knowing ~everything before you
really get started. This mostly ends up being a waste of time, since
many theoretical contentious trade-offs just won't matter one bit in
the long run, for one reason or another (not necessarily because there
is only ever one consumer of an API, for all sorts of reasons).

You don't have to agree with me. That's just what my experience
indicates works best on average, and in the long run. I cannot justify
it any further than that.

> By keeping that expectation alive,
> this becomes a self-fulfilling prophecy, and I really don't like such
> restrictive self-fulfilling prophecies. It's nice that we have index
> AM feature flags, but if we only effectively allow one implementation
> for this set of flags by ignoring potential other users when changing
> behavior, then what is the API good for (apart from abstraction
> layering, which is nice)?

I explicitly made it clear that I don't mean that.

> I think that may be right, but I could very well have built a
> btree-lite that doesn't have the historical baggage of having to deal
> with pages from before v12 (version 4) and some improvements that
> haven't made it to core yet.

Let me know if you ever do that. Let me know what the problems you
encounter are. I'm quite happy to revise my position on this in light
of new information. I change my mind all the time!

> Something like the following, maybe?
>
> """
> Compatibility: The planner will now generate paths with array scan
> keys in any column for ordered indexes, rather than on only a prefix
> of the index columns. The planner still expects fully ordered data
> from those indexes.
> Historically, the btree AM couldn't output correctly ordered data for
> suffix array scan keys, which was "fixed" by prohibiting the planner
> from generating array scan keys without an equality prefix scan key up
> to that attribute. In this commit, the issue has been fixed, and the
> planner restriction has thus been removed as the only in-core IndexAM
> that reports amcanorder now supports array scan keys on any column
> regardless of what prefix scan keys it has.
> """

Even if I put something like this in the commit message, I doubt that
Bruce would pick it up in anything like this form (I have my doubts
about it happening no matter what wording is used, actually).

I could include something less verbose, mentioning a theoretical risk
to out-of-core amcanorder routines that coevolved with nbtree,
inherited the same SAOP limitations, and then never got the same set
of fixes.

> patch-a is a trivial and clean implementation of mergesort, which
> tends to reduce the total number of compare operations if both arrays
> are of similar size and value range. It writes data directly back into
> the main array on non-assertion builds, and with assertions it reuses
> your binary search join on scratch space for algorithm validation.

I'll think about this some more.

> patch-b is an improved version of patch-a with reduced time complexity
> in some cases: It applies exponential search to reduce work done where
> there are large runs of unmatched values in either array, and does
> that while keeping O(n+m) worst-case complexity, now getting a
> best-case O(log(n)) complexity.

This patch seems massively over-engineered, though. Over 200 new lines
of code, all for a case that is only used when queries are written
with obviously-contradictory quals? It's just not worth it.

> > The recursive call to _bt_advance_array_keys is needed to deal with
> > unsatisfied-and-required inequalities that were not detected by an
> > initial call to _bt_check_compare, following a second retry call to
> > _bt_check_compare. We know that this recursive call to
> > _bt_advance_array_keys won't cannot recurse again because we've
> > already identified which specific inequality scan key it is that's a
> > still-unsatisfied inequality, following an initial round of array key
> > advancement.
>
> So, it's a case of this?
>
> scankey: a > 1 AND a = ANY (1 (=current), 2, 3)) AND b < 4 AND b = ANY
> (2 (=current), 3, 4)
> tuple: a=2, b=4

I don't understand why your example starts with "scankey: a > 1" and
uses redundant/contradictory scan keys (for both "a" and "b"). For a
forw

Re: Parallelize correlated subqueries that execute within each worker

2024-01-23 Thread Robert Haas
On Tue, Jan 23, 2024 at 9:34 AM Akshat Jaimini  wrote:
> Hello!
> I was going through the previous conversations for this particular patch and 
> it seems that this patch failed some tests previously?
> Imo we should move it to the next CF so that the remaining issues can be 
> resolved accordingly.

So I guess the question here is whether this is thought to be ready
for serious review or whether it is still thought to need work. If the
latter, it should be marked RwF until that happens -- if the former,
then we should try to review it rather than letting it languish
forever.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: the s_lock_stuck on perform_spin_delay

2024-01-23 Thread Andres Freund
Hi,

On 2024-01-22 15:18:35 +0800, Andy Fan wrote:
> I used sigismember(&BlockSig, SIGQUIT) to detect if a process is doing a
> quickdie, however this is bad not only because it doesn't work on
> Windows, but also it has too poor performance even it impacts on
> USE_ASSERT_CHECKING build only. In v8, I introduced a new global
> variable quickDieInProgress to handle this.

For reasons you already noted, using sigismember() isn't viable. But I am not
convinced by quickDieInProgress either. I think we could just reset the state
for the spinlock check in the code handling PANIC, perhaps via a helper
function in spin.c.


> +void
> +VerifyNoSpinLocksHeld(void)
> +{
> +#ifdef USE_ASSERT_CHECKING
> + if (last_spin_lock_file != NULL)
> + elog(PANIC, "A spin lock has been held at %s:%d",
> +  last_spin_lock_file, last_spin_lock_lineno);
> +#endif
> +}

I think the #ifdef for this needs to be in the header, not here. Otherwise we
add a pointless external function call to a bunch of performance critical
code.


> From f09518df76572adca85cba5008ea0cae5074603a Mon Sep 17 00:00:00 2001
> From: "yizhi.fzh" 
> Date: Fri, 19 Jan 2024 13:57:46 +0800
> Subject: [PATCH v8 2/3] Treat (un)LockBufHdr as a SpinLock.
> 
> The LockBufHdr also used init_local_spin_delay / perform_spin_delay
> infrastructure and so it is also possible that PANIC the system
> when it can't be acquired in a short time, and its code is pretty
> similar with s_lock. so treat it same as SPIN lock when regarding to
> misuse of spinlock detection.
> ---
>  src/backend/storage/buffer/bufmgr.c | 1 +
>  src/include/storage/buf_internals.h | 1 +
>  2 files changed, 2 insertions(+)
> 
> diff --git a/src/backend/storage/buffer/bufmgr.c 
> b/src/backend/storage/buffer/bufmgr.c
> index 7d601bef6d..c600a113cf 100644
> --- a/src/backend/storage/buffer/bufmgr.c
> +++ b/src/backend/storage/buffer/bufmgr.c
> @@ -5409,6 +5409,7 @@ LockBufHdr(BufferDesc *desc)
>  
>   init_local_spin_delay(&delayStatus);
>  
> + START_SPIN_LOCK();
>   while (true)
>   {
>   /* set BM_LOCKED flag */

Seems pretty odd that we now need init_local_spin_delay() and
START_SPIN_LOCK(). Note that init_local_spin_delay() also wraps handling of
__FILE__, __LINE__ etc, so it seems we're duplicating state here.


Greetings,

Andres Freund




Re: Remove pthread_is_threaded_np() checks in postmaster

2024-01-23 Thread Tristan Partin

On Tue Jan 23, 2024 at 1:47 PM CST, Andres Freund wrote:

Hi,
On 2024-01-23 13:20:15 -0600, Tristan Partin wrote:
> These checks are not effective for what they are trying to prevent. A recent
> commit[0] in libcurl when used on macOS has been tripping the
> pthread_is_threaded_np() check in postmaster.c for shared_preload_libraries
> entries which use libcurl (like Neon). Under the hood, libcurl calls
> SCDynamicStoreCopyProxies[1], which apparently causes the check to fail.

Maybe I'm missing something, but isn't that indicating the exact opposite,
namely that the check is precisely doing what it's intended?


The logic in the original commit seems sound:


On Darwin, detect and report a multithreaded postmaster.

Darwin --enable-nls builds use a substitute setlocale() that may start a
thread.  Buildfarm member orangutan experienced BackendList corruption
on account of different postmaster threads executing signal handlers
simultaneously.  Furthermore, a multithreaded postmaster risks undefined
behavior from sigprocmask() and fork().  Emit LOG messages about the
problem and its workaround.  Back-patch to 9.0 (all supported versions).


Some reading from signal(7):

A process-directed signal may be delivered to any one of the threads 
that does not currently have the signal blocked. If more than one of 
the threads has the signal unblocked, then the kernel chooses an 
arbitrary thread to which to deliver the signal.


So it sounds like the commit is trying to protect from the last 
sentence.


And then forks only copy from their parent thread...

Please ignore this thread. I need to think of a better solution to this 
problem.


--
Tristan Partin
Neon (https://neon.tech)




Re: the s_lock_stuck on perform_spin_delay

2024-01-23 Thread Andres Freund
Hi,

On 2024-01-18 14:00:58 -0500, Robert Haas wrote:
> > The LockBufHdr also used init_local_spin_delay / perform_spin_delay
> > infrastruce and then it has the same issue like ${subject}, it is pretty
> > like the code in s_lock; Based on my current knowledge, I think we
> > should add the check there.
> 
> I'd like to hear from Andres, if possible. @Andres: Should these
> sanity checks apply only to spin locks per se, or also to buffer
> header locks?

They also should apply to buffer header locks. The exact same dangers apply
there. The only reason this isn't using a plain spinlock is that this way we
can modify more state with a single atomic operation. But all the dangers of
using spinlocks apply just as well.

Greetings,

Andres Freund




Re: Refactoring backend fork+exec code

2024-01-23 Thread Andres Freund
Hi,

On 2024-01-23 21:07:08 +0200, Heikki Linnakangas wrote:
> On 22/01/2024 23:07, Andres Freund wrote:
> > > diff --git a/src/backend/utils/activity/backend_status.c 
> > > b/src/backend/utils/activity/backend_status.c
> > > index 1a1050c8da1..92f24db4e18 100644
> > > --- a/src/backend/utils/activity/backend_status.c
> > > +++ b/src/backend/utils/activity/backend_status.c
> > > @@ -257,17 +257,16 @@ pgstat_beinit(void)
> > >   else
> > >   {
> > >   /* Must be an auxiliary process */
> > > - Assert(MyAuxProcType != NotAnAuxProcess);
> > > + Assert(IsAuxProcess(MyBackendType));
> > >   /*
> > >* Assign the MyBEEntry for an auxiliary process.  
> > > Since it doesn't
> > >* have a BackendId, the slot is statically allocated 
> > > based on the
> > > -  * auxiliary process type (MyAuxProcType).  Backends use slots 
> > > indexed
> > > -  * in the range from 0 to MaxBackends (exclusive), so we use
> > > -  * MaxBackends + AuxProcType as the index of the slot for an 
> > > auxiliary
> > > -  * process.
> > > +  * auxiliary process type.  Backends use slots indexed in the 
> > > range
> > > +  * from 0 to MaxBackends (exclusive), and aux processes use the 
> > > slots
> > > +  * after that.
> > >*/
> > > - MyBEEntry = &BackendStatusArray[MaxBackends + MyAuxProcType];
> > > + MyBEEntry = &BackendStatusArray[MaxBackends + MyBackendType - 
> > > FIRST_AUX_PROC];
> > >   }
> > 
> > Hm, this seems less than pretty. It's probably ok for now, but it seems 
> > like a
> > better fix might be to just start assigning backend ids to aux procs or 
> > switch
> > to indexing by pgprocno?
> 
> Using pgprocno is a good idea. Come to think of it, why do we even have a
> concept of backend ID that's separate from pgprocno? backend ID is used to
> index the ProcState array, but AFAICS we could use pgprocno as the index to
> that, too.

I think we should do that. There are a few processes not participating in
sinval, but it doesn't make enough of a difference to make sinval slower. And
I think there'd be far bigger efficiency improvements to sinvaladt than not
having a handful more entries.

Greetings,

Andres Freund




Re: Remove pthread_is_threaded_np() checks in postmaster

2024-01-23 Thread Andres Freund
Hi,
On 2024-01-23 13:20:15 -0600, Tristan Partin wrote:
> These checks are not effective for what they are trying to prevent. A recent
> commit[0] in libcurl when used on macOS has been tripping the
> pthread_is_threaded_np() check in postmaster.c for shared_preload_libraries
> entries which use libcurl (like Neon). Under the hood, libcurl calls
> SCDynamicStoreCopyProxies[1], which apparently causes the check to fail.

Maybe I'm missing something, but isn't that indicating the exact opposite,
namely that the check is precisely doing what it's intended?

Greetings,

Andres Freund




Re: Build versionless .so for Android

2024-01-23 Thread Peter Eisentraut

On 19.01.24 18:12, Matthias Kuhn wrote:

Thanks,

This patch brought it further and indeed, the resulting libpq.so file is 
unversioned when built for Android while it's versioned when built for 
linux.


Ok, I have committed all of this now:

- Fix for correct order of host_system and portname assignment in 
meson.build.


- Patch from Andres to map android to linux in meson.build.

- Makefile.shlib support for Android (based on your patch, but I 
reworked it a bit).


So this should all work for you now.





Re: pgsql: Add better handling of redundant IS [NOT] NULL quals

2024-01-23 Thread Tom Lane
Alvaro Herrera  writes:
> On 2024-Jan-23, David Rowley wrote:
>> Until now PostgreSQL has not been very smart about optimizing away IS
>> NOT NULL base quals on columns defined as NOT NULL.

> Hmm, what happens if a NOT NULL constraint is dropped and you have such
> a plan in plancache?  As I recall, lack of a mechanism to invalidate
> such plans was the main reason for Postgres not to have this.

IIRC, we realized that that concern was bogus.  Removal of such
constraints would cause pg_attribute.attnotnull to change, leading
to a relcache invalidation on the table, forcing replan.  If anyone
tried to get rid of attnotnull or make it incompletely reliable,
then we'd have problems; but AFAIK that's not being contemplated.

regards, tom lane




Re: Shared detoast Datum proposal

2024-01-23 Thread Andy Fan

Michael Zhilin  writes:

> Hi Andy,
>
> It looks like v5 is missing in your mail. Could you please check and resend 
> it?

ha, yes.. v5 is really attached this time.

commit eee0b2058f912d0d56282711c5d88bc0b1b75c2f (HEAD -> 
shared_detoast_value_v3)
Author: yizhi.fzh 
Date:   Tue Jan 23 13:38:34 2024 +0800

shared detoast feature.

details at https://postgr.es/m/87il4jrk1l.fsf%40163.com

commit eeca405f5ae87e7d4e5496de989ac7b5173bcaa9
Author: yizhi.fzh 
Date:   Mon Jan 22 15:48:33 2024 +0800

Introduce a Bitset data struct.

While Bitmapset is designed for variable-length of bits, Bitset is
designed for fixed-length of bits, the fixed length must be specified at
the bitset_init stage and keep unchanged at the whole lifespan. Because
of this, some operations on Bitset is simpler than Bitmapset.

The bitset_clear unsets all the bits but kept the allocated memory, this
capacity is impossible for bit Bitmapset for some solid reasons and this
is the main reason to add this data struct.

Also for performance aspect, the functions for Bitset removed some
unlikely checks, instead with some Asserts.

[1] 
https://postgr.es/m/CAApHDvpdp9LyAoMXvS7iCX-t3VonQM3fTWCmhconEvORrQ%2BZYA%40mail.gmail.com
[2] https://postgr.es/m/875xzqxbv5.fsf%40163.com


As for the commit "Introduce a Bitset data struct.", the test coverage
is 100% now. So it would be great that people can review this first. 

-- 
Best Regards
Andy Fan

>From eeca405f5ae87e7d4e5496de989ac7b5173bcaa9 Mon Sep 17 00:00:00 2001
From: "yizhi.fzh" 
Date: Mon, 22 Jan 2024 15:48:33 +0800
Subject: [PATCH v5 1/2] Introduce a Bitset data struct.

While Bitmapset is designed for variable-length of bits, Bitset is
designed for fixed-length of bits, the fixed length must be specified at
the bitset_init stage and keep unchanged at the whole lifespan. Because
of this, some operations on Bitset is simpler than Bitmapset.

The bitset_clear unsets all the bits but kept the allocated memory, this
capacity is impossible for bit Bitmapset for some solid reasons and this
is the main reason to add this data struct.

Also for performance aspect, the functions for Bitset removed some
unlikely checks, instead with some Asserts.

[1] https://postgr.es/m/CAApHDvpdp9LyAoMXvS7iCX-t3VonQM3fTWCmhconEvORrQ%2BZYA%40mail.gmail.com
[2] https://postgr.es/m/875xzqxbv5.fsf%40163.com
---
 src/backend/nodes/bitmapset.c | 201 +-
 src/backend/nodes/outfuncs.c  |  51 +
 src/include/nodes/bitmapset.h |  28 +++
 src/include/nodes/nodes.h |   4 +
 src/test/modules/test_misc/Makefile   |  11 +
 src/test/modules/test_misc/README |   4 +-
 .../test_misc/expected/test_bitset.out|   7 +
 src/test/modules/test_misc/meson.build|  17 ++
 .../modules/test_misc/sql/test_bitset.sql |   3 +
 src/test/modules/test_misc/test_misc--1.0.sql |   5 +
 src/test/modules/test_misc/test_misc.c| 132 
 src/test/modules/test_misc/test_misc.control  |   4 +
 src/tools/pgindent/typedefs.list  |   1 +
 13 files changed, 456 insertions(+), 12 deletions(-)
 create mode 100644 src/test/modules/test_misc/expected/test_bitset.out
 create mode 100644 src/test/modules/test_misc/sql/test_bitset.sql
 create mode 100644 src/test/modules/test_misc/test_misc--1.0.sql
 create mode 100644 src/test/modules/test_misc/test_misc.c
 create mode 100644 src/test/modules/test_misc/test_misc.control

diff --git a/src/backend/nodes/bitmapset.c b/src/backend/nodes/bitmapset.c
index 65805d4527..eb38834e21 100644
--- a/src/backend/nodes/bitmapset.c
+++ b/src/backend/nodes/bitmapset.c
@@ -1315,23 +1315,18 @@ bms_join(Bitmapset *a, Bitmapset *b)
  * It makes no difference in simple loop usage, but complex iteration logic
  * might need such an ability.
  */
-int
-bms_next_member(const Bitmapset *a, int prevbit)
+
+static int
+bms_next_member_internal(int nwords, const bitmapword *words, int prevbit)
 {
-	int			nwords;
 	int			wordnum;
 	bitmapword	mask;
 
-	Assert(bms_is_valid_set(a));
-
-	if (a == NULL)
-		return -2;
-	nwords = a->nwords;
 	prevbit++;
 	mask = (~(bitmapword) 0) << BITNUM(prevbit);
 	for (wordnum = WORDNUM(prevbit); wordnum < nwords; wordnum++)
 	{
-		bitmapword	w = a->words[wordnum];
+		bitmapword	w = words[wordnum];
 
 		/* ignore bits before prevbit */
 		w &= mask;
@@ -1348,9 +1343,23 @@ bms_next_member(const Bitmapset *a, int prevbit)
 		/* in subsequent words, consider all bits */
 		mask = (~(bitmapword) 0);
 	}
+
 	return -2;
 }
 
+int
+bms_next_member(const Bitmapset *a, int prevbit)
+{
+	Assert(a == NULL || IsA(a, Bitmapset));
+
+	Assert(bms_is_valid_set(a));
+
+	if (a == NULL)
+		return -2;
+
+	return bms_next_member_internal(a->nwords, a->words, prevbit);
+}
+
 /*
  * bms_prev_member - find prev member of a set
  *
@@ -1458,3 +1467,177 @@ bitmap_match(const void *key1, const void *key2, Size keysize

Remove pthread_is_threaded_np() checks in postmaster

2024-01-23 Thread Tristan Partin
These checks are not effective for what they are trying to prevent. 
A recent commit[0] in libcurl when used on macOS has been tripping the 
pthread_is_threaded_np() check in postmaster.c for 
shared_preload_libraries entries which use libcurl (like Neon). Under 
the hood, libcurl calls SCDynamicStoreCopyProxies[1], which apparently 
causes the check to fail.


Attached is a patch to remove the check, and a minimal reproducer 
(curlexe.zip) for others to run on macOS.


Here are some logs:

Postgres working as expected:

$ LC_ALL="C" /opt/homebrew/opt/postgresql@16/bin/postgres -D 
/opt/homebrew/var/postgresql@16
2024-01-22 23:18:51.203 GMT [50388] LOG:  starting PostgreSQL 16.1 (Homebrew) 
on aarch64-apple-darwin23.2.0, compiled by Apple clang version 15.0.0 
(clang-1500.1.0.2.5), 64-bit
2024-01-22 23:18:51.204 GMT [50388] LOG:  listening on IPv6 address "::1", port 
5432
2024-01-22 23:18:51.204 GMT [50388] LOG:  listening on IPv4 address 
"127.0.0.1", port 5432
2024-01-22 23:18:51.205 GMT [50388] LOG:  listening on Unix socket 
"/tmp/.s.PGSQL.5432"
2024-01-22 23:18:51.207 GMT [50391] LOG:  database system was shut down at 
2023-12-21 23:12:10 GMT
2024-01-22 23:18:51.211 GMT [50388] LOG:  database system is ready to accept 
connections
^C2024-01-22 23:18:53.797 GMT [50388] LOG:  received fast shutdown request
2024-01-22 23:18:53.798 GMT [50388] LOG:  aborting any active transactions
2024-01-22 23:18:53.800 GMT [50388] LOG:  background worker "logical replication 
launcher" (PID 50394) exited with exit code 1
2024-01-22 23:18:53.801 GMT [50389] LOG:  shutting down
2024-01-22 23:18:53.801 GMT [50389] LOG:  checkpoint starting: shutdown 
immediate
2024-01-22 23:18:53.805 GMT [50389] LOG:  checkpoint complete: wrote 3 buffers 
(0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.002 s, sync=0.001 
s, total=0.005 s; sync files=2, longest=0.001 s, average=0.001 s; distance=0 
kB, estimate=0 kB; lsn=0/4BE77E0, redo lsn=0/4BE77E0
2024-01-22 23:18:53.809 GMT [50388] LOG:  database system is shut down


Postgres not working with attached extension preloaded:

$ echo shared_preload_libraries=curlexe >> 
/opt/homebrew/var/postgresql@16/postgresql.conf
$ LC_ALL="C" /opt/homebrew/opt/postgresql@16/bin/postgres -D 
/opt/homebrew/var/postgresql@16
2024-01-22 23:19:01.108 GMT [50395] LOG:  starting PostgreSQL 16.1 (Homebrew) 
on aarch64-apple-darwin23.2.0, compiled by Apple clang version 15.0.0 
(clang-1500.1.0.2.5), 64-bit
2024-01-22 23:19:01.110 GMT [50395] LOG:  listening on IPv6 address "::1", port 
5432
2024-01-22 23:19:01.110 GMT [50395] LOG:  listening on IPv4 address 
"127.0.0.1", port 5432
2024-01-22 23:19:01.111 GMT [50395] LOG:  listening on Unix socket 
"/tmp/.s.PGSQL.5432"
2024-01-22 23:19:01.113 GMT [50395] FATAL:  postmaster became multithreaded 
during startup
2024-01-22 23:19:01.113 GMT [50395] HINT:  Set the LC_ALL environment variable 
to a valid locale.
2024-01-22 23:19:01.114 GMT [50395] LOG:  database system is shut down


Same as previous, but without IPv6 support in libcurl:

$ DYLD_LIBRARY_PATH=/opt/homebrew/opt/curl-without-ipv6/lib LC_ALL="C" 
/opt/homebrew/opt/postgresql@16/bin/postgres -D /opt/homebrew/var/postgresql@16
2024-01-22 23:23:17.151 GMT [50546] LOG:  starting PostgreSQL 16.1 (Homebrew) 
on aarch64-apple-darwin23.2.0, compiled by Apple clang version 15.0.0 
(clang-1500.1.0.2.5), 64-bit
2024-01-22 23:23:17.152 GMT [50546] LOG:  listening on IPv6 address "::1", port 
5432
2024-01-22 23:23:17.152 GMT [50546] LOG:  listening on IPv4 address 
"127.0.0.1", port 5432
2024-01-22 23:23:17.152 GMT [50546] LOG:  listening on Unix socket 
"/tmp/.s.PGSQL.5432"
2024-01-22 23:23:17.155 GMT [50549] LOG:  database system was shut down at 
2024-01-22 23:23:10 GMT
2024-01-22 23:23:17.158 GMT [50546] LOG:  database system is ready to accept 
connections
^C2024-01-22 23:23:26.997 GMT [50546] LOG:  received fast shutdown request
2024-01-22 23:23:26.998 GMT [50546] LOG:  aborting any active transactions
2024-01-22 23:23:27.000 GMT [50546] LOG:  background worker "logical replication 
launcher" (PID 50552) exited with exit code 1
2024-01-22 23:23:27.000 GMT [50547] LOG:  shutting down
2024-01-22 23:23:27.001 GMT [50547] LOG:  checkpoint starting: shutdown 
immediate
2024-01-22 23:23:27.002 GMT [50547] LOG:  checkpoint complete: wrote 3 buffers 
(0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.001 
s, total=0.003 s; sync files=2, longest=0.001 s, average=0.001 s; distance=0 
kB, estimate=0 kB; lsn=0/4BE78D0, redo lsn=0/4BE78D0
2024-01-22 23:23:27.005 GMT [50546] LOG:  database system is shut down


[0]: 
https://github.com/curl/curl/commit/8b7cbe9decc205b08ec8258eb184c89a33e3084b
[1]: 
https://developer.apple.com/documentation/systemconfiguration/1517088-scdynamicstorecopyproxies

--
Tristan Partin
Neon (https://neon.tech)
<>
From 0a514e7a8ea2af21d98cbcc2e4da799745786155 Mon Sep 17 00:00:00 2001
From: Tristan Partin 
Date: Tue, 23 Jan 2024 13:14:33 -0600
Subject: [PATCH v1] Re

Re: pgsql: Add better handling of redundant IS [NOT] NULL quals

2024-01-23 Thread Alvaro Herrera
On 2024-Jan-23, David Rowley wrote:

> Add better handling of redundant IS [NOT] NULL quals
> 
> Until now PostgreSQL has not been very smart about optimizing away IS
> NOT NULL base quals on columns defined as NOT NULL.

Hmm, what happens if a NOT NULL constraint is dropped and you have such
a plan in plancache?  As I recall, lack of a mechanism to invalidate
such plans was the main reason for Postgres not to have this.  One of
the motivations for adding catalogued NOT NULL constraints was precisely
to have an OID that you could use to cause plancache to invalidate such
a plan.  Does this new code add something like that?

Admittedly I didn't read the threads or the patch, just skimmed for some
clues, so I may have failed to notice it.  But in the tests you added I
don't see any ALTER TABLE DROP CONSTRAINT.


(Similarly, allowing GROUP BY to ignore columns not in the GROUP BY,
when a UNIQUE constraint exists and all columns are NOT NULL; currently
we allow that for PRIMARY KEY, but if you have the NOT NULL constraint
OIDs to cue the plan invalidation would let that case to be implemented
as well.)

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Por suerte hoy explotó el califont porque si no me habría muerto
 de aburrido"  (Papelucho)




Re: Refactoring backend fork+exec code

2024-01-23 Thread Heikki Linnakangas

On 22/01/2024 23:07, Andres Freund wrote:

On 2024-01-10 14:35:52 +0200, Heikki Linnakangas wrote:

@@ -5344,31 +5344,31 @@ StartChildProcess(AuxProcType type)
errno = save_errno;
switch (type)
{
-   case StartupProcess:
+   case B_STARTUP:
ereport(LOG,
(errmsg("could not fork startup 
process: %m")));
break;
-   case ArchiverProcess:
+   case B_ARCHIVER:
ereport(LOG,
(errmsg("could not fork archiver 
process: %m")));
break;
-   case BgWriterProcess:
+   case B_BG_WRITER:
ereport(LOG,
(errmsg("could not fork background 
writer process: %m")));
break;
-   case CheckpointerProcess:
+   case B_CHECKPOINTER:
ereport(LOG,
(errmsg("could not fork checkpointer 
process: %m")));
break;
-   case WalWriterProcess:
+   case B_WAL_WRITER:
ereport(LOG,
(errmsg("could not fork WAL writer 
process: %m")));
break;
-   case WalReceiverProcess:
+   case B_WAL_RECEIVER:
ereport(LOG,
(errmsg("could not fork WAL receiver 
process: %m")));
break;
-   case WalSummarizerProcess:
+   case B_WAL_SUMMARIZER:
ereport(LOG,
(errmsg("could not fork WAL 
summarizer process: %m")));
break;


Seems we should replace this with something slightly more generic one of these
days...


The later patches in this thread will turn these into

ereport(LOG,
(errmsg("could not fork %s process: %m", 
PostmasterChildName(type;



diff --git a/src/backend/utils/activity/backend_status.c 
b/src/backend/utils/activity/backend_status.c
index 1a1050c8da1..92f24db4e18 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -257,17 +257,16 @@ pgstat_beinit(void)
else
{
/* Must be an auxiliary process */
-   Assert(MyAuxProcType != NotAnAuxProcess);
+   Assert(IsAuxProcess(MyBackendType));
  
  		/*

 * Assign the MyBEEntry for an auxiliary process.  Since it 
doesn't
 * have a BackendId, the slot is statically allocated based on 
the
-* auxiliary process type (MyAuxProcType).  Backends use slots 
indexed
-* in the range from 0 to MaxBackends (exclusive), so we use
-* MaxBackends + AuxProcType as the index of the slot for an 
auxiliary
-* process.
+* auxiliary process type.  Backends use slots indexed in the 
range
+* from 0 to MaxBackends (exclusive), and aux processes use the 
slots
+* after that.
 */
-   MyBEEntry = &BackendStatusArray[MaxBackends + MyAuxProcType];
+   MyBEEntry = &BackendStatusArray[MaxBackends + MyBackendType - 
FIRST_AUX_PROC];
}


Hm, this seems less than pretty. It's probably ok for now, but it seems like a
better fix might be to just start assigning backend ids to aux procs or switch
to indexing by pgprocno?


Using pgprocno is a good idea. Come to think of it, why do we even have 
a concept of backend ID that's separate from pgprocno? backend ID is 
used to index the ProcState array, but AFAICS we could use pgprocno as 
the index to that, too.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: core dumps in auto_prewarm, tests succeed

2024-01-23 Thread Alexander Lakhin

23.01.2024 20:30, Andres Freund wrote:

I don't think that's viable and would cause more problems than it solves, it'd
make us think that we might have an old postgres process hanging around that
needs to be terminted before we can start up. And I simply don't see the point
- we already record whether we crashed in the control file, no?


With an Assert injected in walsender.c (as in [1]) and test
012_subtransactions.pl modified to finish just after the first
"$node_primary->stop;", I see:
pg_controldata -D 
src/test/recovery/tmp_check/t_012_subtransactions_primary_data/pgdata/
Database cluster state:   shut down

But the assertion undoubtedly failed:
grep TRAP src/test/recovery/tmp_check/log/*
src/test/recovery/tmp_check/log/012_subtransactions_primary.log:TRAP: failed Assert("0"), File: "walsender.c", Line: 
2688, PID: 142201


As to the need to terminate a process, which is supposedly hanging around,
I think, this situation doesn't differ in general from what we have after
kill -9...

So my point was to let 'pg_ctl stop' know about an error occurred during
the server stop.

[1] 
https://www.postgresql.org/message-id/290b9ae3-98a2-0896-a957-18d3b60b6260%40gmail.com

Best regards,
Alexander




Re: generate syscache info automatically

2024-01-23 Thread Peter Eisentraut

On 22.01.24 01:33, John Naylor wrote:

On Fri, Jan 19, 2024 at 9:03 PM Peter Eisentraut  wrote:


The DECLARE_* macros map to actual BKI commands ("declare toast",
"declare index").  So I wanted to use a different verb to distinguish
"generate code for this" from those BKI commands.


That makes sense to me.


I have committed the first patch, and the buildfarm hiccup seems to have 
passed.


I'll close this commitfest entry now.  I have enough feedback to work on 
the ObjectProperty generation, but I'll make a new entry for that.





Re: core dumps in auto_prewarm, tests succeed

2024-01-23 Thread Nathan Bossart
On Tue, Jan 23, 2024 at 12:22:58PM -0600, Nathan Bossart wrote:
> On Tue, Jan 23, 2024 at 06:33:25PM +0100, Alvaro Herrera wrote:
>> Does this actually detect a problem if you take out the fix?  I think
>> what's going to happen is that postmaster is going to crash, then do the
>> recovery cycle, then stop as instructed by the test; so pg_controldata
>> would report that it was correctly shut down.
> 
> Yes, the control data shows "in production" without it.  The segfault
> happens within the shut-down path, and the test logs indicate that the
> server continues shutting down without doing a recovery cycle:

I see that ->init turns off restart_after_crash, which might be why it's
not doing a recovery cycle.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: FEATURE REQUEST: Role vCPU limit/priority

2024-01-23 Thread Bruce Momjian
On Tue, Jan 23, 2024 at 10:10:27AM -0800, Andres Freund wrote:
> Hi,
> 
> On 2024-01-23 13:09:22 +0100, Thomas Kellerer wrote:
> > Yoni Sade schrieb am 21.01.2024 um 19:07:
> > > It would be useful to have the ability to define for a role default
> > > vCPU affinity limits/thread priority settings so that more active
> > > sessions could coexist similar to MySQL resource groups
> > > .
> > 
> > To a certain extent, you can achieve something like that using Linux cgroups
> > 
> > https://www.cybertec-postgresql.com/en/linux-cgroups-for-postgresql/
> 
> If you do that naively, you just run into priority inversion issues. E.g. a
> backend holding a critical lwlock not getting scheduled for a while because it
> exceeded it CPU allocation, preventing higher priority processes from
> progressing.
> 
> I doubt you can implement this in a robust manner outside of postgres.

FYI, here is an article about priority inversion:

https://www.geeksforgeeks.org/priority-inversion-what-the-heck/

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: core dumps in auto_prewarm, tests succeed

2024-01-23 Thread Nathan Bossart
On Tue, Jan 23, 2024 at 06:33:25PM +0100, Alvaro Herrera wrote:
> On 2024-Jan-22, Nathan Bossart wrote:
>> This might be a topic for another thread, but I do wonder whether we could
>> put a generic pg_controldata check in node->stop that would at least make
>> sure that the state is some sort of expected shut-down state.  My first
>> thought is that it could be a tad expensive, but... maybe it wouldn't be
>> too bad.
> 
> Does this actually detect a problem if you take out the fix?  I think
> what's going to happen is that postmaster is going to crash, then do the
> recovery cycle, then stop as instructed by the test; so pg_controldata
> would report that it was correctly shut down.

Yes, the control data shows "in production" without it.  The segfault
happens within the shut-down path, and the test logs indicate that the
server continues shutting down without doing a recovery cycle:

2024-01-23 12:14:49.254 CST [2376301] LOG:  received fast shutdown request
2024-01-23 12:14:49.254 CST [2376301] LOG:  aborting any active transactions
2024-01-23 12:14:49.255 CST [2376301] LOG:  background worker "logical 
replication launcher" (PID 2376308) exited with exit code 1
2024-01-23 12:14:49.256 CST [2376301] LOG:  background worker "autoprewarm 
leader" (PID 2376307) was terminated by signal 11: Segmentation fault
2024-01-23 12:14:49.256 CST [2376301] LOG:  terminating any other active server 
processes
2024-01-23 12:14:49.257 CST [2376301] LOG:  abnormal database system shutdown
2024-01-23 12:14:49.261 CST [2376301] LOG:  database system is shut down

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: FEATURE REQUEST: Role vCPU limit/priority

2024-01-23 Thread Andres Freund
Hi,

On 2024-01-23 13:09:22 +0100, Thomas Kellerer wrote:
> Yoni Sade schrieb am 21.01.2024 um 19:07:
> > It would be useful to have the ability to define for a role default
> > vCPU affinity limits/thread priority settings so that more active
> > sessions could coexist similar to MySQL resource groups
> > .
> 
> To a certain extent, you can achieve something like that using Linux cgroups
> 
> https://www.cybertec-postgresql.com/en/linux-cgroups-for-postgresql/

If you do that naively, you just run into priority inversion issues. E.g. a
backend holding a critical lwlock not getting scheduled for a while because it
exceeded it CPU allocation, preventing higher priority processes from
progressing.

I doubt you can implement this in a robust manner outside of postgres.

Regards,

Andres




Re: On login trigger: take three

2024-01-23 Thread Alexander Lakhin

Hello Alexander and Daniel,

Please look at the following query, which triggers an assertion failure on
updating the field dathasloginevt for an entry in pg_database:
SELECT format('CREATE DATABASE db1 LOCALE_PROVIDER icu ICU_LOCALE en ENCODING 
utf8
ICU_RULES ''' || repeat(' ', 20) || ''' TEMPLATE template0;')
\gexec
\c db1 -

CREATE FUNCTION on_login_proc() RETURNS event_trigger AS $$
BEGIN
  RAISE NOTICE 'You are welcome!';
END;
$$ LANGUAGE plpgsql;

CREATE EVENT TRIGGER on_login_trigger ON login EXECUTE PROCEDURE 
on_login_proc();
DROP EVENT TRIGGER on_login_trigger;

\c

\connect: connection to server on socket "/tmp/.s.PGSQL.5432" failed: server 
closed the connection unexpectedly

The stack trace of the assertion failure is:
...
#5  0x55c8699b9b8d in ExceptionalCondition (
    conditionName=conditionName@entry=0x55c869a1f7c0 
"HaveRegisteredOrActiveSnapshot()",
    fileName=fileName@entry=0x55c869a1f4c6 "toast_internals.c", 
lineNumber=lineNumber@entry=669) at assert.c:66
#6  0x55c86945df0a in init_toast_snapshot (...) at toast_internals.c:669
#7  0x55c86945dfbe in toast_delete_datum (...) at toast_internals.c:429
#8  0x55c8694fd1da in toast_tuple_cleanup (...) at toast_helper.c:309
#9  0x55c8694b55a1 in heap_toast_insert_or_update (...) at heaptoast.c:333
#10 0x55c8694a8c6c in heap_update (... at heapam.c:3604
#11 0x55c8694a96cb in simple_heap_update (...) at heapam.c:4062
#12 0x55c869555b7b in CatalogTupleUpdate (...) at indexing.c:322
#13 0x55c8695f0725 in EventTriggerOnLogin () at event_trigger.c:957
...

Funnily enough, when Tom Lane was wondering, whether pg_database's toast
table could pose a risk [1], I came to the conclusion that it's impossible,
but that was before the login triggers introduction...

[1] https://www.postgresql.org/message-id/1284094.1695479962%40sss.pgh.pa.us

Best regards,
Alexander




Re: index prefetching

2024-01-23 Thread Tomas Vondra
On 1/19/24 22:43, Melanie Plageman wrote:
> On Fri, Jan 12, 2024 at 11:42 AM Tomas Vondra
>  wrote:
>>
>> On 1/9/24 21:31, Robert Haas wrote:
>>> On Thu, Jan 4, 2024 at 9:55 AM Tomas Vondra
>>>  wrote:
 Here's a somewhat reworked version of the patch. My initial goal was to
 see if it could adopt the StreamingRead API proposed in [1], but that
 turned out to be less straight-forward than I hoped, for two reasons:
>>>
>>> I guess we need Thomas or Andres or maybe Melanie to comment on this.
>>>
>>
>> Yeah. Or maybe Thomas if he has thoughts on how to combine this with the
>> streaming I/O stuff.
> 
> I've been studying your patch with the intent of finding a way to
> change it and or the streaming read API to work together. I've
> attached a very rough sketch of how I think it could work.
> 

Thanks.

> We fill a queue with blocks from TIDs that we fetched from the index.
> The queue is saved in a scan descriptor that is made available to the
> streaming read callback. Once the queue is full, we invoke the table
> AM specific index_fetch_tuple() function which calls
> pg_streaming_read_buffer_get_next(). When the streaming read API
> invokes the callback we registered, it simply dequeues a block number
> for prefetching.

So in a way there are two queues in IndexFetchTableData. One (blk_queue)
is being filled from IndexNext, and then the queue in StreamingRead.

> The only change to the streaming read API is that now, even if the
> callback returns InvalidBlockNumber, we may not be finished, so make
> it resumable.
> 

Hmm, not sure when can the callback return InvalidBlockNumber before
reaching the end. Perhaps for the first index_fetch_heap call? Any
reason not to fill the blk_queue before calling index_fetch_heap?


> Structurally, this changes the timing of when the heap blocks are
> prefetched. Your code would get a tid from the index and then prefetch
> the heap block -- doing this until it filled a queue that had the
> actual tids saved in it. With my approach and the streaming read API,
> you fetch tids from the index until you've filled up a queue of block
> numbers. Then the streaming read API will prefetch those heap blocks.
> 

And is that a good/desirable change? I'm not saying it's not, but maybe
we should not be filling either queue in one go - we don't want to
overload the prefetching.

> I didn't actually implement the block queue -- I just saved a single
> block number and pretended it was a block queue. I was imagining we
> replace this with something like your IndexPrefetch->blockItems --
> which has light deduplication. We'd probably have to flesh it out more
> than that.
> 

I don't understand how this passes the TID to the index_fetch_heap.
Isn't it working only by accident, due to blk_queue only having a single
entry? Shouldn't the first queue (blk_queue) store TIDs instead?

> There are also table AM layering violations in my sketch which would
> have to be worked out (not to mention some resource leakage I didn't
> bother investigating [which causes it to fail tests]).
> 
> 0001 is all of Thomas' streaming read API code that isn't yet in
> master and 0002 is my rough sketch of index prefetching using the
> streaming read API
> 
> There are also numerous optimizations that your index prefetching
> patch set does that would need to be added in some way. I haven't
> thought much about it yet. I wanted to see what you thought of this
> approach first. Basically, is it workable?
> 

It seems workable, yes. I'm not sure it's much simpler than my patch
(considering a lot of the code is in the optimizations, which are
missing from this patch).

I think the question is where should the optimizations happen. I suppose
some of them might/should happen in the StreamingRead API itself - like
the detection of sequential patterns, recently prefetched blocks, ...

But I'm not sure what to do about optimizations that are more specific
to the access path. Consider for example the index-only scans. We don't
want to prefetch all the pages, we need to inspect the VM and prefetch
just the not-all-visible ones. And then pass the info to the index scan,
so that it does not need to check the VM again. It's not clear to me how
to do this with this approach.


The main


-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: core dumps in auto_prewarm, tests succeed

2024-01-23 Thread Alvaro Herrera
On 2024-Jan-22, Nathan Bossart wrote:

> Here is a patch.

Looks reasonable.

> This might be a topic for another thread, but I do wonder whether we could
> put a generic pg_controldata check in node->stop that would at least make
> sure that the state is some sort of expected shut-down state.  My first
> thought is that it could be a tad expensive, but... maybe it wouldn't be
> too bad.

Does this actually detect a problem if you take out the fix?  I think
what's going to happen is that postmaster is going to crash, then do the
recovery cycle, then stop as instructed by the test; so pg_controldata
would report that it was correctly shut down.

If we had a restart-cycle-counter to check maybe we could verify that it
hasn't changed, but I don't think we have that.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Cada quien es cada cual y baja las escaleras como quiere" (JMSerrat)




Re: core dumps in auto_prewarm, tests succeed

2024-01-23 Thread Andres Freund
Hi,

On 2024-01-23 08:00:00 +0300, Alexander Lakhin wrote:
> 22.01.2024 23:41, Andres Freund wrote:
> > ISTM that we shouldn't basically silently overlook shutdowns due to crashes 
> > in
> > the tests. How to not do so is unfortunately not immediately obvious to 
> > me...
> > 
> 
> FWIW, I encountered this behavior as well (with pg_stat):
> https://www.postgresql.org/message-id/18158-88f667028dbc7...@postgresql.org
> 
> and proposed a way to detect such shutdowns for a discussion:
> https://www.postgresql.org/message-id/flat/290b9ae3-98a2-0896-a957-18d3b60b6260%40gmail.com
> 
> where Shveta referenced a previous thread started by Tom Lane:
> https://www.postgresql.org/message-id/flat/2366244.1651681...@sss.pgh.pa.us
> 
> What do you think about leaving postmaster.pid on disk in case of an
> abnormal shutdown?

I don't think that's viable and would cause more problems than it solves, it'd
make us think that we might have an old postgres process hanging around that
needs to be terminted before we can start up. And I simply don't see the point
- we already record whether we crashed in the control file, no?

Greetings,

Andres Freund




Re: core dumps in auto_prewarm, tests succeed

2024-01-23 Thread Andres Freund
Hi,

On 2024-01-22 16:27:43 -0600, Nathan Bossart wrote:
> Here is a patch.

LGTM.


> This might be a topic for another thread, but I do wonder whether we could
> put a generic pg_controldata check in node->stop that would at least make
> sure that the state is some sort of expected shut-down state.  My first
> thought is that it could be a tad expensive, but... maybe it wouldn't be
> too bad.

I think that'd probably would be a good idea - I suspect there'd need to be a
fair number of exceptions, but that it'd be easier to change uses of ->stop()
to the exception case where needed, than to add a new function doing checking
and converting most things to that.

Greetings,

Andres Freund




Re: psql: Allow editing query results with \gedit

2024-01-23 Thread Pavel Stehule
út 23. 1. 2024 v 11:38 odesílatel Christoph Berg  napsal:

> Re: Pavel Stehule
> > It looks great for simple queries, but if somebody uses it like SELECT *
> > FROM pg_proc \gedit
>
> What's wrong with that? If the pager can handle the amount of data,
> the editor can do that as well. (If not, the fix is to just not run
> the command, and not blame the feature.)
>

just editing wide or long command or extra long strings can be unfriendly


>
> > I almost sure so \gedit is wrong name for this feature.
>
> I'm open for suggestions.
>

I have not too many ideas. The problem is in missing relation between
"edit" and "update and execute"


>
> > Can be nice if we are able:
> >
> > a) export data set in some readable format
> >
> > b) be possible to use more command in pipes
> >
> > some like
> >
> > select start, call, qrg, name from log where cty = 'CE9' order by start
> > \gpipexec(tsv) mypipe | bash update_pattern.sh > tmpfile; vi tmpfile; cat
> > tmpfile > mypipe
>
> Well yeah, that's still a lot of typing.
>

it should not be problem. You can hide long strings to psql variables



>
> > I understand your motivation well, but I don't like your proposal because
> > too many different things are pushed to one feature, and it is designed
> for
> > a single purpose.
>
> It's one feature for one purpose. And the patch isn't *that* huge. Did
> I make the mistake of adding documentation, extra command options, and
> tab completion in v1?
>

no - I have problem so one command does editing, generating write SQL
commands (updates)  and their execution


>
> Christoph
>


Re: psql JSON output format

2024-01-23 Thread Christoph Berg
Re: Laurenz Albe
> > I'd just stop the flood right before it starts...
> 
> I'd stop the flood right after json/jsonb.

Nod. I do see a point here, given it's "json in json", and not
"something else in json". Will try to make it work with \gedit.

> Arrays as database columns are probably too rare to be a real issue.

Ack.

> I'm pretty certain that people are more likely to use psql's JSON
> output format to move data somewhere else.

Well, there's also the other patch to add JSON support to COPY, that
would be even more suitable.

Christoph




Re: core dumps in auto_prewarm, tests succeed

2024-01-23 Thread Nathan Bossart
On Mon, Jan 22, 2024 at 04:27:43PM -0600, Nathan Bossart wrote:
> Here is a patch.

I'd like to fix these crashes sooner than later, so I will plan on
committing this tonight (barring objections or feedback).  If this needs to
be revisited later for some reason, I'm happy to do so.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: psql JSON output format

2024-01-23 Thread Laurenz Albe
On Tue, 2024-01-23 at 16:36 +0100, Christoph Berg wrote:
> Re: Laurenz Albe
> > I am kind of unhappy about this change.  It seems awkward and undesirable
> > so have JSON values decorated with weird quoting in JSON output.
> > I understand the motivation, but I bet it's not what will make users
> > happy.
> 
> Well, why stop at JSON, and not represent any array type as a JSON
> array? Compound types could be transformed into JSON objects. Custom
> data types could add hooks for their own custom JSON representation.
> 
> I'd just stop the flood right before it starts...

I'd stop the flood right after json/jsonb.

Arrays as database columns are probably too rare to be a real issue.

> The real reason I'd not want to go that route is because I need the
> format to be round-trip safe for the "\gedit" patch, see the other
> thread. We would have to transform JSON sub-parts back into PG's text
> format. But there were already complaints that other patch is complex.
> At the moment it's just strcmp-ing the text value before and after,
> which is straighforward.

I don't particularly care about \gedit, any I think \gedit shouldn't
be the raison d'être for this patch.

I cannot imagine that anybody who wants to move data from PostgreSQL
to PostgreSQL will use \gedit, load the output from psql into the
editor and save.  After all, there is COPY and pg_dump.

I'm pretty certain that people are more likely to use psql's JSON
output format to move data somewhere else.

Yours,
Laurenz Albe




Re: psql JSON output format

2024-01-23 Thread Christoph Berg
Re: Laurenz Albe
> If you import the data into an existing structure, you don't need the 
> metadata.

Also, since you were running the query yourself, you should know what
columns you were expecting.

Christoph




Re: psql JSON output format

2024-01-23 Thread Laurenz Albe
On Tue, 2024-01-23 at 08:01 -0700, David G. Johnston wrote:
> I do think that the output should include a metadata section that lists all of
> the actual columns in the result, the column position, and since we have the
> info available, the data type name and possibly OID.  Then any column name
> present in the metadata but that isn't a key name for a given object is known
> to have an SQL NULL as the value of that column in that row.

Sounds attractive, but I'm a bit worried that that additional information
might make life harder for some consumers.

My crystal ball is as cloudy as anybody's when it comes to guessing the
most likely use cases for the feature, but I'd rather keep it simple and add
features like that later, if there is a demand.

If you import the data into an existing structure, you don't need the metadata.

Yours,
Laurenz Albe




Re: psql JSON output format

2024-01-23 Thread Christoph Berg
Re: Laurenz Albe
> I am kind of unhappy about this change.  It seems awkward and undesirable
> so have JSON values decorated with weird quoting in JSON output.
> I understand the motivation, but I bet it's not what will make users
> happy.

Well, why stop at JSON, and not represent any array type as a JSON
array? Compound types could be transformed into JSON objects. Custom
data types could add hooks for their own custom JSON representation.

I'd just stop the flood right before it starts...

The real reason I'd not want to go that route is because I need the
format to be round-trip safe for the "\gedit" patch, see the other
thread. We would have to transform JSON sub-parts back into PG's text
format. But there were already complaints that other patch is complex.
At the moment it's just strcmp-ing the text value before and after,
which is straighforward.

> If you need to disambiguate between SQL NULL and JSON null, my
> preferred solution would be to omit SQL NULL columns from the output
> altogether.

That works, but only by convention only.


Re: David G. Johnston
> I agree on distinguishing SQL via omission but I do think, almost
> regardless, that the output should include a metadata section that lists
> all of the actual columns in the result, the column position, and since we
> have the info available, the data type name and possibly OID.  Then any
> column name present in the metadata but that isn't a key name for a given
> object is known to have an SQL NULL as the value of that column in that row.

There are no comments in JSON.

Adding the info in-band would break the simple use case of using the
data as-is for further processing.

Christoph




Re: psql JSON output format

2024-01-23 Thread David G. Johnston
On Tue, Jan 23, 2024 at 7:35 AM Stefan Keller  wrote:

> Am Di., 23. Jan. 2024 um 15:15 Uhr schrieb Laurenz Albe
> :
> > I understand the motivation, but I bet it's not what will make users
> > happy.
> >
> > If you need to disambiguate between SQL NULL and JSON null, my
> > preferred solution would be to omit SQL NULL columns from the output
> > altogether.
>
> I fully support Laurenz's proposal and argumentation. The main use
> case for such a JSON output feature is further processing somewhere
> else.
>
> --Stefan
>
> Am Di., 23. Jan. 2024 um 15:15 Uhr schrieb Laurenz Albe
> :
> >
> > On Mon, 2024-01-22 at 16:19 +0100, Christoph Berg wrote:
> > > What I did now in v3 of this patch is to print boolean and numeric
> > > values (ints, floats, numeric) without quotes, while adding the quotes
> > > back to json. This solves the NULL vs 'null'::json problem.
> >
> > The patch is working as advertised.
> >
> > I am kind of unhappy about this change.  It seems awkward and undesirable
> > so have JSON values decorated with weird quoting in JSON output.
> > I understand the motivation, but I bet it's not what will make users
> > happy.
> >
> > If you need to disambiguate between SQL NULL and JSON null, my
> > preferred solution would be to omit SQL NULL columns from the output
> > altogether.
> >
>

I agree on distinguishing SQL via omission but I do think, almost
regardless, that the output should include a metadata section that lists
all of the actual columns in the result, the column position, and since we
have the info available, the data type name and possibly OID.  Then any
column name present in the metadata but that isn't a key name for a given
object is known to have an SQL NULL as the value of that column in that row.

David J.


Re: make BuiltinTrancheNames less ugly

2024-01-23 Thread Alvaro Herrera
On 2024-Jan-23, Alvaro Herrera wrote:

> ... oh, actually FreeBSD failed with this strange problem while linking:

I figured this out.  For dtrace support, we need to run dtrace on all
the objects produced by the compilation step, but because I took
lwlock.o out of the objects used to produce the backend into its own
"library", it needs to be included explicitly.  (I think this is not a
problem for things like nodes/copyfuncs.c only because those files don't
have any use of TRACE macros).

So here's v3, which now passes fully in CI.

I'm a total newbie to Meson, so it's likely that there are better ways
to implement this.  I'll leave this here for a little bit in case
anybody wants to comment.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"El sudor es la mejor cura para un pensamiento enfermo" (Bardia)
>From 97e86277ec9a406fc625141b59d81757d4358a93 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 23 Jan 2024 10:36:14 +0100
Subject: [PATCH v3] Use designated initializers for BuiltinTrancheNames

---
 src/backend/meson.build   |   5 +-
 src/backend/storage/lmgr/Makefile |   3 +-
 .../storage/lmgr/generate-lwlocknames.pl  |  10 +-
 src/backend/storage/lmgr/lwlock.c | 110 ++
 src/backend/storage/lmgr/meson.build  |  12 +-
 5 files changed, 55 insertions(+), 85 deletions(-)

diff --git a/src/backend/meson.build b/src/backend/meson.build
index 8767aaba67..6ce6292ea1 100644
--- a/src/backend/meson.build
+++ b/src/backend/meson.build
@@ -127,7 +127,10 @@ backend_objs = [postgres_lib.extract_all_objects(recursive: false)]
 if dtrace.found() and host_system != 'darwin'
   backend_input += custom_target(
 'probes.o',
-input: ['utils/probes.d', postgres_lib.extract_objects(backend_sources, timezone_sources)],
+input: ['utils/probes.d',
+  postgres_lib.extract_objects(backend_sources, timezone_sources),
+  lwlock_lib.extract_objects(lwlock_source)
+  ],
 output: 'probes.o',
 command: [dtrace, '-C', '-G', '-o', '@OUTPUT@', '-s', '@INPUT@'],
 install: false,
diff --git a/src/backend/storage/lmgr/Makefile b/src/backend/storage/lmgr/Makefile
index 504480e847..81da6ee13a 100644
--- a/src/backend/storage/lmgr/Makefile
+++ b/src/backend/storage/lmgr/Makefile
@@ -12,13 +12,14 @@ subdir = src/backend/storage/lmgr
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
+override CPPFLAGS := -I. $(CPPFLAGS)
+
 OBJS = \
 	condition_variable.o \
 	deadlock.o \
 	lmgr.o \
 	lock.o \
 	lwlock.o \
-	lwlocknames.o \
 	predicate.o \
 	proc.o \
 	s_lock.o \
diff --git a/src/backend/storage/lmgr/generate-lwlocknames.pl b/src/backend/storage/lmgr/generate-lwlocknames.pl
index 7b93ecf6c1..a679a4ff54 100644
--- a/src/backend/storage/lmgr/generate-lwlocknames.pl
+++ b/src/backend/storage/lmgr/generate-lwlocknames.pl
@@ -10,7 +10,6 @@ use Getopt::Long;
 my $output_path = '.';
 
 my $lastlockidx = -1;
-my $continue = "\n";
 
 GetOptions('outdir:s' => \$output_path);
 
@@ -29,8 +28,6 @@ print $h $autogen;
 print $h "/* there is deliberately not an #ifndef LWLOCKNAMES_H here */\n\n";
 print $c $autogen, "\n";
 
-print $c "const char *const IndividualLWLockNames[] = {";
-
 #
 # First, record the predefined LWLocks listed in wait_event_names.txt.  We'll
 # cross-check those with the ones in lwlocknames.txt.
@@ -97,12 +94,10 @@ while (<$lwlocknames>)
 	while ($lastlockidx < $lockidx - 1)
 	{
 		++$lastlockidx;
-		printf $c "%s	\"\"", $continue, $lastlockidx;
-		$continue = ",\n";
+		printf $c "[%s] = \"\",\n", $lastlockidx, $lastlockidx;
 	}
-	printf $c "%s	\"%s\"", $continue, $trimmedlockname;
+	printf $c "[%s] = \"%s\",\n", $lockidx, $trimmedlockname;
 	$lastlockidx = $lockidx;
-	$continue = ",\n";
 
 	print $h "#define $lockname (&MainLWLockArray[$lockidx].lock)\n";
 }
@@ -112,7 +107,6 @@ die
   . "lwlocknames.txt"
   if $i < scalar @wait_event_lwlocks;
 
-printf $c "\n};\n";
 print $h "\n";
 printf $h "#define NUM_INDIVIDUAL_LWLOCKS		%s\n", $lastlockidx + 1;
 
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 2f2de5a562..8aad9c6690 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -115,8 +115,8 @@ StaticAssertDecl(LW_VAL_EXCLUSIVE > (uint32) MAX_BACKENDS,
  * There are three sorts of LWLock "tranches":
  *
  * 1. The individually-named locks defined in lwlocknames.h each have their
- * own tranche.  The names of these tranches appear in IndividualLWLockNames[]
- * in lwlocknames.c.
+ * own tranche.  The names of these tranches come from lwlocknames.c into
+ * BuiltinTranchNames[] below.
  *
  * 2. There are some predefined tranches for built-in groups of locks.
  * These are listed in enum BuiltinTrancheIds in lwlock.h, and their names
@@ -129,75 +129,43 @@ StaticAssertDecl(LW_VAL_EXCLUSIVE > (uint32) MAX_BACKENDS,
  * All these names are user-visible as wait event names, so choose with care
  * ..

Re: psql JSON output format

2024-01-23 Thread Stefan Keller
Am Di., 23. Jan. 2024 um 15:15 Uhr schrieb Laurenz Albe
:
> I understand the motivation, but I bet it's not what will make users
> happy.
>
> If you need to disambiguate between SQL NULL and JSON null, my
> preferred solution would be to omit SQL NULL columns from the output
> altogether.

I fully support Laurenz's proposal and argumentation. The main use
case for such a JSON output feature is further processing somewhere
else.

--Stefan

Am Di., 23. Jan. 2024 um 15:15 Uhr schrieb Laurenz Albe
:
>
> On Mon, 2024-01-22 at 16:19 +0100, Christoph Berg wrote:
> > What I did now in v3 of this patch is to print boolean and numeric
> > values (ints, floats, numeric) without quotes, while adding the quotes
> > back to json. This solves the NULL vs 'null'::json problem.
>
> The patch is working as advertised.
>
> I am kind of unhappy about this change.  It seems awkward and undesirable
> so have JSON values decorated with weird quoting in JSON output.
> I understand the motivation, but I bet it's not what will make users
> happy.
>
> If you need to disambiguate between SQL NULL and JSON null, my
> preferred solution would be to omit SQL NULL columns from the output
> altogether.
>
> Yours,
> Laurenz Albe
>
>




Re: Parallelize correlated subqueries that execute within each worker

2024-01-23 Thread Akshat Jaimini
Hello! 
I was going through the previous conversations for this particular patch and it 
seems that this patch failed some tests previously? 
Imo we should move it to the next CF so that the remaining issues can be 
resolved accordingly.

Re: psql JSON output format

2024-01-23 Thread Laurenz Albe
On Mon, 2024-01-22 at 16:19 +0100, Christoph Berg wrote:
> What I did now in v3 of this patch is to print boolean and numeric
> values (ints, floats, numeric) without quotes, while adding the quotes
> back to json. This solves the NULL vs 'null'::json problem.

The patch is working as advertised.

I am kind of unhappy about this change.  It seems awkward and undesirable
so have JSON values decorated with weird quoting in JSON output.
I understand the motivation, but I bet it's not what will make users
happy.

If you need to disambiguate between SQL NULL and JSON null, my
preferred solution would be to omit SQL NULL columns from the output
altogether.

Yours,
Laurenz Albe




Re: [PATCH] Add native windows on arm64 support

2024-01-23 Thread Dave Cramer
On Tue, 19 Sept 2023 at 23:48, Michael Paquier  wrote:

> On Tue, Sep 19, 2023 at 01:35:17PM +0100, Anthony Roberts wrote:
> > Was there an explicit request for something there? I was under the
> > impression that this was all just suggestion/theory at the moment.
>
> Yes.  The addition of a buildfarm member registered into the community
> facilities, so as it is possible to provide back to developers dynamic
> feedback to the new configuration setup you would like to see
> supported in PostgreSQL.  This has been mentioned a few times on this
> thread, around these places:
>
> https://www.postgresql.org/message-id/20220322103011.i6z2tuj4hle23wgx@jrouhaud
>
> https://www.postgresql.org/message-id/dbd80715-e56b-40eb-84aa-ace70198e...@yesql.se
>
> https://www.postgresql.org/message-id/6d1e2719-fa49-42a5-a6ff-0b0072bf6...@yesql.se
> --
> Michael
>

The attached patch works with v17. I will work on getting a buildfarm
animal up shortly


windows_arm_build.patch
Description: Binary data


Re: Support TZ format code in to_timestamp()

2024-01-23 Thread Aleksander Alekseev
Hi,

> Since I had this on my (ever-growing) TODO I re-prioritized today and took a
> look at it since I think it's something we should support.
>
> Nothing really sticks out and I was unable to poke any holes so I don't have
> too much more to offer than a LGTM.
>
> +   while (len > 0)
> +   {
> +   const datetkn *tp = datebsearch(lowtoken, zoneabbrevtbl->abbrevs,
> +   zoneabbrevtbl->numabbrevs);
>
> My immediate reaction was that we should stop at prefix lengths of two since I
> could only think of abbreviations of two or more.  Googling and reading found
> that there are indeed one-letter timezones (Alpha, Bravo etc..).  Not sure if
> it's worth mentioning that in the comment to help other readers who aren't 
> neck
> deep in timezones?
>
> + /* FALL THRU */
>
> Tiny nitpick, it looks a bit curious that we spell it FALL THRU here and "fall
> through" a few cases up in the same switch.  While we are quite inconsistent
> across the tree, consistency within a file is preferrable (regardless of
> which).

I reviewed the patch and tested it on MacOS and generally concur with
stated above. The only nitpick I have is the apparent lack of negative
tests for to_timestamp(), e.g. when the string doesn't match the
specified format.

-- 
Best regards,
Aleksander Alekseev




Re: FEATURE REQUEST: Role vCPU limit/priority

2024-01-23 Thread Aleksander Alekseev
Hi,

> Well, I'm not a developer, I just wanted to pitch this idea as a DBA who 
> would make use of this feature.

I don't think one shouldn't be a developer in order to write an RFC
and drive its discussion within the community. On top of that I'm
pretty confident as a DBA you can contribute tests and documentation.

-- 
Best regards,
Aleksander Alekseev




Re: make BuiltinTrancheNames less ugly

2024-01-23 Thread Alvaro Herrera
On 2024-Jan-23, Alvaro Herrera wrote:

> This is what I came up with.

Hm, I forgot the attachment, thanks Heikki for the reminder.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Las mujeres son como hondas:  mientras más resistencia tienen,
 más lejos puedes llegar con ellas"  (Jonas Nightingale, Leap of Faith)
>From 2f5185cdd37e3502520d85e5f6f565b58330e702 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 23 Jan 2024 10:36:14 +0100
Subject: [PATCH v2] Use designated initializers for BuiltinTrancheNames

---
 src/backend/storage/lmgr/Makefile |   3 +-
 .../storage/lmgr/generate-lwlocknames.pl  |  10 +-
 src/backend/storage/lmgr/lwlock.c | 110 ++
 src/backend/storage/lmgr/meson.build  |  10 +-
 4 files changed, 49 insertions(+), 84 deletions(-)

diff --git a/src/backend/storage/lmgr/Makefile b/src/backend/storage/lmgr/Makefile
index 504480e847..81da6ee13a 100644
--- a/src/backend/storage/lmgr/Makefile
+++ b/src/backend/storage/lmgr/Makefile
@@ -12,13 +12,14 @@ subdir = src/backend/storage/lmgr
 top_builddir = ../../../..
 include $(top_builddir)/src/Makefile.global
 
+override CPPFLAGS := -I. $(CPPFLAGS)
+
 OBJS = \
 	condition_variable.o \
 	deadlock.o \
 	lmgr.o \
 	lock.o \
 	lwlock.o \
-	lwlocknames.o \
 	predicate.o \
 	proc.o \
 	s_lock.o \
diff --git a/src/backend/storage/lmgr/generate-lwlocknames.pl b/src/backend/storage/lmgr/generate-lwlocknames.pl
index 7b93ecf6c1..a679a4ff54 100644
--- a/src/backend/storage/lmgr/generate-lwlocknames.pl
+++ b/src/backend/storage/lmgr/generate-lwlocknames.pl
@@ -10,7 +10,6 @@ use Getopt::Long;
 my $output_path = '.';
 
 my $lastlockidx = -1;
-my $continue = "\n";
 
 GetOptions('outdir:s' => \$output_path);
 
@@ -29,8 +28,6 @@ print $h $autogen;
 print $h "/* there is deliberately not an #ifndef LWLOCKNAMES_H here */\n\n";
 print $c $autogen, "\n";
 
-print $c "const char *const IndividualLWLockNames[] = {";
-
 #
 # First, record the predefined LWLocks listed in wait_event_names.txt.  We'll
 # cross-check those with the ones in lwlocknames.txt.
@@ -97,12 +94,10 @@ while (<$lwlocknames>)
 	while ($lastlockidx < $lockidx - 1)
 	{
 		++$lastlockidx;
-		printf $c "%s	\"\"", $continue, $lastlockidx;
-		$continue = ",\n";
+		printf $c "[%s] = \"\",\n", $lastlockidx, $lastlockidx;
 	}
-	printf $c "%s	\"%s\"", $continue, $trimmedlockname;
+	printf $c "[%s] = \"%s\",\n", $lockidx, $trimmedlockname;
 	$lastlockidx = $lockidx;
-	$continue = ",\n";
 
 	print $h "#define $lockname (&MainLWLockArray[$lockidx].lock)\n";
 }
@@ -112,7 +107,6 @@ die
   . "lwlocknames.txt"
   if $i < scalar @wait_event_lwlocks;
 
-printf $c "\n};\n";
 print $h "\n";
 printf $h "#define NUM_INDIVIDUAL_LWLOCKS		%s\n", $lastlockidx + 1;
 
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 2f2de5a562..8aad9c6690 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -115,8 +115,8 @@ StaticAssertDecl(LW_VAL_EXCLUSIVE > (uint32) MAX_BACKENDS,
  * There are three sorts of LWLock "tranches":
  *
  * 1. The individually-named locks defined in lwlocknames.h each have their
- * own tranche.  The names of these tranches appear in IndividualLWLockNames[]
- * in lwlocknames.c.
+ * own tranche.  The names of these tranches come from lwlocknames.c into
+ * BuiltinTranchNames[] below.
  *
  * 2. There are some predefined tranches for built-in groups of locks.
  * These are listed in enum BuiltinTrancheIds in lwlock.h, and their names
@@ -129,75 +129,43 @@ StaticAssertDecl(LW_VAL_EXCLUSIVE > (uint32) MAX_BACKENDS,
  * All these names are user-visible as wait event names, so choose with care
  * ... and do not forget to update the documentation's list of wait events.
  */
-extern const char *const IndividualLWLockNames[];	/* in lwlocknames.c */
-
 static const char *const BuiltinTrancheNames[] = {
-	/* LWTRANCHE_XACT_BUFFER: */
-	"XactBuffer",
-	/* LWTRANCHE_COMMITTS_BUFFER: */
-	"CommitTsBuffer",
-	/* LWTRANCHE_SUBTRANS_BUFFER: */
-	"SubtransBuffer",
-	/* LWTRANCHE_MULTIXACTOFFSET_BUFFER: */
-	"MultiXactOffsetBuffer",
-	/* LWTRANCHE_MULTIXACTMEMBER_BUFFER: */
-	"MultiXactMemberBuffer",
-	/* LWTRANCHE_NOTIFY_BUFFER: */
-	"NotifyBuffer",
-	/* LWTRANCHE_SERIAL_BUFFER: */
-	"SerialBuffer",
-	/* LWTRANCHE_WAL_INSERT: */
-	"WALInsert",
-	/* LWTRANCHE_BUFFER_CONTENT: */
-	"BufferContent",
-	/* LWTRANCHE_REPLICATION_ORIGIN_STATE: */
-	"ReplicationOriginState",
-	/* LWTRANCHE_REPLICATION_SLOT_IO: */
-	"ReplicationSlotIO",
-	/* LWTRANCHE_LOCK_FASTPATH: */
-	"LockFastPath",
-	/* LWTRANCHE_BUFFER_MAPPING: */
-	"BufferMapping",
-	/* LWTRANCHE_LOCK_MANAGER: */
-	"LockManager",
-	/* LWTRANCHE_PREDICATE_LOCK_MANAGER: */
-	"PredicateLockManager",
-	/* LWTRANCHE_PARALLEL_HASH_JOIN: */
-	"ParallelHashJoin",
-	/* LWTRANCHE_PARALLEL_QUERY_DSA: */
-	"ParallelQueryDSA",
-	/* LWTRANCHE_PER_SESSION_DSA: */
-	"PerSessionDSA",
-	/* LWTRANCHE_PER_SESSION_RECORD_TYPE: */
-	"PerSe

Re: make BuiltinTrancheNames less ugly

2024-01-23 Thread Alvaro Herrera
On 2024-Jan-23, Alvaro Herrera wrote:

> ... oh, actually FreeBSD failed with this strange problem while linking:
> [11:39:43.550] ld: error: undefined symbol: 
> __dtraceenabled_postgresql___lwlock__wait__start
> [11:39:43.551] >>> referenced by lwlock.c:1277 
> (../src/backend/storage/lmgr/lwlock.c:1277)
> [11:39:43.551] >>>   lwlock.a.p/lwlock.c.o:(LWLockAcquire) in 
> archive src/backend/storage/lmgr/lwlock.a
> [11:39:43.551] >>> referenced by lwlock.c:1442 
> (../src/backend/storage/lmgr/lwlock.c:1442)
> [11:39:43.551] >>>   lwlock.a.p/lwlock.c.o:(LWLockAcquireOrWait) 
> in archive src/backend/storage/lmgr/lwlock.a
> [11:39:43.551] >>> referenced by lwlock.c:1660 
> (../src/backend/storage/lmgr/lwlock.c:1660)
> [11:39:43.551] >>>   lwlock.a.p/lwlock.c.o:(LWLockWaitForVar) in 
> archive src/backend/storage/lmgr/lwlock.a
> [11:39:43.551] 

So what's going on here is that lwlock.c is being compiled to a separate
archive (lwlock.a), and when linking it wants to see the dtrace symbols
(__dtraceenabled_postgresql___lwlock__wait__start et al) but it looks
like they're not defined.

I installed systemtap on my machine and reconfigured meson with it so
that it uses dtrace, and it compiles successfully.  One interesting
point: on my ninja log, I see dtrace being invoked, which I do not in
the FreeBSD log.

I continue to not understand what is going on.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"El sabio habla porque tiene algo que decir;
el tonto, porque tiene que decir algo" (Platon).




Re: Dump-restore loosing 'attnotnull' bit for DEFERRABLE PRIMARY KEY column(s).

2024-01-23 Thread Amul Sul
On Sat, Jan 20, 2024 at 7:55 AM vignesh C  wrote:

> On Fri, 22 Sept 2023 at 18:45, Amul Sul  wrote:
> >
> >
> >
> > On Wed, Sep 20, 2023 at 8:29 PM Alvaro Herrera 
> wrote:
> >>
> >> On 2023-Sep-20, Amul Sul wrote:
> >>
> >> > On the latest master head, I can see a $subject bug that seems to be
> related
> >> > commit #b0e96f311985:
> >> >
> >> > Here is the table definition:
> >> > create table foo(i int, j int, CONSTRAINT pk PRIMARY KEY(i)
> DEFERRABLE);
> >>
> >> Interesting, thanks for the report.  Your attribution to that commit is
> >> correct.  The table is dumped like this:
> >>
> >> CREATE TABLE public.foo (
> >> i integer CONSTRAINT pgdump_throwaway_notnull_0 NOT NULL NO INHERIT,
> >> j integer
> >> );
> >> ALTER TABLE ONLY public.foo
> >> ADD CONSTRAINT pk PRIMARY KEY (i) DEFERRABLE;
> >> ALTER TABLE ONLY public.foo DROP CONSTRAINT pgdump_throwaway_notnull_0;
> >>
> >> so the problem here is that the deferrable PK is not considered a reason
> >> to keep attnotnull set, so we produce a throwaway constraint that we
> >> then drop.  This is already bogus, but what is more bogus is the fact
> >> that the backend accepts the DROP CONSTRAINT at all.
> >>
> >> The pg_dump failing should be easy to fix, but fixing the backend to
> >> error out sounds more critical.  So, the reason for this behavior is
> >> that RelationGetIndexList doesn't want to list an index that isn't
> >> marked indimmediate as a primary key.  I can easily hack around that by
> >> doing
> >>
> >> diff --git a/src/backend/utils/cache/relcache.c
> b/src/backend/utils/cache/relcache.c
> >> index 7234cb3da6..971d9c8738 100644
> >> --- a/src/backend/utils/cache/relcache.c
> >> +++ b/src/backend/utils/cache/relcache.c
> >> @@ -4794,7 +4794,6 @@ RelationGetIndexList(Relation relation)
> >>  * check them.
> >>  */
> >> if (!index->indisunique ||
> >> -   !index->indimmediate ||
> >> !heap_attisnull(htup,
> Anum_pg_index_indpred, NULL))
> >> continue;
> >>
> >> @@ -4821,6 +4820,9 @@ RelationGetIndexList(Relation relation)
> >>  relation->rd_rel->relkind ==
> RELKIND_PARTITIONED_TABLE))
> >> pkeyIndex = index->indexrelid;
> >>
> >> +   if (!index->indimmediate)
> >> +   continue;
> >> +
> >> if (!index->indisvalid)
> >> continue;
> >>
> >>
> >> But of course this is not great, since it impacts unrelated bits of code
> >> that are relying on relation->pkindex or RelationGetIndexAttrBitmap
> >> having their current behavior with non-immediate index.
> >
> >
> > True, but still wondering why would relation->rd_pkattr skipped for a
> > deferrable primary key, which seems to be a bit incorrect to me since it
> > sensing that relation doesn't have PK at all.  Anyway, that is unrelated.
> >
> >>
> >> I think a real solution is to stop relying on RelationGetIndexAttrBitmap
> >> in ATExecDropNotNull().  (And, again, pg_dump needs some patch as well
> >> to avoid printing a throwaway NOT NULL constraint at this point.)
> >
> >
> > I might not have understood this, but I think, if it is ok to skip
> throwaway NOT
> > NULL for deferrable PK then that would be enough for the reported issue
> > to be fixed.  I quickly tried with the attached patch which looks
> sufficient
> > to skip that, but, TBH, I haven't thought carefully about this change.
>
> I did not see any test addition for this, can we add it?
>

Ok, added it in the attached version.

This was an experimental patch, I was looking for the comment on the
proposed
approach -- whether we could simply skip the throwaway NOT NULL constraint
for
deferred PK constraint.  Moreover,  skip that for any PK constraint.

Regards,
Amul


trial_skip_throwaway_non_null_v2.patch
Description: Binary data


Re: FEATURE REQUEST: Role vCPU limit/priority

2024-01-23 Thread Thomas Kellerer
Yoni Sade schrieb am 21.01.2024 um 19:07:
> It would be useful to have the ability to define for a role default
> vCPU affinity limits/thread priority settings so that more active
> sessions could coexist similar to MySQL resource groups
> .

To a certain extent, you can achieve something like that using Linux cgroups

https://www.cybertec-postgresql.com/en/linux-cgroups-for-postgresql/






Re: FEATURE REQUEST: Role vCPU limit/priority

2024-01-23 Thread Yoni Sade
Well, I'm not a developer, I just wanted to pitch this idea as a DBA who
would make use of this feature.

Best Regards,
Yoni Sade

‫בתאריך יום ב׳, 22 בינו׳ 2024 ב-12:43 מאת ‪Aleksander Alekseev‬‏ <‪
aleksan...@timescale.com‬‏>:‬

> Hi Yoni,
>
> > It would be useful to have the ability to define for a role default vCPU
> affinity limits/thread priority settings so that more active sessions could
> coexist similar to MySQL resource groups.
>
> To me this sounds like a valuable feature.
>
> Would you be interested in working on it? Typically it is a good idea
> to start with an RFC document and discuss it with the community.
>
> --
> Best regards,
> Aleksander Alekseev
>


Re: Synchronizing slots from primary to standby

2024-01-23 Thread shveta malik
On Tue, Jan 23, 2024 at 9:45 AM Peter Smith  wrote:
>
> Here are some review comments for v65-0002

Thanks Peter for the feedback. I have addressed these in v66.

>
> 4. GetStandbyFlushRecPtr
>
>  /*
> - * Returns the latest point in WAL that has been safely flushed to disk, and
> - * can be sent to the standby. This should only be called when in recovery,
> - * ie. we're streaming to a cascaded standby.
> + * Returns the latest point in WAL that has been safely flushed to disk.
> + * This should only be called when in recovery.
> + *
>
> Since it says "This should only be called when in recovery", should
> there also be a check for that (e.g. RecoveryInProgress) in the added
> Assert?

Since 'am_cascading_walsender' and  'IsLogicalSlotSyncWorker' makes
sense 'in-recovery' only, I think explicit check for
'RecoveryInProgress' is not needed here. But I can add if others also
think it is needed.

thanks
Shveta




Re: make BuiltinTrancheNames less ugly

2024-01-23 Thread Alvaro Herrera
On 2024-Jan-23, Heikki Linnakangas wrote:

> On 23/01/2024 12:25, Alvaro Herrera wrote:

> > 2. More invasively, rework generate-lwlocknames.pl so that both lwlocks
> > and these builtin tranche names appear in a single array.  (We could do
> > so by #include'ing lwlocknames.c at the top of the array).

> Idea 2 seems pretty straightforward, +1 for that.

This is what I came up with.  For the compilation framework (both
Makefile and meson) I mostly just copied what we do in src/backend/node,
which seems to have passed CI just fine, but maybe there are better ways
to achieve it.

... oh, actually FreeBSD failed with this strange problem while linking:
[11:39:43.550] ld: error: undefined symbol: 
__dtraceenabled_postgresql___lwlock__wait__start
[11:39:43.551] >>> referenced by lwlock.c:1277 
(../src/backend/storage/lmgr/lwlock.c:1277)
[11:39:43.551] >>>   lwlock.a.p/lwlock.c.o:(LWLockAcquire) in 
archive src/backend/storage/lmgr/lwlock.a
[11:39:43.551] >>> referenced by lwlock.c:1442 
(../src/backend/storage/lmgr/lwlock.c:1442)
[11:39:43.551] >>>   lwlock.a.p/lwlock.c.o:(LWLockAcquireOrWait) in 
archive src/backend/storage/lmgr/lwlock.a
[11:39:43.551] >>> referenced by lwlock.c:1660 
(../src/backend/storage/lmgr/lwlock.c:1660)
[11:39:43.551] >>>   lwlock.a.p/lwlock.c.o:(LWLockWaitForVar) in 
archive src/backend/storage/lmgr/lwlock.a
[11:39:43.551] 
  [ further similar lines ]
https://cirrus-ci.com/task/5055489315176448?logs=build#L1091

I have no idea what this means or how is it related to what I'm doing.


IMO it would be less ugly to have the origin file lwlocknames.txt be not
a text file but a .h with a macro that can be defined by interested
parties so that they can extract what they want from the file, like
PG_CMDTAG or PG_KEYWORD.  Using Perl for this seems dirty ...

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"In Europe they call me Niklaus Wirth; in the US they call me Nickel's worth.
 That's because in Europe they call me by name, and in the US by value!"




Re: logical decoding build wrong snapshot with subtransactions

2024-01-23 Thread Amit Kapila
On Fri, Jan 19, 2024 at 1:05 PM feichanghong  wrote:
>
> This issue has been reported in the  list at the below link, but
> received almost no response:
> https://www.postgresql.org/message-id/18280-4c8060178cb41750%40postgresql.org
> Hoping for some feedback from kernel hackers, thanks!
>

Thanks for the report and analysis. I have responded to the original
thread. Let's discuss it there.

-- 
With Regards,
Amit Kapila.




Re: BUG: Former primary node might stuck when started as a standby

2024-01-23 Thread Alexander Lakhin

Hi Aleksander,

[ I'm writing this off-list to minimize noise, but we can continue the 
discussion in -hackers, if you wish ]

22.01.2024 14:00, Aleksander Alekseev wrote:


Hi,


But node1 knows that it's a standby now and it's expected to get all the
WAL records from the primary, doesn't it?

Yes, but node1 doesn't know if it always was a standby or not. What if
node1 was always a standby, node2 was a primary, then node2 died and
node3 is a new primary.


Excuse me, but I still can't understand what could go wrong in this case.
Let's suppose, node1 has WAL with the following contents before start:
CPLOC | TL1R1 | TL1R2 | TL1R3 |

while node2's WAL contains:
TL1R1 | TL2R1 | TL2R2 | ...

where CPLOC -- a checkpoint location, TLxRy -- a record y on a timeline x.

I assume that requesting all WAL records from node2 without redoing local
records should be the right thing.

And even in the situation you propose:
CPLOC | TL2R5 | TL2R6 | TL2R7 |

while node3's WAL contains:
TL2R5 | TL3R1 | TL3R2 | ...

I see no issue with applying records from node3...


  If node1 sees inconsistency  in the WAL
records, it should report it and stop doing anything, since it doesn't
has all the information needed to resolve the inconsistencies in all
the possible cases. Only DBA has this information.


I still wonder, what can be considered an inconsistency in this situation.
Doesn't the exactly redo of all the local WAL records create the
inconsistency here?
For me, it's the question of an authoritative source, and if we had such a
source, we should trust it's records only.

Or in the other words, what if the record TL1R3, which node1 wrote to it's
WAL, but didn't send to node2, happened to have an incorrect checksum (due
to partial write, for example)?
If I understand correctly, node1 will just stop redoing WAL at that
position to receive all the following records from node2 and move forward
without reporting the inconsistency (an extra WAL record).

Best regards,
Alexander




Re: psql: Allow editing query results with \gedit

2024-01-23 Thread Christoph Berg
Re: Pavel Stehule
> It looks great for simple queries, but if somebody uses it like SELECT *
> FROM pg_proc \gedit

What's wrong with that? If the pager can handle the amount of data,
the editor can do that as well. (If not, the fix is to just not run
the command, and not blame the feature.)

> I almost sure so \gedit is wrong name for this feature.

I'm open for suggestions.

> Can be nice if we are able:
> 
> a) export data set in some readable format
> 
> b) be possible to use more command in pipes
> 
> some like
> 
> select start, call, qrg, name from log where cty = 'CE9' order by start
> \gpipexec(tsv) mypipe | bash update_pattern.sh > tmpfile; vi tmpfile; cat
> tmpfile > mypipe

Well yeah, that's still a lot of typing.

> I understand your motivation well, but I don't like your proposal because
> too many different things are pushed to one feature, and it is designed for
> a single purpose.

It's one feature for one purpose. And the patch isn't *that* huge. Did
I make the mistake of adding documentation, extra command options, and
tab completion in v1?

Christoph




Re: make BuiltinTrancheNames less ugly

2024-01-23 Thread Heikki Linnakangas

On 23/01/2024 12:25, Alvaro Herrera wrote:

This array of tranche names is looking pretty ugly these days, and it'll
get worse as we add more members to it.  I propose to use C99 designated
initializers, like we've done for other arrays.  Patch attached.

The way I've coded in this patch, it means the array will now have 52
NULL pointers at the beginning.  I don't think this is a big deal and
makes the code prettier.  I see two alternatives:

1. Avoid all those NULLs by making each definition uglier (subtract
NUM_INDIVIDUAL_LWLOCKS from each array index) _and_ the usage of the
array by subtracting the same amount.  This saves 208 bytes at the
expense of making the code worse.

2. More invasively, rework generate-lwlocknames.pl so that both lwlocks
and these builtin tranche names appear in a single array.  (We could do
so by #include'ing lwlocknames.c at the top of the array).


Now, having written this proposal, I'm leaning towards idea 2 myself,
but since the patch here is less invasive, it seems worth having as
evidence.


Idea 2 seems pretty straightforward, +1 for that.

--
Heikki Linnakangas
Neon (https://neon.tech)





make BuiltinTrancheNames less ugly

2024-01-23 Thread Alvaro Herrera
This array of tranche names is looking pretty ugly these days, and it'll
get worse as we add more members to it.  I propose to use C99 designated
initializers, like we've done for other arrays.  Patch attached.

The way I've coded in this patch, it means the array will now have 52
NULL pointers at the beginning.  I don't think this is a big deal and
makes the code prettier.  I see two alternatives:

1. Avoid all those NULLs by making each definition uglier (subtract
NUM_INDIVIDUAL_LWLOCKS from each array index) _and_ the usage of the
array by subtracting the same amount.  This saves 208 bytes at the
expense of making the code worse.

2. More invasively, rework generate-lwlocknames.pl so that both lwlocks
and these builtin tranche names appear in a single array.  (We could do
so by #include'ing lwlocknames.c at the top of the array).


Now, having written this proposal, I'm leaning towards idea 2 myself,
but since the patch here is less invasive, it seems worth having as
evidence.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"People get annoyed when you try to debug them."  (Larry Wall)
>From 037fc293b359fbe7dfffcdaf8d5816daa82c0ddd Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 23 Jan 2024 10:36:14 +0100
Subject: [PATCH] Use designated initializers for BuiltinTrancheNames

---
 src/backend/storage/lmgr/lwlock.c | 97 +++
 1 file changed, 33 insertions(+), 64 deletions(-)

diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index 2f2de5a562..98fa6035cc 100644
--- a/src/backend/storage/lmgr/lwlock.c
+++ b/src/backend/storage/lmgr/lwlock.c
@@ -132,72 +132,41 @@ StaticAssertDecl(LW_VAL_EXCLUSIVE > (uint32) MAX_BACKENDS,
 extern const char *const IndividualLWLockNames[];	/* in lwlocknames.c */
 
 static const char *const BuiltinTrancheNames[] = {
-	/* LWTRANCHE_XACT_BUFFER: */
-	"XactBuffer",
-	/* LWTRANCHE_COMMITTS_BUFFER: */
-	"CommitTsBuffer",
-	/* LWTRANCHE_SUBTRANS_BUFFER: */
-	"SubtransBuffer",
-	/* LWTRANCHE_MULTIXACTOFFSET_BUFFER: */
-	"MultiXactOffsetBuffer",
-	/* LWTRANCHE_MULTIXACTMEMBER_BUFFER: */
-	"MultiXactMemberBuffer",
-	/* LWTRANCHE_NOTIFY_BUFFER: */
-	"NotifyBuffer",
-	/* LWTRANCHE_SERIAL_BUFFER: */
-	"SerialBuffer",
-	/* LWTRANCHE_WAL_INSERT: */
-	"WALInsert",
-	/* LWTRANCHE_BUFFER_CONTENT: */
-	"BufferContent",
-	/* LWTRANCHE_REPLICATION_ORIGIN_STATE: */
-	"ReplicationOriginState",
-	/* LWTRANCHE_REPLICATION_SLOT_IO: */
-	"ReplicationSlotIO",
-	/* LWTRANCHE_LOCK_FASTPATH: */
-	"LockFastPath",
-	/* LWTRANCHE_BUFFER_MAPPING: */
-	"BufferMapping",
-	/* LWTRANCHE_LOCK_MANAGER: */
-	"LockManager",
-	/* LWTRANCHE_PREDICATE_LOCK_MANAGER: */
-	"PredicateLockManager",
-	/* LWTRANCHE_PARALLEL_HASH_JOIN: */
-	"ParallelHashJoin",
-	/* LWTRANCHE_PARALLEL_QUERY_DSA: */
-	"ParallelQueryDSA",
-	/* LWTRANCHE_PER_SESSION_DSA: */
-	"PerSessionDSA",
-	/* LWTRANCHE_PER_SESSION_RECORD_TYPE: */
-	"PerSessionRecordType",
-	/* LWTRANCHE_PER_SESSION_RECORD_TYPMOD: */
-	"PerSessionRecordTypmod",
-	/* LWTRANCHE_SHARED_TUPLESTORE: */
-	"SharedTupleStore",
-	/* LWTRANCHE_SHARED_TIDBITMAP: */
-	"SharedTidBitmap",
-	/* LWTRANCHE_PARALLEL_APPEND: */
-	"ParallelAppend",
-	/* LWTRANCHE_PER_XACT_PREDICATE_LIST: */
-	"PerXactPredicateList",
-	/* LWTRANCHE_PGSTATS_DSA: */
-	"PgStatsDSA",
-	/* LWTRANCHE_PGSTATS_HASH: */
-	"PgStatsHash",
-	/* LWTRANCHE_PGSTATS_DATA: */
-	"PgStatsData",
-	/* LWTRANCHE_LAUNCHER_DSA: */
-	"LogicalRepLauncherDSA",
-	/* LWTRANCHE_LAUNCHER_HASH: */
-	"LogicalRepLauncherHash",
-	/* LWTRANCHE_DSM_REGISTRY_DSA: */
-	"DSMRegistryDSA",
-	/* LWTRANCHE_DSM_REGISTRY_HASH: */
-	"DSMRegistryHash",
+	[LWTRANCHE_XACT_BUFFER] = "XactBuffer",
+	[LWTRANCHE_COMMITTS_BUFFER] = "CommitTsBuffer",
+	[LWTRANCHE_SUBTRANS_BUFFER] = "SubtransBuffer",
+	[LWTRANCHE_MULTIXACTOFFSET_BUFFER] = "MultiXactOffsetBuffer",
+	[LWTRANCHE_MULTIXACTMEMBER_BUFFER] = "MultiXactMemberBuffer",
+	[LWTRANCHE_NOTIFY_BUFFER] = "NotifyBuffer",
+	[LWTRANCHE_SERIAL_BUFFER] = "SerialBuffer",
+	[LWTRANCHE_WAL_INSERT] = "WALInsert",
+	[LWTRANCHE_BUFFER_CONTENT] = "BufferContent",
+	[LWTRANCHE_REPLICATION_ORIGIN_STATE] = "ReplicationOriginState",
+	[LWTRANCHE_REPLICATION_SLOT_IO] = "ReplicationSlotIO",
+	[LWTRANCHE_LOCK_FASTPATH] = "LockFastPath",
+	[LWTRANCHE_BUFFER_MAPPING] = "BufferMapping",
+	[LWTRANCHE_LOCK_MANAGER] = "LockManager",
+	[LWTRANCHE_PREDICATE_LOCK_MANAGER] = "PredicateLockManager",
+	[LWTRANCHE_PARALLEL_HASH_JOIN] = "ParallelHashJoin",
+	[LWTRANCHE_PARALLEL_QUERY_DSA] = "ParallelQueryDSA",
+	[LWTRANCHE_PER_SESSION_DSA] = "PerSessionDSA",
+	[LWTRANCHE_PER_SESSION_RECORD_TYPE] = "PerSessionRecordType",
+	[LWTRANCHE_PER_SESSION_RECORD_TYPMOD] = "PerSessionRecordTypmod",
+	[LWTRANCHE_SHARED_TUPLESTORE] = "SharedTupleStore",
+	[LWTRANCHE_SHARED_TIDBITMAP] = "SharedTidBitmap",
+	[LWTRANCHE_PARALLEL_APPEND] = "ParallelAppend",
+	[LWTRANCHE_PER_XACT_PREDICATE_LIST] = "PerXactPredicateList",
+	[LWTRANCHE_PGSTATS_DSA] = "PgStatsDSA",
+	[LWTR

Re: Sequence Access Methods, round two

2024-01-23 Thread Peter Eisentraut

On 18.01.24 16:54, Matthias van de Meent wrote:

At $prevjob we had a use case for PRNG to generate small,
non-sequential "random" numbers without the birthday problem occurring
in sqrt(option space) because that'd increase the printed length of
the numbers beyond a set limit. The sequence API proposed here
would've been a great alternative to the solution we found, as it
would allow a sequence to be backed by an Linear Congruential
Generator directly, rather than the implementation of our own
transactional random_sequence table.


This is an interesting use case.  I think what you'd need for that is 
just the specification of a different "nextval" function and some 
additional parameters (modulus, multiplier, and increment).


The proposed sequence AM patch would support a different nextval 
function, but does it support additional parameters?  I haven't found that.


Another use case I have wished for from time to time is creating 
sequences using different data types, for example uuids.  You'd just 
need to provide a data-type-specific "next" function.  However, in this 
patch, all the values and state are hardcoded to int64.


While distributed systems can certainly use global int64 identifiers, 
I'd expect that there would also be demand for uuids, so designing this 
more flexibly would be useful.


I think the proposed patch covers too broad a range of abstraction 
levels.  The use cases described above are very high level and are just 
concerned with how you get the next value.  The current internal 
sequence state would be stored in whatever way it is stored now.  But 
this patch also includes callbacks for very low-level-seeming concepts 
like table AMs and persistence.  Those seem like different things.  And 
the two levels should be combinable.  Maybe I want a local sequence of 
uuids or a global sequence of uuids, or a local sequence of integers or 
a global sequence of integers.  I mean, I haven't thought this through, 
but I get the feeling that there should be more than one level of API 
around this.






Re: [17] CREATE SUBSCRIPTION ... SERVER

2024-01-23 Thread Ashutosh Bapat
On Tue, Jan 23, 2024 at 12:33 AM Jeff Davis  wrote:
>
> On Mon, 2024-01-22 at 18:41 +0530, Ashutosh Bapat wrote:
> > 0002 adds a prefix "regress_" to almost every object that is created
> > in foreign_data.sql.
>
> psql \dew outputs the owner, which in the case of a built-in FDW is the
> bootstrap superuser, which is not a stable name. I used the prefix to
> exclude the built-in FDW -- if you have a better suggestion, please let
> me know. (Though reading below, we might not even want a built-in FDW.)

I am with the prefix. The changes it causes make review difficult. If
you can separate those changes into a patch that will help.

>
> > Dummy FDW makes me nervous. The way it's written, it may grow into a
> > full-fledged postgres_fdw and in the process might acquire the same
> > concerns that postgres_fdw has today. But I will study the patches
> > and
> > discussion around it more carefully.
>
> I introduced that based on this comment[1].
>
> I also thought it fit with your previous suggestion to make it work
> with postgres_fdw, but I suppose it's not required. We could just not
> offer the built-in FDW, and expect users to either use postgres_fdw or
> create their own dummy FDW.

I am fine with this.

-- 
Best Wishes,
Ashutosh Bapat




Re: Network failure may prevent promotion

2024-01-23 Thread Heikki Linnakangas

On 23/01/2024 10:24, Kyotaro Horiguchi wrote:

Thank you for looking this!

At Tue, 23 Jan 2024 15:07:10 +0900, Fujii Masao  wrote in

Regarding the patch, here are the review comments.

+/*
+ * Is current process a wal receiver?
+ */
+bool
+IsWalReceiver(void)
+{
+ return WalRcv != NULL;
+}

This looks wrong because WalRcv can be non-NULL in processes other
than walreceiver.


Mmm. Sorry for the silly mistake. We can use B_WAL_RECEIVER
instead. I'm not sure if the new function IsWalReceiver() is
required. The expression "MyBackendType == B_WAL_RECEIVER" is quite
descriptive. However, the function does make ProcessInterrupts() more
aligned regarding process types.


There's an existing AmWalReceiverProcess() macro too. Let's use that.

(See also 
https://www.postgresql.org/message-id/f3ecd4cb-85ee-4e54-8278-5fabfb3a4ed0%40iki.fi 
for refactoring in this area)


Here's a patch set summarizing the changes so far. They should be 
squashed, but I kept them separate for now to help with review:


1. revert the revert of 728f86fec6.
2. your walrcv_shutdown_deblocking_v2-2.patch
3. Also replace libpqrcv_PQexec() and libpqrcv_PQgetResult() with the 
wrappers from libpq-be-fe-helpers.h

4. Replace IsWalReceiver() with AmWalReceiverProcess()


- pqsignal(SIGTERM, SignalHandlerForShutdownRequest); /* request shutdown */
+ pqsignal(SIGTERM, WalRcvShutdownSignalHandler); /* request shutdown */

Can't we just use die(), instead?


There was a comment explaining the problems associated with exiting
within a signal handler;

- * Currently, only SIGTERM is of interest.  We can't just exit(1) within the
- * SIGTERM signal handler, because the signal might arrive in the middle of
- * some critical operation, like while we're holding a spinlock.  Instead, the

And I think we should keep the considerations it suggests. The patch
removes the comment itself, but it does so because it implements our
standard process exit procedure, which incorporates points suggested
by the now-removed comment.


die() doesn't call exit(1). Unless DoingCommandRead is set, but it never 
is in the walreceiver. It looks just like the new 
WalRcvShutdownSignalHandler() function. Am I missing something?


Hmm, but doesn't bgworker_die() have that problem with exit(1)ing in the 
signal handler?


I also wonder if we should replace SignalHandlerForShutdownRequest() 
completely with die(), in all processes? The difference is that 
SignalHandlerForShutdownRequest() uses ShutdownRequestPending, while 
die() uses ProcDiePending && InterruptPending to indicate that the 
signal was received. Or do some of the processes want to check for 
ShutdownRequestPending only at specific places, and don't want to get 
terminated at the any random CHECK_FOR_INTERRUPTS()?


--
Heikki Linnakangas
Neon (https://neon.tech)
From 27b9f8283b2caa7a4243fe57a8d14a127396e80f Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Tue, 23 Jan 2024 11:01:03 +0200
Subject: [PATCH v3 1/4] Revert "Revert "libpqwalreceiver: Convert to
 libpq-be-fe-helpers.h""

This reverts commit 21ef4d4d897563adb2f7920ad53b734950f1e0a4.
---
 .../libpqwalreceiver/libpqwalreceiver.c   | 55 +++
 1 file changed, 8 insertions(+), 47 deletions(-)

diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 77669074e82..201c36cb220 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -24,6 +24,7 @@
 #include "common/connect.h"
 #include "funcapi.h"
 #include "libpq-fe.h"
+#include "libpq/libpq-be-fe-helpers.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
 #include "pgstat.h"
@@ -132,7 +133,6 @@ libpqrcv_connect(const char *conninfo, bool logical, bool must_use_password,
  const char *appname, char **err)
 {
 	WalReceiverConn *conn;
-	PostgresPollingStatusType status;
 	const char *keys[6];
 	const char *vals[6];
 	int			i = 0;
@@ -188,56 +188,17 @@ libpqrcv_connect(const char *conninfo, bool logical, bool must_use_password,
 	Assert(i < sizeof(keys));
 
 	conn = palloc0(sizeof(WalReceiverConn));
-	conn->streamConn = PQconnectStartParams(keys, vals,
-			 /* expand_dbname = */ true);
-	if (PQstatus(conn->streamConn) == CONNECTION_BAD)
-		goto bad_connection_errmsg;
-
-	/*
-	 * Poll connection until we have OK or FAILED status.
-	 *
-	 * Per spec for PQconnectPoll, first wait till socket is write-ready.
-	 */
-	status = PGRES_POLLING_WRITING;
-	do
-	{
-		int			io_flag;
-		int			rc;
-
-		if (status == PGRES_POLLING_READING)
-			io_flag = WL_SOCKET_READABLE;
-#ifdef WIN32
-		/* Windows needs a different test while waiting for connection-made */
-		else if (PQstatus(conn->streamConn) == CONNECTION_STARTED)
-			io_flag = WL_SOCKET_CONNECTED;
-#endif
-		else
-			io_flag = WL_SOCKET_WRITEABLE;
-
-		rc = WaitLatchOrSocket(MyLatch,
-			   WL_EXIT_ON_PM_DEATH | WL_LATCH_SET | io_flag,
-			   PQsocket(conn->st

Re: make dist using git archive

2024-01-23 Thread Peter Eisentraut

On 22.01.24 21:04, Tristan Partin wrote:
I am not really following why we can't use the builtin Meson dist 
command. The only difference from my testing is it doesn't use a 
--prefix argument.


Here are some problems I have identified:

1. meson dist internally runs gzip without the -n option.  That makes 
the tar.gz archive include a timestamp, which in turn makes it not 
reproducible.


2. Because gzip includes a platform indicator in the archive, the 
produced tar.gz archive is not reproducible across platforms.  (I don't 
know if gzip has an option to avoid that.  git archive uses an internal 
gzip implementation that handles this.)


3. Meson does not support tar.bz2 archives.

4. Meson uses git archive internally, but then unpacks and repacks the 
archive, which loses the ability to use git get-tar-commit-id.


5. I have found that the tar archives created by meson and git archive 
include the files in different orders.  I suspect that the Python 
tarfile module introduces some either randomness or platform dependency.


6. meson dist is also slower because of the additional work.

7. meson dist produces .sha256sum files but we have called them .sha256. 
 (This is obviously trivial, but it is something that would need to be 
dealt with somehow nonetheless.)


Most or all of these issues are fixable, either upstream in Meson or by 
adjusting our own requirements.  But for now this route would have some 
significant disadvantages.






Re: Synchronizing slots from primary to standby

2024-01-23 Thread Ajin Cherian
On Mon, Jan 22, 2024 at 10:30 PM shveta malik 
wrote:
>
> On Mon, Jan 22, 2024 at 3:10 PM Amit Kapila 
wrote:
> >
> > minor comments on the patch:
> > ===
>
> PFA v65 addressing the comments.
>
> Addressed comments by Peter in [1], comments by Hou-San in [2],
> comments by Amit in [3] and [4]
>
> TODO:
> Analyze the issue reported by Swada-san in [5] (pt 2)
> Disallow subscription creation on standby with failover=true (as we do
> not support sync on cascading standbys)
>
> [1]:
https://www.postgresql.org/message-id/CAHut%2BPt5Pk_xJkb54oahR%2Bf9oawgfnmbpewvkZPgnRhoJ3gkYg%40mail.gmail.com
> [2]:
https://www.postgresql.org/message-id/OS0PR01MB57160C7184E17C6765AAE38294752%40OS0PR01MB5716.jpnprd01.prod.outlook.com
> [3]:
https://www.postgresql.org/message-id/CAA4eK1JPB-zpGYTbVOP5Qp26tNQPMjDuYzNZ%2Ba9RFiN5nE1tEA%40mail.gmail.com
> [4]:
https://www.postgresql.org/message-id/CAA4eK1Jhy1-bsu6vc0%3DNja7aw5-EK_%3D101pnnuM3ATqTA8%2B%3DSg%40mail.gmail.com
> [5]:
https://www.postgresql.org/message-id/CAD21AoBgzONdt3o5mzbQ4MtqAE%3DWseiXUOq0LMqne-nWGjZBsA%40mail.gmail.com
>
>
I was doing some testing on this. What I noticed is that creating
subscriptions with failover enabled is taking a lot longer compared with a
subscription with failover disabled. The setup has primary configured with
standby_slot_names and that standby is enabled with enable_synclot turned
on.

Publisher has one publication, no tables.
subscriber:
postgres=# \timing
Timing is on.
postgres=# CREATE SUBSCRIPTION sub CONNECTION 'dbname=postgres
host=localhost port=6972' PUBLICATION pub with (failover = true);
NOTICE:  created replication slot "sub" on publisher
CREATE SUBSCRIPTION
Time: 10011.829 ms (00:10.012)

== drop the sub

postgres=# CREATE SUBSCRIPTION sub CONNECTION 'dbname=postgres
host=localhost port=6972' PUBLICATION pub with (failover = false);
NOTICE:  created replication slot "sub" on publisher
CREATE SUBSCRIPTION
Time: 46.317 ms

With failover=true, it takes 10011 ms while failover=false takes 46 ms.

I don't see a similar delay when creating slot on the primary with
pg_create_logical_replication_slot() with failover flag enabled.

Then on primary:
postgres=# SELECT 'init' FROM
pg_create_logical_replication_slot('lsub2_slot', 'pgoutput', false, false,
true);
?column?
--
init
(1 row)

Time: 36.125 ms
postgres=# SELECT 'init' FROM
pg_create_logical_replication_slot('lsub1_slot', 'pgoutput', false, false,
false);
?column?
--
init
(1 row)

Time: 53.981 ms

regards,
Ajin Cherian
Fujitsu Australia


Re: Reordering DISTINCT keys to match input path's pathkeys

2024-01-23 Thread David Rowley
On Tue, 23 Jan 2024 at 18:56, Richard Guo  wrote:
>
> Similar to what we did to GROUP BY keys in 0452b461bc, I think we can do
> the same to DISTINCT keys, i.e. reordering DISTINCT keys to match input
> path's pathkeys, which can help reduce cost by avoiding unnecessary
> re-sort, or allowing us to use incremental-sort to save efforts.  For
> instance,

I've not caught up on the specifics of 0452b461b, but I just wanted to
highlight that there was some work done in [1] in this area.  It seems
Ankit didn't ever add that to a CF, so that might explain why it's
been lost.

Anyway, just pointing it out as there may be useful code or discussion
in the corresponding threads.

David

[1] https://postgr.es/m/da9425ae-8ff7-33d9-23b3-2a3eb605e106%40gmail.com




Re: Shared detoast Datum proposal

2024-01-23 Thread Michael Zhilin

Hi Andy,

It looks like v5 is missing in your mail. Could you please check and 
resend it?


Thanks,
 Michael.

On 1/23/24 08:44, Andy Fan wrote:

Hi,

Peter Smith  writes:


2024-01 Commitfest.

Hi, This patch has a CF status of "Needs Review" [1], but it seems
there were CFbot test failures last time it was run [2]. Please have a
look and post an updated version if necessary.

==
[1]https://commitfest.postgresql.org/46/4759/
[2]https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/46/4759

v5 attached, it should fix the above issue.  This version also introduce
a data struct called bitset, which has a similar APIs like bitmapset but
have the ability to reset all bits without recycle its allocated memory,
this is important for this feature.

commit 44754fb03accb0dec9710a962a334ee73eba3c49 (HEAD -> 
shared_detoast_value_v2)
Author: yizhi.fzh
Date:   Tue Jan 23 13:38:34 2024 +0800

 shared detoast feature.

commit 14a6eafef9ff4926b8b877d694de476657deee8a
Author: yizhi.fzh
Date:   Mon Jan 22 15:48:33 2024 +0800

 Introduce a Bitset data struct.
 
 While Bitmapset is designed for variable-length of bits, Bitset is

 designed for fixed-length of bits, the fixed length must be specified at
 the bitset_init stage and keep unchanged at the whole lifespan. Because
 of this, some operations on Bitset is simpler than Bitmapset.
 
 The bitset_clear unsets all the bits but kept the allocated memory, this

 capacity is impossible for bit Bitmapset for some solid reasons and this
 is the main reason to add this data struct.
 
 Also for performance aspect, the functions for Bitset removed some

 unlikely checks, instead with some Asserts.
 
 [1]https://postgr.es/m/CAApHDvpdp9LyAoMXvS7iCX-t3VonQM3fTWCmhconEvORrQ%2BZYA%40mail.gmail.com

 [2]https://postgr.es/m/875xzqxbv5.fsf%40163.com


I didn't write a good commit message for commit 2, the people who is
interested with this can see the first message in this thread for
explaination.

I think anyone whose customer uses lots of jsonb probably can get
benefits from this. the precondition is the toast value should be
accessed 1+ times, including the jsonb_out function. I think this would
be not rare to happen.



--
Michael Zhilin
Postgres Professional
+7(925)3366270
https://www.postgrespro.ru


Re: Network failure may prevent promotion

2024-01-23 Thread Heikki Linnakangas

On 23/01/2024 06:23, Kyotaro Horiguchi wrote:

At Mon, 22 Jan 2024 13:29:10 -0800, Andres Freund  wrote in

Hi,

On 2024-01-19 12:28:05 +0900, Michael Paquier wrote:

On Thu, Jan 18, 2024 at 03:42:28PM +0200, Heikki Linnakangas wrote:

Given that commit 728f86fec6 that introduced this issue was not strictly
required, perhaps we should just revert it for v16.


Is there a point in keeping 728f86fec6 as well on HEAD?  That does not
strike me as wise to keep that in the tree for now.  If it needs to be
reworked, looking at this problem from scratch would be a safer
approach.


IDK, I think we'll introduce this type of bug over and over if we don't fix it
properly.


Just to clarify my position, I thought that 728f86fec6 was heading the
right direction. Considering the current approach to signal handling
in walreceiver, I believed that it would be better to further
generalize in this direction rather than reverting. That's why I
proposed that patch.


I reverted commit 728f86fec6 from REL_16_STABLE and master.

I agree it was the right direction, so let's develop a complete patch, 
and re-apply it to master when we have the patch ready.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: remaining sql/json patches

2024-01-23 Thread jian he
On Mon, Jan 22, 2024 at 11:46 PM jian he  wrote:
>
> On Mon, Jan 22, 2024 at 10:28 PM Amit Langote  wrote:
> >
> > > based on v35.
> > > Now I only applied from 0001 to 0007.
> > > For {DEFAULT expression  ON EMPTY}  | {DEFAULT expression ON ERROR}
> > > restrict DEFAULT expression be either Const node or FuncExpr node.
> > > so these 3 SQL/JSON functions can be used in the btree expression index.
> >
> > I'm not really excited about adding these restrictions into the
> > transformJsonFuncExpr() path.  Index or any other code that wants to
> > put restrictions already have those in place, no need to add them
> > here.  Moreover, by adding these restrictions, we might end up
> > preventing users from doing useful things with this like specify
> > column references.  If there are semantic issues with allowing that,
> > we should discuss them.
> >
>
> after applying v36.
> The following index creation and query operation works. I am not 100%
> sure about these cases.
> just want confirmation, sorry for bothering you
>
> drop table t;
> create table t(a jsonb, b  int);
> insert into t select '{"hello":11}',1;
> insert into t select '{"hello":12}',2;
> CREATE INDEX t_idx2 ON t (JSON_query(a, '$.hello1' RETURNING int
> default b + random() on error));
> CREATE INDEX t_idx3 ON t (JSON_query(a, '$.hello1' RETURNING int
> default random()::int on error));
> SELECT JSON_query(a, '$.hello1'  RETURNING int default ret_setint() on
> error) from t;

I forgot to attach ret_setint defition.

create or replace function ret_setint() returns setof integer as
$$
begin
-- perform pg_sleep(0.1);
return query execute 'select 1 union all select 1';
end;
$$
language plpgsql IMMUTABLE;

-
In the function transformJsonExprCommon, we have
`JsonExpr   *jsexpr = makeNode(JsonExpr);`
then the following 2 assignments are not necessary.

/* Both set in the caller. */
jsexpr->result_coercion = NULL;
jsexpr->omit_quotes = false;

So I removed it.

JSON_VALUE OMIT QUOTES by default, so I set it accordingly.
I also changed coerceJsonFuncExprOutput accordingly


v1-0001-minor-refactor-transformJsonFuncExpr.based_on_v36
Description: Binary data


Re: Network failure may prevent promotion

2024-01-23 Thread Kyotaro Horiguchi
Thank you for looking this!

At Tue, 23 Jan 2024 15:07:10 +0900, Fujii Masao  wrote 
in 
> Regarding the patch, here are the review comments.
> 
> +/*
> + * Is current process a wal receiver?
> + */
> +bool
> +IsWalReceiver(void)
> +{
> + return WalRcv != NULL;
> +}
> 
> This looks wrong because WalRcv can be non-NULL in processes other
> than walreceiver.

Mmm. Sorry for the silly mistake. We can use B_WAL_RECEIVER
instead. I'm not sure if the new function IsWalReceiver() is
required. The expression "MyBackendType == B_WAL_RECEIVER" is quite
descriptive. However, the function does make ProcessInterrupts() more
aligned regarding process types.

> - pqsignal(SIGTERM, SignalHandlerForShutdownRequest); /* request shutdown */
> + pqsignal(SIGTERM, WalRcvShutdownSignalHandler); /* request shutdown */
> 
> Can't we just use die(), instead?

There was a comment explaining the problems associated with exiting
within a signal handler;

- * Currently, only SIGTERM is of interest.  We can't just exit(1) within the
- * SIGTERM signal handler, because the signal might arrive in the middle of
- * some critical operation, like while we're holding a spinlock.  Instead, the

And I think we should keep the considerations it suggests. The patch
removes the comment itself, but it does so because it implements our
standard process exit procedure, which incorporates points suggested
by the now-removed comment.

--
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
index 201c36cb22..db779dc6ca 100644
--- a/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
+++ b/src/backend/replication/libpqwalreceiver/libpqwalreceiver.c
@@ -749,7 +749,7 @@ libpqrcv_PQgetResult(PGconn *streamConn)
 		if (rc & WL_LATCH_SET)
 		{
 			ResetLatch(MyLatch);
-			ProcessWalRcvInterrupts();
+			CHECK_FOR_INTERRUPTS();
 		}
 
 		/* Consume whatever data is available from the socket */
@@ -1053,7 +1053,7 @@ libpqrcv_processTuples(PGresult *pgres, WalRcvExecResult *walres,
 	{
 		char	   *cstrs[MaxTupleAttributeNumber];
 
-		ProcessWalRcvInterrupts();
+		CHECK_FOR_INTERRUPTS();
 
 		/* Do the allocations in temporary context. */
 		oldcontext = MemoryContextSwitchTo(rowcontext);
diff --git a/src/backend/replication/walreceiver.c b/src/backend/replication/walreceiver.c
index 728059518e..e491f7d4c5 100644
--- a/src/backend/replication/walreceiver.c
+++ b/src/backend/replication/walreceiver.c
@@ -147,39 +147,34 @@ static void XLogWalRcvSendReply(bool force, bool requestReply);
 static void XLogWalRcvSendHSFeedback(bool immed);
 static void ProcessWalSndrMessage(XLogRecPtr walEnd, TimestampTz sendTime);
 static void WalRcvComputeNextWakeup(WalRcvWakeupReason reason, TimestampTz now);
+static void WalRcvShutdownSignalHandler(SIGNAL_ARGS);
 
-/*
- * Process any interrupts the walreceiver process may have received.
- * This should be called any time the process's latch has become set.
- *
- * Currently, only SIGTERM is of interest.  We can't just exit(1) within the
- * SIGTERM signal handler, because the signal might arrive in the middle of
- * some critical operation, like while we're holding a spinlock.  Instead, the
- * signal handler sets a flag variable as well as setting the process's latch.
- * We must check the flag (by calling ProcessWalRcvInterrupts) anytime the
- * latch has become set.  Operations that could block for a long time, such as
- * reading from a remote server, must pay attention to the latch too; see
- * libpqrcv_PQgetResult for example.
- */
 void
-ProcessWalRcvInterrupts(void)
+WalRcvShutdownSignalHandler(SIGNAL_ARGS)
 {
-	/*
-	 * Although walreceiver interrupt handling doesn't use the same scheme as
-	 * regular backends, call CHECK_FOR_INTERRUPTS() to make sure we receive
-	 * any incoming signals on Win32, and also to make sure we process any
-	 * barrier events.
-	 */
-	CHECK_FOR_INTERRUPTS();
+	int			save_errno = errno;
 
-	if (ShutdownRequestPending)
+	/* Don't joggle the elbow of proc_exit */
+	if (!proc_exit_inprogress)
 	{
-		ereport(FATAL,
-(errcode(ERRCODE_ADMIN_SHUTDOWN),
- errmsg("terminating walreceiver process due to administrator command")));
+		InterruptPending = true;
+		ProcDiePending = true;
 	}
+
+	SetLatch(MyLatch);
+
+	errno = save_errno;
+	
 }
 
+/*
+ * Is current process a wal receiver?
+ */
+bool
+IsWalReceiver(void)
+{
+	return MyBackendType == B_WAL_RECEIVER;
+}
 
 /* Main entry point for walreceiver process */
 void
@@ -277,7 +272,7 @@ WalReceiverMain(void)
 	pqsignal(SIGHUP, SignalHandlerForConfigReload); /* set flag to read config
 	 * file */
 	pqsignal(SIGINT, SIG_IGN);
-	pqsignal(SIGTERM, SignalHandlerForShutdownRequest); /* request shutdown */
+	pqsignal(SIGTERM, WalRcvShutdownSignalHandler); /* request shutdown */
 	/* SIGQUIT handler was already set up by InitPostmasterChild */
 	pqsignal(SIGALRM, SIG_IGN);
 	pqsignal(SIGPIPE, SIG_IGN);
@@ -

Re: pg_rewind WAL segments deletion pitfall

2024-01-23 Thread Alexander Kukushkin
Hi Peter,

On Mon, 22 Jan 2024 at 00:38, Peter Smith  wrote:

> 2024-01 Commitfest.
>
> Hi, This patch has a CF status of "Ready for Committer", but it is
> currently failing some CFbot tests [1]. Please have a look and post an
> updated version..
>
> ==
> [1]
> https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/46/3874
>
>
>From what I can see all failures are not related to this patch:
1. Windows build failed with
[10:52:49.679] 126/281 postgresql:recovery / recovery/019_replslot_limit
ERROR 185.84s (exit status 255 or signal 127 SIGinvalid)
2. FreeBSD build failed with
[09:11:57.656] 190/285 postgresql:psql / psql/010_tab_completion ERROR
0.46s exit status 2
[09:11:57.656] 220/285 postgresql:authentication /
authentication/001_password ERROR 0.57s exit status 2

In fact, I don't even see this patch being applied for these builds and the
introduced TAP test being executed.

Regards,
--
Alexander Kukushkin


Re: Support "Right Semi Join" plan shapes

2024-01-23 Thread wenhui qiu
Hi  vignesh C
   Many thanks, I have marked it to  "ready for committer"

Best wish

vignesh C  于2024年1月23日周二 10:56写道:

> On Mon, 22 Jan 2024 at 11:27, wenhui qiu  wrote:
> >
> > Hi  vignesh CI saw this path has been passed (
> https://cirrus-ci.com/build/6109321080078336),can we push it?
>
> If you have found no comments from your review and testing, let's mark
> it as "ready for committer".
>
> Regards,
> Vignesh
>


Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed

2024-01-23 Thread Bertrand Drouvot
Hi,

On Tue, Jan 23, 2024 at 02:50:06PM +0900, Michael Paquier wrote:
> On Mon, Jan 22, 2024 at 09:07:45AM +, Bertrand Drouvot wrote:
> 
> I've rewritten some of that, and applied the patch down to v16.

Thanks!

> > Yeah, I can clearly see how this patch helps from a theoritical perspective 
> > but
> > rely on Alexander's testing to see how it actually stabilizes the test.
> 
> Anyway, that's not the end of it.  What should we do for snapshot
> snapshot records coming from the bgwriter?

I've mixed feeling about it. On one hand we'll decrease even more the risk of
seeing a xl_running_xacts in the middle of a test, but on the other hand I agree
with Tom's concern [1]: I think that we could miss "corner cases/bug" detection
(like the one reported in [2]).

What about?

1) Apply "wait_until_vacuum_can_remove() + autovacuum disabled" where it makes
sense and for tests that suffers from random xl_running_xacts. I can look at
031_recovery_conflict.pl, do you have others in mind?
2) fix [2]
3) depending on how stabilized this test (and others that suffer from "random"
xl_running_xacts) is, then think about the bgwriter.

[1]: https://www.postgresql.org/message-id/1375923.1705291719%40sss.pgh.pa.us
[2]: 
https://www.postgresql.org/message-id/flat/zatjw2xh+tquc...@ip-10-97-1-34.eu-west-3.compute.internal

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Reordering DISTINCT keys to match input path's pathkeys

2024-01-23 Thread Laurenz Albe
On Tue, 2024-01-23 at 13:55 +0800, Richard Guo wrote:
> Similar to what we did to GROUP BY keys in 0452b461bc, I think we can do
> the same to DISTINCT keys, i.e. reordering DISTINCT keys to match input
> path's pathkeys, which can help reduce cost by avoiding unnecessary
> re-sort, or allowing us to use incremental-sort to save efforts.
> 
> Attached is a patch for this optimization.  Any thoughts?

I didn't scrutinize the code, but that sounds like a fine optimization.

Yours,
Laurenz Albe