Re: Synchronizing slots from primary to standby

2024-05-22 Thread Peter Smith
Here are some review comments for the docs patch v3-0001.

==
Commit message

1.
This patch adds detailed documentation for the slot sync feature
including examples to guide users on how to verify that all slots have
been successfully synchronized to the standby server and how to
confirm whether the subscription can continue subscribing to
publications on the promoted standby server.

~

This may be easier to read if you put it in bullet form like below:

SUGGESTION

It includes examples to guide the user:

* How to verify that all slots have been successfully synchronized to
the standby server

* How to confirm whether the subscription can continue subscribing to
publications on the promoted standby server

==
doc/src/sgml/high-availability.sgml

2.
+   
+If you have opted for synchronization of logical slots (see
+),
+then before switching to the standby server, it is recommended to check
+if the logical slots synchronized on the standby server are ready
+for failover. This can be done by following the steps described in
+.
+   
+

Maybe it is better to call this feature "logical replication slot
synchronization" to be more consistent with the title of section
47.2.3.

SUGGESTION
If you have opted for logical replication slot synchronization (see ...

==
doc/src/sgml/logical-replication.sgml

3.
+  
+   When the publisher server is the primary server of a streaming replication,
+   the logical slots on that primary server can be synchronized to the standby
+   server by specifying failover = true when creating
+   subscriptions for those publications. Enabling failover ensures a seamless
+   transition of those subscriptions after the standby is promoted. They can
+   continue subscribing to publications now on the new primary server without
+   losing any data that has been flushed to the new primary server.
+  
+

3a.
BEFORE
When the publisher server is the primary server of...

SUGGESTION
When publications are defined on the primary server of...

~

3b.
Enabling failover...

Maybe say "Enabling the failover parameter..." and IMO there should
also be a link to the CREATE SUBSCRIPTION failover parameter so the
user can easily navigate there to read more about it.

~

3c.
BEFORE
They can continue subscribing to publications now on the new primary
server without losing any data that has been flushed to the new
primary server.

SUGGESTION (removes some extra info I did not think was needed)
They can continue subscribing to publications now on the new primary
server without any loss of data.

~~~

4.
+  
+   Because the slot synchronization logic copies asynchronously, it is
+   necessary to confirm that replication slots have been synced to the standby
+   server before the failover happens. Furthermore, to ensure a successful
+   failover, the standby server must not be lagging behind the subscriber. It
+   is highly recommended to use
+   standby_slot_names
+   to prevent the subscriber from consuming changes faster than the hot standby.
+   To confirm that the standby server is indeed ready for failover, follow
+   these 2 steps:
+  

IMO the last sentence "To confirm..." should be a new paragraph.

~~~

5.
+  
+   Firstly, on the subscriber node, use the following SQL to identify
+   which slots should be synced to the standby that we plan to promote.

AND

+  
+   Next, check that the logical replication slots identified above exist on
+   the standby server and are ready for failover.

~~

5a.
I don't think you need to say "Firstly," and "Next," because the order
to do these steps is already self-evident.

~

5b.
The patch says "on the subscriber node", but isn't that the simplest case?
e.g. maybe there are multiple nodes having subscriptions for these
publications. Maybe the sentence needs to account for the case of
subscribers on more than one node.

Is there no way to discover this information by querying the publisher?

~~~

6.
+
+test_sub=# SELECT
+   array_agg(slotname) AS slots
+   FROM

...

+
+test_standby=# SELECT slot_name, (synced AND NOT temporary AND NOT conflicting) AS failover_ready
+   FROM pg_replication_slots
+   WHERE slot_name IN ('sub1','sub2','sub3');


The example SQL for "1a" refers to 'slotname', but the example SQL for
"1b" refers to 'slot_name' (i.e. with an underscore). It might be better
if those are consistently called 'slot_name'.

~~~

7.
+   
+
+ Confirm that the standby server is not lagging behind the subscribers.
+ This step can be skipped if
+ standby_slot_names
+ has been correctly configured. If standby_slot_names is not configured
+ correctly, it is highly recommended to run this step after the primary
+ server is down, otherwise the results of the query may vary at different
+ points of time due to the ongoing replication on the logical subscribers
+ from the primary server.
+

7a.
I felt that the step should just say "Confirm that the 

RE: Pgoutput not capturing the generated columns

2024-05-22 Thread Hayato Kuroda (Fujitsu)
Dear Shubham,

Thanks for updating the patch! I checked your patches briefly. Here are my 
comments.

01. API

Since the option for test_decoding is enabled by default, I think it should be 
renamed.
E.g., "skip-generated-columns" or something.

02. ddl.sql

```
+-- check include-generated-columns option with generated column
+CREATE TABLE gencoltable (a int PRIMARY KEY, b int GENERATED ALWAYS AS (a * 2) STORED);
+INSERT INTO gencoltable (a) VALUES (1), (2), (3);
+SELECT data FROM pg_logical_slot_get_changes('regression_slot', NULL, NULL, 'include-xids', '0', 'skip-empty-xacts', '1', 'include-generated-columns', '1');
+data 
+-
+ BEGIN
+ table public.gencoltable: INSERT: a[integer]:1 b[integer]:2
+ table public.gencoltable: INSERT: a[integer]:2 b[integer]:4
+ table public.gencoltable: INSERT: a[integer]:3 b[integer]:6
+ COMMIT
+(5 rows)
```

We should also test the non-default case, in which the generated columns are not output.

03. ddl.sql

Not sure the new tests are in the correct place. Do we have to add a new file and
move the tests to it?
Thoughts?

04. protocol.sgml

Please keep the format of the sgml file.

05. protocol.sgml

The option is implemented as a streaming option of the pgoutput plugin, so it
should be located under the "Logical Streaming Replication Parameters" section.

06. AlterSubscription

```
+   if (IsSet(opts.specified_opts, SUBOPT_GENERATED_COLUMN))
+   {
+       ereport(ERROR,
+               (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+                errmsg("toggling generated_column option is not allowed.")));
+   }
```

If you don't want to support the option, you can remove the SUBOPT_GENERATED_COLUMN
macro from the function. But can you clarify the reason why you do not want to support it?

07. logicalrep_write_tuple

```
-   if (!column_in_column_list(att->attnum, columns))
+   if (!column_in_column_list(att->attnum, columns) && !att->attgenerated)
+       continue;
+
+   if (att->attgenerated && !publish_generated_column)
        continue;
```

I think the changes in v2 were reverted or wrongly merged.

08. test code

Can you add tests verifying that generated columns are replicated by logical
replication?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
https://www.fujitsu.com/ 



Re: Fix src/test/subscription/t/029_on_error.pl test when wal_debug is enabled

2024-05-22 Thread Michael Paquier
On Wed, May 22, 2024 at 02:24:37PM +, Ilyasov Ian wrote:
> I corrected my patch according to what I think
> Michael wanted. I attached the new patch to the letter.

Thanks for compiling this patch.  Yes, that's the idea.

- qr/processing remote data for replication origin \"pg_\d+\" during message type "INSERT" for replication target relation "public.tbl" in transaction \d+, finished at ([[:xdigit:]]+\/[[:xdigit:]]+)/
+ qr/ERROR:  duplicate key.*\n.*DETAIL:.*\n.*CONTEXT:.* finished at ([[:xdigit:]]+\/[[:xdigit:]]+)/m

There are three CONTEXT strings that could map to this context.  It
seems to me that we should keep the 'for replication target relation
"public.tbl" in transaction \d+,' before the "finished at", so that it
is possible to distinguish this context from the context that has a
column name and the context where there is no target relation.  That
makes the regexp longer, but it is not that bad.
--
Michael




Re: [PATCH] LockAcquireExtended improvement

2024-05-22 Thread Will Mortensen
Thanks! :-)




Re: [PATCH] LockAcquireExtended improvement

2024-05-22 Thread Michael Paquier
On Sat, May 18, 2024 at 05:10:38PM +0800, Jingxian Li wrote:
> Nice catch! The patch looks good to me.

And fixed that as well.
--
Michael




Re: PostgreSQL 17 Beta 1 release announcement draft

2024-05-22 Thread Bertrand Drouvot
Hi,

On Wed, May 22, 2024 at 07:01:54PM -0400, Jonathan S. Katz wrote:
> Thanks. As such I made it:
> 
> "which provides descriptions about wait events and can be combined with
> `pg_stat_activity` to give more insight into why an active session is
> waiting."
> 

Thanks! Works for me.

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Avoid orphaned objects dependencies, take 3

2024-05-22 Thread Bertrand Drouvot
Hi,

On Wed, May 22, 2024 at 10:48:12AM -0400, Robert Haas wrote:
> On Wed, May 22, 2024 at 6:21 AM Bertrand Drouvot
>  wrote:
> > I started initially with [1] but it was focusing on function-schema only.
> 
> Yeah, that's what I thought we would want to do. And then just extend
> that to the other cases.
> 
> > Then I proposed [2] making use of a dirty snapshot when recording the 
> > dependency.
> > But this approach appeared to be "scary" and it was still failing to close
> > some race conditions.
> 
> The current patch still seems to be using dirty snapshots for some
> reason, which struck me as a bit odd. My intuition is that if we're
> relying on dirty snapshots to solve problems, we likely haven't solved
> the problems correctly, which seems consistent with your statement
> about "failing to close some race conditions". But I don't think I
> understand the situation well enough to be sure just yet.

The reason why we are using a dirty snapshot here is for the cases where we are
recording a dependency on a referenced object that we are creating at the same
time behind the scenes (for example, creating a composite type while creating
a relation). Without the dirty snapshot, the object we are creating behind
the scenes (the composite type) would not be visible, and we would wrongly
assume that it has been dropped.

Note that the dirty snapshot is used only when the object is first
reported as "non-existing" by the new ObjectByIdExist() function.

> > I think there are 2 cases here:
> >
> > First case: the "conflicting" lock mode is for one of our own locks, then
> > LockCheckConflicts() would report this as a NON conflict.
> >
> > Second case: the "conflicting" lock mode is NOT for one of our own locks,
> > then LockCheckConflicts() would report a conflict. But I have the feeling
> > that the existing code would already lock those sessions.
> >
> > Was your concern about "First case" or "Second case" or both?
> 
> The second case, I guess. It's bad to take a weaker lock first and
> then a stronger lock on the same object later, because it can lead to
> deadlocks that would have been avoided if the stronger lock had been
> taken at the outset.

In the example I shared up-thread it would be the opposite: Session 1 would
take an AccessExclusiveLock on the object before taking an AccessShareLock
during changeDependencyFor().

> Here, it seems like things would happen in the
> other order: if we took two locks, we'd probably take the stronger
> lock in the higher-level code and then the weaker lock in the
> dependency code.

Yeah, I agree.

> That shouldn't break anything; it's just a bit
> inefficient.

Yeah, the second lock is useless in that case (like in the example up-thread).

> My concern was really more about the maintainability of
> the code. I fear that if we add code that takes heavyweight locks in
> surprising places, we might later find the behavior difficult to
> reason about.
>

I think I understand your concern about code maintainability, but I'm not sure
that adding locks while recording a dependency is that surprising.

> Tom, what is your thought about that concern?

+1

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Pgoutput not capturing the generated columns

2024-05-22 Thread Shubham Khanna
> Dear Shubham,
>
> Thanks for creating a patch! Here are high-level comments.

> 1.
> Please document the feature. If it is hard to describe, we should change the 
> API.

I have added documentation for the feature.

> 4.
> Regarding the test_decoding plugin, it has already been able to decode the
> generated columns. So... as the first place, is the proposed option really 
> needed
> for the plugin? Why do you include it?
> If you anyway want to add the option, the default value should be on - which 
> keeps
> current behavior.

I have made the generated column option default to true for the test_decoding
plugin, so by default we will send generated column data.

> 5.
> Assuming that the feature becomes usable for logical replication - I am not
> sure, should we change the protocol version at that time? Nodes prior to
> PG17 may not want to receive values for generated columns. Can we control
> this only by the option?

I verified backward compatibility by testing with the generated column
option, and it worked fine. I think there is no need to make any
further changes.

> 7.
>
> Some functions refer data->publish_generated_column many times. Can we store
> the value to a variable?
>
> Below comments are for test_decoding part, but they may be not needed.
>
> =
>
> a. pg_decode_startup()
>
> ```
> +else if (strcmp(elem->defname, "include_generated_columns") == 0)
> ```
>
> Other options for test_decoding do not use underscores. It should be
> "include-generated-columns".
>
> b. pg_decode_change()
>
> data->include_generated_columns is referred to four times in the function.
> Can you store the value in a variable?
>
>
> c. pg_decode_change()
>
> ```
> -true);
> +true, data->include_generated_columns );
> ```
>
> Please remove the blank.

Fixed.
The attached v3 Patch has the changes for the same.

Thanks and Regards,
Shubham Khanna.


v3-0001-Support-generated-column-capturing-generated-colu.patch
Description: Binary data


Re: Minor fixes for couple some comments around MERGE RETURNING

2024-05-22 Thread David Rowley
On Sun, 19 May 2024 at 15:20, David Rowley  wrote:
>
> I noticed that PlannedStmt.hasReturning and hasModifyingCTE have an
> outdated comment now that MERGE supports RETURNING (per commit
> c649fa24a)
>
> i.e. these two:
>
> > bool hasReturning; /* is it insert|update|delete RETURNING? */
>
> > bool hasModifyingCTE; /* has insert|update|delete in WITH? */

I've pushed the fix for that.

David




Re: using extended statistics to improve join estimates

2024-05-22 Thread Andrei Lepikhov

On 5/23/24 09:04, Andy Fan wrote:

Andrei Lepikhov  writes:

* c) No extended stats with MCV. If there are multiple join clauses,
* we can try using ndistinct coefficients and do what eqjoinsel does.


OK, I didn't pay enough attention to this comment before. And yes, I reach
the same conclusion as you - there is no implementation of this.

And if so, I think we should remove the comments and do the
implementation in the next patch.

I have the opposite opinion about it:
1. distinct estimation is a more universal thing - you can use it
precisely on any subset of columns.
2. distinct estimation is faster - it is just a number; you don't need to
detoast a huge array of values and compare them one-by-one.


So, IMO, it is an essential part of join estimation and it should be
implemented like in eqjoinsel.
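As a rough illustration, the ndistinct-based estimate that eqjoinsel() falls back to in the absence of MCV lists boils down to something like the sketch below (the function name is illustrative and this is not the patch's code; nd1/nd2 stand for the - possibly multi-column, extended-statistics-derived - distinct counts on each side of the join):

```c
/* Sketch of the ndistinct-based fallback: mirroring what eqjoinsel()
 * does without MCV lists, the selectivity of the equi-join clause set
 * is roughly 1 / max(nd1, nd2). */
static double
ndistinct_join_selectivity(double nd1, double nd2)
{
	/* fall back to a default guess when a distinct estimate is missing,
	 * in the spirit of DEFAULT_EQ_SEL */
	if (nd1 <= 0.0 || nd2 <= 0.0)
		return 0.005;
	return 1.0 / (nd1 > nd2 ? nd1 : nd2);
}
```

The appeal, per the points above, is that this needs only two numbers, whereas MCV-based estimation has to detoast and compare whole arrays of values.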

Do you think the hook proposal is closely connected with the current
topic? IIUC, it seems not. So a dedicated thread to explain the problem
to solve, the proposal, and the following discussion should be helpful
for both topics. I'm just worried that mixing the two in one thread
would make things unnecessarily complex.

Sure.

--
regards,
Andrei Lepikhov
Postgres Professional





Re: First draft of PG 17 release notes

2024-05-22 Thread David Rowley
On Thu, 23 May 2024 at 14:01, Bruce Momjian  wrote:
>
> On Thu, May 23, 2024 at 01:34:10PM +1200, David Rowley wrote:
> > What is the best way to communicate this stuff so it's easily
> > identifiable when you parse the commit messages?
>
> This is why I think we need an "Internal Performance" section where we
> can be clear about simple scaling improvements that might have no
> user-visible explanation.  I would suggest putting it after our "Source
> code" section.

hmm, but that does not really answer my question. There's still a
communication barrier if you're parsing the commit messages and
committers don't clearly state performance improvement numbers,
for the reason I stated.

I also don't agree these should be left to the "Source code" section. I
feel that section is best suited for extension authors who might care
about some internal API change.  I'm talking about stuff that makes some
user queries possibly N times (!) faster. Surely "Source Code" isn't
the place to talk about that?

David




Re: Speed up JSON escape processing with SIMD plus other optimisations

2024-05-22 Thread David Rowley
On Thu, 23 May 2024 at 13:23, David Rowley  wrote:
> Master:
> $ pgbench -n -f bench.sql -T 10 -M prepared postgres | grep tps
> tps = 362.494309 (without initial connection time)
> tps = 363.182458 (without initial connection time)
> tps = 362.679654 (without initial connection time)
>
> Master + 0001 + 0002
> $ pgbench -n -f bench.sql -T 10 -M prepared postgres | grep tps
> tps = 426.456885 (without initial connection time)
> tps = 430.573046 (without initial connection time)
> tps = 431.142917 (without initial connection time)
>
> About 18% faster.
>
> It would be much faster if we could also get rid of the
> escape_json_cstring() call in the switch default case of
> datum_to_json_internal(). row_to_json() would be heaps faster with
> that done. I considered adding a special case for the "text" type
> there, but in the end felt that we should just fix that with some
> hypothetical other patch that changes how output functions work.
> Others may feel it's worthwhile. I certainly could be convinced of it.

Just to turn that into performance numbers, I tried the attached
patch.  The numbers came out better than I thought.

Same test as before:

master + 0001 + 0002 + attached hacks:
$ pgbench -n -f bench.sql -T 10 -M prepared postgres | grep tps
tps = 616.094394 (without initial connection time)
tps = 615.928236 (without initial connection time)
tps = 614.175494 (without initial connection time)

About 70% faster than master.

David
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index a266f60ff3..b15f6c5e64 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -24,6 +24,7 @@
 #include "utils/builtins.h"
 #include "utils/date.h"
 #include "utils/datetime.h"
+#include "utils/fmgroids.h"
 #include "utils/json.h"
 #include "utils/jsonfuncs.h"
 #include "utils/lsyscache.h"
@@ -286,9 +287,15 @@ datum_to_json_internal(Datum val, bool is_null, StringInfo result,
 			pfree(jsontext);
 			break;
 		default:
-			outputstr = OidOutputFunctionCall(outfuncoid, val);
-			escape_json_cstring(result, outputstr);
-			pfree(outputstr);
+			/* special case common case for text types */
+			if (outfuncoid == F_TEXTOUT)
+				escape_json_from_text(result, (text *) DatumGetPointer(val));
+			else
+			{
+				outputstr = OidOutputFunctionCall(outfuncoid, val);
+				escape_json_cstring(result, outputstr);
+				pfree(outputstr);
+			}
 			break;
 	}
 }


Re: using extended statistics to improve join estimates

2024-05-22 Thread Andy Fan


Andrei Lepikhov  writes:

> On 20/5/2024 15:52, Andy Fan wrote:
>> Hi Andrei,
>> 
>>> On 4/3/24 01:22, Tomas Vondra wrote:
 Cool! There's obviously no chance to get this into v18, and I have stuff
 to do in this CF. But I'll take a look after that.
>>> I'm looking at your patch now - an excellent start to an eagerly awaited
>>> feature!
>>> A couple of questions:
>>> 1. I didn't find the implementation of strategy 'c' - estimation by the
>>> number of distinct values. Did you forget it?
>> What do you mean the "strategy 'c'"?
> As described in 0001-* patch:
> * c) No extended stats with MCV. If there are multiple join clauses,
> * we can try using ndistinct coefficients and do what eqjoinsel does.

OK, I didn't pay enough attention to this comment before. And yes, I reach
the same conclusion as you - there is no implementation of this.

And if so, I think we should remove the comments and do the
implementation in the next patch.

>>> 2. Can we add a clauselist selectivity hook into the core (something
>>> similar the code in attachment)? It can allow the development and
>>> testing of multicolumn join estimations without patching the core.
>> The idea LGTM. But do you want
>> +if (clauselist_selectivity_hook)
>> +s1 = clauselist_selectivity_hook(root, clauses, varRelid, 
>> jointype,
>> +
>> rather than
>> +if (clauselist_selectivity_hook)
>> +*return* clauselist_selectivity_hook(root, clauses, ..)
> Of course - the library may not estimate all the clauses - that is the
> reason why I added the input/output parameter 'estimatedclauses', by
> analogy with statext_clauselist_selectivity.

OK.

Do you think the hook proposal is closely connected with the current
topic? IIUC, it seems not. So a dedicated thread to explain the problem
to solve, the proposal, and the following discussion should be helpful
for both topics. I'm just worried that mixing the two in one thread
would make things unnecessarily complex.

-- 
Best Regards
Andy Fan





Re: First draft of PG 17 release notes

2024-05-22 Thread Bruce Momjian
On Thu, May 23, 2024 at 01:34:10PM +1200, David Rowley wrote:
> On Thu, 23 May 2024 at 10:04, Bruce Momjian  wrote:
> > You might have seen in this thread, I do record commits that speed up
> > workloads that are user-visible, or specifically make new workloads
> > possible.  I assume that covers the items above, though I have to
> > determine this from the commit message.
> 
> It sometimes is hard to write something specific in the commit message
> about the actual performance increase.
> 
> For example, if a commit removes an O(N log2 N) algorithm and replaces
> it with an O(1), you can't say there's an X% increase in performance
> as the performance increase depends on the value of N.
> 
> Jelte did call me out for not mentioning enough detail about the
> performance in c4ab7da60, but if I claimed any % of an increase, it
> would have been specific to some workload.
> 
> What is the best way to communicate this stuff so it's easily
> identifiable when you parse the commit messages?

This is why I think we need an "Internal Performance" section where we
can be clear about simple scaling improvements that might have no
user-visible explanation.  I would suggest putting it after our "Source
code" section.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Shared detoast Datum proposal

2024-05-22 Thread Andy Fan


Nikita Malakhov  writes:

> Hi,

> Andy, glad you've not lost interest in this work, I'm looking
> forward to your improvements!

Thanks for your words, I've adjusted to the rhythm of the community and
welcome more feedback:)

-- 
Best Regards
Andy Fan





Re: Shared detoast Datum proposal

2024-05-22 Thread Andy Fan

Robert Haas  writes:

> On Tue, May 21, 2024 at 10:02 PM Andy Fan  wrote:
>> One more thing I want to highlight is that "syscache" is used for metadata
>> and the *detoast cache* is used for user data. User data is more
>> likely bigger than metadata, so cache size control (hence eviction logic
>> running more often) is more necessary in this case. The first graph in [1]
>> (including the quoted text) highlights this idea.
>
> Yes.
>
>> I think that is same idea as design a.
>
> Sounds similar.
>

This is really great to know!

>> I think the key points includes:
>>
>> 1. Where to put the *detoast value* so that the ExprEval system can find
>> it cheaply. The solution in A is slot->tts_values[*].
>>
>> 2. How to manage the memory for holding the detoast value.
>>
>> The solution in A is:
>>
>> - using a lazily created MemoryContext in the slot.
>> - let the detoast system detoast the datum into the designated
>> MemoryContext.
>>
>> 3. How to let Expr evaluation run the extra cycles when needed.
>
> I don't understand (3) but I agree that (1) and (2) are key points. In
> many - nearly all - circumstances just overwriting slot->tts_values is
> unsafe and will cause problems. To make this work, we've got to find
> some way of doing this that is compatible with general design of
> TupleTableSlot, and I think in that API, just about the only time you
> can touch slot->tts_values is when you're trying to populate a virtual
> slot. But often we'll have some other kind of slot. And even if we do
> have a virtual slot, I think you're only allowed to manipulate
> slot->tts_values up until you call ExecStoreVirtualTuple.

Please give me one more chance to explain this. What I mean is:

Take SELECT f(a) FROM t1 join t2...; for example,

When we read the Datum of a Var, we read it from tts->tts_values[*], no
matter what kind of TupleTableSlot it is.  So if we can put the "detoast
datum" back into the "original" tts_values, then nothing needs to be
changed.

Aside from efficiency considerations, safety and soundness are also
important. So what I think now is:

- tts_values[*] means the Datums from the slot, and it has its operations
  like slot_getsomeattrs, ExecMaterializeSlot, ExecCopySlot,
  ExecClearTuple and so on. None of these characteristics depend on what
  kind of slot it is.

- Saving the "detoast datum" version into tts_values[*] doesn't break
  the above semantics, and the ExprEval engine just gets a detoasted
  version, so it doesn't need to detoast it again.

- The key point is the memory management and efficiency. For now I think
  all the places where "slot->tts_nvalid" is set to 0 mean the
  tts_values[*] are no longer valid, so those are the places we should
  release the memory for the "detoast datum".  All the other places like
  ExecMaterializeSlot or ExecCopySlot also need to take the
  "detoast datum" into account.

> That's why I mentioned using something temporary. You could legally
> (a) materialize the existing slot, (b) create a new virtual slot, (c)
> copy over the tts_values and tts_isnull arrays, (d) detoast any datums
> you like from tts_values and store the new datums back into the array,
> (e) call ExecStoreVirtualTuple, and then (f) use the new slot in place
> of the original one. That would avoid repeated detoasting, but it
> would also have a bunch of overhead.

Yes, I agree about the overhead, so I prefer to first find out whether my
current strategy has a fundamental issue.

> Having to fully materialize the
> existing slot is a cost that you wouldn't always need to pay if you
> didn't do this, but you have to do it to make this work. Creating the
> new virtual slot costs something. Copying the arrays costs something.
> Detoasting costs quite a lot potentially, if you don't end up using
> the values. If you end up needing a heap tuple representation of the
> slot, you're going to now have to make a new one rather than just
> using the one that came straight from the buffer.

> But I do agree with you
> that *if* we could make it work, it would probably be better than a
> longer-lived cache.


I noticed the "if", and it is great to know that :), especially for the
efficiency and memory usage control purposes.

Another big challenge of this is how to decide if a Var needs this
pre-detoasting strategy; we have to consider:

- The in-place tts_values[*] update makes the slot bigger, which is
  harmful for the CP_SMALL_TLIST (createplan.c) purpose.
- ExprEval runtime checking of whether the Var is toastable is an
  overhead. It must be decided at ExecInitExpr time.

The code complexity caused by the above 2 factors makes people (including
me) unhappy.

===
Yesterday I did some worst-case testing of the current strategy, and
the first case showed me a ~1% *unexpected* performance regression. I
tried to track it down with perf record/report, with no luck; that's the
reason why I didn't send a complete post yesterday.

As for now, since we are talking about the high-level design, I think
it is OK to post the improved design

Re: First draft of PG 17 release notes

2024-05-22 Thread David Rowley
On Thu, 23 May 2024 at 10:04, Bruce Momjian  wrote:
> You might have seen in this thread, I do record commits that speed up
> workloads that are user-visible, or specifically make new workloads
> possible.  I assume that covers the items above, though I have to
> determine this from the commit message.

It sometimes is hard to write something specific in the commit message
about the actual performance increase.

For example, if a commit removes an O(N log2 N) algorithm and replaces
it with an O(1), you can't say there's an X% increase in performance
as the performance increase depends on the value of N.

Jelte did call me out for not mentioning enough detail about the
performance in c4ab7da60, but if I claimed any % of an increase, it
would have been specific to some workload.

What is the best way to communicate this stuff so it's easily
identifiable when you parse the commit messages?

David




Speed up JSON escape processing with SIMD plus other optimisations

2024-05-22 Thread David Rowley
Currently the escape_json() function takes a cstring and char-by-char
checks each character in the string up to the NUL and adds the escape
sequence if the character requires it.

Because this function requires a NUL terminated string, we're having
to do a little more work in some places.  For example, in
jsonb_put_escaped_value() we call pnstrdup() on the non-NUL-terminated
string to make a NUL-terminated string to pass to escape_json().

To make this faster, we can just have a version of escape_json which
takes a 'len' and stops after doing that many chars rather than
stopping when the NUL char is reached. Now there's no need to
pnstrdup() which saves some palloc()/memcpy() work.
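A length-bounded variant of this kind can be sketched as below. This is a simplified stand-in: the real function appends to a StringInfo buffer, and escape_json_with_len is a hypothetical name, not the patch's API:

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Append a JSON-escaped, double-quoted copy of 'str' (exactly 'len'
 * bytes, no NUL terminator required) into a freshly malloc'd buffer.
 * Caller frees *dst. */
static void
escape_json_with_len(char **dst, size_t *dlen, const char *str, size_t len)
{
	/* worst case: every byte expands to \u00XX (6 chars), plus quotes/NUL */
	char   *out = malloc(len * 6 + 3);
	size_t	o = 0;

	out[o++] = '"';
	for (size_t i = 0; i < len; i++)
	{
		unsigned char c = (unsigned char) str[i];

		switch (c)
		{
			case '"':  out[o++] = '\\'; out[o++] = '"'; break;
			case '\\': out[o++] = '\\'; out[o++] = '\\'; break;
			case '\b': out[o++] = '\\'; out[o++] = 'b'; break;
			case '\f': out[o++] = '\\'; out[o++] = 'f'; break;
			case '\n': out[o++] = '\\'; out[o++] = 'n'; break;
			case '\r': out[o++] = '\\'; out[o++] = 'r'; break;
			case '\t': out[o++] = '\\'; out[o++] = 't'; break;
			default:
				if (c < 0x20)		/* other control chars -> \u00XX */
					o += sprintf(out + o, "\\u%04x", (unsigned) c);
				else
					out[o++] = (char) c;
		}
	}
	out[o++] = '"';
	out[o] = '\0';
	*dst = out;
	*dlen = o;
}
```

Because it stops at 'len' rather than at a NUL, a caller holding a non-NUL-terminated payload can pass it directly, with no pnstrdup() round trip.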

There are also a few places where we do escape_json() with a "text"
typed Datum where we go and convert the text to a NUL-terminated
cstring so we can pass that along to escape_json().  That's wasteful as
we could just pass the payload of the text Datum directly, and only
allocate memory if the text Datum needs to be de-toasted.  That saves
a useless palloc/memcpy/pfree cycle.

Now, to make this more interesting, since we have a version of
escape_json which takes a 'len', we could start looking at more than 1
character at a time. If you look closely add escape_json() all the
special chars apart from " and \ are below the space character.
pg_lfind8() and pg_lfind8_le() allow processing of 16 bytes at a time,
so we only need to search the 16 bytes 3 times to ensure that no
special chars exist within. When that test fails, just go into
byte-at-a-time processing first copying over the portion of the string
that passed the vector test up until that point.
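The chunk-skipping loop described above can be sketched in plain C. The scalar chunk_needs_escape() below merely stands in for the pg_lfind8()/pg_lfind8_le() vector searches; everything else (names, structure) is illustrative rather than the patch's code:

```c
#include <stdbool.h>
#include <stddef.h>

/* Scalar stand-in for the vector helpers: true iff this 16-byte chunk
 * contains a character that needs JSON escaping (control char, double
 * quote, or backslash). */
static bool
chunk_needs_escape(const unsigned char *p)
{
	for (int i = 0; i < 16; i++)
	{
		if (p[i] <= 0x1F || p[i] == '"' || p[i] == '\\')
			return true;
	}
	return false;
}

/* Advance over whole clean 16-byte chunks that can be copied verbatim,
 * returning the number of bytes fast-pathed; the caller falls back to
 * byte-at-a-time escaping from that offset (short tails included). */
static size_t
fast_path_prefix(const char *str, size_t len)
{
	size_t	i = 0;

	while (i + 16 <= len &&
		   !chunk_needs_escape((const unsigned char *) str + i))
		i += 16;
	return i;
}
```

For typical JSON payloads (long runs with no special characters) the loop skips 16 bytes per test, which is where the reported speedup comes from.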

I've attached 2 patches:

0001 does everything I've described aside from SIMD.
0002 does SIMD

I've not personally done too much work in the area of JSON, so I don't
have any canned workloads to throw at this.  I did try the following:

create table j1 (very_long_column_name_to_test_json_escape text);
insert into j1 select repeat('x', x) from generate_series(0,1024)x;
vacuum freeze j1;

bench.sql:
select row_to_json(j1)::jsonb from j1;

Master:
$ pgbench -n -f bench.sql -T 10 -M prepared postgres | grep tps
tps = 362.494309 (without initial connection time)
tps = 363.182458 (without initial connection time)
tps = 362.679654 (without initial connection time)

Master + 0001 + 0002
$ pgbench -n -f bench.sql -T 10 -M prepared postgres | grep tps
tps = 426.456885 (without initial connection time)
tps = 430.573046 (without initial connection time)
tps = 431.142917 (without initial connection time)

About 18% faster.

It would be much faster if we could also get rid of the
escape_json_cstring() call in the switch default case of
datum_to_json_internal(). row_to_json() would be heaps faster with
that done. I considered adding a special case for the "text" type
there, but in the end felt that we should just fix that with some
hypothetical other patch that changes how output functions work.
Others may feel it's worthwhile. I certainly could be convinced of it.

I did add a new regression test. I'm not sure I'd want to keep that,
but felt it's worth leaving in there for now.

Other things I considered were if doing 16 bytes at a time is too much
as it puts quite a bit of work into byte-at-a-time processing if just
1 special char exists in a 16-byte chunk. I considered doing SWAR [1]
processing to do the job of vector8_has_le() and vector8_has(), but
maybe with just uint32s.  It might be worth doing that. However, I've
not done it yet as it raises the bar for this patch quite a bit.  SWAR
vector processing is pretty much write-only code. Imagine trying to
write comments for the code in [2] so that the average person could
understand what's going on!?
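
For reference, such a SWAR variant could be sketched with 64-bit words
using the classic has-zero-byte trick. None of this is from the attached
patches, and the helper names are made up:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define ONES	0x0101010101010101ULL
#define HIGHS	0x8080808080808080ULL

/* Nonzero if any byte of v is zero. */
static inline uint64_t
swar_has_zero(uint64_t v)
{
	return (v - ONES) & ~v & HIGHS;
}

/* Nonzero if any byte of v equals b. */
static inline uint64_t
swar_has_byte(uint64_t v, unsigned char b)
{
	return swar_has_zero(v ^ (ONES * b));
}

/* Nonzero if any byte of v is below n (valid for n <= 128). */
static inline uint64_t
swar_has_byte_below(uint64_t v, unsigned char n)
{
	return (v - ONES * n) & ~v & HIGHS;
}

/* Does this 8-byte chunk contain a char needing JSON escaping? */
static bool
word_needs_json_escape(const char *p)
{
	uint64_t	v;

	memcpy(&v, p, sizeof(v));
	return swar_has_byte(v, '"') ||
		swar_has_byte(v, '\\') ||
		swar_has_byte_below(v, 0x20);
}
```

The upside is no dependency on SIMD support; the downside is exactly the
write-only-code problem described above -- the bit tricks are hard to
comment intelligibly.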

I'd be happy to hear from anyone that can throw these patches at a
real-world JSON workload to see if it runs more quickly.

Parking for July CF.

David

[1] https://en.wikipedia.org/wiki/SWAR
[2] https://dotat.at/@/2022-06-27-tolower-swar.html


v1-0001-Add-len-parameter-to-escape_json.patch
Description: Binary data


v1-0002-Use-SIMD-processing-for-escape_json.patch
Description: Binary data


Re: Avoid possible dereference null pointer (src/backend/catalog/pg_depend.c)

2024-05-22 Thread Michael Paquier
On Wed, May 22, 2024 at 03:28:48PM -0300, Ranier Vilela wrote:
> 1. Another concern is the function *get_partition_ancestors*,
> which may return NIL, which may affect *llast_oid*, which does not handle
> NIL entries.

Hm?  We already know in the code path that the relation we are dealing
with when calling get_partition_ancestors() *is* a partition thanks to
the check on relispartition, no?  In this case, calling
get_partition_ancestors() is valid and there should be a top-most
parent in any case all the time.  So I don't get the point of checking
get_partition_ancestors() for NIL-ness just for the sake of assuming
that it would be possible.

> 2. Is checking *relispartition* enough?
> There is a function *check_rel_can_be_partition*
> (src/backend/utils/adt/partitionfuncs.c),
> which performs a much more robust check, would it be worth using it?
> 
> With the v2 attached, 1 is handled, but, in this case,
> will it be the most correct?

Saying that, your point that the result of SearchSysCacheAttName is not
checked for being a valid tuple is right.  We paint errors in these
cases even if they should not happen as that's useful when it comes to
debugging, at least.
--
Michael


signature.asc
Description: PGP signature


Inval reliability, especially for inplace updates

2024-05-22 Thread Noah Misch
https://postgr.es/m/20240512232923.aa.nmi...@google.com wrote:
> Separable, nontrivial things not fixed in the attached patch stack:
> 
> - Inplace update uses transactional CacheInvalidateHeapTuple().  ROLLBACK of
>   CREATE INDEX wrongly discards the inval, leading to the relhasindex=t loss
>   still seen in inplace-inval.spec.  CacheInvalidateRelmap() does this right.

I plan to fix that like CacheInvalidateRelmap(): send the inval immediately,
inside the critical section.  Send it in heap_xlog_inplace(), too.  The
interesting decision is how to handle RelationCacheInitFilePreInvalidate(),
which has an unlink_initfile() that can fail with e.g. EIO.  Options:

1. Unlink during critical section, and accept that EIO becomes PANIC.  Replay
   may reach the same EIO, and the system won't reopen to connections until
   the storage starts cooperating.  Interaction with checkpoints is not ideal.
   If we checkpoint and then crash between inplace XLogInsert() and inval,
   we'd be relying on StartupXLOG() -> RelationCacheInitFileRemove().  That
   uses elevel==LOG, so replay would neglect to PANIC on EIO.

2. Unlink before critical section, so normal xact abort suffices.  This would
   hold RelCacheInitLock and a buffer content lock at the same time.  In
   RecordTransactionCommit(), it would hold RelCacheInitLock and e.g. slru
   locks at the same time.

The PANIC risk of (1) seems similar to the risk of PANIC at
RecordTransactionCommit() -> XLogFlush(), which hasn't been a problem.  The
checkpoint-related risk bothers me more, and (1) generally makes it harder to
reason about checkpoint interactions.  The lock order risk of (2) feels
tolerable.  I'm leaning toward (2), but that might change.  Other preferences?

Another decision is what to do about LogLogicalInvalidations().  Currently,
inplace update invalidations do reach WAL via LogLogicalInvalidations() at the
next CCI.  Options:

a. Within logical decoding, cease processing invalidations for inplace
   updates.  Inplace updates don't affect storage or interpretation of table
   rows, so they don't affect logicalrep_write_tuple() outcomes.  If they did,
   invalidations wouldn't make it work.  Decoding has no way to retrieve a
   snapshot-appropriate version of the inplace-updated value.

b. Make heap_decode() of XLOG_HEAP_INPLACE recreate the invalidation.  This
   would be, essentially, cheap insurance against invalidations having a
   benefit I missed in (a).

I plan to pick (a).

> - AtEOXact_Inval(true) is outside the RecordTransactionCommit() critical
>   section, but it is critical.  We must not commit transactional DDL without
>   other backends receiving an inval.  (When the inplace inval becomes
>   nontransactional, it will face the same threat.)

This faces the same RelationCacheInitFilePreInvalidate() decision, and I think
the conclusion should be the same as for inplace update.

Thanks,
nm




Re: processes stuck in shutdown following OOM/recovery

2024-05-22 Thread Martijn Wallet
Last test to have a verified mail; added lists.postgresql.org to the SPF record. 
Cheers.

Re: PostgreSQL 17 Beta 1 release announcement draft

2024-05-22 Thread Jonathan S. Katz

On 5/19/24 5:34 PM, Jonathan S. Katz wrote:

On 5/15/24 9:45 PM, Jonathan S. Katz wrote:

Hi,

Attached is a copy of the PostgreSQL 17 Beta 1 release announcement 
draft. This contains a user-facing summary of some of the features 
that will be available in the Beta, as well as a call to test. I've 
made an effort to group them logically around the different workflows 
they affect.


A few notes:

* The section with the features is not 80-char delimited. I will do 
that before the final copy


* There is an explicit callout that we've added in the SQL/JSON 
features that were previously reverted in PG15. I want to ensure we're 
transparent about that, but also use it as a hook to get people testing.


When reviewing:

* Please check for correctness of feature descriptions, keeping in 
mind this is targeted for a general audience


* Please indicate if you believe there's a notable omission, or if we 
should omit any of these callouts


* Please indicate if a description is confusing - I'm happy to rewrite 
to ensure it's clearer.


Please provide feedback no later than Wed 2024-05-22 18:00 UTC. As the 
beta release takes some extra effort, I want to ensure all changes are 
in with time to spare before release day.


Thanks for all the feedback to date. Please see the next revision. 
Again, please provide feedback no later than 2024-05-22 18:00 UTC.


Thanks again everyone for all your feedback. Attached is the final(-ish, 
as I'll do one more readthrough before release) draft of the release 
announcement.


If you catch something and are able to post it prior to 2024-05-23 12:00 
UTC, I may be able to incorporate into the announcement.


Thanks!

Jonathan

The PostgreSQL Global Development Group announces that the first beta release of
PostgreSQL 17 is now [available for 
download](https://www.postgresql.org/download/).
This release contains previews of all features that will be available when
PostgreSQL 17 is made generally available, though some details of the release
can change during the beta period.

You can find information about all of the PostgreSQL 17 features and changes in
the [release notes](https://www.postgresql.org/docs/17/release-17.html):

  
[https://www.postgresql.org/docs/17/release-17.html](https://www.postgresql.org/docs/17/release-17.html)

In the spirit of the open source PostgreSQL community, we strongly encourage you
to test the new features of PostgreSQL 17 on your systems to help us eliminate
bugs or other issues that may exist. While we do not advise you to run
PostgreSQL 17 Beta 1 in production environments, we encourage you to find ways
to run your typical application workloads against this beta release.

Your testing and feedback will help the community ensure that the PostgreSQL 17
release upholds our standards of delivering a stable, reliable release of the
world's most advanced open source relational database. Please read more about
our [beta testing process](https://www.postgresql.org/developer/beta/) and how
you can contribute:

  
[https://www.postgresql.org/developer/beta/](https://www.postgresql.org/developer/beta/)

PostgreSQL 17 Feature Highlights


### Query and Operational Performance Improvements

PostgreSQL 17 builds on recent releases and continues to improve performance 
across the entire system. 
[Vacuum](https://www.postgresql.org/docs/17/routine-vacuuming.html), the 
PostgreSQL process responsible for reclaiming storage, has a new internal data 
structure that has shown up to a 20x memory reduction for vacuum, along with 
improvements in overall time to complete its work. Additionally, the vacuum 
process no longer has a `1GB` limit on the memory it can use (controlled by 
[`maintenance_work_mem`](https://www.postgresql.org/docs/17/runtime-config-resource.html#GUC-MAINTENANCE-WORK-MEM)),
 giving you the option to apply more resources to vacuuming.

This release introduces an interface to stream I/O, and can show performance 
improvements when performing sequential scans and running 
[`ANALYZE`](https://www.postgresql.org/docs/17/sql-analyze.html). PostgreSQL 17 
also includes configuration parameters that can control scalability of 
[transaction, subtransaction and multixact 
buffers](https://www.postgresql.org/docs/17/runtime-config-resource.html#GUC-MULTIXACT-MEMBER-BUFFERS).

PostgreSQL 17 can now use both planner statistics and the sort order of [common 
table expressions](https://www.postgresql.org/docs/17/queries-with.html) (aka 
[`WITH` queries](https://www.postgresql.org/docs/17/queries-with.html)) to 
further optimize these queries and help them to execute more quickly. 
Additionally, this release significantly improves execution time of queries 
that use the `IN` clause with a [B-tree 
index](https://www.postgresql.org/docs/17/indexes-types.html#INDEXES-TYPES-BTREE).
 Starting with this release, PostgreSQL can remove redundant `IS NOT NULL` 
statements from execution on columns that have a `NOT NULL` constraint, 

Re: processes stuck in shutdown following OOM/recovery

2024-05-22 Thread Martijn Wallet
Great, thanks for the feedback. It was probably the DKIM.

Re: PostgreSQL 17 Beta 1 release announcement draft

2024-05-22 Thread Jonathan S. Katz

On 5/21/24 6:40 AM, John Naylor wrote:

On Mon, May 20, 2024 at 8:41 PM Masahiko Sawada  wrote:


On Mon, May 20, 2024 at 8:47 PM Jonathan S. Katz  wrote:


On 5/20/24 2:58 AM, John Naylor wrote:

Hi Jon,

Regarding vacuum "has shown up to a 6x improvement in overall time to
complete its work" -- I believe I've seen reported numbers close to
that only 1) when measuring the index phase in isolation or maybe 2)
the entire vacuum of unlogged tables with one, perfectly-correlated
index (testing has less variance with WAL out of the picture). I
believe tables with many indexes would show a lot of improvement, but
I'm not aware of testing that case specifically. Can you clarify where
6x came from?


Sawada-san showed me the original context, but I can't rapidly find it
in the thread. Sawada-san, can you please share the numbers behind this?



I referenced the numbers that I measured during the development[1]
(test scripts are here[2]). IIRC I used unlogged tables and indexes,
and these numbers were the entire vacuum execution time including heap
scanning, index vacuuming and heap vacuuming.


Thanks for confirming.

The wording "has a new internal data structure that reduces memory
usage and has shown up to a 6x improvement in overall time to complete
its work" is specific for runtime, and the memory use is less
specific. Unlogged tables are not the norm, so I'd be cautious of
reporting numbers specifically designed (for testing) to isolate the
thing that changed.

I'm wondering if it might be both more impressive-sounding and more
realistic for the average user experience to reverse that: specific on
memory, and less specific on speed. The best-case memory reduction
occurs for table update patterns that are highly localized, such as
the most recently inserted records, and I'd say those are a lot more
common than the use of unlogged tables.

Maybe something like "has a new internal data structure that reduces
overall time to complete its work and can use up to 20x less memory."

Now, it is true that when dead tuples are sparse and evenly spaced
(e.g. 1 every 100 pages), vacuum can now actually use more memory than
v16. However, the nature of that scenario also means that the number
of TIDs just can't get very big to begin with. In contrast, while the
runtime improvement for normal (logged) tables is likely not
earth-shattering, I believe it will always be at least somewhat
faster, and never slower.


Thanks for the feedback. I flipped it around, per your suggestion:

"has a new internal data structure that has shown up to a 20x memory 
reduction for vacuum, along with improvements in overall time to 
complete its work."


Thanks,

Jonathan


OpenPGP_signature.asc
Description: OpenPGP digital signature


Re: PostgreSQL 17 Beta 1 release announcement draft

2024-05-22 Thread Jonathan S. Katz

On 5/20/24 6:08 AM, Alvaro Herrera wrote:

On 2024-May-19, Jonathan S. Katz wrote:


### Query and Operational Performance Improvements


In this section I'd add mention the new GUCs to control SLRU memory
size, which is going to be a huge performance boon for cases where the
current fixed-size buffers cause bottlenecks.  Perhaps something like

"Increase scalability of transaction, subtransaction and multixact
shared memory buffer handling, and make their buffer sizes configurable".

I don't know if we have any published numbers of the performance
improvement achieved, but with this patch (or ancestor ones) some
systems go from completely non-operational to working perfectly fine.
Maybe the best link is here
https://www.postgresql.org/docs/devel/runtime-config-resource.html#GUC-MULTIXACT-MEMBER-BUFFERS
though exactly which GUC affects any particular user is workload-
dependant, so I'm not sure how best to do it.


I originally had penciled in this change, but didn't have a good way of 
describing it. The above solves that problem. I went with:


"PostgreSQL 17 also includes configuration parameters that can control 
scalability of [transaction, subtransaction and multixact 
buffers](https://www.postgresql.org/docs/devel/runtime-config-resource.html#GUC-MULTIXACT-MEMBER-BUFFERS)."



### Developer Experience


I think this section should also include the libpq query cancellation
improvements Jelte wrote.  Maybe something like "On the client side,
PostgreSQL 17 provides better support for asynchronous and more secure
query cancellation routines in libpq."  --> link to
https://www.postgresql.org/docs/17/libpq-cancel.html


I went with:

PostgreSQL 17 also provides better support for [asynchronous and more 
secure query cancellation 
routines](https://www.postgresql.org/docs/17/libpq-cancel.html), which 
drivers can adopt using the libpq API.


Thanks,

Jonathan





Re: PostgreSQL 17 Beta 1 release announcement draft

2024-05-22 Thread Jonathan S. Katz

On 5/19/24 6:15 PM, Erik Rijkers wrote:

Op 5/19/24 om 23:34 schreef Jonathan S. Katz:

On 5/15/24 9:45 PM, Jonathan S. Katz wrote:

Hi,

Attached is a copy of the PostgreSQL 17 Beta 1 release announcement 


'This release introduces adds an interface'  should be:
'This release adds an interface'
    (or 'introduces'; just not both...)


Thanks; adjusted in the next copy.

Jonathan





Re: PostgreSQL 17 Beta 1 release announcement draft

2024-05-22 Thread Jonathan S. Katz

On 5/20/24 5:34 AM, Bertrand Drouvot wrote:

Hi,

On Sun, May 19, 2024 at 05:10:10PM -0400, Jonathan S. Katz wrote:

On 5/16/24 1:15 AM, Bertrand Drouvot wrote:

Hi,

On Wed, May 15, 2024 at 09:45:35PM -0400, Jonathan S. Katz wrote:

Hi,

Attached is a copy of the PostgreSQL 17 Beta 1 release announcement draft.


Thanks for working on it!

I've one comment:


PostgreSQL 17 also introduces a new view, 
[`pg_wait_events`](https://www.postgresql.org/docs/17/view-pg-wait-events.html),
 which provides descriptions about wait events and can be combined with 
`pg_stat_activity` to give more insight into an operation.


Instead of "to give more insight into an operation", what about "to give more
insight about what a session is waiting for (should it be active)"?


I put:

"to give more insight into why a session is blocked."


Thanks!



Does that work?



I think using "waiting" is better (as the view is "pg_wait_events" and the
join with pg_stat_activity would be on the "wait_event_type" and "wait_event"
columns).

The reason I mentioned "should it be active" is because wait_event and
wait_event_type could be non-empty in pg_stat_activity while the session
is no longer in an active state (and thus not waiting).

A right query would be like the one in [1]:

"
SELECT a.pid, a.wait_event, w.description
   FROM pg_stat_activity a JOIN
pg_wait_events w ON (a.wait_event_type = w.type AND
 a.wait_event = w.name)
   WHERE a.wait_event is NOT NULL and a.state = 'active';
"

means filtering on the "active" state too, and that's what the description
proposal I made was trying to highlight.


Thanks. As such I made it:

"which provides descriptions about wait events and can be combined with 
`pg_stat_activity` to give more insight into why an active session is 
waiting."


Jonathan




Re: First draft of PG 17 release notes

2024-05-22 Thread Bruce Momjian
On Wed, May 22, 2024 at 09:25:41PM +0900, torikoshia wrote:
> Thanks for working on this as always.
> 
> 
> Allow pg_stat_reset_shared("slru") to clear SLRU statistics (Atsushi
> Torikoshi)
> 
> 
> Considering someone may copy and paste this, 'slru' is better than "slru",
> isn't it?
> I also found an older release note[1] used single quotes for this like:
> 
>   Add pg_stat_reset_shared('bgwriter') to reset the cluster-wide shared
> statistics for the background writer (Greg Smith)

Agreed, patch applied, thanks.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: First draft of PG 17 release notes

2024-05-22 Thread Bruce Momjian
On Wed, May 22, 2024 at 11:29:06AM +0900, Masahiko Sawada wrote:
> I found a typo:
> 
> s/pg_statstatement/pg_stat_statement/
> 
> I've attached a patch to fix it.

Agreed, applied, thanks.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: First draft of PG 17 release notes

2024-05-22 Thread Bruce Momjian
On Tue, May 21, 2024 at 11:20:02AM +0900, Amit Langote wrote:
> Thanks Bruce for working on this as always.
> 
> Failed to notice when I read the notes before:
> 
> 
> 
> Add SQL/JSON constructor functions JSON(), JSON_SCALAR(), and
> JSON_SERIALIZE() (Amit Langote)
> 
> 
> 
> Should be:
> 
> 
> 
> Add SQL/JSON constructor functions JSON(), JSON_SCALAR(), and
> JSON_SERIALIZE() (Nikita Glukhov, Teodor Sigaev, Oleg Bartunov,
> Alexander Korotkov, Andrew Dunstan, Amit Langote)
> 
> 

Thanks, applied.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: First draft of PG 17 release notes

2024-05-22 Thread Bruce Momjian
On Mon, May 20, 2024 at 11:48:09AM -0700, Jeff Davis wrote:
> On Sat, 2024-05-18 at 17:51 -0400, Bruce Momjian wrote:
> > Okay, I went with the attached applied patch.  Adjustments?
> 
> I think it should have more emphasis on the actual new feature: a
> platform-independent builtin collation provider and the C.UTF-8 locale.
> 
> The user can look at the documentation for comparison with libc.

Okay, changed with the attached applied patch.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.
diff --git a/doc/src/sgml/release-17.sgml b/doc/src/sgml/release-17.sgml
index 3025f31f8a0..e381dbe5d96 100644
--- a/doc/src/sgml/release-17.sgml
+++ b/doc/src/sgml/release-17.sgml
@@ -885,11 +885,11 @@ Author: Jeff Davis 
 
 
 
-Create "builtin" collation provider similar to libc's C locale (Jeff Davis)
+Add a builtin platform-independent collation provider (Jeff Davis)
 
 
 
-While its C locale is similar but independent of libc, its C.UTF-8 locale sorts by Unicode code points and has Unicode-based case folding.
+This supports "C" and "C.UTF-8" collations.
 
 
 


Re: First draft of PG 17 release notes

2024-05-22 Thread Bruce Momjian
On Tue, May 21, 2024 at 09:40:28AM -0700, Andres Freund wrote:
> Hi,
> 
> On 2024-05-18 11:13:54 -0400, Bruce Momjian wrote:
> > Please see the email I just posted.  There are three goals we have to
> > adjust for:
> > 
> > 1.  short release notes so they are readable
> > 2.  giving people credit for performance improvements
> > 3.  showing people Postgres cares about performance
> > 
> > I would like to achieve 2 & 3 without harming #1.  My experience is if I
> > am reading a long document, and I get to a section where I start to
> > wonder, "Why should I care about this?", I start to skim the rest of
> > the document.
> 
> I agree keeping things reasonably short is important. But I don't think you're
> evenly applying it as a goal.
> 
> Just skimming the notes from the end, I see
> - an 8 entries long pg_stat_statements section

What item did you want to remove?  Those are all user-visible changes.

> - multiple entries about "Create custom wait events for ..."

Well, those are all in different sections, so how can they be merged,
unless I create a "wait event section", I guess.

> - three entries about adding --all to {reindexdb,vacuumdb,clusterdb}.

The problem with merging these is that the "Specifically, --all can now
be used with" is different for all three of them.

> - an entry about adding long options to pg_archivecleanup

Well, that is a user-visible change.  Should it not be listed?

> - two entries about grantable maintenance rights, once via pg_maintain, once
>   per-table

Well, one is a GRANT and another is a role, so merging them seemed like
it would be too confusing.

> - separate entries about pg_stat_reset_slru(), pg_stat_reset_shared("slru"),

They are different functions with different detail text.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: processes stuck in shutdown following OOM/recovery

2024-05-22 Thread Thomas Munro
On Thu, May 23, 2024 at 9:58 AM Martijn Wallet  wrote:
> The following review has been posted through the commitfest application:
> make installcheck-world:  not tested
> Implements feature:   not tested
> Spec compliant:   not tested
> Documentation:not tested
>
> Hi, I somehow fail to be able to mark all checkboxes on this review page...
> However, build and tested with all passed successfully on Rocky Linux release 
> 8.9 (Green Obsidian).
> Not sure of more reviewing is needed on other Operating Systems since this is 
> only my second review.

Thanks!

I'm also hoping to get review of the rather finickity state machine
logic involved from people familiar with that; I think it's right, but
I'd hate to break some other edge case...

> nb: second mail to see spf is fixed and Thomas receives this message.

FTR 171641337152.1103.7326466732639994038.p...@coridan.postgresql.org
and 171641505305.1105.9868637944637520353.p...@coridan.postgresql.org
both showed up in my inbox, and they both have headers "Received-SPF:
pass ...".




Re: First draft of PG 17 release notes

2024-05-22 Thread Bruce Momjian
On Tue, May 21, 2024 at 02:26:15PM -0400, Melanie Plageman wrote:
> In Postgres development, we break larger projects into smaller ones
> and then those smaller projects into multiple individual commits. Each
> commit needs to stand alone and each subproject needs to have a
> defensible benefit. One thing that is harder with performance-related
> work than non-performance feature work is that there isn't always a
> final "turn it on" commit. For example, let's say you are adding a new
> view that tracks new stats of some kind. You do a bunch of refactoring
> and small subprojects to make it possible to add the view. Then the
> final commit that actually creates the view has obvious user value to
> whoever is reading the log. For performance features, it doesn't
> always work like this.
> 
> For the vacuum WAL volume reduction, there were a bunch of smaller
> projects throughout the last development year that I worked on that
> were committed by different people and with different individual
> benefits. Some changes caused vacuum to do less visibility checks (so
> less CPU usage), some changed WAL format in a way that saves some
> space, and some, like the commit you mention, make vacuum emit less
> WAL. That commit by itself doesn't contain all of the user benefits of
> the whole project. I couldn't think of a good place to list all of the
> commits together that were part of the same project. Perhaps you could
> argue that they were not in fact part of the same project and instead
> were just small individual changes -- none of which are individually
> worth including in the release notes.

I try to group them, but I am sure imperfectly.  It is very true that
infrastructure changes that enable later commits are often missed.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: First draft of PG 17 release notes

2024-05-22 Thread Bruce Momjian
On Tue, May 21, 2024 at 01:50:58PM -0400, Robert Haas wrote:
> On Tue, May 21, 2024 at 12:27 PM Andres Freund  wrote:
> > To me that's the "General Performance" section. If somebody reading the
> > release notes doesn't care about performance, they can just skip that 
> > section
> > ([1]).  I don't see why we wouldn't want to include the same level of detail
> > as for other changes.
> 
> I'm relatively sure that we've had this argument in previous years and
> essentially everyone but Bruce has agreed with the idea that
> performance changes ought to be treated the same as any other kind of
> improvement. The difficulty is that Bruce is the one doing the release
> notes. I think it might help if someone were willing to prepare a
> patch showing what they think specifically should be changed. Or maybe
> Bruce would be willing to provide a list of all of the performance
> improvements he doesn't think are worth release-noting or isn't sure
> how to release-note, and someone else can then have a go at them.

Well, developers do ask why their individual commits were not listed,
and I repeat the same thing, as I have done in this thread multiple
times.  You can probably look over this thread to find them, or in
previous years.

To be clear, it is performance improvements that don't have user-visible
changes or enable new workloads that I skip listing.  I will also note
that I don't remember any user ever finding a performance boost, and then
coming to us and asking why we didn't list it --- this release note
review process seems to add all of those.

Maybe adding a section called "Internal Performance".  Here is our
General Performance contents:

E.1.3.1.3. General Performance

Allow vacuum to more efficiently remove and freeze tuples (Melanie
Plageman)

WAL traffic caused by vacuum is also more compact.

Allow vacuum to more efficiently store tuple references (Masahiko
Sawada, John Naylor)

Additionally, vacuum is no longer silently limited to one gigabyte
of memory when maintenance_work_mem or autovacuum_work_mem are higher.

Optimize vacuuming of relations with no indexes (Melanie Plageman)

Increase default vacuum_buffer_usage_limit to 2MB (Thomas Munro)

Improve performance when checking roles with many memberships
(Nathan Bossart)

Improve performance of heavily-contended WAL writes (Bharath
Rupireddy)

Improve performance when transferring large blocks of data to a
client (Melih Mutlu)

Allow the grouping of file system reads with the new system variable
io_combine_limit (Thomas Munro, Andres Freund, Melanie Plageman, Nazir
Bilal Yavuz)

Do we think more user-invisible changes should be in there?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: First draft of PG 17 release notes

2024-05-22 Thread Bruce Momjian
On Tue, May 21, 2024 at 09:51:09AM -0700, Andres Freund wrote:
> Hi,
> 
> On 2024-05-21 09:27:20 -0700, Andres Freund wrote:
> > Also, the release notes are also not just important to users. I often go 
> > back
> > and look in the release notes to see when some important change was 
> > made,
> > because sometimes it's harder to find it in the git log, due to sheer
> > volume. And even just keeping up with all the changes between two releases 
> > is
> > hard, it's useful to be able to read the release notes and see what 
> > happened.
> >
> > [...]
> >
> > [1] I've wondered if we should have one more level of TOC on the release 
> > note
> > page, so it's easier to jump to specific sections.
> 
> Which reminds me: Eventually I'd like to add links to the most important
> commits related to release note entries. We already do much of the work of
> building that list of commits for each entry. That'd allow a reader to find
> more details if interested.

Yes, it would be cool if they could mouse-over a graphic next to each
release note item to get a popup to the commits.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: First draft of PG 17 release notes

2024-05-22 Thread Bruce Momjian
On Tue, May 21, 2024 at 09:27:20AM -0700, Andres Freund wrote:
> On 2024-05-18 10:59:47 -0400, Bruce Momjian wrote:
> > I agree the impact of performance improvements are often greater than
> > the average release note item.  However, if people expect Postgres to be
> > faster, is it important for them to know _why_ it is faster?
> 
> Yes, it very often is. Performance improvements typically aren't "make
> everything 3% faster", they're more "make this special thing 20%
> faster". Without know what got faster, users don't know if
> a) the upgrade will improve their production situation
> b) they need to change something to take advantage of the improvement

You might have seen in this thread, I do record commits that speed up
workloads that are user-visible, or specifically make new workloads
possible.  I assume that covers the items above, though I have to
determine this from the commit message.

> > On the flip side, a performance improvement that makes everything 10%
> > faster has little behavioral change for users, and in fact I think we
> > get ~6% faster in every major release.
> 
> I cannot recall many "make everything faster" improvements, if any.
> 
> And even if it's "make everything faster" - that's useful for users to know,
> it might solve their production problem!  It's also good for PR.

Again, it is down to having three goals for the release notes, and #1 is
having it readable/short, and 2 & 3 are for PR and acknowledging authors.

> Also, the release notes are not just important to users. I often go back
> and look in the release notes to see when some important change was made,
> because sometimes it's harder to find it in the git log, due to sheer
> volume. And even just keeping up with all the changes between two releases is
> hard; it's useful to be able to read the release notes and see what happened.

Well, I would say we need some _other_ way to record and perhaps
advertise such changes.

> > > For another, it's also very frustrating for developers that focus on
> > > performance. The reticence to note their work, while noting other, far
> > > smaller, things in the release notes, pretty much tells us that our
> > > work isn't valued.
> >
> > Yes, but are we willing to add text that every user will have to read
> > just for this purpose?
> 
> Of course it's a tradeoff. We shouldn't verbosely note down every small
> change just because of the recognition; that'd make the release notes
> unreadable. And it'd just duplicate the commit log. But that's not the same as
> defaulting to not noting performance improvements, even if the performance
> improvement is more impactful than many other features that are noted.

Again, see above, I do mention performance improvements, but they have
to be user-visible or enable new workloads.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: First draft of PG 17 release notes

2024-05-22 Thread Bruce Momjian
On Mon, May 20, 2024 at 02:47:28PM -0400, Melanie Plageman wrote:
> On Mon, May 20, 2024 at 9:37 AM Bruce Momjian  wrote:
> >
> > On Mon, May 20, 2024 at 01:23:02PM +0700, John Naylor wrote:
> > > Hi Bruce, thanks for doing this again!
> > >
> > > I'm a bit late to this discussion -- there's been a bit of churn in
> > > the vacuum items, and some streams got crossed along the way. I've
> > > attached an attempt to rectify this.
> >
> > Agreed, patch applied, thanks.
> 
> If "Allow vacuum to more efficiently remove and freeze tuples" stays
> in the release notes, I would add Heikki's name. He wasn't listed as a
> co-author on all of the commits that were part of this feature, but he
> made a substantial investment in it and should be listed, IMO. Patch
> attached.

Thanks, patch applied.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: processes stuck in shutdown following OOM/recovery

2024-05-22 Thread Martijn Wallet
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:   not tested
Spec compliant:   not tested
Documentation:not tested

Hi, I somehow fail to be able to mark all checkboxes on this review page...
However, it built and all tests passed successfully on Rocky Linux release
8.9 (Green Obsidian).
Not sure if more reviewing is needed on other operating systems, since this
is only my second review.

Cheers, Martijn.

nb: second mail to see spf is fixed and Thomas receives this message.

Re: processes stuck in shutdown following OOM/recovery

2024-05-22 Thread Martijn Wallet
The following review has been posted through the commitfest application:
make installcheck-world:  not tested
Implements feature:   not tested
Spec compliant:   not tested
Documentation:not tested

Hi, I somehow fail to be able to mark all checkboxes on this review page...
However, it built and all tests passed successfully on Rocky Linux release
8.9 (Green Obsidian).
Not sure if more reviewing is needed on other operating systems, since this
is only my second review.

Cheers, Martijn.

The new status of this patch is: Ready for Committer


Re: Sort operation displays more tuples than it contains its subnode

2024-05-22 Thread Tom Lane
"a.rybakina"  writes:
> I faced an issue where the Sort node's actual row count in EXPLAIN shows
> a larger number of tuples than it actually has, and I cannot understand
> why.

If I'm reading this correctly, the sort node you're worrying about
feeds the inner side of a merge join.  Merge join will rewind its
inner side to the start of the current group of equal-keyed tuples
whenever it sees that the next outer tuple must also be joined to
that group.  Since what EXPLAIN is counting is the number of tuples
returned from the node, that causes it to double-count those tuples.
The more duplicate-keyed tuples on the outer side, the bigger the
effect.

You can see the same thing happening at the Materialize a little
further up, which is feeding the inside of the other merge join.

regards, tom lane




Re: Sort operation displays more tuples than it contains its subnode

2024-05-22 Thread David Rowley
On Thu, 23 May 2024 at 08:48, a.rybakina  wrote:
> ->  Sort  (cost=613.48..632.86 rows=7754 width=8) (actual time=7.612..21.214 rows=77531 loops=1)
>   Sort Key: broke_down_course.sno, broke_down_course.cno
>   Sort Method: quicksort  Memory: 374kB
>   ->  Seq Scan on broke_down_course  (cost=0.00..112.54 rows=7754 width=8) (actual time=0.016..1.366 rows=7754 loops=1)


> Maybe my reproduction looks questionable, sorry for that, but I seriously
> don't understand why we have so many tuples here in the Sort node.

This is because of the "mark and restore" that occurs because of the
Merge Join. This must happen for joins because every tuple matching
the join condition must join to every other tuple that matches the
join condition. That means, if you have 3 tuples with the same key on
either side, you get 9 rows, not 3 rows.

Here's a simple example of the behaviour you describe shrunk down so
that it's more easily understandable:

create table t(a int);
insert into t values(1),(1),(1);
set enable_hashjoin=0;
set enable_nestloop=0;
explain (analyze, costs off) select * from t inner join t t1 on t.a=t1.a;
   QUERY PLAN

 Merge Join (actual time=0.036..0.038 rows=9 loops=1)
   Merge Cond: (t.a = t1.a)
   ->  Sort (actual time=0.025..0.025 rows=3 loops=1)
 Sort Key: t.a
 Sort Method: quicksort  Memory: 25kB
 ->  Seq Scan on t (actual time=0.017..0.018 rows=3 loops=1)
   ->  Sort (actual time=0.007..0.007 rows=7 loops=1)
 Sort Key: t1.a
 Sort Method: quicksort  Memory: 25kB
 ->  Seq Scan on t t1 (actual time=0.003..0.003 rows=3 loops=1)

Note the sort has rows=7 and the Seq Scan on t1 rows=3 and an output of 9 rows.

If you look at the code in [1], you can see the restoreMark() calls to
achieve this.

David

[1] https://en.wikipedia.org/wiki/Sort-merge_join




Re: Schema variables - new implementation for Postgres 15

2024-05-22 Thread Pavel Stehule
On Wed, 22 May 2024 at 20:21,  wrote:

> Alvaro Herrera:
> > Perhaps the solution to all this is to avoid having the variables be
> > implicitly present in the range table of all queries.  Instead, if you
> > need a variable's value, then you need to add the variable to the FROM
> > clause;
>
> +1
>
> This should make it easier to work with composite type schema variables
> in some cases.  It could also enable schema qualifying of schema
> variables, or at least make it easier to do, I think.
>
> In this case variables would share the same namespace as tables and
> views, right?  So I could not create a variable with the same name as
> another table.  Which is a good thing, I guess.  Not sure how it's
> currently implemented in the patch.
>

I don't like this. Sure, it fixes the problem with collisions, but then we
can no longer talk about variables. When something is used like a table, it
should be a table. I can imagine memory tables, but that is a different
type of object: a table is a relation, while a variable is just a value.
Variables should not have columns, so using the same patterns (and the same
catalog) for tables and variables makes no sense. Variables just hold a
value, which you can then use inside a query without having to write a
JOIN. Since variables are not tables, it is not too confusing that they are
not transactional and don't support multiple rows or columns.

The problem with collision can be solved very easily - just use a dedicated
schema (only for variables) and don't use it in the search path.

In this case, an unwanted collision is not very probable - although it is
possible if you use a schema name for a variable that is the same as a
table name or an alias name.

I can use

CREATE SCHEMA __;
CREATE VARIABLE __.a AS int;

SELECT __.a;

although it is maybe wild, probably nobody will use __ as an alias or table
name, and then there should not be any problem







>
> Best,
>
> Wolfgang
>


Re: Schema variables - new implementation for Postgres 15

2024-05-22 Thread Pavel Stehule
On Wed, 22 May 2024 at 19:25, Tom Lane wrote:

> Peter Eisentraut  writes:
> > On 18.05.24 13:29, Alvaro Herrera wrote:
> >> I want to note that when we discussed this patch series at the dev
> >> meeting in FOSDEM, a sort-of conclusion was reached that we didn't want
> >> schema variables at all because of the fact that creating a variable
> >> would potentially change the meaning of queries by shadowing table
> >> columns.  But this turns out to be incorrect: it's_variables_  that are
> >> shadowed by table columns, not the other way around.
>
> > But that's still bad, because seemingly unrelated schema changes can
> > make variables appear and disappear.  For example, if you have
> >   SELECT a, b FROM table1
> > and then you drop column b, maybe the above query continues to work
> > because there is also a variable b.
>
> Yeah, that seems pretty dangerous.  Could we make it safe enough
> by requiring some qualification on variable names?  That is, if
> you mean b to be a variable, then you must write something like
>
> SELECT a, pg_variables.b FROM table1
>
> This is still ambiguous if you use "pg_variables" as a table alias in
> the query, but the alias would win so the query still means what it
> meant before.  Also, table aliases (as opposed to actual table names)
> don't change readily, so I don't think there's much risk of the query
> suddenly meaning something different than it did yesterday.
>

With the variable-shadowing warning active, you would get a warning for the
described example before dropping the column.

Session variables are tied to a schema (in my proposal), so anybody can do
just

CREATE SCHEMA svars; -- or what (s)he likes
CREATE VARIABLE svars.b AS int;

SELECT a, b FROM table1

and if somebody wants to be really safe, they can write

SELECT t.a, t.b FROM table1 t

or

SELECT t.a, svars.b FROM table1 t

It can be customized however anybody prefers - just by creating dedicated
schemas and setting search_path. Using a dedicated schema for session
variables without adding that schema to search_path forces the use of
qualified names for session variables.

Sure, the naming of schemas and aliases can be unfortunate, and then there
can be a problem. But that can be a problem today too.

Regards

Pavel


>
> regards, tom lane
>


Re: Schema variables - new implementation for Postgres 15

2024-05-22 Thread Pavel Stehule
On Wed, 22 May 2024 at 14:37, Peter Eisentraut wrote:

> On 18.05.24 13:29, Alvaro Herrera wrote:
> > I want to note that when we discussed this patch series at the dev
> > meeting in FOSDEM, a sort-of conclusion was reached that we didn't want
> > schema variables at all because of the fact that creating a variable
> > would potentially change the meaning of queries by shadowing table
> > columns.  But this turns out to be incorrect: it's_variables_  that are
> > shadowed by table columns, not the other way around.
>
> But that's still bad, because seemingly unrelated schema changes can
> make variables appear and disappear.  For example, if you have
>
> SELECT a, b FROM table1
>
> and then you drop column b, maybe the above query continues to work
> because there is also a variable b.  Or maybe it now does different
> things because b is of a different type.  This all has the potential to
> be very confusing.
>

In the described case, the variable's shadowing warning will be raised.

There are other cases where poorly designed changes (even just with tables)
can break queries or change results. Adding columns is a potential risk, as
is creating or dropping tables (when the search path contains multiple
schemas).

Good practice is to use well designed names and to use aliases or labels
almost everywhere; that is one way to minimize the real risks. Personally I
prefer a very strict mode that disallows shadowing, conflicts, etc., but on
the other hand, for ordinary work such a strict mode can be tiresome, so we
should find a good compromise.

Regards

Pavel


Re: Avoid possible dereference null pointer (src/backend/catalog/pg_depend.c)

2024-05-22 Thread Ranier Vilela
On Wed, 22 May 2024 at 13:09, Ranier Vilela wrote:

> On Wed, 22 May 2024 at 11:44, Ranier Vilela wrote:
>
>> Hi.
>>
>> Per Coverity.
>>
>> 2. returned_null: SearchSysCacheAttName returns NULL (checked 20 out of
>> 21 times).
>> 3. var_assigned: Assigning: ptup = NULL return value from
>> SearchSysCacheAttName.
>>  964ptup = SearchSysCacheAttName(relid, attname);
>> CID 1545986: (#1 of 1): Dereference null return value (NULL_RETURNS)
>> 4. dereference: Dereferencing ptup, which is known to be NULL.
>>
>> The functions SearchSysCacheAttNum and SearchSysCacheAttName,
>> need to have the result checked.
>>
>> The commit 5091995 left an oversight.
>>
>> Fixed by the attached patch; a change of style, unfortunately, was
>> necessary.
>>
> v1 Attached, fix wrong column variable name in error report.
>
1. Another concern is the function *get_partition_ancestors*,
which may return NIL, which may affect *llast_oid*, which does not handle
NIL entries.

2. Is checking *relispartition* enough?
There is a function *check_rel_can_be_partition*
(src/backend/utils/adt/partitionfuncs.c)
that performs a much more robust check; would it be worth using it?

With the attached v2, item 1 is handled; but, in this case, is it the most
correct approach?

best regards,
Ranier Vilela


v2-fix-catalog-cache-search-check.patch
Description: Binary data


Re: Schema variables - new implementation for Postgres 15

2024-05-22 Thread walther

Alvaro Herrera:

Perhaps the solution to all this is to avoid having the variables be
implicitly present in the range table of all queries.  Instead, if you
need a variable's value, then you need to add the variable to the FROM
clause;


+1

This should make it easier to work with composite type schema variables 
in some cases.  It could also enable schema qualifying of schema 
variables, or at least make it easier to do, I think.


In this case variables would share the same namespace as tables and 
views, right?  So I could not create a variable with the same name as 
another table.  Which is a good thing, I guess.  Not sure how it's 
currently implemented in the patch.


Best,

Wolfgang




Re: First draft of PG 17 release notes

2024-05-22 Thread Alvaro Herrera
On 2024-May-21, Andres Freund wrote:

> Which reminds me: Eventually I'd like to add links to the most important
> commits related to release note entries. We already do much of the work of
> building that list of commits for each entry. That'd allow a reader to find
> more details if interested.

+1.  Several times I have had to open the SGML to fetch a commit ID and
build a URL to provide to someone.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"You can live only once, but if you do it well, once is enough"




Re: Schema variables - new implementation for Postgres 15

2024-05-22 Thread Alvaro Herrera
On 2024-May-22, Dmitry Dolgov wrote:

> Yeah, that's a bummer. Interestingly enough, the db2 implementation of
> global session variables mechanism is mentioned as similar to what we
> have in the patch. But weirdly, the db2 documentation just states
> possibility of a resolution conflict for unqualified names, nothing
> else.

Perhaps the solution to all this is to avoid having the variables be
implicitly present in the range table of all queries.  Instead, if you
need a variable's value, then you need to add the variable to the FROM
clause; and if you try to read from the variable and the name conflicts
with that of a column in one of the tables in the FROM clause, then you
get an error that the name is ambiguous and invites to qualify it.
Like, for instance,

create table lefttab (a int, b int);
create table righttab (c int, d int, b int);

=# select b from lefttab, righttab;
ERROR:  column reference "b" is ambiguous
LINE 1: select b from lefttab, righttab;
               ^

but this works fine because there's no longer an ambiguity:

select lefttab.b from lefttab, righttab;
 b 
───
(0 rows)


Nothing breaks if you create new variables, because your queries won't
see them until you explicitly request them.  And if you add columns
to either tables or variables, it's possible that some queries would
start having ambiguous references, in which case they'll just stop
working until you disambiguate by editing the query.
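
To make the proposed resolution rule concrete, a hypothetical session might
look like this (the syntax for listing a variable in the FROM clause is
purely illustrative; no such syntax exists today):

CREATE VARIABLE b AS int;

-- The variable is invisible to queries that don't ask for it, so this
-- keeps referring to the column even after the variable is created:
SELECT a, b FROM lefttab;

-- To read the variable, request it explicitly in FROM; an unqualified
-- "b" would then be ambiguous and has to be qualified:
SELECT lefttab.a, b.b FROM lefttab, b;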


Now, Pavel has been saying that variables are simple and cannot break
queries (because they're always shadowed), which is why they're always
implicitly visible to all queries[1]; but maybe that's a mistake.

[1] 
https://postgr.es/m/cafj8pra2p7uafgpfjxvhrhftizbcn41j00breotspdd+urg...@mail.gmail.com

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Experience tells us that man has peeled potatoes millions of times,
but it was necessary to admit the possibility that, in one case in millions,
the potatoes would peel the man" (Ijon Tichy)




Re: Virtual generated columns

2024-05-22 Thread Peter Eisentraut

On 29.04.24 20:54, Corey Huinker wrote:

  -- generation expression must be immutable
-CREATE TABLE gtest_err_4 (a int PRIMARY KEY, b double precision
GENERATED ALWAYS AS (random()) STORED);
+CREATE TABLE gtest_err_4 (a int PRIMARY KEY, b double precision
GENERATED ALWAYS AS (random()) VIRTUAL);

Does a VIRTUAL generated column have to be immutable? I can see where 
the STORED one has to be, but consider the following:


CREATE TABLE foo (
created_at timestamptz DEFAULT CURRENT_TIMESTAMP,
row_age interval GENERATED ALWAYS AS (CURRENT_TIMESTAMP - created_at)
);


I have been hesitant about this, but I'm now leaning toward that we 
could allow this.



  -- can't have generated column that is a child of normal column
  CREATE TABLE gtest_normal (a int, b int);
-CREATE TABLE gtest_normal_child (a int, b int GENERATED ALWAYS AS
(a * 2) STORED) INHERITS (gtest_normal);  -- error
+CREATE TABLE gtest_normal_child (a int, b int GENERATED ALWAYS AS
(a * 2) VIRTUAL) INHERITS (gtest_normal);  -- error

This is the barrier to the partitioning reorganization scheme I 
described above. Is there any hard rule why a child table couldn't have 
a generated column matching the parent's regular column? I can see where 
it might prevent indexing that column on the parent table, but is there 
some other dealbreaker or is this just a "it doesn't work yet" situation?


We had quite a difficult time getting the inheritance business of 
stored generated columns working correctly.  I'm sticking to the 
well-trodden path here.  We can possibly expand this if someone wants to 
work out the details.


One last thing to keep in mind is that there are two special case 
expressions in the spec:


GENERATED ALWAYS AS ROW START
GENERATED ALWAYS AS ROW END

and we'll need to be able to fit those into the catalog. I'll start 
another thread for that unless you prefer I keep it here.


I think this is a separate feature.





Re: Schema variables - new implementation for Postgres 15

2024-05-22 Thread Tom Lane
Peter Eisentraut  writes:
> On 18.05.24 13:29, Alvaro Herrera wrote:
>> I want to note that when we discussed this patch series at the dev
>> meeting in FOSDEM, a sort-of conclusion was reached that we didn't want
>> schema variables at all because of the fact that creating a variable
>> would potentially change the meaning of queries by shadowing table
>> columns.  But this turns out to be incorrect: it's_variables_  that are
>> shadowed by table columns, not the other way around.

> But that's still bad, because seemingly unrelated schema changes can 
> make variables appear and disappear.  For example, if you have
>   SELECT a, b FROM table1
> and then you drop column b, maybe the above query continues to work 
> because there is also a variable b.

Yeah, that seems pretty dangerous.  Could we make it safe enough
by requiring some qualification on variable names?  That is, if
you mean b to be a variable, then you must write something like

SELECT a, pg_variables.b FROM table1

This is still ambiguous if you use "pg_variables" as a table alias in
the query, but the alias would win so the query still means what it
meant before.  Also, table aliases (as opposed to actual table names)
don't change readily, so I don't think there's much risk of the query
suddenly meaning something different than it did yesterday.

regards, tom lane




Re: zlib detection in Meson on Windows broken?

2024-05-22 Thread Dave Page
On Wed, 22 May 2024 at 17:50, Nazir Bilal Yavuz  wrote:

> Hi,
>
> On Wed, 22 May 2024 at 17:21, Dave Page  wrote:
> >
> > Hi
> >
> > On Wed, 22 May 2024 at 14:11, Nazir Bilal Yavuz 
> wrote:
> >>
> >>
> >> I tried to install your latest zlib artifact (nmake one) to the
> >> Windows CI images (not the official ones) [1]. Then, I used the
> >> default meson.build file to build but meson could not find the zlib.
> >> After that, I modified it like you suggested before; I used a
> >> 'cc.find_library()' to find zlib as a fallback method and it seems it
> >> worked [2]. Please see meson setup logs below [3], does something
> >> similar to the attached solve your problem?
> >
> >
> > That patch does solve my problem - thank you!
>
> I am glad that it worked!
>
> Do you think that we need to have this patch in the upstream Postgres?
> I am not sure because:
> - There is a case where meson is able to find zlib but the tests fail.
> - This might be a band-aid fix rather than a permanent fix.


Yes I do:

- This is the documented way to build/install zlib on Windows.
- The behaviour with the patch matches <= PG16
- The behaviour with the patch is consistent with OpenSSL detection, and
(from a quick, unrelated test), libxml2 detection.

Thanks!

>


Re: zlib detection in Meson on Windows broken?

2024-05-22 Thread Nazir Bilal Yavuz
Hi,

On Wed, 22 May 2024 at 17:21, Dave Page  wrote:
>
> Hi
>
> On Wed, 22 May 2024 at 14:11, Nazir Bilal Yavuz  wrote:
>>
>>
>> I tried to install your latest zlib artifact (nmake one) to the
>> Windows CI images (not the official ones) [1]. Then, I used the
>> default meson.build file to build but meson could not find the zlib.
>> After that, I modified it like you suggested before; I used a
>> 'cc.find_library()' to find zlib as a fallback method and it seems it
>> worked [2]. Please see meson setup logs below [3], does something
>> similar to the attached solve your problem?
>
>
> That patch does solve my problem - thank you!

I am glad that it worked!

Do you think that we need to have this patch in the upstream Postgres?
I am not sure because:
- There is a case where meson is able to find zlib but the tests fail.
- This might be a band-aid fix rather than a permanent fix.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft
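
For anyone following along, the fallback pattern under discussion might look
roughly like this in meson.build (a sketch, not the actual hunk from the
attached patch; 'zdll' is the import-library name that zlib's documented
Windows build, win32/Makefile.msc, installs):

```meson
# Sketch only: try pkg-config/CMake detection first, then fall back to
# probing for the import library directly, since zlib's documented
# Windows build installs no pkg-config or CMake metadata.
zlib_dep = dependency('zlib', required: false)
if not zlib_dep.found() and host_machine.system() == 'windows'
  zlib_dep = cc.find_library('zdll', required: false)
endif
```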




Re: AIX support

2024-05-22 Thread Sriram RK
Thanks Alvaro, for the info…

Hi Team,
We referred to the below links to build this patch …
https://wiki.postgresql.org/wiki/Submitting_a_Patch
https://peter.eisentraut.org/blog/2023/05/09/how-to-submit-a-patch-by-email-2023-edition

Please find the attached patch.

Apart from the AIX-specific changes, there is a minor change in this file
with respect to XLC; below is the error for which we removed "inline".
After that, the build and tests passed for both XLC (16.1.0.18) and gcc (12)
as well.

  src/backend/storage/buffer/bufmgr.c

  "bufmgr.c", line 811.39: 1506-780 (S) Reference to "RelationGetSmgr" with 
internal linkage is not allowed within inline definition of 
"ReadBufferExtended".
  "bufmgr.c", line 811.15: 1506-780 (S) Reference to "ReadBuffer_common" with 
internal linkage is not allowed within inline definition of 
"ReadBufferExtended".
  gmake[4]: *** [: bufmgr.o] Error 1


Please let us know your feedback.

Thanks,
Sriram.


0001-AIX-support-revert-the-changes-from-0b16bb8776bb8.patch
Description:  0001-AIX-support-revert-the-changes-from-0b16bb8776bb8.patch


Re: Avoid possible dereference null pointer (src/backend/catalog/pg_depend.c)

2024-05-22 Thread Ranier Vilela
On Wed, 22 May 2024 at 11:44, Ranier Vilela wrote:

> Hi.
>
> Per Coverity.
>
> 2. returned_null: SearchSysCacheAttName returns NULL (checked 20 out of
> 21 times).
> 3. var_assigned: Assigning: ptup = NULL return value from
> SearchSysCacheAttName.
>  964ptup = SearchSysCacheAttName(relid, attname);
> CID 1545986: (#1 of 1): Dereference null return value (NULL_RETURNS)
> 4. dereference: Dereferencing ptup, which is known to be NULL.
>
> The functions SearchSysCacheAttNum and SearchSysCacheAttName,
> need to have the result checked.
>
> The commit 5091995 left an oversight.
>
> Fixed by the attached patch; a change of style, unfortunately, was
> necessary.
>
v1 Attached, fix wrong column variable name in error report.

best regards,
Ranier Vilela


v1-fix-catalog-cache-seach-check.patch
Description: Binary data


Re: An implementation of multi-key sort

2024-05-22 Thread Heikki Linnakangas

On 22/05/2024 15:48, Wang Yao wrote:

Compared to classic quicksort, it can achieve a significant performance
improvement once multiple keys are available. A rough test shows it got
~129% improvement over qsort for ORDER BY on 6 keys, and ~52% for CREATE
INDEX on the same data set. (See more details in section "Performance
Test".)


Impressive. Did you test the performance of the cases where MK-sort 
doesn't help, to check if there is a performance regression?


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: speed up a logical replica setup

2024-05-22 Thread Euler Taveira
On Wed, May 22, 2024, at 8:19 AM, Amit Kapila wrote:
> >
> > v2-0001: not changed
> >
> 
> Shouldn't we modify it as per the suggestion given in the email [1]? I
> am wondering if we can entirely get rid of checking the primary
> business and simply rely on recovery_timeout and keep checking
> server_is_in_recovery(). If so, we can modify the test to use
> non-default recovery_timeout (say 180s or something similar if we have
> used it at any other place). As an additional check we can ensure that
> consistent_lsn is present on the standby.

That's exactly what I want to propose, as Tomas convinced me offlist that
less is better when we don't have a useful recovery-progress reporting
mechanism to make sure it is still working on recovery and that we should
keep waiting.

> > v2-0002: not changed
> >
> 
> We have added more tries to see if the primary_slot_name becomes
> active but I think it is still fragile because it is possible on slow
> machines that the required slot didn't become active even after more
> retries. I have raised the same comment previously [2] and asked an
> additional question but didn't get any response.

Following the same line that simplifies the code, we can: (a) add a loop in
check_subscriber() that waits until the walreceiver is available on the
subscriber, or (b) use a timeout. The main advantage of (a) is that the
primary slot is already available, but I'm afraid we need an escape
mechanism for the loop (a timeout?).
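
The loop-plus-escape idea can be sketched generically (a hypothetical helper,
not from the actual patch: in pg_createsubscriber the readiness check would
query pg_stat_wal_receiver on the subscriber; here it is abstracted behind a
callback so the escape mechanism, the timeout, is the focus):

```c
#include <stdbool.h>
#include <unistd.h>

/* Caller-supplied readiness check; "state" is opaque to the loop. */
typedef bool (*ready_check) (void *state);

/*
 * Poll "check" once per second until it succeeds or timeout_secs have
 * elapsed.  Returns true if the condition was met, false on timeout.
 */
static bool
wait_with_timeout(ready_check check, void *state, int timeout_secs)
{
	for (int elapsed = 0; elapsed <= timeout_secs; elapsed++)
	{
		if (check(state))
			return true;		/* condition met */
		if (elapsed < timeout_secs)
			sleep(1);			/* retry interval */
	}
	return false;				/* escape: timeout reached */
}
```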

I'll summarize all issues as soon as I finish the review of sync slot support. I
think we should avoid new development if we judge that the item can be
documented as a limitation for this version. Nevertheless, I will share patches
so you can give your opinion on whether it is an open item or new development.


--
Euler Taveira
EDB   https://www.enterprisedb.com/


Re: Shared detoast Datum proposal

2024-05-22 Thread Robert Haas
On Tue, May 21, 2024 at 10:02 PM Andy Fan  wrote:
> One more thing I want to highlight: "syscache" is used for metadata
> and the *detoast cache* is used for user data. User data is likely
> bigger than metadata, so cache size control (and hence running the
> eviction logic more often) is more necessary in this case. The first
> graph in [1] (including the quoted text) highlights this idea.

Yes.

> I think that is same idea as design a.

Sounds similar.

> I think the key points include:
>
> 1. Where to put the *detoast value* so that the ExprEval system can find
> it cheaply. The solution in A is slot->tts_values[*].
>
> 2. How to manage the memory for holding the detoast value.
>
> The solution in A is:
>
> - using a lazily created MemoryContext in the slot.
> - letting the detoast system detoast the datum into the designated
> MemoryContext.
>
> 3. How to let Expr evaluation run the extra cycles when needed.

I don't understand (3) but I agree that (1) and (2) are key points. In
many - nearly all - circumstances just overwriting slot->tts_values is
unsafe and will cause problems. To make this work, we've got to find
some way of doing this that is compatible with general design of
TupleTableSlot, and I think in that API, just about the only time you
can touch slot->tts_values is when you're trying to populate a virtual
slot. But often we'll have some other kind of slot. And even if we do
have a virtual slot, I think you're only allowed to manipulate
slot->tts_values up until you call ExecStoreVirtualTuple.

That's why I mentioned using something temporary. You could legally
(a) materialize the existing slot, (b) create a new virtual slot, (c)
copy over the tts_values and tts_isnull arrays, (d) detoast any datums
you like from tts_values and store the new datums back into the array,
(e) call ExecStoreVirtualTuple, and then (f) use the new slot in place
of the original one. That would avoid repeated detoasting, but it
would also have a bunch of overhead. Having to fully materialize the
existing slot is a cost that you wouldn't always need to pay if you
didn't do this, but you have to do it to make this work. Creating the
new virtual slot costs something. Copying the arrays costs something.
Detoasting costs quite a lot potentially, if you don't end up using
the values. If you end up needing a heap tuple representation of the
slot, you're going to now have to make a new one rather than just
using the one that came straight from the buffer.
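
In rough pseudo-C against the executor API, the temporary-slot sequence
described above might look like the following (a sketch, not compilable
as-is; error handling, memory-context discipline, and the detoast-worthiness
test are elided, and "want_detoast" is a hypothetical predicate):

```c
/* Pseudocode sketch of the materialize/copy/detoast sequence above. */
ExecMaterializeSlot(slot);          /* make the existing slot self-contained */
newslot = MakeSingleTupleTableSlot(tupdesc, &TTSOpsVirtual);
slot_getallattrs(slot);             /* deform all attributes */
memcpy(newslot->tts_values, slot->tts_values, natts * sizeof(Datum));
memcpy(newslot->tts_isnull, slot->tts_isnull, natts * sizeof(bool));
for (int i = 0; i < natts; i++)
    if (!newslot->tts_isnull[i] && want_detoast(i))
        newslot->tts_values[i] =
            PointerGetDatum(PG_DETOAST_DATUM(newslot->tts_values[i]));
ExecStoreVirtualTuple(newslot);     /* mark the virtual slot as valid */
/* ... use newslot in place of slot from here on ... */
```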

So I don't think we could do something like this unconditionally.
We'd have to do it only when there was a reasonable expectation that
it would work out, which means we'd have to be able to predict fairly
accurately whether it was going to work out. But I do agree with you
that *if* we could make it work, it would probably be better than a
longer-lived cache.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Avoid orphaned objects dependencies, take 3

2024-05-22 Thread Robert Haas
On Wed, May 22, 2024 at 6:21 AM Bertrand Drouvot
 wrote:
> I started initially with [1] but it was focusing on function-schema only.

Yeah, that's what I thought we would want to do. And then just extend
that to the other cases.

> Then I proposed [2] making use of a dirty snapshot when recording the 
> dependency.
> But this approach appeared to be "scary" and it was still failing to close
> some race conditions.

The current patch still seems to be using dirty snapshots for some
reason, which struck me as a bit odd. My intuition is that if we're
relying on dirty snapshots to solve problems, we likely haven't solved
the problems correctly, which seems consistent with your statement
about "failing to close some race conditions". But I don't think I
understand the situation well enough to be sure just yet.

> Then, Tom proposed another approach in [2] which is that "creation DDL will 
> have
> to take a lock on each referenced object that'd conflict with a lock taken by 
> DROP".
> This is the one the current patch is trying to implement.

It's a clever idea, but I'm not sure that I like it.

> I think there is 2 cases here:
>
> First case: the "conflicting" lock mode is for one of our own lock then 
> LockCheckConflicts()
> would report this as a NON conflict.
>
> Second case: the "conflicting" lock mode is NOT for one of our own lock then 
> LockCheckConflicts()
> would report a conflict. But I've the feeling that the existing code would
> already lock those sessions.
>
> Was your concern about "First case" or "Second case" or both?

The second case, I guess. It's bad to take a weaker lock first and
then a stronger lock on the same object later, because it can lead to
deadlocks that would have been avoided if the stronger lock had been
taken at the outset. Here, it seems like things would happen in the
other order: if we took two locks, we'd probably take the stronger
lock in the higher-level code and then the weaker lock in the
dependency code. That shouldn't break anything; it's just a bit
inefficient. My concern was really more about the maintainability of
the code. I fear that if we add code that takes heavyweight locks in
surprising places, we might later find the behavior difficult to
reason about.
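The lock-ordering point can be illustrated with a toy model (plain
Python, not PostgreSQL's lock manager): in this simplified view, a
deadlock between two sessions is possible exactly when they acquire
the same pair of objects in opposite orders.

```python
def ordering_can_deadlock(sessions):
    """sessions: list of ordered lock-acquisition sequences, one per
    session.  Deadlock is possible (in this simplified model) iff two
    sessions take some pair of objects in opposite orders, i.e. the
    pairwise ordering graph has a 2-cycle."""
    edges = set()
    for seq in sessions:
        for i, a in enumerate(seq):
            for b in seq[i + 1:]:
                edges.add((a, b))      # a is acquired before b
    return any((b, a) in edges for (a, b) in edges)
```

Taking the stronger lock in higher-level code and then the weaker lock
in dependency code keeps both sessions on the same global order, so no
cycle arises; the dangerous pattern is the reverse order.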

Tom, what is your thought about that concern?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Avoid possible dereference null pointer (src/backend/catalog/pg_depend.c)

2024-05-22 Thread Ranier Vilela
Hi.

Per Coverity.

2. returned_null: SearchSysCacheAttName returns NULL (checked 20 out of 21
times).
3. var_assigned: Assigning: ptup = NULL return value from
SearchSysCacheAttName.
964    ptup = SearchSysCacheAttName(relid, attname);
CID 1545986: (#1 of 1): Dereference null return value (NULL_RETURNS)
4. dereference: Dereferencing ptup, which is known to be NULL.

The results of SearchSysCacheAttNum and SearchSysCacheAttName
need to be checked.

The commit 5091995 left an oversight.

Fixed by the patch attached, a change of style, unfortunately, was
necessary.

best regards,
Ranier Vilela


fix-catalog-cache-search-check.patch
Description: Binary data


Re: State of pg_createsubscriber

2024-05-22 Thread Robert Haas
On Mon, May 20, 2024 at 2:42 AM Amit Kapila  wrote:
> Just to summarize, apart from BF failures for which we had some
> discussion, I could recall the following open points:
>
> 1. After promotion, the pre-existing replication objects should be
> removed (either optionally or always), otherwise, it can lead to a new
> subscriber not being able to restart or getting some unwarranted data.
> [1][2].
>
> 2. Retaining synced slots on new subscribers can lead to unnecessary
> WAL retention and dead rows [3].
>
> 3. We need to consider whether some of the messages displayed in
> --dry-run mode are useful or not [4].

Amit, thanks for summarizing your understanding of the situation. Tom,
is this list complete, to your knowledge? The original thread is quite
complex and it's hard to pick out what the open items actually are.
:-(

I would like to see this open item broken up into multiple open items,
one per issue.

Link [4] goes to a message that doesn't seem to relate to --dry-run.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: State of pg_createsubscriber

2024-05-22 Thread Robert Haas
On Wed, May 22, 2024 at 9:52 AM Robert Haas  wrote:
> Another option that we should at least consider is "do nothing". In a
> case like the one Shlok describes, how are we supposed to know what
> the right thing to do is? Is it unreasonable to say that if the user
> doesn't want those publications or subscriptions to exist, the user
> should drop them?
>
> Maybe it is unreasonable to say that, but it seems to me we should at
> least talk about that.

As another option, maybe we could disable subscriptions, so that
nothing happens when the server is first started, and then the user
could decide after that what they want to do.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




RE: Fix src/test/subscription/t/029_on_error.pl test when wal_debug is enabled

2024-05-22 Thread Ilyasov Ian
Dear Michael, Amit, Hayato

I have corrected my patch according to what I think
Michael wanted, and attached the new patch to this email.

--
Kind regards,
Ian Ilyasov.

Junior Software Developer at Postgres Professional


0002-Fix-subscription-029_on_error.pl-test-when-wal_debug.patch
Description:  0002-Fix-subscription-029_on_error.pl-test-when-wal_debug.patch


Re: zlib detection in Meson on Windows broken?

2024-05-22 Thread Dave Page
Hi

On Wed, 22 May 2024 at 14:11, Nazir Bilal Yavuz  wrote:

> Hi,
>
> On Tue, 21 May 2024 at 18:24, Dave Page  wrote:
> >
> >
> >
> > On Tue, 21 May 2024 at 16:04, Andres Freund  wrote:
> >>
> >> Hi,
> >>
> >> On 2024-05-20 11:58:05 +0100, Dave Page wrote:
> >> > I have very little experience with Meson, and even less interpreting
> >> > its logs, but it seems to me that it's not including the extra lib and
> >> > include directories when it runs the test compile, given the command
> >> > line it's reporting:
> >> >
> >> > cl C:\Users\dpage\git\postgresql\build\meson-private\tmpg_h4xcue\testfile.c
> >> > /nologo /showIncludes /utf-8 /EP /nologo /showIncludes /utf-8 /EP /Od /Oi-
> >> >
> >> > Bug, or am I doing something silly?
> >>
> >> It's a buglet. We rely on meson's internal fallback detection of zlib,
> >> if it's not provided via pkg-config or cmake. But it doesn't know about
> >> our extra_include_dirs parameter. We should probably fix that...
> >
> >
> > Oh good, then I'm not going bonkers. I'm still curious about how it
> > works for Andrew but not me, however fixing that buglet should solve my
> > issue, and would be sensible behaviour.
> >
> > Thanks!
>
> I tried to install your latest zlib artifact (nmake one) to the
> Windows CI images (not the official ones) [1]. Then, I used the
> default meson.build file to build but meson could not find the zlib.
> After that, I modified it like you suggested before; I used a
> 'cc.find_library()' to find zlib as a fallback method and it seems it
> worked [2]. Please see meson setup logs below [3], does something
> similar to the attached solve your problem?
>

That patch does solve my problem - thank you!


>
> The interesting thing is, I also tried this 'cc.find_library' method
> with your old artifact (cmake one). It was able to find zlib but all
> tests failed [4].
>

Very odd. Whilst I haven't used that particular build elsewhere, we've been
building PostgreSQL and shipping client utilities with pgAdmin using
cmake-built zlib for years.


>
> Experimental zlib meson.build diff is attached.
>
> [1] https://cirrus-ci.com/task/6736867247259648
> [2] https://cirrus-ci.com/build/5286228755480576
> [3]
> Run-time dependency zlib found: NO (tried pkgconfig, cmake and system)
> Has header "zlib.h" : YES
> Library zlib found: YES
> ...
>   External libraries
> ...
> zlib   : YES
> ...
> [4] https://cirrus-ci.com/task/5208433811521536
>
> --
> Regards,
> Nazir Bilal Yavuz
> Microsoft
>


-- 
Dave Page
pgAdmin: https://www.pgadmin.org
PostgreSQL: https://www.postgresql.org
EDB: https://www.enterprisedb.com


Re: Schema variables - new implementation for Postgres 15

2024-05-22 Thread Dmitry Dolgov
> On Wed, May 22, 2024 at 02:37:49PM +0200, Peter Eisentraut wrote:
> On 18.05.24 13:29, Alvaro Herrera wrote:
> > I want to note that when we discussed this patch series at the dev
> > meeting in FOSDEM, a sort-of conclusion was reached that we didn't want
> > schema variables at all because of the fact that creating a variable
> > would potentially change the meaning of queries by shadowing table
> > columns.  But this turns out to be incorrect: it's _variables_ that are
> > shadowed by table columns, not the other way around.
>
> But that's still bad, because seemingly unrelated schema changes can make
> variables appear and disappear.  For example, if you have
>
> SELECT a, b FROM table1
>
> and then you drop column b, maybe the above query continues to work because
> there is also a variable b.  Or maybe it now does different things because b
> is of a different type.  This all has the potential to be very confusing.

Yeah, that's a bummer. Interestingly enough, the db2 implementation of
global session variables is mentioned as being similar to what we
have in the patch. But weirdly, the db2 documentation merely states the
possibility of a resolution conflict for unqualified names, nothing
else.

There was extensive discussion about this problem early in the thread,
and one alternative is to use some sort of special syntax every time a
variable is referenced, to clear up any ambiguity [1]. It's more
verbose, one has to be careful not to block some useful syntax for
other purposes, etc. But as Pavel said:

> The different syntax disallows any collision well, it is far to what is
> more usual standard in this area. And if we introduce special syntax, then
> there is no way back. We cannot use :varname - this syntax is used already,
> but we can use, theoretically, @var or $var. But, personally, I don't want
> to use it, if there is possibility to do without it.

It seems to me there is no other way to resolve those ambiguity
issues.

[1]: 
https://www.postgresql.org/message-id/CAFj8pRD03hwZK%2B541KDt4Eo5YuC81CBBX_P0Sa5A7g5TQFsTww%40mail.gmail.com




Re: State of pg_createsubscriber

2024-05-22 Thread Robert Haas
On Wed, May 22, 2024 at 7:42 AM Amit Kapila  wrote:
> So, what shall we do about such cases?  I think by default we can
> remove all pre-existing subscriptions and publications on the promoted
> standby or instead we can remove them based on some switch. If we want
> to go with this idea then we might need to distinguish the between
> pre-existing subscriptions and the ones created by this tool.
>
> The other case I remember adding an option in this tool was to avoid
> specifying slots, pubs, etc. for each database. See [1]. We can
> probably leave if the same is not important but we never reached the
> conclusion of same.

Another option that we should at least consider is "do nothing". In a
case like the one Shlok describes, how are we supposed to know what
the right thing to do is? Is it unreasonable to say that if the user
doesn't want those publications or subscriptions to exist, the user
should drop them?

Maybe it is unreasonable to say that, but it seems to me we should at
least talk about that.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: zlib detection in Meson on Windows broken?

2024-05-22 Thread Nazir Bilal Yavuz
Hi,

On Tue, 21 May 2024 at 18:24, Dave Page  wrote:
>
>
>
> On Tue, 21 May 2024 at 16:04, Andres Freund  wrote:
>>
>> Hi,
>>
>> On 2024-05-20 11:58:05 +0100, Dave Page wrote:
>> > I have very little experience with Meson, and even less interpreting its
>> > logs, but it seems to me that it's not including the extra lib and include
>> > directories when it runs the test compile, given the command line it's
>> > reporting:
>> >
>> > cl C:\Users\dpage\git\postgresql\build\meson-private\tmpg_h4xcue\testfile.c
>> > /nologo /showIncludes /utf-8 /EP /nologo /showIncludes /utf-8 /EP /Od /Oi-
>> >
>> > Bug, or am I doing something silly?
>>
>> It's a buglet. We rely on meson's internal fallback detection of zlib, if
>> it's not provided via pkg-config or cmake. But it doesn't know about our
>> extra_include_dirs parameter. We should probably fix that...
>
>
> Oh good, then I'm not going bonkers. I'm still curious about how it works for 
> Andrew but not me, however fixing that buglet should solve my issue, and 
> would be sensible behaviour.
>
> Thanks!

I tried to install your latest zlib artifact (nmake one) to the
Windows CI images (not the official ones) [1]. Then, I used the
default meson.build file to build but meson could not find the zlib.
After that, I modified it like you suggested before; I used a
'cc.find_library()' to find zlib as a fallback method and it seems it
worked [2]. Please see meson setup logs below [3], does something
similar to the attached solve your problem?

The interesting thing is, I also tried this 'cc.find_library' method
with your old artifact (cmake one). It was able to find zlib but all
tests failed [4].

Experimental zlib meson.build diff is attached.

[1] https://cirrus-ci.com/task/6736867247259648
[2] https://cirrus-ci.com/build/5286228755480576
[3]
Run-time dependency zlib found: NO (tried pkgconfig, cmake and system)
Has header "zlib.h" : YES
Library zlib found: YES
...
  External libraries
...
zlib   : YES
...
[4] https://cirrus-ci.com/task/5208433811521536

-- 
Regards,
Nazir Bilal Yavuz
Microsoft
diff --git a/meson.build b/meson.build
index f5ca5cfed49..d89a7a3e277 100644
--- a/meson.build
+++ b/meson.build
@@ -1373,20 +1373,23 @@ endif
 zlibopt = get_option('zlib')
 zlib = not_found_dep
 if not zlibopt.disabled()
-  zlib_t = dependency('zlib', required: zlibopt)
+  zlib = dependency('zlib', required: false)
 
-  if zlib_t.type_name() == 'internal'
-# if fallback was used, we don't need to test if headers are present (they
-# aren't built yet, so we can't test)
-zlib = zlib_t
-  elif not zlib_t.found()
-warning('did not find zlib')
-  elif not cc.has_header('zlib.h',
-  args: test_c_args, include_directories: postgres_inc,
-  dependencies: [zlib_t], required: zlibopt)
-warning('zlib header not found')
-  else
-zlib = zlib_t
+  if zlib.found() and zlib.type_name() != 'internal'
+if not cc.has_header('zlib.h',
+args: test_c_args, include_directories: postgres_inc,
+dependencies: zlib, required: false)
+  zlib = not_found_dep
+endif
+  elif not zlib.found()
+zlib_lib = cc.find_library('zlib',
+  dirs: test_lib_d,
+  header_include_directories: postgres_inc,
+  has_headers: ['zlib.h'],
+  required: zlibopt)
+if zlib_lib.found()
+  zlib = declare_dependency(dependencies: zlib_lib, include_directories: postgres_inc)
+endif
   endif
 
   if zlib.found()


An implementation of multi-key sort

2024-05-22 Thread Wang Yao
Hi hackers,

I'd like to submit an implementation of multi-key sort for review. Please
see the code in the attachment. Thanks for your response in advance.


Overview


MKsort (multi-key sort) is an alternative of standard qsort algorithm,
which has better performance for particular sort scenarios, i.e. the data
set has multiple keys to be sorted.

The implementation is based on the paper:
Jon L. Bentley and Robert Sedgewick, "Fast Algorithms for Sorting and
Searching Strings", Jan 1997 [1]
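
As a rough sketch of the idea (a Python toy, not the patch's C code):
do a 3-way partition on the current key, and let only the "equal"
partition advance to the next key. Skipping the already-decided keys in
the lt/gt partitions is where the comparison savings come from.

```python
def mksort(items, keys, depth=0):
    """Toy multi-key sort (after Bentley & Sedgewick).  `keys` is a
    list of key-extractor functions, most significant first."""
    if len(items) <= 1 or depth == len(keys):
        return items
    key = keys[depth]
    pivot = key(items[len(items) // 2])
    lt = [x for x in items if key(x) < pivot]
    eq = [x for x in items if key(x) == pivot]
    gt = [x for x in items if key(x) > pivot]
    # lt/gt keep sorting on the current key; only eq moves to the next one.
    return (mksort(lt, keys, depth)
            + mksort(eq, keys, depth + 1)
            + mksort(gt, keys, depth))
```

The ItemPointer tiebreak mentioned later would correspond to appending
one final, always-distinct key to `keys`.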

The patch applies mksort only to tuple sorting. Theoretically it can
be applied to general-purpose sort scenarios when there are multiple
sort keys available, but that is relatively difficult in practice
because a uniform interface is needed to manipulate the keys. So I
limit the usage of mksort to sorting SortTuple.

Comparing to classic quick sort, it can get significant performance
improvement once multiple keys are available. A rough test shows it got
~129% improvement than qsort for ORDER BY on 6 keys, and ~52% for CREATE
INDEX on the same data set. (See more details in section "Performance
Test")

Author: Yao Wang 
Co-author: Hongxu Ma 

Scope
-

The interface of mksort is pretty simple: in tuplesort_sort_memtuples(),
mksort_tuple() is invoked instead of qsort_tuple() if mksort is applicable.
The major logic in mksort_tuple() is to apply the mksort algorithm to
SortTuple, and a callback mechanism is used to handle
sort-variant-specific issues, e.g. comparing different datums, like
qsort_tuple() does. It also handles the complexity of "abbreviated keys".

A small difference from the classic mksort algorithm is that for
IndexTuple, when all the columns are equal, an additional comparison
based on ItemPointer is performed to determine the order. This makes
the result consistent with the existing qsort.

I did consider implementing mksort with a template mechanism like
qsort's (see sort_template.h), but it seems unnecessary because all
the concrete tuple types that need to be handled are derived from
SortTuple. Using callbacks to isolate type-specific features is good
enough.

Note that not all tuple types are supported by mksort. Please see the
comments inside tuplesort_sort_memtuples().

Test Cases
--

The changes of test cases include:

* Generally, mksort should generate results exactly the same as
qsort's. However some test cases don't. The reason is that the SQL
doesn't specify an order on all relevant columns, e.g. "select c1, c2
from t1 order by c1" will generate different results between
mksort/qsort when c1 values are equal, and the solution is to order by
c2 as well ("select c1, c2 from t1 order by c1, c2"). (e.g. geometry)
* Some cases need to be updated to display the new sort method "multi-key
sort" in explain result. (e.g. incremental_sort)
* regress/tuplesort was updated with new cases to cover some scenarios of
mksort.

Performance Test


The script I used to configure the build:

CFLAGS="-O3 -fargument-noalias-global -fno-omit-frame-pointer -g"
./configure --prefix=$PGHOME --with-pgport=5432 --with-perl --with-openssl
--with-python --with-pam --with-blocksize=16 --with-wal-blocksize=16
--with-perl --enable-tap-tests --with-gssapi --with-ldap

I used the script for a rough test for ORDER BY:

\timing on
create table t1 (c1 int, c2 int, c3 int, c4 int, c5 int, c6 varchar(100));
insert into t1 values (generate_series(1,49), 0, 0, 0, 0, 
  'aaabbb');
update t1 set c2 = c1 % 100, c3 = c1 % 50, c4 = c1 % 10, c5 = c1 % 3;
update t1 set c6 = 'aaabbb'
|| (c1 % 5)::text;

-- Use a large work mem to ensure the entire sort happens in memory
set work_mem='1GB';

-- switch between qsort/mksort
set enable_mk_sort=off;

explain analyze select c1 from t1 order by c6, c5, c4, c3, c2, c1;

Results:

mksort:
1341.283 ms (00:01.341)
1379.098 ms (00:01.379)
1369.868 ms (00:01.370)

qsort:
3137.277 ms (00:03.137)
3147.771 ms (00:03.148)
3131.887 ms (00:03.132)

The perf improvement is ~129%.

Another perf test for CREATE INDEX:

create index idx_t1_mk on t1 (c6, c5, c4, c3, c2, c1);

Results:

mksort:
1147.207 ms (00:01.147)
1200.501 ms (00:01.201)
1235.657 ms (00:01.236)

Qsort:
1852.957 ms (00:01.853)
1824.209 ms (00:01.824)
1808.781 ms (00:01.809)

The perf improvement is ~52%.

Another test is to use one of queries of TPC-H:

set work_mem='1GB';

-- query rewritten from TPCH-Q1, and there are 6001215 rows in lineitem
explain analyze select
 l_returnflag,l_linestatus,l_quantity,l_shipmode
from
 lineitem
where
 l_shipdate <= date'1998-12-01' - interval '65 days'
order by
 l_returnflag,l_linestatus,l_quantity,l_shipmode;

Result:

Qsort:
14582.626 ms
14524.188 ms
14524.111 ms

mksort:
11390.891 ms
11647.065 ms
11546.791 ms

The perf improvement is ~25.8%.

[1] https://www.cs.tufts.edu/~nr/cs257/archive/bob-sedgewick/fast-strings.pdf
[2] https://www.tpc.org/tpch/


Thanks,

Yao Wang


Re: Shared detoast Datum proposal

2024-05-22 Thread Nikita Malakhov
Hi,

Robert, thank you very much for your response to this thread.
I agree with most of things you've mentioned, but this proposal
is a PoC, and anyway has a long way to go to be presented
(if it ever would) as something to be committed.

Andy, glad you've not lost interest in this work, I'm looking
forward to your improvements!

-- 
Regards,
Nikita Malakhov
Postgres Professional
The Russian Postgres Company
https://postgrespro.ru/


Re: Schema variables - new implementation for Postgres 15

2024-05-22 Thread Peter Eisentraut

On 18.05.24 13:29, Alvaro Herrera wrote:

I want to note that when we discussed this patch series at the dev
meeting in FOSDEM, a sort-of conclusion was reached that we didn't want
schema variables at all because of the fact that creating a variable
would potentially change the meaning of queries by shadowing table
columns.  But this turns out to be incorrect: it's _variables_ that are
shadowed by table columns, not the other way around.


But that's still bad, because seemingly unrelated schema changes can 
make variables appear and disappear.  For example, if you have


SELECT a, b FROM table1

and then you drop column b, maybe the above query continues to work 
because there is also a variable b.  Or maybe it now does different 
things because b is of a different type.  This all has the potential to 
be very confusing.
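
The hazard can be sketched with a toy name resolver (plain Python,
hypothetical names, not the patch's code) implementing the rule that
columns shadow variables:

```python
def resolve(name, columns, variables):
    """Resolve an unqualified name under the rule discussed above:
    table columns shadow session variables.  Consequently, dropping a
    column can silently expose a same-named variable."""
    if name in columns:
        return ("column", columns[name])
    if name in variables:
        return ("variable", variables[name])
    raise LookupError(name)
```

With both a column b and a variable b defined, a query reads the
column; after the column is dropped, the same query silently starts
reading the variable, possibly of a different type -- exactly the
confusing behavior described above.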






Re: First draft of PG 17 release notes

2024-05-22 Thread torikoshia

On 2024-05-09 13:03, Bruce Momjian wrote:

I have committed the first draft of the PG 17 release notes;  you can
see the results here:

https://momjian.us/pgsql_docs/release-17.html

It will be improved until the final release.  The item count is 188,
which is similar to recent releases:

release-10:  189
release-11:  170
release-12:  180
release-13:  178
release-14:  220
release-15:  184
release-16:  206
release-17:  188

I welcome feedback.  For some reason it was an easier job than usual.


Thanks for working on this as always.


Allow pg_stat_reset_shared("slru") to clear SLRU statistics (Atsushi 
Torikoshi)



Considering someone may copy and paste this, 'slru' is better than 
"slru", isn't it?

I also found an older release note[1] used single quotes for this like:

  Add pg_stat_reset_shared('bgwriter') to reset the cluster-wide shared 
statistics for the background writer (Greg Smith)


[1] https://www.postgresql.org/docs/release/9.0.0/

--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

diff --git a/doc/src/sgml/release-17.sgml b/doc/src/sgml/release-17.sgml
index 612eb100a7..96be7b3e8b 100644
--- a/doc/src/sgml/release-17.sgml
+++ b/doc/src/sgml/release-17.sgml
@@ -647,7 +647,7 @@ Author: Michael Paquier 
 
 
 
-Allow pg_stat_reset_shared("slru") to clear SLRU statistics (Atsushi Torikoshi)
+Allow pg_stat_reset_shared('slru') to clear SLRU statistics (Atsushi Torikoshi)
 
 
 


Re: Testing autovacuum wraparound (including failsafe)

2024-05-22 Thread Alexander Lakhin

Hello,

30.11.2023 10:35, Masahiko Sawada wrote:



I've attached new version patches (0002 and 0003 are unchanged except
for the commit message). I'll push them, barring any objections.


Pushed.


I've discovered that the test 001_emergency_vacuum.pl can fail due to a
race condition. I can't see the server log at [1], but I reproduced the
failure locally and with additional logging and log_min_messages = DEBUG3,
the log shows:
...
2024-05-22 11:46:28.125 UTC [21256:2853] DEBUG:  SlruScanDirectory invoking 
callback on pg_xact/0690
2024-05-22 11:46:28.125 UTC [21256:2854] DEBUG:  transaction ID wrap limit is 
2147484396, limited by database with OID 5
2024-05-22 11:46:28.126 UTC [21256:2855] LOG: !!!SendPostmasterSignal| 
PMSIGNAL_START_AUTOVAC_LAUNCHER
2024-05-22 11:46:28.135 UTC [14871:20077] DEBUG:  postmaster received pmsignal 
signal
2024-05-22 11:46:28.137 UTC [21257:1] DEBUG:  autovacuum launcher started
2024-05-22 11:46:28.137 UTC [21257:2] DEBUG:  InitPostgres
2024-05-22 11:46:28.138 UTC [21257:3] LOG:  !!!AutoVacLauncherMain| !AutoVacuumingActive() && !ShutdownRequestPending; 
before do_start_worker()

2024-05-22 11:46:28.138 UTC [21257:4] LOG:  !!!do_start_worker| return quickly 
when there are no free workers
2024-05-22 11:46:28.138 UTC [21257:5] DEBUG:  shmem_exit(0): 4 
before_shmem_exit callbacks to make
2024-05-22 11:46:28.138 UTC [21257:6] DEBUG:  shmem_exit(0): 6 on_shmem_exit 
callbacks to make
2024-05-22 11:46:28.138 UTC [21257:7] DEBUG:  proc_exit(0): 1 callbacks to make
2024-05-22 11:46:28.138 UTC [21257:8] DEBUG:  exit(0)
2024-05-22 11:46:28.138 UTC [21257:9] DEBUG:  shmem_exit(-1): 0 
before_shmem_exit callbacks to make
2024-05-22 11:46:28.138 UTC [21257:10] DEBUG:  shmem_exit(-1): 0 on_shmem_exit 
callbacks to make
2024-05-22 11:46:28.138 UTC [21257:11] DEBUG:  proc_exit(-1): 0 callbacks to 
make
2024-05-22 11:46:28.146 UTC [21256:2856] DEBUG:  MultiXactId wrap limit is 
2147483648, limited by database with OID 5
2024-05-22 11:46:28.146 UTC [21256:2857] DEBUG:  MultiXact member stop limit is 
now 4294914944 based on MultiXact 1
2024-05-22 11:46:28.146 UTC [21256:2858] DEBUG:  shmem_exit(0): 4 
before_shmem_exit callbacks to make
2024-05-22 11:46:28.147 UTC [21256:2859] DEBUG:  shmem_exit(0): 7 on_shmem_exit 
callbacks to make
2024-05-22 11:46:28.147 UTC [21256:2860] DEBUG:  proc_exit(0): 1 callbacks to 
make
2024-05-22 11:46:28.147 UTC [21256:2861] DEBUG:  exit(0)
2024-05-22 11:46:28.147 UTC [21256:2862] DEBUG:  shmem_exit(-1): 0 
before_shmem_exit callbacks to make
2024-05-22 11:46:28.147 UTC [21256:2863] DEBUG:  shmem_exit(-1): 0 
on_shmem_exit callbacks to make
2024-05-22 11:46:28.147 UTC [21256:2864] DEBUG:  proc_exit(-1): 0 callbacks to 
make
2024-05-22 11:46:28.151 UTC [14871:20078] DEBUG:  forked new backend, pid=21258 
socket=8
2024-05-22 11:46:28.171 UTC [14871:20079] DEBUG:  server process (PID 21256) 
exited with exit code 0
2024-05-22 11:46:28.152 UTC [21258:1] [unknown] LOG:  connection received: 
host=[local]
2024-05-22 11:46:28.171 UTC [21258:2] [unknown] DEBUG:  InitPostgres
2024-05-22 11:46:28.172 UTC [21258:3] [unknown] LOG:  connection authenticated: user="vagrant" method=trust 
(/pgtest/postgresql.git/src/test/modules/xid_wraparound/tmp_check/t_001_emergency_vacuum_main_data/pgdata/pg_hba.conf:117)
2024-05-22 11:46:28.172 UTC [21258:4] [unknown] LOG:  connection authorized: user=vagrant database=postgres 
application_name=001_emergency_vacuum.pl

2024-05-22 11:46:28.175 UTC [21258:5] 001_emergency_vacuum.pl LOG: statement: 
INSERT INTO small(data) SELECT 1

That is, the autovacuum worker (21256) sent PMSIGNAL_START_AUTOVAC_LAUNCHER,
the postmaster started the autovacuum launcher, and the launcher could not
start a new autovacuum worker because process 21256 had not exited yet.

The failure can be reproduced easily with the sleep added inside
SetTransactionIdLimit():
    if (TransactionIdFollowsOrEquals(curXid, xidVacLimit) &&
    IsUnderPostmaster && !InRecovery)
SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_LAUNCHER);
+pg_usleep(1L);

By the way, I also discovered that the rather resource-intensive
xid_wraparound tests are executed twice during the buildfarm "make" run
(on the dodo and perentie animals, see [2]): at stage
module-xid_wraparound-check and then at stage
testmodules-install-check-C.

[1]
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dodo&dt=2024-05-19%2006%3A33%3A34
[2]
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=perentie&dt=2024-05-22%2000%3A02%3A19

Best regards,
Alexander




Re: State of pg_createsubscriber

2024-05-22 Thread Amit Kapila
On Wed, May 22, 2024 at 2:45 PM Shlok Kyal  wrote:
>
> > Just to summarize, apart from BF failures for which we had some
> > discussion, I could recall the following open points:
> >
> > 1. After promotion, the pre-existing replication objects should be
> > removed (either optionally or always), otherwise, it can lead to a new
> > subscriber not being able to restart or getting some unwarranted data.
> > [1][2].
> >
> I tried to reproduce the case and found a case where pre-existing
> replication objects can cause unwanted scenario:
>
> Suppose we have a setup of nodes N1, N2 and N3.
> N1 and N2 are in streaming replication where N1 is primary and N2 is standby.
> N3 and N1 are in logical replication where N3 is publisher and N1 is 
> subscriber.
> The subscription created on N1 is replicated to N2 due to streaming 
> replication.
>
> Now, after we run pg_createsubscriber on N2 and start the N2 server,
> we get the following logs repetitively:
> 2024-05-22 11:37:18.619 IST [27344] ERROR:  could not start WAL
> streaming: ERROR:  replication slot "test1" is active for PID 27202
> 2024-05-22 11:37:18.622 IST [27317] LOG:  background worker "logical
> replication apply worker" (PID 27344) exited with exit code 1
> 2024-05-22 11:37:23.610 IST [27349] LOG:  logical replication apply
> worker for subscription "test1" has started
> 2024-05-22 11:37:23.624 IST [27349] ERROR:  could not start WAL
> streaming: ERROR:  replication slot "test1" is active for PID 27202
> 2024-05-22 11:37:23.627 IST [27317] LOG:  background worker "logical
> replication apply worker" (PID 27349) exited with exit code 1
> 2024-05-22 11:37:28.616 IST [27382] LOG:  logical replication apply
> worker for subscription "test1" has started
>
> Note: 'test1' is the name of the subscription created on N1 initially
> and by default, slot name is the same as the subscription name.
>
> Once the N2 server is started after running pg_createsubscriber, the
> subscription that was earlier replicated by streaming replication will
> now try to connect to the publisher. Since the subscription name in N2
> is the same as the subscription created in N1, it will not be able to
> start a replication slot as the slot with the same name is active for
> logical replication between N3 and N1.
>
> Also, there would be a case where N1 becomes down for some time. Then
> in that case subscription on N2 will connect to the publication on N3
> and now data from N3 will be replicated to N2 instead of N1. And once
> N1 is up again, subscription on N1 will not be able to connect to
> publication on N3 as it is already connected to N2. This can lead to
> data inconsistency.
>

So, what shall we do about such cases?  I think by default we can
remove all pre-existing subscriptions and publications on the promoted
standby or instead we can remove them based on some switch. If we want
to go with this idea then we might need to distinguish the between
pre-existing subscriptions and the ones created by this tool.

The other case I remember adding an option in this tool was to avoid
specifying slots, pubs, etc. for each database. See [1]. We can
probably leave if the same is not important but we never reached the
conclusion of same.

[1] - 
https://www.postgresql.org/message-id/CAA4eK1%2Br96SyHYHx7BaTtGX0eviqpbbkSu01MEzwV5b2VFXP6g%40mail.gmail.com

-- 
With Regards,
Amit Kapila.




Re: Pgoutput not capturing the generated columns

2024-05-22 Thread Peter Eisentraut

On 08.05.24 09:13, Shubham Khanna wrote:

The attached patch has the changes to support capturing generated
column data using ‘pgoutput’ and’ test_decoding’ plugin. Now if the
‘include_generated_columns’ option is specified, the generated column
information and generated column data also will be sent.


It might be worth keeping half an eye on the development of virtual 
generated columns [0].  I think it won't be possible to include those 
into the replication output stream.


I think having an option for including stored generated columns is in 
general ok.



[0]: 
https://www.postgresql.org/message-id/flat/a368248e-69e4-40be-9c07-6c3b5880b...@eisentraut.org





Re: tydedef extraction - back to the future

2024-05-22 Thread Peter Eisentraut

On 20.05.24 23:11, Andrew Dunstan wrote:
Attached is an attempt to thread this needle. The core is a new perl 
module that imports the current buildfarm client logic. The intention is 
that once we have this, the buildfarm client will switch to using the 
module (if found) rather than its own built-in logic. There is precedent 
for this sort of arrangement (AdjustUpgrade.pm). Accompanying the new 
module is a standalone perl script that uses the new module, and 
replaces the current shell script (thus making it more portable).


It looks like this code could use a bit of work to modernize and clean 
up cruft, such as


+   my $sep = $using_msvc ? ';' : ':';

This can be done with File::Spec.

+   next if $bin =~ m!bin/(ipcclean|pltcl_)!;

Those are long gone.

+   next if $bin =~ m!/postmaster.exe$!;    # sometimes a copy not a link


Also gone.

+   elsif ($using_osx)
+   {
+   # On OS X, we need to examine the .o files

Update the name.

+   # exclude ecpg/test, which pgindent does too
+   my $obj_wanted = sub {
+   /^.*\.o\z/s
+ && !($File::Find::name =~ m!/ecpg/test/!s)
+ && push(@testfiles, $File::Find::name);
+   };
+
+   File::Find::find($obj_wanted, $binloc);
+   }

Not clear why this is specific to that platform.

Also, some instructions should be provided.  It looks like this is meant 
to be run on the installation tree?  A README and/or a build target 
would be good.


The code distinguishes between srcdir and bindir, but it's not clear 
what the latter is.  It rather looks like the installation prefix.  Does 
this code support out of tree builds?  This should be cleared up.






Re: in parentesis is not usual on DOCs

2024-05-22 Thread jian he
On Wed, May 22, 2024 at 7:14 PM Peter Eisentraut  wrote:
>
> On 20.05.24 02:00, jian he wrote:
> >> removing parentheses means we need to rephrase this sentence?
> >> So I come up with the following rephrase:
> >>
> >> The context_item specifies the input data to query, the
> >> path_expression is a JSON path expression defining the query,
> >> json_path_name is an optional name for the path_expression. The
> >> optional PASSING clause can provide data values to the
> >> path_expression.
> >
> > Based on this, write a simple patch.
>
> Your patch kind of messes up the indentation of the text you are
> changing.  Please check that.

please check attached.
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 17c44bc3..485f1822 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -18956,10 +18956,10 @@ where json_table_column is:
 
 
 
- The input data to query (context_item),
- the JSON path expression defining the query (path_expression)
- with an optional name (json_path_name), and an
- optional PASSING clause, which can provide data values
+ The context_item specifies the input data to query,
+ the path_expression is a JSON path expression defining the query,
+ json_path_name is an optional name for the path_expression.
+ The optional PASSING clause can provide data values
  to the path_expression.  The result of the input
  data evaluation using the aforementioned elements is called the
  row pattern, which is used as the source for row


Re: speed up a logical replica setup

2024-05-22 Thread Amit Kapila
On Mon, May 20, 2024 at 4:30 PM Shlok Kyal  wrote:
> >
> > I was trying to test this utility when 'sync_replication_slots' is on
> > and it gets in an ERROR loop [1] and never finishes. Please find the
> > postgresql.auto used on the standby attached. I think if the standby
> > has enabled sync_slots, you need to pass dbname in
> > GenerateRecoveryConfig(). I couldn't test it further but I wonder if
> > there are already synced slots on the standby (either due to
> > 'sync_replication_slots' or users have used
> > pg_sync_replication_slots() before invoking pg_createsubscriber),
> > those would be retained as it is on new subscriber and lead to
> > unnecessary WAL retention and dead rows.
> >
> > [1]
> > 2024-04-30 11:50:43.239 IST [12536] LOG:  slot sync worker started
> > 2024-04-30 11:50:43.247 IST [12536] ERROR:  slot synchronization
> > requires dbname to be specified in primary_conninfo
>
> Hi,
>
> I tested the scenario posted by Amit in [1], in which retaining synced
> slots lead to unnecessary WAL retention and ERROR. This is raised as
> the second open point in [2].
> The steps to reproduce the issue:
> (1) Setup physical replication with sync slot feature turned on by
> setting sync_replication_slots = 'true' or using
> pg_sync_replication_slots() on the standby node.
> For physical replication setup, run pg_basebackup with -R  and -d option.
> (2) Create a logical replication slot on primary node with failover
> option as true. A corresponding slot is created on standby as part of
> sync slot feature.
> (3) Run pg_createsubscriber on standby node.
> (4) On Checking for the replication slot on standby node, I noticed
> that the logical slots created in step 2 are retained.
>  I have attached the script to reproduce the issue.
>
> I and Kuroda-san worked to resolve open points. Here are patches to
> solve the second and third point in [2].
> Patches proposed by Euler are also attached just in case, but they
> were not modified.
>
> v2-0001: not changed
>

Shouldn't we modify it as per the suggestion given in the email [1]? I
am wondering if we can entirely get rid of the check-the-primary
business and simply rely on recovery_timeout while repeatedly checking
server_is_in_recovery(). If so, we can modify the test to use a
non-default recovery_timeout (say 180s, or something similar if we
have used it at any other place). As an additional check we can ensure
that consistent_lsn is present on the standby.
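For illustration, the timeout-only polling loop being suggested could look
like this (a Python sketch with a hypothetical server_is_in_recovery
callback; pg_createsubscriber itself is written in C):

```python
import time

def wait_for_end_of_recovery(server_is_in_recovery, recovery_timeout,
                             poll_interval=1.0):
    """Poll until the standby reports it has left recovery, giving up
    after recovery_timeout seconds.  Returns True on success."""
    deadline = time.monotonic() + recovery_timeout
    while time.monotonic() < deadline:
        if not server_is_in_recovery():
            return True   # promotion finished within the timeout
        time.sleep(poll_interval)
    return False          # timed out; no separate primary check needed
```

The point of the scheme is that there is no separate "is the primary still
up" check: a dead or wedged primary simply shows up as the timeout firing.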

> v2-0002: not changed
>

We have added more retries to see if the primary_slot_name becomes
active, but I think it is still fragile because on slow machines it is
possible that the required slot doesn't become active even after more
retries. I raised the same comment previously [2] and asked an
additional question, but didn't get any response.

[1] - 
https://www.postgresql.org/message-id/CAA4eK1JJq_ER6Kq_H%3DjKHR75QPRd8y9_D%3DRtYw%3DaPYKMfqLi9A%40mail.gmail.com
[2] - 
https://www.postgresql.org/message-id/CAA4eK1LT3Z13Dg6p4Z%2B4adO_EY-ow5CmWfikEmBfL%3DeVrm8CPw%40mail.gmail.com

-- 
With Regards,
Amit Kapila.




Re: in parentesis is not usual on DOCs

2024-05-22 Thread Peter Eisentraut

On 20.05.24 02:00, jian he wrote:

removing parentheses means we need to rephrase this sentence?
So I come up with the following rephrase:

The context_item specifies the input data to query, the
path_expression is a JSON path expression defining the query,
json_path_name is an optional name for the path_expression. The
optional PASSING clause can provide data values to the
path_expression.


Based on this, write a simple patch.


Your patch kind of messes up the indentation of the text you are 
changing.  Please check that.





Re: zlib detection in Meson on Windows broken?

2024-05-22 Thread Dave Page
On Tue, 21 May 2024 at 20:54, Andrew Dunstan  wrote:

>
> On 2024-05-21 Tu 11:04, Andres Freund wrote:
> > Hi,
> >
> > On 2024-05-20 11:58:05 +0100, Dave Page wrote:
> >> I have very little experience with Meson, and even less interpreting
> it's
> >> logs, but it seems to me that it's not including the extra lib and
> include
> >> directories when it runs the test compile, given the command line it's
> >> reporting:
> >>
> >> cl
> C:\Users\dpage\git\postgresql\build\meson-private\tmpg_h4xcue\testfile.c
> >> /nologo /showIncludes /utf-8 /EP /nologo /showIncludes /utf-8 /EP /Od
> /Oi-
> >>
> >> Bug, or am I doing something silly?
> > It's a buglet. We rely on meson's internal fallback detection of zlib,
> if it's
> > not provided via pkg-config or cmake. But it doesn't know about our
> > extra_include_dirs parameter. We should probably fix that...
> >
>
> Yeah. Meanwhile, what I got working on a totally fresh Windows + VS
> install was, instead of using extra_include_dirs etc., to add the relevant
> directories to the environment LIB and INCLUDE settings before calling
> "meson setup".
>

Yes, that works for me too.

-- 
Dave Page
pgAdmin: https://www.pgadmin.org
PostgreSQL: https://www.postgresql.org
EDB: https://www.enterprisedb.com


Re: zlib detection in Meson on Windows broken?

2024-05-22 Thread Dave Page
On Tue, 21 May 2024 at 18:00, Andres Freund  wrote:

> Hi,
>
> On 2024-05-20 11:58:05 +0100, Dave Page wrote:
> > I then attempt to build PostgreSQL:
> >
> >  meson setup build
> > -Dextra_include_dirs=C:/build64/openssl/include,C:/build64/zlib/include
> > -Dextra_lib_dirs=C:/build64/openssl/lib,C:/build64/zlib/lib -Dssl=openssl
> > -Dzlib=enabled --prefix=c:/build64/pgsql
> >
> > Which results in the output in output.txt, indicating that OpenSSL was
> > correctly found, but zlib was not. I've also attached the meson log.
>
> I forgot to mention that earlier: Assuming you're building something to be
> distributed, I'd recommend --auto-features=enabled/disabled and specifying
> specifically which dependencies you want to be used.
>

Good idea - thanks.

-- 
Dave Page
pgAdmin: https://www.pgadmin.org
PostgreSQL: https://www.postgresql.org
EDB: https://www.enterprisedb.com


Re: Avoid orphaned objects dependencies, take 3

2024-05-22 Thread Bertrand Drouvot
Hi,

On Tue, May 21, 2024 at 08:53:06AM -0400, Robert Haas wrote:
> On Wed, May 15, 2024 at 6:31 AM Bertrand Drouvot
>  wrote:
> > Please find attached v6 (only diff with v5 is moving the tests as suggested
> > above).
> 
> I don't immediately know what to think about this patch.

Thanks for looking at it!

> I've known about this issue for a long time, but I didn't think this was how 
> we
> would fix it.

I started initially with [1], but it focused on the function-schema case only.

Then I proposed [2] making use of a dirty snapshot when recording the 
dependency.
But this approach appeared to be "scary" and it was still failing to close
some race conditions.

Then, Tom proposed another approach in [2] which is that "creation DDL will have
to take a lock on each referenced object that'd conflict with a lock taken by 
DROP".
This is the one the current patch is trying to implement.

> What you've done here instead is add locking at a much
> lower level - whenever we are adding a dependency on an object, we
> lock the object. The advantage of that approach is that we definitely
> won't miss anything.

Right, as there are many more dependency types than the ones related to schemas, for example:

- function and arg type
- function and return type
- function and function
- domain and domain
- table and type
- server and foreign data wrapper

to name a few.

> The disadvantage of that approach is that it
> means we have some very low-level code taking locks, which means it's
> not obvious which operations are taking what locks.

Right, but the new operations are "only" the ones leading to recording or 
altering
a dependency.

> Maybe it could
> even result in some redundancy, like the higher-level code taking a
> lock also (possibly in a different mode) and then this code taking
> another one.

The lock that is added here is in AccessShareLock mode. It could conflict with
the ones taken in AccessExclusiveLock mode in (if I'm not missing any):

- AcquireDeletionLock(): which is exactly what we want
- get_object_address()
- get_object_address_rv()
- ExecAlterObjectDependsStmt()
- ExecRenameStmt()
- ExecAlterObjectDependsStmt()
- ExecAlterOwnerStmt()
- RemoveObjects()
- AlterPublication()

I think there are 2 cases here:

First case: the "conflicting" lock is one of our own locks; then
LockCheckConflicts() would report this as a NON conflict.

Second case: the "conflicting" lock is NOT one of our own locks; then
LockCheckConflicts() would report a conflict. But I have the feeling that the
existing code would already block those sessions.
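To illustrate the two cases (a toy Python sketch, not PostgreSQL source; the
conflict table is reduced to the two modes discussed and the session model
is a simplification):

```python
# Toy model of LockCheckConflicts(): a conflicting mode held by the
# requesting session itself never blocks it (first case); the same mode
# held by another session does (second case).
CONFLICTS = {
    "AccessShareLock": {"AccessExclusiveLock"},
    "AccessExclusiveLock": {"AccessShareLock", "AccessExclusiveLock"},
}

def check_conflicts(requested, held_locks, me):
    """held_locks: list of (session, mode) pairs on one object.
    Returns the first blocking session, or None."""
    for session, mode in held_locks:
        if session == me:
            continue  # our own lock: reported as a NON conflict
        if mode in CONFLICTS[requested]:
            return session  # someone else's conflicting lock
    return None

# Session 1 holds AccessExclusiveLock on the object; session 2's
# AccessShareLock request (e.g. from depLockAndCheckObject()) blocks:
held = [("s1", "AccessExclusiveLock")]
assert check_conflicts("AccessShareLock", held, me="s2") == "s1"
# ...but the same request made by session 1 itself does not:
assert check_conflicts("AccessShareLock", held, me="s1") is None
```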

One example where it would be the case:

Session 1: doing "BEGIN; ALTER FUNCTION noschemas.foo2() SET SCHEMA 
alterschema" would
acquire the lock in AccessExclusiveLock during 
ExecAlterObjectSchemaStmt()->get_object_address()->LockDatabaseObject()
(in the existing code and before the new call that would occur through 
changeDependencyFor()->depLockAndCheckObject()
with the patch in place).

Then, session 2: doing "alter function noschemas.foo2() owner to newrole;"
would be locked in the existing code while doing 
ExecAlterOwnerStmt()->get_object_address()->LockDatabaseObject()).

That means that in this example, the new lock this patch is introducing would
not be responsible for session 2 being blocked.

Was your concern about "First case" or "Second case" or both?

[1]: 
https://www.postgresql.org/message-id/flat/5a9daaae-5538-b209-6279-e903c3ea2157%40amazon.com
[2]: 
https://www.postgresql.org/message-id/flat/8369ff70-0e31-f194-2954-787f4d9e21dd%40amazon.com

Regards,

-- 
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: State of pg_createsubscriber

2024-05-22 Thread Shlok Kyal
> Just to summarize, apart from BF failures for which we had some
> discussion, I could recall the following open points:
>
> 1. After promotion, the pre-existing replication objects should be
> removed (either optionally or always), otherwise, it can lead to a new
> subscriber not being able to restart or getting some unwarranted data.
> [1][2].
>
I tried to reproduce this and found a case where pre-existing
replication objects can cause an unwanted scenario:

Suppose we have a setup of nodes N1, N2 and N3.
N1 and N2 are in streaming replication where N1 is primary and N2 is standby.
N3 and N1 are in logical replication where N3 is publisher and N1 is subscriber.
The subscription created on N1 is replicated to N2 due to streaming replication.

Now, after we run pg_createsubscriber on N2 and start the N2 server,
we get the following logs repetitively:
2024-05-22 11:37:18.619 IST [27344] ERROR:  could not start WAL
streaming: ERROR:  replication slot "test1" is active for PID 27202
2024-05-22 11:37:18.622 IST [27317] LOG:  background worker "logical
replication apply worker" (PID 27344) exited with exit code 1
2024-05-22 11:37:23.610 IST [27349] LOG:  logical replication apply
worker for subscription "test1" has started
2024-05-22 11:37:23.624 IST [27349] ERROR:  could not start WAL
streaming: ERROR:  replication slot "test1" is active for PID 27202
2024-05-22 11:37:23.627 IST [27317] LOG:  background worker "logical
replication apply worker" (PID 27349) exited with exit code 1
2024-05-22 11:37:28.616 IST [27382] LOG:  logical replication apply
worker for subscription "test1" has started

Note: 'test1' is the name of the subscription created on N1 initially
and by default, slot name is the same as the subscription name.

Once the N2 server is started after running pg_createsubscriber, the
subscription that was earlier replicated by streaming replication will
now try to connect to the publisher. Since the subscription name on N2
is the same as the subscription created on N1, it will not be able to
start streaming, as the replication slot with the same name is already
active for the logical replication between N3 and N1.
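The collision can be mimicked with a toy model (Python, not PostgreSQL code;
the Publisher class is made up for illustration, and the PIDs are the ones
from the log above):

```python
# Toy model of why N2's cloned subscription fails: a replication slot
# can be actively streamed from by only one walsender at a time.
class Publisher:
    def __init__(self):
        self.active_slots = {}  # slot name -> PID currently streaming

    def start_streaming(self, slot, pid):
        other = self.active_slots.get(slot)
        if other is not None and other != pid:
            raise RuntimeError(
                f'replication slot "{slot}" is active for PID {other}')
        self.active_slots[slot] = pid

n3 = Publisher()
n3.start_streaming("test1", pid=27202)      # N1's original apply worker
try:
    n3.start_streaming("test1", pid=27344)  # N2's cloned subscription
except RuntimeError as e:
    print(e)  # the error seen repeatedly in N2's log
```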

Also, there could be a case where N1 goes down for some time. In that
case, the subscription on N2 will connect to the publication on N3,
and data from N3 will now be replicated to N2 instead of N1. And once
N1 is up again, the subscription on N1 will not be able to connect to
the publication on N3, as the slot is already in use by N2. This can
lead to data inconsistency.

This error did not happen before running pg_createsubscriber on the
standby node N2, because there is no 'logical replication launcher'
process on a standby node.

Thanks and Regards,
Shlok Kyal




Re: apply_scanjoin_target_to_paths and partitionwise join

2024-05-22 Thread Ashutosh Bapat
On Mon, May 6, 2024 at 6:28 PM Ashutosh Bapat 
wrote:

>
>
> On Mon, May 6, 2024 at 4:26 PM Jakub Wartak 
> wrote:
>
>> Hi Ashutosh & hackers,
>>
>> On Mon, Apr 15, 2024 at 9:00 AM Ashutosh Bapat
>>  wrote:
>> >
>> > Here's patch with
>> >
>> [..]
>> > Adding to the next commitfest but better to consider this for the next
>> set of minor releases.
>>
>> 1. The patch does not pass cfbot -
>> https://cirrus-ci.com/task/5486258451906560 on master due to test
>> failure "not ok 206   + partition_join"
>>
>
> So I need to create a patch for master first. I thought CFBot somehow knew
> that the patch was created for PG 15. :)
>

PFA patch for master. That should fix CfBot.


>
>> 4. meson test ends up with failures like below:
>>
>>   4/290 postgresql:regress / regress/regress
>>  ERROR  32.67s
>>   6/290 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade
>>  ERROR  56.96s
>>  35/290 postgresql:recovery / recovery/027_stream_regress
>>  ERROR  40.20s
>>
>> (all due to "regression tests pass" failures)
>>  [...]
>
>
>> So with the patch that SQL does not use partitionwise join as it finds
>> it more optimal to stick to a plan with cost of "1.10..2.21" instead
>> of "1.11..2.22" (w/ partition_join), nitpicking but still a failure
>> technically. Perhaps it could be even removed? (it's pretty close to
>> noise?).
>>
>
>
The test was added by 6b94e7a6da2f1c6df1a42efe64251f32a444d174 and later
modified by 3c569049b7b502bb4952483d19ce622ff0af5fd6. The modification just
avoided eliminating the join, so that change can be ignored.
6b94e7a6da2f1c6df1a42efe64251f32a444d174 added the tests to test fractional
paths being considered when creating ordered append paths. Reading the
commit message, I was expecting a test which did not use a join, as well
as one which used inheritance. But it seems that the queries added by that
commit test all the required scenarios, and hence we see two queries
involving a join between partitioned tables. As the comment there says, the
intention is to verify index only scans and not exactly partitionwise join.
So just fixing the expected output of one query looks fine. The other query
will test partitionwise join and fractional paths anyway. I am including
Tomas, Arne and Zhihong, who worked on the first commit, to comment on
expected output changes.

I will create patches for the back-branches once the patch for master is in
a committable state.

-- 
Best Wishes,
Ashutosh Bapat
From f66de0ee842a20ced46d2661b4bf80914e9e8847 Mon Sep 17 00:00:00 2001
From: Ashutosh Bapat 
Date: Mon, 15 Apr 2024 12:03:01 +0530
Subject: [PATCH] Fix partitioned join case in apply_scanjoin_target_to_paths()

apply_scanjoin_target_to_paths() assumes that the cheapest paths for a
partitioned relation can be computed by Append'ing appropriate paths from the
partitions. This is not true for partitioned join relations which have two
types of paths: a. Append paths produced by partitionwise join, b. Join paths
produced by joining partitioned relations. Partitionwise join paths may not be
optimal always. But only those are recomputed by
apply_scanjoin_target_to_paths() after applying the new targetlist. Fix
apply_scanjoin_target_to_paths() not to wipe out all existing paths for a
partitioned join relation.

Author: Ashutosh Bapat
Review and initial investigation: Jakub Wartak
Discussion: https://www.postgresql.org/message-id/caexhw5toze58+jl-454j3ty11sqjyu13sz5rjpqzdmaswzg...@mail.gmail.com
---
 src/backend/optimizer/plan/planner.c | 33 ---
 src/test/regress/expected/partition_join.out | 96 +---
 src/test/regress/sql/partition_join.sql  | 15 ++-
 3 files changed, 93 insertions(+), 51 deletions(-)

diff --git a/src/backend/optimizer/plan/planner.c b/src/backend/optimizer/plan/planner.c
index 5320da51a0..4714fa41a0 100644
--- a/src/backend/optimizer/plan/planner.c
+++ b/src/backend/optimizer/plan/planner.c
@@ -7515,17 +7515,24 @@ apply_scanjoin_target_to_paths(PlannerInfo *root,
 	check_stack_depth();
 
 	/*
-	 * If the rel is partitioned, we want to drop its existing paths and
-	 * generate new ones.  This function would still be correct if we kept the
-	 * existing paths: we'd modify them to generate the correct target above
-	 * the partitioning Append, and then they'd compete on cost with paths
-	 * generating the target below the Append.  However, in our current cost
-	 * model the latter way is always the same or cheaper cost, so modifying
-	 * the existing paths would just be useless work.  Moreover, when the cost
-	 * is the same, varying roundoff errors might sometimes allow an existing
-	 * path to be picked, resulting in undesirable cross-platform plan
-	 * variations.  So we drop old paths and thereby force the work to be done
-	 * below the Append, except in the case of a non-parallel-safe target.
+	 * If the rel is partitioned simple rel, we want to drop its existing
+	 * paths and generate new ones.  This