Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-21 Thread Peter Smith
Here are some review comments for v22-0002

==
Commit Message

1.
This commit allows nodes with logical replication slots to be upgraded. While
reading information from the old cluster, a list of logical replication slots is
newly extracted. At the later part of upgrading, pg_upgrade revisits the list
and restores slots by using the pg_create_logical_replication_slots() on the new
clushter.

~

1a.
/is newly extracted/is fetched/

~

1b.
/using the pg_create_logical_replication_slots()/executing
pg_create_logical_replication_slots()/

~

1c.
/clushter/cluster/

~~~

2.
Note that it must be done after the final pg_resetwal command during the upgrade
because pg_resetwal will remove WALs that are required by the slots. Due to the
restriction, the timing of restoring replication slots is different from other
objects.

~

2a.
/it must/slot restoration/

~

2b.
/the restriction/this restriction/

==
doc/src/sgml/ref/pgupgrade.sgml

3.
+
+ pg_upgrade attempts to migrate logical
+ replication slots. This helps avoid the need for manually defining the
+ same replication slot on the new publisher.
+

/same replication slot/same replication slots/

~~~

4.
+
+ Before you start upgrading the publisher cluster, ensure that the
+ subscription is temporarily disabled, by executing
+ ALTER SUBSCRIPTION ... DISABLE.
+ After the upgrade is complete, execute the
+ ALTER SUBSCRIPTION ... CONNECTION command to update the
+ connection string, and then re-enable the subscription.
+

On the rendered page, it looks a bit strange that DISABLE has a link
but CONNECTION does not have a link.

~~~

5.
+
+ There are some prerequisites for pg_upgrade to
+ be able to upgrade the replication slots. If these are not met an error
+ will be reported.
+
+
+

+1 to use all the itemizedlist changes that Amit suggested [1] in his
attachment.

==
src/bin/pg_upgrade/check.c

6.
+static void check_for_logical_replication_slots(ClusterInfo *new_cluster);

IMO the arg name should not shadow a global with the same name. See
other review comment for this function signature.

~~~

7.
+ /* Extract a list of logical replication slots */
+ get_logical_slot_infos(&old_cluster, live_check);

But 'live_check' is never used?

~~~

8. check_for_logical_replication_slots
+
+/*
+ * Verify the parameter settings necessary for creating logical replication
+ * slots.
+ */
+static void
+check_for_logical_replication_slots(ClusterInfo *new_cluster)

IMO the arg name should not shadow a global with the same name. If
this is never going to be called with any param other than
&new_cluster then probably it is better not to have that argument at
all. Just refer to the global new_cluster inside the function.

You can't say that 'check_for_new_tablespace_dir' does it already so
it must be OK -- I think that the existing function has the same issue
and it also ought to be fixed to avoid shadowing!

~~~

9. check_for_logical_replication_slots

+ /* logical replication slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(new_cluster->major_version) <= 1600)
+ return;

IMO the code matches the comment better if you say < 1700 instead of <= 1600.

==
src/bin/pg_upgrade/function.c

10. get_loadable_libraries
  /*
- * Fetch all libraries containing non-built-in C functions in this DB.
+ * Fetch all libraries containing non-built-in C functions or referred
+ * by logical replication slots in this DB.
  */
  ress[dbnum] = executeQueryOrDie(conn,
~

/referred by/referred to by/

==
src/bin/pg_upgrade/info.c

11.
+/*
+ * get_logical_slot_infos()
+ *
+ * Higher level routine to generate LogicalSlotInfoArr for all databases.
+ */
+void
+get_logical_slot_infos(ClusterInfo *cluster, bool live_check)
+{
+	int			dbnum;
+	int			slot_count = 0;
+
+	if (cluster == &old_cluster)
+		pg_log(PG_VERBOSE, "\nsource databases:");
+	else
+		pg_log(PG_VERBOSE, "\ntarget databases:");
+
+	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+	{
+		DbInfo	   *pDbInfo = &cluster->dbarr.dbs[dbnum];
+
+		get_logical_slot_infos_per_db(cluster, pDbInfo);
+		slot_count += pDbInfo->slot_arr.nslots;
+
+		if (log_opts.verbose)
+		{
+			pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
+			print_slot_infos(&pDbInfo->slot_arr);
+		}
+	}
+}
+
+

11a.
Now the variable 'slot_count' is no longer being returned so it seems redundant.

~

11b.
What is the 'live_check' parameter for? Nobody is using it.

~~~

12. count_logical_slots

+int
+count_logical_slots(ClusterInfo *cluster)
+{
+	int			dbnum;
+	int			slotnum = 0;
+
+	for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
+		slotnum += cluster->dbarr.dbs[dbnum].slot_arr.nslots;
+
+	return slotnum;
+}

IMO this variable should be called something like 'slot_count'. This
is the same review comment also made in a previous review. (See [2]
comment#12).

~~~

13. print_slot_infos

+
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+	int			slotnum;
+
+	for (slotnum

Re: Remove distprep

2023-08-21 Thread Michael Paquier
On Fri, Aug 18, 2023 at 10:22:47AM +0200, Christoph Berg wrote:
> Yes, mostly. Since autoconf had not seen a new release for so long,
> everyone started to patch it, and one of the things that Debian and
> others added was --runstatedir=DIR. The toolchain is also using it,
> it's part of the default set of options used by dh_auto_configure.
> 
> In parallel, the standard debhelper toolchain also started to run
> autoreconf by default, so instead of telling dh_auto_configure to omit
> --runstatedir, it was really easier to patch configure.ac to remove
> the autoconf 2.69 check.

Ah, I didn't know this part of the story.  Thanks for the insights.

> Two of the other patches are touching configure(.ac) anyway to tweak
> compiler flags (reproducibility, aarch64 tweaks).

Is reproducibility something you've brought to a separate thread?
FWIW, I'd be interested in improving this area for the in-core code,
if need be.  (Not material for this thread, of course).
--
Michael


signature.asc
Description: PGP signature


Re: WIP: new system catalog pg_wait_event

2023-08-21 Thread Drouvot, Bertrand

Hi,

On 8/20/23 10:07 AM, Michael Paquier wrote:

On Sat, Aug 19, 2023 at 06:30:12PM +0200, Drouvot, Bertrand wrote:

Thanks, fixed in v10.


Okay.  I have done an extra review of it, simplifying a few things in
the function, the comments and the formatting, and applied the patch.


Thanks!

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: dubious warning: FORMAT JSON has no effect for json and jsonb types

2023-08-21 Thread Amit Langote
On Fri, Aug 18, 2023 at 2:59 PM Peter Eisentraut  wrote:
> On 16.08.23 16:59, Merlin Moncure wrote:
> > On Wed, Aug 16, 2023 at 8:55 AM Peter Eisentraut  > > wrote:
> >
> > This warning comes from parse_expr.c transformJsonValueExpr() and is
> > triggered for example by the following test case:
> >
> > SELECT JSON_OBJECT('foo': NULL::json FORMAT JSON);
> > WARNING:  FORMAT JSON has no effect for json and jsonb types
> >
> > But I don't see anything in the SQL standard that would require this
> > warning.  It seems pretty clear that FORMAT JSON in this case is
> > implicit and otherwise without effect.
> >
> > Also, we don't have that warning in the output case (RETURNING json
> > FORMAT JSON).
> >
> > Anyone remember why this is here?  Should we remove it?
> >
> >
> > +1 for removing, on the basis that it is not surprising, and would
> > pollute logs for most configurations.
>
> done

+1 and thanks.  May have been there as a debugging aid if anything.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com




Re: Pgoutput not capturing the generated columns

2023-08-21 Thread Rajendra Kumar Dangwal
Thanks Euler,
Greatly appreciate your inputs.

> Should pgoutput provide a complete row? Probably. If it is an option that 
> defaults to false and doesn't impact performance.

Yes, it would be great if this feature can be implemented.

> The logical replication design decides to compute the generated columns at 
> subscriber side.

If I understand correctly, this approach involves establishing a function on 
the subscriber's side that emulates the operation executed to derive the 
generated column values.
If yes, I see one potential issue where disparities might surface between the 
values of generated columns on the subscriber's side and those computed within 
Postgres. This could happen if the generated column's value relies on the 
current_time function.

Please let me know how we can track the feature requests and the discussions 
around that.

Thanks,
Rajendra.

Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

2023-08-21 Thread Peter Smith
Hi Melih,

Last week we revisited your implementation of design#2. Vignesh rebased it,
and then made a few other changes.

PSA v28*

The patch changes include:
* changed the logic slightly by setting recv_immediately (a new variable); if
this variable is set, the main apply worker loop will not wait in this case.
* setting the relation state to ready immediately if there are no more
incremental changes to be synced.
* receiving the incremental changes if applicable and setting the relation
state to ready without waiting.
* reusing the worker, if it is free, before trying to start a new table sync
worker.
* restarting the tablesync worker only after wal_retrieve_retry_interval.

~

FWIW, we just wanted to share with you the performance measurements seen
using this design#2 patch set:

==

RESULTS (not busy tests)

--
10 empty tables
                2w      4w      8w      16w
HEAD:           125     119     140     133
HEAD+v28*:      92      93      123     134
%improvement:   27%     22%     12%     -1%
--
100 empty tables
                2w      4w      8w      16w
HEAD:           1037    843     1109    1155
HEAD+v28*:      591     625     2616    2569
%improvement:   43%     26%     -136%   -122%
--
1000 empty tables
                2w      4w      8w      16w
HEAD:           15874   10047   9919    10338
HEAD+v28*:      33673   12199   9094    9896
%improvement:   -112%   -21%    8%      4%
--
2000 empty tables
                2w      4w      8w      16w
HEAD:           45266   24216   19395   19820
HEAD+v28*:      88043   21550   21668   22607
%improvement:   -95%    11%     -12%    -14%

~~~

Note - the results varied quite a lot in comparison to HEAD,
e.g. the HEAD results are very consistent, but the v28* results observed are not:
HEAD 1000 (2w): 15861, 15777, 16007, 15950, 15886, 15740, 15846, 15740,
15908, 15940
v28* 1000 (2w):  34214, 13679, 8792, 33289, 31976, 56071, 57042, 56163,
34058, 11969

--
Kind Regards,
Peter Smith.
Fujitsu Australia


Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

2023-08-21 Thread Peter Smith
Oops - now with attachments



v28-0001-Reuse-Tablesync-Workers.patch
Description: Binary data


v28-0002-Reuse-connection-when-tablesync-workers-change-t.patch
Description: Binary data


v28-0004-Defect-fixes.patch
Description: Binary data


v28-0003-apply-worker-assigns-tables.patch
Description: Binary data


Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500

2023-08-21 Thread Jian Guo
Hi hackers,

I found a new approach to fix this issue, which seems better, so I would like
to post another version of the patch here. The original patch made the
assumption that the values of Vars from the CTE must be unique, which could be
very wrong. This patch examines the variables underlying Vars inside the CTE,
which avoids that bad assumption, so the results can be much more accurate.

Regards,
Jian


From: Tomas Vondra 
Sent: Monday, August 14, 2023 20:58
To: Jian Guo ; Hans Buschmann ; 
pgsql-hackers@lists.postgresql.org 
Subject: Re: Wrong rows estimations with joins of CTEs slows queries by more 
than factor 500

!! External Email

Hi,

I haven't looked at the patch, but please add the patch to the next
commit fest (2023-09), so that we don't lose track of it.

See https://commitfest.postgresql.org/


regards

Tomas

On 8/14/23 13:12, Jian Guo wrote:
> Hi hackers,
>
> I have written a patch to add stats info for Vars in CTEs. With this
> patch, the join size estimation on the upper of CTE scans became more
> accurate.
>
> In the function |selfuncs.c:eqjoinsel| it uses the number of the
> distinct values of the two join variables to estimate join size, and in
> the function |selfuncs.c:get_variable_numdistinct| return a default
> value |DEFAULT_NUM_DISTINCT| (200 in Postgres and 1000 in Greenplum),
> with the default value, you can never expect a good plan.
>
> Thanks if anyone could give a review.
>
> Regards,
> Jian
>
> 
> *From:* Hans Buschmann 
> *Sent:* Wednesday, February 8, 2023 21:55
> *To:* pgsql-hackers@lists.postgresql.org
> 
> *Subject:* Wrong rows estimations with joins of CTEs slows queries by
> more than factor 500
>
>
> !! External Email
>
> During data refactoring of our Application I encountered $subject when
> joining 4 CTEs with left join or inner join.
>
>
> 1. Background
>
PG 15.1 on Windows x64 (the OS seems to have no meaning here)
>
>
> I try to collect data from 4 (analyzed) tables (up,li,in,ou) by grouping
> certain data (4 CTEs qup,qli,qin,qou)
>
The grouping of the data in the CTEs gives estimated row counts of about
1000 (1 tenth of the real value). This is OK for estimation.
>
>
> These 4 CTEs are then used to combine the data by joining them.
>
>
> 2. Problem
>
> The 4 CTEs are joined by left joins as shown below:
>
>
> from qup
> left join qli on (qli.curr_season=qup.curr_season and
> qli.curr_code=qup.curr_code and qli.ibitmask>0 and
> cardinality(qli.mat_arr) <=8)
> left join qin on (qin.curr_season=qup.curr_season and
> qin.curr_code=qup.curr_code and qin.ibitmask>0 and
> cardinality(qin.mat_arr) <=8)
> left join qou on (qou.curr_season=qup.curr_season and
> qou.curr_code=qup.curr_code and qou.ibitmask>0 and
> cardinality(qou.mat_arr) <=11)
> where qup.ibitmask>0 and cardinality(qup.mat_arr) <=21
>
> The plan first retrieves qup and qli, taking the estimated row counts of
> 1163 and 1147 respectively
>
>
> BUT the result is then hashed and the row count is estimated as 33!
>
>
> In a Left join the row count stays always the same as the one of left
> table (here qup with 1163 rows)
>
>
> The same algorithm which reduces the row estimation from 1163 to 33 is
> used in the next step to give an estimation of 1 row.
>
> This is totally wrong.
>
>
> Here is the execution plan of the query:
>
> (search the plan for rows=33)
>
>
>
> QUERY PLAN
> --
>  Append  (cost=13673.81..17463.30 rows=5734 width=104) (actual
> time=168.307..222.670 rows=9963 loops=1)
>CTE qup
>  ->  GroupAggregate  (cost=5231.22..6303.78 rows=10320 width=80)
> (actual time=35.466..68.131 rows=10735 loops=1)
>Group Key: sa_upper.sup_season, sa_upper.sup_sa_code
>->  Sort  (cost=5231.22..5358.64 rows=50969 width=18) (actual
> time=35.454..36.819 rows=50969 loops=1)
>  Sort Key: sa_upper.sup_season, sa_upper.sup_sa_code
> COLLATE "C"
>  Sort Method: quicksort  Memory: 4722kB
>  ->  Hash Left Join  (cost=41.71..1246.13 rows=50969
> width=18) (actual time=0.148..10.687 rows=50969 loops=1)
>Hash Cond: ((sa_upper.sup_mat_code)::text =
> upper_target.up_mat_code)
>->  Seq Scan on sa_upper  (cost=0.00..884.69
> rows=50969 width=16) (actual time=0.005..1.972 rows=50969 loops=1)
>->  Hash  (cost=35.53..35.53 rows=495 width=6)
> (a

Re: pg_upgrade - typo in verbose log

2023-08-21 Thread Daniel Gustafsson
> On 18 Aug 2023, at 02:47, Michael Paquier  wrote:
> 
> On Thu, Aug 17, 2023 at 06:09:31PM +1000, Peter Smith wrote:
>> Ping.
> 
> Funnily enough, I was looking at this entry yesterday, before you
> replied, and was wondering what's happening here.

It was a combination of summer vacation, doing CFM and looking at things for
v16, and some things were left further down on the TODO stack during this.

>> I thought v2 was ready to be pushed, but then this thread went silent
>> for 3 months
> 
> Change looks fine, so applied.

Agreed, and thanks for applying.

--
Daniel Gustafsson





Re: [PATCH] Add function to_oct

2023-08-21 Thread Dean Rasheed
On Sun, 20 Aug 2023 at 16:25, Nathan Bossart  wrote:
>
> On Sat, Aug 19, 2023 at 08:35:46AM +0100, Dean Rasheed wrote:
>
> > The way that negative inputs are handled really should be documented,
> > or at least it should include a couple of examples.
>
> I used your suggestion and noted that the output is the two's complement
> representation [0].
>

Hmm, I think just including the doc text update, without the examples
of positive and negative inputs, might not be sufficient to make the
meaning clear to everyone.

Something else that bothers me slightly is the function naming --
"hexadecimal" gets abbreviated to "hex", "octal" gets abbreviated to
"oct", but "binary" is left as-is. I think it ought to be "to_bin()"
on consistency grounds, even though I understand the words "to bin"
could be interpreted differently. (Looking elsewhere for precedents,
Python has bin(), oct() and hex() functions.)

Also, I think the convention is to always list functions
alphabetically, so to_oct() should really come after to_hex().

Regards,
Dean




Re: UUID v7

2023-08-21 Thread Andrey M. Borodin


> On 20 Aug 2023, at 23:56, Andrey M. Borodin  wrote:
> 
> 

I've observed that pre-generating and buffering random numbers makes UUID 
generation 10 times faster.

Without buffering
postgres=# with x as (select gen_uuid_v7() from generate_series(1,1e6)) select 
count(*) from x;
Time: 5286.572 ms (00:05.287)

With buffering
postgres=# with x as (select gen_uuid_v7() from generate_series(1,1e6)) select 
count(*) from x;
Time: 390.091 ms

This can speed up gen_random_uuid() on the same scale too. PFA implementation 
of this technique.


Best regards, Andrey Borodin.


v5-0001-Implement-UUID-v7-as-per-IETF-draft.patch
Description: Binary data


v5-0002-Buffer-random-numbers.patch
Description: Binary data


v5-0003-Use-cached-random-numbers-in-gen_random_uuid-too.patch
Description: Binary data


Re: [dsm] comment typo

2023-08-21 Thread Daniel Gustafsson
> On 18 Aug 2023, at 11:10, Junwang Zhao  wrote:
> 
> In the following sentence, I believe either 'the' or 'a' should be kept, not
> both. I here keep the 'the', but feel free to change.

>  * handle: The handle of an existing object, or for DSM_OP_CREATE, the
> - *a new handle the caller wants created.
> + *new handle the caller wants created.

Since the handle doesn't exist for DSM_OP_CREATE, both "a handle" and "the
handle" seems a tad misleading, how about "the identifier for the new handle the
caller wants created"?

--
Daniel Gustafsson





Optimize Arm64 crc32 implementation in PostgreSQL

2023-08-21 Thread Xiang Gao
Hi all,

Currently PostgreSQL has three different variants of a 32-bit CRC calculation:
CRC-32C, CRC-32 (Ethernet polynomial), and a legacy CRC-32 version that uses a
lookup table. Some ARMv8 (AArch64) CPUs implement the CRC32 extension, which is
equivalent to CRC-32 (Ethernet polynomial), so they can also benefit from
hardware acceleration.

May I propose a patch to optimize the crc32 calculation with Arm64-specific
instructions?

Any comments or feedback are welcome.


RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-21 Thread Zhijie Hou (Fujitsu)
On Monday, August 21, 2023 11:21 AM Amit Kapila  wrote:
> 
> On Sun, Aug 20, 2023 at 6:49 PM Masahiko Sawada
>  wrote:
> >
> > On Thu, Aug 17, 2023 at 10:31 PM Amit Kapila 
> wrote:
> > >
> > > >
> > > > Sorry I was not clear. I meant the logical replication slots that
> > > > are
> > > > *not* used by logical replication, i.e., are created manually and
> > > > used by third party tools that periodically consume decoded
> > > > changes. As we discussed before, these slots will never be able to
> > > > pass that confirmed_flush_lsn check.
> > > >
> > >
> > > I think normally one would have a background process to periodically
> > > consume changes. Won't one can use the walsender infrastructure for
> > > their plugins to consume changes probably by using replication
> > > protocol?
> >
> > Not sure.
> >
> 
> I think one can use Streaming Replication Protocol to achieve it [1].
> 
> > > Also, I feel it is the plugin author's responsibility to consume
> > > changes or advance slot to the required position before shutdown.
> >
> > How does the plugin author ensure that the slot consumes all WAL
> > records including shutdown_checkpoint before shutdown?
> >
> 
> By using "Streaming Replication Protocol" so that walsender can take care of 
> it.
> If not, I think users should drop such slots before the upgrade because 
> anyway,
> they won't be usable after the upgrade.

Yes, I think pglogical is one example: it starts a bgworker (apply worker) on
the client to consume changes, and it also uses the Streaming Replication
Protocol IIRC. And pg_recvlogical is another example, which connects to
walsender and consumes changes.

Best Regards,
Hou zj




Re: Fix typo in src/interfaces/libpq/po/zh_CN.po

2023-08-21 Thread Daniel Gustafsson
> On 19 Aug 2023, at 13:36, Zhang Mingli  wrote:
> 
>> On Aug 16, 2023, at 22:24, Peter Eisentraut  wrote:
>> 
>> On 16.08.23 09:34, Zhang Mingli wrote:
>>> The Chinese words there are ok,  but the `Unix-domian` should be 
>>> `Unix-domain`.
>> 
>> fixed, thanks
> 
> Hi, Peter, thanks and  just want to make sure that it is pushed?

This was fixed by Peter as mentioned upthread, but the translations are
maintained in its own Git repository so the commit is not visible in the main
Git repo.  Translations are synced with the main repo before releases.  The
commit can be seen here:

https://git.postgresql.org/gitweb/?p=pgtranslation/messages.git;a=commitdiff;h=14391f71ca61e90d52502093447fe1ee0080116f

--
Daniel Gustafsson





RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-21 Thread Zhijie Hou (Fujitsu)
On Friday, August 18, 2023 9:52 PM Kuroda, Hayato/黒田 隼人 
 wrote:
> 
> Dear Peter,
> 
> PSA new version patch set.

Thanks for updating the patch!
Here are few comments about 0003 patch.

1.

+check_for_lost_slots(ClusterInfo *cluster)
+{
+   int i,
+   ntups,
+   i_slotname;
+   PGresult   *res;
+   DbInfo *active_db = &cluster->dbarr.dbs[0];
+   PGconn *conn = connectToServer(cluster, active_db->db_name);
+ 
+   /* logical slots can be migrated since PG17. */
+   if (GET_MAJOR_VERSION(cluster->major_version) <= 1600)
+   return;

I think we should build connection after this check, otherwise the connection
may be left open after returning.


2.
+check_for_confirmed_flush_lsn(ClusterInfo *cluster)
+{
+   int i,
+   ntups,
+   i_slotname;
+   PGresult   *res;
+   DbInfo *active_db = &cluster->dbarr.dbs[0];
+   PGconn *conn = connectToServer(cluster, active_db->db_name);
+
+   /* logical slots can be migrated since PG17. */
+   if (GET_MAJOR_VERSION(cluster->major_version) <= 1600)
+   return;

Same as above.

3.
+   if (GET_MAJOR_VERSION(cluster->major_version) >= 17)
+   {
+   {

I think you mean 1700 here.


4.
+   p = strpbrk(p, "01234567890ABCDEF");
+
+   /*
+    * Upper and lower part of LSN must be read separately
+    * because it is reported as %X/%X format.
+    */
+   upper_lsn = strtoul(p, &slash, 16);
+   lower_lsn = strtoul(++slash, NULL, 16);

Maybe we'd better add a sanity check after strpbrk like "if (p == NULL ||
strlen(p) <= 1)" to be consistent with other similar code.

Best Regards,
Hou zj


Re: [dsm] comment typo

2023-08-21 Thread Junwang Zhao
On Mon, Aug 21, 2023 at 5:16 PM Daniel Gustafsson  wrote:
>
> > On 18 Aug 2023, at 11:10, Junwang Zhao  wrote:
> >
> > In the following sentence, I believe either 'the' or 'a' should be kept, not
> > both. I here keep the 'the', but feel free to change.
>
> >  * handle: The handle of an existing object, or for DSM_OP_CREATE, the
> > - *a new handle the caller wants created.
> > + *new handle the caller wants created.
>
> Since the handle doesn't exist for DSM_OP_CREATE, both "a handle" and "the
> handle" seems a tad misleading, how about "the identifier for the new handle 
> the
> caller wants created"?
>

Sounds great 👍

> --
> Daniel Gustafsson
>


-- 
Regards
Junwang Zhao




Re: Adding a LogicalRepWorker type field

2023-08-21 Thread Amit Kapila
On Mon, Aug 21, 2023 at 5:34 AM Peter Smith  wrote:
>
> On Fri, Aug 18, 2023 at 6:16 PM Zhijie Hou (Fujitsu)
>  wrote:
> >
> > On Friday, August 18, 2023 11:20 AM Amit Kapila  
> > wrote:
> > >
> > > On Mon, Aug 14, 2023 at 12:08 PM Peter Smith 
> > > wrote:
> > > >
> > > > The main patch for adding the worker type enum has been pushed [1].
> > > >
> > > > Here is the remaining (rebased) patch for changing some previous
> > > > cascading if/else to switch on the LogicalRepWorkerType enum instead.
> > > >
> > >
> > > I see this as being useful if we plan to add more worker types. Does 
> > > anyone else
> > > see this remaining patch as an improvement?
> >
> > +1
> >
> > I have one comment for the new error message.
> >
> > +   case WORKERTYPE_UNKNOWN:
> > +   ereport(ERROR, 
> > errmsg_internal("should_apply_changes_for_rel: Unknown worker type"));
> >
> > I think reporting an ERROR in this case is fine. However, I would suggest
> > refraining from mentioning the function name in the error message, as
> > recommended in the error style document [1]. Also, it appears we could use
> > elog() here.
> >
> > [1] https://www.postgresql.org/docs/devel/error-style-guide.html
>
> OK. Modified as suggested. Anyway, getting these errors should not
> even be possible, so I added a comment to emphasise that.
>
> PSA v9
>

LGTM. I'll push this tomorrow unless there are any more comments.

-- 
With Regards,
Amit Kapila.




Re: Oversight in reparameterize_path_by_child leading to executor crash

2023-08-21 Thread Richard Guo
On Sun, Aug 20, 2023 at 11:48 PM Tom Lane  wrote:

> I looked over the v3 patch.  I think it's going in generally
> the right direction, but there is a huge oversight:
> path_is_reparameterizable_by_child does not come close to modeling
> the success/failure conditions of reparameterize_path_by_child.
> In all the cases where reparameterize_path_by_child can recurse,
> path_is_reparameterizable_by_child has to do so also, to check
> whether the child path is reparameterizable.  (I'm somewhat
> disturbed that apparently we have no test cases that caught that
> oversight; can we add one cheaply?)


Thanks for pointing this out.  It's my oversight.  We have to check the
child path(s) recursively to tell if a path is reparameterizable or not.
I've fixed this in v4 patch, along with a test case.  In the test case
we have a MemoizePath whose subpath is TidPath, which is not
reparameterizable.  This test case would trigger Assert in v3 patch.


> BTW, I also note from the cfbot that 9e9931d2b broke this patch by
> adding more ADJUST_CHILD_ATTRS calls that need to be modified.
> I wonder if we could get away with having that macro cast to "void *"
> to avoid needing to change all its call sites.  I'm not sure whether
> pickier compilers might warn about doing it that way.


Agreed and have done that in v4 patch.

Thanks
Richard


v4-0001-Postpone-reparameterization-of-paths-until-when-creating-plans.patch
Description: Binary data


Re: Fix typo in src/interfaces/libpq/po/zh_CN.po

2023-08-21 Thread Zhang Mingli

> 
> This was fixed by Peter as mentioned upthread, but the translations are
> maintained in its own Git repository so the commit is not visible in the main
> Git repo.  Translations are synced with the main repo before releases.  The
> commit can be seen here:
> 
> https://git.postgresql.org/gitweb/?p=pgtranslation/messages.git;a=commitdiff;h=14391f71ca61e90d52502093447fe1ee0080116f
> 
> --
> Daniel Gustafsson
> 

Thanks, got it~

Zhang Mingli
HashData https://www.hashdata.xyz



Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500

2023-08-21 Thread Tomas Vondra
On 8/21/23 10:16, Jian Guo wrote:
> Hi hackers,
> 
> I found a new approach to fix this issue, which seems better, so I would
> like to post another version of the patch here. The origin patch made
> the assumption of the values of Vars from CTE must be unique, which
> could be very wrong. This patch examines variables for Vars inside CTE,
> which avoided the bad assumption, so the results could be much more
> accurate.
> 

No problem with posting a reworked patch to the same thread, but I'll
repeat my suggestion to register this in the CF app [1]. The benefit is
that people are more likely to notice the patch and also cfbot [2] will
run regression tests.

[1] https://commitfest.postgresql.org
[2] http://cfbot.cputube.org/


-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Extract numeric filed in JSONB more effectively

2023-08-21 Thread Andy Fan
>
>
> Interestingly, when I relabel both places, like this:
>
>  Oid   targetOid = fexpr->funcresulttype;
>  Const *target  = makeConst(
>OIDOID, -1, InvalidOid, sizeof(Oid),
>ObjectIdGetDatum(targetOid), false, true);
>  RelabelType *rTarget = makeRelabelType((Expr *)target,
>INTERNALOID, -1, InvalidOid, COERCE_IMPLICIT_CAST);
>  fexpr->funcid = new_func_id;
>  fexpr->args = opexpr->args;
>  fexpr->args = list_insert_nth(fexpr->args, 0, rTarget);
>  expr = (Expr *)makeRelabelType((Expr *)fexpr,
>targetOid, -1, InvalidOid, COERCE_IMPLICIT_CAST);
>}
>PG_RETURN_POINTER(expr);
>
> EXPLAIN looks like this:
>
>   Seq Scan on pg_temp.test_jsonb
> Output: jsonb_array_element_type(('23'::oid)::internal, test_json,
> 0), (test_json -> 0)
> Filter: (test_jsonb.json_type = 'scalarint'::text)
>
> With COERCE_IMPLICIT_CAST both places, the relabeling of the
> function result is invisible, but the relabeling of the argument
> is visible.
>
>
I think this is because get_rule_expr's showimplicit is always
true for args in this case. Checking the implementation of
get_rule_expr, I found PG behaves like this in many places.

-- 
Best Regards
Andy Fan


Re: postgres_fdw: wrong results with self join + enable_nestloop off

2023-08-21 Thread Etsuro Fujita
Hi,

On Sun, Aug 20, 2023 at 4:34 AM Andres Freund  wrote:
>
> Hi,
>
> On 2023-08-19 20:09:25 +0900, Etsuro Fujita wrote:
> > Maybe my explanation was not enough, so let me explain:
> >
> > * I think you could use the set_join_pathlist_hook hook as you like at
> > your own responsibility, but typical use cases of the hook that are
> > designed to support in the core system would be just add custom paths
> > for replacing joins with scans, as described in custom-scan.sgml (this
> > note is about set_rel_pathlist_hook, but it should also apply to
> > set_join_pathlist_hook):
> >
> > Although this hook function can be used to examine, modify, or remove
> > paths generated by the core system, a custom scan provider will 
> > typically
> > confine itself to generating CustomPath
> > objects and adding
> > them to rel using add_path.
>
> That supports citus' use more than not: "this hook function can be used to
> examine ... paths generated by the core system".
>
>
> > * The problem we had with the set_join_pathlist_hook hook is that in
> > such a typical use case, previously, if the replaced joins had any
> > pseudoconstant clauses, the planner would produce incorrect query
> > plans, due to the lack of support for handling such quals in
> > createplan.c.  We could fix the extensions side, as you proposed, but
> > the cause of the issue is 100% the planner's deficiency, so it would
> > be unreasonable to force the authors to do so, which would also go
> > against our policy of ABI compatibility.  So I fixed the core side, as
> > in the FDW case, so that extensions created for such a typical use
> > case, which I guess are the majority of the hook extensions, need not
> > be modified/recompiled.  I think it is unfortunate that that breaks
> > the use case of the Citus extension, though.
>
> I'm not neutral - I don't work on citus, but work in the same Unit as
> Onder. With that said: I don't think that's really a justification for
> breaking a pre-existing, not absurd, use case in a minor release.
>
> Except that this was only noticed after it was released in a set of minor
> versions, I would say that 6f80a8d9c should just straight up be reverted.
> Skimming the thread there wasn't really any analysis done about breaking
> extensions etc - and that ought to be done before a substantial semantics
> change in a somewhat commonly used hook.  I'm inclined to think that that
> might still be the right path.
>
>
> > BTW: commit 9e9931d2b removed the restriction on the call to the hook
> > extensions, so you might want to back-patch it.
>
> Citus is an extension, not a fork, there's not really a way to just backpatch
> a random commit.
>
>
> > Though, I think it would be better if the hook was well implemented from the
> > beginning.
>
> Sure, but that's neither here nor there.
>
> Greetings,
>
> Andres Freund




Re: postgres_fdw: wrong results with self join + enable_nestloop off

2023-08-21 Thread Etsuro Fujita
Sorry, I hit the send button by mistake.

On Sun, Aug 20, 2023 at 4:34 AM Andres Freund  wrote:
> On 2023-08-19 20:09:25 +0900, Etsuro Fujita wrote:
> > * The problem we had with the set_join_pathlist_hook hook is that in
> > such a typical use case, previously, if the replaced joins had any
> > pseudoconstant clauses, the planner would produce incorrect query
> > plans, due to the lack of support for handling such quals in
> > createplan.c.  We could fix the extensions side, as you proposed, but
> > the cause of the issue is 100% the planner's deficiency, so it would
> > be unreasonable to force the authors to do so, which would also go
> > against our policy of ABI compatibility.  So I fixed the core side, as
> > in the FDW case, so that extensions created for such a typical use
> > case, which I guess are the majority of the hook extensions, need not
> > be modified/recompiled.  I think it is unfortunate that that breaks
> > the use case of the Citus extension, though.
>
> I'm not neutral - I don't work on citus, but work in the same Unit as
> Onder. With that said: I don't think that's really a justification for
> breaking a pre-existing, not absurd, use case in a minor release.
>
> Except that this was only noticed after it was released in a set of minor
> versions, I would say that 6f80a8d9c should just straight up be reverted.
> Skimming the thread there wasn't really any analysis done about breaking
> extensions etc - and that ought to be done before a substantial semantics
> change in a somewhat commonly used hook.  I'm inclined to think that that
> might still be the right path.

I think you misread the thread; actually, we did an analysis and
applied a fix that would avoid ABI breakage (see the commit message
for 6f80a8d9c).  It turned out that that breaks the Citus extension,
though.

Also, this is not such a change; it is just an optimization
disablement.  Let me explain.  This is the commit message for
e7cb7ee14, which added the hook we are discussing:

Allow FDWs and custom scan providers to replace joins with scans.

Foreign data wrappers can use this capability for so-called "join
pushdown"; that is, instead of executing two separate foreign scans
and then joining the results locally, they can generate a path which
performs the join on the remote server and then is scanned locally.
This commit does not extend postgres_fdw to take advantage of this
capability; it just provides the infrastructure.

Custom scan providers can use this in a similar way.  Previously,
it was only possible for a custom scan provider to scan a single
relation.  Now, it can scan an entire join tree, provided of course
that it knows how to produce the same results that the join would
have produced if executed normally.

As described in the commit message, we assume that extensions use the
hook in a similar way to FDWs; if they do so, the restriction added by
6f80a8d9c just disables them from adding paths for join pushdown, making the
planner use paths involving local joins, so any breakage (other than
plan changes from custom joins to local joins) would never happen.

So my question is: does the Citus extension use the hook like this?
(Sorry, I do not fully understand Onder's explanation.)

> > BTW: commit 9e9931d2b removed the restriction on the call to the hook
> > extensions, so you might want to back-patch it.
>
> Citus is an extension, not a fork, there's not really a way to just backpatch
> a random commit.

Yeah, I was thinking that that would be your concern.

Best regards,
Etsuro Fujita




Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-08-21 Thread torikoshia

Since the v5 patch no longer applies, I've updated the patch.

On 2023-03-23 02:50, Andres Freund wrote:

I suggest adding a few more tests:

- COPY with a datatype error that can't be handled as a soft error


I didn't know the proper way to test this, but I found that the input
function widget_in() of the data type widget in regress.c is defined to
raise a hard error, and the attached patch adds a test using it.

--
Atsushi Torikoshi
NTT DATA CORPORATIONFrom de00c1555e0ee4a61565346946f4f3a4e851252c Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Mon, 21 Aug 2023 20:30:29 +0900
Subject: [PATCH v6] Add new COPY option IGNORE_DATATYPE_ERRORS

Currently the entire COPY fails even when a single row contains
unexpected data regarding data type or range.
IGNORE_DATATYPE_ERRORS ignores these errors, skips the affected rows,
and copies only the data which doesn't contain problems.

This patch uses the soft error handling infrastructure, which is
introduced by d9f7f5d32f20.

Author: Damir Belyalov, Atsushi Torikoshi
---
 doc/src/sgml/ref/copy.sgml   | 13 +
 src/backend/commands/copy.c  | 13 +
 src/backend/commands/copyfrom.c  | 37 
 src/backend/commands/copyfromparse.c | 19 +---
 src/bin/psql/tab-complete.c  |  3 +-
 src/include/commands/copy.h  |  1 +
 src/include/commands/copyfrom_internal.h |  3 ++
 src/test/regress/expected/copy2.out  | 28 ++
 src/test/regress/sql/copy2.sql   | 27 +
 9 files changed, 139 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/ref/copy.sgml b/doc/src/sgml/ref/copy.sgml
index 4d614a0225..d5cdbb4025 100644
--- a/doc/src/sgml/ref/copy.sgml
+++ b/doc/src/sgml/ref/copy.sgml
@@ -43,6 +43,7 @@ COPY { table_name [ ( column_name [, ...] ) | * }
 FORCE_NOT_NULL ( column_name [, ...] )
 FORCE_NULL ( column_name [, ...] )
+IGNORE_DATATYPE_ERRORS [ boolean ]
 ENCODING 'encoding_name'
 
  
@@ -370,6 +371,18 @@ COPY { table_name [ ( 

 
+   
+IGNORE_DATATYPE_ERRORS
+
+ 
+  Drops rows that contain malformed data while copying. These are rows
+  with columns where the data type's input-function raises an error.
+  This option is not allowed when using binary format.  Note that this
+  is only supported in current COPY syntax.
+ 
+
+   
+

 ENCODING
 
diff --git a/src/backend/commands/copy.c b/src/backend/commands/copy.c
index f14fae3308..beb73f5357 100644
--- a/src/backend/commands/copy.c
+++ b/src/backend/commands/copy.c
@@ -419,6 +419,7 @@ ProcessCopyOptions(ParseState *pstate,
 	bool		format_specified = false;
 	bool		freeze_specified = false;
 	bool		header_specified = false;
+	bool		ignore_datatype_errors_specified = false;
 	ListCell   *option;
 
 	/* Support external use for option sanity checking */
@@ -458,6 +459,13 @@ ProcessCopyOptions(ParseState *pstate,
 			freeze_specified = true;
 			opts_out->freeze = defGetBoolean(defel);
 		}
+		else if (strcmp(defel->defname, "ignore_datatype_errors") == 0)
+		{
+			if (ignore_datatype_errors_specified)
+errorConflictingDefElem(defel, pstate);
+			ignore_datatype_errors_specified = true;
+			opts_out->ignore_datatype_errors = defGetBoolean(defel);
+		}
 		else if (strcmp(defel->defname, "delimiter") == 0)
 		{
 			if (opts_out->delim)
@@ -594,6 +602,11 @@ ProcessCopyOptions(ParseState *pstate,
 (errcode(ERRCODE_SYNTAX_ERROR),
  errmsg("cannot specify DEFAULT in BINARY mode")));
 
+	if (opts_out->binary && opts_out->ignore_datatype_errors)
+		ereport(ERROR,
+(errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("cannot specify IGNORE_DATATYPE_ERRORS in BINARY mode")));
+
 	/* Set defaults for omitted options */
 	if (!opts_out->delim)
 		opts_out->delim = opts_out->csv_mode ? "," : "\t";
diff --git a/src/backend/commands/copyfrom.c b/src/backend/commands/copyfrom.c
index b47cb5c66d..853adb8414 100644
--- a/src/backend/commands/copyfrom.c
+++ b/src/backend/commands/copyfrom.c
@@ -752,6 +752,14 @@ CopyFrom(CopyFromState cstate)
 		ti_options |= TABLE_INSERT_FROZEN;
 	}
 
+	/* Set up soft error handler for IGNORE_DATATYPE_ERRORS */
+	if (cstate->opts.ignore_datatype_errors)
+	{
+		ErrorSaveContext escontext = {T_ErrorSaveContext};
+		escontext.details_wanted = true;
+		cstate->escontext = escontext;
+	}
+
 	/*
 	 * We need a ResultRelInfo so we can use the regular executor's
 	 * index-entry-making machinery.  (There used to be a huge amount of code
@@ -987,7 +995,36 @@ CopyFrom(CopyFromState cstate)
 
 		/* Directly store the values/nulls array in the slot */
 		if (!NextCopyFrom(cstate, econtext, myslot->tts_values, myslot->tts_isnull))
+		{
+			if (cstate->opts.ignore_datatype_errors &&
+cstate->ignored_error_count > 0)
+ereport(WARNING,
+		errmsg("%zd rows were skipped due to data type incompatibility",
+			cstate->ignored_error_count));
 			break;
+		}
+
+		/* Soft error occurred, skip this tuple and log the reason */
+		if (cstate->escontext.error_occurred)
+	

Re: Oversight in reparameterize_path_by_child leading to executor crash

2023-08-21 Thread Ashutosh Bapat
On Mon, Aug 21, 2023 at 4:09 PM Richard Guo  wrote:
>
>
> On Sun, Aug 20, 2023 at 11:48 PM Tom Lane  wrote:
>>
>> I looked over the v3 patch.  I think it's going in generally
>> the right direction, but there is a huge oversight:
>> path_is_reparameterizable_by_child does not come close to modeling
>> the success/failure conditions of reparameterize_path_by_child.
>> In all the cases where reparameterize_path_by_child can recurse,
>> path_is_reparameterizable_by_child has to do so also, to check
>> whether the child path is reparameterizable.  (I'm somewhat
>> disturbed that apparently we have no test cases that caught that
>> oversight; can we add one cheaply?)
>
>
> Thanks for pointing this out.  It's my oversight.  We have to check the
> child path(s) recursively to tell if a path is reparameterizable or not.
> I've fixed this in v4 patch, along with a test case.  In the test case
> we have a MemoizePath whose subpath is TidPath, which is not
> reparameterizable.  This test case would trigger Assert in v3 patch.

This goes back to my question about how we keep
path_is_reparameterizable_by_child() and
reparameterize_path_by_child() in sync with each other. This makes it
even harder to do so. One idea I have is to use the same function
(reparameterize_path_by_child()) in two modes: 1. actual
reparameterization, 2. assessing whether reparameterization is
possible. Essentially, combining reparameterize_path_by_child() and
path_is_reparameterizable_by_child(). The name of such a function can
be chosen appropriately. The mode will be controlled by a fourth
argument to the function. When assessing a path, no translations happen
and no extra memory is allocated.

-- 
Best Wishes,
Ashutosh Bapat




Reproducibility (Re: Remove distprep)

2023-08-21 Thread Christoph Berg
Re: Michael Paquier
> Is reproducibility something you've brought to a separate thread?
> FWIW, I'd be interested in improving this area for the in-core code,
> if need be.  (Not material for this thread, of course).

All the "normal" things like C compilation are actually already
reproducible.

The bit addressed by the mentioned patch is that the compiler flags
are recorded for later output by pg_config, and that includes
-ffile-prefix-map=/path/to/source=. which does improve C
reproducibility, but ironically is itself not reproducible when the
source is then compiled in a different directory. The patch simply
removes that flag from the information stored.

https://salsa.debian.org/postgresql/postgresql/-/blob/15/debian/patches/filter-debug-prefix-map

Not sure how much of that would be material for inclusion in PG.

This fix made PG 10 reproducible in Debian for about a week. Then LLVM
happened :D. The .bc files still record the build path, and so far no
one has found a way to prevent that:

https://tests.reproducible-builds.org/debian/rb-pkg/unstable/arm64/diffoscope-results/postgresql-15.html

Afaict that's the last part to be resolved (but it's been a while
since I checked). clang seems to have learned about -ffile-prefix-map=
in the meantime, that needs to be tested.

Christoph




Re: Support run-time partition pruning for hash join

2023-08-21 Thread Andy Fan
On Mon, Aug 21, 2023 at 11:48 AM Richard Guo  wrote:

> If we have a hash join with an Append node on the outer side, something
> like
>
>  Hash Join
>Hash Cond: (pt.a = t.a)
>->  Append
>  ->  Seq Scan on pt_p1 pt_1
>  ->  Seq Scan on pt_p2 pt_2
>  ->  Seq Scan on pt_p3 pt_3
>->  Hash
>  ->  Seq Scan on t
>
> We can actually prune those subnodes of the Append that cannot possibly
> contain any matching tuples from the other side of the join.  To do
> that, when building the Hash table, for each row from the inner side we
> can compute the minimum set of subnodes that can possibly match the join
> condition.  When we have built the Hash table and start to execute the
> Append node, we should already know which subnodes survived and thus
> can skip other subnodes.
>

This feature looks good, but is it possible to know whether we can prune
any subnodes before we pay the extra effort (building the hash table,
the per-row computation, and so on)?  IIUC, it seems not.  If so, I think
this area needs more attention. I can't provide any good suggestions yet.

Maybe, at least, if we find that no subnodes can be skipped
during the hashing, we can stop doing such work.

There are several points that need more consideration.
>
> 1. All the join partition pruning decisions are made in createplan.c
>where the best path tree has been decided.  This is not great.  Maybe
>it's better to make it happen when we build up the path tree, so that
> we can take the partition pruning into consideration when estimating
>the costs.
>

fwiw, the current master totally ignores the cost reduction from run-time
partition pruning, even for init partition pruning.  So in some real cases,
pg chooses a hash join just because the cost of a nestloop join is
highly overestimated.

4. Is it possible and worthwhile to extend the join partition pruning
>mechanism to support nestloop and mergejoin also?
>

As far as I know, we have to build the inner table first for this
optimization, so hash join and sort-merge should be OK, but nestloop should
be impossible unless I missed something.

-- 
Best Regards
Andy Fan


RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-21 Thread Hayato Kuroda (Fujitsu)
Dear Amit,

Thank you for giving comments! PSA new version patch set.

> 1.
> + ALTER
> SUBSCRIPTION ... DISABLE.
> + After the upgrade is complete, execute the
> + ALTER SUBSCRIPTION ... CONNECTION
> command to update the
> + connection string, and then re-enable the subscription.
> 
> Why does one need to update the connection string?

I wrote it like that because the old and new port numbers can be different. But
you are partially right - it is not always needed. Updated to clarify that.

> 2.
> + /*
> + * Checking for logical slots must be done before
> + * check_new_cluster_is_empty() because the slot_arr attribute of the
> + * new_cluster will be checked in that function.
> + */
> + if (count_logical_slots(&old_cluster))
> + {
> + get_logical_slot_infos(&new_cluster, false);
> + check_for_logical_replication_slots(&new_cluster);
> + }
> +
>   check_new_cluster_is_empty();
> 
> Can't we simplify this checking by simply querying
> pg_replication_slots for any usable slot something similar to what we
> are doing in check_for_prepared_transactions()? We can add this check
> in the function check_for_logical_replication_slots().

Some checks were added to check_for_logical_replication_slots(), and
get_logical_slot_infos() for new_cluster was removed as you said.

But get_logical_slot_infos() cannot be removed completely, because the old
cluster has already been shut down when the new cluster is checked. We must
keep the information about the old cluster in memory.

Note that the existence of slots is now checked in all cases because such slots
could not be used after the upgrade.

check_new_cluster_is_empty() no longer checks logical slots, so all changes to
this function were reverted.

> Also, do we
> need a count function, or instead can we have a simple function like
> is_logical_slot_present() where we return even if there is one slot
> 

I think this is still needed, because max_replication_slots and the number
of existing replication slots must be compared.

Of course we could add another simple function like
is_logical_slot_present_on_old_cluster() and use it in main(), but I'm not sure
that defining several similar functions is a good idea.

> Apart from this, (a) I have made a few changes (changed comments) in
> patch 0001 as shared in the email [1]; (b) some modifications in the
> docs as you can see in the attached. Please include those changes in
> the next version if you think they are okay.

I checked and your modification seems nice. 

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v23-0001-Always-persist-to-disk-logical-slots-during-a-sh.patch
Description:  v23-0001-Always-persist-to-disk-logical-slots-during-a-sh.patch


v23-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patch
Description:  v23-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patch


v23-0003-pg_upgrade-Add-check-function-for-logical-replic.patch
Description:  v23-0003-pg_upgrade-Add-check-function-for-logical-replic.patch


RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-21 Thread Hayato Kuroda (Fujitsu)
Dear Sawada-san,

Thank you for reviewing! New patch set can be available in [1].

> 
> 0001:
> 
> Do we need regression tests to make sure that the slot's
> confirmed_flush_lsn matches the LSN of the latest shutdown_checkpoint
> record?

Added. I wondered about the location of the test, but put it in
test_decoding/t/002_always_persist.pl.

> 0002:
> 
> +   
> +Prepare for publisher upgrades
> +
> 
> Should this step be done before "8. Stop both servers" as it might
> require to disable subscriptions and to drop 'lost' replication slots?

Right, moved.

> Why is there no explanation about the slots' confirmed_flush_lsn check
> as prerequisites?

Added.

[1]: 
https://www.postgresql.org/message-id/TYCPR01MB5870B5C0FE0C61CD04CBD719F51EA%40TYCPR01MB5870.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-21 Thread Hayato Kuroda (Fujitsu)
Dear Peter,

Thank you for reviewing! The patch can be available in [1].

> Commit Message
> 
> 1.
> This commit allows nodes with logical replication slots to be upgraded. While
> reading information from the old cluster, a list of logical replication slots 
> is
> newly extracted. At the later part of upgrading, pg_upgrade revisits the list
> and restores slots by using the pg_create_logical_replication_slots() on the 
> new
> clushter.
> 
> ~
> 
> 1a
> /is newly extracted/is fetched/

Fixed.

> 1b.
> /using the pg_create_logical_replication_slots()/executing
> pg_create_logical_replication_slots()/

Fixed.

> 1c.
> /clushter/cluster/

Fixed.

> 2.
> Note that it must be done after the final pg_resetwal command during the 
> upgrade
> because pg_resetwal will remove WALs that are required by the slots. Due to 
> the
> restriction, the timing of restoring replication slots is different from other
> objects.
> 
> ~
> 
> 2a.
> /it must/slot restoration/

You meant to say s/it must/slot restoration must/, right? Fixed.

> 2b.
> /the restriction/this restriction/
> 
> ==
> doc/src/sgml/ref/pgupgrade.sgml
> 
> 3.
> +
> + pg_upgrade attempts to migrate logical
> + replication slots. This helps avoid the need for manually defining the
> + same replication slot on the new publisher.
> +
> 
> /same replication slot/same replication slots/

Fixed.

> 4.
> +
> + Before you start upgrading the publisher cluster, ensure that the
> + subscription is temporarily disabled, by executing
> + ALTER
> SUBSCRIPTION ... DISABLE.
> + After the upgrade is complete, execute the
> + ALTER SUBSCRIPTION ... CONNECTION
> command to update the
> + connection string, and then re-enable the subscription.
> +
> 
> On the rendered page, it looks a bit strange that DISABLE has a link
> but CONNECTION does not have a link.

Added.

> 5.
> +
> + There are some prerequisites for
> pg_upgrade to
> + be able to upgrade the replication slots. If these are not met an error
> + will be reported.
> +
> +
> +
> 
> +1 to use all the itemizedlist changes that Amit suggested [1] in his
> attachment.

Yeah, I agreed it is nice. Applied.

> src/bin/pg_upgrade/check.c
> 
> 6.
> +static void check_for_logical_replication_slots(ClusterInfo *new_cluster);
> 
> IMO the arg name should not shadow a global with the same name. See
> other review comment for this function signature.

OK, fixed.

> 7.
> + /* Extract a list of logical replication slots */
> + get_logical_slot_infos(&old_cluster, live_check);
> 
> But 'live_check' is never used?

It is needed for 0003, moved.

> 8. check_for_logical_replication_slots
> +
> +/*
> + * Verify the parameter settings necessary for creating logical replication
> + * slots.
> + */
> +static void
> +check_for_logical_replication_slots(ClusterInfo *new_cluster)
> 
> IMO the arg name should not shadow a global with the same name. If
> this is never going to be called with any param other than
> &new_cluster then probably it is better not even to pass have that
> argument at all. Just refer to the global new_cluster inside the
> function.
> 
> You can't say that 'check_for_new_tablespace_dir' does it already so
> it must be OK -- I think that the existing function has the same issue
> and it also ought to be fixed to avoid shadowing!

Fixed.

> 9. check_for_logical_replication_slots
> 
> + /* logical replication slots can be migrated since PG17. */
> + if (GET_MAJOR_VERSION(new_cluster->major_version) <= 1600)
> + return;
> 
> IMO the code matches the comment better if you say < 1700 instead of <= 1600.

Changed.

> src/bin/pg_upgrade/function.c
> 
> 10. get_loadable_libraries
>   /*
> - * Fetch all libraries containing non-built-in C functions in this DB.
> + * Fetch all libraries containing non-built-in C functions or referred
> + * by logical replication slots in this DB.
>   */
>   ress[dbnum] = executeQueryOrDie(conn,
> ~
> 
> /referred by/referred to by/

Fixed.

> src/bin/pg_upgrade/info.c
> 
> 11.
> +/*
> + * get_logical_slot_infos()
> + *
> + * Higher level routine to generate LogicalSlotInfoArr for all databases.
> + */
> +void
> +get_logical_slot_infos(ClusterInfo *cluster, bool live_check)
> +{
> + int dbnum;
> + int slot_count = 0;
> +
> + if (cluster == &old_cluster)
> + pg_log(PG_VERBOSE, "\nsource databases:");
> + else
> + pg_log(PG_VERBOSE, "\ntarget databases:");
> +
> + for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
> + {
> + DbInfo*pDbInfo = &cluster->dbarr.dbs[dbnum];
> +
> + get_logical_slot_infos_per_db(cluster, pDbInfo);
> + slot_count += pDbInfo->slot_arr.nslots;
> +
> + if (log_opts.verbose)
> + {
> + pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
> + print_slot_infos(&pDbInfo->slot_arr);
> + }
> + }
> +}
> +
> 
> 11a.
> Now the variable 'slot_count' is no longer being returned so it seems 
> redundant.
> 
> ~
> 
> 11b.
> What is the 'live_check' parameter for? Nobody is using it.

These are needed f

Re: persist logical slots to disk during shutdown checkpoint

2023-08-21 Thread Ashutosh Bapat
On Sun, Aug 20, 2023 at 8:40 AM Amit Kapila  wrote:
>
> On Sat, Aug 19, 2023 at 12:46 PM Julien Rouhaud  wrote:
> >
> > On Sat, 19 Aug 2023, 14:16 Amit Kapila,  wrote:
> >>
> >
> >> The idea discussed in the thread [1] is to always persist logical
> >> slots to disk during the shutdown checkpoint. I have extracted the
> >> patch to achieve the same from that thread and attached it here. This
> >> could lead to some overhead during shutdown (checkpoint) if there are
> >> many slots but it is probably a one-time work.
> >>
> >> I couldn't think of better ideas but another possibility is to mark
> >> the slot as dirty when we update the confirm_flush LSN (see
> >> LogicalConfirmReceivedLocation()). However, that would be a bigger
> >> overhead in the running server as it could be a frequent operation and
> >> could lead to more writes.
> >
> >
> > Yeah I didn't find any better option either at that time. I still think 
> > that forcing persistence on shutdown is the best compromise. If we tried to 
> > always mark the slot as dirty, we would be sure to add regular overhead but 
> > we would probably end up persisting the slot on disk on shutdown anyway 
> > most of the time, so I don't think it would be a good compromise.
> >
>
> The other possibility is that we introduce yet another dirty flag for
> slots, say dirty_for_shutdown_checkpoint which will be set when we
> update confirmed_flush LSN. The flag will be cleared each time we
> persist the slot but we won't persist if only this flag is set. We can
> then use it during the shutdown checkpoint to decide whether to
> persist the slot.

There are already two booleans controlling the dirtiness of a slot, dirty
and just_dirty. Adding a third will create more confusion.

Another idea is to record the confirm_flush_lsn at the time of
persisting the slot. We can use it in two different ways: 1. to mark a
slot dirty and persist it if the confirm_flush_lsn recorded at the last
persist is too far from the slot's current confirm_flush_lsn;
2. at shutdown checkpoint, to persist all the slots whose two
confirm_flush_lsns differ.

-- 
Best Wishes,
Ashutosh Bapat




RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-21 Thread Hayato Kuroda (Fujitsu)
Dear Hou,

Thank you for reviewing! The patch can be available in [1].

> 1.
> 
> +check_for_lost_slots(ClusterInfo *cluster)
> +{
> + int i,
> + ntups,
> + i_slotname;
> + PGresult   *res;
> + DbInfo *active_db = &cluster->dbarr.dbs[0];
> + PGconn *conn = connectToServer(cluster, active_db->db_name);
> +
> + /* logical slots can be migrated since PG17. */
> + if (GET_MAJOR_VERSION(cluster->major_version) <= 1600)
> + return;
> 
> I think we should build connection after this check, otherwise the connection
> may be left open after returning.

Fixed.

> 2.
> +check_for_confirmed_flush_lsn(ClusterInfo *cluster)
> +{
> + int i,
> + ntups,
> + i_slotname;
> + PGresult   *res;
> + DbInfo *active_db = &cluster->dbarr.dbs[0];
> + PGconn *conn = connectToServer(cluster, active_db->db_name);
> +
> + /* logical slots can be migrated since PG17. */
> + if (GET_MAJOR_VERSION(cluster->major_version) <= 1600)
> + return;
> 
> Same as above.

Fixed.

> 3.
> + if
> (GET_MAJOR_VERSION(cluster->major_version) >= 17)
> + {
> 
> I think you mean 1700 here.

Right, fixed.

> 4.
> + p = strpbrk(p,
> "01234567890ABCDEF");
> +
> + /*
> +  * Upper and lower part of LSN must
> be read separately
> +  * because it is reported as %X/%X
> format.
> +  */
> + upper_lsn = strtoul(p, &slash, 16);
> + lower_lsn = strtoul(++slash, NULL,
> 16);
> 
> Maybe we'd better add a sanity check after strpbrk like "if (p == NULL ||
> strlen(p) <= 1)" to be consistent with other similar code.

Added.

[1]: 
https://www.postgresql.org/message-id/TYCPR01MB5870B5C0FE0C61CD04CBD719F51EA%40TYCPR01MB5870.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: Logging of matching pg_hba.conf entry during auth skips trust auth, potential security issue

2023-08-21 Thread Robert Haas
On Sun, Aug 20, 2023 at 7:58 PM Michael Paquier  wrote:
> Attached is a v3 to do these two things, with adjustments for two SSL
> tests.  Any objections about it?

+ * No authentication identity was set; this happens e.g. when the
+ * trust method is in use.  For audit purposes, log a breadcrumb to
+ * explain where in the HBA this happened.

Proposed rewrite: "Normally, if log_connections is set, the call to
set_authn_id will log the connection. However, if that function is
never called, perhaps because the trust method is in use, then we
handle the logging here instead."

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: BUG #18059: Unexpected error 25001 in stored procedure

2023-08-21 Thread Robert Haas
On Sat, Aug 19, 2023 at 1:19 PM Tom Lane  wrote:
> What I'm inclined to propose, therefore, is that we make revalidation
> be a no-op for every statement type for which transformStmt() reaches
> its default: case.  (When it does so, the resulting CMD_UTILITY Query
> will not get any processing from the rewriter or planner either.)
> That gives us this list of statements requiring revalidation:
>
> case T_InsertStmt:
> case T_DeleteStmt:
> case T_UpdateStmt:
> case T_MergeStmt:
> case T_SelectStmt:
> case T_ReturnStmt:
> case T_PLAssignStmt:
> case T_DeclareCursorStmt:
> case T_ExplainStmt:
> case T_CreateTableAsStmt:
> case T_CallStmt:

That sounds like the right thing. It is perhaps unfortunate that we
don't have a proper parse analysis/execution distinction for other
types of statements, but if that ever changes then this can be
revisited.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: postgres_fdw: wrong results with self join + enable_nestloop off

2023-08-21 Thread Önder Kalacı
Hi,

Thanks for the explanation.

> As described in the commit message, we assume that extensions use the
> hook in a similar way to FDWs

I'm not sure if it is fair to assume that extensions use any hook in any
particular way.

> So my question is: does the Citus extension use the hook like this?
> (Sorry, I do not fully understand Onder's explanation.)

I haven't gone into detail about how Citus uses this hook, but I don't
think we should need to explain it. In general, Citus uses many hooks,
and many other extensions use this specific hook. With minor version
upgrades, we haven't seen this kind of behavior change before.

In general, Citus relies on this hook for collecting information about
joins across relations/ctes/subqueries. So, its scope is bigger than a
single join for Citus.

The extension assigns special marker(s) to RTE relations, and then checks
whether all relations carrying these markers are joined transitively
across subqueries, so that it can decide to push down the whole (sub)query
or some parts of it.

I must admit, I have not yet looked into whether we can fix the problem
within the extension. Maybe we can, maybe not.

But the bigger issue is that there has usually been a clear line between
extensions and PostgreSQL itself when it comes to changing hooks in minor
version upgrades. Sadly, this change breaks that line. We wanted to share
our worries here and find out what others think.

> Except that this was only noticed after it was released in a set of minor
> versions, I would say that 6f80a8d9c should just straight up be reverted.


I cannot be the one to decide on reverting a commit in PostgreSQL, but I
think doing so would be a fair action. We kindly ask those who handle this
to consider it.

Thanks,
Onder


Re: meson: pgxs Makefile.global differences

2023-08-21 Thread Andrew Dunstan


On 2023-08-17 Th 16:51, Andres Freund wrote:

Hi,

On 2023-08-17 14:45:54 -0500, Tristan Partin wrote:

On Thu Aug 17, 2023 at 2:32 PM CDT, Andrew Dunstan wrote:

I started digging into a warning I noticed on my FDW builds where
Postgres is built with meson, 
e.g.
which has this:

cc1: warning: ‘-Wformat-security’ ignored without ‘-Wformat’
[-Wformat-security]

I found that the pgxs Makefile.global built under meson is a bit
different. On debug builds for both this is what I get on HEAD (meson)
and REL_15_STABLE (autoconf), stripped of the current components:

I assume "current" means the flags that are present in both cases?



   HEAD: CFLAGS =-Wshadow=compatible-local
REL_15_STABLE: CFLAGS =-Wall  -g

The warning is apparently due to the missing -Wall.

Shouldn't we be aiming for pretty much identical settings?

The difference for -Wshadow=compatible-local is due to changes between 15 and
HEAD.

We're indeed not adding -Wall right now (the warning level is handled by
meson, so it doesn't show up in our cflags right now).



I agree that they should be identical. The meson build should definitely be
aiming for 100% compatibility for the Makefile.global.

I don't think that's feasible. It was a fair bit of work to get the most
important contents to match, while skipping lots of things that are primarily
relevant for building the server (which isn't relevant for pgxs).

That said, in this specific case, I agree, we should likely emit -Wall to
Makefile.global in meson as well.



Where should we do that? And how about the -g that's also missing for 
debug-enabled builds?



cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: meson: pgxs Makefile.global differences

2023-08-21 Thread Peter Eisentraut

On 21.08.23 17:33, Andrew Dunstan wrote:
Where should we do that? And how about the -g that's also missing for 
debug-enabled builds?


I think it's the options in these two tables that meson handles 
internally and that we should explicitly reproduce for Makefile.global:


https://mesonbuild.com/Builtin-options.html#details-for-buildtype
https://mesonbuild.com/Builtin-options.html#details-for-warning_level





Re: meson: pgxs Makefile.global differences

2023-08-21 Thread Tristan Partin

On Mon Aug 21, 2023 at 10:33 AM CDT, Andrew Dunstan wrote:


On 2023-08-17 Th 16:51, Andres Freund wrote:
> Hi,
>
> On 2023-08-17 14:45:54 -0500, Tristan Partin wrote:
>> On Thu Aug 17, 2023 at 2:32 PM CDT, Andrew Dunstan wrote:
>>> I started digging into a warning I noticed on my FDW builds where
>>> Postgres is built with meson, 
e.g.
>>> which has this:
>>>
>>> cc1: warning: ‘-Wformat-security’ ignored without ‘-Wformat’
>>> [-Wformat-security]
>>>
>>> I found that the pgxs Makefile.global built under meson is a bit
>>> different. On debug builds for both this is what I get on HEAD (meson)
>>> and REL_15_STABLE (autoconf), stripped of the current components:
> I assume "current" means the flags that are present in both cases?
>
>
>>>    HEAD: CFLAGS =-Wshadow=compatible-local
>>> REL_15_STABLE: CFLAGS =-Wall  -g
>>>
>>> The warning is apparently due to the missing -Wall.
>>>
>>> Shouldn't we be aiming for pretty much identical settings?
> The difference for -Wshadow=compatible-local is due to changes between 15 and
> HEAD.
>
> We're indeed not adding -Wall right now (the warning level is handled by
> meson, so it doesn't show up in our cflags right now).
>
>
>> I agree that they should be identical. The meson build should definitely be
>> aiming for 100% compatibility for the Makefile.global.
> I don't think that's feasible. It was a fair bit of work to get the most
> important contents to match, while skipping lots of things that are primarily
> relevant for building the server (which isn't relevant for pgxs).
>
> That said, in this specific case, I agree, we should likely emit -Wall to
> Makefile.global in meson as well.
>

Where should we do that? And how about the -g that's also missing for 
debug-enabled builds?


Look in src/makefiles/meson.build. You will see a line like
'CFLAGS': var_cflags. You probably want to do something like:

pgxs_cflags = var_cflags + cc.get_supported_arguments('-Wxxx')
if get_option('debug')
  # Populate for debug flags that aren't -g
  debug_flags = {}

  pgxs_cflags += debug_flags.get(cc.get_id(),
    cc.get_supported_arguments('-g'))
endif

...
CFLAGS: pgxs_cflags,
...

--
Tristan Partin
Neon (https://neon.tech)




Re: Make all Perl warnings fatal

2023-08-21 Thread Andrew Dunstan


On 2023-08-21 Mo 02:20, Peter Eisentraut wrote:
To avoid a complete bloodbath on cfbot, here is an updated patch set 
that includes a workaround for the getprotobyname() issue mentioned 
below.



On 10.08.23 07:58, Peter Eisentraut wrote:
We have a lot of Perl scripts in the tree, mostly code generation and 
TAP tests.  Occasionally, these scripts produce warnings.  These are 
AFAICT always mistakes on the developer side (true positives).  
Typical examples are warnings from genbki.pl or related when you make 
a mess in the catalog files during development, or warnings from 
tests when they massage a config file that looks different on 
different hosts, or mistakes during merges (e.g., duplicate 
subroutine definitions), or just mistakes that weren't noticed, 
because, you know, there is a lot of output in a verbose build.


I wanted to figure out if we can catch these more reliably, in the 
style of -Werror.  AFAICT, there is no way to automatically turn all 
warnings into fatal errors.  But there is a way to do it per script, 
by replacing


 use warnings;

by

 use warnings FATAL => 'all';

See attached patch to try it out.

The documentation at 
 appears to sort of 
hand-wave against doing that.  Their argument appears to be something 
like, the modules you use might in the future produce additional 
warnings, thus breaking your scripts.  On balance, I'd take that 
risk, if it means I would notice the warnings in a more timely and 
robust way.  But that's just me at a moment in time.


Thoughts?



It's not really the same as -Werror, because many warnings can be 
generated at runtime rather than compile-time.


Still, I guess that might not matter too much since apart from plperl we 
only use perl for building / testing.


Regarding the dangers mentioned, I guess we can undo it if it proves a 
nuisance.


+1 to getting rid of the unnecessary call to getprotobyname().


cheers


andrew

--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: [17] Special search_path names "!pg_temp" and "!pg_catalog"

2023-08-21 Thread Jeff Davis
On Sat, 2023-08-19 at 07:18 +0200, Pavel Stehule wrote:
> cannot be better special syntax
> 
> CREATE OR REPLACE FUNCTION xxx()
> RETURNS yyy AS $$ ... $$$
> SET SEARCH_PATH DISABLE
> 
> with possible next modification
> 
> SET SEARCH_PATH CATALOG .. only for pg_catalog
> SET SEARCH_PATH MINIMAL .. pg_catalog, pg_temp

I agree that we should consider new syntax, and there's a related
discussion here:

https://www.postgresql.org/message-id/flat/2710f56add351a1ed553efb677408e51b060e67c.ca...@j-davis.com

Regardless, even with syntax changes, we need something to print when
someone does a "SHOW search_path", i.e. some representation that
indicates pg_temp is excluded. That way it can also be saved and
restored.

> I question if we should block search path settings when this setting
> is used. Although I set search_path, the search_path can be
> overwritten in function of inside some nesting calls 

If so, that should be a separate feature. For the purposes of this
thread, we just need a way to represent a search path that excludes
pg_temp and/or pg_catalog.

Regards,
Jeff Davis





Re: Logging of matching pg_hba.conf entry during auth skips trust auth, potential security issue

2023-08-21 Thread Jacob Champion
On Sun, Aug 20, 2023 at 4:58 PM Michael Paquier  wrote:
> Attached is a v3 to do these two things, with adjustments for two SSL
> tests.  Any objections about it?

(Sorry for the long weekend delay.) No objections; you may want to
adjust the comment above the test block in t/001_password.pl, as well.

I will ask -- more as a rhetorical question than something to resolve
for this patch, since the topic is going to come back with a vengeance
for OAuth -- what purpose the consistency here is serving. If the OP
wants to notice when a connection that should be using strong
authentication is not, is it helpful to make that connection "look the
same" in the logs? I understand we've been carrying the language
"trust authentication method" for a long time, but is that really the
only hang-up, or would there be pushback if I tried to change that
too, sometime in the future?

Thanks,
--Jacob




Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

2023-08-21 Thread Robert Haas
On Wed, Aug 16, 2023 at 1:45 PM Jeff Davis  wrote:
> On Wed, 2023-08-16 at 08:51 +0200, Peter Eisentraut wrote:
> > On 12.08.23 04:35, Jeff Davis wrote:
> > > The attached patch implements a new SEARCH clause for CREATE
> > > FUNCTION.
> > > The SEARCH clause controls the search_path used when executing
> > > functions that were created without a SET clause.
> >
> > I don't understand this.  This adds a new option for cases where the
> > existing option wasn't specified.  Why not specify the existing
> > option
> > then?  Is it not good enough?  Can we improve it?
>
> SET search_path = '...' not good enough in my opinion.
>
> 1. Not specifying a SET clause falls back to the session's search_path,
> which is a bad default because it leads to all kinds of inconsistent
> behavior and security concerns.
>
> 2. There's no way to explicitly request that you'd actually like to use
> the session's search_path, so it makes it very hard to ever change the
> default.
>
> 3. It's user-unfriendly. A safe search_path that would be suitable for
> most functions is "SET search_path = pg_catalog, pg_temp", which is
> arcane, and requires some explanation.
>
> 4. search_path for the session is conceptually different than for a
> function. A session should be context-sensitive and the same query
> should (quite reasonably) behave differently for different sessions and
> users to sort out things like object name conflicts, etc. A function
> should (ordinarily) be context-insensitive, especially when used in
> something like an index expression or constraint. Having different
> syntax helps separate those concepts.
>
> 5. There's no way to prevent pg_temp from being included in the
> search_path. This is separately fixable, but having the proposed SEARCH
> syntax is likely to make for a better user experience in the common
> cases.
>
> I'm open to suggestion about other ways to improve it, but SEARCH is
> what I came up with.

The one thing that I really like about your proposal is that you
explicitly included a way of specifying that the prevailing
search_path should be used. If we move to any kind of a system where
the default behavior is something other than that, then we need that
as an option. Another, related thing that I recently discovered would
be useful is a way to say "I'd like to switch the search_path to X,
but I'd also like to discover what the prevailing search_path was just
before entering this function." For example, if I have a function that
is SECURITY DEFINER which takes some executable code as an input, I
might want to arrange to eventually execute that code with the
caller's user ID and search_path, but I can't discover the caller's
search_path unless I don't set it, and that's a dangerous thing to do.

However, my overall concern here is that this feels like it's
reinventing the wheel. We already have a way of setting search_path;
this gives us a second one. If we had no existing mechanism for that,
I think this would definitely be an improvement, and quite possibly
better than the current mechanism. But given that we had a mechanism
already, if we added this, we'd then have two, which seems like the
wrong number.

I'm inclined to think that if there are semantics that we currently
lack, we should think of extending the current syntax to support them.
Right now you can SET search_path = 'specific value' or SET
search_path FROM CURRENT or leave it out. We could introduce a new way
of spelling "leave it out," like RESET search_path or whatever. We
could introduce a new setting that doesn't set the search_path at all
but reverts to the old value on function exit, like SET search_path
USING CALL or whatever. And we could think of making SET search_path
FROM CURRENT or any new semantics we introduce the default in a future
release, or even make the default behavior depend on an evil
behavior-changing GUC as you proposed. I'm not quite sure what we
should do here conceptually, but I don't see why having a completely
new syntax for doing it really helps.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [PATCH] Add function to_oct

2023-08-21 Thread Nathan Bossart
On Mon, Aug 21, 2023 at 09:31:37AM +0100, Dean Rasheed wrote:
> Hmm, I think just including the doc text update, without the examples
> of positive and negative inputs, might not be sufficient to make the
> meaning clear to everyone.

I added some examples for negative inputs.

> Something else that bothers me slightly is the function naming --
> "hexadecimal" gets abbreviated to "hex", "octal" gets abbreviated to
> "oct", but "binary" is left as-is. I think it ought to be "to_bin()"
> on consistency grounds, even though I understand the words "to bin"
> could be interpreted differently. (Looking elsewhere for precedents,
> Python has bin(), oct() and hex() functions.)

I renamed it to to_bin().

> Also, I think the convention is to always list functions
> alphabetically, so to_oct() should really come after to_hex().

I reordered the functions in the docs.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 131ee7fcfc62040572425cf1a090c0a6967d7559 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Mon, 21 Aug 2023 11:29:51 -0700
Subject: [PATCH v9 1/1] add to_bin() and to_oct()

---
 doc/src/sgml/func.sgml| 59 +-
 src/backend/utils/adt/varlena.c   | 86 +++
 src/include/catalog/pg_proc.dat   | 12 
 src/test/regress/expected/strings.out | 62 ++-
 src/test/regress/sql/strings.sql  | 15 -
 5 files changed, 204 insertions(+), 30 deletions(-)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index be2f54c914..7a0d4b9134 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3737,6 +3737,32 @@ repeat('Pg', 4) PgPgPgPg

   
 
+  
+   
+
+ to_bin
+
+to_bin ( integer )
+text
+   
+   
+to_bin ( bigint )
+text
+   
+   
+Converts the number to its equivalent two's complement binary
+representation.
+   
+   
+to_bin(2147483647)
+1111111111111111111111111111111
+   
+   
+to_bin(-1234)
+11111111111111111111101100101110
+   
+  
+
   

 
@@ -3750,11 +3776,42 @@ repeat('Pg', 4) PgPgPgPg
 text


-Converts the number to its equivalent hexadecimal representation.
+Converts the number to its equivalent two's complement hexadecimal
+representation.


 to_hex(2147483647)
 7fffffff
+   
+   
+to_hex(-1234)
+fffffb2e
+   
+  
+
+  
+   
+
+ to_oct
+
+to_oct ( integer )
+text
+   
+   
+to_oct ( bigint )
+text
+   
+   
+Converts the number to its equivalent two's complement octal
+representation.
+   
+   
+to_oct(2147483647)
+17777777777
+   
+   
+to_oct(-1234)
+37777775456

   
 
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index b1ec5c32ce..72e1e24fe0 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -4919,53 +4919,87 @@ array_to_text_internal(FunctionCallInfo fcinfo, ArrayType *v,
 	return result;
 }
 
-#define HEXBASE 16
 /*
- * Convert an int32 to a string containing a base 16 (hex) representation of
- * the number.
+ * Workhorse for to_bin, to_oct, and to_hex.  Note that base must be > 1 and <=
+ * 16.
  */
-Datum
-to_hex32(PG_FUNCTION_ARGS)
+static inline text *
+convert_to_base(uint64 value, int base)
 {
-	uint32		value = (uint32) PG_GETARG_INT32(0);
-	char	   *ptr;
 	const char *digits = "0123456789abcdef";
-	char		buf[32];		/* bigger than needed, but reasonable */
 
-	ptr = buf + sizeof(buf) - 1;
-	*ptr = '\0';
+	/* We size the buffer for to_bin's longest possible return value. */
+	char		buf[sizeof(uint64) * BITS_PER_BYTE];
+	char	   *const end = buf + sizeof(buf);
+	char	   *ptr = end;
+
+	Assert(base > 1);
+	Assert(base <= 16);
 
 	do
 	{
-		*--ptr = digits[value % HEXBASE];
-		value /= HEXBASE;
+		*--ptr = digits[value % base];
+		value /= base;
 	} while (ptr > buf && value);
 
-	PG_RETURN_TEXT_P(cstring_to_text(ptr));
+	return cstring_to_text_with_len(ptr, end - ptr);
+}
+
+/*
+ * Convert an integer to a string containing a base-2 (binary) representation
+ * of the number.
+ */
+Datum
+to_bin32(PG_FUNCTION_ARGS)
+{
+	uint64		value = (uint32) PG_GETARG_INT32(0);
+
+	PG_RETURN_TEXT_P(convert_to_base(value, 2));
+}
+Datum
+to_bin64(PG_FUNCTION_ARGS)
+{
+	uint64		value = (uint64) PG_GETARG_INT64(0);
+
+	PG_RETURN_TEXT_P(convert_to_base(value, 2));
 }
 
 /*
- * Convert an int64 to a string containing a base 16 (hex) representation of
+ * Convert an integer to a string containing a base-8 (oct) representation of
  * the number.
  */
 Datum
-to_hex64(PG_FUNCTION_ARGS)
+to_oct32(PG_FUNCTION_ARGS)
+{
+	uint64		value = (uint32) PG_GETARG_INT32(0);
+
+	PG_RETURN_TEXT_P(con

C function to return double precision[][]

2023-08-21 Thread Markur Sens
Is there any piece of code I could see how to achieve $subject ? 
I haven’t found anything in the standard library or contrib modules.

I’m trying to build ArrayType ** of sorts and return a Datum of those but I 
can’t seem to manage memory correctly. 







Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

2023-08-21 Thread Jeff Davis
On Sat, 2023-08-19 at 11:59 -0700, Andres Freund wrote:
> If you install a bunch of extensions into public - very very common from
> what I have seen - you can't really remove public from the search path.
> Which then basically makes all approaches of resolving any of the
> security issues via search path pretty toothless.

Toothless only if (a) untrusted users have CREATE privileges in the
public schema, which is no longer the default; and (b) you're writing a
function that accesses extension objects installed in the public
schema.

While those may be normal things to do, there are a lot of times when
those things aren't true. I speculate that it's far more common to
write functions that only use pg_catalog objects (e.g. the "+"
operator, some string manipulation, etc.) and basic control flow.

There's a lot of value in making those simple cases secure-by-default.
We are already moving users towards a readable-but-not-writable public
schema as a best practice, so if we also move to something like SEARCH
SYSTEM as a best practice, then that will help a LOT of users.

> > 
> I don't think that really works in practice, due to the very common
> practice of installing extensions into the same schema as the
> application.  Then that schema needs to be in search path (if you don't
> want to schema qualify everything), which leaves you wide open.

...

> > 
> myextension is typically public. Which means that there's zero
> protection due to such a search path.

You mentioned this three times so I must be missing something. Why is
it "wide open" and "zero protection"? If the schema is not world-
writable, then aren't attacks a lot harder to pull off?

> > 
> I think the more common attack paths are things like tricking extension
> scripts into evaluating arbitrary code, to gain "real superuser"
> privileges.

Extension scripts are a separate beast. I do see some potential avenues
of attack, but I don't see how your approach of resolving schemas early
would help.

> Search path does not reliably protect things involving "type matching".
> If you have a better fitting cast, or a function call with parameters
> that won't need coercion, later in search path, they'll win, even if
> there's another fit earlier on.

You need to trust the schemas in your search_path.

> If we instead store something that avoids the need for such search, the
> "better fitting cast" logic wouldn't add these kind of security issues
> anymore.

I don't disagree, but I don't understand the approach in detail (i.e. I
couldn't write it up as a proposal). For instance, what would the
pg_dump output look like?

And even if we had that in place, I think we'd still want a better way
to control the search_path.

> > 
> Hm, I'm not quite sure I follow on what exactly you see as different
> here.

From what I understand, Robert's approach is to fully parse the
commands and resolve to specific OIDs (necessitating dependencies,
etc.); while your approach resolves to fully-qualified names but not
OIDs (and needing no dependencies).

I don't understand either proposal entirely, so perhaps I'm on the
wrong track here, but I feel like Robert's approach is more "normal"
and easy to document whereas your approach is more "creative" and
perhaps hard to document.

Both approaches (resolving to names and resolving to OIDs) seem pretty
far away, so I'm still very much inclined to nudge users toward safer
best practices with search_path. I think SEARCH SYSTEM is a good start
there and doable for 17.

Regards,
Jeff Davis





Re: Optimize Arm64 crc32 implementation in PostgreSQL

2023-08-21 Thread Nathan Bossart
On Mon, Aug 21, 2023 at 09:32:42AM +, Xiang Gao wrote:
> Currently PostgreSQL has three different variants of a 32-bit CRC 
> calculation: CRC-32C, CRC-32(Ethernet polynomial),
> and a legacy CRC-32 version that uses the lookup table. Some ARMv8 (AArch64) 
> CPUs implement the CRC32 extension which
> is equivalent with CRC-32(Ethernet polynomial), so they can also benefit from 
> hardware acceleration.
> 
> Can I propose a patch to optimize crc32 calculation with Arm64 specific 
> instructions?

We have support for ARMv8 CRC instructions for CRC-32C (see
src/port/pg_crc32c_armv8.c), but AFAICT Postgres has no such optimization
for CRC-32.  The CRC-32 macros have a comment indicating that they are
currently only used in ltree and hstore, so there might not be terribly
much demand for hardware acceleration, though.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: C function to return double precision[][]

2023-08-21 Thread Joe Conway

On 8/21/23 15:31, Markur Sens wrote:

Is there any piece of code I could see how to achieve $subject ?
I haven’t found anything in the standard library or contrib modules.

I’m trying to build ArrayType ** of sorts and return a Datum of those but I 
can’t seem to manage memory correctly.


There is an example in PL/R here:

https://github.com/postgres-plr/plr/blob/20a1f133bcf2bc8f37ac23da191aea590d612619/pg_conversion.c#L1275

which points to here with number of dims == 2:

https://github.com/postgres-plr/plr/blob/20a1f133bcf2bc8f37ac23da191aea590d612619/pg_conversion.c#L1493

This is all generic to the element type (i.e. not specifically float8), 
but the needed type conversion stuff happens in here:


https://github.com/postgres-plr/plr/blob/20a1f133bcf2bc8f37ac23da191aea590d612619/plr.c#L1109

HTH,

--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: should frontend tools use syncfs() ?

2023-08-21 Thread Robert Haas
On Wed, Aug 16, 2023 at 11:50 PM Michael Paquier  wrote:
> >> Do we actually need --no-sync at all if --sync-method is around?  We
> >> could have an extra --sync-method=none at option level with --no-sync
> >> still around mainly for compatibility?  Or perhaps that's just
> >> over-designing things?
> >
> > I don't have a strong opinion.  We could take up deprecating --no-sync in a
> > follow-up thread, though.  Like you said, we'll probably need to keep it
> > around for backward compatibility, so it might not be worth the trouble.
>
> Okay, maybe that's not worth it.

Doesn't seem worth it to me. I think --no-sync is more intuitive than
--sync-method=none, it's certainly shorter, and it's a pretty
important setting because we use it when running the regression tests.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [PATCH] Add function to_oct

2023-08-21 Thread Dean Rasheed
On Mon, 21 Aug 2023 at 20:15, Nathan Bossart  wrote:
>
> I added some examples for negative inputs.
>
> I renamed it to to_bin().
>
> I reordered the functions in the docs.
>

OK, cool. This looks good to me.

Regards,
Dean




Re: SLRUs in the main buffer pool - Page Header definitions

2023-08-21 Thread Robert Haas
On Fri, Aug 18, 2023 at 12:15 PM Nathan Bossart
 wrote:
> I think I agree with Stephen.  We routinely make changes that require
> updates to extensions, and I doubt anyone is terribly wild about
> maintaining two SLRU systems for several years.

Yeah, maintaining two systems doesn't sound like a good idea.

However, this would be a big change. I'm not sure how we validate a
change of this magnitude. There are both correctness and performance
considerations. I saw there had been a few performance results on the
thread from Thomas that spawned this thread; but I guess we'd want to
do more research. One question is: how do you decide how many buffers
to use for each SLRU, and how many to leave available for regular
data?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Oversight in reparameterize_path_by_child leading to executor crash

2023-08-21 Thread Tom Lane
I spent some time reviewing the v4 patch.  I noted that
path_is_reparameterizable_by_child still wasn't modeling the pass/fail
behavior of reparameterize_path_by_child very well, because that
function checks this at every recursion level:

/*
 * If the path is not parameterized by the parent of the given relation,
 * it doesn't need reparameterization.
 */
if (!path->param_info ||
!bms_overlap(PATH_REQ_OUTER(path), child_rel->top_parent_relids))
return path;

So we might have a sub-path (eg a join input rel) that is not one of
the supported kinds, and yet we can succeed because parameterization
appears only in other sub-paths.  The patch as written will not crash
in such a case, but it might refuse to use a path that we previously
would have allowed.  So I think we need to put the same test into
path_is_reparameterizable_by_child, which requires adding child_rel
to its param list but is otherwise simple enough.

I also realized that this test is equivalent to 
PATH_PARAM_BY_PARENT(), which makes it really unnecessary for
createplan.c to test PATH_PARAM_BY_PARENT, so we don't need to
expose those macros globally after all.

(On the same logic, we could skip PATH_PARAM_BY_PARENT at the
call sites of path_is_reparameterizable_by_child.  I didn't
do that in the attached, mainly because it seems to make it
harder to understand/explain what is being tested.)

Another change I made is to put the path_is_reparameterizable_by_child
tests before the initial_cost_nestloop/add_path_precheck steps, on
the grounds that they're now cheap enough that we might as well do
them first.  The existing ordering of these steps was sensible when
we were doing the expensive reparameterization, but it seems a bit
unnatural IMO.

Lastly, although I'd asked for a test case demonstrating detection of
an unparameterizable sub-path, I ended up not using that, because
it seemed pretty fragile.  If somebody decides that
reparameterize_path_by_child ought to cover TidPaths, the test won't
prove anything any more.

So that led me to the attached v5, which seemed committable to me so I
set about back-patching it ... and it fell over immediately in v15, as
shown in the attached regression diffs from v15.  It looks to me like
we are now failing to recognize that reparameterized quals are
redundant with not-reparameterized ones, so this patch is somehow
dependent on restructuring that happened during the v16 cycle.
I don't have time to dig deeper than that, and I'm not sure that that
is an area we'd want to mess with in a back-patched bug fix.

What I'm thinking we ought to do instead for the back branches
is just refuse to generate a reparameterized path for tablesample
scans.  A minimal fix like that could be as little as

case T_Path:
+   if (path->pathtype == T_SampleScan)
+   return NULL;
FLAT_COPY_PATH(new_path, path, Path);
break;

This rejects more than it absolutely has to, because the
parameterization (that we know exists) might be in the path's
regular quals or tlist rather than in the tablesample.
So we could add something to see if the tablesample is
parameter-free, but I'm quite unsure that it's worth the
trouble.  There must be just about nobody using such cases,
or we'd have heard of this bug long ago.

(BTW, I did look at Ashutosh's idea of merging the
reparameterize_path_by_child and path_is_reparameterizable_by_child
functions, but I didn't think that would be an improvement,
because we'd have to clutter reparameterize_path_by_child with
a lot of should-I-skip-this-step tests.  Some of that could be
hidden in the macros, but a lot would not be.  Another issue
is that I do not think we can change reparameterize_path_by_child's
API contract in the back branches, because we advertise it as
something that FDWs and custom scan providers can use.)

Thoughts?

regards, tom lane

diff --git a/src/backend/optimizer/path/joinpath.c b/src/backend/optimizer/path/joinpath.c
index 821d282497..e2ecf5b14b 100644
--- a/src/backend/optimizer/path/joinpath.c
+++ b/src/backend/optimizer/path/joinpath.c
@@ -30,8 +30,9 @@
 set_join_pathlist_hook_type set_join_pathlist_hook = NULL;
 
 /*
- * Paths parameterized by the parent can be considered to be parameterized by
- * any of its child.
+ * Paths parameterized by a parent rel can be considered to be parameterized
+ * by any of its children, when we are performing partitionwise joins.  These
+ * macros simplify checking for such cases.  Beware multiple eval of args.
  */
 #define PATH_PARAM_BY_PARENT(path, rel)	\
 	((path)->param_info && bms_overlap(PATH_REQ_OUTER(path),	\
@@ -762,6 +763,20 @@ try_nestloop_path(PlannerInfo *root,
 	/* If we got past that, we shouldn't have any unsafe outer-join refs */
 	Assert(!have_unsafe_outer_join_ref(root, outerrelids, inner_paramrels));
 
+	/*
+	 * If the inner path is parameterized, it is parameterized by the topmost
+	 * parent of the outer rel, not

Re: CREATE FUNCTION ... SEARCH { DEFAULT | SYSTEM | SESSION }

2023-08-21 Thread Jeff Davis
On Mon, 2023-08-21 at 15:14 -0400, Robert Haas wrote:
> Another, related thing that I recently discovered would
> be useful is a way to say "I'd like to switch the search_path to X,
> but I'd also like to discover what the prevailing search_path was
> just
> before entering this function."

Interesting, that could probably be accommodated one way or another.

> However, my overall concern here is that this feels like it's
> reinventing the wheel. We already have a way of setting search_path;
> this gives us a second one.

In one sense, you are obviously right. We have a way to set search_path
for a function already, just like any other GUC.

But I don't look at the search_path as "just another GUC" when it comes
to executing a function. The source of the initial value of search_path
is more like the IMMUTABLE marker.

We can also do something with the knowledge the SEARCH marker gives us.
For instance, issue WARNINGs or ERRORs when someone uses a SEARCH
SESSION function in an index expression or constraint, or perhaps when
they try to declare a function IMMUTABLE in the first place.

In other words, the SEARCH clause tells us where search_path comes
from, not so much what it is specifically. I believe that tells us
something fundamental about the kind of function it is. If I tell you
nothing about a function except whether the search path comes from the
system or the session, you can imagine how it should be used (or not
used, as the case may be).

> I'm inclined to think that if there are semantics that we currently
> lack, we should think of extending the current syntax to support
> them.
> Right now you can SET search_path = 'specific value' or SET
> search_path FROM CURRENT or leave it out. We could introduce a new
> way
> of spelling "leave it out," like RESET search_path or whatever.

The thought occurred to me but any way I looked at it was messier and
less user-friendly. It feels like generalizing from search_path to all
GUCs, and then needing to specialize for search_path anyway.

For instance, if we want the default search_path to be the safe value
'pg_catalog, pg_temp', where would that default value come from? Or
instead, we could say that the default would be FROM CURRENT, which
would seem to generalize; but then we immediately run into the problem
that we don't want most GUCs to default to FROM CURRENT (because that
would capture the entire GUC state, which seems bad for several
reasons), and again we'd need to specialize for search_path.


In other words, search_path really *is* special. I don't think it's
great to generalize from it as though it were just like every other
GUC.

I do recognize that "SEARCH SYSTEM ... SET search_path = '...'" is
redundant, and that's not great. I just see the other options as worse,
but if I've misunderstood your approach then please clarify.

Regards,
Jeff Davis





Re: PG 16 draft release notes ready

2023-08-21 Thread Bruce Momjian
On Sat, Aug 19, 2023 at 12:59:47PM -0400, Bruce Momjian wrote:
> On Thu, Aug 17, 2023 at 08:37:28AM +0300, Pavel Luzanov wrote:
> > I can try to explain how I understand it myself.
> > 
> > In v15 and earlier, inheritance of privileges granted to a role depends on
> > the INHERIT attribute of the role:
> > 
> > create user alice;
> > grant pg_read_all_settings to alice;
> > 
> > By default, privileges are inherited:
> > \c - alice
> > show data_directory;
> >        data_directory
> > -----------------------------
> >  /var/lib/postgresql/15/main
> > (1 row)
> > 
> > After disabling the INHERIT attribute, privileges are not inherited:
> > 
> > \c - postgres
> > alter role alice noinherit;
> > 
> > \c - alice
> > show data_directory;
> > ERROR:  must be superuser or have privileges of pg_read_all_settings to
> > examine "data_directory"
> > 
> > In v16, changing the INHERIT attribute on the alice role doesn't change the
> > inheritance behavior of already-granted roles.
> > If we repeat the example, Alice still inherits pg_read_all_settings
> > privileges after disabling the INHERIT attribute for the role.
> > 
> > Information for making decisions about role inheritance has been moved from
> > the role attribute to GRANT role TO role [WITH INHERIT|NOINHERIT] command
> > and can be viewed by the new \drg command:
> > 
> > \drg
> >               List of role grants
> >  Role name |      Member of       |   Options    | Grantor
> > -----------+----------------------+--------------+----------
> >  alice     | pg_read_all_settings | INHERIT, SET | postgres
> > (1 row)
> > 
> > Changing the INHERIT attribute for a role now will affect (as the default
> > value) only future GRANT commands without an INHERIT clause.
> 
> I was able to create this simple example to illustrate it:
> 
>   CREATE ROLE a1;
>   CREATE ROLE a2;
>   CREATE ROLE a3;
>   CREATE ROLE a4;
>   CREATE ROLE b INHERIT;
> 
>   GRANT a1 TO b WITH INHERIT TRUE;
>   GRANT a2 TO b WITH INHERIT FALSE;
> 
>   GRANT a3 TO b;
>   ALTER USER b NOINHERIT;
>   GRANT a4 TO b;
> 
>   \drg
>            List of role grants
>    Role name | Member of |   Options    | Grantor
>   -----------+-----------+--------------+----------
>    b         | a1        | INHERIT, SET | postgres
>    b         | a2        | SET          | postgres
>    b         | a3        | INHERIT, SET | postgres
>    b         | a4        | SET          | postgres
> 
> I will work on the relase notes adjustments for this and reply in a few
> days.

Attached is an applied patch that moves the inherit item into
incompatibilities, clarifies it, and splits out the ADMIN syntax item.

Please let me know if I need any other changes.  Thanks.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.
diff --git a/doc/src/sgml/release-16.sgml b/doc/src/sgml/release-16.sgml
index c9c4fc07ca..c4ae566900 100644
--- a/doc/src/sgml/release-16.sgml
+++ b/doc/src/sgml/release-16.sgml
@@ -229,6 +229,24 @@ Collations and locales can vary between databases so having them as read-only se
 
 
 
+
+
+
+
+Role inheritance now controls the default inheritance status of member roles added during GRANT (Robert Haas)
+
+
+
+The role's default inheritance behavior can be overridden with the new GRANT ... WITH INHERIT clause.
+This allows inheritance of some roles and not others because the members' inheritance status is set at GRANT time.
+Previously the inheritance status of member roles was controlled only by the role's inheritance status, and
+changes to a role's inheritance status affected all previous and future member roles.
+
+
+
 

Re: PG 16 draft release notes ready

2023-08-21 Thread Bruce Momjian
On Sat, Aug 19, 2023 at 04:24:48AM +0200, Erwin Brandstetter wrote:
> I posted to pgsql-docs first, but was kindly redirected here by Jonathan:
> 
> The release notes for Postgres 16 says here:
> https://www.postgresql.org/docs/16/release-16.html#RELEASE-16-PERFORMANCE
> 
> Same as here:
> https://momjian.us/pgsql_docs/release-16.html#RELEASE-16-PERFORMANCE
> 
>     Allow window functions to use ROWS mode internally when RANGE mode is
> specified but unnecessary (David Rowley)

Yes, I didn't like "specified" myself but never returned to improve it. 
I am now using:

Allow window functions to use the faster ROWS mode
internally when RANGE mode is active but unnecessary
 --
(David Rowley)

Can that be improved?

> But the improvement (fix to some degree) also applies to the much more common
> case where no mode has been specified, RANGE unfortunately being the default.
> That includes the most common use case "row_number() OVER (ORDER BY col)",
> where RANGE mode should not be applied to begin with, according to SQL specs.
> This is what made me investigate, test and eventually propose a fix in the
> first place. See:

Yes, very true.

> Also, I was hoping to be mentioned in the release note for working this out:
> 
>     Allow window functions to use the faster ROWS mode internally when RANGE
> mode is specified or would be default, but unnecessary (David Rowley, Erwin
> Brandstetter)

Uh, I have CC'ed David Rowley because that is unclear based on the
commit message.  I don't normally mention reviewers.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Logging of matching pg_hba.conf entry during auth skips trust auth, potential security issue

2023-08-21 Thread Michael Paquier
On Mon, Aug 21, 2023 at 09:27:51AM -0400, Robert Haas wrote:
> + * No authentication identity was set; this happens e.g. when the
> + * trust method is in use.  For audit purposes, log a breadcrumb to
> + * explain where in the HBA this happened.
> 
> Proposed rewrite: "Normally, if log_connections is set, the call to
> set_authn_id will log the connection. However, if that function is
> never called, perhaps because the trust method is in use, then we
> handle the logging here instead."

WFM.
--
Michael


signature.asc
Description: PGP signature


Re: Logging of matching pg_hba.conf entry during auth skips trust auth, potential security issue

2023-08-21 Thread Michael Paquier
On Mon, Aug 21, 2023 at 10:49:16AM -0700, Jacob Champion wrote:
> On Sun, Aug 20, 2023 at 4:58 PM Michael Paquier  wrote:
> > Attached is a v3 to do these two things, with adjustments for two SSL
> > tests.  Any objections about it?
> 
> (Sorry for the long weekend delay.) No objections; you may want to
> adjust the comment above the test block in t/001_password.pl, as well.

There are additionally two more comments in the SSL tests that could
be removed, I guess.  Here's a v4, with Robert's latest suggestion
added.

> I will ask -- more as a rhetorical question than something to resolve
> for this patch, since the topic is going to come back with a vengeance
> for OAuth -- what purpose the consistency here is serving. If the OP
> wants to notice when a connection that should be using strong
> authentication is not, is it helpful to make that connection "look the
> same" in the logs? I understand we've been carrying the language
> "trust authentication method" for a long time, but is that really the
> only hang-up, or would there be pushback if I tried to change that
> too, sometime in the future?

I am not sure that we need to change this historic term, TBH.  Perhaps
it would be shorter to just rip off the trust method from the tree
with a deprecation period but that's not something I'm much in favor
off either (I use it daily for my own stuff, as one example).
Another, more conservative approach may be to make it a developer-only
option and discourage more its use in the docs.
--
Michael
From b6eacc5084b1b7b1f711d756a1b1bc96c358394f Mon Sep 17 00:00:00 2001
From: Jacob Champion 
Date: Wed, 16 Aug 2023 14:48:49 -0700
Subject: [PATCH v4] log_connections: add entries for "trust" connections

Allow DBAs to audit unexpected "trust" connections.  Previously, you had
to notice the absence of a "connection authenticated" line between the
"connection received" and "connection authorized" entries.  Now it's
called out explicitly.
---
 src/backend/libpq/auth.c  | 16 
 src/test/authentication/t/001_password.pl | 11 +++
 src/test/ssl/t/001_ssltests.pl|  6 ++
 3 files changed, 25 insertions(+), 8 deletions(-)

diff --git a/src/backend/libpq/auth.c b/src/backend/libpq/auth.c
index 315a24bb3f..f6a42257b4 100644
--- a/src/backend/libpq/auth.c
+++ b/src/backend/libpq/auth.c
@@ -645,6 +645,22 @@ ClientAuthentication(Port *port)
 #endif
 	}
 
+	if (Log_connections && (status == STATUS_OK) &&
+		!MyClientConnectionInfo.authn_id)
+	{
+		/*
+		 * Normally, if log_connections is set, the call to set_authn_id()
+		 * will log the connection.  However, if that function is never
+		 * called, perhaps because the trust method is in use, then we
+		 * handle the logging here instead.
+		 */
+		ereport(LOG,
+errmsg("connection authenticated: user=\"%s\" method=%s "
+	   "(%s:%d)",
+	   port->user_name, hba_authname(port->hba->auth_method),
+	   port->hba->sourcefile, port->hba->linenumber));
+	}
+
 	if (ClientAuthentication_hook)
 		(*ClientAuthentication_hook) (port, status);
 
diff --git a/src/test/authentication/t/001_password.pl b/src/test/authentication/t/001_password.pl
index 12552837a8..9ba4308689 100644
--- a/src/test/authentication/t/001_password.pl
+++ b/src/test/authentication/t/001_password.pl
@@ -136,13 +136,16 @@ SKIP:
 # Create a database to test regular expression.
 $node->safe_psql('postgres', "CREATE database regex_testdb;");
 
-# For "trust" method, all users should be able to connect. These users are not
-# considered to be authenticated.
+# For "trust" method, all users should be able to connect.
 reset_pg_hba($node, 'all', 'all', 'trust');
 test_conn($node, 'user=scram_role', 'trust', 0,
-	log_unlike => [qr/connection authenticated:/]);
+	log_like => [
+		qr/connection authenticated: user="scram_role" method=trust/
+	]);
 test_conn($node, 'user=md5_role', 'trust', 0,
-	log_unlike => [qr/connection authenticated:/]);
+	log_like => [
+		qr/connection authenticated: user="md5_role" method=trust/
+	]);
 
 # SYSTEM_USER is null when not authenticated.
 $res = $node->safe_psql('postgres', "SELECT SYSTEM_USER IS NULL;");
diff --git a/src/test/ssl/t/001_ssltests.pl b/src/test/ssl/t/001_ssltests.pl
index 76442de063..ce17a760fa 100644
--- a/src/test/ssl/t/001_ssltests.pl
+++ b/src/test/ssl/t/001_ssltests.pl
@@ -800,8 +800,7 @@ $node->connect_ok(
 	"$common_connstr user=ssltestuser sslcert=ssl/client.crt "
 	  . sslkey('client.key'),
 	"auth_option clientcert=verify-full succeeds with matching username and Common Name",
-	# verify-full does not provide authentication
-	log_unlike => [qr/connection authenticated:/],);
+	log_like => [qr/connection authenticated: user="ssltestuser" method=trust/],);
 
 $node->connect_fails(
 	"$common_connstr user=anotheruser sslcert=ssl/client.crt "
@@ -818,8 +817,7 @@ $node->connect_ok(
 	"$common_connstr user=yetanotheruser sslcert=ssl/client.crt "
 	  . sslkey('client.key'),
 	"auth_option clientcert=verify-ca succeed

Re: should frontend tools use syncfs() ?

2023-08-21 Thread Michael Paquier
On Mon, Aug 21, 2023 at 04:08:46PM -0400, Robert Haas wrote:
> Doesn't seem worth it to me. I think --no-sync is more intuitive than
> --sync-method=none, it's certainly shorter, and it's a pretty
> important setting because we use it when running the regression tests.

No arguments against that ;)
--
Michael


signature.asc
Description: PGP signature


Re: Logging of matching pg_hba.conf entry during auth skips trust auth, potential security issue

2023-08-21 Thread Isaac Morland
On Mon, 21 Aug 2023 at 19:23, Michael Paquier  wrote:

> I am not sure that we need to change this historic term, TBH.  Perhaps
> it would be shorter to just rip off the trust method from the tree
> with a deprecation period but that's not something I'm much in favor
> off either (I use it daily for my own stuff, as one example).
> Another, more conservative approach may be to make it a developer-only
> option and discourage more its use in the docs.
>

I hope we're not really considering removing the "trust" method. For
testing and development purposes it's very handy — just tell the database,
running in a VM, to allow all connections and just believe who they say
they are from a client process running in the same or a different VM, with
no production data anywhere in sight and no connection to the real network.

If people are really getting confused and using it in production, then
change the documentation to make it even more clear that it is a
non-authenticating setting which is there specifically to bypass security
in testing contexts. Ultimately, real tools have the ability to cut your
arm off, and our documentation just needs to make clear which parts of
Postgres are like that.


Re: Logging of matching pg_hba.conf entry during auth skips trust auth, potential security issue

2023-08-21 Thread Jacob Champion
On Mon, Aug 21, 2023 at 4:22 PM Michael Paquier  wrote:
> There are additionally two more comments in the SSL tests that could
> be removed, I guess.  Here's a v4, with Robert's latest suggestion
> added.

LGTM.

> I am not sure that we need to change this historic term, TBH.  Perhaps
> it would be shorter to just rip off the trust method from the tree
> with a deprecation period but that's not something I'm much in favor
> off either (I use it daily for my own stuff, as one example).
> Another, more conservative approach may be to make it a developer-only
> option and discourage more its use in the docs.

I don't think we should get rid of anonymous connections; there are
ways to securely authorize a client connection without ever
authenticating the entity at the other end. I'd just like the server
to call them what they are, because I think the distinction is
valuable for DBAs who are closely watching their systems.

--Jacob




Re: should frontend tools use syncfs() ?

2023-08-21 Thread Michael Paquier
On Fri, Aug 18, 2023 at 09:01:11AM -0700, Nathan Bossart wrote:
> On Thu, Aug 17, 2023 at 12:50:31PM +0900, Michael Paquier wrote:
>> SyncMethod may be a bit too generic as name for the option structure.
>> How about a PGSyncMethod or pg_sync_method?
> 
> In v4, I renamed this to DataDirSyncMethod and merged it with
> RecoveryInitSyncMethod.  I'm not wedded to the name, but that seemed
> generic enough for both use-cases.  As an aside, we need to be careful to
> distinguish these options from those for wal_sync_method.

Okay.

>>> Yeah, this crossed my mind.  Do you know of any existing examples of
>>> options with links to a common section?  One problem with this approach is
>>> that there are small differences in the wording for some of the frontend
>>> utilities, so it might be difficult to cleanly unite these sections.
>> 
>> The closest thing I can think of is Color Support in section
>> Appendixes, that describes something shared across a lot of binaries
>> (that would be 6 tools with this patch).
> 
> If I added a "syncfs() Caveats" appendix for the common parts of the docs,
> it would only say something like the following:
> 
>   Using syncfs may be a lot faster than using fsync, because it doesn't
>   need to open each file one by one.  On the other hand, it may be slower
>   if a file system is shared by other applications that modify a lot of
>   files, since those files will also be written to disk.  Furthermore, on
>   versions of Linux before 5.8, I/O errors encountered while writing data
>   to disk may not be reported to the calling program, and relevant error
>   messages may appear only in kernel logs.
> 
> Does that seem reasonable?  It would reduce the duplication a little bit,
> but I'm not sure it's really much of an improvement in this case.

This would cut 60% (?) of the documentation added by the patch for
these six tools, so that looks like an improvement to me.  Perhaps
other may disagree, so more opinions are welcome.

--- a/src/include/storage/fd.h
+++ b/src/include/storage/fd.h
@@ -43,15 +43,11 @@
 #ifndef FD_H
 #define FD_H
 
+#ifndef FRONTEND
+
 #include <dirent.h>
 #include <fcntl.h>
 
Ugh.  So you need this part because pg_rewind's filemap.c includes
fd.h, and pg_rewind also needs file_utils.h.  This is not the fault of
your patch, but this does not make the situation better, either.  It
looks like we need to think harder about this layer.  An improvement
would be to split file_utils.c so as its frontend-only code is moved
to OBJS_FRONTEND in a new file with a new header?  It should be OK to
keep DataDirSyncMethod in file_utils.h as long as the split is clean.
--
Michael


signature.asc
Description: PGP signature


Re: Logging of matching pg_hba.conf entry during auth skips trust auth, potential security issue

2023-08-21 Thread Michael Paquier
On Mon, Aug 21, 2023 at 07:43:56PM -0400, Isaac Morland wrote:
> I hope we're not really considering removing the "trust" method. For
> testing and development purposes it's very handy — just tell the database,
> running in a VM, to allow all connections and just believe who they say
> they are from a client process running in the same or a different VM, with
> no production data anywhere in site and no connection to the real network.

For some benchmarking scenarios, it can actually be useful when
testing cases where new connections are spawned as it bypasses
entirely the authentication path, moving the bottlenecks to different
areas one may want to stress.
--
Michael


signature.asc
Description: PGP signature


Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-21 Thread Peter Smith
Hi Kuroda-san,

Here are some review comments for v22-0003.

(FYI, I was already mid-way through this review before you posted new v23*
patches, so I am posting it anyway in case some comments still apply.)

==
src/bin/pg_upgrade/check.c

1. check_for_lost_slots

+ /* logical slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(cluster->major_version) <= 1600)
+ return;

1a.
Maybe the comment should start uppercase for consistency with others.

~

1b.
IMO if you check < 1700 instead of <= 1600 it will be a better match with
the comment.

~~~

2. check_for_lost_slots
+ for (i = 0; i < ntups; i++)
+ {
+ pg_log(PG_WARNING,
+   "\nWARNING: logical replication slot \"%s\" is in 'lost' state.",
+   PQgetvalue(res, i, i_slotname));
+ }
+
+

The braces {} are not needed anymore

~~~

3. check_for_confirmed_flush_lsn

+ /* logical slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(cluster->major_version) <= 1600)
+ return;

3a.
Maybe the comment should start uppercase for consistency with others.

~

3b.
IMO if you check < 1700 instead of <= 1600 it will be a better match with
the comment.

~~~

4. check_for_confirmed_flush_lsn
+ for (i = 0; i < ntups; i++)
+ {
+ pg_log(PG_WARNING,
+ "\nWARNING: logical replication slot \"%s\" has not consumed WALs yet",
+ PQgetvalue(res, i, i_slotname));
+ }
+

The braces {} are not needed anymore

==
src/bin/pg_upgrade/controldata.c

5. get_control_data
+ /*
+ * Gather latest checkpoint location if the cluster is newer or
+ * equal to 17. This is used for upgrading logical replication
+ * slots.
+ */
+ if (GET_MAJOR_VERSION(cluster->major_version) >= 17)

5a.
/newer or equal to 17/PG17 or later/

~~~

5b.
>= 17 should be >= 1700

~~~

6. get_control_data
+ {
+ char *slash = NULL;
+ uint64 upper_lsn, lower_lsn;
+
+ p = strchr(p, ':');
+
+ if (p == NULL || strlen(p) <= 1)
+ pg_fatal("%d: controldata retrieval problem", __LINE__);
+
+ p++; /* remove ':' char */
+
+ p = strpbrk(p, "01234567890ABCDEF");
+
+ /*
+ * Upper and lower part of LSN must be read separately
+ * because it is reported as %X/%X format.
+ */
+ upper_lsn = strtoul(p, &slash, 16);
+ lower_lsn = strtoul(++slash, NULL, 16);
+
+ /* And combine them */
+ cluster->controldata.chkpnt_latest =
+ (upper_lsn << 32) | lower_lsn;
+ }

Should 'upper_lsn' and 'lower_lsn' be declared as uint32? That seems a
better mirror for LSN_FORMAT_ARGS.

==
src/bin/pg_upgrade/info.c

7. get_logical_slot_infos
+
+ /*
+ * Do additional checks if slots are found on the old node. If something is
+ * found on the new node, a subsequent function
+ * check_new_cluster_is_empty() would report the name of slots and raise a
+ * fatal error.
+ */
+ if (cluster == &old_cluster && slot_count)
+ {
+ check_for_lost_slots(cluster);
+
+ if (!live_check)
+ check_for_confirmed_flush_lsn(cluster);
+ }

It somehow doesn't feel right for these extra checks to be jammed into this
function, just because you conveniently have the slot_count available.

On the NEW cluster side, there was extra checking in the
check_new_cluster() function.

For consistency, I think this OLD cluster checking should be done in the
check_and_dump_old_cluster() function -- see the "Check for various failure
cases" comment -- IMO this new fragment belongs there with the other checks.

==
src/bin/pg_upgrade/pg_upgrade.h

8.
  bool date_is_int;
  bool float8_pass_by_value;
  uint32 data_checksum_version;
+
+ XLogRecPtr chkpnt_latest;
 } ControlData;

I don't think the new field is particularly different from all the others
that it needs a blank line separator.

==
.../t/003_logical_replication_slots.pl

9.
 # Initialize old node
 my $old_publisher = PostgreSQL::Test::Cluster->new('old_publisher');
 $old_publisher->init(allows_streaming => 'logical');
-$old_publisher->start;

 # Initialize new node
 my $new_publisher = PostgreSQL::Test::Cluster->new('new_publisher');
 $new_publisher->init(allows_streaming => 'replica');

-my $bindir = $new_publisher->config_data('--bindir');
+# Initialize subscriber node
+my $subscriber = PostgreSQL::Test::Cluster->new('subscriber');
+$subscriber->init(allows_streaming => 'logical');

-$old_publisher->stop;
+my $bindir = $new_publisher->config_data('--bindir');

~

Are those removal of the old_publisher start/stop changes that actually
should be done in the 0002 patch?

~~~

10.
 $old_publisher->safe_psql(
  'postgres', qq[
  SELECT pg_create_logical_replication_slot('test_slot2', 'test_decoding',
false, true);
+ SELECT count(*) FROM pg_logical_slot_get_changes('test_slot1', NULL,
NULL);
 ]);

~

What is the purpose of the added SELECT? It doesn't seem covered by the
comment.

~~~

11.
# Remove an unnecessary slot and generate WALs. These records would not be
# consumed before doing pg_upgrade, so that the upcoming test would fail.
$old_publisher->start;
$old_publisher->safe_psql(
'postgres', qq[
SELECT pg_drop_replication_slot('test_slot2');
CREATE TABLE tbl AS SELECT generate_series(1, 10) AS a;
]);
$old_publisher->stop

Re: should frontend tools use syncfs() ?

2023-08-21 Thread Nathan Bossart
On Tue, Aug 22, 2023 at 08:56:26AM +0900, Michael Paquier wrote:
> --- a/src/include/storage/fd.h
> +++ b/src/include/storage/fd.h
> @@ -43,15 +43,11 @@
>  #ifndef FD_H
>  #define FD_H
>  
> +#ifndef FRONTEND
> +
>  #include <dirent.h>
>  #include <fcntl.h>
>  
> Ugh.  So you need this part because pg_rewind's filemap.c includes
> fd.h, and pg_rewind also needs file_utils.h.  This is not the fault of
> your patch, but this does not make the situation better, either.  It
> looks like we need to think harder about this layer.  An improvement
> would be to split file_utils.c so as its frontend-only code is moved
> to OBJS_FRONTEND in a new file with a new header?  It should be OK to
> keep DataDirSyncMethod in file_utils.h as long as the split is clean.

I'm hoping there's a simpler path forward here.  pg_rewind only needs the
following lines from fd.h:

/* Filename components */
#define PG_TEMP_FILES_DIR "pgsql_tmp"
#define PG_TEMP_FILE_PREFIX "pgsql_tmp"

Maybe we could move these to file_utils.h instead.  WDYT?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-21 Thread Peter Smith
Here are some review comments for v23-0001

==
1. GENERAL -- git apply

The patch fails to apply cleanly. There are whitespace warnings.

[postgres@CentOS7-x64 oss_postgres_misc]$ git apply
../patches_misc/v23-0001-Always-persist-to-disk-logical-slots-during-a-sh.patch
../patches_misc/v23-0001-Always-persist-to-disk-logical-slots-during-a-sh.patch:102:
trailing whitespace.
# SHUTDOWN_CHECKPOINT record.
warning: 1 line adds whitespace errors.

~~~

2. GENERAL -- which patch is the real one and which is the copy?

IMO this patch has become muddled.

Amit recently created a new thread [1] "persist logical slots to disk
during shutdown checkpoint", which I thought was dedicated to the
discussion/implementation of this 0001 patch. Therefore, I expected any
0001 patch changes would be made only in that new thread from now on,
(and maybe you would mirror them here in this thread).

But now I see there are v23-0001 patch changes here again. So, now the same
patch is in 2 places and they are different. It is no longer clear to me
which 0001 ("Always persist...") patch is the definitive one, and which one
is the copy.

??

==
contrib/test_decoding/t/002_always_persist.pl

3.
+
+# Copyright (c) 2023, PostgreSQL Global Development Group
+
+# Test logical replication slots are always persist to disk during a
shutdown
+# checkpoint.
+
+use strict;
+use warnings;
+
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;


/always persist/always persisted/

~~~

4.
+
+# Test set-up
+my $node = PostgreSQL::Test::Cluster->new('test');
+$node->init(allows_streaming => 'logical');
+$node->append_conf('postgresql.conf', q{
+autovacuum = off
+checkpoint_timeout = 1h
+});
+
+$node->start;
+
+# Create table
+$node->safe_psql('postgres', "CREATE TABLE test (id int)");

Maybe it is better to call the table something different instead of the
same name as the cluster. e.g. 'test_tbl' would be better.

~~~

5.
+# Shutdown the node once to do shutdown checkpoint
+$node->stop();
+

SUGGESTION
# Stop the node to cause a shutdown checkpoint

~~~

6.
+# Fetch checkPoint from the control file itself
+my ($stdout, $stderr) = run_command([ 'pg_controldata', $node->data_dir ]);
+my @control_data = split("\n", $stdout);
+my $latest_checkpoint = undef;
+foreach (@control_data)
+{
+ if ($_ =~ /^Latest checkpoint location:\s*(.*)$/mg)
+ {
+ $latest_checkpoint = $1;
+ last;
+ }
+}
+die "No checkPoint in control file found\n"
+  unless defined($latest_checkpoint);
+

6a.
/checkPoint/checkpoint/  (2x)

~

6b.
+die "No checkPoint in control file found\n"

SUGGESTION
"No checkpoint found in control file\n"

--
[1]
https://www.postgresql.org/message-id/CAA4eK1JzJagMmb_E8D4au=gyqkxox0afnbm1fbp7sy7t4yw...@mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia


Re: should frontend tools use syncfs() ?

2023-08-21 Thread Michael Paquier
On Mon, Aug 21, 2023 at 06:44:07PM -0700, Nathan Bossart wrote:
> I'm hoping there's a simpler path forward here.  pg_rewind only needs the
> following lines from fd.h:
> 
>   /* Filename components */
>   #define PG_TEMP_FILES_DIR "pgsql_tmp"
>   #define PG_TEMP_FILE_PREFIX "pgsql_tmp"
> 
> Maybe we could move these to file_utils.h instead.  WDYT?

I guess so..  At the same time, something can be said about
pg_checksums that redeclares PG_TEMP_FILE_PREFIX and PG_TEMP_FILES_DIR
because it does not want to include fd.h and its sync routines.
--
Michael


signature.asc
Description: PGP signature


Re: should frontend tools use syncfs() ?

2023-08-21 Thread Nathan Bossart
On Tue, Aug 22, 2023 at 10:50:01AM +0900, Michael Paquier wrote:
> On Mon, Aug 21, 2023 at 06:44:07PM -0700, Nathan Bossart wrote:
>> I'm hoping there's a simpler path forward here.  pg_rewind only needs the
>> following lines from fd.h:
>> 
>>  /* Filename components */
>>  #define PG_TEMP_FILES_DIR "pgsql_tmp"
>>  #define PG_TEMP_FILE_PREFIX "pgsql_tmp"
>> 
>> Maybe we could move these to file_utils.h instead.  WDYT?
> 
> I guess so..  At the same time, something can be said about
> pg_checksums that redeclares PG_TEMP_FILE_PREFIX and PG_TEMP_FILES_DIR
> because it does not want to include fd.h and its sync routines.

This would look something like the attached patch.  I think this is nicer.
With this patch, we don't have to choose between including fd.h or
redefining the macros in the frontend code.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From af40f64933b5d8829b7e1641ce57e4ef8fd5a2c2 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Mon, 21 Aug 2023 19:05:14 -0700
Subject: [PATCH 1/1] move PG_TEMP_FILE* macros to file_utils.h

---
 src/backend/backup/basebackup.c |  2 +-
 src/backend/storage/file/fileset.c  |  1 +
 src/bin/pg_checksums/pg_checksums.c | 10 --
 src/bin/pg_rewind/filemap.c |  2 +-
 src/include/common/file_utils.h |  4 
 src/include/storage/fd.h|  4 
 6 files changed, 7 insertions(+), 16 deletions(-)

diff --git a/src/backend/backup/basebackup.c b/src/backend/backup/basebackup.c
index 45be21131c..5d66014499 100644
--- a/src/backend/backup/basebackup.c
+++ b/src/backend/backup/basebackup.c
@@ -25,6 +25,7 @@
 #include "commands/defrem.h"
 #include "common/compression.h"
 #include "common/file_perm.h"
+#include "common/file_utils.h"
 #include "lib/stringinfo.h"
 #include "miscadmin.h"
 #include "nodes/pg_list.h"
@@ -37,7 +38,6 @@
 #include "storage/bufpage.h"
 #include "storage/checksum.h"
 #include "storage/dsm_impl.h"
-#include "storage/fd.h"
 #include "storage/ipc.h"
 #include "storage/reinit.h"
 #include "utils/builtins.h"
diff --git a/src/backend/storage/file/fileset.c b/src/backend/storage/file/fileset.c
index e9951b0269..84cae32548 100644
--- a/src/backend/storage/file/fileset.c
+++ b/src/backend/storage/file/fileset.c
@@ -25,6 +25,7 @@
 
 #include "catalog/pg_tablespace.h"
 #include "commands/tablespace.h"
+#include "common/file_utils.h"
 #include "common/hashfn.h"
 #include "miscadmin.h"
 #include "storage/ipc.h"
diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index 19eb67e485..9011a19b4e 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -52,16 +52,6 @@ typedef enum
 	PG_MODE_ENABLE
 } PgChecksumMode;
 
-/*
- * Filename components.
- *
- * XXX: fd.h is not declared here as frontend side code is not able to
- * interact with the backend-side definitions for the various fsync
- * wrappers.
- */
-#define PG_TEMP_FILES_DIR "pgsql_tmp"
-#define PG_TEMP_FILE_PREFIX "pgsql_tmp"
-
 static PgChecksumMode mode = PG_MODE_CHECK;
 
 static const char *progname;
diff --git a/src/bin/pg_rewind/filemap.c b/src/bin/pg_rewind/filemap.c
index bd5c598e20..58280d9abc 100644
--- a/src/bin/pg_rewind/filemap.c
+++ b/src/bin/pg_rewind/filemap.c
@@ -27,12 +27,12 @@
 #include 
 
 #include "catalog/pg_tablespace_d.h"
+#include "common/file_utils.h"
 #include "common/hashfn.h"
 #include "common/string.h"
 #include "datapagemap.h"
 #include "filemap.h"
 #include "pg_rewind.h"
-#include "storage/fd.h"
 
 /*
  * Define a hash table which we can use to store information about the files
diff --git a/src/include/common/file_utils.h b/src/include/common/file_utils.h
index b7efa1226d..dd1532bcb0 100644
--- a/src/include/common/file_utils.h
+++ b/src/include/common/file_utils.h
@@ -46,4 +46,8 @@ extern ssize_t pg_pwritev_with_retry(int fd,
 
 extern ssize_t pg_pwrite_zeros(int fd, size_t size, off_t offset);
 
+/* Filename components */
+#define PG_TEMP_FILES_DIR "pgsql_tmp"
+#define PG_TEMP_FILE_PREFIX "pgsql_tmp"
+
 #endif			/* FILE_UTILS_H */
diff --git a/src/include/storage/fd.h b/src/include/storage/fd.h
index 6791a406fc..aea30c0622 100644
--- a/src/include/storage/fd.h
+++ b/src/include/storage/fd.h
@@ -195,8 +195,4 @@ extern int	durable_unlink(const char *fname, int elevel);
 extern void SyncDataDirectory(void);
 extern int	data_sync_elevel(int elevel);
 
-/* Filename components */
-#define PG_TEMP_FILES_DIR "pgsql_tmp"
-#define PG_TEMP_FILE_PREFIX "pgsql_tmp"
-
 #endif			/* FD_H */
-- 
2.25.1



Re: Wrong rows estimations with joins of CTEs slows queries by more than factor 500

2023-08-21 Thread Jian Guo
Sure, Tomas.

Here is the PG Commitfest link: https://commitfest.postgresql.org/44/4510/

From: Tomas Vondra 
Sent: Monday, August 21, 2023 18:56
To: Jian Guo ; Hans Buschmann ; 
pgsql-hackers@lists.postgresql.org 
Cc: Zhenghua Lyu 
Subject: Re: Wrong rows estimations with joins of CTEs slows queries by more 
than factor 500


On 8/21/23 10:16, Jian Guo wrote:
> Hi hackers,
>
> I found a new approach to fix this issue, which seems better, so I would
> like to post another version of the patch here. The origin patch made
> the assumption of the values of Vars from CTE must be unique, which
> could be very wrong. This patch examines variables for Vars inside CTE,
> which avoided the bad assumption, so the results could be much more
> accurate.
>

No problem with posting a reworked patch to the same thread, but I'll
repeat my suggestion to register this in the CF app [1]. The benefit is
that people are more likely to notice the patch and also cfbot [2] will
run regression tests.

[1] https://commitfest.postgresql.org/
[2] http://cfbot.cputube.org/


--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Oversight in reparameterize_path_by_child leading to executor crash

2023-08-21 Thread Richard Guo
On Tue, Aug 22, 2023 at 4:48 AM Tom Lane  wrote:

> I spent some time reviewing the v4 patch.  I noted that
> path_is_reparameterizable_by_child still wasn't modeling the pass/fail
> behavior of reparameterize_path_by_child very well, because that
> function checks this at every recursion level:
>
> /*
>  * If the path is not parameterized by the parent of the given
> relation,
>  * it doesn't need reparameterization.
>  */
> if (!path->param_info ||
> !bms_overlap(PATH_REQ_OUTER(path), child_rel->top_parent_relids))
> return path;
>
> So we might have a sub-path (eg a join input rel) that is not one of
> the supported kinds, and yet we can succeed because parameterization
> appears only in other sub-paths.  The patch as written will not crash
> in such a case, but it might refuse to use a path that we previously
> would have allowed.  So I think we need to put the same test into
> path_is_reparameterizable_by_child, which requires adding child_rel
> to its param list but is otherwise simple enough.


sigh.. I overlooked this check.  You're right.  We have to have this
same test in path_is_reparameterizable_by_child.


> So that led me to the attached v5, which seemed committable to me so I
> set about back-patching it ... and it fell over immediately in v15, as
> shown in the attached regression diffs from v15.  It looks to me like
> we are now failing to recognize that reparameterized quals are
> redundant with not-reparameterized ones, so this patch is somehow
> dependent on restructuring that happened during the v16 cycle.
> I don't have time to dig deeper than that, and I'm not sure that that
> is an area we'd want to mess with in a back-patched bug fix.


I looked at this and I think the culprit is that in create_nestloop_path
we are failing to recognize those restrict_clauses that have been moved
into the inner path.  In v16, we have the new serial number stuff to
help detect such clauses and that works very well.  In v15, we are still
using join_clause_is_movable_into() for that purpose.  It does not work
well with the patch because now in create_nestloop_path the inner path
is still parameterized by top-level parents, while the restrict_clauses
already have been adjusted to refer to the child rels.  So the check
performed by join_clause_is_movable_into() would always fail.

I'm wondering if we can instead adjust the 'inner_req_outer' in
create_nestloop_path before we perform the check to work around this
issue for the back branches, something like

@@ -2418,6 +2419,15 @@ create_nestloop_path(PlannerInfo *root,
NestPath   *pathnode = makeNode(NestPath);
Relids  inner_req_outer = PATH_REQ_OUTER(inner_path);

+   /*
+* Adjust the parameterization information, which refers to the topmost
+* parent.
+*/
+   inner_req_outer =
+   adjust_child_relids_multilevel(root, inner_req_outer,
+  outer_path->parent->relids,
+
 outer_path->parent->top_parent_relids);
+

And with it we do not need the changes as in the patch for
create_nestloop_path any more.

Thanks
Richard


Re: Extract numeric filed in JSONB more effectively

2023-08-21 Thread Andy Fan
(Just realized this was sent to Chap in private; resending it here).

On Mon, Aug 21, 2023 at 6:50 PM Andy Fan  wrote:

>
>
> On Mon, Aug 21, 2023 at 11:19 AM Chapman Flack 
> wrote:
>
>> On 2023-08-20 21:31, Andy Fan wrote:
>> > Highlighting the user case of makeRelableType is interesting! But using
>> > the Oid directly looks more promising for this question IMO, it looks
>> > like:
>> > "you said we can put anything in this arg,  so I put an OID const
>> > here",
>> > seems nothing is wrong.
>>
>> Perhaps one of the more senior developers will chime in, but to me,
>> leaving out the relabel nodes looks more like "all of PostgreSQL's
>> type checking happened before the SupportRequestSimplify, so nothing
>> has noticed that we rewrote the tree with mismatched types, and as
>> long as nothing crashes we sort of got away with it."
>>
>> Suppose somebody writes an extension to double-check that plan
>> trees are correctly typed. Or improves EXPLAIN to check a little more
>> carefully than it seems to. Omitting the relabel nodes could spell
>> trouble then.
>>
>> Or, someone more familiar with the code than I am might say "oh,
>> mismatches like that are common in rewritten trees, we live with it."
>> But unless somebody tells me that, I'm not believing it.
>>
>
> Well, this sounds long-lived.  I kind of prefer to label it now.  Adding
> the 3rd commit to relabel the arg and return value.
>
>
>> But I would say we have proved the concept of SupportRequestSimplify
>> for this task. :)
>>
>
> Yes,  this is great!
>
>
>> Now, it would make me happy to further reduce some of the code
>> duplication between the original and the _type versions of these
>> functions. I see that you noticed the duplication in the case of
>> jsonb_extract_path, and you factored out jsonb_get_jsonbvalue so
>> it could be reused. There is also some duplication with object_field
>> and array_element.
>
>
Yes, compared with jsonb_extract_path, object_field and array_element
have much less duplication: 2 lines and 6 lines respectively.


> (Also, we may have overlooked jsonb_path_query
>> and jsonb_path_query_first as candidates for the source of the
>> cast. Two more candidates; five total.)
>>
>
I can try to add them at the same time when we talk about the
infrastructure; thanks for the hint!


>> Here is one way this could be structured. Observe that every one
>> of those five source candidates operates in two stages:
>>
>
> I'm not very excited about this manner, for two reasons: a) It will
> have to emit more steps in ExprState->steps, which will be harmful for
> execution; that overhead is something I'm not willing to pay.
> b) This manner requires more use of *internal*, which is kind of
> similar to "void *" in C.  Could you explain more about the benefits of this?
>
> --
> Best Regards
> Andy Fan
>


-- 
Best Regards
Andy Fan


v10-0003-relabel-the-arg-and-resultvalue-with-INTERNALOID.patch
Description: Binary data


v10-0001-optimize-casting-jsonb-to-a-given-type.patch
Description: Binary data


v10-0002-convert-anyelement-to-internal.patch
Description: Binary data


Re: [PATCH] Add function to_oct

2023-08-21 Thread John Naylor
On Tue, Aug 22, 2023 at 3:10 AM Dean Rasheed 
wrote:
>
> OK, cool. This looks good to me.

LGTM too.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: should frontend tools use syncfs() ?

2023-08-21 Thread Michael Paquier
On Mon, Aug 21, 2023 at 07:06:32PM -0700, Nathan Bossart wrote:
> This would look something like the attached patch.  I think this is nicer.
> With this patch, we don't have to choose between including fd.h or
> redefining the macros in the frontend code.

Yes, this one is moving the needle in the good direction.  +1.
--
Michael




Re: Make all Perl warnings fatal

2023-08-21 Thread Michael Paquier
On Mon, Aug 21, 2023 at 11:51:24AM -0400, Andrew Dunstan wrote:
> It's not really the same as -Werror, because many warnings can be generated
> at runtime rather than compile-time.
> 
> Still, I guess that might not matter too much since apart from plperl we
> only use perl for building / testing.

However, can we trust the out-of-core Perl modules posted on CPAN
never to produce warnings?  I've never seen any issues with IPC::Run
in recent years, so perhaps that's OK in the long run.

> Regarding the dangers mentioned, I guess we can undo it if it proves a
> nuisance.

Yeah.  I am wondering what the buildfarm would say with this change.

> +1 to getting rid of the unnecessary call to getprotobyname().

Looking around here..
https://perldoc.perl.org/perlipc#Sockets%3A-Client%2FServer-Communication

Hmm.  Are you sure that this is OK even in the case where the TAP
tests run on Windows without unix-domain socket support?  The CI runs
on Windows, but always with unix domain sockets around as far as I
know.
--
Michael




Re: persist logical slots to disk during shutdown checkpoint

2023-08-21 Thread Amit Kapila
On Mon, Aug 21, 2023 at 6:36 PM Ashutosh Bapat
 wrote:
>
> On Sun, Aug 20, 2023 at 8:40 AM Amit Kapila  wrote:
> >
> > The other possibility is that we introduce yet another dirty flag for
> > slots, say dirty_for_shutdown_checkpoint which will be set when we
> > update confirmed_flush LSN. The flag will be cleared each time we
> > persist the slot but we won't persist if only this flag is set. We can
> > then use it during the shutdown checkpoint to decide whether to
> > persist the slot.
>
> There are already two booleans controlling the dirtiness of a slot,
> dirty and just_dirty.  Adding a third would create more confusion.
>
> Another idea is to record the confirm_flush_lsn at the time of
> persisting the slot.  We can use it in two different ways: 1. to mark a
> slot dirty and persist it if the confirm_flush_lsn at the last persist
> was too far from the slot's current confirm_flush_lsn;
> 2. at shutdown checkpoint, persist all the slots whose two
> confirm_flush_lsns differ.
>

I think using it in the second (2) way sounds advantageous as compared
to storing another dirty flag, because this requires us to update
last_persisted_confirm_flush_lsn only while writing the slot info.
OTOH, having a flag dirty_for_shutdown_checkpoint will require us to
update it each time we update confirm_flush_lsn under spinlock at
multiple places.  But I don't see the need to do what you proposed in
(1), as the use case for it is very minor; basically it may sometimes
help us avoid decoding after crash recovery.

-- 
With Regards,
Amit Kapila.




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-21 Thread Amit Kapila
On Tue, Aug 22, 2023 at 7:19 AM Peter Smith  wrote:
>
> Here are some review comments for v23-0001
>
> ==
> 1. GENERAL -- git apply
>
> The patch fails to apply cleanly. There are whitespace warnings.
>
> [postgres@CentOS7-x64 oss_postgres_misc]$ git apply 
> ../patches_misc/v23-0001-Always-persist-to-disk-logical-slots-during-a-sh.patch
> ../patches_misc/v23-0001-Always-persist-to-disk-logical-slots-during-a-sh.patch:102:
>  trailing whitespace.
> # SHUTDOWN_CHECKPOINT record.
> warning: 1 line adds whitespace errors.
>
> ~~~
>
> 2. GENERAL -- which patch is the real one and which is the copy?
>
> IMO this patch has become muddled.
>
> Amit recently created a new thread [1] "persist logical slots to disk during 
> shutdown checkpoint", which I thought was dedicated to the 
> discussion/implementation of this 0001 patch.
>

Right, I feel it would be good to discuss 0001 on the new thread.
Here, we can just include it for the sake of completeness and testing
purposes.

-- 
With Regards,
Amit Kapila.




Re: pg_rewind WAL segments deletion pitfall

2023-08-21 Thread Michael Paquier
On Fri, Aug 18, 2023 at 03:40:57PM +0900, torikoshia wrote:
> Thanks for the patch, I've marked this as ready-for-committer.
> 
> BTW, this issue can be considered a bug, right?
> I think it would be appropriate to provide backpatch.

Hmm, I agree that there is a good argument in back-patching as we have
the WAL files between the redo LSN and the divergence LSN, but
pg_rewind is not smart enough to keep them around.  If the archives of
the primary were not able to catch up, the old primary is as good as
kaput, and restore_command won't help here.

I don't much like this patch.  While it correctly takes advantage of
the backward record read logic from SimpleXLogPageRead(), which is able
to handle timeline jumps correctly, it creates a hidden dependency in
the code between the hash table from filemap.c and the page callback.
Wouldn't it be simpler to build a list of the segment names using the
information from WALOpenSegment and build this list in
findLastCheckpoint()?  Also, I am wondering if we should be smarter
with any potential conflict handling between the source and the
target, rather than just enforcing a FILE_ACTION_NONE for all these
files.  In short, could it be better to copy the WAL file from the
source if it exists there?

+   /*
+* Some entries (WAL segments) already have an action assigned
+* (see SimpleXLogPageRead()).
+*/
+   if (entry->action == FILE_ACTION_UNDECIDED)
+   entry->action = decide_file_action(entry);

This change makes me a bit uneasy, per se my previous comment with the
additional code dependencies.

I think that this scenario deserves a test case.  If one wants to
emulate a delay in WAL archiving, it is possible to set
archive_command to a command that we know will fail, for instance.
--
Michael




Re: PG 16 draft release notes ready

2023-08-21 Thread David Rowley
On Tue, 22 Aug 2023 at 10:08, Bruce Momjian  wrote:
>
> On Sat, Aug 19, 2023 at 04:24:48AM +0200, Erwin Brandstetter wrote:
> > I posted to pgsql-docs first, but was kindly redirected here by Jonathan:
> >
> > The release notes for Postgres 16 says here:
> > https://www.postgresql.org/docs/16/release-16.html#RELEASE-16-PERFORMANCE
> >
> > Same as here:
> > https://momjian.us/pgsql_docs/release-16.html#RELEASE-16-PERFORMANCE
> >
> > Allow window functions to use ROWS mode internally when RANGE mode is
> > specified but unnecessary (David Rowley)
>
> Yes, I didn't like "specified" myself but never returned to improve it.
> I am now using:
>
> Allow window functions to use the faster <link
> linkend="syntax-window-functions">ROWS</link> mode
> internally when RANGE mode is active but
> unnecessary
> (David Rowley)
>
> Can that be improved?

Looks good to me.

> > Also, I was hoping to be mentioned in the release note for working this out:
> >
> > Allow window functions to use the faster ROWS mode internally when RANGE
> > mode is specified or would be default, but unnecessary (David Rowley, Erwin
> > Brandstetter)
>
> Uh, I have CC'ed David Rowley because that is unclear based on the
> commit message.  I don't normally mention reviewers.

I confirm that Erwin reported in [1] that row_number() is not affected
by the ROWS/RANGE option and that ROWS performs better due to the
executor having less work to do.  I am the author of the patch which
implemented that plus a few other window functions that also can
benefit from the same optimisation.  Based on this, I don't see any
problems with the credits for this item as they are currently in the
release notes.

David

[1] 
https://postgr.es/m/CAGHENJ7LBBszxS%2BSkWWFVnBmOT2oVsBhDMB1DFrgerCeYa_DyA%40mail.gmail.com




Re: DecodeInterval fixes

2023-08-21 Thread Michael Paquier
On Mon, Jul 10, 2023 at 10:42:31AM -0700, Jacob Champion wrote:
> On Mon, Jul 10, 2023 at 10:19 AM Reid Thompson  
> wrote:
>> I made a another pass through the separated patches, it looks good to
>> me.
> 
> LGTM too. (Thanks Tom for the clarification on ECPG.)

+SELECT INTERVAL '42 days 2 seconds ago ago';
+SELECT INTERVAL '2 minutes ago 5 days';
[...]
+SELECT INTERVAL 'hour 5 months';
+SELECT INTERVAL '1 year months days 5 hours';

0002 and 0003 make this stuff fail, but isn't there a risk that this
breaks applications that relied on these accidental behaviors?
Assuming that this is OK makes me nervous.
--
Michael




Re: Testing autovacuum wraparound (including failsafe)

2023-08-21 Thread Michael Paquier
On Wed, Jul 12, 2023 at 01:47:51PM +0200, Daniel Gustafsson wrote:
> +# bump the query timeout to avoid false negatives on slow test systems.
> +$ENV{PG_TEST_TIMEOUT_DEFAULT} = 600;
> Does this actually work?  Utils.pm reads the environment variable at compile
> time in the BEGIN block, so this setting won't be seen.  A quick test program
> seems to confirm this, but I might be missing something.

I wish that this test were cheaper, without a need to depend on
PG_TEST_EXTRA..  Actually, note that you are forgetting to update the
documentation of PG_TEST_EXTRA with this new value of xid_wraparound.
--
Michael




Re: Extract numeric filed in JSONB more effectively

2023-08-21 Thread Andy Fan
>
>
>>> Perhaps one of the more senior developers will chime in, but to me,
>>> leaving out the relabel nodes looks more like "all of PostgreSQL's
>>> type checking happened before the SupportRequestSimplify, so nothing
>>> has noticed that we rewrote the tree with mismatched types, and as
>>> long as nothing crashes we sort of got away with it."
>>>
>>> Suppose somebody writes an extension to double-check that plan
>>> trees are correctly typed. Or improves EXPLAIN to check a little more
>>> carefully than it seems to. Omitting the relabel nodes could spell
>>> trouble then.
>>>
>>> Or, someone more familiar with the code than I am might say "oh,
>>> mismatches like that are common in rewritten trees, we live with it."
>>> But unless somebody tells me that, I'm not believing it.
>>>
>>
>> Well, this sounds long-lived.  I kind of prefer to label it now.  Adding
>> the 3rd commit to relabel the arg and return value.
>>
>>
After we label it, we get an error like this:

select (a->'a')::int4 from m;
ERROR:  cannot display a value of type internal

However the following statement can work well.

 select ('{"a": 12345}'::jsonb->'a')::numeric;
 numeric
---------
   12345

That's mainly because the latter query doesn't go through the planner
support function.  I didn't realize this before, so the test case doesn't
catch it; I will add a test case in the next version.  The reason we
get the error for the first query is that the query tree says we should
output an "internal" result, and pg doesn't know how to output an
internal data type.  This is kind of in conflict with our goal.

So currently the only choices are:  PATCH 001 or PATCH 001 + 002.

https://www.postgresql.org/message-id/CAKU4AWrs4Pzajm2_tgtUTf%3DCWfDJEx%3D3h45Lhqg7tNOVZw5YxA%40mail.gmail.com


-- 
Best Regards
Andy Fan


Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-21 Thread Amit Kapila
On Mon, Aug 21, 2023 at 6:35 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> > 9. check_for_logical_replication_slots
> >
> > + /* logical replication slots can be migrated since PG17. */
> > + if (GET_MAJOR_VERSION(new_cluster->major_version) <= 1600)
> > + return;
> >
> > IMO the code matches the comment better if you say < 1700 instead of <= 
> > 1600.
>
> Changed.
>

I think it is better to be consistent with the existing code.  There
are a few other checks in pg_upgrade.c that use <=, so it is better
to use it in the same way here.

Another minor comment:
Note that
+ if the new cluser uses different port number from old one,
+ ALTER
SUBSCRIPTION ... CONNECTION
+ command must be also executed on subscriber.

I think this is true in general as well and not specific to
pg_upgrade. So, we can avoid adding anything about connection change
here.

-- 
With Regards,
Amit Kapila.




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-21 Thread Peter Smith
Hi Kuroda-san,

Here are some review comments for patch v23-0002

==
1. GENERAL

Please try to run a spell/grammar check on all the text like commit message
and docs changes before posting (e.g. cut/paste the rendered text into some
tool like MSWord or Grammarly or ChatGPT or whatever tool you like and
cross-check). There are lots of small typos etc but one up-front check
could avoid long cycles of
reviewing/reporting/fixing/re-posting/confirming...

==
Commit message

2.
Note that slot restoration must be done after the final pg_resetwal command
during the upgrade because pg_resetwal will remove WALs that are required by
the slots. Due to ths restriction, the timing of restoring replication
slots is
different from other objects.

~

/ths/this/

==
doc/src/sgml/ref/pgupgrade.sgml

3.
+
+ Before you start upgrading the publisher cluster, ensure that the
+ subscription is temporarily disabled, by executing
+ ALTER SUBSCRIPTION ...
DISABLE.
+ After the upgrade is complete, then re-enable the subscription. Note
that
+ if the new cluser uses different port number from old one,
+ ALTER SUBSCRIPTION ...
CONNECTION
+ command must be also executed on subscriber.
+

3a.
BEFORE
After the upgrade is complete, then re-enable the subscription.

SUGGESTION
Re-enable the subscription after the upgrade.

~

3b.
/cluser/cluster/

~

3c.
Note that
+ if the new cluser uses different port number from old one,
+ ALTER SUBSCRIPTION ...
CONNECTION
+ command must be also executed on subscriber.

SUGGESTION
Note that if the new cluster uses a different port number ALTER
SUBSCRIPTION ... CONNECTION command must be also executed on the subscriber.

~~~

4.
+ 
+  
+   confirmed_flush_lsn (see )
+   of all slots on old cluster must be same as latest checkpoint
location.
+  
+ 

4a.
/on old cluster/on the old cluster/

~

4b.
/as latest/as the latest/
~~

5.
+ 
+  
+   The output plugins referenced by the slots on the old cluster must
be
+   installed on the new PostgreSQL executable directory.
+  
+ 

/installed on/installed in/ ??

~~

6.
+ 
+  
+   The new cluster must have
+   max_replication_slots
+   configured to value larger than the existing slots on the old
cluster.
+  
+ 

BEFORE
...to value larger than the existing slots on the old cluster.

SUGGESTION
...to a value greater than or equal to the number of slots present on the
old cluster.

==
src/bin/pg_upgrade/check.c

7. GENERAL - check_for_logical_replication_slots

AFAICT this function is called *only* for the new_cluster, yet there is no
Assert and no checking inside this function to ensure that is the case.
It seems strange that the *cluster is passed as an argument, but then
the whole function body and messages assume it can only be a new cluster
anyway.

IMO it would be better to rename this function to something like
check_new_cluster_logical_replication_slots() and DO NOT pass any parameter
but just use the global new_cluster within the function body.

~~~

8. check_for_logical_replication_slots

+ /* logical replication slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(cluster->major_version) < 1700)
+ return;

Start comment with uppercase for consistency.

~~~

9. check_for_logical_replication_slots

+ res = executeQueryOrDie(conn, "SELECT slot_name "
+  "FROM pg_catalog.pg_replication_slots "
+  "WHERE slot_type = 'logical' AND "
+  "temporary IS FALSE;");
+
+ if (PQntuples(res))
+ pg_fatal("New cluster must not have logical replication slot, but found
\"%s\"",
+ PQgetvalue(res, 0, 0));

/replication slot/replication slots/

~

10. check_for_logical_replication_slots

+ /*
+ * Do additional checks when the logical replication slots have on the old
+ * cluster.
+ */
+ if (nslots)

SUGGESTION
Do additional checks when there are logical replication slots on the old
cluster.

~~~

11.
+ if (nslots > max_replication_slots)
+ pg_fatal("max_replication_slots must be greater than or equal to existing
logical "
+ "replication slots on old cluster.");

11a.
SUGGESTION
max_replication_slots (%d) must be greater than or equal to the number of
logical replication slots (%d) on the old cluster.

~

11b.
I think it would be helpful for the current values to be displayed in the
fatal message so the user will know more about what value to set. Notice
that my above suggestion has some substitution markers.

==
src/bin/pg_upgrade/info.c

12.
+static void
+print_slot_infos(LogicalSlotInfoArr *slot_arr)
+{
+ int slotnum;
+
+ for (slotnum = 0; slotnum < slot_arr->nslots; slotnum++)
+ {
+ LogicalSlotInfo *slot_info = &slot_arr->slots[slotnum];
+ pg_log(PG_VERBOSE, "slotname: \"%s\", plugin: \"%s\", two_phase: %d",
+   slot_info->slotname,
+   slot_info->plugin,
+   slot_info->two_phase);
+ }
+}

Better to have a blank line after the 'slot_info' declaration.

==
.../pg_upgrade/t/003_logical_replication_slots.pl

13.
+# 

Re: Support run-time partition pruning for hash join

2023-08-21 Thread David Rowley
On Tue, 22 Aug 2023 at 00:34, Andy Fan  wrote:
>
> On Mon, Aug 21, 2023 at 11:48 AM Richard Guo  wrote:
>> 1. All the join partition pruning decisions are made in createplan.c
>>where the best path tree has been decided.  This is not great.  Maybe
>>it's better to make it happen when we build up the path tree, so that
>>we can take the partition pruning into consideration when estimating
>>the costs.
>
>
> fwiw, the current master totally ignores the cost reduction from run-time
> partition pruning, even for init partition pruning.  So in some real cases,
> pg chooses a hash join just because the cost of a nested loop join is
> heavily overestimated.

This is true about the existing code. It's a very tricky thing to cost
given that the parameter values are always unknown to the planner.
The best we have for these today is the various hardcoded constants in
selfuncs.h. While I do agree that it's not great that the costing code
knows nothing about run-time pruning, I also think that run-time
pruning during execution with parameterised nested loops is much more
likely to be able to prune partitions and save actual work than the
equivalent with Hash Joins.  It's more common for the planner to
choose to Nested Loop when there are fewer outer rows, so the pruning
code is likely to be called fewer times with Nested Loop than with
Hash Join.

With Hash Join, it seems to me that the pruning must take place for
every row that makes it into the hash table.  There may be cases
where the unioned set of partitions simply yields every
partition and all the work results in no savings. Pruning on a scalar
value seems much more likely to be able to prune away unneeded
Append/MergeAppend subnodes.

Perhaps there can be something adaptive in Hash Join which stops
trying to prune when all partitions must be visited.  On a quick
glance of the patch, I don't see any code in ExecJoinPartitionPrune()
which gives up trying to prune when the number of members in
part_prune_result equals the number of prunable Append/MergeAppend
subnodes.

It would be good to see some performance comparisons of the worst case
to see how much overhead the pruning code adds to Hash Join.  It may
well be that we need to consider two Hash Join paths, one with and one
without run-time pruning. It's pretty difficult to meaningfully cost,
as I already mentioned, however.

>> 4. Is it possible and worthwhile to extend the join partition pruning
>>mechanism to support nestloop and mergejoin also?
>
>
> To my knowledge, we have to build the inner table first for this
> optimization, so hash join and sort-merge should be OK, but nestloop should
> be impossible unless I missed something.

But run-time pruning already works for Nested Loops... I must be
missing something here.

I imagine for Merge Joins a more generic approach would be better by
implementing parameterised Merge Joins (a.k.a zigzag merge joins).
The Append/MergeAppend node could then select the correct partition(s)
based on the current parameter value at rescan. I don't think any code
changes would be needed in node[Merge]Append.c for that to work.  This
could also speed up Merge Joins to non-partitioned tables when an
index is providing presorted input to the join.
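
A toy standalone C sketch of the rescan-time partition selection this
would rely on (illustrative names only, not the node[Merge]Append.c
internals): on each rescan, the Append picks the single range partition
whose bounds admit the current parameter value.

```c
#include <stddef.h>

typedef struct AppendState
{
	int			cur_part;	/* partition chosen by the last rescan, or -1 */
} AppendState;

/* toy range-partitioned table: [0,10), [10,20), [20,30) */
static const int part_lo[] = {0, 10, 20};
static const int part_hi[] = {10, 20, 30};	/* exclusive upper bounds */

/*
 * On rescan, prune to the one partition that can contain the current
 * outer-side join key; -1 means no partition qualifies.
 */
static void
append_rescan(AppendState *st, int param)
{
	st->cur_part = -1;
	for (size_t i = 0; i < sizeof(part_lo) / sizeof(part_lo[0]); i++)
	{
		if (param >= part_lo[i] && param < part_hi[i])
		{
			st->cur_part = (int) i;
			break;
		}
	}
}
```

With a parameterised (zigzag) Merge Join driving the rescans, this is
all the Append side would need to do, which is why no changes to the
Append code itself seem necessary.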

David




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-21 Thread Amit Kapila
On Mon, Aug 21, 2023 at 6:32 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> > 2.
> > + /*
> > + * Checking for logical slots must be done before
> > + * check_new_cluster_is_empty() because the slot_arr attribute of the
> > + * new_cluster will be checked in that function.
> > + */
> > + if (count_logical_slots(&old_cluster))
> > + {
> > + get_logical_slot_infos(&new_cluster, false);
> > + check_for_logical_replication_slots(&new_cluster);
> > + }
> > +
> >   check_new_cluster_is_empty();
> >
> > Can't we simplify this checking by simply querying
> > pg_replication_slots for any usable slot, something similar to what we
> > are doing in check_for_prepared_transactions()? We can add this check
> > in the function check_for_logical_replication_slots().
>
> Some checks were added to check_for_logical_replication_slots(), and
> get_logical_slot_infos() for new_cluster was removed, as you suggested.
>

+ res = executeQueryOrDie(conn, "SELECT slot_name "
+   "FROM pg_catalog.pg_replication_slots "
+   "WHERE slot_type = 'logical' AND "
+   "temporary IS FALSE;");
+
+ if (PQntuples(res))
+ pg_fatal("New cluster must not have logical replication slot, but found \"%s\"",
+ PQgetvalue(res, 0, 0));
+
+ PQclear(res);
+
+ nslots = count_logical_slots(&old_cluster);
+
+ /*
+ * Do additional checks when logical replication slots exist on the old
+ * cluster.
+ */
+ if (nslots)

Shouldn't these checks be reversed? I mean, it would be better to test
for the presence of slots on the new cluster only if there is a slot
present on the old cluster.
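
In other words, something along these lines (a standalone sketch with
made-up stand-in functions, not the actual pg_upgrade code): count the
old cluster's slots first, and query the new cluster only when that
count is non-zero.

```c
#include <stdbool.h>

static int	new_cluster_queries = 0;	/* counts new-cluster round trips */

/* stand-in for count_logical_slots(&old_cluster) */
static int
count_old_cluster_slots(int nslots_on_old)
{
	return nslots_on_old;
}

/* stand-in for the pg_replication_slots query against the new cluster */
static int
new_cluster_has_slots(void)
{
	new_cluster_queries++;
	return 0;
}

/* Returns true when the upgrade may proceed. */
static bool
check_slots(int nslots_on_old)
{
	if (count_old_cluster_slots(nslots_on_old) == 0)
		return true;	/* nothing to migrate: skip the new-cluster query */
	return new_cluster_has_slots() == 0;
}
```

The ordering matters only for cost: with no slots on the old cluster
there is nothing to migrate, so the extra query against the new cluster
can be skipped entirely.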

-- 
With Regards,
Amit Kapila.