Re: [17] CREATE SUBSCRIPTION ... SERVER

2023-08-31 Thread Ashutosh Bapat
On Fri, Sep 1, 2023 at 2:47 AM Joe Conway  wrote:
>
> On 8/31/23 12:52, Jeff Davis wrote:
> > On Thu, 2023-08-31 at 10:59 +0530, Ashutosh Bapat wrote:
> >> The server's FDW has to be postgres_fdw. So we have to handle the
> >> awkward dependency between core and postgres_fdw (an extension).
> >
> > That sounds more than just "awkward". I can't think of any precedent
> > for that and it seems to violate the idea of an "extension" entirely.
> >
> > Can you explain more concretely how we might resolve that?
>
>
> Maybe move postgres_fdw to be a first-class built-in feature instead of
> an extension?

Yes, that's one way.

Thinking larger, how about we allow any FDW to be used here? We might
as well allow extensions to start logical receivers which accept
changes from non-PostgreSQL databases. So we don't have to make an
exception for postgres_fdw. But I think there's some value in bringing
together these two subsystems which deal with foreign data logically
(as in logical vs physical view of data).

-- 
Best Wishes,
Ashutosh Bapat




Re: Improve heapgetpage() performance, overhead from serializable

2023-08-31 Thread tender wang
This thread [1] is also about improving the heapgetpage() function, and
looks similar to this thread.

[1]
https://www.postgresql.org/message-id/a9f40066-3d25-a240-4229-ec2fbe94e7a5%40yeah.net

Muhammad Malik wrote on Fri, Sep 1, 2023 at 04:04:

> Hi,
>
> Is there a plan to merge this patch in PG16?
>
> Thanks,
> Muhammad
>
> --
> *From:* Andres Freund 
> *Sent:* Saturday, July 15, 2023 6:56 PM
> *To:* pgsql-hack...@postgresql.org 
> *Cc:* Thomas Munro 
> *Subject:* Improve heapgetpage() performance, overhead from serializable
>
> Hi,
>
> Several loops which are important for query performance, like
> heapgetpage()'s
> loop over all tuples, have to call functions like
> HeapCheckForSerializableConflictOut() and PredicateLockTID() in every
> iteration.
>
> When serializable is not in use, all those functions do is return.
> But
> being situated in a different translation unit, the compiler can't inline
> (without LTO at least) the check whether serializability is needed.  It's
> not
> just the function call overhead that's noticeable, it's also that registers
> have to be spilled to the stack / reloaded from memory etc.
>
> On a freshly loaded pgbench scale 100, with turbo mode disabled, postgres
> pinned to one core. Parallel workers disabled to reduce noise.  All times
> are
> the average of 15 executions with pgbench, in a newly started, but
> prewarmed
> postgres.
>
> SELECT * FROM pgbench_accounts OFFSET 1000;
> HEAD:
> 397.977
>
> removing the HeapCheckForSerializableConflictOut() call from heapgetpage()
> (incorrect!), to establish the baseline of what serializable costs:
> 336.695
>
> pulling out CheckForSerializableConflictOutNeeded() from
> HeapCheckForSerializableConflictOut() in heapgetpage(), and avoiding
> calling
> HeapCheckForSerializableConflictOut() in the loop:
> 339.742
>
> moving the loop into a static inline function, marked as pg_always_inline,
> called with static arguments for always_visible, check_serializable:
> 326.546
>
> marking the always_visible, !check_serializable case likely():
> 322.249
>
> removing TestForOldSnapshot() calls, which we pretty much already decided
> on:
> 312.987
>
>
> FWIW, there's more we can do, with some hacky changes I got the time down
> to
> 273.261, but the tradeoffs start to be a bit more complicated. And
> 397->320ms
> for something as core as this is, IMO, worth considering on its own.
>
>
>
>
> Now, this just affects the sequential scan case. heap_hot_search_buffer()
> shares many of the same pathologies.  I find it a bit harder to improve,
> because the compiler's code generation seems to switch between good / bad
> with
> changes that seem unrelated...
>
>
> I wonder why we haven't used PageIsAllVisible() in
> heap_hot_search_buffer() so
> far?
>
>
> Greetings,
>
> Andres Freund
>
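To make the specialization trick described above concrete, here is a
minimal self-contained sketch of the pattern. tuple_is_visible() and
check_serializable_conflict() are stand-ins for the real visibility and
predicate-lock calls, not actual PostgreSQL APIs:

#include <stdbool.h>

#define pg_always_inline __attribute__((always_inline)) inline
#define likely(x) __builtin_expect((x) != 0, 1)

/* stand-ins for the out-of-line checks in heapam.c / predicate.c */
extern bool tuple_is_visible(int i);
extern void check_serializable_conflict(int i);

/*
 * Because all_visible/check_serializable are compile-time constants in
 * each specialized copy, the compiler can fold these branches away and
 * drop the out-of-line call entirely in the common case.
 */
static pg_always_inline int
collect_tuples(int ntuples, bool all_visible, bool check_serializable)
{
	int			nvis = 0;

	for (int i = 0; i < ntuples; i++)
	{
		if (!all_visible && !tuple_is_visible(i))
			continue;
		if (check_serializable)
			check_serializable_conflict(i);
		nvis++;
	}
	return nvis;
}

/* The serializability test is hoisted out of the loop: decide once per page. */
int
collect_tuples_dispatch(int ntuples, bool all_visible, bool check_serializable)
{
	if (likely(all_visible && !check_serializable))
		return collect_tuples(ntuples, true, false);
	else if (all_visible)
		return collect_tuples(ntuples, true, true);
	else if (!check_serializable)
		return collect_tuples(ntuples, false, false);
	else
		return collect_tuples(ntuples, false, true);
}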


Buildfarm failures on urocryon

2023-08-31 Thread vignesh C
Hi,

Recently urocryon has been failing with the following errors at [1]:
checking for icu-uc icu-i18n... no
configure: error: ICU library not found
If you have ICU already installed, see config.log for details on the
failure.  It is possible the compiler isn't looking in the proper directory.
Use --without-icu to disable ICU support.

configure:8341: checking whether to build with ICU support
configure:8371: result: yes
configure:8378: checking for icu-uc icu-i18n
configure:8440: result: no
configure:8442: error: ICU library not found
If you have ICU already installed, see config.log for details on the
failure.  It is possible the compiler isn't looking in the proper directory.
Use --without-icu to disable ICU support.

Urocryon has been failing for the last 17 days.

I think ICU libraries need to be installed in urocryon to fix this issue.

[1] - 
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=urocryon&dt=2023-09-01%2001%3A09%3A11&stg=configure

Regards,
Vignesh




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-31 Thread Dilip Kumar
On Fri, Sep 1, 2023 at 9:47 AM Dilip Kumar  wrote:
>
> On Thu, Aug 31, 2023 at 7:56 PM Hayato Kuroda (Fujitsu)
>  wrote:
>
Some more comments on 0002

1.
+ conn = connectToServer(&new_cluster, "template1");
+
+ prep_status("Checking for logical replication slots");
+
+ res = executeQueryOrDie(conn, "SELECT slot_name "
+ "FROM pg_catalog.pg_replication_slots "
+ "WHERE slot_type = 'logical' AND "
+ "temporary IS FALSE;");


I think we should add a comment saying this query will only fetch
logical slots because the database name will always be NULL for
physical slots.  Otherwise, looking at the query, it is very confusing
how it avoids the physical slots.
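Something along these lines, perhaps (just a sketch of the suggested
comment, not the actual patch):

	/*
	 * Fetch the logical replication slots to migrate.  Checking slot_type
	 * keeps physical slots out; note that for physical slots the database
	 * column is always NULL, so they are of no interest here anyway.
	 * Temporary slots are excluded because they cannot survive an upgrade.
	 */
	res = executeQueryOrDie(conn, "SELECT slot_name "
							"FROM pg_catalog.pg_replication_slots "
							"WHERE slot_type = 'logical' AND "
							"temporary IS FALSE;");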

2.
+void
+get_old_cluster_logical_slot_infos(void)
+{
+ int dbnum;
+
+ /* Logical slots can be migrated since PG17. */
+ if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
+ return;
+
+ pg_log(PG_VERBOSE, "\nsource databases:");

I think we need to change the heading to something like "slot info
source databases:", or add an extra message saying that we are printing
slot information.

Before this patch, we were printing all the relation information, so
the message ordering was quite clear, e.g.:

source databases:
Database: "template1"
relname: "pg_catalog.pg_largeobject", reloid: 2613, reltblspace: ""
relname: "pg_catalog.pg_largeobject_loid_pn_index", reloid: 2683,
reltblspace: ""
Database: "postgres"
relname: "pg_catalog.pg_largeobject", reloid: 2613, reltblspace: ""
relname: "pg_catalog.pg_largeobject_loid_pn_index", reloid: 2683,
reltblspace: ""

But after this patch the slot information is also printed in a
similar fashion, so it's very confusing now.  Refer to
get_db_and_rel_infos() for how it fetches all the relation
information first and then prints it.




3. One more problem is that the slot information and the "executing:
..." messages are intermingled, which makes it more confusing; see the
example below of the latest messaging.  I think ideally we should
execute these queries first and then print all the slot information
together instead of intermingling the messages.

source databases:
executing: SELECT pg_catalog.set_config('search_path', '', false);
executing: SELECT slot_name, plugin, two_phase FROM
pg_catalog.pg_replication_slots WHERE wal_status <> 'lost' AND
database = current_database() AND temporary IS FALSE;
Database: "template1"
executing: SELECT pg_catalog.set_config('search_path', '', false);
executing: SELECT slot_name, plugin, two_phase FROM
pg_catalog.pg_replication_slots WHERE wal_status <> 'lost' AND
database = current_database() AND temporary IS FALSE;
Database: "postgres"
slotname: "isolation_slot1", plugin: "pgoutput", two_phase: 0

4.  Looking at the above two comments, I feel the order should now be:
- Fetch all the db infos
get_db_infos()
- loop
   get_rel_infos()
   get_old_cluster_logical_slot_infos()

-- and now print relation and slot information per database
 print_db_infos()

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: persist logical slots to disk during shutdown checkpoint

2023-08-31 Thread vignesh C
On Fri, 1 Sept 2023 at 10:06, Amit Kapila  wrote:
>
> On Thu, Aug 31, 2023 at 6:12 PM Ashutosh Bapat
>  wrote:
> >
> > On Thu, Aug 31, 2023 at 2:52 PM Amit Kapila  wrote:
> > >
> > > All but one. Normally, the idea of marking dirty is to indicate that
> > > we will actually write/flush the contents at a later point (except
> > > when required for correctness) as even indicated in the comments atop
> > > ReplicationSlotMarkDirty(). However, I see your point that we use
> > > that protocol at all the current places including CreateSlotOnDisk().
> > > So, we can probably do it here as well.
> >
> > yes
> >
>
> I think we should also ensure that slots are not invalidated
> (slot.data.invalidated != RS_INVAL_NONE) before marking them dirty
> because we don't allow decoding from such slots, so we shouldn't
> include those.

Added this check.
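For reference, the resulting check in CheckPointReplicationSlots() looks
roughly like this (an illustrative sketch, not the exact hunk):

		if (is_shutdown && SlotIsLogical(s))
		{
			SpinLockAcquire(&s->mutex);
			/* Skip invalidated slots: we don't allow decoding from them. */
			if (s->data.invalidated == RS_INVAL_NONE &&
				s->data.confirmed_flush != s->last_saved_confirmed_flush)
			{
				s->just_dirtied = true;
				s->dirty = true;
			}
			SpinLockRelease(&s->mutex);
		}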

Apart from this, I have also fixed the following issues that were
agreed on: a) Setting slots to dirty in CheckPointReplicationSlots
instead of in SaveSlotToPath b) The comments were moved from
ReplicationSlot to CheckPointReplicationSlots c) Tests will be run
with autovacuum = off d) Updating last_saved_confirmed_flush based on
cp.slotdata.confirmed_flush rather than slot->data.confirmed_flush.
I have also added the commitfest entry for this at [1].

Thanks to Ashutosh/Amit for the feedback.
Attached v7 version patch has the changes for the same.
[1] - https://commitfest.postgresql.org/44/4536/

Regards,
Vignesh
From 328f500c8d2a1980415c889abcdc398741bebe10 Mon Sep 17 00:00:00 2001
From: Julien Rouhaud 
Date: Fri, 14 Apr 2023 13:49:09 +0800
Subject: [PATCH v7] Persist logical slots to disk during a shutdown checkpoint
 if required.

It's entirely possible for a logical slot to have a confirmed_flush LSN
higher than the last value saved on disk while not being marked as dirty.
Currently, it is not a major problem but a later patch adding support for
the upgrade of slots relies on that value being properly persisted to disk.

It can also help with avoiding processing the same transactions again in
some boundary cases after a clean shutdown and restart. Say, we process
some transactions for which we didn't send anything downstream (the
changes got filtered) but the confirmed_flush LSN is updated due to
keepalives. As we don't flush the latest value of the confirmed_flush LSN,
it may lead to processing the same changes again.

Author: Julien Rouhaud, Vignesh C, Kuroda Hayato based on suggestions by
Ashutosh Bapat
Reviewed-by: Amit Kapila, Peter Smith
Discussion: http://postgr.es/m/CAA4eK1JzJagMmb_E8D4au=GYQkxox0AfNBm1FbP7sy7t4YWXPQ@mail.gmail.com
Discussion: http://postgr.es/m/TYAPR01MB58664C81887B3AF2EB6B16E3F5939@TYAPR01MB5866.jpnprd01.prod.outlook.com
---
 src/backend/access/transam/xlog.c |   2 +-
 src/backend/replication/slot.c|  28 -
 src/include/replication/slot.h|   5 +-
 src/test/recovery/meson.build |   1 +
 .../t/038_save_logical_slots_shutdown.pl  | 102 ++
 5 files changed, 132 insertions(+), 6 deletions(-)
 create mode 100644 src/test/recovery/t/038_save_logical_slots_shutdown.pl

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index f6f8adc72a..f26c8d18a6 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -7039,7 +7039,7 @@ static void
 CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
 {
 	CheckPointRelationMap();
-	CheckPointReplicationSlots();
+	CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
 	CheckPointSnapBuild();
 	CheckPointLogicalRewriteHeap();
 	CheckPointReplicationOrigin();
diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index bb09c4010f..185a58de4c 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -321,6 +321,7 @@ ReplicationSlotCreate(const char *name, bool db_specific,
 	slot->candidate_xmin_lsn = InvalidXLogRecPtr;
 	slot->candidate_restart_valid = InvalidXLogRecPtr;
 	slot->candidate_restart_lsn = InvalidXLogRecPtr;
+	slot->last_saved_confirmed_flush = InvalidXLogRecPtr;
 
 	/*
 	 * Create the slot on disk.  We haven't actually marked the slot allocated
@@ -1572,11 +1573,10 @@ restart:
 /*
  * Flush all replication slots to disk.
  *
- * This needn't actually be part of a checkpoint, but it's a convenient
- * location.
+ * is_shutdown is true in case of a shutdown checkpoint.
  */
 void
-CheckPointReplicationSlots(void)
+CheckPointReplicationSlots(bool is_shutdown)
 {
 	int			i;
 
@@ -1601,6 +1601,24 @@ CheckPointReplicationSlots(void)
 
 		/* save the slot to disk, locking is handled in SaveSlotToPath() */
 		sprintf(path, "pg_replslot/%s", NameStr(s->data.name));
+
+		/*
+		 * We won't ensure that the slot is persisted after the
+		 * confirmed_flush LSN is updated as that could lead to frequent
+		 * writes.  However, we need to ensure that we do persist the slots at
+		 * the time of shutdown whose

Re: pg_upgrade fails with in-place tablespace

2023-08-31 Thread Michael Paquier
On Sat, Aug 19, 2023 at 08:11:28PM +0800, Rui Zhao wrote:
> Please refer to the TAP test I have included for a better understanding
> of my suggestion.

Sure, but it seems to me that my original question is not really
answered:  what's your use case for being able to support in-place
tablespaces in pg_upgrade?  The original use case for such
tablespaces is to ease the introduction of tests with primaries and
standbys, which is not something that really applies to pg_upgrade,
no?  Perhaps there is meaning in having more TAP tests with
tablespaces and a combination of primary/standbys, but having
in-place tablespaces does not really make much sense to me because, as
these are in the data folder, we don't use them to test the path
re-creation logic.

I think that we should just add a routine in check.c that scans
pg_tablespace, reporting all the non-absolute paths found with their
associated tablespace names.
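A rough sketch of such a routine, reusing pg_upgrade's existing helpers
(connectToServer, executeQueryOrDie, pg_fatal); this is just an
illustration, not an actual patch:

static void
check_for_in_place_tablespaces(ClusterInfo *cluster)
{
	PGconn	   *conn = connectToServer(cluster, "template1");
	PGresult   *res;
	int			ntups;

	prep_status("Checking for in-place tablespaces");

	/* in-place tablespaces have paths relative to the data directory */
	res = executeQueryOrDie(conn,
							"SELECT spcname, pg_catalog.pg_tablespace_location(oid) "
							"FROM pg_catalog.pg_tablespace "
							"WHERE spcname NOT IN ('pg_default', 'pg_global') "
							"AND pg_catalog.pg_tablespace_location(oid) !~ '^/'");

	ntups = PQntuples(res);
	for (int i = 0; i < ntups; i++)
		pg_log(PG_REPORT, "tablespace \"%s\" has a non-absolute path \"%s\"\n",
			   PQgetvalue(res, i, 0), PQgetvalue(res, i, 1));

	PQclear(res);
	PQfinish(conn);

	if (ntups)
		pg_fatal("In-place tablespaces cannot be upgraded");

	check_ok();
}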
--
Michael




RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-31 Thread Hayato Kuroda (Fujitsu)
Dear Amit,

Thank you for the suggestions! I think your fixes are good.
A new patch set is available in [1].

> Apart from this, I have addressed some of the comments raised by you
> for the 0003 patch. Please find the diff patch attached. I think we
> should combine 0002 and 0003 patches.

Yeah, combined.

> I have another comment on the patch:
> + /* Check there are no logical replication slots with a 'lost' state. */
> + res = executeQueryOrDie(conn,
> + "SELECT slot_name FROM pg_catalog.pg_replication_slots "
> + "WHERE wal_status = 'lost' AND "
> + "temporary IS FALSE;");
> 
> In this place, shouldn't we explicitly check for slot_type as logical?
> I think we should consistently check for slot_type in all the queries
> used in this patch.

Seems right; the condition was added in all the places.

[1]: 
https://www.postgresql.org/message-id/TYAPR01MB5866CDC13CA9D6B9F4451606F5E4A%40TYAPR01MB5866.jpnprd01.prod.outlook.com


Best Regards,
Hayato Kuroda
FUJITSU LIMITED



RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-31 Thread Hayato Kuroda (Fujitsu)
Dear Peter,

Thanks for the comments! PSA new version.
I replied only to comment 8 because the others were answered by Amit.

> .../t/003_logical_replication_slots.pl
> 
> 8. Consider adding one more test
> 
> Maybe there should also be some "live check" test performed (e.g.
> using --check, and a running old_cluster).
> 
> This would demonstrate pg_upgrade working successfully even when the
> WAL records are not consumed (because LSN checks would be skipped in
> check_old_cluster_for_valid_slots function).

I had ignored the case because it did not improve code coverage, but
indeed, no one has checked the feature. I'm still not sure what it should
be, but I added it. I want to hear your opinions.



Furthermore, based on comments from Dilip [1], I added the comment and
modified check_new_cluster_logical_replication_slots(). IIUC pg_upgrade
does not have a method to handle plural forms, so an if-statement was used.
If you have better options, please tell me.

[1]: 
https://www.postgresql.org/message-id/CAFiTN-tgm9wCTyG4co%2BVZhyFTnzh-KoPtYbuH9bRFmxroJ34EQ%40mail.gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED




v29-0001-Persist-logical-slots-to-disk-during-a-shutdown-.patch
Description:  v29-0001-Persist-logical-slots-to-disk-during-a-shutdown-.patch


v29-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patch
Description:  v29-0002-pg_upgrade-Allow-to-replicate-logical-replicatio.patch


Re: persist logical slots to disk during shutdown checkpoint

2023-08-31 Thread Amit Kapila
On Thu, Aug 31, 2023 at 6:12 PM Ashutosh Bapat
 wrote:
>
> On Thu, Aug 31, 2023 at 2:52 PM Amit Kapila  wrote:
> >
> > All but one. Normally, the idea of marking dirty is to indicate that
> > we will actually write/flush the contents at a later point (except
> > when required for correctness) as even indicated in the comments atop
> > > ReplicationSlotMarkDirty(). However, I see your point that we use
> > that protocol at all the current places including CreateSlotOnDisk().
> > So, we can probably do it here as well.
>
> yes
>

I think we should also ensure that slots are not invalidated
(slot.data.invalidated != RS_INVAL_NONE) before marking them dirty
because we don't allow decoding from such slots, so we shouldn't
include those.

-- 
With Regards,
Amit Kapila.




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-31 Thread Dilip Kumar
On Thu, Aug 31, 2023 at 7:56 PM Hayato Kuroda (Fujitsu)
 wrote:

> Thanks for giving comments!

Thanks

> > Some comments in 0002
> >
> > 1.
> > + res = executeQueryOrDie(conn, "SELECT slot_name "
> > + "FROM pg_catalog.pg_replication_slots "
> > + "WHERE slot_type = 'logical' AND "
> > + "temporary IS FALSE;");
> >
> > What is the reason we are ignoring temporary slots here?  I think we
> > better explain in the comments.
>
> The temporary slots were expressly ignored while checking because such slots
> cannot exist after the upgrade. Before doing pg_upgrade, both the old and
> new clusters must be turned off, and they are started/stopped several times
> during the upgrade.
>
> What do you think?

LGTM

>
> > 2.
> > + res = executeQueryOrDie(conn, "SELECT slot_name "
> > + "FROM pg_catalog.pg_replication_slots "
> > + "WHERE slot_type = 'logical' AND "
> > + "temporary IS FALSE;");
> > +
> > + if (PQntuples(res))
> > + pg_fatal("New cluster must not have logical replication slots but
> > found \"%s\"",
> > + PQgetvalue(res, 0, 0));
> >
> > It looks a bit odd to me that first it is fetching all the logical
> > slots from the new cluster and then printing the name of one of the
> > slots.  If it is printing the name of the slots then shouldn't it be
> > printing all the slots' names, or it should just say that there are
> > existing slots on the new cluster without giving any names?  And if we
> > are planning for option 2 i.e. not printing the name then better to
> > put LIMIT 1 at the end of the query.
>
> I'm planning to change it so that the number of slots is reported using
> count(*).

Yeah, that seems a better option.
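A sketch of how that might read, following the style of the surrounding
pg_upgrade code (not the actual patch):

	int			nslots;

	res = executeQueryOrDie(conn, "SELECT count(*) "
							"FROM pg_catalog.pg_replication_slots "
							"WHERE slot_type = 'logical' AND "
							"temporary IS FALSE;");

	/* count(*) returns a single row, so report the total in one message */
	nslots = atoi(PQgetvalue(res, 0, 0));
	if (nslots)
		pg_fatal("New cluster must not have logical replication slots, but %d were found",
				 nslots);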

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: SQL:2011 application time

2023-08-31 Thread Corey Huinker
>
> The PERIOD patch is not finished and includes some deliberately-failing
> tests. I did make some progress here finishing ALTER TABLE ADD PERIOD.
>

If it's ok with you, I need PERIODs for System Versioning, and planned on
developing a highly similar version, albeit closer to the standard. It
shouldn't interfere with your work as you're heavily leveraging range
types.


Re: Extract numeric filed in JSONB more effectively

2023-08-31 Thread Andy Fan
> An incompatible issue at error message level is found during test:
> create table jb(a jsonb);
> insert into jb select '{"a": "a"}'::jsonb;
> select (a->'a')::int4 from jb;
>
> master:   ERROR:  cannot cast jsonb string to type *integer*
> patch:  ERROR:  cannot cast jsonb string to type *numeric*
>
> That's mainly because we first extract the field to numeric and
> then cast it to int4; the error is raised at the first step, where
> the final type is not known.  One way to fix it is to add a 2nd
> argument to jsonb_finish_numeric for the real type, but it looks
> weird, and more suggestions on this would be good.
>
>
v12 is attached to address the above issue: I added a new argument
named target_oid to jsonb_finish_numeric so that it can raise a
correct error message.  I also fixed the issue reported by opr_sanity
in this version.
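For illustration, the shape of that change might be roughly as below (a
hypothetical sketch: the real patch's naming and error reporting may
differ, and the message would ideally name the actual source JSON type
as well):

static Numeric
jsonb_finish_numeric(JsonbValue *jbv, Oid target_oid)
{
	if (jbv->type != jbvNumeric)
		ereport(ERROR,
				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
				 /* name the user-visible target type, e.g. integer */
				 errmsg("cannot cast jsonb string to type %s",
						format_type_be(target_oid))));

	return jbv->val.numeric;
}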

Chap, do you still think we should refactor the code for the
previously existing functions like jsonb_object_field to reduce code
duplication?  I think we should not, because a) the code duplication
is only ~10 lines, and b) if we refactor, we would have to implement
two DirectFunctionCall1 calls.  Point b) is the key reason I am
unwilling to do it.  Or am I missing other important reasons?

-- 
Best Regards
Andy Fan


v12-0001-optimize-casting-jsonb-to-a-given-type.patch
Description: Binary data


Re: Should we use MemSet or {0} for struct initialization?

2023-08-31 Thread Richard Guo
On Thu, Aug 31, 2023 at 7:07 PM John Naylor 
wrote:

> > On Thu, Aug 31, 2023 at 5:34 PM Richard Guo 
> wrote:
> > >
> > > While working on a bug in expandRecordVariable() I noticed that in the
> > > switch statement for case RTE_SUBQUERY we initialize struct ParseState
> > > with {0} while for case RTE_CTE we do that with MemSet.  I understand
> > > that there is nothing wrong with this, I just cannot get past the
> > > inconsistency inside the same function (sorry for the nitpicking).
> > >
> > > Do we have a preference for how to initialize structures?  From
> 9fd45870
> > > it seems that we prefer to {0}.  So here is a trivial patch doing that.
>
> It seems to have been deliberately left that way in the wake of that
> commit, see:
>
>
> https://www.postgresql.org/message-id/87d2e5f8-3c37-d185-4bbc-1de163ac4b10%40enterprisedb.com
>
> (If so, it deserves a comment to keep people from trying to change it...)
>

Thanks for pointing this out.  Yeah, struct initialization does not work
for some cases with padding bits, such as when the struct is used as a
hash key and we need to clear the padding too.

The case in expandRecordVariable() mentioned here should be safe though,
maybe this is an omission from 9fd45870?
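A tiny self-contained illustration of why {0} is not enough for such
hash keys (assuming a struct layout with internal padding):

#include <string.h>

typedef struct HashKey
{
	char		tag;			/* usually followed by 3 padding bytes */
	int			id;
} HashKey;

void
init_keys(HashKey *a, HashKey *b)
{
	/*
	 * {0} zeroes the members, but the padding bytes are unspecified, so a
	 * memcmp()- or hash_bytes()-style comparison over the whole struct
	 * may see garbage.
	 */
	HashKey		k1 = {0};

	/* memset() zeroes every byte, padding included. */
	HashKey		k2;

	memset(&k2, 0, sizeof(HashKey));

	*a = k1;
	*b = k2;
}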

Thanks
Richard


Re: should frontend tools use syncfs() ?

2023-08-31 Thread Nathan Bossart
On Fri, Sep 01, 2023 at 10:40:12AM +0900, Michael Paquier wrote:
> That should be OK this way.  The extra running time is not really
> visible, right?

AFAICT it is negligible.  Presumably it could take a little longer if there
is a lot to sync on the file system, but I don't know if that's worth
worrying about.

> +command_ok([ 'initdb', '-S', $datadir, '--sync-method', 'fsync' ],
> +   'sync method fsync');
> 
> Removing this one may be fine, actually, because we test the sync
> paths on other places like pg_dump.

Done.

> This split is OK by me, so WFM.

Cool.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From bc2210a3cee0a852b65f74b544a0c0d23c59b78f Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Mon, 21 Aug 2023 19:05:14 -0700
Subject: [PATCH v9 1/4] move PG_TEMP_FILE* macros to file_utils.h

---
 src/backend/backup/basebackup.c |  2 +-
 src/backend/postmaster/postmaster.c |  1 +
 src/backend/storage/file/fileset.c  |  1 +
 src/bin/pg_checksums/pg_checksums.c | 10 --
 src/bin/pg_rewind/filemap.c |  2 +-
 src/include/common/file_utils.h |  4 
 src/include/storage/fd.h|  4 
 7 files changed, 8 insertions(+), 16 deletions(-)

diff --git a/src/backend/backup/basebackup.c b/src/backend/backup/basebackup.c
index 45be21131c..5d66014499 100644
--- a/src/backend/backup/basebackup.c
+++ b/src/backend/backup/basebackup.c
@@ -25,6 +25,7 @@
 #include "commands/defrem.h"
 #include "common/compression.h"
 #include "common/file_perm.h"
+#include "common/file_utils.h"
 #include "lib/stringinfo.h"
 #include "miscadmin.h"
 #include "nodes/pg_list.h"
@@ -37,7 +38,6 @@
 #include "storage/bufpage.h"
 #include "storage/checksum.h"
 #include "storage/dsm_impl.h"
-#include "storage/fd.h"
 #include "storage/ipc.h"
 #include "storage/reinit.h"
 #include "utils/builtins.h"
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index d7bfb28ff3..54e9bfb8c4 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -94,6 +94,7 @@
 #include "access/xlogrecovery.h"
 #include "catalog/pg_control.h"
 #include "common/file_perm.h"
+#include "common/file_utils.h"
 #include "common/ip.h"
 #include "common/pg_prng.h"
 #include "common/string.h"
diff --git a/src/backend/storage/file/fileset.c b/src/backend/storage/file/fileset.c
index e9951b0269..84cae32548 100644
--- a/src/backend/storage/file/fileset.c
+++ b/src/backend/storage/file/fileset.c
@@ -25,6 +25,7 @@
 
 #include "catalog/pg_tablespace.h"
 #include "commands/tablespace.h"
+#include "common/file_utils.h"
 #include "common/hashfn.h"
 #include "miscadmin.h"
 #include "storage/ipc.h"
diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index 19eb67e485..9011a19b4e 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -52,16 +52,6 @@ typedef enum
 	PG_MODE_ENABLE
 } PgChecksumMode;
 
-/*
- * Filename components.
- *
- * XXX: fd.h is not declared here as frontend side code is not able to
- * interact with the backend-side definitions for the various fsync
- * wrappers.
- */
-#define PG_TEMP_FILES_DIR "pgsql_tmp"
-#define PG_TEMP_FILE_PREFIX "pgsql_tmp"
-
 static PgChecksumMode mode = PG_MODE_CHECK;
 
 static const char *progname;
diff --git a/src/bin/pg_rewind/filemap.c b/src/bin/pg_rewind/filemap.c
index bd5c598e20..58280d9abc 100644
--- a/src/bin/pg_rewind/filemap.c
+++ b/src/bin/pg_rewind/filemap.c
@@ -27,12 +27,12 @@
 #include 
 
 #include "catalog/pg_tablespace_d.h"
+#include "common/file_utils.h"
 #include "common/hashfn.h"
 #include "common/string.h"
 #include "datapagemap.h"
 #include "filemap.h"
 #include "pg_rewind.h"
-#include "storage/fd.h"
 
 /*
  * Define a hash table which we can use to store information about the files
diff --git a/src/include/common/file_utils.h b/src/include/common/file_utils.h
index b7efa1226d..dd1532bcb0 100644
--- a/src/include/common/file_utils.h
+++ b/src/include/common/file_utils.h
@@ -46,4 +46,8 @@ extern ssize_t pg_pwritev_with_retry(int fd,
 
 extern ssize_t pg_pwrite_zeros(int fd, size_t size, off_t offset);
 
+/* Filename components */
+#define PG_TEMP_FILES_DIR "pgsql_tmp"
+#define PG_TEMP_FILE_PREFIX "pgsql_tmp"
+
 #endif			/* FILE_UTILS_H */
diff --git a/src/include/storage/fd.h b/src/include/storage/fd.h
index 6791a406fc..aea30c0622 100644
--- a/src/include/storage/fd.h
+++ b/src/include/storage/fd.h
@@ -195,8 +195,4 @@ extern int	durable_unlink(const char *fname, int elevel);
 extern void SyncDataDirectory(void);
 extern int	data_sync_elevel(int elevel);
 
-/* Filename components */
-#define PG_TEMP_FILES_DIR "pgsql_tmp"
-#define PG_TEMP_FILE_PREFIX "pgsql_tmp"
-
 #endif			/* FD_H */
-- 
2.25.1

>From ae51ea0182e88398a5c8ebd644fea9e98229e1d0 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Thu, 31 Aug 2023 07:38:42 -0700
Subject: [PATCH v9 2/4] make common enum for sync methods

---
 src/backend/storage/file/fd.c

Re: More new SQL/JSON item methods

2023-08-31 Thread Chapman Flack

On 2023-08-31 20:50, Vik Fearing wrote:

  — An SQL/JSON item is defined recursively as any of the following:
...
• An SQL/JSON array, defined as an ordered list of zero or more
  SQL/JSON items, called the SQL/JSON elements of the SQL/JSON
  array.
...
  — An SQL/JSON sequence is an ordered list of zero or more SQL/JSON
items.


As I was thinking, because "an ordered list of zero or more SQL/JSON
items" is also exactly what an SQL/JSON array is, it seems at least
possible to implement things that are specified to return "SQL/JSON
sequence" by having them return an SQL/JSON array (the kind of thing
that isn't possible for XML(SEQUENCE), because there isn't any other
XML construct that can subsume it).

Still, it seems noteworthy that both terms are used in the spec, rather
than saying the function in question should return a JSON array. Makes
me wonder if there are some other details that make the two distinct.

Regards,
-Chap




Re: should frontend tools use syncfs() ?

2023-08-31 Thread Michael Paquier
On Thu, Aug 31, 2023 at 08:48:58AM -0700, Nathan Bossart wrote:
> On Thu, Aug 31, 2023 at 02:30:33PM +0900, Michael Paquier wrote:
> > - Should we have some regression tests?  We should only need one test
> > in one of the binaries to be able to stress the new code paths of
> > file_utils.c with syncfs.   The cheapest one may be pg_dump with a
> > dump in directory format?  Note that we have tests there that depend
> > on lz4 or gzip existing, which are conditional.
> 
> I added one for initdb in v8.

+my $supports_syncfs = check_pg_config("#define HAVE_SYNCFS 1"); 

That should be OK this way.  The extra running time is not really
visible, right?

+command_ok([ 'initdb', '-S', $datadir, '--sync-method', 'fsync' ],
+   'sync method fsync');

Removing this one may be fine, actually, because we test the sync
paths on other places like pg_dump.

> Ha, I was just thinking about this, too.  I actually split it into 3
> patches.  The first adds DataDirSyncMethod and uses it for
> recovery_init_sync_method.  The second adds syncfs() support in
> file_utils.c.  And the third adds the ability to specify syncfs in the
> frontend utilities.  WDYT?

This split is OK by me, so WFM.
--
Michael




Re: Eliminate redundant tuple visibility check in vacuum

2023-08-31 Thread Peter Geoghegan
On Thu, Aug 31, 2023 at 3:35 PM Melanie Plageman
 wrote:
> Any inserting transaction which aborts after heap_page_prune()'s
> visibility check will now be of no concern to lazy_scan_prune(). Since
> we don't do the visibility check again, we won't find the tuple
> HEAPTUPLE_DEAD and thus won't have the problem of adding the tuple to
> the array of tuples for vacuum to reap. This does mean that we won't
> reap and remove tuples whose inserting transactions abort right after
> heap_page_prune()'s visibility check. But, this doesn't seem like an
> issue.

It's definitely not an issue.

The loop is essentially a hacky way of getting a consistent picture of
which tuples should be treated as HEAPTUPLE_DEAD, and which tuples
need to be left behind (consistent at the level of each page and each
HOT chain, at least). Making that explicit seems strictly better.

> They may not be removed until the next vacuum, but ISTM it is
> actually worse to pay the cost of re-pruning the whole page just to
> clean up that one tuple. Maybe if that renders the page all visible
> and we can mark it as such in the visibility map -- but that seems
> like a relatively narrow use case.

The chances of actually hitting the retry are microscopic anyway. It
has nothing to do with making sure that dead tuples from aborted
tuples get removed for its own sake, or anything. Rather, the retry is
all about making sure that all TIDs that get removed from indexes can
only point to LP_DEAD stubs. Prior to Postgres 14, HEAPTUPLE_DEAD
tuples with storage would very occasionally be left behind, which made
life difficult in a bunch of other places -- for no good reason.

--
Peter Geoghegan




Re: Show inline comments from pg_hba lines in the pg_hba_file_rules view

2023-08-31 Thread Michael Paquier
On Fri, Sep 01, 2023 at 12:01:37AM +0200, Jim Jones wrote:
> Often we make changes in the pg_hba.conf and leave a #comment there, just in
> case we forget why the change was done. To avoid having to open the
> configuration file every time just to check the comments, it would be quite
> nice to have the option to read these comments in the pg_hba_file_rules
> view. Something like adding it in the end of the line and wrapping it with
> characters like "", '', {}, [], etc
> 
> For instance, this pg_hba.conf ...
> 
> # TYPE  DATABASE     USER  ADDRESS         METHOD
> local   all          all                   trust [foo]
> host    all          all   127.0.0.1/32    trust
> host    all          all   ::1/128         trust [bar]
> local   replication  all                   trust
> host    replication  all   127.0.0.1/32    trust
> hostssl replication  all   ::1/128         cert map=abc [this will fail :)]
> 
> ... could be displayed like this

hba.c is complex enough these days (inclusion logic, tokenization of
the items) that I am not in favor of touching its code paths for
anything like that.  This is not something that can apply only to
pg_hba.conf, but to all configuration files.  And this touches on
adding support for a second type of comment format.  This is one of
those areas where we may want a smarter version of pg_read_file that
returns an SRF of (line_number, line_contents) for the file read.  Note
that it is possible to add comments at the end of a HBA entry already,
like:
local all all trust # My comment, and this is a correct HBA entry.
--
Michael




Re: sandboxing untrusted code

2023-08-31 Thread Jeff Davis
On Thu, 2023-08-31 at 11:25 -0400, Robert Haas wrote:
> As a refresher, the scenario I'm talking about is any one in which
> one
> user, who I'll call Bob, does something that results in executing
> code
> provided by another user, who I'll call Alice. The most obvious way
> that this can happen is if Bob performs some operation that targets a
> table owned by Alice. That operation might be DML, like an INSERT or
> UPDATE; or it might be some other kind of maintenance command that
> can
> cause code execution, like REINDEX, which can evaluate index
> expressions.

REINDEX executes index expressions as the table owner. (You are correct
that INSERT executes index expressions as the inserting user.)

>  The code being executed might be run either as Alice or
> as Bob, depending on how it's been attached to the table and what
> operation is being performed and maybe whether some function or
> procedure that might contain it is SECURITY INVOKER or SECURITY
> DEFINER. Regardless of the details, our concern is that Alice's code
> might do something that Bob does not like. This is a particularly
> lively concern if the code happens to be running with the privileges
> of Bob, because then Alice might try to do something like access
> objects for which Bob has permissions and Alice does not.

Agreed.


> 1. Compute stuff. There's no restriction on the permissible amount of
> compute; if you call untrusted code, nothing prevents it from running
> forever.
> 2. Call other code. This may be done by a function call or a command
> such as CALL or DO, all subject to the usual permissions checks but
> no
> further restrictions.
> 3. Access the current session state, without modifying it. For
> example, executing SHOW or current_setting() is fine.
> 4. Transiently modify the current session state in ways that are
> necessarily reversed before returning to the caller. For example, an
> EXCEPTION block or a configuration change driven by proconfig is
> fine.
> 5. Produce messages at any log level. This includes any kind of
> ERROR.

Nothing in that list really exercises privileges (except #2?). If those
are the allowed set of things a sandboxed function can do, is a
sandboxed function equivalent to a function running with no privileges
at all?

Please explain #2 in a bit more detail. Whose EXECUTE privileges would
be used (I assume it depends on SECURITY DEFINER/INVOKER)? Would the
called code also be sandboxed?

> In general if we have a great big call stack that involves calling a
> whole bunch of functions either as SECURITY INVOKER or as SECURITY
> DEFINER, changing the session state is blocked unless the session
> user
> trusts the owners of all of those functions.

That clarifies the earlier mechanics you described, thank you.

>  And if we got to any of
> those functions by means of code attached directly to tables, like an
> index expression or default expression, changing the session state is
> blocked unless the session user also trusts the owners of those
> tables.
> 
> I see a few obvious objections to this line of attack that someone
> might raise, and I'd like to address them now. First, somebody might
> argue that this is too hard to implement.

That seems to be a response to my question above: "Isn't that a hard
problem; maybe impossible?".

Let me qualify that: if the function is written by Alice, and the code
is able to really exercise the privileges of the caller (Bob), then it
seems really hard to make it safe for the caller.

If the function is sandboxed such that it's not really using Bob's
privileges (it's just nominally running as Bob) that's a much more
tractable problem.

I believe there's some nuance to your proposal where some of Bob's
privileges could be used safely, but I'm not clear on exactly which
ones. The difficulty of the implementation would depend on these
details.

> Second, somebody might argue that full sandboxing is such a
> draconian set of restrictions that it will inconvenience users
> greatly
> or that it's pointless to even allow anything to be executed or
> something along those lines. I think that argument has some merit,
> but
> I think the restrictions sound worse than they actually are in
> context.

+100. We should make typical cases easy to secure.

> Even if they do something as
> simple as reading from another table, that's not necessarily going to
> dump and restore properly, even if it's secure, because the table
> ordering dependencies won't be clear to pg_dump.

A good point. A lot of these extraordinary cases are either incredibly
fragile or already broken.

> What if such a function wants to ALTER ROLE ...
> SUPERUSER? I think that's bonkers and should almost certainly be
> categorically denied.

...also agreed, a lot of these extraordinary cases are really just
surface area for attack with no legitimate use case.




One complaint (not an objection, because I don't think we have
the luxury of objecting to viable proposals when it comes to improving
our security mo

Re: More new SQL/JSON item methods

2023-08-31 Thread Vik Fearing

On 8/30/23 19:20, Chapman Flack wrote:

On 2023-08-30 12:28, Alvaro Herrera wrote:

    b) Otherwise, the result of JAE is the SQL/JSON sequence V_1,
   ..., V_n.


This has my Spidey sense tingling, as it seems very parallel to SQL/XML
where the result of XMLQUERY is to have type XML(SEQUENCE), which is a
type we do not have, and I'm not sure we have a type for "JSON sequence"
either, unless SQL/JSON makes it equivalent to a JSON array (which
I guess is conceivable, more easily than with XML). What does SQL/JSON
say about this SQL/JSON sequence type and how it should behave?


The SQL/JSON data model comprises SQL/JSON items and SQL/JSON sequences. 
The components of the SQL/JSON data model are:


  — An SQL/JSON item is defined recursively as any of the following:
• An SQL/JSON scalar, defined as a non-null value of any of the
  following predefined (SQL) types: character string with character
  set Unicode, numeric, Boolean, or datetime.
• An SQL/JSON null, defined as a value that is distinct from any
  value of any SQL type. NOTE 109 — An SQL/JSON null is distinct
  from the SQL null value.
• An SQL/JSON array, defined as an ordered list of zero or more
  SQL/JSON items, called the SQL/JSON elements of the SQL/JSON
  array.
• An SQL/JSON object, defined as an unordered collection of zero or
  more SQL/JSON members, where an SQL/JSON member is a pair whose
  first value is a character string with character set Unicode and
  whose second value is an SQL/JSON item. The first value of an
  SQL/JSON member is called the key and the second value is called
  the bound value.

  — An SQL/JSON sequence is an ordered list of zero or more SQL/JSON
items.

--
Vik Fearing





Re: trying again to get incremental backup

2023-08-31 Thread David Steele

Hi Robert,

On 8/30/23 10:49, Robert Haas wrote:

In the limited time that I've had to work on this project lately, I've
been trying to come up with a test case for this feature -- and since
I've gotten completely stuck, I thought it might be time to post and
see if anyone else has a better idea. I thought a reasonable test case
would be: Do a full backup. Change some stuff. Do an incremental
backup. Restore both backups and perform replay to the same LSN. Then
compare the files on disk. But I cannot make this work. The first
problem I ran into was that replay of the full backup does a
restartpoint, while the replay of the incremental backup does not.
That results in, for example, pg_subtrans having different contents.


pg_subtrans, at least, can be ignored since it is excluded from the 
backup and not required for recovery.



I'm not sure whether it can also result in data files having different
contents: are changes that we replayed following the last restartpoint
guaranteed to end up on disk when the server is shut down? It wasn't
clear to me that this is the case. I thought maybe I could get both
servers to perform a restartpoint at the same location by shutting
down the primary and then replaying through the shutdown checkpoint,
but that doesn't work because the primary doesn't finish archiving
before shutting down. After some more fiddling I settled (at least for
research purposes) on having the restored backups PITR and promote,
instead of PITR and pause, so that we're guaranteed a checkpoint. But
that just caused me to run into a far worse problem: replay on the
standby doesn't actually create a state that is byte-for-byte
identical to the one that exists on the primary. I quickly discovered
that in my test case, I was ending up with different contents in the
"hole" of a block wherein a tuple got updated. Replay doesn't think
it's important to make the hole end up with the same contents on all
machines that replay the WAL, so I end up with one server that has
more junk in there than the other one and the tests fail.


This is pretty much what I discovered when investigating backup from 
standby back in 2016. My (ultimately unsuccessful) efforts to find a 
clean delta resulted in [1] as I systematically excluded directories 
that are not required for recovery and will not be synced between a 
primary and standby.


FWIW Heikki also made similar attempts at this before me (back then I 
found the thread but I doubt I could find it again) and arrived at 
similar results. We discussed this in person and figured out that we had 
come to more or less the same conclusion. Welcome to the club!



Unless someone has a brilliant idea that I lack, this suggests to me
that this whole line of testing is a dead end. I can, of course, write
tests that compare clusters *logically* -- do the correct relations
exist, are they accessible, do they have the right contents? But I
feel like it would be easy to have bugs that escape detection in such
a test but would be detected by a physical comparison of the clusters.


Agreed, though a matching logical result is still very compelling.


However, such a comparison can only be conducted if either (a) there's
some way to set up the test so that byte-for-byte identical clusters
can be expected or (b) there's some way to perform the comparison that
can distinguish between expected, harmless differences and unexpected,
problematic differences. And at the moment my conclusion is that
neither (a) nor (b) exists. Does anyone think otherwise?


I do not. My conclusion back then was that validating a physical 
comparison would be nearly impossible without changes to Postgres to 
make the primary and standby match via replication. Which, by the way, I 
still think would be a great idea. In principle, at least. Replay is 
already a major bottleneck and anything that makes it slower will likely 
not be very popular.


This would also be great for WAL, since last time I tested the same WAL 
segment can be different between the primary and standby because the 
unused (and recycled) portion at the end is not zeroed as it is on the 
primary (but logically they do match). I would be very happy if somebody 
told me that my info is out of date here and this has been fixed. But 
when I looked at the code it was incredibly tricky to do this because of 
how WAL is replicated.



Meanwhile, here's a rebased set of patches. The somewhat-primitive
attempts at writing tests are in 0009, but they don't work, for the
reasons explained above. I think I'd probably like to go ahead and
commit 0001 and 0002 soon if there are no objections, since I think
those are good refactorings independently of the rest of this.


No objections to 0001/0002.

Regards,
-David

[1] 
http://git.postgresql.org/pg/commitdiff/6ad8ac6026287e3ccbc4d606b6ab6116ccc0eec8





Re: Eliminate redundant tuple visibility check in vacuum

2023-08-31 Thread Melanie Plageman
On Thu, Aug 31, 2023 at 5:39 AM David Geier  wrote:
> Regarding the 2nd patch (disclaimer: I'm not too familiar with that area
> of the code): I don't completely understand why the retry loop is not
> needed anymore and how you now detect/handle the possible race
> condition? It can still happen that an aborting transaction changes the
> state of a row after heap_page_prune() looked at that row. Would that
> case now not be ignored?

Thanks for asking. I've updated the comment in the code and the commit
message about this, as it seems important to be clear.

Any inserting transaction which aborts after heap_page_prune()'s
visibility check will now be of no concern to lazy_scan_prune(). Since
we don't do the visibility check again, we won't find the tuple
HEAPTUPLE_DEAD and thus won't have the problem of adding the tuple to
the array of tuples for vacuum to reap. This does mean that we won't
reap and remove tuples whose inserting transactions abort right after
heap_page_prune()'s visibility check. But, this doesn't seem like an
issue. They may not be removed until the next vacuum, but ISTM it is
actually worse to pay the cost of re-pruning the whole page just to
clean up that one tuple. Maybe if that renders the page all visible
and we can mark it as such in the visibility map -- but that seems
like a relatively narrow use case.

- Melanie




Re: Eliminate redundant tuple visibility check in vacuum

2023-08-31 Thread Melanie Plageman
On Thu, Aug 31, 2023 at 2:03 PM Robert Haas  wrote:
>
> I have a few suggestions:
>
> - Rather than removing the rather large comment block at the top of
> lazy_scan_prune() I'd like to see it rewritten to explain how we now
> deal with the problem. I'd suggest leaving the first paragraph ("Prior
> to...") just as it is and replace all the words following "The
> approach we take now is" with a description of the approach that this
> patch takes to the problem.

Good idea. I've updated the comment. I also explain why this new
approach works in the commit message and reference the commit which
added the previous approach.

> - I'm not a huge fan of the caller of heap_page_prune() having to know
> how to initialize the PruneResult. Can heap_page_prune() itself do
> that work, so that presult is an out parameter rather than an in-out
> parameter? Or, if not, can it be moved to a new helper function, like
> heap_page_prune_init(), rather than having that logic in 2+ places?

Ah, yes. Now that it has two callers, and since it is exclusively an
output parameter, it is quite ugly to initialize it in both callers.
Fixed in the attached.

> - int ndeleted,
> - nnewlpdead;
> -
> - ndeleted = heap_page_prune(relation, buffer, vistest, limited_xmin,
> -limited_ts, &nnewlpdead, NULL);
> + int ndeleted = heap_page_prune(relation, buffer, vistest, limited_xmin,
> +limited_ts, &presult, NULL);
>
> - I don't particularly like merging the declaration with the
> assignment unless the call is narrow enough that we don't need a line
> break in there, which is not the case here.

I have changed this.

> - I haven't thoroughly investigated yet, but I wonder if there are any
> other places where comments need updating. As a general note, I find
> it desirable for a function's header comment to mention whether any
> pointer parameters are in parameters, out parameters, or in-out
> parameters, and what the contract between caller and callee actually
> is.

I've investigated vacuumlazy.c and pruneheap.c and looked at the
commit that added the retry loop (8523492d4e349) to see everywhere it
added comments and don't see anywhere else that needs updating.

I have updated lazy_scan_prune()'s function header comment to describe
the nature of the in-out and output parameters and the contract.

- Melanie
From 6ea7a2aa5698e08ce67d47efd3dbfb54be30d9cb Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Thu, 3 Aug 2023 15:08:58 -0400
Subject: [PATCH v2 1/2] Rebrand LVPagePruneState as PruneResult

With this rename, and relocation to heapam.h, we can use PruneResult as
an output parameter for on-access pruning as well. It also makes it
harder to confuse with PruneState.

We'll be adding and removing PruneResult fields in future commits, but
this rename and relocation is separate to make the diff easier to
understand.

Discussion: https://postgr.es/m/CAAKRu_br124qsGJieuYA0nGjywEukhK1dKBfRdby_4yY3E9SXA%40mail.gmail.com
---
 src/backend/access/heap/vacuumlazy.c | 122 ---
 src/include/access/heapam.h  |  19 +
 src/tools/pgindent/typedefs.list |   2 +-
 3 files changed, 76 insertions(+), 67 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 6a41ee635d..5e0a7422bb 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -212,24 +212,6 @@ typedef struct LVRelState
 	int64		missed_dead_tuples; /* # removable, but not removed */
 } LVRelState;
 
-/*
- * State returned by lazy_scan_prune()
- */
-typedef struct LVPagePruneState
-{
-	bool		hastup;			/* Page prevents rel truncation? */
-	bool		has_lpdead_items;	/* includes existing LP_DEAD items */
-
-	/*
-	 * State describes the proper VM bit states to set for the page following
-	 * pruning and freezing.  all_visible implies !has_lpdead_items, but don't
-	 * trust all_frozen result unless all_visible is also set to true.
-	 */
-	bool		all_visible;	/* Every item visible to all? */
-	bool		all_frozen;		/* provided all_visible is also true */
-	TransactionId visibility_cutoff_xid;	/* For recovery conflicts */
-} LVPagePruneState;
-
 /* Struct for saving and restoring vacuum error information. */
 typedef struct LVSavedErrInfo
 {
@@ -250,7 +232,7 @@ static bool lazy_scan_new_or_empty(LVRelState *vacrel, Buffer buf,
    bool sharelock, Buffer vmbuffer);
 static void lazy_scan_prune(LVRelState *vacrel, Buffer buf,
 			BlockNumber blkno, Page page,
-			LVPagePruneState *prunestate);
+			PruneResult *presult);
 static bool lazy_scan_noprune(LVRelState *vacrel, Buffer buf,
 			  BlockNumber blkno, Page page,
 			  bool *hastup, bool *recordfreespace);
@@ -854,7 +836,7 @@ lazy_scan_heap(LVRelState *vacrel)
 		Buffer		buf;
 		Page		page;
 		bool		all_visible_according_to_vm;
-		LVPagePruneState prunestate;
+		PruneResult presult;
 
 		if (blkno == next_unskippable_block)
 		{
@@ -1019,12 +1001,12 @@ lazy_scan_heap(LVRelState *vacrel)
 		 * wer

Show inline comments from pg_hba lines in the pg_hba_file_rules view

2023-08-31 Thread Jim Jones

Hi,

Often we make changes in the pg_hba.conf and leave a #comment there, 
just in case we forget why the change was done. To avoid having to open 
the configuration file every time just to check the comments, it would 
be quite nice to have the option to read these comments in the 
pg_hba_file_rules view. Something like adding it at the end of the line
and wrapping it with characters like "", '', {}, [], etc.


For instance, this pg_hba.conf ...

# TYPE  DATABASE     USER  ADDRESS         METHOD
local   all          all                   trust [foo]
host    all          all   127.0.0.1/32    trust
host    all          all   ::1/128         trust [bar]
local   replication  all                   trust
host    replication  all   127.0.0.1/32    trust
hostssl replication  all   ::1/128         cert map=abc [this will fail :)]


... could be displayed like this

postgres=# SELECT type, database, user_name, address, comment, error
FROM pg_hba_file_rules;

  type   |   database    | user_name |  address  |      comment      |                        error
---------+---------------+-----------+-----------+-------------------+-----------------------------------------------------
 local   | {all}         | {all}     |           | foo               |
 host    | {all}         | {all}     | 127.0.0.1 |                   |
 host    | {all}         | {all}     | ::1       | bar               |
 local   | {replication} | {all}     |           |                   |
 host    | {replication} | {all}     | 127.0.0.1 |                   |
 hostssl | {replication} | {all}     | ::1       | this will fail :) | hostssl record cannot match because SSL is disabled
(6 rows)

I wrote a very quick&dirty PoC (attached) but before going any further I 
would like to ask if there is a better way to read these comments using 
SQL - or if it makes sense at all ;-)


Any feedback is much appreciated. Thanks!

Jim
diff --git a/src/backend/libpq/hba.c b/src/backend/libpq/hba.c
index ac602bfc37..b15c913e14 100644
--- a/src/backend/libpq/hba.c
+++ b/src/backend/libpq/hba.c
@@ -211,6 +211,15 @@ next_token(char **lineptr, StringInfo buf,
 	while (c != '\0' &&
 		   (!pg_isblank(c) || in_quote))
 	{
+
+		if (c == '[' && !in_quote)
+		{
+			appendStringInfoChar(buf, c);
+			while ((c = (*(*lineptr)++)) != '\0')
+appendStringInfoChar(buf, c);
+			break;
+		}
+
 		/* skip comments to EOL */
 		if (c == '#' && !in_quote)
 		{
@@ -1861,6 +1870,15 @@ parse_hba_line(TokenizedAuthLine *tok_line, int elevel)
 
 			str = pstrdup(token->string);
 			val = strchr(str, '=');
+
+			if(str[0]=='[' && str[strlen(str)-1]==']')
+			{
+str = str + 1 ;
+str[strlen(str)-1]='\0';
+parsedline->comments = str;
+continue;
+			}
+
 			if (val == NULL)
 			{
 /*
diff --git a/src/backend/utils/adt/hbafuncs.c b/src/backend/utils/adt/hbafuncs.c
index 73d3ad1dad..389c14bc2e 100644
--- a/src/backend/utils/adt/hbafuncs.c
+++ b/src/backend/utils/adt/hbafuncs.c
@@ -159,7 +159,7 @@ get_hba_options(HbaLine *hba)
 }
 
 /* Number of columns in pg_hba_file_rules view */
-#define NUM_PG_HBA_FILE_RULES_ATTS	 11
+#define NUM_PG_HBA_FILE_RULES_ATTS	 12
 
 /*
  * fill_hba_line
@@ -346,6 +346,12 @@ fill_hba_line(Tuplestorestate *tuple_store, TupleDesc tupdesc,
 			values[index++] = PointerGetDatum(options);
 		else
 			nulls[index++] = true;
+
+		/* comments */
+		if(hba->comments)
+			values[index++] = CStringGetTextDatum(hba->comments);
+		else
+			nulls[index++] = true;
 	}
 	else
 	{
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 9805bc6118..360f71e8ef 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -6244,9 +6244,9 @@
 { oid => '3401', descr => 'show pg_hba.conf rules',
   proname => 'pg_hba_file_rules', prorows => '1000', proretset => 't',
   provolatile => 'v', prorettype => 'record', proargtypes => '',
-  proallargtypes => '{int4,text,int4,text,_text,_text,text,text,text,_text,text}',
-  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o}',
-  proargnames => '{rule_number,file_name,line_number,type,database,user_name,address,netmask,auth_method,options,error}',
+  proallargtypes => '{int4,text,int4,text,_text,_text,text,text,text,_text,text,text}',
+  proargmodes => '{o,o,o,o,o,o,o,o,o,o,o,o}',
+  proargnames => '{rule_number,file_name,line_number,type,database,user_name,address,netmask,auth_method,options,comment,error}',
   prosrc => 'pg_hba_file_rules' },
 { oid => '6250', descr => 'show pg_ident.conf mappings',
   proname => 'pg_ident_file_mappings', prorows => '1000', proretset => 't',
diff --git a/src/include/libpq/hba.h b/src/include/libpq/hba.h
index 189f6d0df2..5cb7224712 100644
--- a/src/include/libpq/hba.h
+++ b/src/include/libpq/hba.h
@@ -135,6 +135,7 @@ typedef struct HbaLine
 	char	   *radiusidentifiers_s;
 	List	   *radiusports;
 	char	   *radiusports_s;
+	char	   *comments;
 } HbaLine;
 
 typedef struct IdentLine


Re: Reducing connection overhead in pg_upgrade compat check phase

2023-08-31 Thread Daniel Gustafsson
> On 12 Jul 2023, at 01:36, Nathan Bossart  wrote:
> 
> On Wed, Jul 12, 2023 at 12:43:14AM +0200, Daniel Gustafsson wrote:
>> I did have coffee before now, but only found time to actually address this 
>> now
>> so here is a v7 with just that change and a fresh rebase.
> 
> Thanks.  I think the patch is in decent shape.

Due to ENOTENOUGHTIME it bitrotted a bit, so here is a v8 rebase which I really
hope to close in this CF.

--
Daniel Gustafsson



v8-0001-pg_upgrade-run-all-data-type-checks-per-connectio.patch
Description: Binary data


Re: Statistics Import and Export

2023-08-31 Thread Corey Huinker
>
>
> Thanks. I think this may be used with postgres_fdw to import
> statistics directly from the foreign server, whenever possible,
> rather than fetching the rows and building it locally. If it's known
> that the stats on foreign and local servers match for a foreign table,
> we will be one step closer to accurately estimating the cost of a
> foreign plan locally rather than through EXPLAIN.
>
>
Yeah, that use makes sense as well, and if so then postgres_fdw would
likely need to be aware of the appropriate query for several versions back
- they change, not by much, but they do change. So now we'd have each query
text in three places: a system view, postgres_fdw, and the bin/scripts
pre-upgrade program. So I probably should consider the best way to share
those in the codebase.


Re: [17] CREATE SUBSCRIPTION ... SERVER

2023-08-31 Thread Joe Conway

On 8/31/23 12:52, Jeff Davis wrote:

On Thu, 2023-08-31 at 10:59 +0530, Ashutosh Bapat wrote:

The server's FDW has to be postgres_fdw. So we have to handle the
awkward dependency between core and postgres_fdw (an extension).


That sounds more than just "awkward". I can't think of any precedent
for that and it seems to violate the idea of an "extension" entirely.

Can you explain more concretely how we might resolve that?



Maybe move postgres_fdw to be a first class built in feature instead of 
an extension?


--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





Re: Document efficient self-joins / UPDATE LIMIT techniques.

2023-08-31 Thread Corey Huinker
On Wed, Jun 28, 2023 at 2:20 PM Corey Huinker 
wrote:

> This patch adds a few examples to demonstrate the following:
>

Bumping so CF app can see thread.

>


Suppressing compiler warning on Debian 12/gcc 12.2.0

2023-08-31 Thread Bruce Momjian
Being a new user of Debian 12/gcc 12.2.0, I wrote the following shell
script to conditionally add gmake rules with compiler flags to
src/Makefile.custom to suppress warnings for certain files.  This allows
me to compile all supported Postgres releases without warnings.

I actually didn't know how simple it was to add per-file compile flags until
I read:


https://stackoverflow.com/questions/6546162/how-to-add-different-rules-for-specific-files

---

# PG 14+ uses configure.ac
if [ ! -e configure.in ] || grep -q 'AC_INIT(\[PostgreSQL\], \[13\.' configure.in
then
	cat >> src/Makefile.custom > src/Makefile.custom

Re: UUID v7

2023-08-31 Thread Andrey M. Borodin
Thanks for interesting ideas, Mat!

> On 31 Aug 2023, at 20:32, Mat Arye  wrote:
> 
> From a user perspective, it would be great to add 2 things:
> - A function to extract the timestamp from a V7 UUID (very useful for 
> defining constraints if partitioning by the uuid-embedded timestamps, for 
> instance).

Well, as far as I know, the RFC discourages extracting timestamps from UUIDs. But
we can still have such functions... maybe as an extension?
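
For illustration, the extraction itself would be cheap. Here is a minimal
sketch (a hypothetical helper, not part of any posted patch; it assumes
PostgreSQL's utils/uuid.h and datatype/timestamp.h):

#include "postgres.h"
#include "datatype/timestamp.h"
#include "utils/uuid.h"

/*
 * Per the draft RFC, the first 48 bits of a v7 UUID are big-endian
 * Unix-epoch milliseconds.
 */
static TimestampTz
uuidv7_extract_timestamptz(const pg_uuid_t *uuid)
{
	uint64		unix_ms = 0;

	for (int i = 0; i < 6; i++)
		unix_ms = (unix_ms << 8) | uuid->data[i];

	/* Unix-epoch milliseconds -> PostgreSQL-epoch microseconds */
	return (TimestampTz) (unix_ms * 1000 -
						  (POSTGRES_EPOCH_JDATE - UNIX_EPOCH_JDATE) *
						  SECS_PER_DAY * USECS_PER_SEC);
}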

> - Can we add an optional timestamptz argument to gen_uuid_v7 so that you can 
> explicitly specify a time instead of always generating for the current time? 
> If the argument is NULL, then use current time. This could be useful for 
> backfilling and other applications.

I think this makes sense. We could also have a counter as an argument. I'll try 
to implement that.
However, so far I haven't figured out how to implement optional arguments for 
catalog functions. I'd appreciate any pointers here.


Best regards, Andrey Borodin.



Re: Replace some cstring_to_text to cstring_to_text_with_len

2023-08-31 Thread Dagfinn Ilmari Mannsåker
Peter Eisentraut  writes:

> On 31.08.23 16:10, Ranier Vilela wrote:
>> Em qui., 31 de ago. de 2023 às 09:51, Andrew Dunstan
>> mailto:and...@dunslane.net>> escreveu:
>> 
>> On 2023-08-31 Th 07:41, John Naylor wrote:
>>>
>>> On Thu, Aug 31, 2023 at 6:07 PM Ranier Vilela >> > wrote:
>>> >
>>> > Em qui., 31 de ago. de 2023 às 00:22, Michael Paquier
>>> mailto:mich...@paquier.xyz>> escreveu:
>>> >>
>>> >> On Wed, Aug 30, 2023 at 03:00:13PM -0300, Ranier Vilela wrote:
>>> >> > cstring_to_text has a small overhead, because call strlen for
>>> >> > pointer to char parameter.
>>> >> >
>>> >> > Is it worth the effort to avoid this, where do we know the
>>> size of the
>>> >> > parameter?
>>> >>
>>> >> Are there workloads where this matters?
>>> >
>>> > None, but note this change has the same spirit of 8b26769bc.
>>>
>>> - return cstring_to_text("");
>>> + return cstring_to_text_with_len("", 0);
>>>
>>> This looks worse, so we'd better be getting something in return.
>> 
>> I agree this is a bit ugly. I wonder if we'd be better off creating
>> a function that returned an empty text value, so we'd just avoid
>> converting the empty cstring altogether and say:
>>    return empty_text();
>> Hi,
>> Thanks for the suggestion,  I agreed.
>> New patch is attached.
>
> I think these patches make the code uniformly uglier and harder to
> understand.
>
> If a performance benefit could be demonstrated, then making
> cstring_to_text() an inline function could be sensible.  But I wouldn't 
> go beyond that.

I haven't benchmarked it yet, but here's a patch that inlines them and
changes callers of cstring_to_text_with_len() with a literal string and
constant length to cstring_to_text().

On an x86-64 Linux build (meson with -Dbuildtype=debugoptimized
-Dcassert=true), the inlining increases the size of the text section of
the postgres binary from 9719722 bytes to 9750557, i.e. an increase of
30KiB or 0.3%, while the change to cstring_to_text() makes zero
difference (as expected from my investigation).

- ilmari

>From e332e5229dafd94e44ad8608c5fa2d3c58d80e0e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= 
Date: Thu, 31 Aug 2023 19:11:55 +0100
Subject: [PATCH] Inline cstring_to_text(_with_len) functions

This lets the compiler optimise out the strlen() and memcpy() calls
when they are called with a literal string or constant length.

Thus, convert cstring_to_text_with_length("literal", 7) calls to plain
cstring_to_text("literal") to avoid having to count string lenghts
manually.
---
 contrib/btree_gin/btree_gin.c |  2 +-
 contrib/hstore/hstore_io.c|  4 ++--
 src/backend/utils/adt/json.c  |  4 ++--
 src/backend/utils/adt/jsonfuncs.c |  4 ++--
 src/backend/utils/adt/varlena.c   | 32 +
 src/include/utils/builtins.h  | 34 +--
 6 files changed, 40 insertions(+), 40 deletions(-)

diff --git a/contrib/btree_gin/btree_gin.c b/contrib/btree_gin/btree_gin.c
index c50d68ce18..59db475791 100644
--- a/contrib/btree_gin/btree_gin.c
+++ b/contrib/btree_gin/btree_gin.c
@@ -347,7 +347,7 @@ GIN_SUPPORT(cidr, true, leftmostvalue_inet, network_cmp)
 static Datum
 leftmostvalue_text(void)
 {
-	return PointerGetDatum(cstring_to_text_with_len("", 0));
+	return PointerGetDatum(cstring_to_text(""));
 }
 
 GIN_SUPPORT(text, true, leftmostvalue_text, bttextcmp)
diff --git a/contrib/hstore/hstore_io.c b/contrib/hstore/hstore_io.c
index 999ddad76d..7e3b52fbef 100644
--- a/contrib/hstore/hstore_io.c
+++ b/contrib/hstore/hstore_io.c
@@ -1347,7 +1347,7 @@ hstore_to_json_loose(PG_FUNCTION_ARGS)
 dst;
 
 	if (count == 0)
-		PG_RETURN_TEXT_P(cstring_to_text_with_len("{}", 2));
+		PG_RETURN_TEXT_P(cstring_to_text("{}"));
 
 	initStringInfo(&tmp);
 	initStringInfo(&dst);
@@ -1402,7 +1402,7 @@ hstore_to_json(PG_FUNCTION_ARGS)
 dst;
 
 	if (count == 0)
-		PG_RETURN_TEXT_P(cstring_to_text_with_len("{}", 2));
+		PG_RETURN_TEXT_P(cstring_to_text("{}"));
 
 	initStringInfo(&tmp);
 	initStringInfo(&dst);
diff --git a/src/backend/utils/adt/json.c b/src/backend/utils/adt/json.c
index 2c620809b2..cd2b494259 100644
--- a/src/backend/utils/adt/json.c
+++ b/src/backend/utils/adt/json.c
@@ -1290,7 +1290,7 @@ json_build_object(PG_FUNCTION_ARGS)
 Datum
 json_build_object_noargs(PG_FUNCTION_ARGS)
 {
-	PG_RETURN_TEXT_P(cstring_to_text_with_len("{}", 2));
+	PG_RETURN_TEXT_P(cstring_to_text("{}"));
 }
 
 Datum
@@ -1346,7 +1346,7 @@ json_build_array(PG_FUNCTION_ARGS)
 Datum
 json_build_array_noargs(PG_FUNCTION_ARGS)
 {
-	PG_RETURN_TEXT_P(cstring_to_text_with_len("[]", 2));
+	PG_RETURN_TEXT_P(cstring_to_text("[]"));
 }
 
 /*
diff --git a/src/backend/utils/adt/jsonfuncs.c b/src/backend/utils/adt/jsonfuncs.c
index a4bfa5e404..b8845635ac 100644
--- a/src/backend/utils/adt/jsonfuncs.c
+++ b/src/backend/utils/adt/jsonfuncs.c
@@ -1800,8 +1800,8 @@ JsonbVal

Re: Improve heapgetpage() performance, overhead from serializable

2023-08-31 Thread Muhammad Malik
Hi,

Is there a plan to merge this patch in PG16?

Thanks,
Muhammad


From: Andres Freund 
Sent: Saturday, July 15, 2023 6:56 PM
To: pgsql-hack...@postgresql.org 
Cc: Thomas Munro 
Subject: Improve heapgetpage() performance, overhead from serializable

Hi,

Several loops which are important for query performance, like heapgetpage()'s
loop over all tuples, have to call functions like
HeapCheckForSerializableConflictOut() and PredicateLockTID() in every
iteration.

When serializable is not in use, all those functions do is return. But
being situated in a different translation unit, the compiler can't inline
(without LTO at least) the check whether serializability is needed.  It's not
just the function call overhead that's noticeable, it's also that registers
have to be spilled to the stack / reloaded from memory etc.

On a freshly loaded pgbench scale 100, with turbo mode disabled, postgres
pinned to one core. Parallel workers disabled to reduce noise.  All times are
the average of 15 executions with pgbench, in a newly started, but prewarmed
postgres.

SELECT * FROM pgbench_accounts OFFSET 1000;
HEAD:
397.977

removing the HeapCheckForSerializableConflictOut() from heapgetpage()
(incorrect!), to establish the baseline of what serializable costs:
336.695

pulling out CheckForSerializableConflictOutNeeded() from
HeapCheckForSerializableConflictOut() in heapgetpage(), and avoiding calling
HeapCheckForSerializableConflictOut() in the loop:
339.742

moving the loop into a static inline function, marked as pg_always_inline,
called with static arguments for always_visible, check_serializable:
326.546

marking the always_visible, !check_serializable case likely():
322.249

removing TestForOldSnapshot() calls, which we pretty much already decided on:
312.987


FWIW, there's more we can do, with some hacky changes I got the time down to
273.261, but the tradeoffs start to be a bit more complicated. And 397->320ms
for something as core as this, is imo worth considering on its own.
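
To spell out the inlining trick for anyone following along, here is a
standalone toy version (illustrative names and a GCC/Clang-specific
attribute; not the actual patch code). The worker is always inlined with
compile-time-constant flags, so the compiler emits a specialized clone per
call site with the per-tuple branches folded away:

#include <stdbool.h>

#define pg_always_inline inline __attribute__((always_inline))

static void
check_conflict(int tup)
{
	(void) tup;					/* stand-in for the serializable check */
}

/* one worker; both flags are compile-time constants at each call site */
static pg_always_inline int
page_collect_tuples(const int *tuples, int ntuples,
					bool all_visible, bool check_serializable)
{
	int			ntup = 0;

	for (int i = 0; i < ntuples; i++)
	{
		/* negative values mark "invisible" tuples in this toy */
		if (!all_visible && tuples[i] < 0)
			continue;
		if (check_serializable)
			check_conflict(tuples[i]);
		ntup++;
	}
	return ntup;
}

/* specialization for the common case: all-visible, not serializable */
int
page_collect_tuples_fast(const int *tuples, int ntuples)
{
	return page_collect_tuples(tuples, ntuples, true, false);
}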




Now, this just affects the sequential scan case. heap_hot_search_buffer()
shares many of the same pathologies.  I find it a bit harder to improve,
because the compiler's code generation seems to switch between good / bad with
changes that seems unrelated...


I wonder why we haven't used PageIsAllVisible() in heap_hot_search_buffer() so
far?


Greetings,

Andres Freund


Re: Eliminate redundant tuple visibility check in vacuum

2023-08-31 Thread Robert Haas
On Mon, Aug 28, 2023 at 7:49 PM Melanie Plageman
 wrote:
> While working on a set of patches to combine the freeze and visibility
> map WAL records into the prune record, I wrote the attached patches
> reusing the tuple visibility information collected in heap_page_prune()
> back in lazy_scan_prune().
>
> heap_page_prune() collects the HTSV_Result for every tuple on a page
> and saves it in an array used by heap_prune_chain(). If we make that
> array available to lazy_scan_prune(), it can use it when collecting
> stats for vacuum and determining whether or not to freeze tuples.
> This avoids calling HeapTupleSatisfiesVacuum() again on every tuple in
> the page.
>
> It also gets rid of the retry loop in lazy_scan_prune().

In general, I like these patches. I think it's a clever approach, and
I don't really see any down side. It should just be straight-up better
than what we have now, and if it's not better, it still shouldn't be
any worse.

I have a few suggestions:

- Rather than removing the rather large comment block at the top of
lazy_scan_prune() I'd like to see it rewritten to explain how we now
deal with the problem. I'd suggest leaving the first paragraph ("Prior
to...") just as it is and replace all the words following "The
approach we take now is" with a description of the approach that this
patch takes to the problem.

- I'm not a huge fan of the caller of heap_page_prune() having to know
how to initialize the PruneResult. Can heap_page_prune() itself do
that work, so that presult is an out parameter rather than an in-out
parameter? Or, if not, can it be moved to a new helper function, like
heap_page_prune_init(), rather than having that logic in 2+ places? (A
sketch of what such a helper might look like follows after this list.)

- int ndeleted,
- nnewlpdead;
-
- ndeleted = heap_page_prune(relation, buffer, vistest, limited_xmin,
-limited_ts, &nnewlpdead, NULL);
+ int ndeleted = heap_page_prune(relation, buffer, vistest, limited_xmin,
+limited_ts, &presult, NULL);

- I don't particularly like merging the declaration with the
assignment unless the call is narrow enough that we don't need a line
break in there, which is not the case here.

- I haven't thoroughly investigated yet, but I wonder if there are any
other places where comments need updating. As a general note, I find
it desirable for a function's header comment to mention whether any
pointer parameters are in parameters, out parameters, or in-out
parameters, and what the contract between caller and callee actually
is.
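
To illustrate the second suggestion, here is a minimal sketch of what such
a helper might look like. The struct layout and field names are
hypothetical, not taken from the patch; it assumes PostgreSQL's c.h,
string.h, and access/htup_details.h:

/* hypothetical shapes, for illustration only */
typedef struct PruneResult
{
	int			nnewlpdead;		/* filled in by heap_page_prune() */
	int8		htsv[MaxHeapTuplesPerPage + 1]; /* per-offset HTSV_Result */
} PruneResult;

static inline void
heap_page_prune_init(PruneResult *presult)
{
	presult->nnewlpdead = 0;

	/* -1 marks offsets whose visibility has not been computed yet */
	memset(presult->htsv, -1, sizeof(presult->htsv));
}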

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Use virtual tuple slot for Unique node

2023-08-31 Thread Denis Smirnov
Again, the new patch wasn't attached to the thread, so I'm resending it.


v3-use-virtual-slots-for-unique-node.patch
Description: Binary data


Re: [17] CREATE SUBSCRIPTION ... SERVER

2023-08-31 Thread Jeff Davis
On Thu, 2023-08-31 at 08:37 -0400, Robert Haas wrote:
> What I feel is kind of weird about this syntax is that it seems like
> it's entangled with the FDW mechanism but doesn't really overlap with
> it.

I like the fact that it works with user mappings and benefits from the
other thinking that's gone into that system. I would call that a
"feature" not an "entanglement".

>  You could have something that is completely separate (CREATE
> SUBSCRIPTION CONNECTION)

I thought about that but it would be a new object type with a new
catalog and I didn't really see an upside. It would open up questions
about permissions, raw string vs individual options, whether we need
user mappings or not, etc., and those have all been worked out already
with foreign servers.

>  or something that truly does have some
> overlap (no new syntax and a dummy fdw, as Tom proposes, or somehow
> knowing that postgres_fdw is special, as Ashutosh proposes).

I ran into a (perhaps very minor?) challenge[1] with the dummy FDW:

https://www.postgresql.org/message-id/c47e8ba923bf0a13671f7d8230a81d465c21fb04.ca...@j-davis.com

suggestions welcome there, of course.

Regarding core code depending on postgres_fdw: how would that work?
Would that be acceptable?

>  But this
> seems like sort of an odd middle ground.

I assume here that you're talking about the CREATE SERVER ... FOR
CONNECTION ONLY syntax. I don't think it's odd. We have lots of objects
that are a lot like another object but treated differently for various
reasons. A foreign table is an obvious example.

> I also think that the decision to make pg_create_connection a member
> of pg_create_subscription by default, but encouraging users to think
> about revoking it, is kind of strange. I don't think we really want
> to
> encourage users to tinker with predefined roles in this kind of way.
> I
> think there are better ways of achieving the goals here.

Such as?

Regards,
Jeff Davis





Re: Replace some cstring_to_text to cstring_to_text_with_len

2023-08-31 Thread Dagfinn Ilmari Mannsåker
Ranier Vilela  writes:

> Em qui., 31 de ago. de 2023 às 12:12, Dagfinn Ilmari Mannsåker <
> ilm...@ilmari.org> escreveu:
>
>> Ranier Vilela  writes:
>>
>> > Em qui., 31 de ago. de 2023 às 10:12, Dagfinn Ilmari Mannsåker <
>> > ilm...@ilmari.org> escreveu:
>> >
>> >> Andrew Dunstan  writes:
>> >>
>> >> > On 2023-08-31 Th 07:41, John Naylor wrote:
>> >> >>
>> >> >> On Thu, Aug 31, 2023 at 6:07 PM Ranier Vilela 
>> >> wrote:
>> >> >> >
>> >> >> > Em qui., 31 de ago. de 2023 às 00:22, Michael Paquier
>> >> >>  escreveu:
>> >> >> >>
>> >> >> >> On Wed, Aug 30, 2023 at 03:00:13PM -0300, Ranier Vilela wrote:
>> >> >> >> > cstring_to_text has a small overhead, because call strlen for
>> >> >> >> > pointer to char parameter.
>> >> >> >> >
>> >> >> >> > Is it worth the effort to avoid this, where do we know the size
>> >> >> of the
>> >> >> >> > parameter?
>> >> >> >>
>> >> >> >> Are there workloads where this matters?
>> >> >> >
>> >> >> > None, but note this change has the same spirit of 8b26769bc.
>> >> >>
>> >> >> - return cstring_to_text("");
>> >> >> + return cstring_to_text_with_len("", 0);
>> >> >>
>> >> >> This looks worse, so we'd better be getting something in return.
>> >> >
>> >> >
>> >> > I agree this is a bit ugly. I wonder if we'd be better off creating a
>> >> > function that returned an empty text value, so we'd just avoid
>> >> > converting the empty cstring altogether and say:
>> >> >
>> >> >   return empty_text();
>> >>
>> >> Or we could generalise it for any string literal (of which there are
>> >> slightly more¹ non-empty than empty in calls to
>> >> cstring_to_text(_with_len)):
>> >>
>> >> #define literal_to_text(str) cstring_to_text_with_len("" str "",
>> >> sizeof(str)-1)
>> >>
>> > I do not agree, I think this will get worse.
>>
>> How exactly will it get worse?  It's exactly equivalent to
>> cstring_to_text_with_len("", 0), since sizeof() is a compile-time
>> construct, and the string concatenation makes it fail if the argument is
>> not a literal string.
>>
> I think that concatenation makes the strings twice as big, doesn't it?

No, it's just taking advantage of the fact that C string literals can be
split into multiple pieces separated by whitespace (like in SQL, but
without requiring a newline between them).  E.g. "foo" "bar" is exactly
equivalent to "foobar" after parsing.  The reason to use it in the macro
is to make it a syntax error if the argument is not a literal string but
instead a string variable, because in that case the sizeof() would
return the size of the pointer, not the length of the string.
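
A standalone toy to make that concrete (purely illustrative; a printf stub
stands in for the real backend function):

#include <stdio.h>

/* dummy stand-in so this compiles outside the backend */
#define cstring_to_text_with_len(s, n) \
	printf("text(%.*s), len=%d\n", (int) (n), (s), (int) (n))

#define literal_to_text(str) \
	cstring_to_text_with_len("" str "", sizeof(str) - 1)

int
main(void)
{
	/* expands to cstring_to_text_with_len("" "{}" "", 3 - 1) */
	literal_to_text("{}");

	/*
	 * const char *p = "{}";
	 * literal_to_text(p);	-- syntax error: "" p "" is not valid C,
	 * which is the point; sizeof(p) would be the pointer size.
	 */
	return 0;
}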

- ilmari




Re: Use virtual tuple slot for Unique node

2023-08-31 Thread Denis Smirnov
I have done some research and found out that though the patch itself is
correct (i.e. we can benefit from replacing TTSOpsMinimalTuple with
TTSOpsVirtual for the Unique node), my explanation of WHY was wrong.

1. We always materialize the new unique tuple in the slot, no matter what
   type of tuple table slot we use.
2. But the virtual tuple materialization (tts_virtual_copyslot) has
   performance benefits over the minimal tuple one (tts_minimal_copyslot):
   - tts_minimal_copyslot always allocates zeroed memory with palloc0
     (expensive according to the flame graph);
   - tts_virtual_copyslot() doesn't allocate additional memory if the
     tuples are constructed from columns passed by value (but for
     variable-size columns we still need memory allocation);
   - if tts_virtual_copyslot() needs an allocation, it doesn't need to
     zero the memory.

So as a result we seriously benefit from virtual TTS when the tuples are
constructed from fixed-size columns and the plan contains a Unique node.

commit 148642d81f046b7d72b3a40182c165e42a8ab6d7
Author: Denis Smirnov 
Date:   Thu Aug 31 08:51:14 2023 +0700

Change tuple table slot for Unique node to "virtual"

The Unique node uses minimal TTS implementation to copy the unique
tuples from the sorted stream into the resulting tuple slot. But if
we replace the minimal TTS with the virtual TTS copy method, the
performance improves.

1. Minimal TTS always allocates zeroed memory for the materialized
   tuple.
2. Virtual TTS doesn't allocate additional memory for the tuples
   with the columns passed by value. For the columns with external
   memory we don't need to zero the bytes but can simply take the
   memory chunk from the free list "as is".

diff --git a/src/backend/executor/nodeUnique.c 
b/src/backend/executor/nodeUnique.c
index 45035d74fa..c859add6e0 100644
--- a/src/backend/executor/nodeUnique.c
+++ b/src/backend/executor/nodeUnique.c
@@ -141,7 +141,7 @@ ExecInitUnique(Unique *node, EState *estate, int eflags)
 * Initialize result slot and type. Unique nodes do no projections, so
 * initialize projection info for this node appropriately.
 */
-   ExecInitResultTupleSlotTL(&uniquestate->ps, &TTSOpsMinimalTuple);
+   ExecInitResultTupleSlotTL(&uniquestate->ps, &TTSOpsVirtual);
uniquestate->ps.ps_ProjInfo = NULL;
 
/*
> On 31 Aug 2023, at 10:28, Denis Smirnov  wrote:
>
> It looks like my patch was not analyzed by the hackers mailing list due to incorrect mime type, so I duplicate it here.
>
>> On 31 Aug 2023, at 10:06, Denis Smirnov  wrote:

Re: Replace some cstring_to_text to cstring_to_text_with_len

2023-08-31 Thread Ranier Vilela
On Thu, Aug 31, 2023 at 12:12, Dagfinn Ilmari Mannsåker <ilm...@ilmari.org> wrote:

> Ranier Vilela  writes:
>
> > Em qui., 31 de ago. de 2023 às 10:12, Dagfinn Ilmari Mannsåker <
> > ilm...@ilmari.org> escreveu:
> >
> >> Andrew Dunstan  writes:
> >>
> >> > On 2023-08-31 Th 07:41, John Naylor wrote:
> >> >>
> >> >> On Thu, Aug 31, 2023 at 6:07 PM Ranier Vilela 
> >> wrote:
> >> >> >
> >> >> > Em qui., 31 de ago. de 2023 às 00:22, Michael Paquier
> >> >>  escreveu:
> >> >> >>
> >> >> >> On Wed, Aug 30, 2023 at 03:00:13PM -0300, Ranier Vilela wrote:
> >> >> >> > cstring_to_text has a small overhead, because call strlen for
> >> >> >> > pointer to char parameter.
> >> >> >> >
> >> >> >> > Is it worth the effort to avoid this, where do we know the size
> >> >> of the
> >> >> >> > parameter?
> >> >> >>
> >> >> >> Are there workloads where this matters?
> >> >> >
> >> >> > None, but note this change has the same spirit of 8b26769bc.
> >> >>
> >> >> - return cstring_to_text("");
> >> >> + return cstring_to_text_with_len("", 0);
> >> >>
> >> >> This looks worse, so we'd better be getting something in return.
> >> >
> >> >
> >> > I agree this is a bit ugly. I wonder if we'd be better off creating a
> >> > function that returned an empty text value, so we'd just avoid
> >> > converting the empty cstring altogether and say:
> >> >
> >> >   return empty_text();
> >>
> >> Or we could generalise it for any string literal (of which there are
> >> slightly more¹ non-empty than empty in calls to
> >> cstring_to_text(_with_len)):
> >>
> >> #define literal_to_text(str) cstring_to_text_with_len("" str "",
> >> sizeof(str)-1)
> >>
> > I do not agree, I think this will get worse.
>
> How exactly will it get worse?  It's exactly equivalent to
> cstring_to_text_with_len("", 0), since sizeof() is a compile-time
> construct, and the string concatenation makes it fail if the argument is
> not a literal string.
>
I think that concatenation makes the strings twice as big, doesn't it?


>
> Whether we want an even-more-optimised version for an empty text value
> is another matter, but I doubt it'd be worth it.  Another option would
> be to make cstring_to_text(_with_len) static inline functions, which
> lets the compiler eliminate the memcpy() call when len == 0.
>

> In fact, after playing around a bit (https://godbolt.org/z/x51aYGadh),
> it seems like GCC, Clang and MSVC all eliminate the strlen() and
> memcpy() calls for cstring_to_text("") under -O2 if it's static inline.
>
In that case, it seems to me that would be good too.
Compilers removing the memcpy would have the same effect as empty_text,
without the burden of a new function and all its future maintenance.

best regards,
Ranier Vilela


Re: [17] CREATE SUBSCRIPTION ... SERVER

2023-08-31 Thread Jeff Davis
On Thu, 2023-08-31 at 10:59 +0530, Ashutosh Bapat wrote:
> The server's FDW has to be postgres_fdw. So we have to handle the
> awkward dependency between core and postgres_fdw (an extension).

That sounds more than just "awkward". I can't think of any precedent
for that and it seems to violate the idea of an "extension" entirely.

Can you explain more concretely how we might resolve that?

Regards,
Jeff Davis





Re: [17] CREATE SUBSCRIPTION ... SERVER

2023-08-31 Thread Jeff Davis
On Wed, 2023-08-30 at 09:09 -0700, Jeff Davis wrote:
> Admittedly, I didn't complete the dummy-FDW approach, so perhaps it
> works out better overall. I can give it a try.

We need to hide the dummy FDW from pg_dump. And we need to hide it from
psql's \dew, because that's used in tests and prints the owner's name,
and the bootstrap superuser doesn't have a consistent name. But I
didn't find a good way to hide it because it doesn't have a schema.

The best I could come up with is special-casing by the name, but that
seems like a pretty bad hack. For other built-in objects, psql is
willing to print them out if you just specify something like "\dT
pg_catalog.*", but that wouldn't work here. We could maybe do something
based on the "pg_" prefix, but we'd have to retroactively restrict FDWs
with that prefix, which sounds like a bad idea.

Suggestions?

Regards,
Jeff Davis





Re: Initdb-time block size specification

2023-08-31 Thread David Christensen
> I was definitely hand-waving additional implementation here for
> non-native 128 bit support; the modulus algorithm as presented
> requires 4 times the space as the divisor, so a uint16 implementation
> should work for all 64-bit machines.  Certainly open to other ideas or
> implementations, this was the one I was able to find initially.  If
> the 16bit approach is all that is needed in practice we can also see
> about narrowing the domain and not worry about making this a
> general-purpose function.

Here's a patch atop the series which converts to 16-bit uints and
passes regressions, but which I don't consider well-vetted at this point.

David


0001-Switch-to-using-only-16-bit-to-avoid-128-bit-require.patch
Description: Binary data


Re: Initdb-time block size specification

2023-08-31 Thread David Christensen
> + * pg_fastmod - calculates the modulus of a 32-bit number against a constant
> + * divisor without using the division operator
> + */
> +static inline uint32 pg_fastmod(uint32 n, uint32 divisor, uint64 fastinv)
> +{
> +#ifdef HAVE_INT128
> + uint64_t lowbits = fastinv * n;
> + return ((uint128)lowbits * divisor) >> 64;
> +#else
> + return n % divisor;
> +#endif
> +}
>
> Requiring 128-bit arithmetic to avoid serious regression is a non-starter as 
> written. Software that relies on fast 128-bit multiplication has to do 
> backflips to get that working on multiple platforms. But I'm not sure it's 
> necessary -- if the max block number is UINT32_MAX and max block size is 
> UINT16_MAX, can't we just use 64-bit multiplication?

I was definitely hand-waving additional implementation here for
non-native 128 bit support; the modulus algorithm as presented
requires 4 times the space as the divisor, so a uint16 implementation
should work for all 64-bit machines.  Certainly open to other ideas or
implementations, this was the one I was able to find initially.  If
the 16bit approach is all that is needed in practice we can also see
about narrowing the domain and not worry about making this a
general-purpose function.
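
To make the arithmetic concrete, here is a standalone sketch of a
64-bit-only variant for divisors that fit in 16 bits (illustrative and
untested against the patch series). With a 48-bit fixed-point reciprocal,
lowbits < 2^48 and divisor < 2^16, so lowbits * divisor never overflows a
uint64:

#include <assert.h>
#include <stdint.h>

#define FASTMOD_SHIFT	48
#define FASTMOD_MASK	((UINT64_C(1) << FASTMOD_SHIFT) - 1)

/* precompute ceil(2^48 / divisor) once, e.g. at initdb/startup */
static inline uint64_t
fastmod_init(uint32_t divisor)
{
	return ((UINT64_C(1) << FASTMOD_SHIFT) + divisor - 1) / divisor;
}

/* n % divisor, exact for all 32-bit n when divisor fits in 16 bits */
static inline uint32_t
fastmod(uint32_t n, uint32_t divisor, uint64_t fastinv)
{
	uint64_t	lowbits = (fastinv * n) & FASTMOD_MASK;

	return (uint32_t) ((lowbits * divisor) >> FASTMOD_SHIFT);
}

int
main(void)
{
	uint64_t	inv = fastmod_init(8192);

	assert(fastmod(4294967295u, 8192, inv) == 4294967295u % 8192);
	return 0;
}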

Best,

David




Re: Initdb-time block size specification

2023-08-31 Thread John Naylor
On Thu, Aug 31, 2023 at 8:51 AM David Christensen <
david.christen...@crunchydata.com> wrote:

> 0005 - utility functions for fast div/mod operations; basically
> montgomery multiplication

+/*
+ * pg_fastmod - calculates the modulus of a 32-bit number against a
constant
+ * divisor without using the division operator
+ */
+static inline uint32 pg_fastmod(uint32 n, uint32 divisor, uint64 fastinv)
+{
+#ifdef HAVE_INT128
+ uint64_t lowbits = fastinv * n;
+ return ((uint128)lowbits * divisor) >> 64;
+#else
+ return n % divisor;
+#endif
+}

Requiring 128-bit arithmetic to avoid serious regression is a non-starter
as written. Software that relies on fast 128-bit multiplication has to do
backflips to get that working on multiple platforms. But I'm not sure it's
necessary -- if the max block number is UINT32_MAX and max block size is
UINT16_MAX, can't we just use 64-bit multiplication?

--
John Naylor
EDB: http://www.enterprisedb.com


Re: should frontend tools use syncfs() ?

2023-08-31 Thread Nathan Bossart
On Thu, Aug 31, 2023 at 02:30:33PM +0900, Michael Paquier wrote:
> - Should we have some regression tests?  We should only need one test
> in one of the binaries to be able to stress the new code paths of
> file_utils.c with syncfs.   The cheapest one may be pg_dump with a
> dump in directory format?  Note that we have tests there that depend
> on lz4 or gzip existing, which are conditional.

I added one for initdb in v8.

> - Perhaps 0002 should be split into two parts?  The first patch could
> introduce DataDirSyncMethod in file_utils.h with the new routines in
> file_utils.h (including syncfs support), and the second patch would
> plug the new option to all the binaries.  In the first patch, I would
> hardcode DATA_DIR_SYNC_METHOD_FSYNC.

Ha, I was just thinking about this, too.  I actually split it into 3
patches.  The first adds DataDirSyncMethod and uses it for
recovery_init_sync_method.  The second adds syncfs() support in
file_utils.c.  And the third adds the ability to specify syncfs in the
frontend utilities.  WDYT?
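
For anyone skimming, the syncfs() path boils down to roughly the following
sketch (simplified and Linux-specific; the real file_utils.c code also
handles the per-file fsync fallback and error reporting):

#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* sync every dirty file in the filesystem containing "path" */
static int
sync_filesystem(const char *path)
{
	int			fd = open(path, O_RDONLY);
	int			ret;

	if (fd < 0)
		return -1;
	ret = syncfs(fd);			/* one kernel call for the whole fs */
	close(fd);
	return ret;
}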

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From e627b8ea4dcc01ac9e2fe3a21e3b3cc109b815e4 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Mon, 21 Aug 2023 19:05:14 -0700
Subject: [PATCH v8 1/4] move PG_TEMP_FILE* macros to file_utils.h

---
 src/backend/backup/basebackup.c |  2 +-
 src/backend/postmaster/postmaster.c |  1 +
 src/backend/storage/file/fileset.c  |  1 +
 src/bin/pg_checksums/pg_checksums.c | 10 --
 src/bin/pg_rewind/filemap.c |  2 +-
 src/include/common/file_utils.h |  4 
 src/include/storage/fd.h|  4 
 7 files changed, 8 insertions(+), 16 deletions(-)

diff --git a/src/backend/backup/basebackup.c b/src/backend/backup/basebackup.c
index 45be21131c..5d66014499 100644
--- a/src/backend/backup/basebackup.c
+++ b/src/backend/backup/basebackup.c
@@ -25,6 +25,7 @@
 #include "commands/defrem.h"
 #include "common/compression.h"
 #include "common/file_perm.h"
+#include "common/file_utils.h"
 #include "lib/stringinfo.h"
 #include "miscadmin.h"
 #include "nodes/pg_list.h"
@@ -37,7 +38,6 @@
 #include "storage/bufpage.h"
 #include "storage/checksum.h"
 #include "storage/dsm_impl.h"
-#include "storage/fd.h"
 #include "storage/ipc.h"
 #include "storage/reinit.h"
 #include "utils/builtins.h"
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index d7bfb28ff3..54e9bfb8c4 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -94,6 +94,7 @@
 #include "access/xlogrecovery.h"
 #include "catalog/pg_control.h"
 #include "common/file_perm.h"
+#include "common/file_utils.h"
 #include "common/ip.h"
 #include "common/pg_prng.h"
 #include "common/string.h"
diff --git a/src/backend/storage/file/fileset.c b/src/backend/storage/file/fileset.c
index e9951b0269..84cae32548 100644
--- a/src/backend/storage/file/fileset.c
+++ b/src/backend/storage/file/fileset.c
@@ -25,6 +25,7 @@
 
 #include "catalog/pg_tablespace.h"
 #include "commands/tablespace.h"
+#include "common/file_utils.h"
 #include "common/hashfn.h"
 #include "miscadmin.h"
 #include "storage/ipc.h"
diff --git a/src/bin/pg_checksums/pg_checksums.c b/src/bin/pg_checksums/pg_checksums.c
index 19eb67e485..9011a19b4e 100644
--- a/src/bin/pg_checksums/pg_checksums.c
+++ b/src/bin/pg_checksums/pg_checksums.c
@@ -52,16 +52,6 @@ typedef enum
 	PG_MODE_ENABLE
 } PgChecksumMode;
 
-/*
- * Filename components.
- *
- * XXX: fd.h is not declared here as frontend side code is not able to
- * interact with the backend-side definitions for the various fsync
- * wrappers.
- */
-#define PG_TEMP_FILES_DIR "pgsql_tmp"
-#define PG_TEMP_FILE_PREFIX "pgsql_tmp"
-
 static PgChecksumMode mode = PG_MODE_CHECK;
 
 static const char *progname;
diff --git a/src/bin/pg_rewind/filemap.c b/src/bin/pg_rewind/filemap.c
index bd5c598e20..58280d9abc 100644
--- a/src/bin/pg_rewind/filemap.c
+++ b/src/bin/pg_rewind/filemap.c
@@ -27,12 +27,12 @@
 #include 
 
 #include "catalog/pg_tablespace_d.h"
+#include "common/file_utils.h"
 #include "common/hashfn.h"
 #include "common/string.h"
 #include "datapagemap.h"
 #include "filemap.h"
 #include "pg_rewind.h"
-#include "storage/fd.h"
 
 /*
  * Define a hash table which we can use to store information about the files
diff --git a/src/include/common/file_utils.h b/src/include/common/file_utils.h
index b7efa1226d..dd1532bcb0 100644
--- a/src/include/common/file_utils.h
+++ b/src/include/common/file_utils.h
@@ -46,4 +46,8 @@ extern ssize_t pg_pwritev_with_retry(int fd,
 
 extern ssize_t pg_pwrite_zeros(int fd, size_t size, off_t offset);
 
+/* Filename components */
+#define PG_TEMP_FILES_DIR "pgsql_tmp"
+#define PG_TEMP_FILE_PREFIX "pgsql_tmp"
+
 #endif			/* FILE_UTILS_H */
diff --git a/src/include/storage/fd.h b/src/include/storage/fd.h
index 6791a406fc..aea30c0622 100644
--- a/src/include/storage/fd.h
+++ b/src/include/storage/fd.h
@@ -195,8 +195,4 @@ extern int	durable_unlink(const char *fname,

Re: UUID v7

2023-08-31 Thread Mat Arye
Andrey,

Thanks for all your work on this. I think this will be really useful.

>From a user perspective, it would be great to add 2 things:
- A function to extract the timestamp from a V7 UUID (very useful for
defining constraints if partitioning by the uuid-embedded timestamps, for
instance).
- Can we add an optional timestamptz argument to gen_uuid_v7 so that you
can explicitly specify a time instead of always generating for the current
time? If the argument is NULL, then use current time. This could be useful
for backfilling and other applications.

Thanks,
Matvey Arye
Timescale software developer.


On Wed, Aug 30, 2023 at 3:05 PM Andrey M. Borodin 
wrote:

>
>
> > On 21 Aug 2023, at 13:42, Andrey M. Borodin 
> wrote:
> >
> >
> 
>
> PFA the next version.
> Changes:
> - implemented protection from time leap backwards when series is generated
> on the same backend
> - counter overflow is now translated into ms step forward
>
>
> Best regards, Andrey Borodin.
>


Re: Replace some cstring_to_text to cstring_to_text_with_len

2023-08-31 Thread Peter Eisentraut

On 31.08.23 16:10, Ranier Vilela wrote:
On Thu, Aug 31, 2023 at 09:51, Andrew Dunstan  wrote:



On 2023-08-31 Th 07:41, John Naylor wrote:


On Thu, Aug 31, 2023 at 6:07 PM Ranier Vilela  wrote:
>
> On Thu, Aug 31, 2023 at 00:22, Michael Paquier  wrote:
>>
>> On Wed, Aug 30, 2023 at 03:00:13PM -0300, Ranier Vilela wrote:
>> > cstring_to_text has a small overhead, because call strlen for
>> > pointer to char parameter.
>> >
>> > Is it worth the effort to avoid this, where do we know the size of the
>> > parameter?
>>
>> Are there workloads where this matters?
>
> None, but note this change has the same spirit of 8b26769bc.

- return cstring_to_text("");
+ return cstring_to_text_with_len("", 0);

This looks worse, so we'd better be getting something in return.



I agree this is a bit ugly. I wonder if we'd be better off creating
a function that returned an empty text value, so we'd just avoid
converting the empty cstring altogether and say:

   return empty_text();

Hi,
Thanks for the suggestion,  I agreed.

New patch is attached.


I think these patches make the code uniformly uglier and harder to 
understand.


If a performance benefit could be demonstrated, then making 
cstring_to_text() an inline function could be sensible.  But I wouldn't 
go beyond that.





sandboxing untrusted code

2023-08-31 Thread Robert Haas
On Mon, Feb 27, 2023 at 7:37 PM Jeff Davis  wrote:
> On Mon, 2023-02-27 at 16:13 -0500, Robert Haas wrote:
> > On Mon, Feb 27, 2023 at 1:25 PM Jeff Davis  wrote:
> > > I think you are saying that we should still run Alice's code with
> > > the
> > > privileges of Bob, but somehow make that safe(r) for Bob. Is that
> > > right?
> >
> > Yeah. That's the idea I was floating, at least.
>
> Isn't that a hard problem; maybe impossible?

I want to flesh out the ideas I previously articulated in this area a bit more.

As a refresher, the scenario I'm talking about is any one in which one
user, who I'll call Bob, does something that results in executing code
provided by another user, who I'll call Alice. The most obvious way
that this can happen is if Bob performs some operation that targets a
table owned by Alice. That operation might be DML, like an INSERT or
UPDATE; or it might be some other kind of maintenance command that can
cause code execution, like REINDEX, which can evaluate index
expressions. The code being executed might be run either as Alice or
as Bob, depending on how it's been attached to the table and what
operation is being performed and maybe whether some function or
procedure that might contain it is SECURITY INVOKER or SECURITY
DEFINER. Regardless of the details, our concern is that Alice's code
might do something that Bob does not like. This is a particularly
lively concern if the code happens to be running with the privileges
of Bob, because then Alice might try to do something like access
objects for which Bob has permissions and Alice does not. But the
problems don't completely go away if the code is being run as Alice,
because even then, Alice could try to manipulate the session state in
some way that will cause Bob to hose himself later on. The existing
SECURITY_RESTRICTED_OPERATION flag defends against some scenarios of
this type, but at present we also rely heavily on Bob being *very*
careful, as Jeff has highlighted rather compellingly.

I think we can do better, both in the case where Bob is running code
provided by Alice using his own permissions, and also in the case
where Bob is running code provided by Alice using Alice's permissions.
To that end, I'd like to define a few terms. First, let's define the
provider of a piece of code as either (a) the owner of the function or
procedure that contains it or (b) the owner of the object to which
it's directly attached or (c) the session user, for code directly
entered at top level. For example, if Alice owns a table T1 and
applies a default expression which uses a function provided by
Charlie, and Bob then inserts into T1, then Bob provides the insert
statement, Alice provides the default expression, and Charlie provides
the code inside the function. I assert that in every context where
PostgreSQL evaluates expressions or runs SQL statements, there's a
well-defined provider for the expression or statement, and we can make
the system track it if we want to. Second, I'd like to define trust.
Users trust themselves, and they also trust users who have a superset
of their permissions, a category that most typically just includes
superusers but could include others if role grants are in use. A user
can also declare through some mechanism or other that they trust
another user even if that other user does not have a superset of their
permissions. Such a declaration carries the risk that the trusted user
could hijack the trusting user's permissions; we would document and
disclaim this risk.

Finally, let's define sandboxing. When code is sandboxed, the set of
operations that it is allowed to perform is restricted. Sandboxing
isn't necessarily all or nothing; there can be different categories of
operations and we can allow some and deny others, if we wish.
Obviously this is quite a bit of work to implement, but I don't think
it's unmanageable. YMMV. To keep things simple for purposes of
discussion, I'm going to just define two levels of sandboxing for the
moment; I think we might want more. If code is fully sandboxed, it can
only do the following things:

1. Compute stuff. There's no restriction on the permissible amount of
compute; if you call untrusted code, nothing prevents it from running
forever.
2. Call other code. This may be done by a function call or a command
such as CALL or DO, all subject to the usual permissions checks but no
further restrictions.
3. Access the current session state, without modifying it. For
example, executing SHOW or current_setting() is fine.
4. Transiently modify the current session state in ways that are
necessarily reversed before returning to the caller. For example, an
EXCEPTION block or a configuration change driven by proconfig is fine.
5. Produce messages at any log level. This includes any kind of ERROR.

Fully sandboxed code can't access or modify data beyond what gets
passed to it, with the exception of the session state mentioned above.
This includes data inside of PostgreSQL, like tables or statistics, 

Re: Replace some cstring_to_text to cstring_to_text_with_len

2023-08-31 Thread Dagfinn Ilmari Mannsåker
Ranier Vilela  writes:

> Em qui., 31 de ago. de 2023 às 10:12, Dagfinn Ilmari Mannsåker <
> ilm...@ilmari.org> escreveu:
>
>> Andrew Dunstan  writes:
>>
>> > On 2023-08-31 Th 07:41, John Naylor wrote:
>> >>
>> >> On Thu, Aug 31, 2023 at 6:07 PM Ranier Vilela 
>> wrote:
>> >> >
>> >> > Em qui., 31 de ago. de 2023 às 00:22, Michael Paquier
>> >>  escreveu:
>> >> >>
>> >> >> On Wed, Aug 30, 2023 at 03:00:13PM -0300, Ranier Vilela wrote:
>> >> >> > cstring_to_text has a small overhead, because call strlen for
>> >> >> > pointer to char parameter.
>> >> >> >
>> >> >> > Is it worth the effort to avoid this, where do we know the size
>> >> of the
>> >> >> > parameter?
>> >> >>
>> >> >> Are there workloads where this matters?
>> >> >
>> >> > None, but note this change has the same spirit of 8b26769bc.
>> >>
>> >> - return cstring_to_text("");
>> >> + return cstring_to_text_with_len("", 0);
>> >>
>> >> This looks worse, so we'd better be getting something in return.
>> >
>> >
>> > I agree this is a bit ugly. I wonder if we'd be better off creating a
>> > function that returned an empty text value, so we'd just avoid
>> > converting the empty cstring altogether and say:
>> >
>> >   return empty_text();
>>
>> Or we could generalise it for any string literal (of which there are
>> slightly more¹ non-empty than empty in calls to
>> cstring_to_text(_with_len)):
>>
>> #define literal_to_text(str) cstring_to_text_with_len("" str "",
>> sizeof(str)-1)
>>
> I do not agree, I think this will get worse.

How exactly will it get worse?  It's exactly equivalent to
cstring_to_text_with_len("", 0), since sizeof() is a compile-time
construct, and the string concatenation makes it fail if the argument is
not a literal string.

Whether we want an even-more-optimised version for an empty text value
is another matter, but I doubt it'd be worth it.  Another option would
be to make cstring_to_text(_with_len) static inline functions, which
lets the compiler eliminate the memcpy() call when len == 0.

In fact, after playing around a bit (https://godbolt.org/z/x51aYGadh),
it seems like GCC, Clang and MSVC all eliminate the strlen() and
memcpy() calls for cstring_to_text("") under -O2 if it's static inline.

- ilmari




RE: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-31 Thread Hayato Kuroda (Fujitsu)
Dear Dilip,

Thanks for giving comments!

> Some comments in 0002
> 
> 1.
> + res = executeQueryOrDie(conn, "SELECT slot_name "
> + "FROM pg_catalog.pg_replication_slots "
> + "WHERE slot_type = 'logical' AND "
> + "temporary IS FALSE;");
> 
> What is the reason we are ignoring temporary slots here?  I think we
> better explain in the comments.

The temporary slots were expressly ignored while checking because such slots
cannot exist after the upgrade. Before doing pg_upgrade, both the old and new
clusters must be shut down, and they start and stop several times during the
upgrade.

What do you think?

> 2.
> + res = executeQueryOrDie(conn, "SELECT slot_name "
> + "FROM pg_catalog.pg_replication_slots "
> + "WHERE slot_type = 'logical' AND "
> + "temporary IS FALSE;");
> +
> + if (PQntuples(res))
> + pg_fatal("New cluster must not have logical replication slots but
> found \"%s\"",
> + PQgetvalue(res, 0, 0));
> 
> It looks a bit odd to me that first it is fetching all the logical
> slots from the new cluster and then printing the name of one of the
> slots.  If it is printing the name of the slots then shouldn't it be
> printing all the slots' names, or should it just say that there are
> existing slots on the new cluster without giving any names?  And if we
> are planning for option 2 i.e. not printing the name then better to
> put LIMIT 1 at the end of the query.

I'm planning to change it so that the number of slots is reported using count(*).

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: Replace some cstring_to_text to cstring_to_text_with_len

2023-08-31 Thread Ranier Vilela
On Thu, Aug 31, 2023 at 10:12, Dagfinn Ilmari Mannsåker <ilm...@ilmari.org> wrote:

> Andrew Dunstan  writes:
>
> > On 2023-08-31 Th 07:41, John Naylor wrote:
> >>
> >> On Thu, Aug 31, 2023 at 6:07 PM Ranier Vilela 
> wrote:
> >> >
> >> > Em qui., 31 de ago. de 2023 às 00:22, Michael Paquier
> >>  escreveu:
> >> >>
> >> >> On Wed, Aug 30, 2023 at 03:00:13PM -0300, Ranier Vilela wrote:
> >> >> > cstring_to_text has a small overhead, because call strlen for
> >> >> > pointer to char parameter.
> >> >> >
> >> >> > Is it worth the effort to avoid this, where do we know the size
> >> of the
> >> >> > parameter?
> >> >>
> >> >> Are there workloads where this matters?
> >> >
> >> > None, but note this change has the same spirit of 8b26769bc.
> >>
> >> - return cstring_to_text("");
> >> + return cstring_to_text_with_len("", 0);
> >>
> >> This looks worse, so we'd better be getting something in return.
> >
> >
> > I agree this is a bit ugly. I wonder if we'd be better off creating a
> > function that returned an empty text value, so we'd just avoid
> > converting the empty cstring altogether and say:
> >
> >   return empty_text();
>
> Or we could generalise it for any string literal (of which there are
> slightly more¹ non-empty than empty in calls to
> cstring_to_text(_with_len)):
>
> #define literal_to_text(str) cstring_to_text_with_len("" str "",
> sizeof(str)-1)
>
I do not agree, I think this will get worse.

best regards,
Ranier Vilela


Re: Replace some cstring_to_text to cstring_to_text_with_len

2023-08-31 Thread Ranier Vilela
On Thu, Aug 31, 2023 at 09:51, Andrew Dunstan  wrote:

>
> On 2023-08-31 Th 07:41, John Naylor wrote:
>
>
> On Thu, Aug 31, 2023 at 6:07 PM Ranier Vilela  wrote:
> >
> > Em qui., 31 de ago. de 2023 às 00:22, Michael Paquier <
> mich...@paquier.xyz> escreveu:
> >>
> >> On Wed, Aug 30, 2023 at 03:00:13PM -0300, Ranier Vilela wrote:
> >> > cstring_to_text has a small overhead, because call strlen for
> >> > pointer to char parameter.
> >> >
> >> > Is it worth the effort to avoid this, where do we know the size of the
> >> > parameter?
> >>
> >> Are there workloads where this matters?
> >
> > None, but note this change has the same spirit of 8b26769bc.
>
> - return cstring_to_text("");
> + return cstring_to_text_with_len("", 0);
>
> This looks worse, so we'd better be getting something in return.
>
>
> I agree this is a bit ugly. I wonder if we'd be better off creating a
> function that returned an empty text value, so we'd just avoid converting
> the empty cstring altogether and say:
>
>   return empty_text();
>
Hi,
Thanks for the suggestion,  I agreed.

New patch is attached.

best regards,
Ranier Vilela

>


0001-Replace-cstring_to_text-with-empty_text.patch
Description: Binary data


0002-Avoid-unecessary-calls-to-strlen-function.patch
Description: Binary data


Re: Replace some cstring_to_text to cstring_to_text_with_len

2023-08-31 Thread Ranier Vilela
On Thu, Aug 31, 2023 at 08:41, John Naylor <john.nay...@enterprisedb.com> wrote:

>
> On Thu, Aug 31, 2023 at 6:07 PM Ranier Vilela  wrote:
> >
> > Em qui., 31 de ago. de 2023 às 00:22, Michael Paquier <
> mich...@paquier.xyz> escreveu:
> >>
> >> On Wed, Aug 30, 2023 at 03:00:13PM -0300, Ranier Vilela wrote:
> >> > cstring_to_text has a small overhead, because call strlen for
> >> > pointer to char parameter.
> >> >
> >> > Is it worth the effort to avoid this, where do we know the size of the
> >> > parameter?
> >>
> >> Are there workloads where this matters?
> >
> > None, but note this change has the same spirit of 8b26769bc.
>
> - return cstring_to_text("");
> + return cstring_to_text_with_len("", 0);
>
> This looks worse, so we'd better be getting something in return.
>
Per suggestion by Andrew Dunstan, I provided a new function.

>
> @@ -360,7 +360,7 @@ pg_tablespace_location(PG_FUNCTION_ARGS)
>   sourcepath)));
>   targetpath[rllen] = '\0';
>
> - PG_RETURN_TEXT_P(cstring_to_text(targetpath));
> + PG_RETURN_TEXT_P(cstring_to_text_with_len(targetpath, rllen));
>
> This could be a worthwhile cosmetic improvement if the nul-terminator (and
> space reserved for it, and comment explaining that) is taken out as well,
> but the patch didn't bother to do that.
>
Thanks for the tip.

Please see a new version of the patch in the reply to Andrew Dunstan.

best regards,
Ranier Vilela


Re: persist logical slots to disk during shutdown checkpoint

2023-08-31 Thread Amit Kapila
On Thu, Aug 31, 2023 at 6:12 PM Ashutosh Bapat
 wrote:
>
> On Thu, Aug 31, 2023 at 2:52 PM Amit Kapila  wrote:
> > > > > I
> > > > > think we should shut down subscriber, restart publisher and then make 
> > > > > this
> > > > > check based on the contents of the replication slot instead of server 
> > > > > log.
> > > > > Shutting down subscriber will ensure that the subscriber won't send 
> > > > > any new
> > > > > confirmed flush location to the publisher after restart.
> > > > >
> > > >
> > > > But if we shutdown the subscriber before the publisher there is no
> > > > guarantee that the publisher has sent all outstanding logs up to the
> > > > shutdown checkpoint record (i.e., the latest record). Such a guarantee
> > > > can only be there if we do a clean shutdown of the publisher before
> > > > the subscriber.
> > >
> > > So the sequence is shutdown publisher node, shutdown subscriber node,
> > > start publisher node and carry out the checks.
> > >
> >
> > This can probably work but I still prefer the current approach as that
> > will be closer to the ideal values on the disk instead of comparison
> > with a later in-memory value of confirmed_flush LSN. Ideally, if we
> > would have a tool like pg_replslotdata which can read the on-disk
> > state of slots that would be better but missing that, the current one
> > sounds like the next best possibility. Do you see any problem with the
> > current approach of test?
>
> > +  qr/Streaming transactions committing after ([A-F0-9]+\/[A-F0-9]+),
> > reading WAL from ([A-F0-9]+\/[A-F0-9]+)./
>
> I don't think the LSN reported in this message is guaranteed to be the
> confirmed_flush LSN of the slot. It's usually confirmed_flush but not
> always. It's the LSN that snapshot builder computes based on factors
> including confirmed_flush. There's a chance that this test will fail
> sometimes because of this behaviour.
>

I think I am missing something here because as per my understanding,
the LOG referred by the test is generated in CreateDecodingContext()
before which we shouldn't be changing the slot's confirmed_flush LSN.
The LOG [1] refers to the slot's persistent value for confirmed_flush,
so how could it be different from what the test is expecting?

[1]
errdetail("Streaming transactions committing after %X/%X, reading WAL
from %X/%X.",
   LSN_FORMAT_ARGS(slot->data.confirmed_flush),


-- 
With Regards,
Amit Kapila.




Re: remaining sql/json patches

2023-08-31 Thread Erik Rijkers

On 8/31/23 at 14:57, Amit Langote wrote:

Hello,

On Wed, Aug 16, 2023 at 1:27 PM Amit Langote  wrote:

I will post a new version after finishing working on a few other
improvements I am working on.


Sorry about the delay.  Here's a new version.


Hi,

While compiling the new set

[v12-0001-Support-soft-error-handling-during-CoerceViaIO-e.patch]
[v12-0002-SQL-JSON-query-functions.patch]
[v12-0003-JSON_TABLE.patch]
[v12-0004-Claim-SQL-standard-compliance-for-SQL-JSON-featu.patch]

gcc 13.2.0 is sputtering somewhat:

--
In function ‘transformJsonFuncExpr’,
inlined from ‘transformExprRecurse’ at parse_expr.c:374:13:
parse_expr.c:4362:13: warning: ‘contextItemExpr’ may be used 
uninitialized [-Wmaybe-uninitialized]

 4362 | if (exprType(contextItemExpr) != JSONBOID)
  | ^
parse_expr.c: In function ‘transformExprRecurse’:
parse_expr.c:4214:21: note: ‘contextItemExpr’ was declared here
 4214 | Node   *contextItemExpr;
  | ^~~
nodeFuncs.c: In function ‘exprSetCollation’:
nodeFuncs.c:1238:25: warning: this statement may fall through 
[-Wimplicit-fallthrough=]

 1238 | {
  | ^
nodeFuncs.c:1247:17: note: here
 1247 | case T_JsonCoercion:
  | ^~~~
--

Those look pretty unimportant, but I thought I'd let you know.

Tests (check, check-world and my own) still run fine.

Thanks,

Erik Rijkers







I found out that llvmjit_expr.c additions have been broken all along,
I mean since I rewrote the JsonExpr evaluation code to use soft error
handling back in January or so.  For example, I had made CoerceViaIO
evaluation code (EEOP_IOCOERCE ExprEvalStep) invoked by JsonCoercion
node's evaluation to pass an ErrorSaveContext to the type input
functions so that any errors result in returning NULL instead of
throwing the error.  Though the llvmjit_expr.c code was not modified
to do the same, so the SQL/JSON query functions would return wrong
results when JITed.  I have made many revisions to the JsonExpr
expression evaluation itself, not all of which were reflected in the
llvmjit_expr.c counterparts.   I've fixed all that in the attached.

I've broken the parts to teach the CoerceViaIO evaluation code to
handle errors softly into a separate patch attached as 0001.

Other notable changes in the SQL/JSON query functions patch (now 0002):

* Significantly rewrote the parser changes to make it a bit more
readable than before.  My main goal was to separate the code for each
JSON_EXISTS_OP, JSON_QUERY_OP, and JSON_VALUE_OP such that the
op-type-specific behaviors are more readily apparent by reading the
code.

* Got rid of JsonItemCoercions struct/node, which contained a
JsonCoercion field to store the coercion expressions for each JSON
item type that needs to be coerced to the RETURNING type, in favor of
using List of JsonCoercion nodes.  That resulted in simpler code in
many places, most notably in the executor / llvmjit_expr.c.






Re: Adding a pg_get_owned_sequence function?

2023-08-31 Thread Dagfinn Ilmari Mannsåker
On Fri, 9 Jun 2023, at 20:19, Dagfinn Ilmari Mannsåker wrote:

> Hi hackers,
>
> I've always been annoyed by the fact that pg_get_serial_sequence takes
> the table and returns the sequence as strings rather than regclass. And
> since identity columns were added, the name is misleading as well (which
> is even acknowledged in the docs, together with a suggestion for a
> better name).
> 
> So, instead of making excuses in the documentation, I thought why not
> add a new function which addresses all of these issues, and document the
> old one as a backward-compatibilty wrapper?
> 
> Please see the attached patch for my stab at this.

This didn't get any replies, so I've created a commitfest entry to make sure it 
doesn't get lost:

https://commitfest.postgresql.org/44/4535/

-- 
- ilmari




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-31 Thread Dilip Kumar
On Tue, Aug 29, 2023 at 5:28 PM Hayato Kuroda (Fujitsu)
 wrote:

Some comments in 0002

1.
+ res = executeQueryOrDie(conn, "SELECT slot_name "
+ "FROM pg_catalog.pg_replication_slots "
+ "WHERE slot_type = 'logical' AND "
+ "temporary IS FALSE;");

What is the reason we are ignoring temporary slots here?  I think we
better explain in the comments.

2.
+ res = executeQueryOrDie(conn, "SELECT slot_name "
+ "FROM pg_catalog.pg_replication_slots "
+ "WHERE slot_type = 'logical' AND "
+ "temporary IS FALSE;");
+
+ if (PQntuples(res))
+ pg_fatal("New cluster must not have logical replication slots but
found \"%s\"",
+ PQgetvalue(res, 0, 0));

It looks a bit odd to me that first it is fetching all the logical
slots from the new cluster and then printing the name of one of the
slots.  If it is printing the name of the slots then shouldn't it be
printing all the slots' names or it should just say that there
existing slots on the new cluster without giving any names?  And if we
are planning for option 2 i.e. not printing the name then better to
put LIMIT 1 at the end of the query.
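
For what it's worth, a sketch of option 2 with the LIMIT applied (my
wording of the suggestion, not the actual patch):

    res = executeQueryOrDie(conn, "SELECT slot_name "
                            "FROM pg_catalog.pg_replication_slots "
                            "WHERE slot_type = 'logical' AND "
                            "temporary IS FALSE LIMIT 1;");

    if (PQntuples(res))
        pg_fatal("New cluster must not have logical replication slots");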

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Replace some cstring_to_text to cstring_to_text_with_len

2023-08-31 Thread Dagfinn Ilmari Mannsåker
Andrew Dunstan  writes:

> On 2023-08-31 Th 07:41, John Naylor wrote:
>>
>> On Thu, Aug 31, 2023 at 6:07 PM Ranier Vilela  wrote:
>> >
>> > On Thu, Aug 31, 2023 at 12:22 AM Michael Paquier wrote:
>> >>
>> >> On Wed, Aug 30, 2023 at 03:00:13PM -0300, Ranier Vilela wrote:
>> >> > cstring_to_text has a small overhead, because it calls strlen on
>> >> > its char pointer parameter.
>> >> >
>> >> > Is it worth the effort to avoid this where we know the size of the
>> >> > parameter?
>> >>
>> >> Are there workloads where this matters?
>> >
>> > None, but note this change has the same spirit as 8b26769bc.
>>
>> - return cstring_to_text("");
>> + return cstring_to_text_with_len("", 0);
>>
>> This looks worse, so we'd better be getting something in return.
>
>
> I agree this is a bit ugly. I wonder if we'd be better off creating a
> function that returned an empty text value, so we'd just avoid 
> converting the empty cstring altogether and say:
>
>   return empty_text();

Or we could generalise it for any string literal (of which there are
slightly more[1] non-empty than empty in calls to
cstring_to_text(_with_len)):

#define literal_to_text(str) cstring_to_text_with_len("" str "", sizeof(str)-1)
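
As a hypothetical usage sketch (not part of any patch here): the "" str ""
concatenation makes the macro fail to compile for anything other than a
string literal, and sizeof(str)-1 is the literal's length computed at
compile time, so the runtime strlen() disappears:

    text *t = literal_to_text("now");   /* cstring_to_text_with_len("" "now" "", 3) */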

[1]: 

~/src/postgresql $ rg 'cstring_to_text(_with_len)?\("[^"]+"' | wc -l
17
~/src/postgresql $ rg 'cstring_to_text(_with_len)?\(""' | wc -l
15

- ilmari




Re: Initdb-time block size specification

2023-08-31 Thread David Christensen
Enclosed are TPC-H results for 1GB shared_buffers, 64MB work_mem on a
64GB laptop with SSD storage; everything else is default settings.

TL;DR: unpatched version: 17.30 seconds, patched version: 17.15 seconds;
there are some slight variations in runtime, but they seem to be within
the noise level at this point.


tpch.unpatched.rst
Description: Binary data


tpch.patched.rst
Description: Binary data


Re: Replace some cstring_to_text to cstring_to_text_with_len

2023-08-31 Thread Andrew Dunstan


On 2023-08-31 Th 07:41, John Naylor wrote:


On Thu, Aug 31, 2023 at 6:07 PM Ranier Vilela  wrote:
>
> On Thu, Aug 31, 2023 at 12:22 AM Michael Paquier wrote:
>>
>> On Wed, Aug 30, 2023 at 03:00:13PM -0300, Ranier Vilela wrote:
>> > cstring_to_text has a small overhead, because it calls strlen on
>> > its char pointer parameter.
>> >
>> > Is it worth the effort to avoid this where we know the size of the
>> > parameter?
>>
>> Are there workloads where this matters?
>
> None, but note this change has the same spirit as 8b26769bc.

- return cstring_to_text("");
+ return cstring_to_text_with_len("", 0);

This looks worse, so we'd better be getting something in return.



I agree this is a bit ugly. I wonder if we'd be better off creating a 
function that returned an empty text value, so we'd just avoid 
converting the empty cstring altogether and say:


  return empty_text();


cheers


andrew



--
Andrew Dunstan
EDB:https://www.enterprisedb.com


Re: persist logical slots to disk during shutdown checkpoint

2023-08-31 Thread Ashutosh Bapat
On Thu, Aug 31, 2023 at 2:52 PM Amit Kapila  wrote:
> > > > I
> > > > think we should shut down subscriber, restart publisher and then make 
> > > > this
> > > > check based on the contents of the replication slot instead of server 
> > > > log.
> > > > Shutting down subscriber will ensure that the subscriber won't send any 
> > > > new
> > > > confirmed flush location to the publisher after restart.
> > > >
> > >
> > > But if we shutdown the subscriber before the publisher there is no
> > > guarantee that the publisher has sent all outstanding logs up to the
> > > shutdown checkpoint record (i.e., the latest record). Such a guarantee
> > > can only be there if we do a clean shutdown of the publisher before
> > > the subscriber.
> >
> > So the sequence is shutdown publisher node, shutdown subscriber node,
> > start publisher node and carry out the checks.
> >
>
> This can probably work but I still prefer the current approach as that
> will be closer to the ideal values on the disk instead of comparison
> with a later in-memory value of confirmed_flush LSN. Ideally, if we
> would have a tool like pg_replslotdata which can read the on-disk
> state of slots that would be better but missing that, the current one
> sounds like the next best possibility. Do you see any problem with the
> current approach of test?

> +  qr/Streaming transactions committing after ([A-F0-9]+\/[A-F0-9]+),
> reading WAL from ([A-F0-9]+\/[A-F0-9]+)./

I don't think the LSN reported in this message is guaranteed to be the
confirmed_flush LSN of the slot. It's usually confirmed_flush but not
always. It's the LSN that the snapshot builder computes based on factors
including confirmed_flush. There's a chance that this test will fail
sometimes because of this behaviour. Reading directly from the
replication slot is better than this. pg_replslotdata might help if we
read the replication slot contents between shutdown and restart of the
publisher.

>
> BTW, I think we can keep autovacuum = off for this test just to avoid
> any extra record generation even though that doesn't matter for the
> purpose of test.

Autovacuum is one thing, but we can't guarantee the absence of any
concurrent activity forever.

>
> > >
> > > > All the places which call ReplicationSlotSave() mark the slot as dirty. 
> > > >  All
> > > > the places where SaveSlotToPath() is called, the slot is marked dirty 
> > > > except
> > > > when calling from CheckPointReplicationSlots(). So I am wondering 
> > > > whether we
> > > > should be marking the slot dirty in CheckPointReplicationSlots() and 
> > > > avoid
> > > > passing down is_shutdown flag to SaveSlotToPath().
> > > >
> > >
> > > I feel that will add another spinlock acquire/release pair without
> > > much benefit. Sure, it may not be performance-sensitive but still
> > > adding another pair of lock/release doesn't seem like a better idea.
> >
> > We call ReplicationSlotMarkDirty() followed by ReplicationSlotSave()
> > at all the places, even those which are more frequent than this.
> >
>
> All but one. Normally, the idea of marking dirty is to indicate that
> we will actually write/flush the contents at a later point (except
> when required for correctness) as even indicated in the comments atop
> ReplicationSlotMarkDirty(). However, I see your point that we use
> that protocol at all the current places including CreateSlotOnDisk().
> So, we can probably do it here as well.

yes

I didn't see this entry in the commitfest. Since we are discussing it and
the next CF is about to begin, it would probably be good to add one there.
-- 
Best Wishes,
Ashutosh Bapat




Re: [17] CREATE SUBSCRIPTION ... SERVER

2023-08-31 Thread Robert Haas
On Wed, Aug 30, 2023 at 1:19 PM Jeff Davis  wrote:
> On Wed, 2023-08-30 at 09:49 -0400, Tom Lane wrote:
> > This seems like it requires a whole lot of new mechanism (parser
> > and catalog infrastructure) that could be done far more easily
> > in other ways.  In particular, how about inventing a built-in
> > dummy FDW to serve the purpose?
>
> That was my initial approach, but it was getting a bit messy.
>
> FDWs don't have a schema, so we can't put it in pg_catalog, and names
> beginning with "pg_" aren't restricted now. Should I retroactively
> restrict FDW names that begin with "pg_"? Or just use special cases in
> pg_dump and elsewhere? Also I didn't see a great place to document it.
>
> Admittedly, I didn't complete the dummy-FDW approach, so perhaps it
> works out better overall. I can give it a try.

What I feel is kind of weird about this syntax is that it seems like
it's entangled with the FDW mechanism but doesn't really overlap with
it. You could have something that is completely separate (CREATE
SUBSCRIPTION CONNECTION) or something that truly does have some
overlap (no new syntax and a dummy fdw, as Tom proposes, or somehow
knowing that postgres_fdw is special, as Ashutosh proposes). But this
seems like sort of an odd middle ground.

I also think that the decision to make pg_create_connection a member
of pg_create_subscription by default, but encouraging users to think
about revoking it, is kind of strange. I don't think we really want to
encourage users to tinker with predefined roles in this kind of way. I
think there are better ways of achieving the goals here.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Replace some cstring_to_text to cstring_to_text_with_len

2023-08-31 Thread John Naylor
On Thu, Aug 31, 2023 at 6:07 PM Ranier Vilela  wrote:
>
> On Thu, Aug 31, 2023 at 12:22 AM Michael Paquier wrote:
>>
>> On Wed, Aug 30, 2023 at 03:00:13PM -0300, Ranier Vilela wrote:
>> > cstring_to_text has a small overhead, because it calls strlen on
>> > its char pointer parameter.
>> >
>> > Is it worth the effort to avoid this where we know the size of the
>> > parameter?
>>
>> Are there workloads where this matters?
>
> None, but note this change has the same spirit as 8b26769bc.

- return cstring_to_text("");
+ return cstring_to_text_with_len("", 0);

This looks worse, so we'd better be getting something in return.

@@ -360,7 +360,7 @@ pg_tablespace_location(PG_FUNCTION_ARGS)
  sourcepath)));
  targetpath[rllen] = '\0';

- PG_RETURN_TEXT_P(cstring_to_text(targetpath));
+ PG_RETURN_TEXT_P(cstring_to_text_with_len(targetpath, rllen));

This could be a worthwhile cosmetic improvement if the nul-terminator (and
the space reserved for it, and the comment explaining that) were taken out
as well, but the patch didn't bother to do that.

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Should we use MemSet or {0} for struct initialization?

2023-08-31 Thread Junwang Zhao
On Thu, Aug 31, 2023 at 7:07 PM John Naylor
 wrote:
>
> > On Thu, Aug 31, 2023 at 5:34 PM Richard Guo  wrote:
> > >
> > > While working on a bug in expandRecordVariable() I noticed that in the
> > > switch statement for case RTE_SUBQUERY we initialize struct ParseState
> > > with {0} while for case RTE_CTE we do that with MemSet.  I understand
> > > that there is nothing wrong with this, just cannot get away with the
> > > inconsistency inside the same function (sorry for the nitpicking).
> > >
> > > Do we have a preference for how to initialize structures?  From 9fd45870
> > > it seems that we prefer to {0}.  So here is a trivial patch doing that.
>
> It seems to have been deliberately left that way in the wake of that commit, 
> see:
>
> https://www.postgresql.org/message-id/87d2e5f8-3c37-d185-4bbc-1de163ac4b10%40enterprisedb.com
>
> (If so, it deserves a comment to keep people from trying to change it...)
>
> > > And with a rough scan the MemSet calls in pg_stat_get_backend_subxact()
> > > can also be replaced with {0}, so include that in the patch too.
>
> I _believe_ that's harmless to change.
>
> On Thu, Aug 31, 2023 at 4:57 PM Junwang Zhao  wrote:
>
> > If the struct has padding or alignment bytes, {0} only guarantees that
> > the struct members are initialized to 0, while memset sets the
> > padding bytes to 0 as well; but since we normally don't access the
> > padding, they give the same effect.
>
> See above -- if it's used as a hash key, for example, you must clear 
> everything.

Yeah, if memcmp was used as the key comparison function, there is a problem.

>
> > I bet {0} should be faster since there is no function call, but I'm not
> > 100% sure ;)
>
> Neither has a function call. MemSet is a PG macro. You're thinking of memset, 
> the libc library function, but a decent compiler can easily turn that into 
> something else for fixed-size inputs.

good to know, thanks.

>
> --
> John Naylor
> EDB: http://www.enterprisedb.com
>
>


-- 
Regards
Junwang Zhao




Re: Commitfest 2023-09 starts soon

2023-08-31 Thread Aleksander Alekseev
Hi,

> > There are a number of patches carried over from the PG16 development
> > cycle that have been in "Waiting on author" for several months.  I will
> > aggressively prune those after the start of this commitfest if there
> > hasn't been any author activity by then.
>
> The "64-bit TOAST value ID" [1] is one of such "Waiting on author"
> patches I've been reviewing. See the last 2-3 messages in the thread.
> I believe it's safe to mark it as RwF for now.
>
> [1]: https://commitfest.postgresql.org/44/4296/

This was the one that I could name off the top of my head.

1. Flush SLRU counters in checkpointer process
https://commitfest.postgresql.org/44/4120/

Similarly, I suggest marking it as RwF.

2. Allow logical replication via inheritance root table
https://commitfest.postgresql.org/44/4225/

This one seems to be in active development. Changing status to "Needs
review" since it definitely could use more code review.

3. ResourceOwner refactoring
https://commitfest.postgresql.org/44/3982/

The patch is in good shape but requires actions from Heikki. I suggest
keeping it as is for now.

4. Move SLRU data into the regular buffer pool
https://commitfest.postgresql.org/44/3514/

Rotted, and for a long time. Suggestion: RwF.

5. A minor adjustment to get_cheapest_path_for_pathkeys
https://commitfest.postgresql.org/44/4286/

Doesn't seem to be valuable. Suggestion: Rejected.

I will apply the corresponding status changes if there are no objections.

-- 
Best regards,
Aleksander Alekseev




Re: generate syscache info automatically

2023-08-31 Thread Peter Eisentraut
I have committed 0002 and 0003, and also a small bug fix in the 
ObjectProperty entry for "transforms".


I have also gotten the automatic generation of the ObjectProperty lookup 
table working (with some warts).


Attached is an updated patch set.

One win here is that the ObjectProperty lookup now uses a hash function 
instead of a linear search.  So hopefully the performance is better (to 
be confirmed) and it could now be used for things like [0].


[0]: 
https://www.postgresql.org/message-id/flat/12f1b1d2-f8cf-c4a2-72ec-441bd7954...@enterprisedb.com


There was some discussion about whether the catalog files are the right 
place to keep syscache information.  Personally, I would find that more 
convenient.  (Note that we also recently moved the definitions of 
indexes and toast tables from files with the whole list into the various 
catalog files.)  But we can also keep it somewhere else.  The important 
thing is that Catalog.pm can find it somewhere in a structured form.


To finish up the ObjectProperty generation, we also need to store some 
more data, namely the OBJECT_* mappings.  We probably do not want to 
store those in the catalog headers; that looks like a layering violation 
to me.


So, there is some discussion to be had about where to put all this 
across various use cases.



On 24.08.23 16:03, Peter Eisentraut wrote:

On 03.07.23 07:45, Peter Eisentraut wrote:
Here is an updated patch set that adjusts for the recently introduced 
named captures.  I will address the other issues later.  I think the 
first few patches in the series can be considered nonetheless.


I have committed the 0001 patch, which was really a (code comment) bug fix.

I think the patches 0002 and 0003 should be uncontroversial, so I'd like 
to commit them if no one objects.


The remaining patches are still WIP.



On 15.06.23 09:45, John Naylor wrote:
On Wed, May 31, 2023 at 4:58 AM Peter Eisentraut wrote:

 >
 > I want to report on my on-the-plane-to-PGCon project.
 >
 > The idea was mentioned in [0].  genbki.pl already knows everything about
 > system catalog indexes.  If we add a "please also make a syscache for
 > this one" flag to the catalog metadata, we can have genbki.pl produce
 > the tables in syscache.c and syscache.h automatically.
 >
 > Aside from avoiding the cumbersome editing of those tables, I think this
 > layout is also conceptually cleaner, as you can more easily see which
 > system catalog indexes have syscaches and maybe ask questions about why
 > or why not.

When this has come up before, one objection was that index 
declarations shouldn't know about cache names and bucket sizes [1]. 
The second paragraph above makes a reasonable case for that, however. 
I believe one alternative idea was for a script to read the enum, 
which would look something like this:


#define DECLARE_SYSCACHE(cacheid,indexname,numbuckets) cacheid

enum SysCacheIdentifier
{
DECLARE_SYSCACHE(AGGFNOID, pg_aggregate_fnoid_index, 16) = 0,
...
};

...which would then look up the other info in the usual way from 
Catalog.pm.


 > As a possible follow-up, I have also started work on generating the
 > ObjectProperty structure in objectaddress.c.  One of the things you need
 > for that is making genbki.pl aware of the syscache information.  There
 > is some more work to be done there, but it's looking promising.

I haven't studied this, but it seems interesting.

One other possible improvement: syscache.c has a bunch of #include's, 
one for each catalog with a cache, so there's still a bit of manual 
work in adding a cache, and the current #include list is a bit 
cumbersome. Perhaps it's worth it to have the script emit them as well?


I also wonder if at some point it will make sense to split off a
separate script(s) for some things that are unrelated to the
bootstrap data.  genbki.pl is getting pretty large,
and there are additional things that could be done with syscaches,
e.g. inlined eq/hash functions for cache lookup [2].


[1] https://www.postgresql.org/message-id/12460.1570734...@sss.pgh.pa.us

[2] https://www.postgresql.org/message-id/20210831205906.4wk3s4lvgzkdaqpi%40alap3.anarazel.de


--
John Naylor
EDB: http://www.enterprisedb.com 




From 399b6add1e775083317b60f51dfd7e6c41209421 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 3 Jul 2023 07:39:25 +0200
Subject: [PATCH v3 1/3] Fill in more of ObjectProperty

Fill in .objtype field where an appropriate value exists.

These cases are currently not used (see also comments at
get_object_type()), but we might as well fill in what's possible in
case additional uses arise.

Discussion: 
https://www.postgresql.org/message-id/flat/75ae5875-3abc-dafc-8aec

Re: Commitfest 2023-09 starts soon

2023-08-31 Thread Aleksander Alekseev
Hi Peter,

> Commitfest 2023-09 (https://commitfest.postgresql.org/44/) starts in
> less than 28 hours.
>
> If you have any patches you would like considered, be sure to add them
> in good time.
>
> All patch authors, and especially experienced hackers, are requested to
> make sure the patch status is up to date.  If the patch is still being
> worked on, it might not need to be in "Needs review".  Conversely, if
> you are hoping for a review but the status is "Waiting on author", then
> it will likely be ignored by reviewers.
>
> There are a number of patches carried over from the PG16 development
> cycle that have been in "Waiting on author" for several months.  I will
> aggressively prune those after the start of this commitfest if there
> hasn't been any author activity by then.

The "64-bit TOAST value ID" [1] is one of such "Waiting on author"
patches I've been reviewing. See the last 2-3 messages in the thread.
I believe it's safe to mark it as RwF for now.

[1]: https://commitfest.postgresql.org/44/4296/

-- 
Best regards,
Aleksander Alekseev




Re: pg_stat_get_backend_subxact() and backend IDs?

2023-08-31 Thread Dilip Kumar
On Thu, Aug 31, 2023 at 4:38 AM Nathan Bossart  wrote:
>
> On Wed, Aug 30, 2023 at 10:56:22AM -0400, Robert Haas wrote:
> > On Wed, Aug 30, 2023 at 10:27 AM Nathan Bossart
> >  wrote:
> >> I'm about to spend way too much time writing the commit message for 0002,
> >> but I plan to commit both patches sometime today.
> >
> > Thanks! I'm glad you're committing the patches, and I approve of you
> > spending way too much time on the commit message. :-)
>
> Committed.

Sorry, I didn't notice this thread earlier.  The new behavior looks
better to me, thanks for working on it.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)

2023-08-31 Thread Thomas Munro
On Thu, Aug 31, 2023 at 2:32 PM Thomas Munro  wrote:
> On Thu, Aug 31, 2023 at 12:16 AM Tomas Vondra
>  wrote:
> > I have another case of this on dikkop (on 11 again). Is there anything
> > else we'd want to try? Or maybe someone would want access to the machine
> > and do some investigation directly?

Hmm.  No conclusion tonight but I think it's weird.  We have:

bsd@generic:/mnt/data/buildfarm $ ps x -O wchan | grep 52663
52663 select  -  Is   0:07.40 postgres: bsd regression [local] SELECT (postgres)
52731 select  -  Is   0:00.09 postgres: parallel worker for PID 52663 (postgres)
52732 select  -  Is   0:00.06 postgres: parallel worker for PID 52663 (postgres)
81525 piperd  0  S+   0:00.01 grep 52663

wchan=select means sleeping in poll()/select().

bsd@generic:/mnt/data/buildfarm $ procstat -i 52732 | grep USR1
52732 postgres USR1 P-C
bsd@generic:/mnt/data/buildfarm $ procstat -j 52732 | grep USR1
52732 100121 postgres USR1 --

We have a signal that is pending and not blocked, so I don't
immediately know why poll() hasn't returned control.




Re: Should we use MemSet or {0} for struct initialization?

2023-08-31 Thread John Naylor
> On Thu, Aug 31, 2023 at 5:34 PM Richard Guo wrote:
> >
> > While working on a bug in expandRecordVariable() I noticed that in the
> > switch statement for case RTE_SUBQUERY we initialize struct ParseState
> > with {0} while for case RTE_CTE we do that with MemSet.  I understand
> > that there is nothing wrong with this, just cannot get away with the
> > inconsistency inside the same function (sorry for the nitpicking).
> >
> > Do we have a preference for how to initialize structures?  From 9fd45870
> > it seems that we prefer to {0}.  So here is a trivial patch doing that.

It seems to have been deliberately left that way in the wake of that
commit, see:

https://www.postgresql.org/message-id/87d2e5f8-3c37-d185-4bbc-1de163ac4b10%40enterprisedb.com

(If so, it deserves a comment to keep people from trying to change it...)

> > And with a rough scan the MemSet calls in pg_stat_get_backend_subxact()
> > can also be replaced with {0}, so include that in the patch too.

I _believe_ that's harmless to change.

On Thu, Aug 31, 2023 at 4:57 PM Junwang Zhao  wrote:

> If the struct has padding or alignment bytes, {0} only guarantees that
> the struct members are initialized to 0, while memset sets the
> padding bytes to 0 as well; but since we normally don't access the
> padding, they give the same effect.

See above -- if it's used as a hash key, for example, you must clear
everything.

> I bet {0} should be faster since there is no function call, but I'm not
> 100% sure ;)

Neither has a function call. MemSet is a PG macro. You're thinking of
memset, the libc library function, but a decent compiler can easily turn
that into something else for fixed-size inputs.
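
A self-contained illustration of that caveat (my example, not from the
patch under discussion):

    #include <string.h>

    typedef struct TagKey
    {
        char    tag;            /* followed by padding bytes on most ABIs */
        int     id;
    } TagKey;

    void
    init_keys(TagKey *a, TagKey *b)
    {
        /* {0} zeroes the members; the padding bytes stay indeterminate */
        TagKey  zeroed = {0};

        *a = zeroed;

        /*
         * memset clears members and padding alike -- required if the
         * struct is later compared with memcmp or used as a hash key.
         */
        memset(b, 0, sizeof(*b));
    }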

--
John Naylor
EDB: http://www.enterprisedb.com


Re: Replace some cstring_to_text to cstring_to_text_with_len

2023-08-31 Thread Ranier Vilela
On Thu, Aug 31, 2023 at 12:22 AM Michael Paquier wrote:

> On Wed, Aug 30, 2023 at 03:00:13PM -0300, Ranier Vilela wrote:
> > cstring_to_text has a small overhead, because it calls strlen on
> > its char pointer parameter.
> >
> > Is it worth the effort to avoid this where we know the size of the
> > parameter?
>
> Are there workloads where this matters?
>
None, but note this change has the same spirit as 8b26769bc.

best regards,
Ranier Vilela


Re: cataloguing NOT NULL constraints

2023-08-31 Thread Alexander Lakhin

31.08.2023 13:26, Alvaro Herrera wrote:

Hmm, that's some weird code I left there all right.  Can you please try
this patch?  (Not final; I'll review it more completely later,
particularly to add this test case.)


Yes, your patch fixes the issue. I get the same error now:
ERROR:  column "a" in child table must be marked NOT NULL

Thank you!

Best regards,
Alexander




RE: Initial Schema Sync for Logical Replication

2023-08-31 Thread Kumar, Sachin
Hi everyone, based on an internal discussion with Masahiko,
I have implemented concurrent DDL support for initial schema sync.

Concurrent patch workflow

1. When the TableSync worker creates a replication slot, it will
save the slot LSN into pg_subscription_rel with the
SUBREL_SYNC_SCHEMA_DATA_SYNC state, and it will wait for
its state to become SUBREL_STATE_DATASYNC.

2. The apply process will apply DDLs till the tablesync LSN, and then
it will change the pg_subscription_rel state to SUBREL_STATE_DATASYNC.

3. TableSync will continue applying pending DMLs/DDLs till it catches up.

This patch needs DDL replication to apply concurrent DDLs; I have
cherry-picked this DDL patch [0].

Issues
1) This needs testing for concurrent DDLs; I am not sure how to make the
tablesync process wait so that concurrent DDLs can be issued on the
publisher.
2) In my testing, a created table does not appear on the same connection
on the subscriber; I have to reconnect to see the table.
3) Maybe use different chars for SUBREL_SYNC_SCHEMA_DATA_INIT and
SUBREL_SYNC_SCHEMA_DATA_SYNC; currently they are 'x' and 'y'.
4) I need to add SUBREL_SYNC_SCHEMA_DATA_INIT and SUBREL_SYNC_SCHEMA_DATA_SYNC
to pg_subscription_rel_d.h to make it compile successfully.
5) It only implements concurrent ALTER as of now.

[0] = 
https://www.postgresql.org/message-id/OS0PR01MB57163E6487EFF7378CB8E17C9438A%40OS0PR01MB5716.jpnprd01.prod.outlook.com


ConcurrentDDL.patch
Description: ConcurrentDDL.patch


Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-31 Thread Amit Kapila
On Wed, Aug 30, 2023 at 10:58 AM Peter Smith  wrote:
>
> Here are some review comments for v28-0003.
>
> ==
> src/bin/pg_upgrade/check.c
>
> 1. check_and_dump_old_cluster
> + /*
> + * Logical replication slots can be migrated since PG17. See comments atop
> + * get_old_cluster_logical_slot_infos().
> + */
> + if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
> + check_old_cluster_for_valid_slots(live_check);
> +
>
> IIUC we are preferring to use the <= 1600 style of version check
> instead of >= 1700 where possible.
>

Yeah, but in this case, following the nearby code style, I think it is
okay to keep it as it is.

> ~
>
> 3b.
> /Quick exit/Quick return/
>

Hmm, either way should be okay.

> ~
>
> 4.
> + prep_status("Checking for logical replication slots");
>
> I felt that should add the word "valid" like:
> "Checking for valid logical replication slots"
>

Agreed and fixed.

> ~~~
>
> 5.
> + /* Check there are no logical replication slots with a 'lost' state. */
> + res = executeQueryOrDie(conn,
> + "SELECT slot_name FROM pg_catalog.pg_replication_slots "
> + "WHERE wal_status = 'lost' AND "
> + "temporary IS FALSE;");
>
> Since the SQL is checking if there *are* lost slots I felt it would be
> more natural to reverse that comment.
>
> SUGGESTION
> /* Check and reject if there are any logical replication slots with a
> 'lost' state. */
>

I changed the comments but differently.

> ~~~
>
> 6.
> + /*
> + * Do additional checks if a live check is not required. This requires
> + * that confirmed_flush_lsn of all the slots is the same as the latest
> + * checkpoint location, but it would be satisfied only when the server has
> + * been shut down.
> + */
> + if (!live_check)
>
> I think the comment can be rearranged slightly:
>
> SUGGESTION
> Do additional checks to ensure that 'confirmed_flush_lsn' of all the
> slots is the same as the latest checkpoint location.
> Note: This can be satisfied only when the old_cluster has been shut
> down, so we skip this for "live" checks.
>

Changed as per suggestion.

> ==
> src/bin/pg_upgrade/controldata.c
>
> 7.
> + /*
> + * Read the latest checkpoint location if the cluster is PG17
> + * or later. This is used for upgrading logical replication
> + * slots.
> + */
> + if (GET_MAJOR_VERSION(cluster->major_version) >= 1700)
> + {
>
> Fetching this "Latest checkpoint location:" value is only needed for
> the check_old_cluster_for_valid_slots validation check, isn't it? But
> AFAICT this code is common for both old_cluster and new_cluster.
>
> I am not sure what is best to do:
> - Do only the minimal logic needed?
> - Read the value redundantly even for new_cluster just to keep code simpler?
>
> Either way, maybe the comment should say something about this.
>

Added the comment.

-- 
With Regards,
Amit Kapila.




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-08-31 Thread Amit Kapila
On Wed, Aug 30, 2023 at 7:55 AM Peter Smith  wrote:
>
> Here are some minor review comments for patch v28-0002
>
> ==
> src/sgml/ref/pgupgrade.sgml
>
> 1.
> -   with the primary.)  Replication slots are not copied and must
> -   be recreated.
> +   with the primary.)  Replication slots on old standby are not copied.
> +   Only logical slots on the primary are migrated to the new standby,
> +   and other slots must be recreated.
>
>
> /on old standby/on the old standby/
>

Fixed.

> ==
> src/bin/pg_upgrade/info.c
>
> 2. get_old_cluster_logical_slot_infos
>
> +void
> +get_old_cluster_logical_slot_infos(void)
> +{
> + int dbnum;
> +
> + /* Logical slots can be migrated since PG17. */
> + if (GET_MAJOR_VERSION(old_cluster.major_version) <= 1600)
> + return;
> +
> + pg_log(PG_VERBOSE, "\nsource databases:");
> +
> + for (dbnum = 0; dbnum < old_cluster.dbarr.ndbs; dbnum++)
> + {
> + DbInfo*pDbInfo = &old_cluster.dbarr.dbs[dbnum];
> +
> + get_old_cluster_logical_slot_infos_per_db(pDbInfo);
> +
> + if (log_opts.verbose)
> + {
> + pg_log(PG_VERBOSE, "Database: \"%s\"", pDbInfo->db_name);
> + print_slot_infos(&pDbInfo->slot_arr);
> + }
> + }
> +}
>
> It might be worth putting an Assert before calling the
> get_old_cluster_logical_slot_infos_per_db(...) just as a sanity check:
> Assert(pDbInfo->slot_arr.nslots == 0);
>
> This also helps to better document the "Note" of the
> count_old_cluster_logical_slots() function comment.
>

I have changed the comments atop count_old_cluster_logical_slots() and
also I don't see the need for this Assert.

> ~~~
>
> 3. count_old_cluster_logical_slots
>
> +/*
> + * count_old_cluster_logical_slots()
> + *
> + * Sum up and return the number of logical replication slots for all 
> databases.
> + *
> + * Note: this function always returns 0 if the old_cluster is PG16 and prior
> + * because old_cluster.dbarr.dbs[dbnum].slot_arr is set only for PG17 and
> + * later.
> + */
> +int
> +count_old_cluster_logical_slots(void)
>
> Maybe that "Note" should be expanded a bit to say who does this:
>
> SUGGESTION
>
> Note: This function always returns 0 if the old_cluster is PG16 and
> prior because old_cluster.dbarr.dbs[dbnum].slot_arr is set only for
> PG17 and later. See where get_old_cluster_logical_slot_infos_per_db()
> is called.
>

Changed, but worded differently because describing it in terms of the
variable name doesn't sound good to me.

> ==
> src/bin/pg_upgrade/pg_upgrade.c
>
> 4.
> + /*
> + * Logical replication slot upgrade only supported for old_cluster >=
> + * PG17.
> + *
> + * Note: This must be done after doing the pg_resetwal command because
> + * pg_resetwal would remove required WALs.
> + */
> + if (count_old_cluster_logical_slots())
> + {
> + start_postmaster(&new_cluster, true);
> + create_logical_replication_slots();
> + stop_postmaster(false);
> + }
> +
>
> 4a.
> I felt this comment needs a bit more detail otherwise you can't tell
> how the >= PG17 version check works.
>
> 4b.
> /slot upgrade only supported/slot upgrade is only supported/
>
> ~
>
> SUGGESTION
>
> Logical replication slot upgrade is only supported for old_cluster >=
> PG17. An explicit version check is not necessary here because function
> count_old_cluster_logical_slots() will always return 0 for old_cluster
> <= PG16.
>

I don't see the need to explain anything about version check here, so
removed that part of the comment.

Apart from this, I have addressed some of the comments raised by you
for the 0003 patch. Please find the diff patch attached. I think we
should combine 0002 and 0003 patches.

I have another comment on the patch:
+ /* Check there are no logical replication slots with a 'lost' state. */
+ res = executeQueryOrDie(conn,
+ "SELECT slot_name FROM pg_catalog.pg_replication_slots "
+ "WHERE wal_status = 'lost' AND "
+ "temporary IS FALSE;");

In this place, shouldn't we explicitly check for slot_type as logical?
I think we should consistently check for slot_type in all the queries
used in this patch.
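
For instance, the check quoted above could gain the filter like this
(just a sketch of the suggestion, not tested code):

    res = executeQueryOrDie(conn,
                            "SELECT slot_name FROM pg_catalog.pg_replication_slots "
                            "WHERE slot_type = 'logical' AND "
                            "wal_status = 'lost' AND "
                            "temporary IS FALSE;");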

-- 
With Regards,
Amit Kapila.


changes_amit.1.patch
Description: Binary data


Re: cataloguing NOT NULL constraints

2023-08-31 Thread Alvaro Herrera
Hello Alexander,

Thanks for testing.

On 2023-Aug-31, Alexander Lakhin wrote:

> 25.08.2023 14:38, Alvaro Herrera wrote:
> > I have now pushed this again.  Hopefully it'll stick this time.
> 
> I've found that after that commit the following query:
> CREATE TABLE t(a int PRIMARY KEY) PARTITION BY RANGE (a);
> CREATE TABLE tp1(a int);
> ALTER TABLE t ATTACH PARTITION tp1 FOR VALUES FROM (0) to (1);
> 
> triggers a server crash:

Hmm, that's some weird code I left there all right.  Can you please try
this patch?  (Not final; I'll review it more completely later,
particularly to add this test case.)

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
 It does it in a really, really complicated way
 why does it need to be complicated?
 Because it's MakeMaker.
From ab241913dec84265ca64d3cb76d1509bb7ce1808 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Thu, 31 Aug 2023 12:24:18 +0200
Subject: [PATCH] Fix not-null constraint test

Per report from Alexander Lakhin
---
 src/backend/commands/tablecmds.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/src/backend/commands/tablecmds.c b/src/backend/commands/tablecmds.c
index d097da3c78..5941d0a4be 100644
--- a/src/backend/commands/tablecmds.c
+++ b/src/backend/commands/tablecmds.c
@@ -15750,7 +15750,8 @@ MergeAttributesIntoExisting(Relation child_rel, Relation parent_rel)
 
 				contup = findNotNullConstraintAttnum(RelationGetRelid(parent_rel),
 													 attribute->attnum);
-				if (!((Form_pg_constraint) GETSTRUCT(contup))->connoinherit)
+
+				if (!HeapTupleIsValid(contup))
 					ereport(ERROR,
 							(errcode(ERRCODE_DATATYPE_MISMATCH),
 							 errmsg("column \"%s\" in child table must be marked NOT NULL",
@@ -15975,10 +15976,20 @@ MergeConstraintsIntoExisting(Relation child_rel, Relation parent_rel)
 		systable_endscan(child_scan);
 
 		if (!found)
+		{
+			if (parent_con->contype == CONSTRAINT_NOTNULL)
+				ereport(ERROR,
+						errcode(ERRCODE_DATATYPE_MISMATCH),
+						errmsg("column \"%s\" in child table must be marked NOT NULL",
+							   get_attname(parent_relid,
+										   extractNotNullColumn(parent_tuple),
+										   false)));
+
 			ereport(ERROR,
 					(errcode(ERRCODE_DATATYPE_MISMATCH),
 					 errmsg("child table is missing constraint \"%s\"",
							NameStr(parent_con->conname))));
+		}
 	}
 	systable_endscan(parent_scan);
-- 
2.39.2
2.39.2



Re: Sync scan & regression tests

2023-08-31 Thread Heikki Linnakangas

On 31/08/2023 02:37, Melanie Plageman wrote:

On Wed, Aug 30, 2023 at 5:15 PM David Rowley  wrote:


I just looked at v15's code and I agree that the ss_report_location()
would be called even when the scan is finished.  It wasn't intentional
that that was changed in v16, so I'm happy for your patch to be
applied and backpatched to 16.  Thanks for digging into that.


Yes, thanks for catching my mistake.
LGTM.


Pushed, thanks for the reviews!

--
Heikki Linnakangas
Neon (https://neon.tech)





Re: cataloguing NOT NULL constraints

2023-08-31 Thread Alvaro Herrera
On 2023-Mar-29, Peter Eisentraut wrote:

> On 27.03.23 15:55, Peter Eisentraut wrote:
> > The information schema should be updated.  I think the following views:
> > 
> > - CHECK_CONSTRAINTS
> > - CONSTRAINT_COLUMN_USAGE
> > - DOMAIN_CONSTRAINTS
> > - TABLE_CONSTRAINTS
> > 
> > It looks like these have no test coverage; maybe that could be addressed
> > at the same time.
> 
> Here are patches for this.  I haven't included the expected files for the
> tests; this should be checked again that output is correct or the changes
> introduced by this patch set are as expected.
> 
> The reason we didn't have tests for this before was probably in part because
> the information schema made up names for not-null constraints involving
> OIDs, so the test wouldn't have been stable.
> 
> Feel free to integrate this, or we can add it on afterwards.

I'm eyeing patch 0002 here.  I noticed that in view check_constraints it
defines the not-null constraint definition as substrings over the
pg_get_constraintdef() function[q1], so I wondered whether it might be
better to join to pg_attribute instead.  I see two options:

1. add a scalar subselect in the select list for each constraint [q2]
2. add a LEFT JOIN to pg_attribute to the main FROM list [q3]
   ON con.conrelid = att.attrelid AND con.conkey[1] = att.attnum

With just the regression test tables in place, these forms are all
pretty much the same in execution time.  I then created 20k tables with
6 columns each and a NOT NULL constraint on each column[q4].  That's not a
huge amount but it's credible for a medium-size database with a bunch of
partitions (it's amazing what passes for a medium-size database these
days).  I was surprised to find out that q3 (~130ms) is three times
faster than q2 (~390ms), which is in turn more than twice as fast as
your proposed q1 (~870ms).  So unless you have another reason to prefer
it, I think we should use q3 here.


In constraint_column_usage, you're adding a relkind to the catalog scan
that goes through pg_depend for CHECK constraints.  Here we can rely on
a simple conkey[1] check and a separate UNION ALL arm[q5]; this is also
faster when there are many tables.

The third view definition looks ok.  It's certainly very nice to be able
to delete XXX comments there.


[q1]
SELECT current_database()::information_schema.sql_identifier AS constraint_catalog,
    rs.nspname::information_schema.sql_identifier AS constraint_schema,
    con.conname::information_schema.sql_identifier AS constraint_name,
    CASE con.contype
        WHEN 'c'::"char" THEN "left"(SUBSTRING(pg_get_constraintdef(con.oid) FROM 8), '-1'::integer)
        WHEN 'n'::"char" THEN SUBSTRING(pg_get_constraintdef(con.oid) FROM 10) || ' IS NOT NULL'::text
        ELSE NULL::text
    END::information_schema.character_data AS check_clause
   FROM pg_constraint con
     LEFT JOIN pg_namespace rs ON rs.oid = con.connamespace
     LEFT JOIN pg_class c ON c.oid = con.conrelid
     LEFT JOIN pg_type t ON t.oid = con.contypid
  WHERE pg_has_role(COALESCE(c.relowner, t.typowner), 'USAGE'::text)
    AND (con.contype = ANY (ARRAY['c'::"char", 'n'::"char"]));

[q2]
SELECT current_database()::information_schema.sql_identifier AS constraint_catalog,
    rs.nspname::information_schema.sql_identifier AS constraint_schema,
    con.conname::information_schema.sql_identifier AS constraint_name,
    CASE con.contype
        WHEN 'c'::"char" THEN "left"(SUBSTRING(pg_get_constraintdef(con.oid) FROM 8), '-1'::integer)
        WHEN 'n'::"char" THEN FORMAT('CHECK (%s IS NOT NULL)',
            (SELECT attname FROM pg_attribute
             WHERE attrelid = conrelid AND attnum = conkey[1]))
        ELSE NULL::text
    END::information_schema.character_data AS check_clause
   FROM pg_constraint con
     LEFT JOIN pg_namespace rs ON rs.oid = con.connamespace
     LEFT JOIN pg_class c ON c.oid = con.conrelid
     LEFT JOIN pg_type t ON t.oid = con.contypid
  WHERE pg_has_role(COALESCE(c.relowner, t.typowner), 'USAGE'::text)
    AND (con.contype = ANY (ARRAY['c'::"char", 'n'::"char"]));

[q3]
SELECT current_database()::information_schema.sql_identifier AS constraint_catalog,
    rs.nspname::information_schema.sql_identifier AS constraint_schema,
    con.conname::information_schema.sql_identifier AS constraint_name,
    CASE con.contype
        WHEN 'c'::"char" THEN "left"(SUBSTRING(pg_get_constraintdef(con.oid) FROM 8), '-1'::integer)
        WHEN 'n'::"char" THEN FORMAT('CHECK (%s IS NOT NULL)', at.attname)
        ELSE NULL::text
    END::information_schema.character_data AS check_clause
   FROM pg_constraint con
     LEFT JOIN pg_namespace rs ON rs.oid = con.connamespace
     LEFT JOIN pg_class c ON c.oid = con.conrelid

Re: cataloguing NOT NULL constraints

2023-08-31 Thread Alexander Lakhin

Hi Alvaro,

25.08.2023 14:38, Alvaro Herrera wrote:

I have now pushed this again.  Hopefully it'll stick this time.


I've found that after that commit the following query:
CREATE TABLE t(a int PRIMARY KEY) PARTITION BY RANGE (a);
CREATE TABLE tp1(a int);
ALTER TABLE t ATTACH PARTITION tp1 FOR VALUES FROM (0) to (1);

triggers a server crash:
Core was generated by `postgres: law regression [local] ALTER TABLE '.
Program terminated with signal SIGSEGV, Segmentation fault.

warning: Section `.reg-xstate/2194811' in core file too small.
#0  0x556007711d77 in MergeAttributesIntoExisting (child_rel=0x7fc30ba309d8,
    parent_rel=0x7fc30ba33f18) at tablecmds.c:15771
15771           if (!((Form_pg_constraint) GETSTRUCT(contup))->connoinherit)
(gdb) bt
#0  0x556007711d77 in MergeAttributesIntoExisting (child_rel=0x7fc30ba309d8,
    parent_rel=0x7fc30ba33f18) at tablecmds.c:15771
#1  0x5560077118d4 in CreateInheritance (child_rel=0x7fc30ba309d8, 
parent_rel=0x7fc30ba33f18)
    at tablecmds.c:15631
...

(gdb) print contup
$1 = (HeapTuple) 0x0

On b0e96f311~1 I get:
ERROR:  column "a" in child table must be marked NOT NULL

Best regards,
Alexander

Re: Should we use MemSet or {0} for struct initialization?

2023-08-31 Thread Junwang Zhao
On Thu, Aug 31, 2023 at 5:34 PM Richard Guo  wrote:
>
> While working on a bug in expandRecordVariable() I noticed that in the
> switch statement for case RTE_SUBQUERY we initialize struct ParseState
> with {0} while for case RTE_CTE we do that with MemSet.  I understand
> that there is nothing wrong with this, just cannot get away with the
> inconsistency inside the same function (sorry for the nitpicking).
>
> Do we have a preference for how to initialize structures?  From 9fd45870
> it seems that we prefer to {0}.  So here is a trivial patch doing that.
> And with a rough scan the MemSet calls in pg_stat_get_backend_subxact()
> can also be replaced with {0}, so include that in the patch too.
>
> Thanks
> Richard

If the struct has padding or alignment bytes, {0} only guarantees that
the struct members are initialized to 0, while memset sets the
padding bytes to 0 as well; but since we normally don't access the
padding, they give the same effect.

I bet {0} should be faster since there is no function call, but I'm not
100% sure ;)


-- 
Regards
Junwang Zhao




Re: Sync scan & regression tests

2023-08-31 Thread Heikki Linnakangas

On 29/08/2023 13:35, Heikki Linnakangas wrote:

On 07/08/2023 03:55, Tom Lane wrote:

This is possibly explained by the fact that it uses (per its
extra_config)
'shared_buffers = 10MB',
although it's done that for a long time and portals.out hasn't changed
since before chipmunk's latest success.  Perhaps some change in an
earlier test script affected this?


I changed the config yesterday to 'shared_buffers = 20MB', before seeing
this thread. If we expect the regression tests to work with
'shared_buffers=10MB' - and I think that would be nice - I'll change it
back. But let's see what happens with 'shared_buffers=20MB'. It will
take a few days for the tests to complete.


With shared_buffers='20MB', the tests passed. I'm going to change it 
back to 10MB now, so that we continue to cover that case.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Eliminate redundant tuple visibility check in vacuum

2023-08-31 Thread David Geier

Hi Melanie,

On 8/31/23 02:59, Melanie Plageman wrote:

I created a large table and then updated a tuple on every page in the
relation and vacuumed it. I saw a consistent slight improvement in
vacuum execution time. I profiled a bit with perf stat as well. The
difference is relatively small for this kind of example, but I can
work on a more compelling, realistic example. I think eliminating the
retry loop is also useful, as I have heard that users have had to
cancel vacuums which were in this retry loop for too long.

Just to provide a specific test case, if you create a small table like this

create table foo (a int, b int, c int) with(autovacuum_enabled=false);
insert into foo select i, i, i from generate_series(1, 1000);

And then vacuum it. I find that with my patch applied I see a
consistent ~9% speedup (averaged across multiple runs).

master: ~533ms
patch: ~487ms

And in the profile, with my patch applied, you notice less time spent
in HeapTupleSatisfiesVacuumHorizon()

master:
 11.83%  postgres  postgres   [.] heap_page_prune
 11.59%  postgres  postgres   [.] heap_prepare_freeze_tuple
  8.77%  postgres  postgres   [.] lazy_scan_prune
  8.01%  postgres  postgres   [.] HeapTupleSatisfiesVacuumHorizon
  7.79%  postgres  postgres   [.] heap_tuple_should_freeze

patch:
 13.41%  postgres  postgres   [.] heap_prepare_freeze_tuple
  9.88%  postgres  postgres   [.] heap_page_prune
  8.61%  postgres  postgres   [.] lazy_scan_prune
  7.00%  postgres  postgres   [.] heap_tuple_should_freeze
  6.43%  postgres  postgres   [.] HeapTupleSatisfiesVacuumHorizon


Thanks a lot for providing additional information and the test case.

I tried it on a release build and I also see a 10% speed-up. I reset the 
visibility map between VACUUM runs, see:


CREATE EXTENSION pg_visibility;
CREATE TABLE foo (a INT, b INT, c INT) WITH (autovacuum_enabled = FALSE);
INSERT INTO foo SELECT i, i, i FROM generate_series(1, 1000) i;
VACUUM foo;
SELECT pg_truncate_visibility_map('foo');
VACUUM foo;
SELECT pg_truncate_visibility_map('foo');
VACUUM foo;
...


The first patch, which refactors the code so we can pass the result of 
the visibility checks to the caller, looks good to me.


Regarding the 2nd patch (disclaimer: I'm not too familiar with that area
of the code): I don't completely understand why the retry loop is not
needed anymore, or how you now detect/handle the possible race
condition. It can still happen that an aborting transaction changes the
state of a row after heap_page_prune() has looked at that row. Wouldn't
that case now be ignored?


--
David Geier
(ServiceNow)





Re: persist logical slots to disk during shutdown checkpoint

2023-08-31 Thread Amit Kapila
On Thu, Aug 31, 2023 at 12:25 PM Ashutosh Bapat
 wrote:
>
> On Thu, Aug 31, 2023 at 12:10 PM Amit Kapila  wrote:
> >
> > > +
> > > +/*
> > > + * We won't ensure that the slot is persisted after the 
> > > confirmed_flush
> > > + * LSN is updated as that could lead to frequent writes.  However, 
> > > we need
> > > + * to ensure that we do persist the slots at the time of shutdown 
> > > whose
> > > + * confirmed_flush LSN is changed since we last saved the slot to 
> > > disk.
> > > + * This will help in avoiding retreat of the confirmed_flush LSN 
> > > after
> > > + * restart.  This variable is used to track the last saved 
> > > confirmed_flush
> > > + * LSN value.
> > > + */
> > >
> > > This comment makes more sense in SaveSlotToPath() than here. We may 
> > > decide to
> > > use last_saved_confirmed_flush for something else in future.
> > >
> >
> > I have kept it here because it contains some information that is not
> > specific to SaveSlotToPath. So, it seems easier to follow the whole
> > theory if we keep it at the central place in the structure and then
> > add the reference wherever required but I am fine if you and others
> > feel strongly about moving this to SaveSlotToPath().
>
> Saving slot to disk happens only in SaveSlotToPath, so except the last
> sentence rest of the comment makes sense in SaveSlotToPath().
>
> > >
> > > This function assumes that the subscriber will receive and confirm WAL 
> > > upto
> > > checkpoint location and publisher's WAL sender will update it in the slot.
> > > Where is the code to ensure that? Does the WAL sender process wait for
> > > checkpoint
> > > LSN to be confirmed when shutting down?
> > >
> >
> > Note, that we need to compare if all the WAL before the
> > shutdown_checkpoint WAL record is sent. Before the clean shutdown, we
> > do ensure that all the pending WAL is confirmed back. See the use of
> > WalSndDone() in WalSndLoop().
>
> Ok. Thanks for pointing that out to me.
>
> >
> > > I
> > > think we should shut down subscriber, restart publisher and then make this
> > > check based on the contents of the replication slot instead of server log.
> > > Shutting down subscriber will ensure that the subscriber won't send any 
> > > new
> > > confirmed flush location to the publisher after restart.
> > >
> >
> > But if we shutdown the subscriber before the publisher there is no
> > guarantee that the publisher has sent all outstanding logs up to the
> > shutdown checkpoint record (i.e., the latest record). Such a guarantee
> > can only be there if we do a clean shutdown of the publisher before
> > the subscriber.
>
> So the sequence is shutdown publisher node, shutdown subscriber node,
> start publisher node and carry out the checks.
>

This can probably work but I still prefer the current approach as that
will be closer to the ideal values on the disk instead of comparison
with a later in-memory value of confirmed_flush LSN. Ideally, if we
would have a tool like pg_replslotdata which can read the on-disk
state of slots that would be better but missing that, the current one
sounds like the next best possibility. Do you see any problem with the
current approach of test?

BTW, I think we can keep autovacuum = off for this test just to avoid
any extra record generation even though that doesn't matter for the
purpose of test.

> >
> > > All the places which call ReplicationSlotSave() mark the slot as dirty.  
> > > All
> > > the places where SaveSlotToPath() is called, the slot is marked dirty 
> > > except
> > > when calling from CheckPointReplicationSlots(). So I am wondering whether 
> > > we
> > > should be marking the slot dirty in CheckPointReplicationSlots() and avoid
> > > passing down is_shutdown flag to SaveSlotToPath().
> > >
> >
> > I feel that will add another spinlock acquire/release pair without
> > much benefit. Sure, it may not be performance-sensitive but still
> > adding another pair of lock/release doesn't seem like a better idea.
>
> We call ReplicationSlotMarkDirty() followed by ReplicationSlotSave()
> at all the places, even those which are more frequent than this.
>

All but one. Normally, the idea of marking dirty is to indicate that
we will actually write/flush the contents at a later point (except
when required for correctness) as even indicated in the comments atop
ReplicationSlotMarkDirty(). However, I see your point that we use
that protocol at all the current places including CreateSlotOnDisk().
So, we can probably do it here as well.
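
A rough sketch of what that could look like in
CheckPointReplicationSlots(), instead of passing is_shutdown down to
SaveSlotToPath() (the names, including last_saved_confirmed_flush, follow
this thread's discussion, but the code below is my assumption, not the
actual patch):

    /*
     * Sketch: mark the slot dirty if its confirmed_flush LSN moved since
     * the last save, then let the normal save path write it out.
     */
    if (is_shutdown && SlotIsLogical(s))
    {
        SpinLockAcquire(&s->mutex);
        if (s->data.confirmed_flush != s->last_saved_confirmed_flush)
        {
            s->just_dirtied = true;
            s->dirty = true;
        }
        SpinLockRelease(&s->mutex);
    }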

-- 
With Regards,
Amit Kapila.




pg_basebackup: Always return valid temporary slot names

2023-08-31 Thread Jelte Fennema
With PgBouncer in the middle, PQbackendPID() can return negative values
because PgBouncer fills all 32 bits of be_pid with random bits.

When this happens it results in pg_basebackup generating an invalid slot
name (when no specific slot name is passed in) and thus throwing an
error like this:

pg_basebackup: error: could not send replication command
"CREATE_REPLICATION_SLOT "pg_basebackup_-1201966863" TEMPORARY
PHYSICAL ( RESERVE_WAL)": ERROR:  replication slot name
"pg_basebackup_-1201966863" contains invalid character
HINT:  Replication slot names may only contain lower case letters,
numbers, and the underscore character.

This patch fixes that problem by formatting the result from PQbackendPID
as an unsigned integer when creating the temporary replication slot
name.

I think it would be good to backport this fix too.

Replication connection support for PgBouncer is not merged yet, but
it's pretty much ready:
https://github.com/pgbouncer/pgbouncer/pull/876

The reason PgBouncer does not pass on the actual Postgres backend PID
is that it doesn't have an associated server connection yet when it
needs to send the startup message to the client. It also cannot use
its own PID, because that would be the same for all clients, since
pgbouncer is a single process.
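
A self-contained illustration of the %d vs %u difference (my example, not
pg_basebackup source):

    #include <stdio.h>

    int
    main(void)
    {
        /* what PQbackendPID() can return when PgBouncer fills be_pid */
        int     be_pid = -1201966863;

        printf("pg_basebackup_%d\n", be_pid);             /* invalid slot name */
        printf("pg_basebackup_%u\n", (unsigned) be_pid);  /* pg_basebackup_3093000433 */
        return 0;
    }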


v1-0001-pg_basebackup-Always-return-valid-temporary-slot-.patch
Description: Binary data


Re: Extract numeric filed in JSONB more effectively

2023-08-31 Thread Andy Fan
Hi Chap,

The v11 is attached; the main changes are:
1.  use the jsonb_xx_start and jsonb_finish_numeric style.
2.  improve the test case a bit.

It doesn't include:
1.  the jsonb_finish_text function, since we already have the ->> operator
for text, its performance is OK, and there is no cast entry for
jsonb to text.
2.  the jsonb_finish_jsonb function, since I can't see a clear use case for
now.  Rewriting jsonb_object_field with 2 DirectFunctionCalls doesn't look
reasonable, as we would pay the overhead of 2 DirectFunctionCalls to reduce
~10 lines of code duplication.


An incompatibility at the error message level was found during testing:
create table jb(a jsonb);
insert into jb select '{"a": "a"}'::jsonb;
select (a->'a')::int4 from jb;

master:   ERROR:  cannot cast jsonb string to type *integer*
patch:  ERROR:  cannot cast jsonb string to type *numeric*

That's mainly because we first extract the field as numeric and
then cast it to int4; the error is raised at the first step, which
doesn't know the final type.  One way to fix it is to add a 2nd
argument to jsonb_finish_numeric for the real target type, but
it looks weird, so more suggestions on this would be welcome.
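
For example, the 2nd-argument idea could look roughly like this (a sketch
with names I am assuming, not the patch itself):

    /* Sketch: pass the final target type so the error can name it. */
    static Datum
    jsonb_finish_numeric(JsonbValue *jbv, Oid final_oid)
    {
        if (jbv->type != jbvNumeric)
            cannotCastJsonbValue(jbv->type, format_type_be(final_oid));
        return NumericGetDatum(jbv->val.numeric);
    }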

Performance comparison between v10 and v11.

create table tb (a jsonb);
insert into tb select '{"a": 1}'::jsonb from generate_series(1, 10)i;
select 1 from tb where (a->'a')::int2 = 2;   (pgbench 5 times)

v11:  16.273 ms
v10:  15.986 ms
master: 32.530ms

So I think the performance would not be an issue.


> I noticed there is another patch registered in this CF: [1]
> It adds new operations within jsonpath like .bigint .time
> and so on.
>
> I was wondering whether that work would be conflicting or
> complementary with this. It looks to be complementary. The
> operations being added there are within jsonpath evaluation.
> Here we are working on faster ways to get those results out.
>
> It does not seem that [1] will add any new choices in
> JsonbValue. All of its (.bigint .integer .number) seem to
> verify the requested form and then put the result as a
> numeric in ->val.numeric. So that doesn't add any new
> cases for this patch to handle. (Too bad, in a way: if that
> other patch added ->val.bigint, this patch could add a case
> to retrieve that value without going through the work of
> making a numeric. But that would complicate other things
> touching JsonbValue, and be a matter for that other patch.)
>
> It may be expanding the choices for what we might one day
> find in ->val.datetime though.
>
Thanks for this information.  I tried the jsonb_xx_start and
jsonb_finish_numeric style, and it feels like a good approach; it may
not make things too complicated even if the above things happen, IMO.

Any feedback is welcome.

-- 
Best Regards
Andy Fan


v11-0001-optimize-casting-jsonb-to-a-given-type.patch
Description: Binary data


Re: Allow specifying a dbname in pg_basebackup connection string

2023-08-31 Thread Jelte Fennema
Attached is a new version with some slightly updated wording in the docs


v4-0001-Allow-specifying-a-dbname-in-pg_basebackup-connec.patch
Description: Binary data


Commitfest 2023-09 starts soon

2023-08-31 Thread Peter Eisentraut
Commitfest 2023-09 (https://commitfest.postgresql.org/44/) starts in 
less than 28 hours.


If you have any patches you would like considered, be sure to add them 
in good time.


All patch authors, and especially experienced hackers, are requested to 
make sure the patch status is up to date.  If the patch is still being 
worked on, it might not need to be in "Needs review".  Conversely, if 
you are hoping for a review but the status is "Waiting on author", then 
it will likely be ignored by reviewers.


There are a number of patches carried over from the PG16 development 
cycle that have been in "Waiting on author" for several months.  I will 
aggressively prune those after the start of this commitfest if there 
hasn't been any author activity by then.





Should we use MemSet or {0} for struct initialization?

2023-08-31 Thread Richard Guo
While working on a bug in expandRecordVariable() I noticed that in the
switch statement for case RTE_SUBQUERY we initialize struct ParseState
with {0} while for case RTE_CTE we do that with MemSet.  I understand
that there is nothing wrong with this, just cannot get away with the
inconsistency inside the same function (sorry for the nitpicking).

Do we have a preference for how to initialize structures?  From 9fd45870
it seems that we prefer {0}.  So here is a trivial patch doing that.
A rough scan shows that the MemSet calls in pg_stat_get_backend_subxact()
can also be replaced with {0}, so the patch includes that too.
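As a minimal illustration of the two styles (using a stand-in struct,
not the real ParseState):

```c
#include <string.h>

/* stand-in with two fields; the real structs have many more */
typedef struct DemoState
{
    int         depth;
    const char *sourcetext;
} DemoState;

void
demo(void)
{
    /* old style: zero the struct explicitly (MemSet is the macro form) */
    DemoState   s1;

    memset(&s1, 0, sizeof(s1));

    /* style preferred since 9fd45870: initialize at the declaration */
    DemoState   s2 = {0};

    (void) s1;
    (void) s2;
}
```

Both leave every field zeroed; the {0} form is shorter and does the
work at the point of declaration.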

Thanks
Richard


v1-0001-Replace-more-MemSet-calls-with-struct-initialization.patch
Description: Binary data


Re: Make --help output fit within 80 columns per line

2023-08-31 Thread torikoshia
On Mon, Aug 21, 2023 at 1:09 PM Masahiro Ikeda  wrote:

(1)
Why don't you add a test for this purpose? It could be overkill...
I thought the following function is the best place.


Added the test.

BTW, psql --help outputs the content of PGHOST, which caused a failure 
in the test:


```
  -h, --host=HOSTNAME      database server host or socket directory (default: "/var/folders/m7/9snkd5b54cx_b4lxkl9ljlccgn/T/LobrmSUf7t")

```

It may be overkill, but I added logic to remove the content of PGHOST.


On 2023-08-23 09:45, Masahiro Ikeda wrote:

Hi,

On 2023-08-22 22:57, torikoshia wrote:

On 2023-08-21 13:08, Masahiro Ikeda wrote:

(2)

Is there any reason that only src/bin commands are targeted?  I found
that we also need to fix vacuumlo with the above test.  I think it's
better to fix it because it's a contrib module.

$ vacuumlo --help | while IFS='' read line; do echo $((`echo $line | wc -m` - 1)) $line; done | sort -n -r | head -n 2
84   -n, --dry-run don't remove large objects, just show what would be done
74   -l, --limit=LIMIT commit after removing each LIMIT large objects


This is because I wasn't sure that making all --help outputs fit into 80
columns per line is the right thing to do, as described below:

| If this is the way to go, I'll do the same things for contrib commands.

If there are no objections, I'm going to make the other commands fit within
80 columns per line, including (4).


OK. Sorry, I missed the sentence above.
I'd like to hear what others think, too.


Since there have been no comments, the attached patch modifies vacuumlo as well.


(3)

Is deleting '/mnt/server' intended?  I thought it better to leave it as
is, since the archive_cleanup_command example uses the absolute path.

-"  pg_archivecleanup /mnt/server/archiverdir
00010010.0020.backup\n"));
+"  pg_archivecleanup archiverdir
00010010.0020.backup\n"));

I had expected that the --help texts themselves would not change, only
the line breaks.  But currently the above change breaks that.


Yes, it is intended as described in the thread.

https://www.postgresql.org/message-id/20230615.152036.1556630042388070221.horikyota.ntt%40gmail.com

| We could shorten it by removing the "/mnt/server" portion, but
I'm not sure if it's worth doing.

However, I feel it is acceptable to make an exception and exceed 80
characters for this line.


Thanks for sharing the thread. I understood.

It seems that the directory name should be consistent with the example
for archive_cleanup_command.  However, it does not seem appropriate to
make archive_cleanup_command use a relative path.

```
pg_archivecleanup --help

(snip)
e.g.
  archive_cleanup_command = 'pg_archivecleanup /mnt/server/archiverdir %r'

Or for use as a standalone archive cleaner:
e.g.
  pg_archivecleanup /mnt/server/archiverdir 000000010000000000000010.00000020.backup
```

IMHO, would simply breaking the line be acceptable?


Agreed.



(4)



I found that some binaries, for example ecpg, are not tested with
program_help_ok(). Is it better to add tests in the patch?


Added program_help_ok() to ecpg and pgbench.
pg_regress and zic are not tested with program_help_ok(), but they are
left as they are because they are not commands that users execute
directly.


--
Regards,

--
Atsushi Torikoshi
NTT DATA Group Corporation

From 4728cab3e39f7994b8b6dd41787cac90b5cb64f6 Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Thu, 31 Aug 2023 16:22:39 +0900
Subject: [PATCH v2] Make --help output fit within 80 columns per line

The --help output of some commands fits within 80 columns per line,
while that of others does not.
For consistency, and for 80-column terminals, this patch makes them all
fit within 80 columns per line.

Also this patch adds a test which checks if --help outputs fit within
80 columns per line.

---
 contrib/vacuumlo/vacuumlo.c   |  3 ++-
 src/bin/initdb/initdb.c   |  6 --
 src/bin/pg_amcheck/pg_amcheck.c   | 21 ---
 src/bin/pg_archivecleanup/pg_archivecleanup.c |  3 ++-
 src/bin/pg_basebackup/pg_receivewal.c | 12 +++
 src/bin/pg_basebackup/pg_recvlogical.c| 18 ++--
 src/bin/pg_checksums/pg_checksums.c   |  3 ++-
 src/bin/pg_dump/pg_dump.c |  3 ++-
 src/bin/pg_dump/pg_dumpall.c  |  3 ++-
 src/bin/pg_dump/pg_restore.c  |  6 --
 src/bin/pg_upgrade/option.c   |  9 +---
 src/bin/pgbench/pgbench.c |  6 --
 src/bin/pgbench/t/002_pgbench_no_server.pl|  1 +
 src/bin/psql/help.c   |  9 +---
 src/bin/scripts/createdb.c|  3 ++-
 src/bin/scripts/pg_isready.c  |  3 ++-
 src/bin/scripts/vacuumdb.c| 21 ---
 src/interfaces/ecpg/Makefile  |  1 +
 src/interfaces/ecpg/t/0

Re: Statistics Import and Export

2023-08-31 Thread Ashutosh Bapat
On Thu, Aug 31, 2023 at 12:17 PM Corey Huinker  wrote:
>
> While the primary purpose of the import function(s) is to reduce downtime
> during an upgrade, it is not hard to see that they could also be used to
> facilitate tuning and development operations, asking questions like "how might
> this query plan change if this table has 1000x rows in it?", without actually
> putting those rows into the table.

Thanks.  I think this may be used with postgres_fdw to import
statistics directly from the foreign server, whenever possible,
rather than fetching the rows and building them locally.  If it's known
that the stats on the foreign and local servers match for a foreign table,
we will be one step closer to accurately estimating the cost of a
foreign plan locally rather than through EXPLAIN.

-- 
Best Wishes,
Ashutosh Bapat