date:20220728

Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

2022-07-28 Thread Amit Kapila

On Fri, Jul 29, 2022 at 5:36 AM Masahiko Sawada  wrote:
>
> On Thu, Jul 28, 2022 at 8:57 PM Amit Kapila  wrote:
> >
> > On Thu, Jul 28, 2022 at 3:23 PM Amit Kapila  wrote:
> >
> > I have another comment on this patch:
> > SnapBuildPurgeOlderTxn()
> > {
> > ...
> > + if (surviving_xids > 0)
> > + memmove(builder->catchange.xip, &(builder->catchange.xip[off]),
> > + surviving_xids * sizeof(TransactionId))
> > ...
> >
> > For this code to hit, we must have a situation where one or more of
> > the xacts in this array must be still running. And, if that is true,
> > we would not have started from the restart point where the
> > corresponding snapshot (that contains the still running xacts) has
> > been serialized because we advance the restart point to not before the
> > oldest running xacts restart_decoding_lsn. This may not be easy to
> > understand so let me take an example to explain. Say we have two
> > transactions t1 and t2, and both have made catalog changes. We want a
> > situation where one of those gets purged and the other remains in
> > builder->catchange.xip array. I have tried variants of the below
> > sequence to see if I can get into the required situation but am not
> > able to make it.
> >
> > Session-1
> > Checkpoint -1;
> > T1
> > DDL
> >
> > Session-2
> > T2
> > DDL
> >
> > Session-3
> > Checkpoint-2;
> > pg_logical_slot_get_changes()
> >  -- Here when we serialize the snapshot corresponding to
> > CHECKPOINT-2's running_xact record, we will serialize both t1 and t2
> > as catalog-changing xacts.
> >
> > Session-1
> > T1
> > Commit;
> >
> > Checkpoint;
> > pg_logical_slot_get_changes()
> >  -- Here we will restore from Checkpoint-1's serialized snapshot and
> > won't be able to move restart_point to Checkpoint-2 because T2 is
> > still open.
> >
> > Now, as per my understanding, it is only possible to move
> > restart_point to Checkpoint-2 if T2 gets committed/rolled-back in
> > which case we will never have that in surviving_xids array after the
> > purge.
> >
> > It is possible I am missing something here. Do let me know your thoughts.
>
> Yeah, your description makes sense to me. I've also considered how to
> hit this path but I guess it is never hit. Thinking of it in another
> way, first of all, at least 2 catalog modifying transactions have to
> be running while writing a xl_running_xacts. The serialized snapshot
> that is written when we decode the first xl_running_xact has two
> transactions. Then, one of them is committed before the second
> xl_running_xacts. The second serialized snapshot has only one
> transaction. Then, the transaction is also committed after that. Now,
> in order to execute the path, we need to start decoding from the first
> serialized snapshot. However, if we start from there, we cannot decode
> the full contents of the transaction that was committed later.
>

I think then we should change this code in the master branch patch
with an additional comment on the lines of: "Either all the xacts got
purged or none. It is only possible to partially remove the xids from
this array if one or more of the xids are still running but not all.
That can happen if we start decoding from a point (LSN where the
snapshot state became consistent) where all the xacts in this were
running and then at least one of those got committed and a few are
still running. We will never start from such a point because we won't
move the slot's restart_lsn past the point where the oldest running
transaction's restart_decoding_lsn is."

I suggest keeping the back branch as it is w.r.t this change as if
this logic proves to be faulty it won't affect the stable branches. We
can always back-patch this small change if required.

-- 
With Regards,
Amit Kapila.

Re: [Commitfest 2022-07] Patch Triage: Needs Review, Part 1

2022-07-28 Thread Julien Rouhaud

Hi,

On Thu, Jul 28, 2022 at 02:28:23PM -0700, Jacob Champion wrote:
>
> = Stalled Patches, Need Help =
> [...]
> - Add extra statistics to explain for Nested Loop
>   https://commitfest.postgresql.org/38/2765/
>
> I think the author is hoping for help with testing and performance
> characterization.

As I mentioned in [1], this patch breaks the current assumption that
INSTRUMENT_ALL will lead to statement-level metrics that are generally useful.
According to the benchmark, the proposed patch would add a 1.5% overhead for
pg_stat_statements or any other similar extension that relies on INSTRUMENT_ALL
for no additional information, and I don't think it's acceptable.

I'm still not sure of what is the best way to fix that, but clearly something
has to be done, ideally without requiring every single pg_stat_statements-like
extension to be modified.

> The following are actively being worked and I expect to move them to
> next CF:
>
> - session variables, LET command

Yes please.

Re: Pluggable toaster

2022-07-28 Thread Nikita Malakhov

Hi hackers!

Aleksander, thanks for the remark, seems that we've missed recent change -
the pubication
test does not have the new column 'Toaster'. Will send a corrected patch
tomorrow. Also, thanks
for the patch name note, I've changed it as you suggested.
I'm on vacation, so I read emails not very often and answers take some
time, sorry.


On Tue, Jul 26, 2022 at 11:23 AM Aleksander Alekseev <
aleksan...@timescale.com> wrote:

> Hi Nikita,
>
> Thanks for an update!
>
> > 0002_toaster_interface_v10 contains TOAST API with Dummy toaster as an
> example (but I would,
> > as recommended, remove Dummy toaster and provide it as an extension),
> and default Toaster
> > was left as-is (reference implementation).
> >
> > 0003_toaster_default_v9 implements reference TOAST as Default Toaster
> via TOAST API,
> > so Heap AM calls Toast only via API, and does not have direct calls to
> Toast functionality.
> >
> > 0004_toaster_snapshot_v8 continues refactoring and has some important
> changes (added
> > into README.toastapi new part related TOAST API extensibility - the
> virtual functions table).
>
> This numbering is confusing. Please use a command like:
>
> ```
> git format-patch origin/master -v 42
> ```
>
> This will produce a patchset with a consistent naming like:
>
> ```
> v42-0001-foo-bar.patch
> v42-0002-baz-qux.patch
> ... etc ...
> ```
>
> Also cfbot [1] will know in which order to apply them.
>
> > GIT branch with this patch resides here:
> > https://github.com/postgrespro/postgres/tree/toasterapi_clean
>
> Unfortunately the three patches in question from this branch don't
> pass `make check`. Please update
> src/test/regress/expected/publication.out and make sure the patchset
> passes the rest of the tests at least on one platform before
> submitting.
>
> Personally I have a little set of scripts for this [2]. The following
> commands should pass:
>
> ```
> # quick check
> ./quick-build.sh && ./single-install.sh && make installcheck
>
> # full check
> ./full-build.sh && ./single-install.sh && make installcheck-world
> ```
>
> Finally, please update the commit messages. Each commit message should
> include a brief description (one line) , a detailed description (the
> body), and also the list of the authors, the reviewers and a link to
> the discussion. Please use [3] as a template.
>
> [1]: http://cfbot.cputube.org/
> [2]: https://github.com/afiskon/pgscripts/
> [3]:
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=784cedda0604ee4ac731fd0b00cd8b27e78c02d3
>
> --
> Best regards,
> Aleksander Alekseev
>


-- 
Regards,
Nikita Malakhov
Postgres Professional
https://postgrespro.ru/

Re: Handle infinite recursion in logical replication setup

2022-07-28 Thread vignesh C

On Fri, Jul 29, 2022 at 8:31 AM Peter Smith  wrote:
>
> Here are some comments for the patch v40-0001:
>
> ==
>
> 1. Commit message
>
> It might be better to always use 'copy_data = true' in favour of
> 'copy_data = on' just for consistency with all the docs and the error
> messages.
>
> ==

Modified

> 2. doc/src/sgml/ref/create_subscription.sgml
>
> @@ -386,6 +401,15 @@ CREATE SUBSCRIPTION  class="parameter">subscription_name can have non-existent publications.
>
>
> +  
> +   If the subscription is created with origin = NONE and
> +   copy_data = true, it will check if the publisher has
> +   subscribed to the same table from other publishers and, if so, throw an
> +   error to prevent possible non-local data from being copied. The user can
> +   override this check and continue with the copy operation by specifying
> +   copy_data = force.
> +  
>
> 2a.
> It is interesting that you changed the note to say origin = NONE.
> Personally, I prefer it written as you did, but I think maybe this
> change does not belong in this patch. The suggestion for changing from
> "none" to NONE is being discussed elsewhere in this thread and
> probably all such changes should be done together (if at all) as a
> separate patch. Until then I think this patch 0001 should just stay
> consistent with whatever is already pushed on HEAD.

Modified

> 2b.
> "possible no-local data". Maybe the terminology "local/non-local" is a
> hangover from back when the subscription parameter was called local
> instead of origin. I'm not sure if you want to change this or not, and
> anyway I didn't have any better suggestions – so this comment is just
> to bring it to your attention.
>
> ==

Modified

> 3. src/backend/commands/subscriptioncmds.c - DefGetCopyData
>
> + /*
> + * The set of strings accepted here should match up with the
> + * grammar's opt_boolean_or_string production.
> + */
> + if (pg_strcasecmp(sval, "false") == 0 ||
> + pg_strcasecmp(sval, "off") == 0)
> + return COPY_DATA_OFF;
> + if (pg_strcasecmp(sval, "true") == 0 ||
> + pg_strcasecmp(sval, "on") == 0)
> + return COPY_DATA_ON;
> + if (pg_strcasecmp(sval, "force") == 0)
> + return COPY_DATA_FORCE;
>
> I understand the intention of the comment, but it is not strictly
> correct to say "should match up" because "force" is a new value.
> Perhaps the comment should be as suggested below.
>
> SUGGESTION
> The set of strings accepted here must include all those accepted by
> the grammar's opt_boolean_or_string production.
>
> ~~

Modified

>
> 4. src/backend/commands/subscriptioncmds.c - check_pub_table_subscribed
>
> @@ -1781,6 +1858,122 @@ AlterSubscriptionOwner_oid(Oid subid, Oid newOwnerId)
>   table_close(rel, RowExclusiveLock);
>  }
>
> +/*
> + * Check and throw an error if the publisher has subscribed to the same table
> + * from some other publisher. This check is required only if "copy_data = on"
> + * and "origin = NONE" for CREATE SUBSCRIPTION and
> + * ALTER SUBSCRIPTION ... REFRESH statements to avoid the publisher from
> + * replicating data that has an origin.
> + *
> + * This check need not be performed on the tables that are already added as
> + * incremental sync for such tables will happen through WAL and the origin of
> + * the data can be identified from the WAL records.
> + *
> + * subrel_local_oids contains the list of relation oids that are already
> + * present on the subscriber.
> + */
> +static void
> +check_pub_table_subscribed(WalReceiverConn *wrconn, List *publications,
> +CopyData copydata, char *origin,
> +Oid *subrel_local_oids, int subrel_count)
>
> 4a.
> "copy_data = on" -> "copy_data = true" (for consistency with the docs
> and the error messages)

Modified

> 4b.
> The same NONE/none review comment from #2a applies here too. Probably
> it should be written as none for now unless/until *everything* changes
> to NONE.

Modified

> 4c.
> "to avoid the publisher from replicating data that has an origin." ->
> "to avoid replicating data that has an origin."

Modified

> 4d.
> + * This check need not be performed on the tables that are already added as
> + * incremental sync for such tables will happen through WAL and the origin of
> + * the data can be identified from the WAL records.
>
> SUGGESTION (maybe?)
> This check need not be performed on the tables that are already added
> because incremental sync for those tables will happen through WAL and
> the origin of the data can be identified from the WAL records.
>
> ==

Modified

>
> 5. src/test/subscription/t/030_origin.pl
>
> + "Refresh publication when the publisher has subscribed for the new table"
>
> SUGGESTION (Just to mention origin = none somehow. Maybe you can
> reword it better than this)
> Refresh publication when the publisher has subscribed for the new
> table, but the subscriber-side wants origin=none

Modified

Thanks for the comments, the attached v41 patch has the changes for the same.

Regards,
Vignesh
From de95301eed4dec2a76e40effb9c8b8132c2c7e5

Re: standby recovery fails (tablespace related) (tentative patch and discussion)

2022-07-28 Thread Kyotaro Horiguchi

At Fri, 29 Jul 2022 11:27:01 +1200, Thomas Munro  wrote 
in 
> Maybe it just needs a replication slot?  I see:
> 
> ERROR:  requested WAL segment 00010003 has already been 
> removed

Agreed, I see the same.  The same failure can be surely reproducible
by inserting wal-switch+checkpoint after taking backup [1].  And it is
fixed by the attached.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center


[1]:
--- a/src/test/recovery/t/033_replay_tsp_drops.pl
+++ b/src/test/recovery/t/033_replay_tsp_drops.pl
@@ -30,6 +30,13 @@ sub test_tablespace
my $backup_name = 'my_backup';
$node_primary->backup($backup_name);
 
+   $node_primary->psql(
+   'postgres',
+   qq[
+   CREATE TABLE t(); DROP TABLE t; SELECT pg_switch_wal();
+   CHECKPOINT;
+   ]);
+
my $node_standby = PostgreSQL::Test::Cluster->new("standby2_$strategy");
$node_standby->init_from_backup($node_primary, $backup_name,
has_streaming => 1);
diff --git a/src/test/recovery/t/033_replay_tsp_drops.pl b/src/test/recovery/t/033_replay_tsp_drops.pl
index 9b74cb09ac..0756ca6c87 100644
--- a/src/test/recovery/t/033_replay_tsp_drops.pl
+++ b/src/test/recovery/t/033_replay_tsp_drops.pl
@@ -20,6 +20,7 @@ sub test_tablespace
 	$node_primary->psql(
 		'postgres',
 		qq[
+			SELECT pg_create_physical_replication_slot('slot1', true);
 			SET allow_in_place_tablespaces=on;
 			CREATE TABLESPACE dropme_ts1 LOCATION '';
 			CREATE TABLESPACE dropme_ts2 LOCATION '';

Re: generic plans and "initial" pruning

2022-07-28 Thread Tom Lane

Amit Langote  writes:
> On Thu, Jul 28, 2022 at 1:27 AM Robert Haas  wrote:
>> I wonder whether it's really necessary to added the PartitionPruneInfo
>> objects to a list in PlannerInfo first and then roll them up into
>> PlannerGlobal later. I know we do that for range table entries, but
>> I've never quite understood why we do it that way instead of creating
>> a flat range table in PlannerGlobal from the start. And so by
>> extension I wonder whether this table couldn't be flat from the start
>> also.

> Tom may want to correct me but my understanding of why the planner
> waits till the end of planning to start populating the PlannerGlobal
> range table is that it is not until then that we know which subqueries
> will be scanned by the final plan tree, so also whose range table
> entries will be included in the range table passed to the executor.

It would not be profitable to flatten the range table before we've
done remove_useless_joins.  We'd end up with useless entries from
subqueries that ultimately aren't there.  We could perhaps do it
after we finish that phase, but I don't really see the point: it
wouldn't be better than what we do now, just the same work at a
different time.

regards, tom lane

Re: Functions 'is_publishable_class' and 'is_publishable_relation' should stay together.

2022-07-28 Thread Peter Smith

PSA v2 of this patch, modified as suggested.

--
Kind Regards,
Peter Smith.
Fujitsu Australia


v2-0001-Functions-is_publishable_class-and-is_publishable.patch
Description: Binary data

Re: generic plans and "initial" pruning

2022-07-28 Thread Amit Langote

On Thu, Jul 28, 2022 at 1:27 AM Robert Haas  wrote:
> On Tue, Jul 26, 2022 at 11:01 PM Amit Langote  wrote:
> > Needed to be rebased again, over 2d04277121f this time.

Thanks for looking.

> 0001 adds es_part_prune_result but does not use it, so maybe the
> introduction of that field should be deferred until it's needed for
> something.

Oops, looks like a mistake when breaking the patch.  Will move that bit to 0002.

> I wonder whether it's really necessary to added the PartitionPruneInfo
> objects to a list in PlannerInfo first and then roll them up into
> PlannerGlobal later. I know we do that for range table entries, but
> I've never quite understood why we do it that way instead of creating
> a flat range table in PlannerGlobal from the start. And so by
> extension I wonder whether this table couldn't be flat from the start
> also.

Tom may want to correct me but my understanding of why the planner
waits till the end of planning to start populating the PlannerGlobal
range table is that it is not until then that we know which subqueries
will be scanned by the final plan tree, so also whose range table
entries will be included in the range table passed to the executor.  I
can see that subquery pull-up causes a pulled-up subquery's range
table entries to be added into the parent's query's and all its nodes
changed using OffsetVarNodes() to refer to the new RT indexes.  But
for subqueries that are not pulled up, their subplans' nodes (present
in PlannerGlboal.subplans) would still refer to the original RT
indexes (per range table in the corresponding PlannerGlobal.subroot),
which must be fixed and the end of planning is the time to do so.  Or
maybe that could be done when build_subplan() creates a subplan and
adds it to PlannerGlobal.subplans, but for some reason it's not?

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

Re: Windows default locale vs initdb

2022-07-28 Thread Thomas Munro

On Fri, Jul 22, 2022 at 11:59 PM Juan José Santamaría Flecha
 wrote:
> TL;DR; What I want to show through this example is that Windows ACP is not 
> modified by setlocale(), it can only be done through the Windows registry and 
> only in recent releases.

Thanks, that was helpful, and so was that SO link.

So it sounds like I should forget about the v3-0002 patch, but the
v3-0001 and v3-0003 patches might have a future.  And it sounds like
we might need to investigate maybe defending ourselves against the ACP
being different than what we expect (ie not matching the database
encoding)?  Did I understand correctly that you're looking into that?

Re: Functions 'is_publishable_class' and 'is_publishable_relation' should stay together.

2022-07-28 Thread Amit Kapila

On Fri, Jul 29, 2022 at 8:26 AM Peter Smith  wrote:
>
> On Fri, Jul 29, 2022 at 11:55 AM houzj.f...@fujitsu.com
>  wrote:
> >
> > On Friday, July 29, 2022 7:17 AM Peter Smith  wrote:
> > > During a recent review, I happened to notice that in the file
> > > src/backend/catalog/pg_publication.c the two functions 
> > > 'is_publishable_class'
> > > and 'is_publishable_relation' used to be [1] adjacent in the source code. 
> > > This is
> > > also evident in 'is_publishable_relation' because the wording of the 
> > > function
> > > comment just refers to the prior function (e.g. "Another variant of this, 
> > > taking a
> > > Relation.") and also this just "wraps" the prior function.
> > >
> > > It seems that sometime last year another commit [2] inadvertently inserted
> > > another function ('filter_partitions') between those aforementioned, and 
> > > that
> > > means the "Another variant of this" comment doesn't make much sense
> > > anymore.
> >
> > Agreed.
> >
> > Personally, I think it would be better to modify the comments of
> > is_publishable_relation and directly mention the function name it refers to
> > which can prevent future code to break it again.
>
> I'd intended only to make the minimal changes necessary to set things
> right again, but your way is better.
>

Yeah, Hou-San's suggestion sounds better to me as well.

> >
> > Besides,
> >
> > /*
> >  * Returns if relation represented by oid and Form_pg_class entry
> >  * is publishable.
> >  *
> >  * Does same checks as the above,
> >
> > This comment was also intended to refer to the function
> > check_publication_add_relation(), but is invalid now because there is 
> > another
> > function check_publication_add_schema() inserted between them. We'd better 
> > fix
> > this as well.
>

+1. Here, I think it will be better to add the function name in the
comments and keep the current order as it is.


-- 
With Regards,
Amit Kapila.

Re: [Refactor]Avoid to handle FORCE_NOT_NULL/FORCE_NULL options when COPY TO

2022-07-28 Thread Richard Guo

On Thu, Jul 28, 2022 at 9:04 PM Zhang Mingli  wrote:

> Assertions added.
>

Can we also add assertions to make sure force_quote, force_notnull and
force_null are available only in CSV mode?

Thanks
Richard

Re: Handle infinite recursion in logical replication setup

2022-07-28 Thread Peter Smith

Here are some comments for the patch v40-0001:

==

1. Commit message

It might be better to always use 'copy_data = true' in favour of
'copy_data = on' just for consistency with all the docs and the error
messages.

==

2. doc/src/sgml/ref/create_subscription.sgml

@@ -386,6 +401,15 @@ CREATE SUBSCRIPTION subscription_name

+  
+   If the subscription is created with origin = NONE and
+   copy_data = true, it will check if the publisher has
+   subscribed to the same table from other publishers and, if so, throw an
+   error to prevent possible non-local data from being copied. The user can
+   override this check and continue with the copy operation by specifying
+   copy_data = force.
+  

2a.
It is interesting that you changed the note to say origin = NONE.
Personally, I prefer it written as you did, but I think maybe this
change does not belong in this patch. The suggestion for changing from
"none" to NONE is being discussed elsewhere in this thread and
probably all such changes should be done together (if at all) as a
separate patch. Until then I think this patch 0001 should just stay
consistent with whatever is already pushed on HEAD.

2b.
"possible no-local data". Maybe the terminology "local/non-local" is a
hangover from back when the subscription parameter was called local
instead of origin. I'm not sure if you want to change this or not, and
anyway I didn't have any better suggestions – so this comment is just
to bring it to your attention.

==

3. src/backend/commands/subscriptioncmds.c - DefGetCopyData

+ /*
+ * The set of strings accepted here should match up with the
+ * grammar's opt_boolean_or_string production.
+ */
+ if (pg_strcasecmp(sval, "false") == 0 ||
+ pg_strcasecmp(sval, "off") == 0)
+ return COPY_DATA_OFF;
+ if (pg_strcasecmp(sval, "true") == 0 ||
+ pg_strcasecmp(sval, "on") == 0)
+ return COPY_DATA_ON;
+ if (pg_strcasecmp(sval, "force") == 0)
+ return COPY_DATA_FORCE;

I understand the intention of the comment, but it is not strictly
correct to say "should match up" because "force" is a new value.
Perhaps the comment should be as suggested below.

SUGGESTION
The set of strings accepted here must include all those accepted by
the grammar's opt_boolean_or_string production.

~~~

4. src/backend/commands/subscriptioncmds.c - check_pub_table_subscribed

@@ -1781,6 +1858,122 @@ AlterSubscriptionOwner_oid(Oid subid, Oid newOwnerId)
  table_close(rel, RowExclusiveLock);
 }

+/*
+ * Check and throw an error if the publisher has subscribed to the same table
+ * from some other publisher. This check is required only if "copy_data = on"
+ * and "origin = NONE" for CREATE SUBSCRIPTION and
+ * ALTER SUBSCRIPTION ... REFRESH statements to avoid the publisher from
+ * replicating data that has an origin.
+ *
+ * This check need not be performed on the tables that are already added as
+ * incremental sync for such tables will happen through WAL and the origin of
+ * the data can be identified from the WAL records.
+ *
+ * subrel_local_oids contains the list of relation oids that are already
+ * present on the subscriber.
+ */
+static void
+check_pub_table_subscribed(WalReceiverConn *wrconn, List *publications,
+CopyData copydata, char *origin,
+Oid *subrel_local_oids, int subrel_count)

4a.
"copy_data = on" -> "copy_data = true" (for consistency with the docs
and the error messages)

4b.
The same NONE/none review comment from #2a applies here too. Probably
it should be written as none for now unless/until *everything* changes
to NONE.

4c.
"to avoid the publisher from replicating data that has an origin." ->
"to avoid replicating data that has an origin."

4d.
+ * This check need not be performed on the tables that are already added as
+ * incremental sync for such tables will happen through WAL and the origin of
+ * the data can be identified from the WAL records.

SUGGESTION (maybe?)
This check need not be performed on the tables that are already added
because incremental sync for those tables will happen through WAL and
the origin of the data can be identified from the WAL records.

==

5. src/test/subscription/t/030_origin.pl

+ "Refresh publication when the publisher has subscribed for the new table"

SUGGESTION (Just to mention origin = none somehow. Maybe you can
reword it better than this)
Refresh publication when the publisher has subscribed for the new
table, but the subscriber-side wants origin=none

--
Kind Regards,
Peter Smith.
Fujitsu Australia

Re: Functions 'is_publishable_class' and 'is_publishable_relation' should stay together.

2022-07-28 Thread Peter Smith

On Fri, Jul 29, 2022 at 11:55 AM houzj.f...@fujitsu.com
 wrote:
>
> On Friday, July 29, 2022 7:17 AM Peter Smith  wrote:
> > During a recent review, I happened to notice that in the file
> > src/backend/catalog/pg_publication.c the two functions 
> > 'is_publishable_class'
> > and 'is_publishable_relation' used to be [1] adjacent in the source code. 
> > This is
> > also evident in 'is_publishable_relation' because the wording of the 
> > function
> > comment just refers to the prior function (e.g. "Another variant of this, 
> > taking a
> > Relation.") and also this just "wraps" the prior function.
> >
> > It seems that sometime last year another commit [2] inadvertently inserted
> > another function ('filter_partitions') between those aforementioned, and 
> > that
> > means the "Another variant of this" comment doesn't make much sense
> > anymore.
>
> Agreed.
>
> Personally, I think it would be better to modify the comments of
> is_publishable_relation and directly mention the function name it refers to
> which can prevent future code to break it again.

I'd intended only to make the minimal changes necessary to set things
right again, but your way is better.

>
> Besides,
>
> /*
>  * Returns if relation represented by oid and Form_pg_class entry
>  * is publishable.
>  *
>  * Does same checks as the above,
>
> This comment was also intended to refer to the function
> check_publication_add_relation(), but is invalid now because there is another
> function check_publication_add_schema() inserted between them. We'd better fix
> this as well.

Thanks, I'll post another patch later to address that one too.

--
Kind Regards,
Peter Smith.
Fujitsu Australia

Re: Patch to address creation of PgStat* contexts with null parent context

2022-07-28 Thread Kyotaro Horiguchi

At Thu, 28 Jul 2022 22:03:13 +0800, Zhang Mingli  wrote 
in 
> Hi,
> 
> On Jul 28, 2022, 21:30 +0800, Reid Thompson , 
> wrote:
> > Attached is a patch to address this.

Good Catch!

> Codes seem good, my question is:
> 
> Do auto vacuum processes need CacheMemoryContext?

pgstat_report_vacuum requires it. Startup process doesn't seem to use
pgstats while recovery proceeding but requires the context only at
termination...

> Is it designed not to  create CacheMemoryContext in such processes?
> 
> If so, we’d better use TopMemoryContext in such processes.

That makes the memorycontext-tree structure unstable because
CacheMemoryContext can be created on-the-fly.

Honestly I don't like to call CreateCacheMemoryContext in the two
functions on-the-fly.  Since every process that calls
pgstat_initialize() necessarily calls pgstat_setup_memcxt() at latest
at process termination, I think we can create at least
CacheMemoryContext in pgstat_initialize(). Or couldn't we create the
all three contexts in the function, instead of calling
pgstat_setup_memcxt() on-the fly?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

RE: Functions 'is_publishable_class' and 'is_publishable_relation' should stay together.

2022-07-28 Thread houzj.f...@fujitsu.com

On Friday, July 29, 2022 7:17 AM Peter Smith  wrote:
> During a recent review, I happened to notice that in the file
> src/backend/catalog/pg_publication.c the two functions 'is_publishable_class'
> and 'is_publishable_relation' used to be [1] adjacent in the source code. 
> This is
> also evident in 'is_publishable_relation' because the wording of the function
> comment just refers to the prior function (e.g. "Another variant of this, 
> taking a
> Relation.") and also this just "wraps" the prior function.
> 
> It seems that sometime last year another commit [2] inadvertently inserted
> another function ('filter_partitions') between those aforementioned, and that
> means the "Another variant of this" comment doesn't make much sense
> anymore.

Agreed.

Personally, I think it would be better to modify the comments of
is_publishable_relation and directly mention the function name it refers to
which can prevent future code to break it again.

Besides,

/*
 * Returns if relation represented by oid and Form_pg_class entry
 * is publishable.
 *
 * Does same checks as the above,

This comment was also intended to refer to the function
check_publication_add_relation(), but is invalid now because there is another
function check_publication_add_schema() inserted between them. We'd better fix
this as well.

Best regards,
Hou zj

Re: Cirrus CI (Windows help wanted)

2022-07-28 Thread Justin Pryzby

Hi,

On Tue, Jan 12, 2021 at 09:04:51AM -0500, Andrew Dunstan wrote:
> On 1/5/21 11:19 PM, Thomas Munro wrote:
> >
> > It seems we can make our own, either on-the-fly with caching, or
> > hosted somewhere, like this:
> >
> > https://cirrus-ci.org/guide/docker-builder-vm/#dockerfile-as-a-ci-environment
> 
> OK, I got this working.

I tried this to use the "dockerfile-as-a-ci-environment" process, to see if it
would allow cirrus to start windows builds without the 4 minute delay that we
currently have.

But it failed like:

https://cirrus-ci.com/task/5622728425209856?logs=push#L16
[00:09:53.675] unauthorized: You don't have the needed permissions to perform 
this operation, and you may have invalid credentials. To authenticate your 
request, follow the steps in:
https://cloud.google.com/container-registry/docs/advanced-authentication

Does this require tying my github account to a google account ?
Or paying cirrusci ?  Or ??

-- 
Justin

Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

2022-07-28 Thread Masahiko Sawada

On Thu, Jul 28, 2022 at 8:57 PM Amit Kapila  wrote:
>
> On Thu, Jul 28, 2022 at 3:23 PM Amit Kapila  wrote:
> >
> > On Tue, Jul 26, 2022 at 1:22 PM Masahiko Sawada  
> > wrote:
> > >
> > > Okay, I've attached an updated patch that does the above idea. Could
> > > you please do the performance tests again to see if the idea can help
> > > reduce the overhead, Shi yu?
> > >
> >
> > While reviewing the patch for HEAD, I have changed a few comments. See
> > attached, if you agree with these changes then include them in the
> > next version.
> >
>
> I have another comment on this patch:
> SnapBuildPurgeOlderTxn()
> {
> ...
> + if (surviving_xids > 0)
> + memmove(builder->catchange.xip, &(builder->catchange.xip[off]),
> + surviving_xids * sizeof(TransactionId))
> ...
>
> For this code to hit, we must have a situation where one or more of
> the xacts in this array must be still running. And, if that is true,
> we would not have started from the restart point where the
> corresponding snapshot (that contains the still running xacts) has
> been serialized because we advance the restart point to not before the
> oldest running xacts restart_decoding_lsn. This may not be easy to
> understand so let me take an example to explain. Say we have two
> transactions t1 and t2, and both have made catalog changes. We want a
> situation where one of those gets purged and the other remains in
> builder->catchange.xip array. I have tried variants of the below
> sequence to see if I can get into the required situation but am not
> able to make it.
>
> Session-1
> Checkpoint -1;
> T1
> DDL
>
> Session-2
> T2
> DDL
>
> Session-3
> Checkpoint-2;
> pg_logical_slot_get_changes()
>  -- Here when we serialize the snapshot corresponding to
> CHECKPOINT-2's running_xact record, we will serialize both t1 and t2
> as catalog-changing xacts.
>
> Session-1
> T1
> Commit;
>
> Checkpoint;
> pg_logical_slot_get_changes()
>  -- Here we will restore from Checkpoint-1's serialized snapshot and
> won't be able to move restart_point to Checkpoint-2 because T2 is
> still open.
>
> Now, as per my understanding, it is only possible to move
> restart_point to Checkpoint-2 if T2 gets committed/rolled-back in
> which case we will never have that in surviving_xids array after the
> purge.
>
> It is possible I am missing something here. Do let me know your thoughts.

Yeah, your description makes sense to me. I've also considered how to
hit this path but I guess it is never hit. Thinking of it in another
way, first of all, at least 2 catalog modifying transactions have to
be running while writing a xl_running_xacts. The serialized snapshot
that is written when we decode the first xl_running_xact has two
transactions. Then, one of them is committed before the second
xl_running_xacts. The second serialized snapshot has only one
transaction. Then, the transaction is also committed after that. Now,
in order to execute the path, we need to start decoding from the first
serialized snapshot. However, if we start from there, we cannot decode
the full contents of the transaction that was committed later.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/

[Commitfest 2022-07] Patch Triage: Needs Review, Part 2

2022-07-28 Thread Jacob Champion

Hello,

Part 2 should include entries four commitfests and older. (For the rest,
it's probably too early to call something "stalled", so I don't plan to
do any more triage there.) Patch authors CC'd.

= Stalled Patches, Recommend Return =

I plan to return these with a note saying "needs more interest".

- Extended statistics in EXPLAIN
  https://commitfest.postgresql.org/38/3050/

  There's interest, but it seemed controversial. And a reviewer
  attempted to revive this in January, but there hasn't been new
  engagement or response from the original people involved in a year.

- Map WAL segment files on PMEM as WAL buffers
  https://commitfest.postgresql.org/38/3181/

  Stalled out; last review was in January and it needs a rebase.

- Support pg_ident mapping for LDAP
  https://commitfest.postgresql.org/38/3314/

  This one's mine; I think it's clear that there's not enough interest
  in this idea yet and I think I'd rather put effort into SASL, which
  would ideally do the same thing natively.

- Improve logging when using Huge Pages
  https://commitfest.postgresql.org/38/3310/

  There was interest last year but no mails this year, so this probably
  needs some buy-in first.

- functions to compute size of schemas/AMs (and maybe \dn++ and \dA++)
  https://commitfest.postgresql.org/38/3256/

  Been rebasing without review since September 2021. Seems like there's
  an idea in here that people want, though. Any sponsors for a future
  CF?

- Upgrade pgcrypto to crypt_blowfish 1.3
  https://commitfest.postgresql.org/38/3338/

  No conversation on this since October 2021. Like above, seems like a
  reasonable feature, so maybe someone can sponsor it and quickly get it
  resurrected in a future CF?

= Stalled Patches, Need Help =

I plan to move these forward unless someone says otherwise, but they
look stuck to me and need assistance.

- pgbench: add multiconnect support
  https://commitfest.postgresql.org/38/3227/

  Seems to be interest. Not much review, though.

- pg_stats and range statistics
  https://commitfest.postgresql.org/38/3184/

  There was immediate agreement that this feature was desirable, and
  then the reviews dried up. Anyone want to bump this?

- Asymmetric partition-wise JOIN
  https://commitfest.postgresql.org/38/3099/

  There was a rebase in January by a reviewer, so there's definite
  interest, but work hasn't progressed in a while. I've marked Waiting
  on Author in the meantime.

- Logging plan of the currently running query
  https://commitfest.postgresql.org/38/3142/

  Last review in Febrary and currently in a rebase loop.

- schema change not getting invalidated, both renamed table and new
  table data were getting replicated
  https://commitfest.postgresql.org/38/3262/

  This looks like a bug fix that should not be closed out, but it's been
  in a rebase loop without review for... a year? Any takers? Should we
  make an open issue?

- pgbench: using prepared BEGIN statement in a pipeline could cause an
  error
  https://commitfest.postgresql.org/38/3260/

  A bug fix, but maybe the approach taken for the fix is controversial?

- Atomic rename feature for Windows
  https://commitfest.postgresql.org/38/3347/

  I think this got derailed by a committer conversation about platform
  deprecation? I have no idea where the patch stands after that
  exchange; can someone recap?

= Active Patches =

These will be moved ahead:

- Lazy JIT IR code generation to increase JIT speed with partitions
- Logical replication failure "ERROR: could not map filenode
  "base/13237/442428" to relation OID" with catalog modifying txns
- Add proper planner support for ORDER BY / DISTINCT aggregates
- Fix ExecRTCheckPerms() inefficiency with many prunable partitions
- Using each rel as both outer and inner for anti-joins
- Postgres picks suboptimal index after building extended statistics
- Cache tuple routing info during bulk loads into partitioned tables
- postgres_fdw: commit remote (sub)transactions in parallel during
  pre-commit
- add checkpoint stats of snapshot and mapping files of pg_logical dir
- Allows database-specific role memberships

Thanks,
--Jacob

Re: standby recovery fails (tablespace related) (tentative patch and discussion)

2022-07-28 Thread Thomas Munro

On Fri, Jul 29, 2022 at 9:57 AM Tom Lane  wrote:
> Matthias van de Meent  writes:
> > I'd like to bring to your attention that the test that was introduced
> > with 9e4f914b seem to be flaky in FreeBSD 13 in the CFBot builds: it
> > sometimes times out while waiting for the secondary to catch up. Or,
> > at least I think it does, and I'm not too familiar with TAP failure
> > outputs: it returns with error code 29 and logs that I'd expect when
> > the timeout is reached.
>
> It's also failing in the buildfarm, eg
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2022-07-28%2020%3A57%3A50
>
> Looks like only conchuela so far, reinforcing the idea that we're
> only seeing it on FreeBSD.  I'd tentatively bet on a timing problem
> that requires some FreeBSD scheduling quirk to manifest; we've seen
> such quirks before.

Maybe it just needs a replication slot?  I see:

ERROR:  requested WAL segment 00010003 has already been removed

Functions 'is_publishable_class' and 'is_publishable_relation' should stay together.

2022-07-28 Thread Peter Smith

During a recent review, I happened to notice that in the file
src/backend/catalog/pg_publication.c the two functions
'is_publishable_class' and 'is_publishable_relation' used to be [1]
adjacent in the source code. This is also evident in
'is_publishable_relation' because the wording of the function comment
just refers to the prior function (e.g. "Another variant of this,
taking a Relation.") and also this just "wraps" the prior function.

It seems that sometime last year another commit [2] inadvertently
inserted another function ('filter_partitions') between those
aforementioned, and that means the "Another variant of this" comment
doesn't make much sense anymore.

PSA a patch just to put those original 2 functions back together
again. No code is "changed" - only moved.

--

[1] 
https://github.com/postgres/postgres/blame/f0b051e322d530a340e62f2ae16d99acdbcb3d05/src/backend/catalog/pg_publication.c
[2] 
https://github.com/postgres/postgres/commit/5a2832465fd8984d089e8c44c094e6900d987fcd#diff-1ecc273c7808aba21749ea2718482c153cd6c4dc9d90c69124f3a7c5963b2b4a

Kind Regards,
Peter Smith.
Fujitsu Australia


v1-0001-Functions-is_publishable_class-and-is_publishable.patch
Description: Binary data

Re: Cygwin cleanup

2022-07-28 Thread Thomas Munro

On Wed, Jul 27, 2022 at 5:09 AM Robert Haas  wrote:
> On Tue, Jul 26, 2022 at 7:40 AM Tom Lane  wrote:
> > I think maybe we should re-open the discussion.  I've certainly
> > reached the stage of fed-up-ness.  That platform seems seriously
> > broken, upstream is making no progress on fixing it, and there
> > doesn't seem to be any real-world use-case.  The only positive
> > argument for it is that Readline doesn't work in the other
> > Windows builds --- but we've apparently not rechecked that
> > statement in eighteen years, so maybe things are better now.
> >
> > If we could just continue to blithely ignore lorikeet's failures,
> > I wouldn't mind so much; but doing any significant amount of new
> > code development work for the platform seems like throwing away
> > developer time.
>
> I agree with that. All things being equal, I like the idea of
> supporting a bunch of different platforms, and Cygwin doesn't really
> look that dead. It has recent releases. But if blocking signals
> doesn't actually work on that platform, making PostgreSQL work
> reliably there seems really difficult.

It's one thing to drop old dead Unixes but I don't think anyone would
enjoy dropping support for an active open source project.  The best
outcome would be for people who have an interest in seeing PostgreSQL
work correctly on Cygwin to help get the bug fixed.  Here are the
threads I'm aware of:

https://cygwin.com/pipermail/cygwin/2017-August/234001.html
https://cygwin.com/pipermail/cygwin/2017-August/234097.html

I wonder if these problems would go away as a nice incidental
side-effect if we used latches for postmaster wakeups.  I don't
know... maybe, if the problem is just with the postmaster's pattern of
blocking/unblocking?  Maybe backend startup is simple enough that it
doesn't hit the bug?  From a quick glance, I think the assertion
failures that occur in regular backends can possibly be blamed on the
postmaster getting confused about its children due to unexpected
handler re-entry.

Re: [Patch] Fix bounds check in trim_array()

2022-07-28 Thread Nathan Bossart

On Mon, Jul 25, 2022 at 04:40:51PM +0200, Martin Kalcher wrote:
> +SELECT trim_array(ARRAY[]::int[], 1); -- fail
> +ERROR:  number of elements to trim must be between 0 and 0

Can we improve the error message?  Maybe it should look something like

ERROR:  number of elements to trim must be 0

for this case.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: Mingw task for Cirrus CI

2022-07-28 Thread Justin Pryzby

I think the "only_if" should allow separately running one but not both of the
windows instances, like:

+  only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || 
$CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*mingw64'

I'm not sure, but maybe this task should only run "by request", and omit the
first condition:

+  only_if: $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*mingw64'

I think it should include something like

+  setup_additional_packages_script: |
+REM C:\msys64\usr\bin\pacman.exe -S --noconfirm ...

Let's see what others think about those.

Do you know if this handles logging of crash dumps ?

With tweaks to prep_buildtree, and with ./configure --cache-file, that step
goes down to ~36sec (unless configure needs to be re-run).

I also looked into using busybox to avoid running separate processes for each
"ln", but I think 36sec is good enough.

At one point, I tried setting "CIRRUS_SHELL: bash" to avoid writing "bash -c"
over and over, but never got it working.

-- 
Justin
>From ec28354e1e526ddab3a8df90540f9aa127c40193 Mon Sep 17 00:00:00 2001
From: Melih Mutlu 
Date: Mon, 21 Feb 2022 14:46:05 +0300
Subject: [PATCH 1/2] Added Windows with MinGW environment in Cirrus CI

---
 .cirrus.yml | 78 -
 1 file changed, 65 insertions(+), 13 deletions(-)

diff --git a/.cirrus.yml b/.cirrus.yml
index f23d6cae552..1a06cdcaadb 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -338,13 +338,29 @@ task:
 cores_script: src/tools/ci/cores_backtrace.sh macos "${HOME}/cores"
 
 
+WINDOWS_ENVIRONMENT_BASE: &WINDOWS_ENVIRONMENT_BASE
+env:
+  # Half the allowed per-user CPU cores
+  CPUS: 4
+  # The default working dir is in a directory msbuild complains about
+  CIRRUS_WORKING_DIR: "c:/cirrus"
+
+  # Avoids port conflicts between concurrent tap test runs
+  PG_TEST_USE_UNIX_SOCKETS: 1
+
+only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*windows.*'
+
+sysinfo_script: |
+  chcp
+  systeminfo
+  powershell -Command get-psdrive -psprovider filesystem
+  set
+
 task:
+  << : *WINDOWS_ENVIRONMENT_BASE
   name: Windows - Server 2019, VS 2019
 
   env:
-# Half the allowed per-user CPU cores
-CPUS: 4
-
 # Our windows infrastructure doesn't have test concurrency above the level
 # of a single vcregress test target. Due to that, it's useful to run prove
 # with multiple jobs. For the other tasks it isn't, because two sources
@@ -354,15 +370,11 @@ task:
 # likely can be improved upon further.
 PROVE_FLAGS: -j10 --timer
 
-# The default cirrus working dir is in a directory msbuild complains about
-CIRRUS_WORKING_DIR: "c:/cirrus"
 # Avoid re-installing over and over
 NO_TEMP_INSTALL: 1
 # git's tar doesn't deal with drive letters, see
 # https://postgr.es/m/b6782dc3-a7b0-ed56-175f-f8f54cb08d67%40dunslane.net
 TAR: "c:/windows/system32/tar.exe"
-# Avoids port conflicts between concurrent tap test runs
-PG_TEST_USE_UNIX_SOCKETS: 1
 PG_REGRESS_SOCK_DIR: "c:/cirrus/"
 # -m enables parallelism
 # verbosity:minimal + Summary reduce verbosity, while keeping a summary of
@@ -393,12 +405,6 @@ task:
 cpu: $CPUS
 memory: 4G
 
-  sysinfo_script: |
-chcp
-systeminfo
-powershell -Command get-psdrive -psprovider filesystem
-set
-
   setup_additional_packages_script: |
 REM choco install -y --no-progress ...
 
@@ -456,6 +462,52 @@ task:
   path: "crashlog-*.txt"
   type: text/plain
 
+task:
+  << : *WINDOWS_ENVIRONMENT_BASE
+  name: Windows - Server 2019, MinGW64
+  windows_container:
+image: $CONTAINER_REPO/windows_ci_mingw64:latest
+cpu: $CPUS
+memory: 4G
+  env:
+CCACHE_DIR: C:/msys64/ccache
+BUILD_DIR: "%CIRRUS_WORKING_DIR%/build"
+
+  ccache_cache:
+folder: ${CCACHE_DIR}
+
+  mingw_info_script:
+- C:\msys64\usr\bin\dash.exe -lc "where gcc"
+- C:\msys64\usr\bin\dash.exe -lc "gcc --version"
+- C:\msys64\usr\bin\dash.exe -lc "where perl"
+- C:\msys64\usr\bin\dash.exe -lc "perl --version"
+
+  configure_script:
+- C:\msys64\usr\bin\dash.exe -lc "mkdir %BUILD_DIR% &&
+  cd %BUILD_DIR% &&
+  %CIRRUS_WORKING_DIR%/configure
+--enable-cassert
+--enable-tap-tests
+--with-icu
+--with-libxml
+--with-libxslt
+--with-lz4
+--enable-debug
+CC='ccache gcc'
+CXX='ccache g++'
+CFLAGS='-Og -ggdb -pipe'
+CXXFLAGS='-Og -ggdb'"
+
+  build_script:
+C:\msys64\usr\bin\dash.exe -lc "cd %BUILD_DIR% && make -s world-bin -j${CPUS}"
+
+  upload_caches: ccache
+
+  tests_script:
+  - set "NoDefaultCurrentDirectoryInExePath=0"
+  - C:\msys64\usr\bin\dash.exe -lc "cd %BUILD_DIR% && make -s ${CHECK} ${CHECKFLAGS} -j${CPUS} TMPDIR=%BUILD_DIR%/tmp_install"
+
+  on_failure: *on_failure
 
 task:
   name: CompilerWarnings
-- 
2.17.1

>From c8122c49502e9f95df392

Re: Cygwin cleanup

2022-07-28 Thread Thomas Munro

On Fri, Jul 29, 2022 at 10:23 AM Justin Pryzby  wrote:
> On Fri, Jul 29, 2022 at 10:04:04AM +1200, Thomas Munro wrote:
> > [04:33:55.234] Starting cygwin install, version 2.918
>
> Hm, I think that's the version of "cygwinsetup" but not cygwin..
> It also says this: [13:16:36.014] Cygwin v3.3.4.20220408 [Approved]

Oops.  Ok so we're testing the very latest then, and it definitely
still has the bug as we thought.

> It occurred to me today that if cfbot preserved the original patch series, and
> commit messages, that would allow patch authors to write things like
> "ci-os-only: docs" for a doc only patch.  I've never gotten cirrus'
> changesOnly() stuff to work...

Maybe it's time to switch to "git am -3 ..." and reject patches that
don't apply that way.

> > Looks like we can expect to be able to build and test fast on Windows
> > soonish, though,
>
> Do you mean with meson ?

Yeah.  Also there are some other things we can do to speed up testing
on Windows (and elsewhere), like not running every test query with new
psql + backend process pair, which takes at least a few hundred ms and
sometimes up to several seconds on this platform; I have some patches
I need to finish...

> > so maybe one day we'd just turn Cygwin and MSYS on?
>
> I didn't understand this ?

I mean, if, some sunny day, we can compile and test on Windows at
non-glacial speeds, then it would become possible to contemplate
having cfbot run these tasks for every patch every time.

Re: fix stats_fetch_consistency value in postgresql.conf.sample

2022-07-28 Thread Justin Pryzby

Note that this can currently exposes internal elog() errors to users:

postgres=# select pg_normalize_config_value('log_min_messages','abc');
WARNING:  invalid value for parameter "log_min_messages": "abc"
HINT:  Available values: debug5, debug4, debug3, debug2, debug1, info, notice, 
warning, error, log, fatal, panic.
ERROR:  could not find enum option 0 for log_min_messages

postgres=# \errverbose
ERROR:  XX000: could not find enum option 0 for log_min_messages
LOCATION:  config_enum_lookup_by_value, guc.c:7284

Re: predefined role(s) for VACUUM and ANALYZE

2022-07-28 Thread Nathan Bossart

On Tue, Jul 26, 2022 at 01:54:38PM -0400, Robert Haas wrote:
> I think we're down to 0 remaining now, so it'd be hard to justify
> consuming 2 of 0 remaining bits.

AFAICT there are 2 remaining.  N_ACL_RIGHTS is only 14.

> However, I maintain that the solution
> to this is either (1) change the aclitem representation to get another
> 32 bits or (2) invent a different system for less-commonly used
> permission bits. Checking permissions for SELECT or UPDATE has to be
> really fast, because most queries will need to do that sort of thing.
> If we represented VACUUM or ANALYZE in some other way in the catalogs
> that was more scalable but less efficient, it wouldn't be a big deal
> (although there's the issue of code duplication to consider).

Perhaps we could add something like a relacl_ext column to affected
catalogs with many more than 32 privilege bits.  However, if we actually do
have 2 bits remaining, we wouldn't need to do that work now unless someone
else uses them first.  That being said, it's certainly worth thinking about
the future of this stuff.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: Cygwin cleanup

2022-07-28 Thread Justin Pryzby

On Fri, Jul 29, 2022 at 10:04:04AM +1200, Thomas Munro wrote:
> Thanks for working on this!
> 
> Huh, that Cygwin being shipped by Choco is quite old, older than
> lorikeet's, but not old enough to not have the bug:
> 
> [04:33:55.234] Starting cygwin install, version 2.918

Hm, I think that's the version of "cygwinsetup" but not cygwin..
It also says this: [13:16:36.014] Cygwin v3.3.4.20220408 [Approved]

> I wonder if you can tell Choco
> to install an ancient version, but even if that's possible you'd be
> dealing with other stupid problems and bugs.

Yes: choco install -y --no-progress --version 4.6.1 ccache

> > > XXX This should use a canned Docker image with all the right packages
> > > installed
> >
> > Has anyone tried using non-canned images ?  It sounds like this could reduce
> > the 4min startup time for windows.
> >
> > https://cirrus-ci.org/guide/docker-builder-vm/#dockerfile-as-a-ci-environment
> 
> Yeah, I had that working once.  Not sure what the pros and cons would be for 
> us.

I think it could be a lot faster to start, since cirrus caches the generated
docker image locally.  Rather than (I gather) pulling the image every time.

> > > XXX We would never want this to run by default in CI, but it'd be nice
> > > to be able to ask for it with ci-os-only!  (See commented out line)
> > >  only_if: $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*cygwin.*'
> >
> > Doesn't this already do what's needed?
> > As long as it doesn't also check: CHANGE_MESSAGE !~ 'ci-os-only',
> > the task will runs only on request.
> 
> Yeah I was just trying to say that I was sharing the script in a way
> that always runs, but for commit we'd want that.  This is all far too
> slow for cfbot to have to deal with on every build.

It occurred to me today that if cfbot preserved the original patch series, and
commit messages, that would allow patch authors to write things like
"ci-os-only: docs" for a doc only patch.  I've never gotten cirrus'
changesOnly() stuff to work...

> Looks like we can expect to be able to build and test fast on Windows
> soonish, though,

Do you mean with meson ?

> so maybe one day we'd just turn Cygwin and MSYS on?

I didn't understand this ?

-- 
Justin

Re: Cygwin cleanup

2022-07-28 Thread Thomas Munro

On Wed, Jul 27, 2022 at 6:44 PM Justin Pryzby  wrote:
> On Tue, Jul 26, 2022 at 04:24:25PM +1200, Thomas Munro wrote:
> > 3.  You can't really run PostgreSQL on Cygwin for real, because its
> > implementation of signals does not have reliable signal masking, so
> > unsubtle and probably also subtle breakage occurs.  That was reported
> > upstream by Noah years ago, but they aren't working on a fix.
> > lorikeet shows random failures, and presumably any CI system will do
> > the same...
>
> Reference: 
> https://www.postgresql.org/message-id/20170321034703.GB2097809%40tornado.leadboat.com
>
> On my 2nd try:
>
> https://cirrus-ci.com/task/5311911574110208
> TRAP: FailedAssertion("mq->mq_sender == NULL", File: "shm_mq.c", Line: 230, 
> PID: 16370)
> 2022-07-26 06:32:35.525 PDT [15538][postmaster] LOG:  background worker 
> "parallel worker" (PID 16370) was terminated by signal 6: Aborted

Thanks for working on this!

Huh, that Cygwin being shipped by Choco is quite old, older than
lorikeet's, but not old enough to not have the bug:

[04:33:55.234] Starting cygwin install, version 2.918

Based on clues in Noah's emails in the archives, I think versions from
maybe somewhere around 2015 didn't have the bug, and then the bug
appeared, and AFAIK it's still here.  I wonder if you can tell Choco
to install an ancient version, but even if that's possible you'd be
dealing with other stupid problems and bugs.

> > XXX Doesn't get all the way through yet...
>
> Mainly because getopt was causing all tap tests to fail.
> I tried to fix that in configure, but ended up changing the callers.
>
> This is getting close, but I don't think has actually managed to pass all 
> tests
> yet..  https://cirrus-ci.com/task/5274721116749824

Woo.

> > 4.  When building with Cygwin GCC 11.3 you get a bunch of warnings
> > that don't show up on other platforms, seemingly indicating that it
> > interprets -Wimplicit-fallthrough=3 differently.  Huh?
>
> Evidently due to the same getopt issues.

Ahh, nice detective work.

> > XXX This should use a canned Docker image with all the right packages
> > installed
>
> Has anyone tried using non-canned images ?  It sounds like this could reduce
> the 4min startup time for windows.
>
> https://cirrus-ci.org/guide/docker-builder-vm/#dockerfile-as-a-ci-environment

Yeah, I had that working once.  Not sure what the pros and cons would be for us.

> > XXX configure is s slooow, can we cache it?!  Compiling is also
> > insanely slow, but ccache gets it down to a couple of minutes if you're
> > lucky
>
> One reason compiling was slow is because you ended up with -O2.

Ah, right.

> You can cache configure as long as you're willing to re-run it whenever 
> options
> were changed.  That also applies to the existing headerscheck.
>
> > XXX I don't know how to put variables like BUILD_JOBS into the scripts
>
> WDYM ?  If it's outside of bash and in windows shell it's like %var%, right ?
> https://cirrus-ci.org/guide/writing-tasks/#environment-variables

Right.  I should have taken the clue from the %cd% (I got a few ideas
about how to do this from libarchive's CI scripting[1]).

> I just noticed that cirrus is misbehaving: if there's a variable called CI
> (which there is), then it expands $CI_FOO like ${CI}_FOO rather than 
> ${CI_FOO}.
> I've also seen weirdness when variable names or operators appear in the commit
> message...
>
> > XXX Needs some --with-X options
>
> Done

Neat.

> > XXX We would never want this to run by default in CI, but it'd be nice
> > to be able to ask for it with ci-os-only!  (See commented out line)
> >  only_if: $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*cygwin.*'
>
> Doesn't this already do what's needed?
> As long as it doesn't also check: CHANGE_MESSAGE !~ 'ci-os-only',
> the task will runs only on request.

Yeah I was just trying to say that I was sharing the script in a way
that always runs, but for commit we'd want that.  This is all far too
slow for cfbot to have to deal with on every build.  Looks like we can
expect to be able to build and test fast on Windows soonish, though,
so maybe one day we'd just turn Cygwin and MSYS on?

[1] 
https://github.com/libarchive/libarchive/blob/master/build/ci/cirrus_ci/ci.cmd

Re: standby recovery fails (tablespace related) (tentative patch and discussion)

2022-07-28 Thread Tom Lane

Matthias van de Meent  writes:
> I'd like to bring to your attention that the test that was introduced
> with 9e4f914b seem to be flaky in FreeBSD 13 in the CFBot builds: it
> sometimes times out while waiting for the secondary to catch up. Or,
> at least I think it does, and I'm not too familiar with TAP failure
> outputs: it returns with error code 29 and logs that I'd expect when
> the timeout is reached.

It's also failing in the buildfarm, eg

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=conchuela&dt=2022-07-28%2020%3A57%3A50

Looks like only conchuela so far, reinforcing the idea that we're
only seeing it on FreeBSD.  I'd tentatively bet on a timing problem
that requires some FreeBSD scheduling quirk to manifest; we've seen
such quirks before.

regards, tom lane

Re: [Commitfest 2022-07] Patch Triage: Needs Review, Part 1

2022-07-28 Thread Tom Lane

Jacob Champion  writes:
> Next up is the large list of Needs Review. This part 1 should include
> entries as old or older than seven commitfests running.

I'm just commenting on a couple that I've been involved with.

> = Stalled Patches, Recommend Return =

> - Fix up partitionwise join on how equi-join conditions between the
> partition keys are identified
>   https://commitfest.postgresql.org/38/2266/
> It looks like this one was Returned with Feedback but did not actually
> have feedback, which may have caused confusion. (Solid motivation for a
> new close status.) I don't think there's been any review since 2020.

Yeah, there was an earlier discussion of this same patch in some
previous CF-closing thread, IIRC, but I can't find that right now.
I think it basically is stuck behind the outer-join-variables work
I'm pursuing at https://commitfest.postgresql.org/39/3755/ ... and
when/if that lands, the present patch probably won't be anywhere
near what we want anyway.  +1 for RWF.

> = Stalled Patches, Need Help =

> - Fix behavior of geo_ops when NaN is involved
>   https://commitfest.postgresql.org/38/2710/

> Stuck in a half-committed state, which is tricky. Could maybe use a
> reframing or recap (or a new thread?).

We fixed a couple of easy cases but then realized that the hard cases
are hard.  I don't have much faith that the current patch is going to
lead to anything committable, and it doesn't look like anyone has the
appetite to put in a lot of work on the topic.  I'd vote for RWF.

regards, tom lane

Re: optimize lookups in snapshot [sub]xip arrays

2022-07-28 Thread Nathan Bossart

On Tue, Jul 26, 2022 at 11:19:06AM -0700, Andres Freund wrote:
> On 2022-07-25 12:04:19 -0700, Nathan Bossart wrote:
>> From the discussion thus far, it seems there is interest in optimizing
>> [sub]xip lookups, so I'd like to spend some time moving it forward.  I
>> think the biggest open question is which approach to take.  Both the SIMD
>> and hash table approaches seem viable, but I think I prefer the SIMD
>> approach at the moment (see the last paragraph of quoted text for the
>> reasons).  What do folks think?
> 
> Agreed on all points.

Great!  Here is a new patch.  A couple notes:

 * I briefly looked into seeing whether auto-vectorization was viable and
   concluded it was not for these loops.

 * I borrowed USE_SSE2 from one of John Naylor's patches [0].  I'm not sure
   whether this is committable, so I would welcome thoughts on the proper
   form.  Given the comment says that SSE2 is supported by all x86-64
   hardware, I'm not seeing why we need the SSE 4.2 checks.  Is it not
   enough to check for __x86_64__ and _M_AMD64?

 * I haven't looked into adding an ARM implementation yet.

[0] 
https://postgr.es/m/CAFBsxsHko7yc8A-2PpjQ%3D2StomXF%2BT2jgKF%3DWaMFZWi8CvV7hA%40mail.gmail.com

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
>From 10a0369182be525dbe849d856b663aede10c4c16 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Thu, 28 Jul 2022 12:15:47 -0700
Subject: [PATCH v3 1/1] Use SSE2 intrinsics for XidInMVCCSnapshot().

This optimizes the linear searches through the [sub]xip arrays when
possible, which should improve performance significantly when the
arrays are large.
---
 src/backend/utils/time/snapmgr.c | 28 +++
 src/include/c.h  | 13 ++
 src/include/utils/linearsearch.h | 80 
 3 files changed, 100 insertions(+), 21 deletions(-)
 create mode 100644 src/include/utils/linearsearch.h

diff --git a/src/backend/utils/time/snapmgr.c b/src/backend/utils/time/snapmgr.c
index 5bc2a15160..834c8867d4 100644
--- a/src/backend/utils/time/snapmgr.c
+++ b/src/backend/utils/time/snapmgr.c
@@ -63,6 +63,7 @@
 #include "storage/sinvaladt.h"
 #include "storage/spin.h"
 #include "utils/builtins.h"
+#include "utils/linearsearch.h"
 #include "utils/memutils.h"
 #include "utils/old_snapshot.h"
 #include "utils/rel.h"
@@ -2284,8 +2285,6 @@ RestoreTransactionSnapshot(Snapshot snapshot, void *source_pgproc)
 bool
 XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
 {
-	uint32		i;
-
 	/*
 	 * Make a quick range check to eliminate most XIDs without looking at the
 	 * xip arrays.  Note that this is OK even if we convert a subxact XID to
@@ -2317,13 +2316,8 @@ XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
 		if (!snapshot->suboverflowed)
 		{
 			/* we have full data, so search subxip */
-			int32		j;
-
-			for (j = 0; j < snapshot->subxcnt; j++)
-			{
-if (TransactionIdEquals(xid, snapshot->subxip[j]))
-	return true;
-			}
+			if (pg_linearsearch_uint32(xid, snapshot->subxip, snapshot->subxcnt))
+return true;
 
 			/* not there, fall through to search xip[] */
 		}
@@ -2344,16 +2338,11 @@ XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
 return false;
 		}
 
-		for (i = 0; i < snapshot->xcnt; i++)
-		{
-			if (TransactionIdEquals(xid, snapshot->xip[i]))
-return true;
-		}
+		if (pg_linearsearch_uint32(xid, snapshot->xip, snapshot->xcnt))
+			return true;
 	}
 	else
 	{
-		int32		j;
-
 		/*
 		 * In recovery we store all xids in the subxact array because it is by
 		 * far the bigger array, and we mostly don't know which xids are
@@ -2383,11 +2372,8 @@ XidInMVCCSnapshot(TransactionId xid, Snapshot snapshot)
 		 * indeterminate xid. We don't know whether it's top level or subxact
 		 * but it doesn't matter. If it's present, the xid is visible.
 		 */
-		for (j = 0; j < snapshot->subxcnt; j++)
-		{
-			if (TransactionIdEquals(xid, snapshot->subxip[j]))
-return true;
-		}
+		if (pg_linearsearch_uint32(xid, snapshot->subxip, snapshot->subxcnt))
+			return true;
 	}
 
 	return false;
diff --git a/src/include/c.h b/src/include/c.h
index d35405f191..8b7d844fc9 100644
--- a/src/include/c.h
+++ b/src/include/c.h
@@ -371,6 +371,19 @@ typedef void (*pg_funcptr_t) (void);
 #endif
 #endif
 
+/*
+ * Are SSE2 intrinsics available?
+ *
+ * Note: We piggy-back on the check for SSE 4.2 intrinstics but only need SSE2
+ * at runtime.  That's supported by all x84-64 hardware, so we don't need an
+ * indirect function call.
+ *
+ * XXX: Consider removing CRC from the names.
+ */
+#if (defined(__x86_64__) || defined(_M_AMD64)) && (defined(USE_SSE42_CRC32C) || defined(USE_SSE42_CRC32C_WITH_RUNTIME_CHECK))
+#define USE_SSE2
+#endif
+
 
 /* 
  *Section 2:	bool, true, false
diff --git a/src/include/utils/linearsearch.h b/src/include/utils/linearsearch.h
new file mode 100644
index 00..c797fd18ca
--- /dev/null
+++ b/src/include/utils/linearsearch.

[Commitfest 2022-07] Patch Triage: Needs Review, Part 1

2022-07-28 Thread Jacob Champion

Hi,

Next up is the large list of Needs Review. This part 1 should include
entries as old or older than seven commitfests running.

My heuristics for classifying these continue to evolve as I go, and
there's a lot to read, so please let me know if I've made any mistakes.

= Stalled Patches, Recommend Return =

These are stalled and I recommend outright that we return them. We don't
have a separate status for "needs more interest" (working on a patch) so
I'd just RwF, with a note explaining that what is actually needed to
continue isn't more code work but more coalition building.

- Implement INSERT SET syntax
  https://commitfest.postgresql.org/38/2218/

A recent author rebase this CF, but unfortunately I think the the real
issue is just a lack of review interest. It's been suggested for return
for a few CFs now.

- Fix up partitionwise join on how equi-join conditions between the
partition keys are identified
  https://commitfest.postgresql.org/38/2266/

It looks like this one was Returned with Feedback but did not actually
have feedback, which may have caused confusion. (Solid motivation for a
new close status.) I don't think there's been any review since 2020.

- New default role allowing to change per-role/database settings
  https://commitfest.postgresql.org/38/2918/

Stalled on review in January, and needs a rebase.

= Stalled Patches, Need Help =

These are stalled but seem to have interest. They need help to either
get them out of the rut, or else be Returned so that the author can try
a different approach instead of perma-rebasing. I plan to move them to
the next CF unless someone speaks up to say otherwise.

- Show shared filesets in pg_ls_tmpdir (pg_ls_* functions for showing
metadata and recurse)
  https://commitfest.postgresql.org/38/2377/

>From a quick skim it looks like there was a flurry of initial positive
feedback followed by a stall and then some design whiplash. This thread
needs help to avoid burnout, I think.

- Make message at end-of-recovery less scary
  https://commitfest.postgresql.org/38/2490/

This got marked RfC twice, fell back out, and has been stuck in a rebase
loop.

- Fix behavior of geo_ops when NaN is involved
  https://commitfest.postgresql.org/38/2710/

Stuck in a half-committed state, which is tricky. Could maybe use a
reframing or recap (or a new thread?).

- Add extra statistics to explain for Nested Loop
  https://commitfest.postgresql.org/38/2765/

I think the author is hoping for help with testing and performance
characterization.

- CREATE INDEX CONCURRENTLY on partitioned table
  https://commitfest.postgresql.org/38/2815/

This had an author switch since last CF, so I think it'd be
inappropriate to close it out this time around, but it needs assistance.

- New Table Access Methods for Multi and Single Inserts
  https://commitfest.postgresql.org/38/2871/

Although there was a brief flicker in March, I think this one has
stalled out and is just about ready to be returned.

- Fix pg_rewind race condition just after promotion
  https://commitfest.postgresql.org/38/2864/

Seems like an important fix, but it's silent? Does it need to be
promoted to an Open Issue?

- pg_stat_statements and "IN" conditions
  https://commitfest.postgresql.org/38/2837/

Some good, recent interest. Last review in March.

- Function to log backtrace of postgres processes
  https://commitfest.postgresql.org/38/2863/

This is just starting to stall; I think it needs some assistance.

- Allow batched insert during cross-partition updates
  https://commitfest.postgresql.org/38/2992/

Was RfC (twice), then dropped out, now it's stuck rebasing. Last
substantial review in 2021.

= Active Patches =

The following are actively being worked and I expect to move them to
next CF:

- session variables, LET command
- Remove self join on a unique column
- Incremental Materialized View Maintenance
- More scalable multixacts buffers and locking
- Fast COPY FROM command for the foreign tables
- Extended statistics / estimate Var op Var clauses

Thanks,
--Jacob

Re: pg_auth_members.grantor is bunk

2022-07-28 Thread David G. Johnston

On Thu, Jul 28, 2022 at 12:09 PM Robert Haas  wrote:

> On Tue, Jul 26, 2022 at 12:46 PM Robert Haas 
> wrote:
> > I believe that these patches are mostly complete, but I think that
> > dumpRoleMembership() probably needs some more work. I don't know what
> > exactly, but there's nothing to cause it to dump the role grants in an
> > order that will create dependent grants after the things that they
> > depend on, which seems essential.
>
> OK, so I fixed that, and also updated the documentation a bit more. I
> think these patches are basically done, and I'd like to get them
> committed before too much more time goes by, because I have other
> things that depend on this which I also want to get done for this
> release. Anybody object?
>
> I'm hoping not, because, while this is a behavior change, the current
> state of play in this area is just terrible. To my knowledge, this is
> the only place in the system where we allow a dangling OID reference
> in a catalog table to persist after the object to which it refers has
> been dropped. I believe it's also the object type where multiple
> grants by different grantors aren't tracked separately, and where the
> grantor need not themselves have the permission being granted. It
> doesn't really look like any of these things were intentional behavior
> so much as just ... nobody ever bothered to write the code to make it
> work properly. I'm hoping the fact that I have now done that will be
> viewed as a good thing, but maybe that won't turn out to be the case.
>
>
I suggest changing \du memberof to output something like this:

select r.rolname,
array(
  select format('%s:%s/%s',
b.rolname,
case when m.admin_option then 'admin' else 'member' end,
g.rolname)
  from pg_catalog.pg_auth_members m
  join pg_catalog.pg_roles b on (m.roleid = b.oid)
  join pg_catalog.pg_roles g on (m.grantor = g.oid)
  where m.member = r.oid
) as memberof
from pg_catalog.pg_roles r where r.rolname !~ '^pg_';

 rolname |  memberof
-+
 vagrant | {}
 o   | {}
 a   | {o:admin/p,o:admin/vagrant}
 x   | {o:admin/a,p:member/vagrant}
 b   | {o:admin/a}
 p   | {o:admin/vagrant}
 y   | {x:member/vagrant}
 q   | {}
 r   | {q:admin/vagrant}
 s   | {}
 t   | {q:admin/vagrant,s:member/vagrant}


(needs sorting, tried to model it after ACL - column privileges
specifically)

=> \dp mytable
  Access privileges
 Schema |  Name   | Type  |   Access privileges   |   Column privileges   |
Policies
+-+---+---+---+--
 public | mytable | table | miriam=arwdDxt/miriam+| col1:+|
| |   | =r/miriam+|   miriam_rw=rw/miriam |
| |   | admin=arw/miriam  |   |
(1 row)

If we aren't dead set on having \du and \dg be aliases for each other I'd
rather redesign \dg (or add a new meta-command) to be a group-centric view
of this exact same data instead of user-centric one.  Namely it has a
"members" column instead of "memberof" and have it output, one line per
member:

user=[admin|member]/grantor

I looked over the rest of the patch and played with the circularity a bit,
which motivated the expanded info in \du, and the confirmation that two
separate admin grants that are not circular can exist.

I don't have any meaningful insight as to breaking things with these
changes but I am strongly in favor of tightening this up and formalizing it.

David J.

Re: Inconvenience of pg_read_binary_file()

2022-07-28 Thread Tom Lane

Kyotaro Horiguchi  writes:
> - Simplified the implementation (by complexifying argument handling..).
> - REVOKEd EXECUTE from the new functions.
> - Edited the signature of the two functions.

>> - pg_read_file ( filename text [, offset bigint, length bigint [, missing_ok 
>> boolean ]] ) $B"*(B text
>> + pg_read_file ( filename text [, offset bigint, length bigint ] [, 
>> missing_ok boolean ] ) $B"*(B text

I'm okay with allowing this variant of the functions.  Since there's
no implicit cast between bigint and bool, plus the fact that you
can't give just offset without length, there shouldn't be much risk
of confusion as to which variant to invoke.

I don't really like the implementation style though.  That mess of
PG_NARGS tests is illegible code already and this makes it worse.
I think it'd be way cleaner to have all the PG_GETARG calls in the
bottom SQL-callable functions (which are already one-per-signature)
and then pass them on to a common function that has an ordinary C
call signature, along the lines of

static Datum
pg_read_file_common(text *filename_t,
int64 seek_offset, int64 bytes_to_read,
bool read_to_eof, bool missing_ok)
{
if (read_to_eof)
bytes_to_read = -1;// or just Assert that it's -1 ?
else if (bytes_to_read < 0)
ereport(...);
...
}

Datum
pg_read_file_off_len(PG_FUNCTION_ARGS)
{
text   *filename_t = PG_GETARG_TEXT_PP(0);
int64   seek_offset = PG_GETARG_INT64(1);
int64   bytes_to_read = PG_GETARG_INT64(2);

return pg_read_file_common(filename_t, seek_offset, bytes_to_read,
   false, false);
}

regards, tom lane

Re: pg15b2: large objects lost on upgrade

2022-07-28 Thread Robert Haas

On Tue, Jul 26, 2022 at 9:09 PM Bruce Momjian  wrote:
> This behavior is new in PG 15, and I would be worried to have one new
> behavior in PG 15 and another one in PG 16.  Therefore, I would like to
> see it in PG 15 and master.

That's also my preference, so committed and back-patched to v15.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: PROXY protocol support

2022-07-28 Thread Jacob Champion

This needs a rebase, but after that I expect it to be RfC.

--Jacob

The new status of this patch is: Waiting on Author

Re: Maximize page freezing

2022-07-28 Thread Peter Geoghegan

On Thu, Jul 28, 2022 at 6:56 AM Matthias van de Meent
 wrote:
> Great idea, yet this patch seems to only freeze those tuples that are
> located after the first to-be-frozen tuple. It should probably
> re-visit earlier live tuples to potentially freeze those as well.

I have a big patch set pending that does this (which I dubbed
"page-level freezing"), plus a bunch of other things that control the
overhead. Although the basic idea of freezing all of the tuples on a
page together appears in earlier patching that were posted. These were
things that didn't make it into Postgres 15.

I should be able to post something in a couple of weeks.

-- 
Peter Geoghegan

Re: pg_auth_members.grantor is bunk

2022-07-28 Thread Robert Haas

On Tue, Jul 26, 2022 at 12:46 PM Robert Haas  wrote:
> I believe that these patches are mostly complete, but I think that
> dumpRoleMembership() probably needs some more work. I don't know what
> exactly, but there's nothing to cause it to dump the role grants in an
> order that will create dependent grants after the things that they
> depend on, which seems essential.

OK, so I fixed that, and also updated the documentation a bit more. I
think these patches are basically done, and I'd like to get them
committed before too much more time goes by, because I have other
things that depend on this which I also want to get done for this
release. Anybody object?

I'm hoping not, because, while this is a behavior change, the current
state of play in this area is just terrible. To my knowledge, this is
the only place in the system where we allow a dangling OID reference
in a catalog table to persist after the object to which it refers has
been dropped. I believe it's also the object type where multiple
grants by different grantors aren't tracked separately, and where the
grantor need not themselves have the permission being granted. It
doesn't really look like any of these things were intentional behavior
so much as just ... nobody ever bothered to write the code to make it
work properly. I'm hoping the fact that I have now done that will be
viewed as a good thing, but maybe that won't turn out to be the case.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

v3-0001-Ensure-that-pg_auth_members.grantor-is-always-val.patch
Description: Binary data

v3-0002-Make-role-grant-system-more-consistent-with-other.patch
Description: Binary data

Re: Hash index build performance tweak from sorting

2022-07-28 Thread Tom Lane

Simon Riggs  writes:
> Thanks for the nudge. New version attached.

I also see a speed improvement from this, so pushed (after minor comment
editing).  I notice though that if I feed it random data,

---
DROP TABLE IF EXISTS hash_speed;
CREATE unlogged TABLE hash_speed (x integer);
INSERT INTO hash_speed SELECT random()*1000 FROM
generate_series(1,1000) x;
vacuum hash_speed;
\timing on
CREATE INDEX ON hash_speed USING hash (x);
---

then the speed improvement is only about 5% not the 7.5% I see
with your original test case.  I don't have an explanation
for that, do you?

Also, it seems like we've left some money on the table by not
exploiting downstream the knowledge that this sorting happened.
During an index build, it's no longer necessary for
_hash_pgaddtup to do _hash_binsearch, and therefore also not
_hash_get_indextuple_hashkey: we could just always append the new
tuple at the end.  Perhaps checking it against the last existing
tuple is worth the trouble as a bug guard, but for sure we don't
need the log2(N) comparisons that _hash_binsearch will do.

Another point that I noticed is that it's not really desirable to
use the same _hash_binsearch logic for insertions and searches.
_hash_binsearch finds the first entry with hash >= target, which
is necessary for searches, but for insertions we'd really rather
find the first entry with hash > target.  As things stand, to
the extent that there are duplicate hash values we are still
performing unnecessary data motion within PageAddItem.

I've not looked into how messy these things would be to implement,
nor whether we get any noticeable speed gain thereby.  But since
you've proven that cutting the PageAddItem data motion cost
yields visible savings, these things might be visible too.

At this point the cfbot will start to bleat that the patch of
record doesn't apply, so I'm going to mark the CF entry committed.
If anyone wants to produce a follow-on patch, please make a
new entry.

regards, tom lane

Re: Add proper planner support for ORDER BY / DISTINCT aggregates

2022-07-28 Thread David Rowley

On Wed, 27 Jul 2022 at 15:16, Richard Guo  wrote:
> That makes sense. The patch looks in a good shape to me in this part.

Thanks for giving it another look.

I'm also quite happy with the patch now. The 2 plan changes are
explained. I have a patch on another thread [1] for the change in the
Merge Join plan.  I'd like to consider that separately from this
patch.

The postgres_fdw changes are explained in [2]. This can be fixed by
setting fdw_tuple_cost to something more realistic in the foreign
server settings on the test.

I'd like to take a serious look at pushing this patch on the first few
days of August, so if anyone is following along here that might have
objections, can you do so before then?

David

[1] 
https://www.postgresql.org/message-id/caaphdvrtzu0phvfdpfm4yx3jnr2wuwosv+t2zqa7lrhhbr2...@mail.gmail.com
[2] 
https://www.postgresql.org/message-id/CAApHDvpXiXLxg4TsA8P_4etnuGQqAAbHWEOM4hGe=dcaxmi...@mail.gmail.com

Re: standby recovery fails (tablespace related) (tentative patch and discussion)

2022-07-28 Thread Matthias van de Meent

On Wed, 27 Jul 2022 at 20:55, Alvaro Herrera  wrote:
>
> Okay, I think I'm done with this.  Here's v27 for the master branch,
> where I fixed some comments as well as thinkos in the test script.
> The ones on older branches aren't materially different, they just have
> tonnes of conflicts resolved.  I'll get this pushed tomorrow morning.
>
> I have run it through CI and it seems ... not completely broken, at
> least, but I have no working recipes for Windows on branches 14 and
> older, so it doesn't really work fully.  If anybody does, please share.
> You can see mine here
> https://github.com/alvherre/postgres/commits/REL_11_STABLE [etc]
> https://cirrus-ci.com/build/5320904228995072
> https://cirrus-ci.com/github/alvherre/postgres

I'd like to bring to your attention that the test that was introduced
with 9e4f914b seem to be flaky in FreeBSD 13 in the CFBot builds: it
sometimes times out while waiting for the secondary to catch up. Or,
at least I think it does, and I'm not too familiar with TAP failure
outputs: it returns with error code 29 and logs that I'd expect when
the timeout is reached.

See bottom for examples (all 3 builds for different patches).

Kind regards,

Matthias van de Meent.

[1] https://cirrus-ci.com/task/4960990331666432?logs=test_world#L2631-L2662
[2] https://cirrus-ci.com/task/5012678384025600?logs=test_world#L2631-L2662
[3] https://cirrus-ci.com/task/5147001137397760?logs=test_world#L2631-L2662

Re: ci: update to freebsd 13.1 / remove minor versions from image names

2022-07-28 Thread Matthias van de Meent

On Thu, 28 Jul 2022 at 19:31, Andres Freund  wrote:
>
> Hi,
>
> On July 28, 2022 7:29:43 PM GMT+02:00, Matthias van de Meent 
>  wrote:
> >On Thu, 28 Jul 2022 at 11:57, Andres Freund  wrote:
> >>
> >> Hi,
> >>
> >> Freebsd 13.0, so far used by CI, is out of support. I've changed the
> >> image to be built against 13.1, so we can switch to that.
> >>
> >> I suspect it'd be better to remove the minor version numbers from the
> >> image name, so that switches from 13.0 -> 13.1 don't require CI
> >> changes. Any argument against?
> >>
> >> I can also see an argument for not having 13 in the image name, given
> >> that the image is CI specific anyway? But perhaps we might want to have
> >> a 13 and a 14 image for some debugging issue?
> >
> >Has this change in the BSD configuration been applied today? I see
> >failures in the cfbot builds of 4 different patches [0..3] that all
> >fail in 033_replay_tsp_drops.pl with the same output:
>
> No, this hasn't yet been applied.
>
> ># poll_query_until timed out executing this query:
> ># SELECT '0/40EAXXX' <= replay_lsn AND state = 'streaming'
> ># FROM pg_catalog.pg_stat_replication
> ># WHERE application_name IN ('standby2_WAL_LOG', 'walreceiver')
> ># expecting this output:
> ># t
> ># last actual query output:
> >#
> ># with stderr:
> ># Tests were run but no plan was declared and done_testing() was not seen.
> ># Looks like your test exited with 29 just after 1.
> >t/033_replay_tsp_drops.pl 
> >Dubious, test returned 29 (wstat 7424, 0x1d00)
> >All 1 subtests passed
>
> That seems more likely related to the recent changes in this area.

Hmm, I should've looked further than just this, so I would've realised
that this was a new test. I guess I'll go bother Alvaro on the
relevant thread instead.

Thanks for the quick response.


Kind regards,

Matthias van de Meent

Re: ci: update to freebsd 13.1 / remove minor versions from image names

2022-07-28 Thread Andres Freund

Hi,

On July 28, 2022 7:29:43 PM GMT+02:00, Matthias van de Meent 
 wrote:
>On Thu, 28 Jul 2022 at 11:57, Andres Freund  wrote:
>>
>> Hi,
>>
>> Freebsd 13.0, so far used by CI, is out of support. I've changed the
>> image to be built against 13.1, so we can switch to that.
>>
>> I suspect it'd be better to remove the minor version numbers from the
>> image name, so that switches from 13.0 -> 13.1 don't require CI
>> changes. Any argument against?
>>
>> I can also see an argument for not having 13 in the image name, given
>> that the image is CI specific anyway? But perhaps we might want to have
>> a 13 and a 14 image for some debugging issue?
>
>Has this change in the BSD configuration been applied today? I see
>failures in the cfbot builds of 4 different patches [0..3] that all
>fail in 033_replay_tsp_drops.pl with the same output:

No, this hasn't yet been applied.


># poll_query_until timed out executing this query:
># SELECT '0/40EAXXX' <= replay_lsn AND state = 'streaming'
># FROM pg_catalog.pg_stat_replication
># WHERE application_name IN ('standby2_WAL_LOG', 'walreceiver')
># expecting this output:
># t
># last actual query output:
>#
># with stderr:
># Tests were run but no plan was declared and done_testing() was not seen.
># Looks like your test exited with 29 just after 1.
>t/033_replay_tsp_drops.pl 
>Dubious, test returned 29 (wstat 7424, 0x1d00)
>All 1 subtests passed

That seems more likely related to the recent changes in this area.

Andres
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.

Re: ci: update to freebsd 13.1 / remove minor versions from image names

2022-07-28 Thread Matthias van de Meent

On Thu, 28 Jul 2022 at 11:57, Andres Freund  wrote:
>
> Hi,
>
> Freebsd 13.0, so far used by CI, is out of support. I've changed the
> image to be built against 13.1, so we can switch to that.
>
> I suspect it'd be better to remove the minor version numbers from the
> image name, so that switches from 13.0 -> 13.1 don't require CI
> changes. Any argument against?
>
> I can also see an argument for not having 13 in the image name, given
> that the image is CI specific anyway? But perhaps we might want to have
> a 13 and a 14 image for some debugging issue?

Has this change in the BSD configuration been applied today? I see
failures in the cfbot builds of 4 different patches [0..3] that all
fail in 033_replay_tsp_drops.pl with the same output:

---

# poll_query_until timed out executing this query:
# SELECT '0/40EAXXX' <= replay_lsn AND state = 'streaming'
# FROM pg_catalog.pg_stat_replication
# WHERE application_name IN ('standby2_WAL_LOG', 'walreceiver')
# expecting this output:
# t
# last actual query output:
#
# with stderr:
# Tests were run but no plan was declared and done_testing() was not seen.
# Looks like your test exited with 29 just after 1.
t/033_replay_tsp_drops.pl 
Dubious, test returned 29 (wstat 7424, 0x1d00)
All 1 subtests passed

---

Kind regards,

Matthias van de Meent

[0] https://cirrus-ci.com/task/5147001137397760?logs=test_world#L2631-L2662
[1] https://cirrus-ci.com/task/4960990331666432?logs=test_world#L2631-L2662
[2] https://cirrus-ci.com/task/5012678384025600?logs=test_world#L2631-L2662
[3] https://cirrus-ci.com/task/5147001137397760?logs=test_world#L2631-L2662

Re: making relfilenodes 56 bits

2022-07-28 Thread Joshua Drake

On Thu, Jul 28, 2022 at 9:52 AM Robert Haas  wrote:

> On Thu, Jul 28, 2022 at 11:59 AM Alvaro Herrera 
> wrote:
> > I do wonder why do we keep relfilenodes limited to decimal digits.  Why
> > not use hex digits?  Then we know the limit is 14 chars, as in
> > 0x00FF in the MAX_RELFILENUMBER definition.
>
> Hmm, but surely we want the error messages to be printed using the
> same format that we use for the actual filenames. We could make the
> filenames use hex characters too, but I'm not wild about changing
> user-visible details like that.
>

>From a DBA perspective this would be a regression in usability.

JD

-- 

   - Founder - https://commandprompt.com/ - 24x7x365 Postgres since 1997
   - Founder and Co-Chair - https://postgresconf.org/
   - Founder - https://postgresql.us - United States PostgreSQL
   - Public speaker, published author, postgresql expert, and people
   believer.
   - Host - More than a refresh
   : A podcast about
   data and the people who wrangle it.

Re: making relfilenodes 56 bits

2022-07-28 Thread Robert Haas

On Thu, Jul 28, 2022 at 11:59 AM Alvaro Herrera  wrote:
> I do wonder why do we keep relfilenodes limited to decimal digits.  Why
> not use hex digits?  Then we know the limit is 14 chars, as in
> 0x00FF in the MAX_RELFILENUMBER definition.

Hmm, but surely we want the error messages to be printed using the
same format that we use for the actual filenames. We could make the
filenames use hex characters too, but I'm not wild about changing
user-visible details like that.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: Support for grabbing multiple consecutive values with nextval()

2022-07-28 Thread Tom Lane

I wrote:
> I've got no strong opinion about this bit:
>> As suggested upthread, returning a resultset would probably be better.

Actually, on further thought, I do like the resultset idea, because
it'd remove the need for a complex rewrite of nextval_internal.
Assuming the SRF is written in ValuePerCall style, each iteration
can just call nextval_internal with no modifications needed in that
function.  There'll be a CHECK_FOR_INTERRUPTS somewhere in the
query-level loop, or at least it's not nextval's fault if there's not.
The situation is then no different from generate_series with a large
loop count, or any other query that can generate lots of data.

Of course, this does imply a lot more cycles expended per generated value
--- but most of that is inherent in the larger amount of data being
handed back.

regards, tom lane

Re: replacing role-level NOINHERIT with a grant-level option

2022-07-28 Thread tushar


On 7/28/22 8:03 PM, Robert Haas wrote:

No, it seems to me that's behaving as intended. REVOKE BLAH OPTION ...
is intended to be a way of switching an option off.
Ok, Thanks, Robert. I tested with a couple of more scenarios like 
pg_upgrade/pg_dumpall /grant/revoke .. with admin option/inherit

and things look good to me.

--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company

Re: doc phrase: "inheritance child"

2022-07-28 Thread Alvaro Herrera

On 2022-Jun-30, Justin Pryzby wrote:

> I updated the language to say "values from".  Is this better ?
> 
> And rebased to include changes to 401f623c7.

Applied to 15 and master, thanks.

> BTW nobody complained about my "child child" typo.

:-(

BTW I didn't notice your annotation in the CF app until I had already
pushed it and went there to update the status.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
Maybe there's lots of data loss but the records of data loss are also lost.
(Lincoln Yeoh)

Re: Support for grabbing multiple consecutive values with nextval()

2022-07-28 Thread Tom Lane

Ronan Dunklau  writes:
> The problem the author wants to solve is the fact they don't have a way of 
> returning the ids when using COPY FROM. Pre-allocating them and assigning 
> them 
> to the individual records before sending them via COPY FROM would solve that 
> for them.

True.

I took a quick look at this patch and am not pleased at all with the
implementation.  That loop in nextval_internal is okay performance-wise
today, since there's a small upper bound on the number of times it will
iterate.  But with this patch, a user can trivially lock up a backend for
up to 2^63 iterations; let's just say that's longer than you want to wait.
There's not even a CHECK_FOR_INTERRUPTS() in the loop :-(.  Even without
mistakes or deliberate DoS attempts, looping means holding the sequence
lock for longer than we really want to.

I think to seriously consider a feature like this, nextval_internal
would have to be rewritten so that it can advance the counter the
correct number of steps without using a loop.  That would be quite a
headache, once you've dealt with integer overflow, cyclic sequences,
and suchlike complications, but it's probably do-able.

I don't think I agree with Ronan's upthread comment that

>> However, I don't think that returning only the last value is a sensible 
>> thing 
>> to do. The client will need to know the details of the sequence to do 
>> anything 
>> useful about this, especially it's increment, minvalue, maxvalue and cycle 
>> options. 

Most applications are probably quite happy to assume that they know the
sequence's static parameters, and those that aren't can easily fetch them
using existing facilities.  So I don't think that returning them in this
function's result is really necessary.

I've got no strong opinion about this bit:

> As suggested upthread, returning a resultset would probably be better.

There are use-cases for that, sure, but there are also use-cases for
returning just the first or just the last value --- I'd think "just the
first" is the more common need of those two.  Aggregating over a resultset
is a remarkably inefficient answer when that's what you want.

In any case, "nextval()" is an increasingly poor name for these different
definitions, so I counsel picking some other name instead of overloading
nextval().  "nextvals()" would be a pretty good choice for the resultset
case, I think.

regards, tom lane

Re: [PATCH] Log details for client certificate failures

2022-07-28 Thread Jacob Champion

On Thu, Jul 21, 2022 at 4:29 PM Jacob Champion  wrote:
> v4 attempts to fix this by letting the check hooks pass
> MCXT_ALLOC_NO_OOM to pg_clean_ascii(). (It's ignored in the frontend,
> which just mallocs.)

Ping -- should I add an open item somewhere so this isn't lost?

--Jacob

Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

2022-07-28 Thread Melih Mutlu

>
> Why after step 4, do you need to drop the replication slot? Won't just
> clearing the required info from the catalog be sufficient?
>

The replication slots that we read from the catalog will not be used for
anything else after we're done with syncing the table which the rep slot
belongs to.
It's removed from the catalog when the sync is completed and it basically
becomes a slot that is not linked to any table or worker. That's why I
think it should be dropped rather than left behind.

Note that if a worker dies and its replication slot continues to exist,
that slot will only be used to complete the sync process of the one table
that the dead worker was syncing but couldn't finish.
When that particular table is synced and becomes ready, the replication
slot has no use anymore.


> Hmm, I think even if there is an iota of a chance which I think is
> there, we can't use worker_pid. Assume, that if the same worker_pid is
> assigned to another worker once the worker using it got an error out,
> the new worker will fail as soon as it will try to create a
> replication slot.
>

Right. If something like that happens, worker will fail without doing
anything. Then a new one will be launched and that one will continue to
do the work.
The worst case might be having conflicting pid over and over again while
also having replication slots whose name includes one of those pids still
exist.
It seems unlikely but possible, yes.


> I feel it would be better or maybe we need to think of some other
> identifier but one thing we need to think about before using a 64-bit
> unique identifier here is how will we retrieve its last used value
> after restart of server. We may need to store it in a persistent way
> somewhere.
>

We might consider storing this info in a catalog again. Since this last
used value will be different for each subscription, pg_subscription can be
a good place to keep that.


> The problems will be similar to the slot name. The origin is used to
> track the progress of replication, so, if we use the wrong origin name
> after the restart, it can send the wrong start_streaming position to
> the publisher.
>

I understand. But origin naming logic is still the same. Its format is like
pg__ .
I did not need to change this since it seems to me origins should belong to
only one table. The patch does not reuse origins.
So I don't think this change introduces an issue with origin. What do you
think?

Thanks,
Melih

Re: making relfilenodes 56 bits

2022-07-28 Thread Alvaro Herrera

Not a full review, just a quick skim of 0003.

On 2022-Jul-28, Dilip Kumar wrote:

> + if (!shutdown)
> + {
> + if (ShmemVariableCache->loggedRelFileNumber < 
> checkPoint.nextRelFileNumber)
> + elog(ERROR, "nextRelFileNumber can not go backward from 
> " INT64_FORMAT "to" INT64_FORMAT,
> +  checkPoint.nextRelFileNumber, 
> ShmemVariableCache->loggedRelFileNumber);
> +
> + checkPoint.nextRelFileNumber = 
> ShmemVariableCache->loggedRelFileNumber;
> + }

Please don't do this; rather use %llu and cast to (long long).
Otherwise the string becomes mangled for translation.  I think there are
many uses of this sort of pattern in strings, but not all of them are
translatable so maybe we don't care -- for example contrib doesn't have
translations.  And the rmgrdesc routines don't translate either, so we
probably don't care about it there; and nothing that uses elog either.
But this one in particular I think should be an ereport, not an elog.
There are several other ereports in various places of the patch also.

> @@ -2378,7 +2378,7 @@ verifyBackupPageConsistency(XLogReaderState *record)
>   if (memcmp(replay_image_masked, primary_image_masked, BLCKSZ) 
> != 0)
>   {
>   elog(FATAL,
> -  "inconsistent page found, rel %u/%u/%u, 
> forknum %u, blkno %u",
> +  "inconsistent page found, rel %u/%u/" 
> INT64_FORMAT ", forknum %u, blkno %u",
>rlocator.spcOid, rlocator.dbOid, 
> rlocator.relNumber,
>forknum, blkno);

Should this one be an ereport, and thus you do need to change it to that
and handle it like that?

> + if (xlrec->rlocator.relNumber > 
> ShmemVariableCache->nextRelFileNumber)
> + elog(ERROR, "unexpected relnumber " INT64_FORMAT "that 
> is bigger than nextRelFileNumber " INT64_FORMAT,
> +  xlrec->rlocator.relNumber, 
> ShmemVariableCache->nextRelFileNumber);

You missed one whitespace here after the INT64_FORMAT.

> diff --git a/src/bin/pg_controldata/pg_controldata.c 
> b/src/bin/pg_controldata/pg_controldata.c
> index c390ec5..f727078 100644
> --- a/src/bin/pg_controldata/pg_controldata.c
> +++ b/src/bin/pg_controldata/pg_controldata.c
> @@ -250,6 +250,8 @@ main(int argc, char *argv[])
>   printf(_("Latest checkpoint's NextXID:  %u:%u\n"),
>  
> EpochFromFullTransactionId(ControlFile->checkPointCopy.nextXid),
>  
> XidFromFullTransactionId(ControlFile->checkPointCopy.nextXid));
> + printf(_("Latest checkpoint's NextRelFileNumber:  " INT64_FORMAT "\n"),
> +ControlFile->checkPointCopy.nextRelFileNumber);

This one must definitely be translatable.

>  /* Characters to allow for an RelFileNumber in a relation path */
> -#define RELNUMBERCHARS   OIDCHARS/* same as OIDCHARS */
> +#define RELNUMBERCHARS   20  /* max chars printed by %lu */

Maybe say %llu here instead.

I do wonder why do we keep relfilenodes limited to decimal digits.  Why
not use hex digits?  Then we know the limit is 14 chars, as in
0x00FF in the MAX_RELFILENUMBER definition.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Thou shalt not follow the NULL pointer, for chaos and madness await
thee at its end." (2nd Commandment for C programmers)

Re: [Commitfest 2022-07] Patch Triage: Waiting on Author

2022-07-28 Thread Jacob Champion

On Thu, Jul 28, 2022 at 4:46 AM Andrey Borodin  wrote:
> Daniil is working on this, but currently he's on vacation.
> I think we should not mark patch as RwF and move it to next CF instead.

Is there a downside to marking it RwF, from your perspective? As
Robert pointed out upthread, it can be switched back at any time once
Daniil's ready.

Life happens; there isn't (or there shouldn't be) any shame in having
a patch returned temporarily. But it is important that patches which
aren't ready for review at the moment don't stick around for months.
They take up reviewer time and need to be triaged continually.

--Jacob

Re: [Commitfest 2022-07] Patch Triage: Waiting on Author

2022-07-28 Thread Jacob Champion

On Wed, Jul 27, 2022 at 7:09 PM houzj.f...@fujitsu.com
 wrote:
> Sorry, I think we don't enough time to work on this recently. So please mark 
> it as RWF and
> we will get back to this in the future.

Done, thanks!

--Jacob

Re: Parallel Inserts (WAS: [bug?] Missed parallel safety checks..)

2022-07-28 Thread Jacob Champion

On Thu, Jul 28, 2022 at 8:43 AM Julien Rouhaud  wrote:
> Could you send a rebased version?  In the meantime I will switch the entry to
> Waiting on Author.

By request in [1] I'm marking this Returned with Feedback for now.
Whenever you're ready, you can resurrect the patch entry by visiting

https://commitfest.postgresql.org/38/3143/

and changing the status to "Needs Review", and then changing the
status again to "Move to next CF". (Don't forget the second step;
hopefully we will have streamlined this in the near future!)

Thanks,
--Jacob

[1] 
https://www.postgresql.org/message-id/OS0PR01MB571696D623F35A09AB51903A94969%40OS0PR01MB5716.jpnprd01.prod.outlook.com

Re: make update-po@master stops at pg_upgrade

2022-07-28 Thread Tom Lane

Alvaro Herrera  writes:
> In short, +1 to this patch.

Thanks for testing it.  I think the only remaining concern is
Peter's objection that $(wildcard) might pick up random junk files
that end in ".c".  That's true, but the backend's nls.mk also
picks up everything matching "*.c" (over the whole backend tree,
not just one directory!), and I don't recall people complaining
about that.  So I think the reduction in maintenance burden
justifies the risk.  What do others think?

regards, tom lane

Re: How come drongo didn't fail authentication here?

2022-07-28 Thread Andrew Dunstan



On 2022-07-28 Th 10:55, Tom Lane wrote:
> Andrew Dunstan  writes:
>> On 2022-07-28 Th 10:24, Tom Lane wrote:
>>> How can that be?  Have we somehow broken SSPI authentication
>>> in HEAD?
>> Nothing is broken. On HEAD drongo uses Unix sockets.
> I see.  Seems like we've created a gotcha for ourselves:
> a test script can look perfectly fine in Unix-based testing,
> and even in Windows CI, and then fail when it hits the back
> branches in the buildfarm.  Is it worth doing something to
> cause the lack of a valid auth_extra spec to fail on Unix?
>
>   


Maybe we should just have a windows testing instance that doesn't use
Unix sockets at all.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Re: make update-po@master stops at pg_upgrade

2022-07-28 Thread Alvaro Herrera

On 2022-Jul-13, Tom Lane wrote:

> Actually, we can get rid of those easily enough anyway with $(sort).
> Here's a draft that might solve these problems.  The idea is to use
> $(wildcard) for files in the srcdir, and manually enumerate only
> built files.

I checked the files in src/bin/scripts and they look OK; there are no
missing messages now.  I also checked the es.po.new files with and
without patch; they look good, nothing is lost.

Files that weren't previously being processed:
src/interfaces/libpq/fe-print.c
src/interfaces/ecpg/preproc/parser.c

Incidentally, this patch is pretty similar to what Kyotaro-san sent when
opening the thread, with the addition of the $(notdir) call.

In short, +1 to this patch.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"El sentido de las cosas no viene de las cosas, sino de
las inteligencias que las aplican a sus problemas diarios
en busca del progreso." (Ernesto Hernández-Novich)

Re: How to get accurate backup end time when it is taken from synchronous standby ?

2022-07-28 Thread Bharath Rupireddy

On Thu, Jul 28, 2022 at 11:50 AM Harinath Kanchu  wrote:
>
> Greetings,
>
>
> When we take backups from a synchronous standby replica, how can we get the 
> accurate timestamp of the backup end time ? (As backup history files are not 
> generated on standbys)For example:
> this is a part of control file after a backup (created using wal-g by calling 
> pg_startbackup and pg_stopbackup),
>
> Fake LSN counter for unlogged rels:   0/3E8
> Minimum recovery ending location: 28/28000B68
> Min recovery ending loc's timeline:   2
> Backup start location:0/0
> Backup end location:  0/0
> End-of-backup record required:no
>
> here I can see that minimum recovery ending location as LSN value, how can we 
> get the timestamp of it ?The backup label file looks like this.
>
> INFO: 2022/07/26 23:25:23.850621  LABLE FILE START --
> INFO: 2022/07/26 23:25:23.850628 START WAL LOCATION: 1D/F94C7320 (file 
> 0002001D00F9)
> INFO: 2022/07/26 23:25:23.850633 CHECKPOINT LOCATION: 1E/EDA8700
> INFO: 2022/07/26 23:25:23.850639 BACKUP METHOD: streamed
> INFO: 2022/07/26 23:25:23.850645 BACKUP FROM: standby
> INFO: 2022/07/26 23:25:23.850653 START TIME: 2022-07-26 23:10:27 GMT
> INFO: 2022/07/26 23:25:23.850659 LABEL: 2022-07-26 23:10:27.545378 + UTC 
> m=+0.167723956
> INFO: 2022/07/26 23:25:23.850665 START TIMELINE: 2
> INFO: 2022/07/26 23:25:23.850669
> INFO: 2022/07/26 23:25:23.850676  LABLE FILE END --
>
>
> How can we do PITR using timestamp if we don’t know the accurate timestamp of 
> minimum recovery point ?

You can use any of the methods specified in the other thread [1].
Otherwise, you can as well use recovery_target_lsn =
min_recovery_point for PITR target instead of relying on timestamps.

I believe the other thread [1] can be merged into this thread for a
focussed, use-case based and meaningful discussion.

[1] 
https://www.postgresql.org/message-id/CALj2ACVgFvOQQEoyuuZeceQrStGsePWvU1noU5aAvJNenv8qTQ%40mail.gma

-- 
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/

Re: How come drongo didn't fail authentication here?

2022-07-28 Thread Tom Lane

Andrew Dunstan  writes:
> On 2022-07-28 Th 10:24, Tom Lane wrote:
>> How can that be?  Have we somehow broken SSPI authentication
>> in HEAD?

> Nothing is broken. On HEAD drongo uses Unix sockets.

I see.  Seems like we've created a gotcha for ourselves:
a test script can look perfectly fine in Unix-based testing,
and even in Windows CI, and then fail when it hits the back
branches in the buildfarm.  Is it worth doing something to
cause the lack of a valid auth_extra spec to fail on Unix?

regards, tom lane

Re: How come drongo didn't fail authentication here?

2022-07-28 Thread Andrew Dunstan



On 2022-07-28 Th 10:24, Tom Lane wrote:
> In commits 7c34555f8/e1bd4990b, I added a new role used by a TAP
> script but neglected the auth_extra incantation needed to allow
> login as that role.  This should have resulted in SSPI auth
> failures on certain Windows configurations, and indeed it did
> on drongo's next run in the v15 branch:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2022-07-27%2022%3A01%3A47
>
> However, its immediately-following run on HEAD succeeded,
> though I'd obviously not had time to put in the fix yet:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2022-07-27%2022%3A30%3A27
>
> How can that be?  Have we somehow broken SSPI authentication
> in HEAD?
>
>   


Nothing is broken. On HEAD drongo uses Unix sockets.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Re: replacing role-level NOINHERIT with a grant-level option

2022-07-28 Thread Robert Haas

On Thu, Jul 28, 2022 at 10:16 AM tushar  wrote:
> On 7/19/22 12:56 AM, Robert Haas wrote:
> > Another good catch. Here is v5 with a fix for that problem.
> Here is one scenario in which I have NOT granted (inherit false)
> explicitly but still revoke
> command is changing the current state
>
> postgres=# create group foo;
> CREATE ROLE
> postgres=# create user bar in group foo;
> CREATE ROLE
> postgres=# revoke inherit option for foo from bar;
> REVOKE ROLE
>
> [edb@centos7tushar bin]$ ./pg_dumpall > /tmp/a11
>
> [edb@centos7tushar bin]$ cat /tmp/a11 |grep 'inherit false' -i
> GRANT foo TO bar WITH INHERIT FALSE GRANTED BY edb;
>
> I think this revoke command should be ignored and inherit option should
> remain 'TRUE'
> as it was before?

No, it seems to me that's behaving as intended. REVOKE BLAH OPTION ...
is intended to be a way of switching an option off.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: making relfilenodes 56 bits

2022-07-28 Thread Robert Haas

On Thu, Jul 28, 2022 at 7:32 AM Dilip Kumar  wrote:
> Thanks, I have rebased other patches,  actually, there is a new 0001
> patch now.  It seems during renaming relnode related Oid to
> RelFileNumber, some of the references were missed and in the last
> patch set I kept it as part of main patch 0003, but I think it's
> better to keep it separate.  So took out those changes and created
> 0001, but you think this can be committed as part of 0003 only then
> also it's fine with me.

I committed this in part. I took out the introduction of
RELNUMBERCHARS as I think that should probably be a separate commit,
but added in a comment change that you seem to have overlooked.

> I have done some cleanup in 0002 as well, basically, earlier we were
> storing the result of the BufTagGetRelFileLocator() in a separate
> variable which is not required everywhere.  So wherever possible I
> have avoided using the intermediate variable.

I'll have a look at this next.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: small windows psqlrc re-wording

2022-07-28 Thread Tom Lane

Julien Rouhaud  writes:
> On Thu, Jul 28, 2022 at 10:04:12AM -0400, Tom Lane wrote:
>> If all supported versions do have home directories now, should we
>> instead think about aligning the Windows behavior with everywhere
>> else?

> As far as I know the expected usage on Windows is still different.  Even with
> home directories application are still expected to put stuff in %APPDATA% (1),
> in a dedicated directory.  That's especially important since there is still no
> concept of "hidden" files and the explorer still hides the extensions by
> default.

Ah.  Yeah, if there's no convention about hiding files based on a
leading "." then we definitely don't want to do that.

regards, tom lane

How come drongo didn't fail authentication here?

2022-07-28 Thread Tom Lane

In commits 7c34555f8/e1bd4990b, I added a new role used by a TAP
script but neglected the auth_extra incantation needed to allow
login as that role.  This should have resulted in SSPI auth
failures on certain Windows configurations, and indeed it did
on drongo's next run in the v15 branch:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2022-07-27%2022%3A01%3A47

However, its immediately-following run on HEAD succeeded,
though I'd obviously not had time to put in the fix yet:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2022-07-27%2022%3A30%3A27

How can that be?  Have we somehow broken SSPI authentication
in HEAD?

regards, tom lane

Re: Any way to get timestamp from LSN value ?

2022-07-28 Thread Bharath Rupireddy

On Thu, Jul 28, 2022 at 1:17 PM Harinath Kanchu  wrote:
>
> Hello,
>
> Is there any way to get the timestamp of the transaction using LSN value ?
>
> For example:
> can we use the minimum recovery ending location in pg control file to get the 
> minimum recovery timestamp ?
>
> Minimum recovery ending location: 28/28000B68

Can't pg_waldump be used? If you are on PG 15, you could as well use
pg_walinspect functions, something like below:

select * from pg_get_wal_records_info_till_end_of_wal(<>)
where record_type like '%COMMIT%'

[1] https://www.postgresql.org/docs/15/pgwalinspect.html

-- 
Bharath Rupireddy
RDS Open Source Databases: https://aws.amazon.com/

Re: small windows psqlrc re-wording

2022-07-28 Thread Julien Rouhaud

On Thu, Jul 28, 2022 at 10:04:12AM -0400, Tom Lane wrote:
> Julien Rouhaud  writes:
> > On Wed, Jul 27, 2022 at 02:42:11PM -0400, Robert Treat wrote:
> >> The attached patch tweaks the wording around finding the psqlrc file
> >> on windows, with the primary goal of removing the generally incorrect
> >> statement that windows has no concept of a home directory.
>
> > Windows only has a concept of home directory since Vista, so that used to be
> > true.
> > Anyway, since we don't support XP or anything older since about 3 weeks ago
> > (495ed0ef2d72a6a74def296e042022479d5d07bd), +1 for the patch.
>
> If all supported versions do have home directories now, should we
> instead think about aligning the Windows behavior with everywhere
> else?

As far as I know the expected usage on Windows is still different.  Even with
home directories application are still expected to put stuff in %APPDATA% (1),
in a dedicated directory.  That's especially important since there is still no
concept of "hidden" files and the explorer still hides the extensions by
default.  I can however see that having a file named ".something" is now mostly
working, which IIRC wasn't really the case the last time I used Windows (around
XP).

[1] https://en.wikipedia.org/wiki/Special_folder#File_system_directories

Re: replacing role-level NOINHERIT with a grant-level option

2022-07-28 Thread tushar


On 7/19/22 12:56 AM, Robert Haas wrote:

Another good catch. Here is v5 with a fix for that problem.
Here is one scenario in which I have NOT granted (inherit false) 
explicitly but still revoke

command is changing the current state

postgres=# create group foo;
CREATE ROLE
postgres=# create user bar in group foo;
CREATE ROLE
postgres=# revoke inherit option for foo from bar;
REVOKE ROLE

[edb@centos7tushar bin]$ ./pg_dumpall > /tmp/a11

[edb@centos7tushar bin]$ cat /tmp/a11 |grep 'inherit false' -i
GRANT foo TO bar WITH INHERIT FALSE GRANTED BY edb;

I think this revoke command should be ignored and inherit option should 
remain 'TRUE'

as it was before?

--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company

Re: Mingw task for Cirrus CI

2022-07-28 Thread Melih Mutlu

Hi hackers,

I'm sharing the rebased version of this patch, if you're still interested.

I would appreciate any feedback or concerns.

Best,
Melih


Andrew Dunstan , 9 Nis 2022 Cmt, 19:34 tarihinde şunu
yazdı:

>
> On 4/8/22 21:02, Andres Freund wrote:
> > Hi,
> >
> > On 2022-04-08 19:27:58 -0500, Justin Pryzby wrote:
> >> On Thu, Apr 07, 2022 at 10:10:21AM -0700, Andres Freund wrote:
> >>> On 2022-04-06 11:03:37 -0400, Andrew Dunstan wrote:
>  On 3/30/22 20:26, Andres Freund wrote:
> > Could you try using dash to invoke configure here, and whether it
> makes configure faster?
>  I got weird failures re libxml/parser.h when I tried with dash. See
>   (It would be nice if we
>  could see config.log on failure.)
> >>> Since dash won't help us to get the build time down sufficiently, and
> the
> >>> tests don't pass without a separate build tree, I looked at what makes
> >>> config/prep_buildtree so slow.
> >>>
> >>> It's largely just bad code. The slowest part are spawning one expr and
> mkdir
> >>> -p for each directory. One 'cmp' for each makefile doesn't help either.
> >>>
> >>> The expr can be replaced with
> >>>   subdir=${item#$sourcetree}
> >>> that's afaics posix syntax ([1]), not bash.
> >>>
> >>> Spawning one mkdir for each directory can be replaced by a single mkdir
> >>> invocation with all the directories. On my linux workstation that gets
> the
> >>> time for the first loop down from 1005ms to 38ms, really.
> >> Even better?
> >>
> >> (cd "$sourcetree" && find . -print |grep -E '/Makefile$|/GNUmakefile$'
> |grep -v "$sourcetree/doc/src/sgml/images/" |xargs tar c) |
> >> (cd "$buildtree" && tar x)
> > Don't think we depend on tar for building, at the moment. But yes, it'd
> be
> > faster... Tar is certainly a smaller dependency than perl, not sure if
> there's
> > any relevant platform without it?
> >
> >
>
>
> Couple of things here.
>
>
> 1. The second grep should lose "$sourcetree" I think.
>
> 2. Isn't this going to be a change in behaviour in that it will copy
> rather than symlinking the Makefiles? That is in fact the default
> behaviour of msys2's 'ln -s', as I pointed out upthread, but I don't
> think it's what we really want, especially in the general case. If you
> modify the Makefile and you're using a vpath you want to see the change
> reflected in your vpath.
>
>
> cheers
>
>
> andrew
>
> --
> Andrew Dunstan
> EDB: https://www.enterprisedb.com
>
>


v3-0001-Added-Windows-with-MinGW-environment-in-Cirrus-CI.patch
Description: Binary data

Re: COPY FROM FORMAT CSV FORCE_NULL(*) ?

2022-07-28 Thread Zhang Mingli

Hi,


Agree, FORCE_NULL(*) is useful as well as FORCE_NOT_NULL(*).

We can have them both.

They are useful when users copy tables that have many columns.

Regards,
Zhang Mingli
On Jul 25, 2022, 21:28 +0800, Andrew Dunstan , wrote:
>
> On 2022-07-25 Mo 00:18, jian he wrote:
> > Hi, there.
> >
> > copy force null git commit
> > 
> > didn't attach a discussion link. So I don't know if it's already been
> > discussed.
> >
> > Current seem you cannot do
> >     COPY forcetest  FROM STDIN WITH (FORMAT csv,  FORCE_NULL(*));
> >
> > can we have FORCE_NULL(*)? Since We already have FORCE_QUOTE(*).
> >
>
> We only started adding discussion links in later years. Here's a link to
> the original discussion.
>
>
> 
>
>
> Offhand I don't see why we shouldn't have this. Someone interested
> enough would need to submit a patch.
>
>
> cheers
>
>
> andrew
>
>
> --
> Andrew Dunstan
> EDB: https://www.enterprisedb.com
>
>
>

Re: small windows psqlrc re-wording

2022-07-28 Thread Tom Lane

Julien Rouhaud  writes:
> On Wed, Jul 27, 2022 at 02:42:11PM -0400, Robert Treat wrote:
>> The attached patch tweaks the wording around finding the psqlrc file
>> on windows, with the primary goal of removing the generally incorrect
>> statement that windows has no concept of a home directory.

> Windows only has a concept of home directory since Vista, so that used to be
> true.
> Anyway, since we don't support XP or anything older since about 3 weeks ago
> (495ed0ef2d72a6a74def296e042022479d5d07bd), +1 for the patch.

If all supported versions do have home directories now, should we
instead think about aligning the Windows behavior with everywhere
else?

regards, tom lane

Re: Handle infinite recursion in logical replication setup

2022-07-28 Thread vignesh C

On Thu, Jul 28, 2022 at 11:28 AM Peter Smith  wrote:
>
> Hi Vignesh.
>
> FYI the v39* patch fails to apply [1]. Can you please rebase it?
>
>
> [1]
> === Applying patches on top of PostgreSQL commit ID
> 5f858dd3bebd1f3845aef2bff7f4345bfb7b74b3 ===
> === applying patch
> ./v39-0001-Check-and-throw-an-error-if-publication-tables-w.patch
> patching file doc/src/sgml/ref/alter_subscription.sgml
> patching file doc/src/sgml/ref/create_subscription.sgml
> patching file src/backend/commands/subscriptioncmds.c
> Hunk #10 FAILED at 886.
> 1 out of 14 hunks FAILED -- saving rejects to file
> src/backend/commands/subscriptioncmds.c.rej
> patching file src/test/regress/expected/subscription.out
> patching file src/test/regress/sql/subscription.sql
> patching file src/test/subscription/t/030_origin.pl
> patching file src/tools/pgindent/typedefs.list
>
> --

Please find the v40 patch attached which is rebased on top of head.

Regards,
Vignesh
From e59c864801c066aff069deb528427788bd0cff0a Mon Sep 17 00:00:00 2001
From: Vigneshwaran C 
Date: Thu, 28 Jul 2022 19:11:35 +0530
Subject: [PATCH v40 1/2] Check and throw an error if publication tables were
 also subscribing from other publishers and support force value for copy_data
 parameter.

This patch does a couple of things:
1) Checks and throws an error if 'copy_data = on' and 'origin =
none' but the publication tables were also replicated from other publishers.
2) Adds 'force' value for copy_data parameter.

---
The steps below help to demonstrate how the new exception is useful:

The initial copy phase has no way to know the origin of the row data,
so if 'copy_data = on' in the step 4 below, then an error will be
thrown to prevent any potentially non-local data from being copied:

e.g.
CREATE SUBSCRIPTION sub_node2_node1 CONNECTION ''
PUBLICATION pub_node1 WITH (copy_data = on, origin = none);
ERROR:  CREATE/ALTER SUBSCRIPTION with origin = none and copy_data = on is
not allowed when the publisher might have replicated data.

---
The following steps help to demonstrate how the 'copy_data = force'
change will be useful:

Let's take a scenario where the user wants to set up bidirectional
logical replication between node1 and node2 where the same table on
node1 has pre-existing data and node2 has no pre-existing data.

e.g.
node1: Table t1 (c1 int) has data 11, 12, 13, 14
node2: Table t1 (c1 int) has no pre-existing data

The following steps are required in this case:
step 1:
node1=# CREATE PUBLICATION pub_node1 FOR TABLE t1;
CREATE PUBLICATION

step 2:
node2=# CREATE PUBLICATION pub_node2 FOR TABLE t1;
CREATE PUBLICATION

step 3:
node1=# CREATE SUBSCRIPTION sub_node1_node2 CONNECTION ''
node1-# PUBLICATION pub_node2;
CREATE SUBSCRIPTION

step 4:
node2=# CREATE SUBSCRIPTION sub_node2_node1 CONNECTION ''
node2-# PUBLICATION pub_node1;
CREATE SUBSCRIPTION

After the subscription is created on node2, node1 will be synced to
node2 and the newly synced data will be sent to node2. This process of
node1 sending data to node2 and node2 sending data to node1 will repeat
infinitely. If table t1 has a unique key, this will lead to a unique key
violation and replication won't proceed.

This problem can be avoided by using origin and copy_data parameters as given
below:
Step 1 & Step 2 are same as above.

step 3: Create a subscription on node1 to subscribe to node2:
node1=# CREATE SUBSCRIPTION sub_node1_node2 CONNECTION ''
node1-# PUBLICATION pub_node2 WITH (copy_data = off, origin = none);
CREATE SUBSCRIPTION

step 4: Create a subscription on node2 to subscribe to node1. Use
'copy_data = force' when creating a subscription to node1 so that the
existing table data is copied during initial sync:
node2=# CREATE SUBSCRIPTION sub_node2_node1 CONNECTION ''
node2-# PUBLICATION pub_node1 WITH (copy_data = force, origin = none);
CREATE SUBSCRIPTION

Author: Vignesh C
Reviewed-By: Peter Smith, Amit Kapila, Jonathan Katz, Shi yu, Wang wei
Discussion: https://www.postgresql.org/message-id/CALDaNm0gwjY_4HFxvvty01BOT01q_fJLKQ3pWP9=9orqubh...@mail.gmail.com
---
 doc/src/sgml/ref/alter_subscription.sgml   |  14 +-
 doc/src/sgml/ref/create_subscription.sgml  |  32 +-
 src/backend/commands/subscriptioncmds.c| 215 +++-
 src/test/regress/expected/subscription.out |  22 +-
 src/test/regress/sql/subscription.sql  |  14 +
 src/test/subscription/t/030_origin.pl  | 382 ++---
 src/tools/pgindent/typedefs.list   |   1 +
 7 files changed, 613 insertions(+), 67 deletions(-)

diff --git a/doc/src/sgml/ref/alter_subscription.sgml b/doc/src/sgml/ref/alter_subscription.sgml
index 64efc21f53..f4fb9c5282 100644
--- a/doc/src/sgml/ref/alter_subscription.sgml
+++ b/doc/src/sgml/ref/alter_subscription.sgml
@@ -161,12 +161,20 @@ ALTER SUBSCRIPTION name RENAME TO <

-copy_data (boolean)
+copy_d

Re: Patch to address creation of PgStat* contexts with null parent context

2022-07-28 Thread Zhang Mingli

Hi,

On Jul 28, 2022, 21:30 +0800, Reid Thompson , 
wrote:
> Hi,
>
> There are instances where pgstat_setup_memcxt() and
> pgstat_prep_pending_entry() are invoked before the CacheMemoryContext
> has been created.  This results in PgStat* contexts being created
> without a parent context.  Most easily reproduced/seen in autovacuum
> worker via pgstat_setup_memcxt().
>
> Attached is a patch to address this.
>
> To see the issue one can add a line similar to this to the top of
> MemoryContextCreate() in mcxt.c
> fprintf(stderr, "pid: %d ctxname: %s parent is %s CacheMemoryContext is 
> %s\n", MyProcPid, name, parent == NULL ? "NULL" : "not NULL", 
> CacheMemoryContext == NULL ? "NULL" : "Not NULL")
> and startup postgres and grep for the above after autovacuum workers
> have been invoked
>
> ...snip...
> pid: 1427756 ctxname: PgStat Pending parent is NULL CacheMemoryContext is NULL
> pid: 1427756 ctxname: PgStat Shared Ref parent is NULL CacheMemoryContext is 
> NULL
> ...snip...
>
> or
>
> startup postgres, attach gdb to postgres following child, break at
> pgstat_setup_memcxt and wait for autovacuum worker to start...
>
> ...snip...
> ─── Source 
> ─
>  384  AllocSetContextCreateInternal(MemoryContext parent,
>  385                                const char *name,
>  386                                Size minContextSize,
>  387                                Size initBlockSize,
>  388                                Size maxBlockSize)
>  389  {
>  390      int            freeListIndex;
>  391      Size        firstBlockSize;
>  392      AllocSet    set;
>  393      AllocBlock    block;
> ─── Stack 
> ──
> [0] from 0x55b7e4088b40 in AllocSetContextCreateInternal+0 at 
> /home/rthompso/src/git/postgres/src/backend/utils/mmgr/aset.c:389
> [1] from 0x55b7e3f41c88 in pgstat_setup_memcxt+2544 at 
> /home/rthompso/src/git/postgres/src/backend/utils/activity/pgstat_shmem.c:979
> [2] from 0x55b7e3f41c88 in pgstat_get_entry_ref+2648 at 
> /home/rthompso/src/git/postgres/src/backend/utils/activity/pgstat_shmem.c:410
> [3] from 0x55b7e3f420ea in pgstat_get_entry_ref_locked+26 at 
> /home/rthompso/src/git/postgres/src/backend/utils/activity/pgstat_shmem.c:598
> [4] from 0x55b7e3f3e2c4 in pgstat_report_autovac+36 at 
> /home/rthompso/src/git/postgres/src/backend/utils/activity/pgstat_database.c:68
> [5] from 0x55b7e3e7f267 in AutoVacWorkerMain+807 at 
> /home/rthompso/src/git/postgres/src/backend/postmaster/autovacuum.c:1694
> [6] from 0x55b7e3e7f435 in StartAutoVacWorker+133 at 
> /home/rthompso/src/git/postgres/src/backend/postmaster/autovacuum.c:1493
> [7] from 0x55b7e3e87367 in StartAutovacuumWorker+557 at 
> /home/rthompso/src/git/postgres/src/backend/postmaster/postmaster.c:5539
> [8] from 0x55b7e3e87367 in sigusr1_handler+935 at 
> /home/rthompso/src/git/postgres/src/backend/postmaster/postmaster.c:5244
> [9] from 0x7fb02bca7420 in __restore_rt
> [+]
> ─── Threads 
> 
> [1] id 1174088 name postgres from 0x55b7e4088b40 in 
> AllocSetContextCreateInternal+0 at 
> /home/rthompso/src/git/postgres/src/backend/utils/mmgr/aset.c:389
> ─── Variables 
> ──
> arg parent = 0x0, name = 0x55b7e416f179 "PgStat Shared Ref": 80 'P', 
> minContextSize = 0, initBlockSize = 1024, maxBlockSize = 8192
> loc firstBlockSize = , set = , block = 
> , __FUNCTION__ = "AllocSetContextCreateInternal", __func__ = 
> "AllocSetContextCreateInternal"
> 
> > > > print CacheMemoryContext == NULL
> $4 = 1
> > > > print parent
> $5 = (MemoryContext) 0x0
>
> Thanks,
> Reid
>
>


Codes seem good, my question is:

Do auto vacuum processes need CacheMemoryContext?

Is it designed not to  create CacheMemoryContext in such processes?

If so, we’d better use TopMemoryContext in such processes.

Regards,
Zhang Mingli

Re: [PATCH] Reuse Workers and Replication Slots during Logical Replication

2022-07-28 Thread Amit Kapila

On Wed, Jul 27, 2022 at 3:56 PM Melih Mutlu  wrote:
>
> Hi Amit,
>
> I updated the patch in order to prevent the problems that might be caused by 
> using different replication slots for syncing a table.
> As suggested in previous emails, replication slot names are stored in the 
> catalog. So slot names can be reached later and it is ensured
> that same replication slot is used during tablesync step of a table.
>
> With the current version of the patch:
> -. "srrelslotname" column is introduced into pg_subscibtion_rel catalog. It 
> stores the slot name for tablesync
>
> -. Tablesync worker logic is now as follows:
> 1. Tablesync worker is launched by apply worker for a table.
> 2. Worker generates a default replication slot name for itself. Slot name 
> includes subid and worker pid for tracking purposes.
> 3. If table has a slot name value in the catalog:
>
> i. If the table state is DATASYNC, drop the replication slot from the catalog 
> and proceed tablesync with a new slot.
>
> ii. If the table state is FINISHEDCOPY, use the replicaton slot from the 
> catalog, do not create a new slot.
>
> 4. Before worker moves to new table, drop any replication slot that are 
> retrieved from the catalog and used.
>

Why after step 4, do you need to drop the replication slot? Won't just
clearing the required info from the catalog be sufficient?

> 5. In case of no table left to sync, drop the replication slot of that sync 
> worker with worker pid if it exists. (It's possible that a sync worker do not 
> create a replication slot for itself but uses slots read from the catalog in 
> each iteration)
>
>
>> I think using worker_pid also has similar risks of mixing slots from
>> different workers because after restart same worker_pid could be
>> assigned to a totally different worker. Can we think of using a unique
>> 64-bit number instead? This will be allocated when each workers
>> started for the very first time and after that we can refer catalog to
>> find it as suggested in the idea below.
>
>
> I'm not sure how likely to have colliding pid's for different tablesync 
> workers in the same subscription.
>

Hmm, I think even if there is an iota of a chance which I think is
there, we can't use worker_pid. Assume, that if the same worker_pid is
assigned to another worker once the worker using it got an error out,
the new worker will fail as soon as it will try to create a
replication slot.

> Though ,having pid in slot name makes it easier to track which slot belongs 
> to which worker. That's why I kept using pid in slot names.
> But I think it should be simple to switch to using a unique 64-bit number. So 
> I can remove pid's from slot names, if you think that it would be better.
>

I feel it would be better or maybe we need to think of some other
identifier but one thing we need to think about before using a 64-bit
unique identifier here is how will we retrieve its last used value
after restart of server. We may need to store it in a persistent way
somewhere.
>>
>> We should use the same for the origin name as well.
>
>
> I did not really change anything related to origin names. Origin names are 
> still the same and include relation id. What do you think would be an issue 
> with origin names in this patch?
>

The problems will be similar to the slot name. The origin is used to
track the progress of replication, so, if we use the wrong origin name
after the restart, it can send the wrong start_streaming position to
the publisher.

-- 
With Regards,
Amit Kapila.

Re: Maximize page freezing

2022-07-28 Thread Matthias van de Meent

On Thu, 28 Jul 2022 at 15:36, Simon Riggs  wrote:
>
> Starting new thread with updated patch to avoid confusion, as
> mentioned by David Steele on the original thread:
> Original messageid: 20201118020418.GA13408@alvherre.pgsql
> On Wed, 18 Nov 2020 at 02:04, Alvaro Herrera  wrote:
> > On 2020-Nov-17, Simon Riggs wrote:
> >
> > > As an additional optimization, if we do find a row that needs freezing
> > > on a data block, we should simply freeze *all* row versions on the
> > > page, not just the ones below the selected cutoff. This is justified
> > > since writing the block is the biggest cost and it doesn't make much
> > > sense to leave a few rows unfrozen on a block that we are dirtying.
> >
> > Yeah.  We've had earlier proposals to use high and low watermarks: if any
> > tuple is past the high watermark, then freeze all tuples that are past
> > the low watermark.  However this is ancient thinking (prior to
> > HEAP_XMIN_FROZEN) and we don't need the low watermark to be different
> > from zero, since the original xid is retained anyway.
> >
> > So +1 for this idea.
>
> Updated patch attached.

Great idea, yet this patch seems to only freeze those tuples that are
located after the first to-be-frozen tuple. It should probably
re-visit earlier live tuples to potentially freeze those as well.

Kind regards,

Matthias van de Meent

Re: small windows psqlrc re-wording

2022-07-28 Thread Julien Rouhaud

Hi,

On Wed, Jul 27, 2022 at 02:42:11PM -0400, Robert Treat wrote:
>
> The attached patch tweaks the wording around finding the psqlrc file
> on windows, with the primary goal of removing the generally incorrect
> statement that windows has no concept of a home directory.

Windows only has a concept of home directory since Vista, so that used to be
true.

Anyway, since we don't support XP or anything older since about 3 weeks ago
(495ed0ef2d72a6a74def296e042022479d5d07bd), +1 for the patch.

Maximize page freezing

2022-07-28 Thread Simon Riggs

Starting new thread with updated patch to avoid confusion, as
mentioned by David Steele on the original thread:
Original messageid: 20201118020418.GA13408@alvherre.pgsql
On Wed, 18 Nov 2020 at 02:04, Alvaro Herrera  wrote:
> On 2020-Nov-17, Simon Riggs wrote:
>
> > As an additional optimization, if we do find a row that needs freezing
> > on a data block, we should simply freeze *all* row versions on the
> > page, not just the ones below the selected cutoff. This is justified
> > since writing the block is the biggest cost and it doesn't make much
> > sense to leave a few rows unfrozen on a block that we are dirtying.
>
> Yeah.  We've had earlier proposals to use high and low watermarks: if any
> tuple is past the high watermark, then freeze all tuples that are past
> the low watermark.  However this is ancient thinking (prior to
> HEAP_XMIN_FROZEN) and we don't need the low watermark to be different
> from zero, since the original xid is retained anyway.
>
> So +1 for this idea.

Updated patch attached.

-- 
Simon Riggshttp://www.EnterpriseDB.com/


one_freeze_then_max_freeze.v9.patch
Description: Binary data

Re: Perform streaming logical transactions by background workers and parallel apply

2022-07-28 Thread Amit Kapila

On Wed, Jul 27, 2022 at 1:27 PM houzj.f...@fujitsu.com
 wrote:
>
> On Wednesday, July 27, 2022 1:29 PM Dilip Kumar  wrote:
> >
> > On Wed, Jul 27, 2022 at 10:06 AM Amit Kapila 
> > >
> > > What kind of failure do you have in mind and how it can occur? The one
> > > way it can fail is if the publisher doesn't have a corresponding
> > > foreign key on the table because then the publisher could have allowed
> > > an insert into a table (insert into FK table without having the
> > > corresponding key in PK table) which may not be allowed on the
> > > subscriber. However, I don't see any check that could prevent this
> > > because for this we need to compare the FK list for a table from the
> > > publisher with the corresponding one on the subscriber. I am not
> > > really sure if due to the risk of such conflicts we should block
> > > parallelism of transactions operating on tables with FK because those
> > > conflicts can occur even without parallelism, it is just a matter of
> > > timing. But, I could be missing something due to which the above check
> > > can be useful?
> >
> > Actually, my question starts with this check[1][2], from this it
> > appears that if this relation is having a foreign key then we are
> > marking it parallel unsafe[2] and later in [1] while the worker is
> > applying changes for that relation and if it was marked parallel
> > unsafe then we are throwing error.  So my question was why we are
> > putting this restriction?  Although this error is only talking about
> > unique and non-immutable functions this is also giving an error if the
> > target table had a foreign key.  So my question was do we really need
> > to restrict this? I mean why we are restricting this case?
> >
>
> Hi,
>
> I think the foreign key check is used to prevent the apply worker from waiting
> indefinitely which is caused by foreign key difference between publisher and
> subscriber, Like the following example:
>
> -
> Publisher:
> -- both table are published
> CREATE TABLE PKTABLE ( ptest1 int);
> CREATE TABLE FKTABLE ( ftest1 int);
>
> -- initial data
> INSERT INTO PKTABLE VALUES(1);
>
> Subcriber:
> CREATE TABLE PKTABLE ( ptest1 int PRIMARY KEY);
> CREATE TABLE FKTABLE ( ftest1 int REFERENCES PKTABLE);
>
> -- Execute the following transactions on publisher
>
> Tx1:
> INSERT ... -- make enough changes to start streaming mode
> DELETE FROM PKTABLE;
> Tx2:
> INSERT ITNO FKTABLE VALUES(1);
> COMMIT;
> COMMIT;
> -
>
> The subcriber's apply worker will wait indefinitely, because the main apply 
> worker is
> waiting for the streaming transaction to finish which is in another apply
> bgworker.
>

IIUC, here the problem will be that TX2 (Insert in FK table) performed
by the apply worker will wait for a parallel worker doing streaming
transaction TX1 which has performed Delete from PK table. This wait is
required because we can't decide if Insert will be successful or not
till TX1 is either committed or Rollback. This is similar to the
problem related to primary/unique keys mentioned earlier [1]. If so,
then, we should try to forbid this in some way to avoid subscribers
from being stuck.

Dilip, does this reason sounds sufficient to you for such a check, or
do you still think we don't need any check for FK's?

>
> BTW, I think the foreign key won't take effect in subscriber's apply worker by
> default. Because we set session_replication_role to 'replica' in apply worker
> which prevent the FK trigger function to be executed(only the trigger with
> FIRES_ON_REPLICA flag will be executed in this mode). User can only alter the
> trigger to enable it on replica mode to make the foreign key work. So, ISTM, 
> we
> won't hit this ERROR frequently.
>
> And based on this, another comment about the patch is that it seems 
> unnecessary
> to directly check the FK returned by RelationGetFKeyList. Checking the actual 
> FK
> trigger function seems enough.
>

That is correct. I think it would have been better if we can detect
that publisher doesn't have FK but the subscriber has FK as it can
occur only in that scenario. If that requires us to send more
information from the publisher, we can leave it for now (as this
doesn't seem to be a frequent scenario) and keep a simpler check based
on subscriber schema.

I think we should add a test as mentioned by you above so that if
tomorrow one tries to remove the FK check, we have a way to know.
Also, please add comments and tests for additional checks related to
constraints in the patch.

[1] - 
https://www.postgresql.org/message-id/CAA4eK1JwahU_WuP3S%2B7POqta%3DPhm_3gxZeVmJuuoUq1NV%3DkrXA%40mail.gmail.com

-- 
With Regards,
Amit Kapila.

Re: [Refactor]Avoid to handle FORCE_NOT_NULL/FORCE_NULL options when COPY TO

2022-07-28 Thread Zhang Mingli

Hi, all

Assertions added.

Thanks for review.

Regards,

Zhang Mingli

Sent with a Spark
On Jul 27, 2022, 14:37 +0800, Richard Guo , wrote:
>
> > On Wed, Jul 27, 2022 at 12:55 PM Kyotaro Horiguchi 
> >  wrote:
> > > ProcessCopyOptions previously rejects force_quote_all for COPY FROM
> > > and copyfrom.c is not even conscious of the option (that is, even no
> > > assertion on it). The two options are rejected for COPY TO by the same
> > > function so it seems like a thinko of the commit. Getting rid of the
> > > code would be good in the view of code coverage and maintenance.
> >
> > Yeah, ProcessCopyOptions() does have the check for force_notnull and
> > force_null whether it is using COPY FROM and whether it is in CSV mode.
> > So the codes in copyto.c processing force_notnull/force_null are
> > actually dead codes.
> >
> > > On the otherhand I wonder if it is good that we have assertions on the
> > > option values.
> >
> > Agree. Assertions would be better.
> >
> > Thanks
> > Richard


vn-0002-Avoid-to-handle-FORCE_NOT_NULL-FORCE_NULL-options.patch
Description: Binary data

Re: Hash index build performance tweak from sorting

2022-07-28 Thread Simon Riggs

On Wed, 27 Jul 2022 at 19:22, Tom Lane  wrote:
>
> Simon Riggs  writes:
> > [ hash_sort_by_hash.v2.patch ]
>
> The cfbot says this no longer applies --- probably sideswiped by
> Korotkov's sorting-related commits last night.

Thanks for the nudge. New version attached.

-- 
Simon Riggshttp://www.EnterpriseDB.com/


hash_sort_by_hash.v3.patch
Description: Binary data

Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

2022-07-28 Thread Amit Kapila

On Thu, Jul 28, 2022 at 3:23 PM Amit Kapila  wrote:
>
> On Tue, Jul 26, 2022 at 1:22 PM Masahiko Sawada  wrote:
> >
> > Okay, I've attached an updated patch that does the above idea. Could
> > you please do the performance tests again to see if the idea can help
> > reduce the overhead, Shi yu?
> >
>
> While reviewing the patch for HEAD, I have changed a few comments. See
> attached, if you agree with these changes then include them in the
> next version.
>

I have another comment on this patch:
SnapBuildPurgeOlderTxn()
{
...
+ if (surviving_xids > 0)
+ memmove(builder->catchange.xip, &(builder->catchange.xip[off]),
+ surviving_xids * sizeof(TransactionId))
...

For this code to hit, we must have a situation where one or more of
the xacts in this array must be still running. And, if that is true,
we would not have started from the restart point where the
corresponding snapshot (that contains the still running xacts) has
been serialized because we advance the restart point to not before the
oldest running xacts restart_decoding_lsn. This may not be easy to
understand so let me take an example to explain. Say we have two
transactions t1 and t2, and both have made catalog changes. We want a
situation where one of those gets purged and the other remains in
builder->catchange.xip array. I have tried variants of the below
sequence to see if I can get into the required situation but am not
able to make it.

Session-1
Checkpoint -1;
T1
DDL

Session-2
T2
DDL

Session-3
Checkpoint-2;
pg_logical_slot_get_changes()
 -- Here when we serialize the snapshot corresponding to
CHECKPOINT-2's running_xact record, we will serialize both t1 and t2
as catalog-changing xacts.

Session-1
T1
Commit;

Checkpoint;
pg_logical_slot_get_changes()
 -- Here we will restore from Checkpoint-1's serialized snapshot and
won't be able to move restart_point to Checkpoint-2 because T2 is
still open.

Now, as per my understanding, it is only possible to move
restart_point to Checkpoint-2 if T2 gets committed/rolled-back in
which case we will never have that in surviving_xids array after the
purge.

It is possible I am missing something here. Do let me know your thoughts.

-- 
With Regards,
Amit Kapila.

Re: [Commitfest 2022-07] Patch Triage: Waiting on Author

2022-07-28 Thread Andrey Borodin




> 27 июля 2022 г., в 00:26, Jacob Champion  написал(а):
> 
> - libpq compression
>  https://commitfest.postgresql.org/38/3499/
> 
>  Needs a rebase and response to feedback; mostly quiet since January.

Daniil is working on this, but currently he's on vacation.
I think we should not mark patch as RwF and move it to next CF instead.

Thank you!

Best regards, Andrey Borodin.

Re: Handle infinite recursion in logical replication setup

2022-07-28 Thread vignesh C

On Thu, Jul 28, 2022 at 11:28 AM Peter Smith  wrote:
>
> Hi Vignesh.
>
> FYI the v39* patch fails to apply [1]. Can you please rebase it?
>
>
> [1]
> === Applying patches on top of PostgreSQL commit ID
> 5f858dd3bebd1f3845aef2bff7f4345bfb7b74b3 ===
> === applying patch
> ./v39-0001-Check-and-throw-an-error-if-publication-tables-w.patch
> patching file doc/src/sgml/ref/alter_subscription.sgml
> patching file doc/src/sgml/ref/create_subscription.sgml
> patching file src/backend/commands/subscriptioncmds.c
> Hunk #10 FAILED at 886.
> 1 out of 14 hunks FAILED -- saving rejects to file
> src/backend/commands/subscriptioncmds.c.rej
> patching file src/test/regress/expected/subscription.out
> patching file src/test/regress/sql/subscription.sql
> patching file src/test/subscription/t/030_origin.pl
> patching file src/tools/pgindent/typedefs.list

Thanks for reporting this, I will post an updated version for this soon.

Regards,
Vignesh

ci: update to freebsd 13.1 / remove minor versions from image names

2022-07-28 Thread Andres Freund

Hi,

Freebsd 13.0, so far used by CI, is out of support. I've changed the
image to be built against 13.1, so we can switch to that.

I suspect it'd be better to remove the minor version numbers from the
image name, so that switches from 13.0 -> 13.1 don't require CI
changes. Any argument against?

I can also see an argument for not having 13 in the image name, given
that the image is CI specific anyway? But perhaps we might want to have
a 13 and a 14 image for some debugging issue?

Greetings,

Andres Freund

Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

2022-07-28 Thread Amit Kapila

On Tue, Jul 26, 2022 at 1:22 PM Masahiko Sawada  wrote:
>
> Okay, I've attached an updated patch that does the above idea. Could
> you please do the performance tests again to see if the idea can help
> reduce the overhead, Shi yu?
>

While reviewing the patch for HEAD, I have changed a few comments. See
attached, if you agree with these changes then include them in the
next version.

-- 
With Regards,
Amit Kapila.

master_v9_amit.diff.patch
Description: Binary data

Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)

2022-07-28 Thread Nitin Jadhav

> > To understand the performance effects of the above, I have taken the
> > average of five checkpoints with the patch and without the patch in my
> > environment. Here are the results.
> > With patch: 269.65 s
> > Without patch: 269.60 s
>
> Those look like timed checkpoints - if the checkpoints are sleeping a
> part of the time, you're not going to see any potential overhead.

Yes. The above data is collected from timed checkpoints.

create table t1(a int);
insert into t1 select * from generate_series(1,1000);

I generated a lot of data by using the above queries which would in
turn trigger the checkpoint (wal).
---

> To see whether this has an effect you'd have to make sure there's a
> certain number of dirty buffers (e.g. by doing CREATE TABLE AS
> some_query) and then do a manual checkpoint and time how long that
> times.

For this case I have generated data by using below queries.

create table t1(a int);
insert into t1 select * from generate_series(1,800);

This does not trigger the checkpoint automatically. I have issued the
CHECKPOINT manually and measured the performance by considering an
average of 5 checkpoints. Here are the details.

With patch: 2.457 s
Without patch: 2.334 s

Please share your thoughts.

Thanks & Regards,
Nitin Jadhav

On Thu, Jul 7, 2022 at 5:34 AM Andres Freund  wrote:
>
> Hi,
>
> On 2022-06-13 19:08:35 +0530, Nitin Jadhav wrote:
> > > Have you measured the performance effects of this? On fast storage with 
> > > large
> > > shared_buffers I've seen these loops in profiles. It's probably fine, but 
> > > it'd
> > > be good to verify that.
> >
> > To understand the performance effects of the above, I have taken the
> > average of five checkpoints with the patch and without the patch in my
> > environment. Here are the results.
> > With patch: 269.65 s
> > Without patch: 269.60 s
>
> Those look like timed checkpoints - if the checkpoints are sleeping a
> part of the time, you're not going to see any potential overhead.
>
> To see whether this has an effect you'd have to make sure there's a
> certain number of dirty buffers (e.g. by doing CREATE TABLE AS
> some_query) and then do a manual checkpoint and time how long that
> times.
>
> Greetings,
>
> Andres Freund

Re: Checking pgwin32_is_junction() errors

2022-07-28 Thread Thomas Munro

Here's a better idea, now that I'm emboldened by having working CI for
Windows frankenbuilds, and since I broke some stuff in this area on
MSYS[1], which caused me to look more closely at this area.

Why don't we just nuke pgwin32_is_junction() from orbit, and teach
Windows how to lstat()?  We're already defining our own replacement
stat() used in both MSVC and MSYS builds along with our own junction
point-based symlink() and readlink() functions, and lstat() was
already suggested in a comment in win32stat.c.

There's one curious change in the draft patch attached: you can't
unlink() a junction point, you have to rmdir() it.  Previously, things
that traverse directories without ever calling pgwin32_is_junction()
would see junction points as S_ISDIR() and call rmdir(), which was OK,
but now they see S_ISLNK() and call unlink().  So I taught unlink() to
try both things.  Which is kinda weird, and not beautiful, especially
when combined with the existing looping weirdness.

0001 is a copy of v2 of Melih Mutlu's CI patch[2] to show cfbot how to
test this on MSYS (alongside the normal MSVC result), but that's not
part of this submission.

[1] 
https://www.postgresql.org/message-id/flat/b9ddf605-6b36-f90d-7c30-7b3e95c46276%40dunslane.net
[2] 
https://www.postgresql.org/message-id/flat/CAGPVpCSKS9E0An4%3De7ZDnme%2By%3DWOcQFJYJegKO8kE9%3Dgh8NJKQ%40mail.gmail.com
From ec732571a1fa88fefcf9834987114716eb9c5998 Mon Sep 17 00:00:00 2001
From: Melih Mutlu 
Date: Mon, 21 Feb 2022 14:46:05 +0300
Subject: [PATCH 1/3] Added Windows with MinGW environment in Cirrus CI

---
 .cirrus.yml | 79 +++--
 1 file changed, 65 insertions(+), 14 deletions(-)

diff --git a/.cirrus.yml b/.cirrus.yml
index f23d6cae55..3d8c4cd813 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -23,7 +23,6 @@ env:
   CHECKFLAGS: -Otarget
   PROVE_FLAGS: --timer
   PGCTLTIMEOUT: 120 # avoids spurious failures during parallel tests
-  TEMP_CONFIG: ${CIRRUS_WORKING_DIR}/src/tools/ci/pg_ci_base.conf
   PG_TEST_EXTRA: kerberos ldap ssl
 
 
@@ -338,13 +337,30 @@ task:
 cores_script: src/tools/ci/cores_backtrace.sh macos "${HOME}/cores"
 
 
+WINDOWS_ENVIRONMENT_BASE: &WINDOWS_ENVIRONMENT_BASE
+env:
+  # Half the allowed per-user CPU cores
+  CPUS: 4
+  # The default working dir is in a directory msbuild complains about
+  CIRRUS_WORKING_DIR: "c:/cirrus"
+  TEMP_CONFIG: ${CIRRUS_WORKING_DIR}/src/tools/ci/pg_ci_base.conf
+
+  # Avoids port conflicts between concurrent tap test runs
+  PG_TEST_USE_UNIX_SOCKETS: 1
+
+only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*windows.*'
+
+sysinfo_script: |
+  chcp
+  systeminfo
+  powershell -Command get-psdrive -psprovider filesystem
+  set
+
 task:
+  << : *WINDOWS_ENVIRONMENT_BASE
   name: Windows - Server 2019, VS 2019
 
   env:
-# Half the allowed per-user CPU cores
-CPUS: 4
-
 # Our windows infrastructure doesn't have test concurrency above the level
 # of a single vcregress test target. Due to that, it's useful to run prove
 # with multiple jobs. For the other tasks it isn't, because two sources
@@ -354,15 +370,11 @@ task:
 # likely can be improved upon further.
 PROVE_FLAGS: -j10 --timer
 
-# The default cirrus working dir is in a directory msbuild complains about
-CIRRUS_WORKING_DIR: "c:/cirrus"
 # Avoid re-installing over and over
 NO_TEMP_INSTALL: 1
 # git's tar doesn't deal with drive letters, see
 # https://postgr.es/m/b6782dc3-a7b0-ed56-175f-f8f54cb08d67%40dunslane.net
 TAR: "c:/windows/system32/tar.exe"
-# Avoids port conflicts between concurrent tap test runs
-PG_TEST_USE_UNIX_SOCKETS: 1
 PG_REGRESS_SOCK_DIR: "c:/cirrus/"
 # -m enables parallelism
 # verbosity:minimal + Summary reduce verbosity, while keeping a summary of
@@ -393,12 +405,6 @@ task:
 cpu: $CPUS
 memory: 4G
 
-  sysinfo_script: |
-chcp
-systeminfo
-powershell -Command get-psdrive -psprovider filesystem
-set
-
   setup_additional_packages_script: |
 REM choco install -y --no-progress ...
 
@@ -456,6 +462,51 @@ task:
   path: "crashlog-*.txt"
   type: text/plain
 
+task:
+  << : *WINDOWS_ENVIRONMENT_BASE
+  name: Windows - Server 2019, MinGW64
+  windows_container:
+image: $CONTAINER_REPO/windows_ci_mingw64:latest
+cpu: $CPUS
+memory: 4G
+  env:
+CCACHE_DIR: C:/msys64/ccache
+BUILD_DIR: "%CIRRUS_WORKING_DIR%/build"
+
+  ccache_cache:
+folder: ${CCACHE_DIR}
+
+  mingw_info_script:
+- C:\msys64\usr\bin\bash.exe -lc "where gcc"
+- C:\msys64\usr\bin\bash.exe -lc "gcc --version"
+- C:\msys64\usr\bin\bash.exe -lc "where perl"
+- C:\msys64\usr\bin\bash.exe -lc "perl --version"
+
+  configure_script:
+- C:\msys64\usr\bin\bash.exe -lc "mkdir %BUILD_DIR% &&
+  cd %BUILD_DIR% &&
+  %CIRRUS_WORKING_DIR%/configure
+--host=x86_64-w64-

Re: Data is copied twice when specifying both child and parent table in publication

2022-07-28 Thread Peter Smith

Here are some review comments for the HEAD_v7-0001 patch:

==

1. 

I have a fundamental question about this patch.

IIUC the purpose of this patch is to ensure that (when
publish_via_root = true) the copy of the partition data will happen
only once (e.g. from one parent table on one of the publishers). But I
think there is no guarantee that those 2 publishers even had the same
data, right? Therefore it seems to me you could get different results
if the data were copied from pub1 or from pub2. (I have not tried it -
this is just my suspicion).

Am I correct or mistaken? If correct, then it means there is a big
(but subtle) difference related to the ordering of processing ...  A)
is this explicitly documented so the user knows what data to expect?
B) is the effect of different ordering tested anywhere? Actually, I
have no idea what exactly determines the order – is it the original
CREATE SUBSCRIPTION publication list order? Is it the logic of the
pg_get_publication_tables function? Is it the SQL in function
fetch_table_list? Or is it not deterministic at all? Please confirm
it.

==

2. Commit message.

2a.

If there are two publications that publish the parent table and the child table
separately, and both specify the option publish_via_partition_root, subscribing
to both publications from one subscription causes initial copy twice.

SUGGESTION
If there are two publications that publish the parent table and the child table
respectively, but both specify publish_via_partition_root = true, subscribing
to both publications from one subscription causes initial copy twice.

2b. 

Actually, isn't it more subtle than what that comment is describing?
Maybe nobody is explicitly publishing a parent table at all. Maybe
pub1 publishes partition1 and pub2 publishes partition2, but both
publications are using publish_via_partition_root = true. Is this
scenario even tested? Does the logic of pg_get_publication_tables
cover this scenario?

==

3. src/backend/catalog/pg_publication.c - pg_get_publication_tables

 pg_get_publication_tables(PG_FUNCTION_ARGS)
 {
 #define NUM_PUBLICATION_TABLES_ELEM 3
- FuncCallContext *funcctx;
- char*pubname = text_to_cstring(PG_GETARG_TEXT_PP(0));
- Publication *publication;
- List*tables;
+ FuncCallContext *funcctx;
+ Publication *publication;

Something seems strange about having a common Publication declaration.
Firstly it is used to represent every publication element in the array
loop. Later, it is overwritten to represent a single publication.

I think it might be easier if you declare these variables separately:

E.g.1
Publication *pub_elem; -- for the array element processing declared
within the for loop

E.g.2
Publication *pub; -- declared within if (funcctx->call_cntr <
list_length(results))

~~~

4.

+ /* Filter by final published table. */
+ foreach(lc, results)
+ {
+ Oid *table_info = (Oid *) lfirst(lc);
+
+ if (!list_member_oid(tables, table_info[0]))
+ results = foreach_delete_current(results, lc);
  }

The comment did not convey enough meaning. Can you make it more
descriptive to explain why/what the logic is doing here?

==

5. src/backend/commands/subscriptioncmds.c - fetch_table_list

  /* Get column lists for each relation if the publisher supports it */
- if (check_columnlist)
- appendStringInfoString(&cmd, ", t.attnames\n");
+ if (server_version >= 16)
+ appendStringInfo(&cmd, "SELECT DISTINCT n.nspname, c.relname,\n"

That comment is exactly the same as it was before the patch. But it
doesn't seem quite appropriate anymore for this new condition and this
new query.

~~~

6.

+ /*
+ * Get the list of tables from publisher, the partition table whose
+ * ancestor is also in this list should be ignored, otherwise the
+ * initial data in the partition table would be replicated twice.
+ */

Why say "should be ignored" --  don’t you mean "will be" or "must be" or "is".

~~~

7.

  initStringInfo(&cmd);
- appendStringInfoString(&cmd, "SELECT DISTINCT t.schemaname, t.tablename \n");

  /* Get column lists for each relation if the publisher supports it */
- if (check_columnlist)
- appendStringInfoString(&cmd, ", t.attnames\n");
+ if (server_version >= 16)
+ appendStringInfo(&cmd, "SELECT DISTINCT n.nspname, c.relname,\n"
+ "  ( CASE WHEN (array_length(gpt.attrs, 1) = c.relnatts)\n"
+ " THEN NULL ELSE gpt.attrs END\n"
+ "  ) AS attnames\n"
+ " FROM pg_class c\n"
+ "   JOIN pg_namespace n ON n.oid = c.relnamespace\n"
+ "   JOIN ( SELECT (pg_get_publication_tables(VARIADIC
array_agg(pubname::text))).*\n"
+ "  FROM pg_publication\n"
+ "  WHERE pubname IN ( %s )) as gpt\n"
+ "   ON gpt.relid = c.oid\n",
+ pub_names.data);
+ else
+ {
+ /*
+ * Get the list of tables from publisher, the partition table whose
+ * ancestor is also in this list should be ignored, otherwise the
+ * initial data in the partition table would be replicated twice.
+ */

- appendStringInfoString(&cmd, "FROM pg_catalog

Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

2022-07-28 Thread Amit Kapila

On Thu, Jul 28, 2022 at 12:56 PM Masahiko Sawada  wrote:
>
> On Thu, Jul 28, 2022 at 4:13 PM Amit Kapila  wrote:
> >
> > >
> > > While editing back branch patches, I realized that the following
> > > (parsed->xinfo & XACT_XINFO_HAS_INVALS) and (parsed->nmsgs > 0) are
> > > equivalent:
> > >
> > > +   /*
> > > +* If the COMMIT record has invalidation messages, it could have 
> > > catalog
> > > +* changes. It is possible that we didn't mark this transaction as
> > > +* containing catalog changes when the decoding starts from a commit
> > > +* record without decoding the transaction's other changes. So, we 
> > > ensure
> > > +* to mark such transactions as containing catalog change.
> > > +*
> > > +* This must be done before SnapBuildCommitTxn() so that we can 
> > > include
> > > +* these transactions in the historic snapshot.
> > > +*/
> > > +   if (parsed->xinfo & XACT_XINFO_HAS_INVALS)
> > > +   SnapBuildXidSetCatalogChanges(ctx->snapshot_builder, xid,
> > > + parsed->nsubxacts, parsed->subxacts,
> > > + buf->origptr);
> > > +
> > > /*
> > >  * Process invalidation messages, even if we're not interested in the
> > >  * transaction's contents, since the various caches need to always be
> > >  * consistent.
> > >  */
> > > if (parsed->nmsgs > 0)
> > > {
> > > if (!ctx->fast_forward)
> > > ReorderBufferAddInvalidations(ctx->reorder, xid, buf->origptr,
> > >   parsed->nmsgs, parsed->msgs);
> > > ReorderBufferXidSetCatalogChanges(ctx->reorder, xid, 
> > > buf->origptr);
> > > }
> > >
> > > If that's right, I think we can merge these if branches. We can call
> > > ReorderBufferXidSetCatalogChanges() for top-txn and in
> > > SnapBuildXidSetCatalogChanges() we mark its subtransactions if top-txn
> > > is in the list. What do you think?
> > >
> >
> > Note that this code doesn't exist in 14 and 15, so we need to create
> > different patches for those.
>
> Right.
>

Okay, then this sounds reasonable to me.

-- 
With Regards,
Amit Kapila.

RE: Skip partition tuple routing with constant partition key

2022-07-28 Thread houzj.f...@fujitsu.com

On Thursday, July 28, 2022 10:59 AM David Rowley  wrote:
> On Thu, 28 Jul 2022 at 00:50, Amit Langote 
> wrote:
> > So, in a way the caching scheme works for LIST partitioning only if
> > the same value appears consecutively in the input set, whereas it does
> > not for *a set of* values belonging to the same partition appearing
> > consecutively.  Maybe that's a reasonable restriction for now.
> 
> I'm not really seeing another cheap enough way of doing that. Any LIST
> partition could allow any number of values. We've only space to record
> 1 of those values by way of recording which element in the PartitionBound that
> it was located.
> 
> > if (part_index < 0)
> > -   part_index = boundinfo->default_index;
> > +   {
> > +   /*
> > +* Since we don't do caching for the default partition or failed
> > +* lookups, we'll just wipe the cache fields back to their initial
> > +* values.  The count becomes 0 rather than 1 as 1 means it's the
> > +* first time we've found a partition we're recording for the cache.
> > +*/
> > +   partdesc->last_found_datum_index = -1;
> > +   partdesc->last_found_part_index = -1;
> > +   partdesc->last_found_count = 0;
> > +
> > +   return boundinfo->default_index;
> > +   }
> >
> > I wonder why not to leave the cache untouched in this case?  It's
> > possible that erratic rows only rarely occur in the input sets.
> 
> I looked into that and I ended up just removing the code to reset the cache. 
> It
> now works similarly to a LIST partitioned table's NULL partition.
> 
> > Should the comment update above get_partition_for_tuple() mention
> > something like the cached path is basically O(1) and the non-cache
> > path O (log N) as I can see in comments in some other modules, like
> > pairingheap.c?
> 
> I adjusted for the other things you mentioned but I didn't add the big O 
> stuff. I
> thought the comment was clear enough.
> 
> I'd quite like to push this patch early next week, so if anyone else is 
> following
> along that might have any objections, could they do so before then?

Thanks for the patch. The patch looks good to me.

Only a minor nitpick:

+   /*
+* For LIST partitioning, this is the number of times in a row that the
+* the datum we're looking

It seems a duplicate 'the' word in this comment.
"the the datum".

Best regards,
Hou Zhijie

[PATCH] BUG FIX: redo will abort, due to inconsistent page found in BRIN_REGULAR_PAGE

2022-07-28 Thread 王海洋

Hi hackers,

I found that when wal_consistency_checking = brin is set, it may cause redo
abort, all the standby-nodes lost, and the primary node can not be restart.

This bug exists in all versions of PostgreSQL.

The operation steps are as follows:

1. Create a primary instance, set wal_consistency_checking = brin, and
start the primary instance.

initdb -D pg_test
echo "wal_consistency_checking = brin" >> pg_test/postgresql.conf
echo "port=53320" >> pg_test/postgresql.conf
pg_ctl start -D pg_test -l pg_test.logfile

2. Create a standby instance.

pg_basebackup -R -p 53320 -D pg_test_slave
echo "wal_consistency_checking = brin" >>
pg_test_slave/postgresql.conf
echo "port=53321" >> pg_test_slave/postgresql.conf
pg_ctl start -D pg_test_slave -l pg_test_slave.logfile

3. Execute brin_redo_abort.sql through psql, and find that the standby
machine is lost.

psql -p 53320 -f brin_redo_abort.sql

4. The standby instance is lost during redo, FATAL messages as follows:

FATAL:  inconsistent page found, rel 1663/12978/16387, forknum 0,
blkno 2

5. The primary instance cannot be restarted through pg_ctl restart -mi.

pg_ctl restart -D pg_test -mi -l pg_test.logfile

6. FATAL messages when restart primary instance as follows:

FATAL:  inconsistent page found, rel 1663/12978/16387, forknum 0,
blkno 2

I analyzed the reasons as follows:

1. When the revmap needs to be extended by brinRevmapExtend,
we may set BRIN_EVACUATE_PAGE flag on a REGULAR_PAGE to prevent
other concurrent backends from adding more BrinTuple to that page
in brin_start_evacuating_page.

2. But, during redo-process, it is not needed to set BRIN_EVACUATE_PAGE
flag on that REGULAR_PAGE after removing the old BrinTuple in
brin_xlog_update, since no one will add BrinTuple to that Page at
this time.

3. As a result, this will cause a FATAL message to be thrown in
CheckXLogConsistency after redo, due to inconsistency checking of
the BRIN_EVACUATE_PAGE flag, finally cause redo to abort.

4. Therefore, the BRIN_EVACUATE_PAGE flag should be cleared before
CheckXLogConsistency.


For the above reasons, the patch file, sql file, shell script file, and the
log files are given in the attachment.

Best Regards!
Haiyang Wang


brin_redo_abort.sh
Description: Binary data


pg_test.logfile
Description: Binary data


pg_test_slave.logfile
Description: Binary data


brin_redo_abort.sql
Description: Binary data


0001-clear-BRIN_EVACUATE_PAGE-before-consistency-checking.patch
Description: Binary data

Re: [BUG] Panic due to incorrect missingContrecPtr after promotion

2022-07-28 Thread alvhe...@alvh.no-ip.org

Hello,

On 2022-Jun-29, Imseih (AWS), Sami wrote:

> > Would you mind trying the second attached to abtain detailed log on
> > your testing environment? With the patch, the modified TAP test yields
> > the log lines like below.
> 
> Thanks for this. I will apply this to the testing environment and
> will share the output.

Any luck with this?

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Doing what he did amounts to sticking his fingers under the hood of the
implementation; if he gets his fingers burnt, it's his problem."  (Tom Lane)

Any way to get timestamp from LSN value ?

2022-07-28 Thread Harinath Kanchu

Hello,

Is there any way to get the timestamp of the transaction using LSN value ?

For example: 
can we use the minimum recovery ending location in pg control file to get the 
minimum recovery timestamp ?

Minimum recovery ending location: 28/28000B68


Thanks in advance,

Best,
Harinath.

Re: Skip partition tuple routing with constant partition key

2022-07-28 Thread Amit Langote

On Thu, Jul 28, 2022 at 11:59 AM David Rowley  wrote:
> On Thu, 28 Jul 2022 at 00:50, Amit Langote  wrote:
> > So, in a way the caching scheme works for
> > LIST partitioning only if the same value appears consecutively in the
> > input set, whereas it does not for *a set of* values belonging to the
> > same partition appearing consecutively.  Maybe that's a reasonable
> > restriction for now.
>
> I'm not really seeing another cheap enough way of doing that. Any LIST
> partition could allow any number of values. We've only space to record
> 1 of those values by way of recording which element in the
> PartitionBound that it was located.

Yeah, no need to complicate the implementation for the LIST case.

> > if (part_index < 0)
> > -   part_index = boundinfo->default_index;
> > +   {
> > +   /*
> > +* Since we don't do caching for the default partition or failed
> > +* lookups, we'll just wipe the cache fields back to their initial
> > +* values.  The count becomes 0 rather than 1 as 1 means it's the
> > +* first time we've found a partition we're recording for the cache.
> > +*/
> > +   partdesc->last_found_datum_index = -1;
> > +   partdesc->last_found_part_index = -1;
> > +   partdesc->last_found_count = 0;
> > +
> > +   return boundinfo->default_index;
> > +   }
> >
> > I wonder why not to leave the cache untouched in this case?  It's
> > possible that erratic rows only rarely occur in the input sets.
>
> I looked into that and I ended up just removing the code to reset the
> cache. It now works similarly to a LIST partitioned table's NULL
> partition.

+1

> > Should the comment update above get_partition_for_tuple() mention
> > something like the cached path is basically O(1) and the non-cache
> > path O (log N) as I can see in comments in some other modules, like
> > pairingheap.c?
>
> I adjusted for the other things you mentioned but I didn't add the big
> O stuff. I thought the comment was clear enough.

WFM.

> I'd quite like to push this patch early next week, so if anyone else
> is following along that might have any objections, could they do so
> before then?

I have no more comments.

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com

Re: Improve description of XLOG_RUNNING_XACTS

2022-07-28 Thread Kyotaro Horiguchi

At Thu, 28 Jul 2022 15:53:33 +0900, Masahiko Sawada  
wrote in 
> >
> Do you mean that both could be true at the same time? If I read
> GetRunningTransactionData() correctly, that doesn't happen.

So, I wrote "since it is debugging output", and "fine if we asuume the
record is sound". Is it any trouble with assuming the both *can*
happen at once?  If something's broken, it will be reflected in the
output.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Refactoring postgres_fdw/connection.c

2022-07-28 Thread Kyotaro Horiguchi

At Thu, 28 Jul 2022 15:26:42 +0900, Fujii Masao  
wrote in 
> 
> 
> On 2022/07/27 10:36, Kyotaro Horiguchi wrote:
> > At Tue, 26 Jul 2022 18:33:04 +0900, Fujii Masao
> >  wrote in
> > I didn't see it from that viewpoint but I don't think that
> > unconditionally justifies other refactoring.  If we merge
> > pgfdw_finish_pre_(sub)?commit_cleanup()s this way, in turn
> > pgfdw_subxact_callback() and pgfdw_xact_callback() are going to be
> > almost identical except event IDs to handle. But I don't think we
> > would want to merge them.
> 
> I don't think they are so identical because (as you say) they have to
> handle different event IDs. So I agree we don't want to merge them.

The xact_callback and subxact_callback handles different sets of event
IDs so they can be merged into one switch().  I don't think there's
much difference from merging the functions for xact and subxact into
one rountine then calling it with a flag to chose one of the two
paths. (Even though less-than-half lines of the fuction are shared..)
However, still I don't think they ought to be merged.

> > A concern on 0002 is that it is hiding the subxact-specific steps from
> > the subxact callback.  It would look reasonable if it were called from
> > two or more places for each topleve and !toplevel, but actually it has
> > only one caller for each.  So I think that pgfdw_exec_pre_commit
> > should not do that and should be renamed to pgfdw_commit_remote() or
> > something.  On the other hand pgfdw_finish_pre_commit() hides
> > toplevel-specific steps from the caller so the same argument holds.
> 
> So you conclusion is to rename pgfdw_exec_pre_commit() to
> pgfdw_commit_remote() or something?

And the remote stuff is removed from the function.  That being said, I
don't mean to fight this no longer since that is rather a matter of
taste.

> > Another point that makes me concern about the patch is the new
> > function takes an SQL statement, along with the toplevel flag. I guess
> > the reason is that the command for subxact (RELEASE SAVEPOINT %d)
> > requires the current transaction level.  However, the values
> > isobtainable very cheap within the cleanup functions. So I propose to
> > get rid of the parameter "sql" from the two functions.
> 
> Yes, that's possible. That is, pgfdw_exec_pre_commit() can construct
> the query string by executing the following codes, instead of
> accepting the query as an argument. But one downside of this approach
> is that the following codes are executed for every remote
> subtransaction entries. Maybe it's cheap to construct the query string
> as follows, but I'd like to avoid any unnecessray overhead if
> possible. So the patch makes the caller, pgfdw_subxact_callback(),
> construct the query string only once and give it to
> pgfdw_exec_pre_commit().
> 
>   curlevel = GetCurrentTransactionNestLevel();
>   snprintf(sql, sizeof(sql), "RELEASE SAVEPOINT s%d", curlevel);

That *overhead* has been there and I'm not sure how much actual impact
it gives on performance (comparing to the surrounding code).  But I
would choose leaving it open-coded as-is than turning it into a
function that need two tightly-bonded parameters passed and that also
tightly bonded to the caller via the parameters. ...In other words,
the original code doesn't seem to meet the requirement for a function.

However, it's okay if you prefer the functions than the open-coded
lines based on the above discussion, I'd stop objecting.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

2022-07-28 Thread Masahiko Sawada

On Thu, Jul 28, 2022 at 4:13 PM Amit Kapila  wrote:
>
> On Thu, Jul 28, 2022 at 11:56 AM Masahiko Sawada  
> wrote:
> >
> > On Thu, Jul 28, 2022 at 12:21 PM Amit Kapila  
> > wrote:
> > >
> > > On Thu, Jul 28, 2022 at 7:18 AM Masahiko Sawada  
> > > wrote:
> > > >
> >
> > While editing back branch patches, I realized that the following
> > (parsed->xinfo & XACT_XINFO_HAS_INVALS) and (parsed->nmsgs > 0) are
> > equivalent:
> >
> > +   /*
> > +* If the COMMIT record has invalidation messages, it could have catalog
> > +* changes. It is possible that we didn't mark this transaction as
> > +* containing catalog changes when the decoding starts from a commit
> > +* record without decoding the transaction's other changes. So, we 
> > ensure
> > +* to mark such transactions as containing catalog change.
> > +*
> > +* This must be done before SnapBuildCommitTxn() so that we can include
> > +* these transactions in the historic snapshot.
> > +*/
> > +   if (parsed->xinfo & XACT_XINFO_HAS_INVALS)
> > +   SnapBuildXidSetCatalogChanges(ctx->snapshot_builder, xid,
> > + parsed->nsubxacts, parsed->subxacts,
> > + buf->origptr);
> > +
> > /*
> >  * Process invalidation messages, even if we're not interested in the
> >  * transaction's contents, since the various caches need to always be
> >  * consistent.
> >  */
> > if (parsed->nmsgs > 0)
> > {
> > if (!ctx->fast_forward)
> > ReorderBufferAddInvalidations(ctx->reorder, xid, buf->origptr,
> >   parsed->nmsgs, parsed->msgs);
> > ReorderBufferXidSetCatalogChanges(ctx->reorder, xid, buf->origptr);
> > }
> >
> > If that's right, I think we can merge these if branches. We can call
> > ReorderBufferXidSetCatalogChanges() for top-txn and in
> > SnapBuildXidSetCatalogChanges() we mark its subtransactions if top-txn
> > is in the list. What do you think?
> >
>
> Note that this code doesn't exist in 14 and 15, so we need to create
> different patches for those.

Right.

> BTW, how in 13 and lower versions did we
> identify and mark subxacts as having catalog changes without our
> patch?

I think we use HEAP_INPLACE and XLOG_HEAP2_NEW_CID to mark subxacts as well.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/

Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

2022-07-28 Thread Amit Kapila

On Thu, Jul 28, 2022 at 11:56 AM Masahiko Sawada  wrote:
>
> On Thu, Jul 28, 2022 at 12:21 PM Amit Kapila  wrote:
> >
> > On Thu, Jul 28, 2022 at 7:18 AM Masahiko Sawada  
> > wrote:
> > >
>
> While editing back branch patches, I realized that the following
> (parsed->xinfo & XACT_XINFO_HAS_INVALS) and (parsed->nmsgs > 0) are
> equivalent:
>
> +   /*
> +* If the COMMIT record has invalidation messages, it could have catalog
> +* changes. It is possible that we didn't mark this transaction as
> +* containing catalog changes when the decoding starts from a commit
> +* record without decoding the transaction's other changes. So, we ensure
> +* to mark such transactions as containing catalog change.
> +*
> +* This must be done before SnapBuildCommitTxn() so that we can include
> +* these transactions in the historic snapshot.
> +*/
> +   if (parsed->xinfo & XACT_XINFO_HAS_INVALS)
> +   SnapBuildXidSetCatalogChanges(ctx->snapshot_builder, xid,
> + parsed->nsubxacts, parsed->subxacts,
> + buf->origptr);
> +
> /*
>  * Process invalidation messages, even if we're not interested in the
>  * transaction's contents, since the various caches need to always be
>  * consistent.
>  */
> if (parsed->nmsgs > 0)
> {
> if (!ctx->fast_forward)
> ReorderBufferAddInvalidations(ctx->reorder, xid, buf->origptr,
>   parsed->nmsgs, parsed->msgs);
> ReorderBufferXidSetCatalogChanges(ctx->reorder, xid, buf->origptr);
> }
>
> If that's right, I think we can merge these if branches. We can call
> ReorderBufferXidSetCatalogChanges() for top-txn and in
> SnapBuildXidSetCatalogChanges() we mark its subtransactions if top-txn
> is in the list. What do you think?
>

Note that this code doesn't exist in 14 and 15, so we need to create
different patches for those. BTW, how in 13 and lower versions did we
identify and mark subxacts as having catalog changes without our
patch?

-- 
With Regards,
Amit Kapila.

Re: LogwrtResult contended spinlock

2022-07-28 Thread Alvaro Herrera

v10 is just a trivial rebase.  No changes.  Moved to next commitfest.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
>From 86655a0231e277c3f1bc907a0f0eb669943d4c71 Mon Sep 17 00:00:00 2001
From: Alvaro Herrera 
Date: Tue, 2 Feb 2021 14:03:43 -0300
Subject: [PATCH v10] Make XLogCtl->LogwrtResult accessible with atomics
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Currently, access to LogwrtResult is protected by a spinlock.  This
becomes severely contended in some scenarios, such as with a largish
replication flock: walsenders all calling GetFlushRecPtr repeatedly
cause the processor heat up to the point where eggs can be fried on top.

This can be reduced to a non-problem by replacing XLogCtl->LogwrtResult
with a struct containing a pair of atomically accessed variables. Do so.
In a few places, we can adjust the exact location where the locals are
updated to account for the fact that we no longer need the spinlock.

Author: Álvaro Herrera 
Discussion: https://postgr.es/m/20200831182156.GA3983@alvherre.pgsql
---
 src/backend/access/transam/xlog.c | 106 ++
 src/include/port/atomics.h|  29 
 src/tools/pgindent/typedefs.list  |   1 +
 3 files changed, 78 insertions(+), 58 deletions(-)

diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index 15ab8d90d4..29ebd38103 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -287,16 +287,13 @@ static bool doPageWrites;
  *
  * LogwrtRqst indicates a byte position that we need to write and/or fsync
  * the log up to (all records before that point must be written or fsynced).
- * LogwrtResult indicates the byte positions we have already written/fsynced.
- * These structs are identical but are declared separately to indicate their
- * slightly different functions.
+ * LogWrtResult indicates the byte positions we have already written/fsynced.
+ * These structs are similar but are declared separately to indicate their
+ * slightly different functions; in addition, the latter is read and written
+ * using atomic operations.
  *
- * To read XLogCtl->LogwrtResult, you must hold either info_lck or
- * WALWriteLock.  To update it, you need to hold both locks.  The point of
- * this arrangement is that the value can be examined by code that already
- * holds WALWriteLock without needing to grab info_lck as well.  In addition
- * to the shared variable, each backend has a private copy of LogwrtResult,
- * which is updated when convenient.
+ * In addition to the shared variable, each backend has a private copy of
+ * LogwrtResult, each member of which is separately updated when convenient.
  *
  * The request bookkeeping is simpler: there is a shared XLogCtl->LogwrtRqst
  * (protected by info_lck), but we don't need to cache any copies of it.
@@ -319,6 +316,12 @@ static bool doPageWrites;
  *--
  */
 
+typedef struct XLogwrtAtomic
+{
+	pg_atomic_uint64 Write;		/* last byte + 1 written out */
+	pg_atomic_uint64 Flush;		/* last byte + 1 flushed */
+} XLogwrtAtomic;
+
 typedef struct XLogwrtRqst
 {
 	XLogRecPtr	Write;			/* last byte + 1 to write out */
@@ -456,6 +459,7 @@ typedef struct XLogCtlData
 {
 	XLogCtlInsert Insert;
 
+	XLogwrtAtomic LogwrtResult; /* uses atomics */
 	/* Protected by info_lck: */
 	XLogwrtRqst LogwrtRqst;
 	XLogRecPtr	RedoRecPtr;		/* a recent copy of Insert->RedoRecPtr */
@@ -473,12 +477,6 @@ typedef struct XLogCtlData
 	pg_time_t	lastSegSwitchTime;
 	XLogRecPtr	lastSegSwitchLSN;
 
-	/*
-	 * Protected by info_lck and WALWriteLock (you must hold either lock to
-	 * read it, but both to update)
-	 */
-	XLogwrtResult LogwrtResult;
-
 	/*
 	 * Latest initialized page in the cache (last byte position + 1).
 	 *
@@ -598,7 +596,7 @@ static ControlFileData *ControlFile = NULL;
 static int	UsableBytesInSegment;
 
 /*
- * Private, possibly out-of-date copy of shared LogwrtResult.
+ * Private, possibly out-of-date copy of shared XLogCtl->LogwrtResult.
  * See discussion above.
  */
 static XLogwrtResult LogwrtResult = {0, 0};
@@ -907,8 +905,6 @@ XLogInsertRecord(XLogRecData *rdata,
 		/* advance global request to include new block(s) */
 		if (XLogCtl->LogwrtRqst.Write < EndPos)
 			XLogCtl->LogwrtRqst.Write = EndPos;
-		/* update local result copy while I have the chance */
-		LogwrtResult = XLogCtl->LogwrtResult;
 		SpinLockRelease(&XLogCtl->info_lck);
 	}
 
@@ -1786,6 +1782,7 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, TimeLineID tli, bool opportunistic)
 	 * Now that we have the lock, check if someone initialized the page
 	 * already.
 	 */
+	LogwrtResult.Write = pg_atomic_read_u64(&XLogCtl->LogwrtResult.Write);
 	while (upto >= XLogCtl->InitializedUpTo || opportunistic)
 	{
 		nextidx = XLogRecPtrToBufIdx(XLogCtl->InitializedUpTo);
@@ -1805,17 +1802,18 @@ AdvanceXLInsertBuffer(XLogRecPtr upto, TimeLineID tli, bool opportunistic)
 			if (opportunistic)
 break;
 
-

1 2 >

100 matches

Mail list logo