RE: pg_get_publication_tables() output duplicate relid

2021-12-13 Thread houzj.f...@fujitsu.com
On Sat, Nov 20, 2021 7:31 PM Amit Kapila  wrote:
> On Fri, Nov 19, 2021 at 10:58 AM Amit Kapila 
> wrote:
> >
> > On Fri, Nov 19, 2021 at 7:19 AM Amit Langote 
> wrote:
> > >
> > > The problematic case is attaching the partition *after* the
> > > subscriber has already marked the root parent as synced and/or ready
> > > for replication.  Refreshing the subscription doesn't help it
> > > discover the newly attached partition, because a
> > > publish_via_partition_root only ever tells about the root parent,
> > > which would be already synced, so the subscriber would think there's
> > > nothing to copy.
> > >
> >
> > Okay, I see this could be a problem but I haven't tried to reproduce it.
> 
> One more thing you mentioned is that the initial sync won't work after refresh
> but later changes will be replicated; however, I noticed that later changes
> also don't get streamed till we restart the subscriber server.
> I am not sure, but we might not be invalidating the apply worker's cache,
> due to which it didn't notice the change.

I investigated this bug recently, and I think the reason is that when receiving
a relcache invalidation message, the callback function[1] in the walsender only
resets the schema-sent status but doesn't reset the replicate_valid flag. So,
it won't rebuild the publication actions of the relation.

[1]
static void
rel_sync_cache_relation_cb(Datum arg, Oid relid)
...
	/*
	 * Reset schema sent status as the relation definition may have changed.
	 * Also free any objects that depended on the earlier definition.
	 */
	if (entry != NULL)
	{
		entry->schema_sent = false;
		list_free(entry->streamed_txns);
		...

Also, when you DETACH a partition, the publication actions won't be rebuilt
either, for the same reason, which could cause unexpected behavior if we modify
the detached table's data. And the bug happens regardless of whether pubviaroot
is set or not.

For the fix:

I think if we also reset the replicate_valid flag in rel_sync_cache_relation_cb,
then the bug can be fixed. I have some hesitation about this approach, because
it could increase the frequency of invalidating and rebuilding the publication
actions. But I haven't come up with a better approach.
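
To illustrate, the change I have in mind is roughly the following (a minimal
sketch extending the snippet quoted above; the attached patch may differ in
details):

	if (entry != NULL)
	{
		entry->schema_sent = false;
		list_free(entry->streamed_txns);
		entry->streamed_txns = NIL;

		/*
		 * Also mark the entry invalid so that the publication actions are
		 * rebuilt the next time the relation is used, e.g. after ATTACH or
		 * DETACH PARTITION.
		 */
		entry->replicate_valid = false;
	}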

Another possibility could be to add a syscache callback function for the
pg_inherits table, but we currently don't have a syscache for pg_inherits. We
would need to add that syscache first, which doesn't seem better than the
above approach.

What do you think?

Attached is an initial patch which resets the replicate_valid flag in
rel_sync_cache_relation_cb and adds some reproduction tests to 013_partition.pl.

Best regards,
Hou zj





0001-fix-replication-after-ATTACH-or-DETACH-partition.patch
Description:  0001-fix-replication-after-ATTACH-or-DETACH-partition.patch


Re: Skipping logical replication transactions on subscriber side

2021-12-13 Thread Dilip Kumar
On Tue, Dec 14, 2021 at 8:20 AM Amit Kapila  wrote:
>
> On Mon, Dec 13, 2021 at 6:55 PM Masahiko Sawada  wrote:

> > In streaming cases, we don’t know when stream-commit or stream-abort
> > comes and another conflict could occur on the subscription in the
> > meanwhile. But given that (we expect) this feature is used after the
> > apply worker enters into an error loop, this is unlikely to happen in
> > practice unless the user sets the wrong XID. Similarly, in 2PC cases,
> > we don’t know when commit-prepared or rollback-prepared comes and
> > another conflict could occur in the meanwhile. But this could occur in
> > practice even if the user specified the correct XID. Therefore, if we
> > disallow changing skip_xid until the subscriber receives
> > commit-prepared or rollback-prepared, we cannot skip the second
> > transaction that conflicts with data on the subscriber.
> >
>
> I agree with this theory. Can we reflect this in comments so that in
> the future we know why we didn't pursue this direction?

I might be missing something here, but for streaming transactions, users can
decide whether they want to skip or not only once we start applying, no?  I
mean, only once we start applying the changes can we get errors, and by that
time we must have all the changes for the transaction.  So I do not understand
the point we are trying to discuss here.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Commitfest 2021-11 Patch Triage - Part 3

2021-12-13 Thread Jeff Davis
On Mon, 2021-12-13 at 11:20 -0800, Mark Dilger wrote:
>   - The logical replication subscription patch needs to consider RLS.
> (Jeff Davis)
> 
> This is implemented but I still need to update the documentation
> before
> posting.

We also discussed how much of the insert path we want to include in the
apply worker. The immediate need is for the patch to support RLS, but
it raised the question: "why not just use the entire ExecInsert()
path?".

That's not a blocking issue for this patch, but something to consider
that will avoid problems if we want to support updatable views or
foreign tables as a target of a subscription.

Regards,
Jeff Davis






Re: Rewriting the test of pg_upgrade as a TAP test - take three - remastered set

2021-12-13 Thread Andres Freund
On 2021-12-14 14:31:24 +0900, Michael Paquier wrote:
> On Mon, Dec 13, 2021 at 06:08:24PM -0800, Andres Freund wrote:
> > Seems like we might get away with making make -C contrib/pg_upgrade check and
> > vcregress.pl upgradecheck do nothing?
> 
> You mean #contrib/#src/bin/# here, right?  I don't think that we have
> any need to have "make -C" do nothing.  For vcregress.pl, we should
> IMO just remove upgradecheck.

Tom's point was that the buildfarm scripts do

  if ($self->{bfconf}->{using_msvc})
      @checklog = run_log("perl vcregress.pl upgradecheck");
  else
      "cd $self->{pgsql}/src/bin/pg_upgrade && $make $instflags check";

If we don't want to break every buildfarm member that has TestUpgrade enabled
the moment this is committed, we need to have a backward-compatibility path.

Greetings,

Andres Freund




Re: Skipping logical replication transactions on subscriber side

2021-12-13 Thread Dilip Kumar
On Fri, Dec 3, 2021 at 12:12 PM Masahiko Sawada  wrote:
>
> Skipping a whole transaction by specifying xid would be a good start.
> Ideally, we'd like to automatically skip only operations within the
> transaction that fail but it seems not easy to achieve. If we allow
> specifying operations and/or relations, probably multiple operations
> or relations need to be specified in some cases. Otherwise, the
> subscriber cannot continue logical replication if the transaction has
> multiple operations on different relations that fail. But similar to
> the idea of specifying multiple xids, we need to note the fact that
> user wouldn't know of the second operation failure unless the apply
> worker applies the change. So I'm not sure there are many use cases in
> practice where users can specify multiple operations and relations in
> order to skip applies that fail.

I think there would be use cases for specifying the relations or operation.
For example, suppose the user finds an issue while inserting into a particular
relation and, based on some manual investigation, discovers that the table has
a constraint on the subscriber side that does not exist on the publisher side,
which is why the apply is failing. The user might be okay with skipping the
changes for this table but not for other tables. There might also be a few
more tables designed on the same principle that can hit similar errors, so
isn't it good to provide an option to give a list of all such tables?
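
Purely as an illustration of the kind of option being suggested (this syntax is
hypothetical and not something concretely proposed in this thread), it might
look like:

  ALTER SUBSCRIPTION sub SKIP (relation = 'tbl_with_constraint, tbl_similar');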

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Skipping logical replication transactions on subscriber side

2021-12-13 Thread Greg Nancarrow
On Tue, Dec 14, 2021 at 3:23 PM vignesh C  wrote:
>
> While the worker is skipping one of the skip transactions specified by
> the user, if the user immediately specifies another skip transaction
> while the skipping of the first transaction is still in progress, this
> new value will be reset by the worker when it clears the skip xid. I
> felt that once the worker has identified the skip xid and is about to
> skip it, the worker can acquire a lock to prevent concurrency issues:

That's a good point.
If only the last_error_xid could be skipped, then this wouldn't be an
issue, right?
If a different xid to skip is specified while the worker is currently
skipping a transaction, should that even be allowed?


Regards,
Greg Nancarrow
Fujitsu Australia




RE: Optionally automatically disable logical replication subscriptions on error

2021-12-13 Thread osumi.takami...@fujitsu.com
On Monday, December 13, 2021 6:57 PM vignesh C  wrote:
> On Mon, Dec 6, 2021 at 4:22 PM osumi.takami...@fujitsu.com
>  wrote:
> >
> > I've attached the new version v12.
I appreciate your review.


> Thanks for the updated patch, few comments:
> 1) This is not required as it is not used in the caller.
> +++ b/src/backend/replication/logical/launcher.c
> @@ -132,6 +132,7 @@ get_subscription_list(void)
> sub->dbid = subform->subdbid;
> sub->owner = subform->subowner;
> sub->enabled = subform->subenabled;
> +   sub->disableonerr = subform->subdisableonerr;
> sub->name = pstrdup(NameStr(subform->subname));
> /* We don't fill fields we are not interested in. */
Okay. The comment on get_subscription_list() mentions that we collect and fill
only the fields related to worker start/stop, so I didn't need it. Fixed.


> 2) Should this be changed:
> +   subdisableonerr bool
> +  
> +  
> +   If true, the subscription will be disabled when subscription
> +   worker detects any errors
> +  
> + 
> To:
> +   subdisableonerr bool
> +  
> +  
> +   If true, the subscription will be disabled when subscription's
> +   worker detects any errors
> +  
> + 
I felt either is fine. So fixed.


> 3) The last line can be slightly adjusted to keep within 80 chars:
> +  Specifies whether the subscription should be automatically disabled
> +  if any errors are detected by subscription workers during data
> +  replication from the publisher. The default is
> false.
Fixed.

> 4) Similarly this too can be handled:
> --- a/src/backend/catalog/system_views.sql
> +++ b/src/backend/catalog/system_views.sql
> @@ -1259,7 +1259,7 @@ REVOKE ALL ON pg_replication_origin_status FROM
> public;
>  -- All columns of pg_subscription except subconninfo are publicly readable.
>  REVOKE ALL ON pg_subscription FROM public;  GRANT SELECT (oid,
> subdbid, subname, subowner, subenabled, subbinary,
> -  substream, subtwophasestate, subslotname,
> subsynccommit, subpublications)
> +  substream, subtwophasestate, subdisableonerr,
> subslotname, subsynccommit, subpublications)
>  ON pg_subscription TO public;
I split the line into two to make each line less than 80 chars.

> 5) Since the code for disabling the subscription is common to the if and
> else branches, can we move it below:
> +   if (MySubscription->disableonerr)
> +   {
> +   WorkerErrorRecovery();
> +   error_recovery_done = true;
> +   }
> +   else
> +   {
> +   /*
> +* Some work in error recovery is done. Switch to the old
> +* memory context and rethrow.
> +*/
> +   MemoryContextSwitchTo(ecxt);
> +   PG_RE_THROW();
> +   }
> +   }
> +   else
> +   {
> +   /*
> +* Don't miss any error, even when it's not reported to stats
> +* collector.
> +*/
> +   if (MySubscription->disableonerr)
> +   {
> +   WorkerErrorRecovery();
> +   error_recovery_done = true;
> +   }
> +   else
> +   /* Simply rethrow because of no recovery work */
> +   PG_RE_THROW();
> +   }
I moved the common code below those condition branches.


> 6) Can we move LockSharedObject below the if condition.
> +   subform = (Form_pg_subscription) GETSTRUCT(tup);
> +   LockSharedObject(SubscriptionRelationId, subform->oid, 0,
> +AccessExclusiveLock);
> +
> +   /*
> +* We would not be here unless this subscription's disableonerr field
> +* was true when our worker began applying changes, but check whether
> +* that field has changed in the interim.
> +*/
> +   if (!subform->subdisableonerr)
> +   {
> +   /*
> +* Disabling the subscription has been done already. No need
> +* of additional work.
> +*/
> +   heap_freetuple(tup);
> +   table_close(rel, RowExclusiveLock);
> +   CommitTransactionCommand();
> +   return;
> +   }
> +
Fixed.

Besides all of those changes, I've removed the obsolete comment on
DisableSubscriptionOnError that was present in v12.

Best Regards,
Takamichi Osumi



v13-0001-Optionally-disable-subscriptions-on-error.patch
Description: v13-0001-Optionally-disable-subscriptions-on-error.patch


Re: Rewriting the test of pg_upgrade as a TAP test - take three - remastered set

2021-12-13 Thread Michael Paquier
On Mon, Dec 13, 2021 at 06:08:24PM -0800, Andres Freund wrote:
> Seems like we might get away with making make -C contrib/pg_upgrade check and
> vcregress.pl upgradecheck do nothing?

You mean #contrib/#src/bin/# here, right?  I don't think that we have
any need to have "make -C" do nothing.  For vcregress.pl, we should
IMO just remove upgradecheck.

> For the common case of not testing cross-version stuff, pg_upgrade's tests
> would just be invoked via run_build.pl:run_bin_tests(). And TestUpgrade.pm
> should be fine with a test doing nothing.

Perhaps.  I am not sure what's the best picture here, TBH.  One
difference between the core stuff and the buildfarm is that in the case
of the buildfarm, we upgrade from a version that has not only the main
regression database, but everything from the contrib/ modules.

Speaking of which, I am going to send a patch for the buildfarm to be
able to use the SQL file from 0df9641, so that committers gain a bit
more control over the cross-version upgrade tests run by the buildfarm,
using the in-core code as much as possible.
--
Michael




Re: row filtering for logical replication

2021-12-13 Thread Amit Kapila
On Tue, Dec 14, 2021 at 4:44 AM Peter Smith  wrote:
>
> On Tue, Dec 7, 2021 at 5:48 PM tanghy.f...@fujitsu.com
>  wrote:
> >
> > I think for "FOR ALL TABLE" publication(p2 in my case), table tbl should be
> > treated as no filter, and table tbl should have no filter in subscription 
> > sub. Thoughts?
> >
> > But for now, the filter(a > 10) works both when copying initial data and 
> > later changes.
> >
> > To fix it, I think we can check if the table is published in a 'FOR ALL 
> > TABLES'
> > publication or published as part of schema in function 
> > pgoutput_row_filter_init
> > (which was introduced in v44-0003 patch), also we need to make some changes 
> > in
> > tablesync.c.
> >
>
> Partly fixed in v46-0005  [1]
>
> NOTE
> - The initial COPY part of the tablesync does not take the publish
> operation into account so it means that if any of the subscribed
> publications have "puballtables" flag then all data will be copied
> sans filters.
>

I think this should be okay but the way you have implemented it in the
patch doesn't appear to be the optimal way. Can't we fetch
allpubtables info and qual info as part of one query instead of using
separate queries?

> I guess this is consistent with the other decision to
> ignore publication operations [2].
>
> TODO
> - Documentation
> - IIUC there is a similar case yet to be addressed - FOR ALL TABLES IN SCHEMA
>

Yeah, "FOR ALL TABLES IN SCHEMA" should also be addressed. In this
case, the difference would be that we need to check the presence of
schema corresponding to the table (for which we are fetching
row_filter information) is there in pg_publication_namespace. If it
exists then we don't need to apply row_filter for the table. I feel it
is better to fetch all this information as part of the query which you
are using to fetch row_filter info. The idea is to avoid the extra
round-trip between subscriber and publisher.
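
As a rough illustration of what a single combined query might look like (the
prqual column comes from the row-filter patch and pg_publication_namespace
from the schema-publication work, so treat this only as a sketch):

  SELECT p.pubname,
         p.puballtables,
         EXISTS (SELECT 1
                   FROM pg_publication_namespace pn
                   JOIN pg_class c ON c.relnamespace = pn.pnnspid
                  WHERE pn.pnpubid = p.oid
                    AND c.oid = 'public.tbl'::regclass) AS pub_in_schema,
         pg_get_expr(pr.prqual, pr.prrelid) AS row_filter
    FROM pg_publication p
    LEFT JOIN pg_publication_rel pr
      ON pr.prpubid = p.oid AND pr.prrelid = 'public.tbl'::regclass
   WHERE p.pubname IN ('p1', 'p2');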

Few other comments:
===
1.
@@ -926,6 +928,22 @@ pgoutput_row_filter_init(PGOutputData *data,
Relation relation, RelationSyncEntr
  bool rfisnull;

  /*
+ * If the publication is FOR ALL TABLES then it is treated same as if this
+ * table has no filters (even if for some other publication it does).
+ */
+ if (pub->alltables)
+ {
+ if (pub->pubactions.pubinsert)
+ no_filter[idx_ins] = true;
+ if (pub->pubactions.pubupdate)
+ no_filter[idx_upd] = true;
+ if (pub->pubactions.pubdelete)
+ no_filter[idx_del] = true;
+
+ continue;
+ }

Is there a reason to continue checking the other publications if
no_filter is true for all kinds of pubactions?
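
(In other words, something like the following at the end of the loop body,
using the names from the quoted patch -- just a sketch:)

	/* Quit early once every pubaction is known to be unfiltered. */
	if (no_filter[idx_ins] && no_filter[idx_upd] && no_filter[idx_del])
		break;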

2.
+ * All row filter expressions will be discarded if there is one
+ * publication-relation entry without a row filter. That's because
+ * all expressions are aggregated by the OR operator. The row
+ * filter absence means replicate all rows so a single valid
+ * expression means publish this row.

This same comment is in two places; remove it from one of them. I
think keeping it atop the for loop is better.

3.
+ {
+ int idx;
+ bool found_filters = false;

I am not sure that introducing such ad-hoc braces in the code to localize
the scope of variables is a regular practice. Can we please remove
this?


-- 
With Regards,
Amit Kapila.




Re: Skipping logical replication transactions on subscriber side

2021-12-13 Thread vignesh C
On Fri, Dec 10, 2021 at 11:14 AM Masahiko Sawada  wrote:
>
> On Thu, Dec 9, 2021 at 6:16 PM Amit Kapila  wrote:
> >
> > On Thu, Dec 9, 2021 at 2:24 PM Masahiko Sawada  
> > wrote:
> > >
> > > On Thu, Dec 9, 2021 at 11:47 AM Amit Kapila  
> > > wrote:
> > > >
> > > > I am thinking that we can start a transaction, update the catalog,
> > > > commit that transaction. Then start a new one to update
> > > > origin_lsn/timestamp, finishprepared, and commit it. Now, if it
> > > > crashes after the first transaction, only commit prepared will be
> > > > resent again and this time we don't need to update the catalog as that
> > > > entry would be already cleared.
> > >
> > > Sounds good. In the crash case, it should be fine since we will just
> > > commit an empty transaction. The same is true for the case where
> > > skip_xid has been changed after skipping and preparing the transaction
> > > and before handling commit_prepared.
> > >
> > > Regarding the case where the user specifies XID of the transaction
> > > after it is prepared on the subscriber (i.g., the transaction is not
> > > empty), we won’t skip committing the prepared transaction. But I think
> > > that we don't need to support skipping already-prepared transaction
> > > since such transaction doesn't conflict with anything regardless of
> > > having changed or not.
> > >
> >
> > Yeah, this makes sense to me.
> >
>
> I've attached an updated patch. The new syntax is like "ALTER
> SUBSCRIPTION testsub SKIP (xid = '123')".
>
> I've been thinking we could add some safeguard for the case where
> the user specifies the wrong xid. For example, can we somehow use the
> stats in pg_stat_subscription_workers? An idea is that the logical
> replication worker fetches the xid from the stats when reading the
> subscription and skips the transaction if the xid matches
> subskipxid. That is, the worker checks the error reported by the
> worker previously working on the same subscription. The error might
> not be a conflict error (e.g., a connection error, etc.) or might have
> been cleared by the reset function. But given that the worker is in an
> error loop, the worker can eventually get the xid in question. We can
> prevent an unrelated transaction from being skipped unexpectedly. It
> doesn't seem like a stable solution, though. Or it might be enough to warn
> users when they specify an XID that doesn't match last_error_xid.
> Anyway, I think it's better to have more discussion on this. Any
> ideas?

While the worker is skipping one of the skip transactions specified by
the user, if the user immediately specifies another skip transaction
while the skipping of the first transaction is still in progress, this
new value will be reset by the worker when it clears the skip xid. I
felt that once the worker has identified the skip xid and is about to
skip it, the worker can acquire a lock to prevent concurrency issues:
+static void
+clear_subscription_skip_xid(void)
+{
+   Relationrel;
+   HeapTuple   tup;
+   boolnulls[Natts_pg_subscription];
+   boolreplaces[Natts_pg_subscription];
+   Datum   values[Natts_pg_subscription];
+
+   memset(values, 0, sizeof(values));
+   memset(nulls, false, sizeof(nulls));
+   memset(replaces, false, sizeof(replaces));
+
+   if (!IsTransactionState())
+   StartTransactionCommand();
+
+   rel = table_open(SubscriptionRelationId, RowExclusiveLock);
+
+   /* Fetch the existing tuple. */
+   tup = SearchSysCacheCopy1(SUBSCRIPTIONOID,
+ ObjectIdGetDatum(MySubscription->oid));
+
+   if (!HeapTupleIsValid(tup))
+   elog(ERROR, "subscription \"%s\" does not exist",
+MySubscription->name);
+
+   /* Set subskipxid to null */
+   nulls[Anum_pg_subscription_subskipxid - 1] = true;
+   replaces[Anum_pg_subscription_subskipxid - 1] = true;
+
+   /* Update the system catalog to reset the skip XID */
+   tup = heap_modify_tuple(tup, RelationGetDescr(rel), values, nulls,
+   replaces);
+   CatalogTupleUpdate(rel, &tup->t_self, tup);
+
+   heap_freetuple(tup);
+   table_close(rel, RowExclusiveLock);
+}
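
For example, something along these lines could go right after opening
pg_subscription (a sketch only; the exact lock mode and placement would need
discussion):

	rel = table_open(SubscriptionRelationId, RowExclusiveLock);

	/*
	 * Lock the subscription object so that a concurrent
	 * ALTER SUBSCRIPTION ... SKIP cannot change subskipxid while we are
	 * clearing it.
	 */
	LockSharedObject(SubscriptionRelationId, MySubscription->oid, 0,
					 AccessExclusiveLock);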

Regards,
Vignesh




Re: Adding CI to our tree

2021-12-13 Thread Andres Freund
Hi,

On 2021-12-14 16:51:58 +1300, Thomas Munro wrote:
> I'd better go and figure out how to fix cfbot when this lands...

I assume it'd be:
- stop adding the CI stuff
- adjust links to CI tasks, appveyor wouldn't be used anymore
- perhaps reference individual tasks from the cfbot page?


> > I think this may be OK for now, but I also could see arguments for wanting 
> > to
> > transfer the image specifications and the google account to PG properties.
> 
> No clue on the GCP account side of it (does pginfra already have
> one?), but for the repo I guess it would seem natural to have one on
> git.postgresql.org infra, mirrored (just like the main repo) to a repo
> on project-owned github.com/postgres, from which image building is
> triggered.  Then it could be maintained by the whole PostgreSQL
> project, patches discussed on -hackers, a bit like pg_bsd_indent.
> Perhaps with some way to trigger test image builds, so that people
> working on it don't need their own GCP account to do a trial run.

I think that's a good medium-term goal, I'd not make it a prerequisite for
merging myself.


> +  # Test that code can be built with gcc/clang without warnings
> 
> As a TODO note, I think we should eventually run a warnings check for
> MSVC too.  IIUC we only aim to be warning free in assertion builds on
> that platform, because it has no PG_USED_FOR_ASSERTS_ONLY (I think it
> has it in C++ but not C?), but that's something.

Hm. Not entirely sure how to do that without doing a separate windows build,
which is too slow...


> +  # XXX: Only do this if there have been changes in doc/ since last build
> +  always:
> +docs_build_script: |
> +  time ./configure \
> +--cache gcc.cache CC="ccache gcc"
> +  time make -s -j4 -C doc
> 
> Another TODO note: perhaps we could also make the documentation
> results a browsable artefact with a short retention time, if that's a
> thing.

Might be doable, but I'd guess that the volume of data it'd generate make it
not particularly attractive.


> I feel like I should apologise in advance for this level of
> nit-picking about English grammar, but:

:)

Will try to fix.


> +  name: FreeBSD
> 
> FreeBSD is missing --with-llvm.

That was kind of intentional, I guess I should add a comment about it. The CI
image for freebsd already starts slower due to its size, and is on the slower
side compared to anything !windows, so I'm not sure it's worth enabling llvm
there? It's probably not bad to have one platform testing without llvm.


> > [1] Some ideas for what could make sense to extend CI to in the future:
> 
> To your list, I'd add:
> 
> * 32 bit

That'd be easy.


> * test coverage report

If the output size is reasonable, that should be doable as well.


> * ability to capture complete Window install directory as an artefact
> so a Windows user without a dev environment could try out a proposed
> change/CF entry/...

I think the size of these artifacts would make this not something to enable by
default. But perhaps a manually triggered task would make sense?


> I hope we can get ccache working on Windows.

They did merge a number of the other required changes for that over the
weekend. I'll try once they've released...

Thanks!

Andres Freund




more descriptive message for process termination due to max_slot_wal_keep_size

2021-12-13 Thread Kyotaro Horiguchi
Hello.

As complained in pgsql-bugs [1], when a process is terminated due to
max_slot_wal_keep_size, the related messages don't mention the root
cause of *the termination*.  Note that the third message is not shown
for temporary replication slots.

[pid=a] LOG:  terminating process x to release replication slot "s"
[pid=x] FATAL:  terminating connection due to administrator command
[pid=a] LOG:  invalidating slot "s" because its restart_lsn X/X exceeds max_slot_wal_keep_size

The attached patch attaches a DETAIL line to the first message.

> [17605] LOG:  terminating process 17614 to release replication slot "s1"
+ [17605] DETAIL:  The slot's restart_lsn 0/2CA0 exceeds max_slot_wal_keep_size.
> [17614] FATAL:  terminating connection due to administrator command
> [17605] LOG:  invalidating slot "s1" because its restart_lsn 0/2CA0 exceeds max_slot_wal_keep_size

The second and fourth lines look somewhat inconsistent with each other, but
that wouldn't be such a problem.  I don't think we want to concatenate
the two lines together as the result is a bit too long.

> LOG:  terminating process 17614 to release replication slot "s1" because its restart_lsn 0/2CA0 exceeds max_slot_wal_keep_size.

What do you think about this?

[1] 
https://www.postgresql.org/message-id/20211214.101137.379073733372253470.horikyota.ntt%40gmail.com

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
>From b0c27dc80aff37ef984592b79f1dd20d052299fa Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi 
Date: Tue, 14 Dec 2021 10:50:00 +0900
Subject: [PATCH] Make an error message about process termination more
 descriptive

If checkpointer kills a process due to a temporary replication slot
exceeding max_slot_wal_keep_size, the messages fail to describe the
cause of the termination.  That is because the message that describes
the reason, which is emitted for persistent slots, is not shown for
temporary slots.  Add a DETAIL line to the message common to all types
of slots to describe the cause.

Reported-by: Alex Enachioaie 
Discussion: https://www.postgresql.org/message-id/17327-89d0efa8b9ae6271%40postgresql.org
---
 src/backend/replication/slot.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/src/backend/replication/slot.c b/src/backend/replication/slot.c
index 90ba9b417d..cba9a29113 100644
--- a/src/backend/replication/slot.c
+++ b/src/backend/replication/slot.c
@@ -1254,7 +1254,8 @@ InvalidatePossiblyObsoleteSlot(ReplicationSlot *s, XLogRecPtr oldestLSN,
 			{
 				ereport(LOG,
 						(errmsg("terminating process %d to release replication slot \"%s\"",
-								active_pid, NameStr(slotname))));
+								active_pid, NameStr(slotname)),
+						 errdetail("The slot's restart_lsn %X/%X exceeds max_slot_wal_keep_size.", LSN_FORMAT_ARGS(restart_lsn))));
 
 				(void) kill(active_pid, SIGTERM);
 				last_signaled_pid = active_pid;
-- 
2.27.0



Re: Adding CI to our tree

2021-12-13 Thread Thomas Munro
On Tue, Dec 14, 2021 at 10:12 AM Andres Freund  wrote:
> Attached is an updated version of the CI patches. An example of a test run
> with the attached version of this
> https://cirrus-ci.com/build/6501998521483264

I've been pushing various versions of these patches into my own
development branches for a while now; they're working very nicely and
helping me a lot.  This is vastly better than anything I was doing
before, especially on Windows which is a blind spot for most of us.
It'll be great to see this committed, and continue improving it
in-tree.  I'd better go and figure out how to fix cfbot when this
lands...

> I think this may be OK for now, but I also could see arguments for wanting to
> transfer the image specifications and the google account to PG properties.

No clue on the GCP account side of it (does pginfra already have
one?), but for the repo I guess it would seem natural to have one on
git.postgresql.org infra, mirrored (just like the main repo) to a repo
on project-owned github.com/postgres, from which image building is
triggered.  Then it could be maintained by the whole PostgreSQL
project, patches discussed on -hackers, a bit like pg_bsd_indent.
Perhaps with some way to trigger test image builds, so that people
working on it don't need their own GCP account to do a trial run.

+  # Test that code can be built with gcc/clang without warnings

As a TODO note, I think we should eventually run a warnings check for
MSVC too.  IIUC we only aim to be warning free in assertion builds on
that platform, because it has no PG_USED_FOR_ASSERTS_ONLY (I think it
has it in C++ but not C?), but that's something.

+  # XXX: Only do this if there have been changes in doc/ since last build
+  always:
+docs_build_script: |
+  time ./configure \
+--cache gcc.cache CC="ccache gcc"
+  time make -s -j4 -C doc

Another TODO note: perhaps we could also make the documentation
results a browsable artefact with a short retention time, if that's a
thing.  (I've been confused about how to spell "artefact" for some
time now, and finally I know why: in the US it has an i; I blame Noah
Websta, whose name I have decided to improve.)

I feel like I should apologise in advance for this level of
nit-picking about English grammar, but:

+2) For not yet merged development work, CI can be enabled for some git hosting
+   providers. This allows to test patches on a number of platforms before they
+   are merged (or even submitted).

You can "allow testing" (gerund verb), you can "allow
developers/us/one/... to test" (infinitive, but with a noun phrase to
say who's allowed to do the thing), you can "allow verification of
..." (noun phrase), you can "be allowed to test" (passive), but you
can't "allow to test": it's not allowed![1] :-)

+# It might be nicer to switch to the openssl built as part of curl-for-win,
+# but recent releases only build openssl 3, and that still seems troublesome
+# on windows,

s/windows,/Windows./

+  name: FreeBSD

FreeBSD is missing --with-llvm.  If you add package "llvm" to your
image builder you'll currently get LLVM 9, then
LLVM_CONFIG="llvm-config" CXX="ccache c++" CLANG="ccache clang".  Or
we could opt for something more modern with package llvm13 and program
names llvm-config13 and clang13.
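
(Concretely, that would mean adding something like the following to the FreeBSD
task's configure invocation -- a sketch only, assuming the llvm13 package:)

  ./configure ... \
    --with-llvm \
    LLVM_CONFIG="llvm-config13" CXX="ccache c++" CLANG="ccache clang13"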

> The second attention-worthy point is the choice of a new toplevel ci/
> directory as the location for resources referencenced by CI. A few other
> projects also use ci/, but I can also see arguments for moving the contents to
> e.g. src/tools/ci or such?

I'd be +0.75 for moving it to src/tools/ci.

> [1] Some ideas for what could make sense to extend CI to in the future:

To your list, I'd add:

* 32 bit
* test coverage report
* ability to capture complete Window install directory as an artefact
so a Windows user without a dev environment could try out a proposed
change/CF entry/...

I hope we can get ccache working on Windows.

[1] 
https://english.stackexchange.com/questions/60271/grammatical-complements-for-allow/60285#60285




Re: parallel vacuum comments

2021-12-13 Thread Amit Kapila
On Tue, Dec 14, 2021 at 7:40 AM tanghy.f...@fujitsu.com
 wrote:
>
> On Monday, December 13, 2021 2:12 PM Masahiko Sawada  
> wrote:
> >
> > On Mon, Dec 13, 2021 at 2:09 PM Amit Kapila  wrote:
> > >
> > > On Mon, Dec 13, 2021 at 10:33 AM Masahiko Sawada
> >  wrote:
> > > >
> > > > On Fri, Dec 10, 2021 at 9:08 PM Amit Kapila 
> > wrote:
> > > > >
> > > > > On Thu, Dec 9, 2021 at 6:05 PM Masahiko Sawada
> >  wrote:
> > > > > >
> > > > > > On Thu, Dec 9, 2021 at 7:44 PM Amit Kapila 
> > wrote:
> > > > > >
> > > > > > Agreed with the above two points.
> > > > > >
> > > > > > I've attached updated patches that incorporated the above comments
> > > > > > too. Please review them.
> > > > > >
> > > > >
> > > > > I have made the following minor changes to the 0001 patch: (a) An
> > > > > assert was removed from dead_items_max_items() which I added back. (b)
> > > > > Removed an unnecessary semicolon from one of the statements in
> > > > > compute_parallel_vacuum_workers(). (c) Changed comments at a few
> > > > > places. (d) moved all parallel_vacuum_* related functions together.
> > > > > (e) ran pgindent and slightly modify the commit message.
> > > > >
> > > > > Let me know what you think of the attached?
> > > >
> > > > Thank you for updating the patch!
> > > >
> > > > The patch also moves some functions, e.g., update_index_statistics()
> > > > is moved without code changes. I agree to move functions for
> > > > consistency but that makes the review hard and the patch complicated.
> > > > I think it's better to do the parallel vacuum code improvements and
> > > > the function moves in separate patches.
> > > >
> > >
> > > Okay, I thought it might be better to keep all parallel_vacuum_*
> > > related functions together, but we can keep that in a separate patch.
> > > Feel free to submit without those changes.
> >
> > I've attached the patch. I've just moved some functions back but not
> > done other changes.
> >
>
> Thanks for your patch.
>
> I tested your patch and tried some cases, like large indexes and different
> types of indexes, and it worked well.
>
> Besides, I noticed a typo as follows:
>
> +   /* Estimate size for index vacuum stats -- PARALLEL_VACUUM_KEY_STATS 
> */
>
> "PARALLEL_VACUUM_KEY_STATS" should be "PARALLEL_VACUUM_KEY_INDEX_STATS".
>

Thanks, I can take care of this before committing. The v9-0001* looks
good to me as well, so I am planning to commit it tomorrow unless I
see more comments or any objection to that. There is still pending
work related to moving the parallel vacuum code to a separate file and a
few other pending comments that are still under discussion. We can
take care of those in subsequent patches. Do let me know if you or
others think differently.

-- 
With Regards,
Amit Kapila.




Re: Skipping logical replication transactions on subscriber side

2021-12-13 Thread Amit Kapila
On Mon, Dec 13, 2021 at 6:55 PM Masahiko Sawada  wrote:
>
> On Mon, Dec 13, 2021 at 1:04 PM Amit Kapila  wrote:
> >
> > On Mon, Dec 13, 2021 at 8:28 AM Masahiko Sawada  
> > wrote:
> >
> > > >
> > > > 4.
> > > > + * Also, one might think that we can skip preparing the skipped 
> > > > transaction.
> > > > + * But if we do that, PREPARE WAL record won’t be sent to its physical
> > > > + * standbys, resulting in that users won’t be able to find the prepared
> > > > + * transaction entry after a fail-over.
> > > > + *
> > > > ..
> > > > + */
> > > > + if (skipping_changes)
> > > > + stop_skipping_changes(false);
> > > >
> > > > Why do we need such a Prepare's entry either at current subscriber or
> > > > on its physical standby? I think it is to allow Commit-prepared. If
> > > > so, how about if we skip even commit prepared as well? Even on
> > > > physical standby, we would be having the value of skip_xid which can
> > > > help us to skip there as well after failover.
> > >
> > > It's true that skip_xid would be set also on the physical standby. When it
> > > comes to preparing the skipped transaction on the current subscriber,
> > > if we want to skip commit-prepared I think we need protocol changes in
> > > order for subscribers to know prepare_lsn and prepare_timestamp so
> > > that they can look up the prepared transaction when doing
> > > commit-prepared. I proposed this idea before. This change would be
> > > beneficial as of now since the publisher sends even empty transactions.
> > > But considering the proposed patch[1] that makes the publisher not
> > > send empty transactions, this protocol change would be an optimization
> > > only for this feature.
> > >
> >
> > I was thinking to compare the xid received as part of the
> > commit_prepared message with the value of skip_xid to skip the
> > commit_prepared, but I guess the user could change it between prepare
> > and commit prepared and then we won't be able to detect it, right? I
> > think we can handle this and the streaming case if we disallow users
> > from changing the value of skip_xid when we are already skipping changes,
> > or don't let the new skip_xid take effect in the apply worker if we are
> > already skipping some other transaction. What do you think?
>
> In streaming cases, we don’t know when stream-commit or stream-abort
> comes and another conflict could occur on the subscription in the
> meanwhile. But given that (we expect) this feature is used after the
> apply worker enters into an error loop, this is unlikely to happen in
> practice unless the user sets the wrong XID. Similarly, in 2PC cases,
> we don’t know when commit-prepared or rollback-prepared comes and
> another conflict could occur in the meanwhile. But this could occur in
> practice even if the user specified the correct XID. Therefore, if we
> disallow changing skip_xid until the subscriber receives
> commit-prepared or rollback-prepared, we cannot skip the second
> transaction that conflicts with data on the subscriber.
>

I agree with this theory. Can we reflect this in comments so that in
the future we know why we didn't pursue this direction?

> From the application perspective, which behavior is preferable between
> skipping preparing a transaction and preparing an empty transaction,
> in the first place? From the resource consumption etc., skipping
> preparing transactions seems better. On the other hand, if we skipped
> preparing the transaction, the application would not be able to find
> the prepared transaction after a fail-over to the subscriber.
>

I am not sure how much it matters that such prepares are not present,
because we wanted some way to skip the corresponding commit prepared
as well. I think your previous point is a good enough reason as to why
we should allow such prepares.

-- 
With Regards,
Amit Kapila.




Re: Failed transaction statistics to measure the logical replication progress

2021-12-13 Thread Amit Kapila
On Mon, Dec 13, 2021 at 5:48 PM osumi.takami...@fujitsu.com
 wrote:
>
> On Monday, December 13, 2021 6:19 PM Amit Kapila  
> wrote:
> > On Tue, Dec 7, 2021 at 3:12 PM osumi.takami...@fujitsu.com
> >  wrote:
> >
> > Few questions and comments:
> Thank you for your comments !
>
> > 
> > 1.
> > The pg_stat_subscription_workers view will
> > contain
> > one row per subscription worker on which errors have occurred, for 
> > workers
> > applying logical replication changes and workers handling the initial 
> > data
> > -   copy of the subscribed tables.  The statistics entry is removed when the
> > -   corresponding subscription is dropped.
> > +   copy of the subscribed tables. Also, the row corresponding to the apply
> > +   worker shows all transaction statistics of both types of workers on the
> > +   subscription. The statistics entry is removed when the corresponding
> > +   subscription is dropped.
> >
> > Why did you choose to show stats for both types of workers in one row?
> This is because if we have hundreds or thousands of tables for table sync,
> we need to create many entries to cover them and store the entries for all 
> tables.
>

If we fear a large number of entries for such workers, then won't it be
better to show the value of these stats only for apply workers? I
think the table sync workers normally perform only the copy operation or
maybe a fixed number of xacts, so one might not be interested in the
transaction stats of these workers. I find merging only specific stats
of two different types of workers confusing.

What do others think about this?

-- 
With Regards,
Amit Kapila.




RE: parallel vacuum comments

2021-12-13 Thread houzj.f...@fujitsu.com
On Tuesday, December 14, 2021 10:11 AM Tang, Haiying wrote:
> On Monday, December 13, 2021 2:12 PM Masahiko Sawada
>  wrote:
> > I've attached the patch. I've just moved some functions back but not
> > done other changes.
> >
> 
> Thanks for your patch.
> 
> I tested your patch and tried some cases, like large indexes and different
> types of indexes, and it worked well.

+1, the patch looks good to me.

Best regards,
Hou zj


Re: Assertion failure with replication origins and PREPARE TRANSACTION

2021-12-13 Thread Michael Paquier
On Mon, Dec 13, 2021 at 07:53:43PM +0900, Michael Paquier wrote:
> Well, I don't think that it is a big deal one way or the other, as
> we'd finish with InvalidXLogRecPtr for the LSN and 0 for the timestamp
> anyway.  If both of you feel that just removing the assertion rather
> than adding an extra check is better, that's fine by me :)

Looked at that today, and done this way.  The tests have been extended
a bit more with one ROLLBACK and one ROLLBACK PREPARED, while checking
for the contents decoded.
--
Michael




RE: parallel vacuum comments

2021-12-13 Thread tanghy.f...@fujitsu.com
On Monday, December 13, 2021 2:12 PM Masahiko Sawada  
wrote:
> 
> On Mon, Dec 13, 2021 at 2:09 PM Amit Kapila  wrote:
> >
> > On Mon, Dec 13, 2021 at 10:33 AM Masahiko Sawada
>  wrote:
> > >
> > > On Fri, Dec 10, 2021 at 9:08 PM Amit Kapila 
> wrote:
> > > >
> > > > On Thu, Dec 9, 2021 at 6:05 PM Masahiko Sawada
>  wrote:
> > > > >
> > > > > On Thu, Dec 9, 2021 at 7:44 PM Amit Kapila 
> wrote:
> > > > >
> > > > > Agreed with the above two points.
> > > > >
> > > > > I've attached updated patches that incorporated the above comments
> > > > > too. Please review them.
> > > > >
> > > >
> > > > I have made the following minor changes to the 0001 patch: (a) An
> > > > assert was removed from dead_items_max_items() which I added back. (b)
> > > > Removed an unnecessary semicolon from one of the statements in
> > > > compute_parallel_vacuum_workers(). (c) Changed comments at a few
> > > > places. (d) moved all parallel_vacuum_* related functions together.
> > > > (e) ran pgindent and slightly modify the commit message.
> > > >
> > > > Let me know what you think of the attached?
> > >
> > > Thank you for updating the patch!
> > >
> > > The patch also moves some functions, e.g., update_index_statistics()
> > > is moved without code changes. I agree to move functions for
> > > consistency but that makes the review hard and the patch complicated.
> > > I think it's better to do the parallel vacuum code improvements and
> > > the function moves in separate patches.
> > >
> >
> > Okay, I thought it might be better to keep all parallel_vacuum_*
> > related functions together, but we can keep that in a separate patch.
> > Feel free to submit without those changes.
> 
> I've attached the patch. I've just moved some functions back but not
> done other changes.
> 

Thanks for your patch.

I tested your patch and tried some cases, like large indexes and different
types of indexes, and it worked well.

Besides, I noticed a typo as follows:

+   /* Estimate size for index vacuum stats -- PARALLEL_VACUUM_KEY_STATS */

"PARALLEL_VACUUM_KEY_STATS" should be "PARALLEL_VACUUM_KEY_INDEX_STATS".

Regards,
Tang


Re: Rewriting the test of pg_upgrade as a TAP test - take three - remastered set

2021-12-13 Thread Andres Freund
Hi,

On 2021-10-02 23:34:38 -0400, Tom Lane wrote:
> Andrew Dunstan  writes:
> > On 10/2/21 5:03 PM, Tom Lane wrote:
> >> IIUC, the only problem for a non-updated animal would be that it'd
> >> run the test twice?  Or would it actually fail?  If the latter,
> >> we'd need to sit on the patch rather longer.
> 
> > The patch removes test.sh, so yes it would break.
> 
> Maybe we could leave test.sh in place for awhile?  I'd rather
> not cause a flag day for buildfarm owners.  (Also, how do we
> see this working in the back branches?)

Seems like we might get away with making make -C contrib/pg_upgrade check and
vcregress.pl upgradecheck do nothing?

For the common case of not testing cross-version stuff, pg_upgrade's tests
would just be invoked via run_build.pl:run_bin_tests(). And TestUpgrade.pm
should be fine with a test doing nothing.

We'd not lose coverage with non-updated BF animals unless they have tap tests
disabled. Just the cross-version test would need timely work by buildfarm
operators - but I think Andrew could deal with that.

Greetings,

Andres Freund




Re: Adding CI to our tree

2021-12-13 Thread Andres Freund
Hi,

On 2021-12-13 18:14:52 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2021-12-13 16:02:50 -0600, Justin Pryzby wrote:
> >> sudo is used exactly twice; maybe it's not needed at all ?
>
> > The macos one is needed, but the freebsd one indeed isn't.
>
> I'm with Justin on this one.  I would view a script trying to
> mess with /cores as a hostile act.  PG cores on macOS tend to
> be extremely large and can fill up your disk fairly quickly
> if you don't know they're being accumulated.  I think it's okay
> to suggest in the documentation that people might want to allow
> cores to be dropped, but the script has NO business trying to
> force that.

I'm not quite following. This is an ephemeral CI instance?

Greetings,

Andres Freund




Re: Adding CI to our tree

2021-12-13 Thread Tom Lane
Andres Freund  writes:
> On 2021-12-13 16:02:50 -0600, Justin Pryzby wrote:
>> sudo is used exactly twice; maybe it's not needed at all ?

> The macos one is needed, but the freebsd one indeed isn't.

I'm with Justin on this one.  I would view a script trying to
mess with /cores as a hostile act.  PG cores on macOS tend to
be extremely large and can fill up your disk fairly quickly
if you don't know they're being accumulated.  I think it's okay
to suggest in the documentation that people might want to allow
cores to be dropped, but the script has NO business trying to
force that.

regards, tom lane




Re: row filtering for logical replication

2021-12-13 Thread Peter Smith
On Tue, Dec 7, 2021 at 5:48 PM tanghy.f...@fujitsu.com
 wrote:
>
...
> Thanks for looking into it.
>
> I have another problem with your patch. The document says:
>
> ... If the subscription has several publications in
> +   which the same table has been published with different filters, those
> +   expressions get OR'ed together so that rows satisfying any of the expressions
> +   will be replicated. Notice this means if one of the publications has no filter
> +   at all then all other filters become redundant.
>
> Then, what if one of the publications is specified as 'FOR ALL TABLES' or 'FOR
> ALL TABLES IN SCHEMA'.
>
> For example:
> create table tbl (a int primary key);"
> create publication p1 for table tbl where (a > 10);
> create publication p2 for all tables;
> create subscription sub connection 'dbname=postgres port=5432' publication 
> p1, p2;
>
> I think for "FOR ALL TABLE" publication(p2 in my case), table tbl should be
> treated as no filter, and table tbl should have no filter in subscription 
> sub. Thoughts?
>
> But for now, the filter(a > 10) works both when copying initial data and 
> later changes.
>
> To fix it, I think we can check if the table is published in a 'FOR ALL 
> TABLES'
> publication or published as part of schema in function 
> pgoutput_row_filter_init
> (which was introduced in v44-0003 patch), also we need to make some changes in
> tablesync.c.
>

Partly fixed in v46-0005  [1]

NOTE
- The initial COPY part of the tablesync does not take the publish
operation into account so it means that if any of the subscribed
publications have "puballtables" flag then all data will be copied
sans filters. I guess this is consistent with the other decision to
ignore publication operations [2].

TODO
- Documentation
- IIUC there is a similar case yet to be addressed - FOR ALL TABLES IN SCHEMA

--
[1] 
https://www.postgresql.org/message-id/CAHut%2BPtoxjo6hpDFTya6WYH-zdspKQ5j%2BwZHBRc6EZkAkq7Nfw%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/CAA4eK1L3r%2BURSLFotOT5Y88ffscCskRoGC15H3CSAU1jj_0Rdg%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: O(n) tasks cause lengthy startups and checkpoints

2021-12-13 Thread Bossart, Nathan
On 12/13/21, 12:37 PM, "Robert Haas"  wrote:
> On Mon, Dec 13, 2021 at 1:21 PM Bossart, Nathan  wrote:
>> > But against all that, if these tasks are slowing down checkpoints and
>> > that's avoidable, that seems pretty important too. Interestingly, I
>> > can't say that I've ever seen any of these things be a problem for
>> > checkpoint or startup speed. I wonder why you've had a different
>> > experience.
>>
>> Yeah, it's difficult for me to justify why users should suffer long
>> periods of downtime because startup or checkpointing is taking a very
>> long time doing things that are arguably unrelated to startup and
>> checkpointing.
>
> Well sure. But I've never actually seen that happen.

I'll admit that surprises me.  As noted elsewhere [0], we were seeing
this enough with pgsql_tmp that we started moving the directory aside
before starting the server.  Discussions about handling this usually
prompt questions about why there are so many temporary files in the
first place (which is fair).  FWIW all four functions noted in my
original message [1] are things I've seen firsthand affecting users.

Nathan

[0] https://postgr.es/m/E7573D54-A8C9-40A8-89D7-0596A36ED124%40amazon.com
[1] https://postgr.es/m/C1EE64B0-D4DB-40F3-98C8-0CED324D34CB%40amazon.com



Re: row filtering for logical replication

2021-12-13 Thread Peter Smith
On Thu, Dec 9, 2021 at 1:37 PM houzj.f...@fujitsu.com
 wrote:
>
> On Wednesday, December 8, 2021 7:52 PM Ajin Cherian 
> > On Tue, Dec 7, 2021 at 5:36 PM Peter Smith  wrote:
> > >
> > > We were mid-way putting together the next v45* when your latest
> > > attachment was posted over the weekend. So we will proceed with our
> > > original plan to post our v45* (tomorrow).
> > >
> > > After v45* is posted we will pause to find what are all the
> > > differences between your unified patch and our v45* patch set. Our
> > > intention is to integrate as many improvements as possible from your
> > > changes into the v46* etc that will follow tomorrow’s v45*. On some
> > > points, we will most likely need further discussion.
> >
> >
> > Posting an update for review comments, using contributions majorly from
> > Peter Smith.
> > I've also included changes based on Euler's combined patch, specially 
> > changes
> > to documentation and test cases.
> > I have left out Hou-san's 0005, in this patch-set. Hou-san will provide a 
> > rebased
> > update based on this.
> >
> > This patch addresses the following review comments:
>
> Hi,
>
> Thanks for updating the patch.
> I noticed a possible issue.
>
> +   /* Check row filter. */
> +   if (!pgoutput_row_filter(data, relation, 
> oldtuple, NULL, relentry))
> +   break;
> +
> +   maybe_send_schema(ctx, change, relation, 
> relentry);
> +
> /* Switch relation if publishing via root. */
> if (relentry->publish_as_relid != 
> RelationGetRelid(relation))
> {
> ...
> /* Convert tuple if needed. */
> if (relentry->map)
> tuple = 
> execute_attr_map_tuple(tuple, relentry->map);
>
> Currently, we execute the row filter before converting the tuple. I think it
> could get a wrong result if we are executing a parent table's row filter and
> the column order of the parent table is different from that of the child
> table. For example:
>
> 
> create table parent(a int primary key, b int) partition by range (a);
> create table child (b int, a int primary key);
> alter table parent attach partition child default;
> create publication pub for table parent where(a>10) 
> with(PUBLISH_VIA_PARTITION_ROOT);
>
> The column number of 'a' is '1' in the filter expression, while column 'a' is
> the second one in the original tuple. I think we might need to execute the
> filter expression after converting.
>

Fixed in v46* [1]

--
[1] 
https://www.postgresql.org/message-id/CAHut%2BPtoxjo6hpDFTya6WYH-zdspKQ5j%2BwZHBRc6EZkAkq7Nfw%40mail.gmail.com

Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Adding CI to our tree

2021-12-13 Thread Andres Freund
Hi,

On 2021-12-13 16:02:50 -0600, Justin Pryzby wrote:
> On Mon, Dec 13, 2021 at 01:12:23PM -0800, Andres Freund wrote:
> > Hi,
> > 
> > Attached is an updated version of the CI patches. An example of a test run
> > with the attached version of this
> > https://cirrus-ci.com/build/6501998521483264
> 
> sudo is used exactly twice; maybe it's not needed at all ?

The macos one is needed, but the freebsd one indeed isn't.


> > +task:
> > +  name: FreeBSD
> ...
> > +  sysconfig_script:
> > +- sudo sysctl kern.corefile='/tmp/%N.%P.core'
> 
> > +task:
> > +  name: macOS
> ...
> > +  core_install_script:
> > +- sudo chmod 777 /cores
> 
> typos:
> binararies
> dont't

Oops, thanks.


Typos and sudo use are fixed in the repo [1].

Greetings,

Andres Freund

[1] https://github.com/anarazel/postgres/tree/ci




Re: make tuplestore helper function

2021-12-13 Thread Melanie Plageman
On Thu, Nov 18, 2021 at 1:24 PM Justin Pryzby  wrote:
>
> On Thu, Nov 18, 2021 at 12:59:03PM -0500, Melanie Plageman wrote:
> > On Mon, Nov 8, 2021 at 3:13 PM Justin Pryzby  wrote:
> > > On Mon, Nov 08, 2021 at 02:52:28PM -0500, Melanie Plageman wrote:
> > > > On Tue, Nov 2, 2021 at 4:23 PM Justin Pryzby  
> > > > wrote:
> > > > >
> > > > > Several places have a conditional value for the first argument 
> > > > > (randomAccess),
> > > > > but your patch changes the behavior to a constant "true".  I didn't 
> > > > > review the
> > > > > patch beyond that.
> > > > >
> ...
> > > > I believe the patch has preserved the same behavior. All of the callers
> > > > for which I replaced tuplestore_begin_heap() which passed a variable for
> > > > the randomAccess parameter had set that variable to something which was
> > > > effectively the same as passing true -- SFRM_Materialize_Random.
> > >
> > > I don't think so ?
> > >
> > > The callers aren't passing SFRM_Materialize_Random, but rather
> > > (allowedModes & SFRM_Materialize_Random) != 0
> > >
> > > Where allowedModes is determined by EXEC_FLAG_BACKWARD.
> ...
> > You are right. I misread it.
> >
> > So, I've attached a patch where randomAccess is now an additional
> > parameter (and registered for the next fest).
> >
> > I was thinking about how to add a test that would have broken when I
> > passed true for randomAccess to tuplestore_begin_heap() when false was
> > required. But, I don't fully understand the problem. If backward
> > accesses to a tuplestore are not allowed but randomAccess is mistakenly
> > passed as true, would the potential result be potentially wrong results
> > from accessing the tuplestore results backwards?
>
> No, see here
>
> src/backend/utils/fmgr/README-The Tuplestore must be created with 
> randomAccess = true if
> src/backend/utils/fmgr/README:SFRM_Materialize_Random is set in allowedModes, 
> but it can (and preferably
> src/backend/utils/fmgr/README-should) be created with randomAccess = false if 
> not.  Callers that can support
> src/backend/utils/fmgr/README-both ValuePerCall and Materialize mode will set 
> SFRM_Materialize_Preferred,
> src/backend/utils/fmgr/README-or not, depending on which mode they prefer.
>
> If you use "randomAccess=true" when it's not needed, the result might be less
> efficient.
>
> Some callers specify "true" when they don't need to.  I ran into that here.
> https://www.postgresql.org/message-id/21724.1583955...@sss.pgh.pa.us
>
> Maybe you'd want to add an 0002 patch which changes those to conditional ?

Done in attached v4
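
(For reference, the conditional form discussed above looks roughly like this
inside the helper -- a sketch using the standard set-returning-function fields,
not necessarily the exact code in v4:)

	ReturnSetInfo *rsinfo = (ReturnSetInfo *) fcinfo->resultinfo;
	bool		randomAccess = (rsinfo->allowedModes & SFRM_Materialize_Random) != 0;

	tupstore = tuplestore_begin_heap(randomAccess, false, work_mem);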

> BTW, there should be a newline before MakeFuncResultTuplestore().

Fixed

- Melanie


v4-0002-Check-allowedModes-when-creating-tuplestores.patch
Description: Binary data


v4-0001-Add-helper-to-make-tuplestore.patch
Description: Binary data


Re: Add sub-transaction overflow status in pg_stat_activity

2021-12-13 Thread Bossart, Nathan
On 12/13/21, 6:30 AM, "Dilip Kumar"  wrote:
> On Tue, Dec 7, 2021 at 11:11 AM Justin Pryzby  wrote:
>> Since I think this field is usually not interesting to most users of
>> pg_stat_activity, maybe this should instead be implemented as a function like
>> pg_backend_get_subxact_status(pid).
>>
>> People who want to could use it like:
>> SELECT * FROM pg_stat_activity psa, pg_backend_get_subxact_status(pid) sub;
>
> I have provided two functions, one for the subtransaction count and another
> for whether the subtransaction cache has overflowed or not; we can use them
> like this.  If we think this is a better way to do it, then we can also add
> another function for the lastOverflowedXid.

The general approach looks good to me.  I think we could have just one
function for all three values, though.
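
For example, usage could then look something like this (the function name
follows the earlier suggestion and the output columns are purely hypothetical):

  SELECT psa.pid, sub.*
    FROM pg_stat_activity psa,
         pg_backend_get_subxact_status(psa.pid) AS sub;
  -- sub could expose e.g. subxact_count, subxact_overflowed, last_overflowed_xid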

Nathan



Re: Add client connection check during the execution of the query

2021-12-13 Thread Thomas Munro
On Tue, Dec 14, 2021 at 7:53 AM Andres Freund  wrote:
> On 2021-12-13 17:51:00 +1300, Thomas Munro wrote:
> > I tried that.  It seems OK, and gets rid of the PG_FINALLY(), which is
> > nice.  Latches still have higher priority, and still have the fast
> > return if already set and you asked for only one event, but now if you
> > ask for nevents > 1 we'll make the syscall too so we'll see the
> > WL_SOCKET_CLOSED.
>
> Isn't a certain postgres committer that cares a lot about unnecessary syscalls
> going to be upset about this one? Even with the nevents > 1 optimization? Yes,
> right now there's no other paths that do so, but I don't like the corner this
> paints us in.

Well, I was trying to avoid bikeshedding an API change just for a
hypothetical problem we could solve when the time comes (say, after
fixing the more egregious problems with Append's WES usage), but here
goes: we could do something like AddWaitEventToSet(FeBeWaitSet,
WL_LATCH_SET_LOPRIO, ...) that is translated to WL_LATCH_SET
internally but also sets a flag to enable this
no-really-please-poll-all-the-things-if-there-is-space behaviour.

> From a different angle: Why do we need to perform the client connection check
> if the latch is set?

Imagine a parallel message that arrives just as our connection check
CFI routine runs, and sets the latch.  It'd be arbitrary and bizarre
if that caused us to skip the check.  So we have to ignore it, and the
question is just how.  I presented two different ways.  A third way
would be to create an entirely new WES for this use case; nope, that's
either wasteful of an fd or wasteful of system calls for a temporary
WES and likely more PG_TRY() stuff to avoid leaking it.  A fourth
could be to modify the WES like the simple code in v2/v3, but make the
WES code no-throw or pretend it's no-throw, which I didn't seriously
consider (I mean, if, say, WaitForMultipleObjects() returns
WAIT_FAILED and ERRORs then your session is pretty much hosed and your
WES is probably never going to work correctly again, but it still
seems to break basic rules of programming decency and exception safety
to leave the WES sans latch on non-local exit, which is why I added
the PG_FINALLY() that offended you to v3).




Re: Adding CI to our tree

2021-12-13 Thread Justin Pryzby
On Mon, Dec 13, 2021 at 01:12:23PM -0800, Andres Freund wrote:
> Hi,
> 
> Attached is an updated version of the CI patches. An example of a test run
> with the attached version of this
> https://cirrus-ci.com/build/6501998521483264

sudo is used exactly twice; maybe it's not needed at all ?

> +task:
> +  name: FreeBSD
...
> +  sysconfig_script:
> +- sudo sysctl kern.corefile='/tmp/%N.%P.core'

> +task:
> +  name: macOS
...
> +  core_install_script:
> +- sudo chmod 777 /cores

typos:
binararies
dont't




Re: Column Filtering in Logical Replication

2021-12-13 Thread Alvaro Herrera
On 2021-Dec-13, Alvaro Herrera wrote:

> Hmm, I messed up the patch file I sent.  Here's the complete patch.

Actually, this requires even a bit more mess than this to be really
complete if we want to be strict about it.  The reason is that, with the
patch I just posted, we're creating a new type of representable object
that will need to have some way of making it through pg_identify_object,
pg_get_object_address, pg_identify_object_as_address.  This is only
visible as one tries to patch object_address.sql (auditability of DDL
operations being the goal).

I think this means we need a new OBJECT_PUBLICATION_REL_COLUMN value in
the ObjectType (paralleling OBJECT_COLUMN), and no new ObjectClass
value.  Looking now to confirm.

-- 
Álvaro Herrera  Valdivia, Chile  —  https://www.EnterpriseDB.com/
"El que vive para el futuro es un iluso, y el que vive para el pasado,
un imbécil" (Luis Adler, "Los tripulantes de la noche")




Re: Adding CI to our tree

2021-12-13 Thread Andres Freund
Hi,

Attached is an updated version of the CI patches. An example of a test run
with the attached version of this
https://cirrus-ci.com/build/6501998521483264

I again included the commit allowing crash dumps to be collected on windows,
but I don't think it can be merged as-is, and should be left for later.


Changes since v2:
- Address review comments

- Build with further optional features enabled. I think just about everything
  reasonable is now enabled on freebsd, linux and macos. There's quite a bit
  more that could be done on windows, but I think it's good enough for now.

- I added cross-compilation to windows from linux, to the "warnings"
  task. Occasionally there are build-system issues specific to
  cross-compilation, and the set of warnings is different.

- Docs are now built as part of the 'CompilerWarnings' task.

- I improved the CI README a bit more, in particular I added docs for the
  'ci-os-only' tag I added to the CI logic, which lets one select which
  operating systems tests get run on.

- Some of the 'Warnings' tasks now build with --enable-dtrace (once with
  optimizations, once without). It's pretty easy to break probes without
  seeing the problem locally.

- Switched to using PG_TEST_USE_UNIX_SOCKETS for windows. Without that I was
  seeing occasional spurious test failures due to the use of PROVE_FLAGS=
  -j10, to make the otherwise serial execution of tests on windows bearable.

- switch macos task to use monterey

- plenty smaller changes / cleanups


There of course is a lot more that can be done [1], but I am pretty happy with
what this covers.


I'd like to commit this soon. There's two aspects that perhaps deserve a bit
more discussion before doing so though:

One I explicitly brought up before:

On 2021-10-01 15:27:52 -0700, Andres Freund wrote:
> One policy discussion that we'd have to have is who should control the images
> used for CI. Right now that's on my personal google cloud account - which I am
> happy to do, but medium - long term that'd not be optimal.

The proposed CI script uses custom images to run linux and freebsd tests. They
are automatically built every day from the repository 
https://github.com/anarazel/pg-vm-images/

These images have all the prerequisites pre-installed. For Linux something
similar can be achieved by using dockerfiles referenced in the .cirrus.yml file,
but for FreeBSD that's not available. Installing the necessary dependencies on
every run is too time intensive. For linux, the tests run a lot faster in
full-blown VMs than in docker, and full VMs allow a lot more control /
debugging.

I think this may be OK for now, but I also could see arguments for wanting to
transfer the image specifications and the google account to PG properties.


The second attention-worthy point is the choice of a new toplevel ci/
directory as the location for resources referenced by CI. A few other
projects also use ci/, but I can also see arguments for moving the contents to
e.g. src/tools/ci or such?

Greetings,

Andres Freund


[1] Some ideas for what could make sense to extend CI to in the future:

- also test with msys / mingw on windows

- provide more optional dependencies for windows build

- Extend the set of compiler warnings - as the compiler version is controlled,
  we could be more aggressive than we can be via configure.ac.

- Add further distributions / platforms. Possibly as "manual" tasks - the
  amount of resources one CI user gets is limited, so running tests on all
  platforms all the time would make tests take longer. Interesting things
  could be:

  - further linux distributions, particularly long-term supported ones

  - Some of the other BSDs. There currently are no pre-made images for
openbsd/netbsd, but it shouldn't be too hard to script building them.

  - running some tests on ARM could be interesting, cirrus supports that for
container based builds now

- run checks like cpluspluscheck as part of the CompilerWarnings task

- add tasks for running tests with tools like asan / ubsan (valgrind will be
  too slow).

- consider enabling compile-time debugging options like COPY_PARSE_PLAN_TREES,
  and run-time force_parallel_mode = regress on some platforms. They seem to
  catch a lot of problems during development and are likely affordable enough.
From 0fdf36d6701b3bb837a62ce8323e37acf7893e5d Mon Sep 17 00:00:00 2001
From: Andres Freund 
Date: Mon, 15 Mar 2021 09:25:15 -0700
Subject: [PATCH v3 1/2] ci: Add CI for FreeBSD, Linux, MacOS and Windows,
 utilizing cirrus-ci.

Author: Andres Freund 
Author: Thomas Munro 
Author: Melanie Plageman 
Reviewed-By: Melanie Plageman 
Discussion: https://postgr.es/m/20211001222752.wrz7erzh4cajv...@alap3.anarazel.de
---
 .cirrus.yml | 456 
 .dockerignore   |   7 +
 ci/README   |  50 
 ci/docker/linux_debian_bullseye |  67 +
 ci/docker/windows_vs_2019   | 122 +
 ci/gcp_freebsd_repartiti

Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-12-13 Thread Robert Haas
On Mon, Dec 13, 2021 at 9:40 AM Shruthi Gowda  wrote:
> > I am reviewing another patch
> > "v5-0001-Preserve-relfilenode-and-tablespace-OID-in-pg_upg" as well
> > and will provide the comments soon if any...

I spent much of today reviewing 0001. Here's an updated version, so
far only lightly tested. Please check whether I've broken anything.
Here are the changes:

- I adjusted the function header comment for heap_create. Your
proposed comment seemed like it was pretty detailed but not 100%
correct. It also made one of the lines kind of long because you didn't
wrap the text in the surrounding style. I decided to make it simpler
and shorter instead of longer still and 100% correct.

- I removed a one-line comment that said /* Override the toast
relfilenode */ because it preceded an error check, not a line of code
that would have done what the comment claimed.

- I removed a one-line comment that said /* Override the relfilenode
*/  because the following code would only sometimes override the
relfilenode. The code didn't seem complex enough to justify a longer
and more accurate comment, so I just took it out.

- I changed a test for (relkind == RELKIND_RELATION || relkind ==
RELKIND_SEQUENCE || relkind == RELKIND_MATVIEW) to use
RELKIND_HAS_STORAGE(). It's true that not all of the storage types
that RELKIND_HAS_STORAGE() tests are possible here, but that's not a
reason to avoid using the macro. If somebody adds a new relkind
with storage in the future, they might miss the need to manually
update this place, but they will not likely miss the need to update
RELKIND_HAS_STORAGE() since, if they did, their code probably wouldn't
work at all.

- I changed the way that you were passing create_storage down to
heap_create. I think I said before that you should EITHER fix it so
that we set create_storage = true only when the relation actually has
storage OR ELSE have heap_create() itself override the value to false
when there is no storage. You did both. There are times when it's
reasonable to ensure the same thing in multiple places, but this
doesn't seem to be one of them. So I took that out. I chose to retain
the code in heap_create() that overrides the value to false, added a
comment explaining that it does that, and then adjusted the callers to
ignore the storage type. I then added comments, and in one place an
assertion, to make it clearer what is happening.

- In pg_dump.c, I adjusted the comment that says "Not every relation
has storage." and the test that immediately follows, to ignore the
relfilenode when relkind says it's a partitioned table. Really,
partitioned tables should never have had relfilenodes, but as it turns
out, they used to have them.

Let me know your thoughts.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v7-0001-pg_upgrade-Preserve-relfilenodes-and-tablespace-O.patch
Description: Binary data


Re: Granting SET and ALTER SYSTEM privileges for GUCs

2021-12-13 Thread Mark Dilger



> On Dec 13, 2021, at 12:56 PM, Andrew Dunstan  wrote:
> 
> This patch had bit-rotted slightly, and I was attempting to remedy it.

I have that already, and getting ready to post.  Give me a few minutes and I'll 
repost.

> However, I got a failure running the TAP tests because of this change:
> 
> 
> diff --git a/src/test/modules/test_pg_dump/t/001_base.pl
> b/src/test/modules/test_pg_dump/t/001_base.pl
> index 16f7610883..7fbf2d871b 100644
> --- a/src/test/modules/test_pg_dump/t/001_base.pl
> +++ b/src/test/modules/test_pg_dump/t/001_base.pl
> @@ -9,7 +9,12 @@ use PostgreSQL::Test::Cluster;
>  use PostgreSQL::Test::Utils;
>  use Test::More;
>  
> -my $tempdir   = PostgreSQL::Test::Utils::tempdir;
> +# my $tempdir   = PostgreSQL::Test::Utils::tempdir;
> +my $tempbase = '/tmp/test_pg_dump';
> +my $subdir = 0;
> +$subdir++ while (-e "$tempbase/$subdir");
> +my $tempdir = "$tempbase/$subdir";
> +system("mkdir $tempdir");
>  
> 
> 
> What's going on here?

Yeah, I hit that, too.  That was an accidentally committed bit of local 
testing.  Please ignore it for now.


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: Granting SET and ALTER SYSTEM privileges for GUCs

2021-12-13 Thread Andrew Dunstan


On 11/23/21 21:14, Mark Dilger wrote:
>
>> On Nov 23, 2021, at 8:07 AM, Robert Haas  wrote:
>>
>> It's my impression that information_schema is a child of the SQL
>> standard, and that inventions specific to PG go in pg_catalog.
>>
>> Also, I think the user-facing name for GUCs is "settings".
> Thanks.  These issues should be fixed in v4.
>
> Along the way, I also added has_setting_privilege() functions overlooked in 
> v3 and before.



This patch had bit-rotted slightly, and I was attempting to remedy it.
However, I got a failure running the TAP tests because of this change:


diff --git a/src/test/modules/test_pg_dump/t/001_base.pl
b/src/test/modules/test_pg_dump/t/001_base.pl
index 16f7610883..7fbf2d871b 100644
--- a/src/test/modules/test_pg_dump/t/001_base.pl
+++ b/src/test/modules/test_pg_dump/t/001_base.pl
@@ -9,7 +9,12 @@ use PostgreSQL::Test::Cluster;
 use PostgreSQL::Test::Utils;
 use Test::More;
 
-my $tempdir   = PostgreSQL::Test::Utils::tempdir;
+# my $tempdir   = PostgreSQL::Test::Utils::tempdir;
+my $tempbase = '/tmp/test_pg_dump';
+my $subdir = 0;
+$subdir++ while (-e "$tempbase/$subdir");
+my $tempdir = "$tempbase/$subdir";
+system("mkdir $tempdir");
 


What's going on here?


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: WIN32 pg_import_system_collations

2021-12-13 Thread Thomas Munro
On Tue, Dec 14, 2021 at 5:29 AM Juan José Santamaría Flecha
 wrote:
> On Mon, Dec 13, 2021 at 9:41 AM Juan José Santamaría Flecha 
>  wrote:
> Per path tester.

Hi Juan José,

I haven't tested yet but +1 for the feature.  I guess the API didn't
exist at the time collation support was added.

+/*
+ * Windows will use hyphens between language and territory, where ANSI
+ * uses an underscore. Simply make it ANSI looking.
+ */
+hyphen = strchr(localebuf, '-');
+if (hyphen)
+*hyphen = '_';
+

This conversion makes sense, to keep the user experience the same
across platforms.   Nitpick on the comment: why ANSI?  I think we can
call "en_NZ" a POSIX locale identifier[1], and I think we can call
"en-NZ" a BCP 47 language tag.

+/*
+ * This test is for Windows/Visual Studio systems and assumes that a full set
+ * of locales is installed. It must be run in a database with WIN1252 encoding,
+ * because of the locales' encondings. We lose some interesting cases from the
+ * UTF-8 version, like Turkish dotted and undotted 'i' or Greek sigma.
+ */

s/encondings/encodings/

When would the full set of locales not be installed on a Windows
system, and why does this need Visual Studio?  Wondering if this test
will work with some of the frankenstein/cross tool chains
(not objecting if it doesn't and could be skipped, just trying to
understand the comment).

Slightly related to this, in case you didn't see it, I'd also like to
use BCP 47 tags for the default locale for PostgreSQL 15[2].

[1] https://en.wikipedia.org/wiki/Locale_(computer_software)#POSIX_platforms
[2] 
https://www.postgresql.org/message-id/flat/CA%2BhUKGJ%3DXThErgAQRoqfCy1bKPxXVuF0%3D2zDbB%2BSxDs59pv7Fw%40mail.gmail.com




Re: extended stats on partitioned tables

2021-12-13 Thread Tomas Vondra
On 12/13/21 14:53, Justin Pryzby wrote:
> On Sun, Dec 12, 2021 at 11:23:19PM +0100, Tomas Vondra wrote:
>> On 12/12/21 22:32, Justin Pryzby wrote:
>>> On Sun, Dec 12, 2021 at 05:17:10AM +0100, Tomas Vondra wrote:
 The one thing bugging me a bit is that the regression test checks only a
 GROUP BY query. It'd be nice to add queries testing MCV/dependencies
 too, but that seems tricky because most queries will use per-partitions
 stats.
>>>
>>> You mean because the quals are pushed down to the scan node.
>>>
>>> Does that indicate a deficiency ?
>>>
>>> If extended stats are collected for a parent table, selectivity estimates 
>>> based
>>> on the parent would be better; but instead we use uncorrected column
>>> estimates from the child tables.
>>>
>>> From what I see, we could come up with a way to avoid the pushdown, 
>>> involving
>>> volatile functions/foreign tables/RLS/window functions/SRF/wholerow 
>>> vars/etc.
>>> But would it be better if extended stats objects on partitioned tables were 
>>> to
>>> collect stats for both parent AND CHILD ?  I'm not sure.  Maybe that's the
>>> wrong solution, but maybe we should still document that extended stats on
>>> (empty) parent tables are often themselves not used/useful for selectivity
>>> estimates, and the user should instead (or in addition) create stats on 
>>> child
>>> tables.
>>>
>>> Or, maybe if there's no extended stats on the child tables, stats on the 
>>> parent
>>> table should be consulted ?
>>
>> Maybe, but that seems like a mostly separate improvement. At this point I'm
>> interested only in testing the behavior implemented in the current patches.
> 
> I don't want to change the scope of the patch, or this thread, but my point is
> that the behaviour already changed once (the original regression) and now 
> we're
> planning to change it again to fix that, so we ought to decide on the expected
> behavior before writing tests to verify it.
> 

OK. Makes sense.

> I think it may be impossible to use the "dependencies" statistic with 
> inherited
> stats.  Normally the quals would be pushed down to the child tables.  But, if
> they weren't pushed down, they'd be attached to something other than a scan
> node on the parent table, so the stats on that table wouldn't apply (right?).
> 

Yeah, that's probably right. But I'm not 100% sure the whole inheritance
tree can't be treated as a single relation by some queries ...

> Maybe the useless stats types should have been prohibited on partitioned 
> tables
> since v10.  It's too late to change that, but perhaps now they shouldn't even
> be collected during analyze.  The dependencies and MCV paths are never called
> with rte->inh==true, so maybe we should Assert(!inh), or add a comment to that
> effect.  Or the regression tests should "memorialize" the behavior.  I'm still
> thinking about it.
> 

Yeah, we can't really prohibit them in backbranches - that'd mean some
CREATE STATISTICS commands suddenly start failing, which would be quite
annoying. Not building them for partitioned tables seems like a better
option - BuildRelationExtStatistics can check relkind and pick what to
ignore. But I'd choose to do that in a separate patch, probably - after
all, it shouldn't really change the behavior of any tests/queries, no?

This reminds me we need to consider if these patches could cause any
issues. The way I see it is this:

1) If the table is a separate relation (not part of an inheritance
tree), this should make no difference. -> OK

2) If the table is using "old" inheritance, this reverts back to
pre-regression behavior. So people will keep using the old statistics
until the ANALYZE, and we need to tell them to ANALYZE or something.

3) If the table is using partitioning, it's guaranteed to be empty and
there are no stats at all. Again, we should tell people to run ANALYZE.

Of course, we can't be sure query plans will change in the right
direction :-(
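To make the ANALYZE advice concrete, here is a minimal sketch (all object
names are made up):

    CREATE TABLE p (a int, b int) PARTITION BY RANGE (a);
    CREATE TABLE p1 PARTITION OF p FOR VALUES FROM (0) TO (100);
    INSERT INTO p SELECT i % 50, i % 50 FROM generate_series(1, 10000) i;
    CREATE STATISTICS p_stats (ndistinct, dependencies, mcv) ON a, b FROM p;
    ANALYZE p;   -- builds the parent-level stats, including the extended ones

Until that ANALYZE runs, queries planned against the parent have no extended
statistics to fall back on.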


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: O(n) tasks cause lengthy startups and checkpoints

2021-12-13 Thread Robert Haas
On Mon, Dec 13, 2021 at 1:21 PM Bossart, Nathan  wrote:
> > But against all that, if these tasks are slowing down checkpoints and
> > that's avoidable, that seems pretty important too. Interestingly, I
> > can't say that I've ever seen any of these things be a problem for
> > checkpoint or startup speed. I wonder why you've had a different
> > experience.
>
> Yeah, it's difficult for me to justify why users should suffer long
> periods of downtime because startup or checkpointing is taking a very
> long time doing things that are arguably unrelated to startup and
> checkpointing.

Well sure. But I've never actually seen that happen.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: bugfix: invalid bit/varbit input causes the log file to be unreadable

2021-12-13 Thread Tom Lane
Huansong Fu  writes:
> We recently saw a similar issue in v12 and wondered why the corresponding fix 
> for v14 (https://github.com/postgres/postgres/commit/16e3ad5d143) was not 
> backported to v13 and before. The commit message did mention that this fix 
> might have a problem with translatable string messages - would you mind 
> providing a bit more context about what is needed to backport this fix? Thank 
> you.

Well, the commit message lists the reasons for not back-patching:

* we've seen few field complaints about such problems
* it'd add work for translators
* it wouldn't work reliably before v12.

Perhaps there's a case for back-patching as far as v12,
but I can't get very excited about it.

regards, tom lane




Re: isolationtester: add session name to application name

2021-12-13 Thread Andres Freund
Hi,

On 2021-12-13 13:57:52 -0500, Tom Lane wrote:
> Andres Freund  writes:
> > On 2021-12-13 19:46:34 +0900, Michael Paquier wrote:
> >> +1 for the idea.  Maybe it could be backpatched?
>
> > Not entirely trivially - the changes have some dependencies on other changes
> > (e.g. b1907d688, more on 741d7f104, but that was backpatched). I guess we
> > could backpatch b1907d688 as well, but I'm not sure its worth it?
>
> I think we've more recently had the idea that isolationtester features
> should be back-patched to avoid gotchas when back-patching test cases.
> For instance, all the isolationtester work I did this past summer was
> back-patched.  So from that vantage point, back-patching b1907d688
> seems fine.

Since there seems support for that approach, I've backpatched b1907d688 and
will push application_name isolationtester change once running the tests
across all branches finishes locally.

Regards,

Andres




Re: conchuela has some SSL issues

2021-12-13 Thread Andrew Dunstan


On 12/13/21 11:48, Robert Haas wrote:
> Hi,
>
> The last three buildfarm runs on conchuela show a failure in initdb:
>
> Shared object "libssl.so.48" not found, required by "libldap_r-2.4.so.2"
>
> It seems likely to me that this is a machine configuration issue
> rather than the result of some recent change in PostgreSQL, because
> the first failure 2 days ago shows only this as a recent PostgreSQL
> change:
>
> 07eee5a0dc Sat Dec 11 19:10:51 2021 UTC  Create a new type category
> for "internal use" types.
>
> And that doesn't seem like it could cause this.
>
> Any thoughts?
>

It's also failing on REL_14_STABLE, so I think it must be an
environmental change.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: extended stats on partitioned tables

2021-12-13 Thread Tomas Vondra
On 12/13/21 14:48, Justin Pryzby wrote:
> On Sun, Dec 12, 2021 at 10:29:39PM +0100, Tomas Vondra wrote:
>>> In your 0003 patch, the "if inh: break" isn't removed from 
>>> examine_variable(),
>>> but the corresponding thing is removed everywhere else.
>>
>> Ah, you're right. And it wasn't updated in the 0002 patch either - it
>> should do the relkind check too, to allow partitioned tables. Fixed.
> 
> I think you fixed it in 0002 (thanks) but still wasn't removed from 0003?
> 

D'oh! Those repeated rebase conflicts got me quite confused.

> In these comments:
> +* When dealing with regular inheritance trees, ignore extended stats
> +* (which were built without data from child rels, and thus do not
> +* represent them). For partitioned tables data from partitions are
> +* in the stats (and there's no data in the non-leaf relations), so
> +* in this case we do consider extended stats.
> 
> I suggest to add a comment after "For partitioned tables".
> 

I've reworded the comment a bit, hopefully it's a bit clearer now.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From 0601bac64c0d7a1f1b676f0433a05f2f99d7bfb1 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sat, 25 Sep 2021 19:42:41 -0500
Subject: [PATCH 1/4] Ignore extended statistics for inheritance trees

Since commit 859b3003de we only build extended statistics for relations
without the child relations, so that we update the catalog row only
once. However, the non-inherited statistics were still used for planning
of queries on inheritance trees, which may produce bogus estimates.

This is roughly the same issue 427c6b5b9 addressed ~15 years ago, and we
fix it the same way - by ignoring extended statistics when calculating
estimates for the inheritance tree as a whole. We still consider
extended statistics when calculating estimates for individual child
relations, of course.

Report and patch by Justin Pryzby, minor fixes and cleanup by me.
Backpatch all the way back to PostgreSQL 10, where extended statistics
were introduced (same as 859b3003de).

Author: Justin Pryzby
Reported-by: Justin Pryzby
Backpatch-through: 10
Discussion: https://postgr.es/m/20210923212624.GI831%40telsasoft.com
---
 src/backend/statistics/dependencies.c   |  9 +
 src/backend/statistics/extended_stats.c |  9 +
 src/backend/utils/adt/selfuncs.c| 17 +
 src/test/regress/expected/stats_ext.out | 24 
 src/test/regress/sql/stats_ext.sql  | 15 +++
 5 files changed, 74 insertions(+)

diff --git a/src/backend/statistics/dependencies.c b/src/backend/statistics/dependencies.c
index 8bf80db8e4..a41bace75a 100644
--- a/src/backend/statistics/dependencies.c
+++ b/src/backend/statistics/dependencies.c
@@ -24,6 +24,7 @@
 #include "nodes/pathnodes.h"
 #include "optimizer/clauses.h"
 #include "optimizer/optimizer.h"
+#include "parser/parsetree.h"
 #include "statistics/extended_stats_internal.h"
 #include "statistics/statistics.h"
 #include "utils/bytea.h"
@@ -1410,11 +1411,19 @@ dependencies_clauselist_selectivity(PlannerInfo *root,
 	int			ndependencies;
 	int			i;
 	AttrNumber	attnum_offset;
+	RangeTblEntry *rte = planner_rt_fetch(rel->relid, root);
 
 	/* unique expressions */
 	Node	  **unique_exprs;
 	int			unique_exprs_cnt;
 
+	/*
+	 * When dealing with inheritance trees, ignore extended stats (which were
+	 * built without data from child rels, and thus do not represent them).
+	 */
+	if (rte->inh)
+		return 1.0;
+
 	/* check if there's any stats that might be useful for us. */
 	if (!has_stats_of_kind(rel->statlist, STATS_EXT_DEPENDENCIES))
 		return 1.0;
diff --git a/src/backend/statistics/extended_stats.c b/src/backend/statistics/extended_stats.c
index 69ca52094f..5a3ba2f1c5 100644
--- a/src/backend/statistics/extended_stats.c
+++ b/src/backend/statistics/extended_stats.c
@@ -30,6 +30,7 @@
 #include "nodes/nodeFuncs.h"
 #include "optimizer/clauses.h"
 #include "optimizer/optimizer.h"
+#include "parser/parsetree.h"
 #include "pgstat.h"
 #include "postmaster/autovacuum.h"
 #include "statistics/extended_stats_internal.h"
@@ -1694,6 +1695,14 @@ statext_mcv_clauselist_selectivity(PlannerInfo *root, List *clauses, int varReli
 	List	  **list_exprs;		/* expressions matched to any statistic */
 	int			listidx;
 	Selectivity sel = (is_or) ? 0.0 : 1.0;
+	RangeTblEntry *rte = planner_rt_fetch(rel->relid, root);
+
+	/*
+	 * When dealing with inheritance trees, ignore extended stats (which were
+	 * built without data from child rels, and thus do not represent them).
+	 */
+	if (rte->inh)
+		return sel;
 
 	/* check if there's any stats that might be useful for us. */
 	if (!has_stats_of_kind(rel->statlist, STATS_EXT_MCV))
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 10895fb287..1010d5caa8 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3913,6 +3

Re: bugfix: invalid bit/varbit input causes the log file to be unreadable

2021-12-13 Thread Huansong Fu
Hi, 

We recently saw a similar issue in v12 and wondered why the corresponding fix 
for v14 (https://github.com/postgres/postgres/commit/16e3ad5d143) was not 
backported to v13 and before. The commit message did mention that this fix 
might have problem with translatable string messages - would you mind providing 
a bit more context about what is needed to backport this fix? Thank you.

Regards,
Huansong
https://vmware.com/ 

> On Jun 29, 2020, at 11:42 AM, Tom Lane  wrote:
> 
> Quan Zongliang  writes:
>> I tested it, and it looks fine.
> 
> Pushed, thanks for reporting the issue!
> 
>   regards, tom lane
> 
> 
> 
> 



Re: port conflicts when running tests concurrently on windows.

2021-12-13 Thread Andres Freund
Hi,

On 2021-12-10 10:22:13 +0100, Peter Eisentraut wrote:
> On 09.12.21 19:41, Andres Freund wrote:
> > With https://github.com/anarazel/postgres/commit/046203741803da863f6129739fd215f8a32ec357
> > all tests pass. pg_regress requires PG_REGRESS_SOCK_DIR because it checks 
> > for
> > TMPDIR, but windows only has TMP and TEMP.
> 
> This looks reasonable so far.

I pushed that part, since we clearly need something like them.


> The commit messages 8f3ec75de4060d86176ad4ac998eeb87a39748c2 and
> 1d53432ff940b789c2431ba476a2a6e2db3edf84 contain some notes about what I
> thought at the time didn't work yet.

> In particular, the pg_upgrade tests
> don't support the use of Unix sockets on Windows.  (Those tests have been
> rewritten since, so I don't know what the status is.)

ISTM we still use two different implementations of the pg_upgrade tests :(. I
recall there being some recent-ish work on moving it to be a tap test, but
apparently not yet committed.

It doesn't look like the vcregress.pl implementation respects
PG_TEST_USE_UNIX_SOCKETS right now.


> pg_regress.c at remove_temp() is still there.  These things should probably
> be addressed before we can consider making this the default.

Hm, not immediately obvious what to do about this. Do you know if windows has
restrictions around the length of unix domain sockets? If not, I wonder if it
could be worth using the data directory as the socket path on windows instead
of the separate temp directory?

Greetings,

Andres Freund




Re: Commitfest 2021-11 Patch Triage - Part 3

2021-12-13 Thread Mark Dilger



> On Nov 11, 2021, at 2:12 PM, Daniel Gustafsson  wrote:
> 
> 3223: Delegating superuser tasks to new security roles
> ==
> There is a lot of ongoing discussion on this one, but skimming the thread it's
> not really clear (to me) where it leaves the patch in question.

This set of patches got a fair amount of in-person off-list discussion at the
PGConf NYC with folks who had concerns about some or all parts of the patch.
Most of the known contentious design details were resolved, and I've been
updating the patches to repost this week.  The subscriptions patch and guc
patch should be close to ready.  The CREATEROLE patch may take a bit longer.

Conversation Highlights:

  - The logical replication subscription patch needs to consider RLS.
(Jeff Davis)

This is implemented but I still need to update the documentation before
posting.

  - The patch for granting SET VALUE and/or ALTER SYSTEM on guc settings should
be extended to allow revoking privilege to SET VALUE on userset gucs.
(Joshua Brindle, Stephen Frost).

This is implemented for core by means of converting PGC_USERSET variables
to PGC_SUSET, with a corresponding "GRANT SET VALUE ON .. TO public" added
to the new catalog/setting_privileges.sql file.  By making it an explicit
grant, it is revokable without the need for any special logic.  This would
allow, for example, REVOKE SET VALUE ON work_mem FROM public; followed by
explicit grants to whichever roles you want to allow to set work_mem (a short
sketch follows after this item).

I still need to adjust gucs defined in some contrib modules, update
documentation, and verify pg_upgrade from older versions before posting.  

External projects are not harmed by this implementation.  The only issue
for them is that SET VALUE cannot be revoked from any PGC_USERSET gucs they
define until they update their module to use PGC_SUSET plus a GRANT
somewhere in their extension creation or upgrade sql script.
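A minimal sketch of that workflow, using the SET VALUE grammar proposed by
this patch (the syntax is not committed and the role name is made up):

    -- default, as the proposed catalog/setting_privileges.sql would set it up
    GRANT SET VALUE ON work_mem TO public;
    -- lock the setting down, then re-open it for a specific role
    REVOKE SET VALUE ON work_mem FROM public;
    GRANT SET VALUE ON work_mem TO reporting_role;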

  - The CREATEROLE patch can go forward with roles having owners, but other
spec compliant behavior for GRANT/REVOKE should be implemented.
(Stephen Frost)

The bug wherein the grantor field in pg_auth_members may point to a dropped
role should be fixed.  Probably this field should simply be dropped.  We
don't use it anywhere, and on upgrade we can't know if existing values are
correct or bogus.  Values that do not refer to any existing role are
obviously bogus, but if a role Oid was reassigned, a grantor entry may be
pointing wrongly at the new role, and there is no way to know that.

Stephen indicated that, for all privileges, only the role which GRANTed a
privilege should be able to REVOKE it.  This differs from the current
implementation and is likely controversial.  Robert Haas, at least,
described that idea as a non-starter.  I don't plan to include that in this
patch, as it doesn't seem necessary to combine the two features even if we
eventually intend to do both.  (And certainly unhelpful to include it if
the idea is rejected.)

Whether the owner of a role is implicitly a member (or admin) of the owned
role is still disputed.  Stephen Frost would rather the owner has the right
to grant itself into the role, or grant itself ADMIN on the role, but not
be a member nor an admin by default.  Robert Haas preferred that the owner
is both a member and an admin by default, making role ownership analogous
to how superuser works.  I lean slightly in Stephen's favor, as any role
"foo" with CREATEROLE could always run:

CREATE ROLE bar [ ROLE foo | ADMIN foo];

and thereby give themselves the privilege they want in the same command
that creates the role.  Going with Robert's proposal, a creator who does
not want to have privileges in the role would need to run:

CREATE ROLE bar;
REVOKE [ADMIN OPTION FOR ] bar FROM foo;

which requires two separate commands, and creates a short window in between
where a (possibly undesirable) configuration exists.

Jeff Davis and Andrew Dunstan have volunteered as committers.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: isolationtester: add session name to application name

2021-12-13 Thread Tom Lane
Andres Freund  writes:
> On 2021-12-13 19:46:34 +0900, Michael Paquier wrote:
>> +1 for the idea.  Maybe it could be backpatched?

> Not entirely trivially - the changes have some dependencies on other changes
> (e.g. b1907d688, more on 741d7f104, but that was backpatched). I guess we
> could backpatch b1907d688 as well, but I'm not sure it's worth it?

I think we've more recently had the idea that isolationtester features
should be back-patched to avoid gotchas when back-patching test cases.
For instance, all the isolationtester work I did this past summer was
back-patched.  So from that vantage point, back-patching b1907d688
seems fine.

regards, tom lane




Re: Add client connection check during the execution of the query

2021-12-13 Thread Andres Freund
Hi,

On 2021-12-13 17:51:00 +1300, Thomas Munro wrote:
> On Sat, Dec 11, 2021 at 7:09 PM Thomas Munro  wrote:
> > On Sat, Dec 11, 2021 at 6:11 PM Andres Freund  wrote:
> > > Yuck. Is there really no better way to deal with this? What kind of 
> > > errors is
> > > this trying to handle transparently? Afaics this still changes when we'd
> > > e.g. detect postmaster death.
> >
> > The problem is that WaitEventSetWait() only reports the latch, if it's
> > set, so I removed it from the set (setting it to NULL), and then undo
> > that afterwards.  Perhaps we could fix that root problem instead.
> > That is, we could make it so that latches aren't higher priority in
> > that way, ie don't hide other events[1].  Then I wouldn't have to
> > modify the WES here, I could just ignore it in the output event list
> > (and make sure that's big enough for all possible events, as I had it
> > in the last version).  I'll think about that.
> 
> I tried that.  It seems OK, and gets rid of the PG_FINALLY(), which is
> nice.  Latches still have higher priority, and still have the fast
> return if already set and you asked for only one event, but now if you
> ask for nevents > 1 we'll make the syscall too so we'll see the
> WL_SOCKET_CLOSED.

Isn't a certain postgres committer that cares a lot about unnecessary syscalls
going to be upset about this one? Even with the nevents > 1 optimization? Yes,
right now there's no other paths that do so, but I don't like the corner this
paints us in.

From a different angle: Why do we need to perform the client connection check
if the latch is set?

Greetings,

Andres Freund




Re: isolationtester: add session name to application name

2021-12-13 Thread Andres Freund
Hi,

On 2021-12-13 19:46:34 +0900, Michael Paquier wrote:
> On Fri, Dec 10, 2021 at 05:20:52PM -0800, Andres Freund wrote:
> > These days isolationtester.c already prefixes log output with the session
> > name. How about doing the same for application_name? It's a *tad* more
> > complicated than I'd like because isolationtester.c currently doesn't know 
> > the
> name of the test it's executing.
> 
> +1 for the idea.  Maybe it could be backpatched?

Not entirely trivially - the changes have some dependencies on other changes
(e.g. b1907d688, more on 741d7f104, but that was backpatched). I guess we
could backpatch b1907d688 as well, but I'm not sure it's worth it?


> > +* easier to map spec file sesions to log output and
> 
> One s/sesions/sessions/ here.

Ah, good catch.

Greetings,

Andres Freund




Re: O(n) tasks cause lengthy startups and checkpoints

2021-12-13 Thread Bossart, Nathan
On 12/13/21, 9:20 AM, "Justin Pryzby"  wrote:
> On Mon, Dec 13, 2021 at 08:53:37AM -0500, Robert Haas wrote:
>> Another issue is that we don't want to increase the number of
>> processes without bound. Processes use memory and CPU resources and if
>> we run too many of them it becomes a burden on the system. Low-end
>> systems may not have too many resources in total, and high-end systems
>> can struggle to fit demanding workloads within the resources that they
>> have. Maybe it would be cheaper to do more things at once if we were
>> using threads rather than processes, but that day still seems fairly
>> far off.
>
> Maybe that's an argument that this should be a dynamic background worker
> instead of an auxilliary process.  Then maybe it would be controlled by
> max_parallel_maintenance_workers (or something similar).  The checkpointer
> would need to do these tasks itself if parallel workers were disabled or
> couldn't be launched.

I think this is an interesting idea.  I dislike the prospect of having
two code paths for all this stuff, but if it addresses the concerns
about resource usage, maybe it's worth it.

Nathan



Re: using extended statistics to improve join estimates

2021-12-13 Thread Tomas Vondra

On 11/6/21 11:03, Andy Fan wrote:

Hi Tomas:

This is the exact patch I want, thanks for the patch!



Good to hear.


On Thu, Oct 7, 2021 at 3:33 AM Tomas Vondra
 wrote:



3) estimation by join pairs

At the moment, the estimates are calculated for pairs of relations, so
for example given a query

   explain analyze
   select * from t1 join t2 on (t1.a = t2.a and t1.b = t2.b)
join t3 on (t1.b = t3.b and t2.c = t3.c);

we'll estimate the first join (t1,t2) just fine, but then the second
join actually combines (t1,t2,t3). What the patch currently does is it
splits it into (t1,t2) and (t2,t3) and estimates those.


Actually I can't understand how this works even for a simpler example.
Let's say we query like this (ONLY using t2's columns to join t3).

select * from t1 join t2 on (t1.a = t2.a and t1.b = t2.b)
  join t3 on (t2.c = t3.c and t2.d = t3.d);

Then it works well on JoinRel(t1, t2) AND JoinRel(t2, t3).  But when it comes
to JoinRel(t1, t2, t3), we don't maintain the MCV on the join rel, so it is
hard to make that work.  Here I see your solution is splitting it into
(t1, t2) AND (t2, t3) and estimating those.  But how does this help to
estimate the size of JoinRel(t1, t2, t3)?



Yeah, this is really confusing. The crucial thing to keep in mind is 
this works with clauses before running setrefs.c, so the clauses 
reference the original relations - *not* the join relation. Otherwise 
even the regular estimation would not work, because where would it get 
the per-column MCV lists etc.


Let's use a simple case with join clauses referencing just a single 
attribute for each pair of relations. And let's talk about how many join 
pairs it'll extract:


  t1 JOIN t2 ON (t1.a = t2.a) JOIN t3 ON (t1.b = t3.b)

=> first we join t1/t2, which is 1 join pair (t1,t2)
=> then we join t1/t2/t3, but the join clause references just 2 rels, so 
1 join pair (t1,t3)


Now a more complicated case, with more complex join clause

  t1 JOIN t2 ON (t1.a = t2.a) JOIN t3 ON (t1.b = t3.b AND t2.c = t3.c)

=> first we join t1/t2, which is 1 join pair (t1,t2)
=> then we join t1/t2/t3, but this time the join clause references all 
three rels, so we have two join pairs (t1,t3) and (t2,t3) and we can use 
all the statistics.





I wonder if this
should actually combine all three MCVs at once - we're pretty much
combining the MCVs into one large MCV representing the join result.



I guess we can keep the MCVs on joinrel for these matches.  Take the above
query I provided for example, and suppose the MCV data as below:

t1(a, b)
(1, 2) -> 0.1
(1, 3) -> 0.2
(2, 3) -> 0.5
(2, 8) -> 0.1

t2(a, b)
(1, 2) -> 0.2
(1, 3) -> 0.1
(2, 4) -> 0.2
(2, 10) -> 0.1

After t1.a = t2.a AND t1.b = t2.b,  we can build the MCV as below

(1, 2, 1, 2)  -> 0.1 * 0.2
(1, 3, 1, 3)  -> 0.2 * 0.1

And recording the total MCV frequency as (0.1 + 0.2 + 0.5 + 0.1) *
(0.2 + 0.1 + 0.2 + 0.1)
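(For what it's worth, the arithmetic above can be mimicked in plain SQL; the
MCV contents are just the made-up numbers from this example:)

    WITH mcv_t1 (a, b, freq) AS (
           VALUES (1, 2, 0.1), (1, 3, 0.2), (2, 3, 0.5), (2, 8, 0.1)),
         mcv_t2 (a, b, freq) AS (
           VALUES (1, 2, 0.2), (1, 3, 0.1), (2, 4, 0.2), (2, 10, 0.1))
    SELECT t1.a, t1.b, t1.freq * t2.freq AS joint_freq
      FROM mcv_t1 t1 JOIN mcv_t2 t2 ON t1.a = t2.a AND t1.b = t2.b;
    -- yields (1,2) -> 0.02 and (1,3) -> 0.02, matching the combined MCV above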



Right, that's about the joint distribution of the whole join.


With this design, the nitems of the MCV on the joinrel would be less than
either of the base rels.



Actually, I think the number of items can grow, because the matches may 
duplicate some items. For example in your example with (t1.a = t2.a) the 
first (1,2) item in t1 matches (1,2) and (1,3) in t2. And the same for 
(1,3) in t1. So that's 4 combinations. Of course, we could aggregate the 
MCV by ignoring columns not used in the query.



And since we handle the eqjoin as well, we can even record the items as

(1, 2) -> 0.1 * 0.2
(1, 3) -> 0.2 * 0.1;

As for when we should maintain the JoinRel's MCV data: rather than maintaining
it just after the JoinRel size is estimated, we could compute it only when it
is needed.  For example:

select * from t1 join t2 on (t1.a = t2.a and t1.b = t2.b)
  join t3 on (t2.c = t3.c and t2.d = t3.d);

we don't need to maintain the MCV on (t1, t2, t3) since nothing else needs it
at all.  However, I haven't checked the code closely enough to see whether
lazily computing the MCV on a joinrel is convenient to do.



I'm not sure I understand what you're proposing here.

However, I think that estimating it for pairs has two advantages:

1) Combining MCVs for k relations requires k for loops. Processing 2 
relations at a time limits the amount of CPU we need. Of course, this 
assumes the joins are independent, which may or may not be true.


2) It seems fairly easy to combine different types of statistics 
(regular, extended, ...), and also consider the part not represented by 
MCV. It seems much harder when joining more than 2 relations.


I'm also worried about amplification of errors - I suspect attempting to 
build the joint MCV for the whole join relation may produce significant 
estimation errors.


Furthermore, I think joins with clauses referencing more than just two 
relations are fairly uncommon. And we can always improve the feature in 
this direction in the future.





But I haven't done that yet, as it requires the MCVs to be combined
using the join clauses (overl

Re: O(n) tasks cause lengthy startups and checkpoints

2021-12-13 Thread Bossart, Nathan
On 12/13/21, 5:54 AM, "Robert Haas"  wrote:
> I don't know whether this kind of idea is good or not.

Thanks for chiming in.  I have an almost-complete patch set that I'm
hoping to post to the lists in the next couple of days.

> One thing we've seen a number of times now is that entrusting the same
> process with multiple responsibilities often ends poorly. Sometimes
> it's busy with one thing when another thing really needs to be done
> RIGHT NOW. Perhaps that won't be an issue here since all of these
> things are related to checkpointing, but then the process name should
> reflect that rather than making it sound like we can just keep piling
> more responsibilities onto this process indefinitely. At some point
> that seems bound to become an issue.

Two of the tasks are cleanup tasks that checkpointing handles at the
moment, and two are cleanup tasks that are done at startup.  For now,
all of these tasks are somewhat nonessential.  There's no requirement
that any of these tasks complete in order to finish startup or
checkpointing.  In fact, outside of preventing the server from running
out of disk space, I don't think there's any requirement that these
tasks run at all.  IMO this would have to be a core tenet of a new
auxiliary process like this.

That being said, I totally understand your point.  If there were a
dozen such tasks handled by a single auxiliary process, that could
cause a new set of problems.  Your checkpointing and startup might be
fast, but you might run out of disk space because our cleanup process
can't handle it all.  So a new worker could end up becoming an
availability risk as well.

> Another issue is that we don't want to increase the number of
> processes without bound. Processes use memory and CPU resources and if
> we run too many of them it becomes a burden on the system. Low-end
> systems may not have too many resources in total, and high-end systems
> can struggle to fit demanding workloads within the resources that they
> have. Maybe it would be cheaper to do more things at once if we were
> using threads rather than processes, but that day still seems fairly
> far off.

I do agree that it is important to be very careful about adding new
processes, and if a better idea for how to handle these tasks emerges,
I will readily abandon my current approach.  Upthread, Andres
mentioned optimizing unnecessary snapshot files, and I mentioned
possibly limiting how much time startup and checkpoints spend on these
tasks.  I don't have too many details for the former, and for the
latter, I'm worried about not being able to keep up.  But if the
prospect of adding a new auxiliary process for this stuff is a non-
starter, perhaps I should explore that approach some more.

> But against all that, if these tasks are slowing down checkpoints and
> that's avoidable, that seems pretty important too. Interestingly, I
> can't say that I've ever seen any of these things be a problem for
> checkpoint or startup speed. I wonder why you've had a different
> experience.

Yeah, it's difficult for me to justify why users should suffer long
periods of downtime because startup or checkpointing is taking a very
long time doing things that are arguably unrelated to startup and
checkpointing.

Nathan



Re: Add spin_delay() implementation for Arm in s_lock.h

2021-12-13 Thread Blake, Geoff
Hi Tom,

> What did you test exactly?

Tested 3 benchmark configurations on an m6g.16xlarge (Graviton2, 64 CPUs, 256GB
RAM).  I set the scale factor to consume about 1/3 of 256GB and used the other
parameters shown below.

pgbench setup: -F 90 -s 5622 -c 256

pgbench select-only   w/  patch   662804 tps (-0.5%)
                      w/o patch   666354 tps
pgbench tpcb-like     w/  patch    35844 tps (0%)
                      w/o patch    35835 tps

We also test with HammerDB when evaluating patches; it shows the patch gaining +3%:

HammerDB (192 warehouses, 256 clients)
                      w/  patch  1147463 NOPM (+3%)
                      w/o patch  1112908 NOPM

I've run pgbench more than once and the measured TPS values overlap; even
though the means on select-only show a small degradation, at the moment I am
concluding it is noise.  On HammerDB, the results show a measurable difference.

Thanks,
Geoff





Re: Column Filtering in Logical Replication

2021-12-13 Thread Alvaro Herrera
Hmm, I messed up the patch file I sent.  Here's the complete patch.


-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"Doing what he did amounts to sticking his fingers under the hood of the
implementation; if he gets his fingers burnt, it's his problem."  (Tom Lane)
diff --git a/doc/src/sgml/ref/alter_publication.sgml b/doc/src/sgml/ref/alter_publication.sgml
index bb4ef5e5e2..c86055b93c 100644
--- a/doc/src/sgml/ref/alter_publication.sgml
+++ b/doc/src/sgml/ref/alter_publication.sgml
@@ -30,7 +30,7 @@ ALTER PUBLICATION name RENAME TO where publication_object is one of:
 
-TABLE [ ONLY ] table_name [ * ] [, ... ]
+TABLE [ ONLY ] table_name [ * ]  [ ( column_name, [, ... ] ) ] [, ... ]
 ALL TABLES IN SCHEMA { schema_name | CURRENT_SCHEMA } [, ... ]
 
  
@@ -110,6 +110,8 @@ ALTER PUBLICATION name RENAME TO * can be specified after the table
   name to explicitly indicate that descendant tables are included.
+  Optionally, a column list can be specified.  See  for details.
  
 

diff --git a/doc/src/sgml/ref/create_publication.sgml b/doc/src/sgml/ref/create_publication.sgml
index d805e8e77a..73a23cbb02 100644
--- a/doc/src/sgml/ref/create_publication.sgml
+++ b/doc/src/sgml/ref/create_publication.sgml
@@ -28,7 +28,7 @@ CREATE PUBLICATION name
 
 where publication_object is one of:
 
-TABLE [ ONLY ] table_name [ * ] [, ... ]
+TABLE [ ONLY ] table_name [ * ] [ ( column_name, [, ... ] ) ] [, ... ]
 ALL TABLES IN SCHEMA { schema_name | CURRENT_SCHEMA } [, ... ]
 
  
@@ -78,6 +78,15 @@ CREATE PUBLICATION name
   publication, so they are never explicitly added to the publication.
  
 
+ 
+  When a column list is specified, only the listed columns are replicated;
+  any other columns are ignored for the purpose of replication through
+  this publication.  If no column list is specified, all columns of the
+  table are replicated through this publication, including any columns
+  added later.  If a column list is specified, it must include the replica
+  identity columns.
+ 
+
  
   Only persistent base tables and partitioned tables can be part of a
   publication.  Temporary tables, unlogged tables, foreign tables,
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index fe9c714257..a88d12e8ae 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -1472,7 +1472,7 @@ doDeletion(const ObjectAddress *object, int flags)
 			break;
 
 		case OCLASS_PUBLICATION_REL:
-			RemovePublicationRelById(object->objectId);
+			RemovePublicationRelById(object->objectId, object->objectSubId);
 			break;
 
 		case OCLASS_PUBLICATION:
@@ -2754,8 +2754,12 @@ free_object_addresses(ObjectAddresses *addrs)
 ObjectClass
 getObjectClass(const ObjectAddress *object)
 {
-	/* only pg_class entries can have nonzero objectSubId */
+	/*
+	 * only pg_class and pg_publication_rel entries can have nonzero
+	 * objectSubId
+	 */
 	if (object->classId != RelationRelationId &&
+		object->classId != PublicationRelRelationId &&
 		object->objectSubId != 0)
 		elog(ERROR, "invalid non-zero objectSubId for object class %u",
 			 object->classId);
diff --git a/src/backend/catalog/objectaddress.c b/src/backend/catalog/objectaddress.c
index 2bae3fbb17..5eed248dcb 100644
--- a/src/backend/catalog/objectaddress.c
+++ b/src/backend/catalog/objectaddress.c
@@ -4019,6 +4019,7 @@ getObjectDescription(const ObjectAddress *object, bool missing_ok)
 /* translator: first %s is, e.g., "table %s" */
 appendStringInfo(&buffer, _("publication of %s in publication %s"),
  rel.data, pubname);
+/* FIXME add objectSubId support */
 pfree(rel.data);
 ReleaseSysCache(tup);
 break;
@@ -5853,9 +5854,16 @@ getObjectIdentityParts(const ObjectAddress *object,
 
 getRelationIdentity(&buffer, prform->prrelid, objname, false);
 appendStringInfo(&buffer, " in publication %s", pubname);
+if (object->objectSubId)	/* FIXME maybe get_attname */
+	appendStringInfo(&buffer, " column %d", object->objectSubId);
 
 if (objargs)
+{
 	*objargs = list_make1(pubname);
+	if (object->objectSubId)
+		*objargs = lappend(*objargs,
+		   psprintf("%d", object->objectSubId));
+}
 
 ReleaseSysCache(tup);
 break;
diff --git a/src/backend/catalog/pg_depend.c b/src/backend/catalog/pg_depend.c
index 5f37bf6d10..dfcb450e61 100644
--- a/src/backend/catalog/pg_depend.c
+++ b/src/backend/catalog/pg_depend.c
@@ -658,6 +658,56 @@ isObjectPinned(const ObjectAddress *object)
  * Various special-purpose lookups and manipulations of pg_depend.
  */
 
+/*
+ * Find all objects of the given class that reference the specified object,
+ * and add them to the given ObjectAddresses.
+ */
+void
+findAndAddAddresses(ObjectAddresses *addrs, Oid classId,
+	Oid refclassId, Oid refobjectId, int32 refobjsubId)
+{
+	Relation	depRel;
+	ScanKeyData	key[3];
+	

Re: Column Filtering in Logical Replication

2021-12-13 Thread Alvaro Herrera
On 2021-Dec-10, Alvaro Herrera wrote:

> Actually it's not so easy to implement.

So I needed to add "sub object id" support for pg_publication_rel
objects in pg_depend / dependency.c.  What I have now is partial (the
describe routines need patched) but it's sufficient to show what's
needed.  In essence, we now set these depend entries with column
numbers, so that they can be dropped independently; when the drop comes,
the existing pg_publication_rel row is modified to cover the remaining
columns.  As far as I can tell, it works correctly.

There is one policy decision to make: what if ALTER TABLE drops the last
remaining column in the publication?  I opted to raise a specific error
in this case, though we could just as well opt to drop the relation
from the publication.  Are there opinions on this?
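For concreteness, the scenario looks like this (the column-list syntax is the
one proposed by this patch, so the exact spelling is provisional):

    CREATE TABLE t (id int PRIMARY KEY, a int, b int);
    CREATE PUBLICATION pub FOR TABLE t (id, a);
    ALTER TABLE t DROP COLUMN a;    -- publication now covers only "id"
    ALTER TABLE t DROP COLUMN id;   -- would drop the last published column:
                                    -- raise an error, or drop t from pub?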

This version incorporates the fixups Peter submitted, plus some other
fixes of my own.  Notably, as Peter also mentioned, I changed
pg_publication_rel.prattrs to store int2vector rather than an array of
column names.  This makes for better behavior if columns are renamed and
things like that, and also we don't need to be so cautious about
quoting.  It does mean we need a slightly more complicated query in a
couple of spots, but that should be okay.

-- 
Álvaro Herrera   39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/
"Always assume the user will do much worse than the stupidest thing
you can imagine."  (Julien PUYDT)
diff --git a/src/backend/catalog/dependency.c b/src/backend/catalog/dependency.c
index fe9c714257..a88d12e8ae 100644
--- a/src/backend/catalog/dependency.c
+++ b/src/backend/catalog/dependency.c
@@ -1472,7 +1472,7 @@ doDeletion(const ObjectAddress *object, int flags)
 			break;
 
 		case OCLASS_PUBLICATION_REL:
-			RemovePublicationRelById(object->objectId);
+			RemovePublicationRelById(object->objectId, object->objectSubId);
 			break;
 
 		case OCLASS_PUBLICATION:
@@ -2754,8 +2754,12 @@ free_object_addresses(ObjectAddresses *addrs)
 ObjectClass
 getObjectClass(const ObjectAddress *object)
 {
-	/* only pg_class entries can have nonzero objectSubId */
+	/*
+	 * only pg_class and pg_publication_rel entries can have nonzero
+	 * objectSubId
+	 */
 	if (object->classId != RelationRelationId &&
+		object->classId != PublicationRelRelationId &&
 		object->objectSubId != 0)
 		elog(ERROR, "invalid non-zero objectSubId for object class %u",
 			 object->classId);
diff --git a/src/backend/catalog/objectaddress.c b/src/backend/catalog/objectaddress.c
index 2bae3fbb17..5eed248dcb 100644
--- a/src/backend/catalog/objectaddress.c
+++ b/src/backend/catalog/objectaddress.c
@@ -4019,6 +4019,7 @@ getObjectDescription(const ObjectAddress *object, bool missing_ok)
 /* translator: first %s is, e.g., "table %s" */
 appendStringInfo(&buffer, _("publication of %s in publication %s"),
  rel.data, pubname);
+/* FIXME add objectSubId support */
 pfree(rel.data);
 ReleaseSysCache(tup);
 break;
@@ -5853,9 +5854,16 @@ getObjectIdentityParts(const ObjectAddress *object,
 
 getRelationIdentity(&buffer, prform->prrelid, objname, false);
 appendStringInfo(&buffer, " in publication %s", pubname);
+if (object->objectSubId)	/* FIXME maybe get_attname */
+	appendStringInfo(&buffer, " column %d", object->objectSubId);
 
 if (objargs)
+{
 	*objargs = list_make1(pubname);
+	if (object->objectSubId)
+		*objargs = lappend(*objargs,
+		   psprintf("%d", object->objectSubId));
+}
 
 ReleaseSysCache(tup);
 break;
diff --git a/src/backend/catalog/pg_depend.c b/src/backend/catalog/pg_depend.c
index 07bcdc463a..462c8efe70 100644
--- a/src/backend/catalog/pg_depend.c
+++ b/src/backend/catalog/pg_depend.c
@@ -658,6 +658,56 @@ isObjectPinned(const ObjectAddress *object)
  * Various special-purpose lookups and manipulations of pg_depend.
  */
 
+/*
+ * Find all objects of the given class that reference the specified object,
+ * and add them to the given ObjectAddresses.
+ */
+void
+findAndAddAddresses(ObjectAddresses *addrs, Oid classId,
+	Oid refclassId, Oid refobjectId, int32 refobjsubId)
+{
+	Relation	depRel;
+	ScanKeyData	key[3];
+	SysScanDesc scan;
+	HeapTuple	tup;
+
+	depRel = table_open(DependRelationId, AccessShareLock);
+
+	ScanKeyInit(&key[0],
+Anum_pg_depend_refclassid,
+BTEqualStrategyNumber, F_OIDEQ,
+ObjectIdGetDatum(refclassId));
+	ScanKeyInit(&key[1],
+Anum_pg_depend_refobjid,
+BTEqualStrategyNumber, F_OIDEQ,
+ObjectIdGetDatum(refobjectId));
+	ScanKeyInit(&key[2],
+Anum_pg_depend_refobjsubid,
+BTEqualStrategyNumber, F_INT4EQ,
+Int32GetDatum(refobjsubId));
+
+	scan = systable_beginscan(depRel, DependReferenceIndexId, true,
+			  NULL, 3, key);
+
+	while (HeapTupleIsValid(tup = systable_getnext(scan)))
+	{
+		Form_pg_depend depform = (Form_pg_depend) GETSTRUCT(tup);
+		ObjectAddress	object;
+
+		if (depform->classid != classId)
+			continue;
+
+		ObjectAddressS

Re: pg_dump versus ancient server versions

2021-12-13 Thread Robert Haas
On Mon, Dec 13, 2021 at 12:23 PM Tom Lane  wrote:
> I've done as much as I plan to do in that direction.  As of the
> respective branch tips, I see clean builds and check-world
> results with minimal configure options in all branches back to 9.2
> on Fedora 35 (gcc 11.2.1) and macOS Monterey (Apple clang 13.0.0).

I think this is great. Thanks for being willing to work on it.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_dump versus ancient server versions

2021-12-13 Thread Tom Lane
I wrote:
> Anyway, it seems like there's some consensus that 9.2 is a good
> stopping place for today.  I'll push forward with
> (1) back-patching as necessary to make 9.2 and up build cleanly
> on the platforms I have handy;

I've done as much as I plan to do in that direction.  As of the
respective branch tips, I see clean builds and check-world
results with minimal configure options in all branches back to 9.2
on Fedora 35 (gcc 11.2.1) and macOS Monterey (Apple clang 13.0.0).

A few notes for the archives' sake:

* As discussed, parallel check-world is unreliable before v10;
perhaps this is worth improving, but I doubt it.  I did find
that aggressive parallelism in the build process is fine.

* On some compilers, pre-v10 branches produce this warning:

scan.c: In function 'yy_try_NUL_trans':
scan.c:10189:23: warning: unused variable 'yyg' [-Wunused-variable]
 struct yyguts_t * yyg = (struct yyguts_t*)yyscanner; /* This var may be 
unused depending upon options. */

In principle we could back-patch 65d508fd4 to silence that,
but I think that fix is more invasive than what we want to
do in these branches.  We lived with that warning for years
before figuring out how to get rid of it, so I think we can
continue to accept it in these branches.

* 9.2's plperl fails to compile on macOS:

./plperl.h:53:10: fatal error: 'EXTERN.h' file not found
#include "EXTERN.h"

This is evidently because 9.2 predates the "sysroot" hacking
we did later (5e2217131 and many many subsequent tweaks).
I judge this not worth the trouble to fix, because the argument
for supporting --with-perl in these branches is basically that
we need a PL with transforms to test pg_dump ... but transforms
didn't come in until 9.5.  (Reviewing the commit log, I suppose
that 9.3 and 9.4 would also fail to build in some macOS
configurations, but by the same argument I see no need to work
further on those branches either.  9.5 does have all the
sysroot changes I can find in the log.)

regards, tom lane




Re: O(n) tasks cause lengthy startups and checkpoints

2021-12-13 Thread Justin Pryzby
On Mon, Dec 13, 2021 at 08:53:37AM -0500, Robert Haas wrote:
> On Fri, Dec 10, 2021 at 2:03 PM Bossart, Nathan  wrote:
> > Well, I haven't had a chance to look at your patch, and my patch set
> > still only has handling for CheckPointSnapBuild() and
> > RemovePgTempFiles(), but I thought I'd share what I have anyway.  I
> > split it into 5 patches:
> >
> > 0001 - Adds a new "custodian" auxiliary process that does nothing.
...
> 
> I don't know whether this kind of idea is good or not.
...
> 
> Another issue is that we don't want to increase the number of
> processes without bound. Processes use memory and CPU resources and if
> we run too many of them it becomes a burden on the system. Low-end
> systems may not have too many resources in total, and high-end systems
> can struggle to fit demanding workloads within the resources that they
> have. Maybe it would be cheaper to do more things at once if we were
> using threads rather than processes, but that day still seems fairly
> far off.

Maybe that's an argument that this should be a dynamic background worker
instead of an auxiliary process.  Then maybe it would be controlled by
max_parallel_maintenance_workers (or something similar).  The checkpointer
would need to do these tasks itself if parallel workers were disabled or
couldn't be launched.

-- 
Justin




conchuela has some SSL issues

2021-12-13 Thread Robert Haas
Hi,

The last three buildfarm runs on conchuela show a failure in initdb:

Shared object "libssl.so.48" not found, required by "libldap_r-2.4.so.2"

It seems likely to me that this is a machine configuration issue
rather than the result of some recent change in PostgreSQL, because
the first failure 2 days ago shows only this as a recent PostgreSQL
change:

07eee5a0dc Sat Dec 11 19:10:51 2021 UTC  Create a new type category
for "internal use" types.

And that doesn't seem like it could cause this.

Any thoughts?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: WIN32 pg_import_system_collations

2021-12-13 Thread Juan José Santamaría Flecha
On Mon, Dec 13, 2021 at 9:41 AM Juan José Santamaría Flecha <
juanjo.santama...@gmail.com> wrote:

Per patch tester.


> Regards,
>
> Juan José Santamaría Flecha
>


v2-0001-WIN32-pg_import_system_collations.patch
Description: Binary data


Re: daitch_mokotoff module

2021-12-13 Thread Tomas Vondra

On 12/13/21 16:05, Andrew Dunstan wrote:


On 12/13/21 09:26, Tomas Vondra wrote:

On 12/13/21 14:38, Dag Lem wrote:

Please find attached an updated patch, with the following fixes:

* Replaced remaining malloc/free with palloc/pfree.
* Made "make check" pass.
* Updated notes on other implementations.



Thanks, looks interesting. A couple generic comments, based on a quick
code review.

1) Can the extension be marked as trusted, just like fuzzystrmatch?

2) The docs really need an explanation of what the extension is for,
not just a link to fuzzystrmatch. Also, a couple examples would be
helpful, I guess - similarly to fuzzystrmatch. The last line in the
docs is annoyingly long.



It's not clear to me why we need a new module for this. Wouldn't it be
better just to add the new function to fuzzystrmatch?



Yeah, that's a valid point. I think we're quite conservative about 
adding more contrib modules, and adding a function to an existing one 
works around a lot of that.


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




speed up text_position() for utf-8

2021-12-13 Thread John Naylor
Hi,

Commit 9556aa01c69 (Use single-byte Boyer-Moore-Horspool search even
with multibyte encodings), was a speed improvement for the majority of
cases, but when the match was toward the end of the string, the
slowdown in text_position_get_match_pos() was noticeable. It was found
that there was a lot of overhead in pg_mblen(). [1]

The attached exploratory PoC improves this for utf-8. It applies on
top v25 of my utf-8 verification patch in [2], since one approach
relies on the DFA from it. The other three approaches are:
- a version of pg_utf_mblen() that uses a lookup table [3]
- an inlined copy of pg_utf_mblen()
- an ascii fast path with a fallback to the inlined copy of pg_utf_mblen()

The test is attached and the test function is part of the patch. It's
based on the test used in the commit above. The test searches for a
string that's at the end of a ~1 million byte string. This is on gcc
11 with 2-3 runs to ensure repeatability, but I didn't bother with
statistics because the differences are pretty big:

 patch                                     | no match | ascii | multibyte
-------------------------------------------+----------+-------+-----------
 PG11                                      |     1120 |  1100 |       900
 master                                    |      381 |  2350 |      1900
 DFA                                       |      386 |  1640 |      1640
 branchless utf mblen                      |      387 |  4100 |      2600
 inline pg_utf_mblen()                     |      380 |  1080 |       920
 inline pg_utf_mblen() + ascii fast path   |      382 |   470 |       918

Neither of the branchless approaches worked well. The DFA can't work
as well here as in verification because it must do additional work.
Inlining pg_utf_mblen() restores worst-case performance to PG11
levels. The ascii fast path is a nice improvement on top of that. A
similar approach could work for pg_mbstrlen() as well, but I haven't
looked into that yet. There are other callers of pg_mblen(), but I
haven't looked into whether they are performance-sensitive. A more
general application would be preferable to a targeted one.
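
To illustrate the last two rows of the table, here is a rough sketch (not
the patch code; the helper name is made up) of a character-counting loop
that combines the ascii fast path with the inlined pg_utf_mblen() tests.
It assumes already-validated input, an end pointer that falls on a
character boundary, and PostgreSQL's uint64/UINT64CONST:

/*
 * Count characters in a validated UTF-8 buffer: consume 8 bytes at a time
 * while they are all ASCII, otherwise decode one character using the same
 * length tests as pg_utf_mblen().
 */
static int
utf8_char_count(const unsigned char *s, const unsigned char *end)
{
    int         nchars = 0;

    while (s < end)
    {
        /* ASCII fast path: 8 bytes at a time if none has the high bit set */
        if (end - s >= 8)
        {
            uint64      chunk;

            memcpy(&chunk, s, sizeof(chunk));
            if ((chunk & UINT64CONST(0x8080808080808080)) == 0)
            {
                s += 8;
                nchars += 8;
                continue;
            }
        }

        /* one character at a time, same tests as pg_utf_mblen() */
        if ((*s & 0x80) == 0)
            s += 1;
        else if ((*s & 0xe0) == 0xc0)
            s += 2;
        else if ((*s & 0xf0) == 0xe0)
            s += 3;
        else if ((*s & 0xf8) == 0xf0)
            s += 4;
        else
            s += 1;
        nchars++;
    }

    return nchars;
}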

[1] 
https://www.postgresql.org/message-id/b65df3d8-1f59-3bd7-ebbe-68b81d5a76a4%40iki.fi
[2] 
https://www.postgresql.org/message-id/CAFBsxsHG%3Dg6W8Mie%2B_NO8dV6O0pO2stxrnS%3Dme5ZmGqk--fd5g%40mail.gmail.com
[3] https://github.com/skeeto/branchless-utf8/blob/master/utf8.h

--
John Naylor
EDB: http://www.enterprisedb.com
 src/backend/utils/adt/varlena.c | 112 
 src/common/wchar.c  |  90 ++--
 src/include/mb/pg_wchar.h   |  53 ---
 src/test/regress/regress.c  |  18 +++
 4 files changed, 133 insertions(+), 140 deletions(-)

diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index bd3091bbfb..cc93818007 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -26,6 +26,7 @@
 #include "common/unicode_norm.h"
 #include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
+#include "mb/pg_utf8.h"
 #include "miscadmin.h"
 #include "nodes/execnodes.h"
 #include "parser/scansup.h"
@@ -1468,6 +1469,116 @@ text_position_get_match_pos(TextPositionState *state)
 {
 	if (!state->is_multibyte)
 		return state->last_match - state->str1 + 1;
+	else if (GetDatabaseEncoding() == PG_UTF8)
+	{
+#if 0	/* dfa */
+
+		int			utf8_state_count[UTF8_LAST_STATE + 1] = {0};
+		uint32		dfa_state = BGN;
+
+		/*
+		 * See pg_utf8.h for an explanation of the state machine. Unlike in
+		 * pg_wchar.c, we must calculate the exact state so we can use it as
+		 * an array index.
+		 */
+		while (state->refpoint < state->last_match)
+		{
+			const unsigned char *s = (const unsigned char *) state->refpoint++;
+
+			dfa_state = (Utf8Transition[*s] >> dfa_state) & 31;
+			utf8_state_count[dfa_state]++;
+		}
+		Assert(state->refpoint == state->last_match);
+		state->refpos += utf8_state_count[END];
+		return state->refpos + 1;
+
+#endif
+#if 0	/* like pg_utf_mblen(), but different algorithm: */
+
+		static const char lengths[] = {
+			1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
+			1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4, 1
+		};
+
+		while (state->refpoint < state->last_match)
+		{
+			const unsigned char *s = (const unsigned char *) state->refpoint;
+
+			state->refpoint += lengths[s[0] >> 3];
+			state->refpos++;
+		}
+
+		Assert(state->refpoint == state->last_match);
+		return state->refpos + 1;
+#endif
+#if 0	/* inline pg_utf_mblen()*/
+
+		int len = 0;
+
+		while (state->refpoint < state->last_match)
+		{
+			const unsigned char *s = (const unsigned char *) state->refpoint;
+
+			if ((*s & 0x80) == 0)
+len = 1;
+			else if ((*s & 0xe0) == 0xc0)
+len = 2;
+			else if ((*s & 0xf0) == 0xe0)
+len = 3;
+			else if ((*s & 0xf8) == 0xf0)
+len = 4;
+			else
+len = 1;
+
+			state->refpoint += len;
+			state->refpos++;
+		}
+
+		Assert(state->refpoint == state->last_match);
+		

Re: Remove pg_strtouint64(), use strtoull() directly

2021-12-13 Thread Robert Haas
On Mon, Dec 13, 2021 at 10:46 AM Tom Lane  wrote:
> Robert Haas  writes:
> > I really am glad they haven't. I think it's super-annoying that we
> > need hacks like UINT64_FORMAT all over the place. I think it was a
> > mistake not to nail down the size that each type is expected to be in
> > the original C standard,
>
> Well, mumble.  One must remember that when C was designed, there was
> a LOT more variability in hardware designs than we see today.  They
> could not have put a language with fixed ideas about datatype widths
> onto, say, PDP-10s (36-bit words) or Crays (60-bit, IIRC).  But it
> is a darn shame that people weren't more consistent about mapping
> the C types onto machines with S/360-like addressing.

Sure.

> > and making more changes to the conventions
> > now would cause a whole bunch of unnecessary code churn, probably for
> > almost everybody using C.
>
> The error in your thinking is believing that there *is* a convention.
> There isn't; see "long".

I mean I pretty much pointed out exactly that thing with my mention of
UINT64_FORMAT, so I'm not sure why you're making it seem like I didn't
know that.

> Anyway, my point is that we have created a set of type names that
> have the semantics we want, and we should avoid confusing those with
> underlying C types that are *not* guaranteed to be the same thing.

I agree entirely, but it's still an annoyance when dealing with printf
format codes and other operating-system defined types whose width we
don't know. Standardization here makes it easier to write good code;
different conventions make it harder. I'm guessing that other people
have noticed that too, and that's why we're getting stuff like
__int128 instead of redefining long long.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [PATCH]Comment improvement in publication.sql

2021-12-13 Thread vignesh C
On Wed, Dec 8, 2021 at 11:07 AM tanghy.f...@fujitsu.com
 wrote:
>
> On Wednesday, December 8, 2021 1:49 PM, vignesh C  wrote:
>
> > The patch no longer applies, could you post a rebased patch.
>
> Thanks for your kindly reminder. Attached a rebased patch.
> Some changes in v4 patch has been fixed by 5a28324, so I deleted those 
> changes.

Thanks for the updated patch. Should we make a similar change in the
AlterPublicationTables function header to mention Set too:
/*
* Add or remove table to/from publication.
*/
static void
AlterPublicationTables(AlterPublicationStmt *stmt, HeapTuple tup,
List *tables, List *schemaidlist)

Regards,
Vignesh




Re: Remove pg_strtouint64(), use strtoull() directly

2021-12-13 Thread Tom Lane
Robert Haas  writes:
> I really am glad they haven't. I think it's super-annoying that we
> need hacks like UINT64_FORMAT all over the place. I think it was a
> mistake not to nail down the size that each type is expected to be in
> the original C standard,

Well, mumble.  One must remember that when C was designed, there was
a LOT more variability in hardware designs than we see today.  They
could not have put a language with fixed ideas about datatype widths
onto, say, PDP-10s (36-bit words) or Crays (60-bit, IIRC).  But it
is a darn shame that people weren't more consistent about mapping
the C types onto machines with S/360-like addressing.

> and making more changes to the conventions
> now would cause a whole bunch of unnecessary code churn, probably for
> almost everybody using C.

The error in your thinking is believing that there *is* a convention.
There isn't; see "long".

Anyway, my point is that we have created a set of type names that
have the semantics we want, and we should avoid confusing those with
underlying C types that are *not* guaranteed to be the same thing.

regards, tom lane




Re: Failed transaction statistics to measure the logical replication progress

2021-12-13 Thread vignesh C
On Tue, Dec 7, 2021 at 3:12 PM osumi.takami...@fujitsu.com
 wrote:
>
> On Monday, December 6, 2021 11:27 PM vignesh C  wrote:
> > Thanks for the updated patch, few comments:
> Thank you for your review !
>
> > 1) We can keep the documentation similar to mention the count includes both
> > table sync worker / main apply worker in case of commit_count/error_count
> > and abort_count to keep it consistent.
> > +   commit_count bigint
> > +  
> > +  
> > +   Number of transactions successfully applied in this subscription.
> > +   COMMIT and COMMIT PREPARED increments this counter.
> > +  
> > + 
> > +
> > + 
> > +  
> > +   error_count bigint
> > +  
> > +  
> > +   Number of transactions that failed to be applied by the table
> > +   sync worker or main apply worker in this subscription.
> > +   
> > + 
> > +
> > + 
> > +  
> > +   abort_count bigint
> > +  
> > +  
> > +   Number of transactions aborted in this subscription.
> > +   ROLLBACK PREPARED increments this counter.
> > +  
> > + 
> Yeah, you are right. Fixed.
> Note that abort_count is not used by table sync worker.
>
>
> > 2) Can this be changed:
> > +   /*
> > +* If this is a new error reported by table sync worker,
> > consolidate this
> > +* error count into the entry of apply worker.
> > +*/
> > +   if (OidIsValid(msg->m_subrelid))
> > +   {
> > +   /* Gain the apply worker stats */
> > +   subwentry = pgstat_get_subworker_entry(dbentry,
> > + msg->m_subid,
> > +
> > InvalidOid, true);
> > +   subwentry->error_count++;
> > +   }
> > +   else
> > +   subwentry->error_count++;   /* increment the apply
> > worker's counter. */
> > To:
> > +   /*
> > +* If this is a new error reported by table sync worker,
> > consolidate this
> > +* error count into the entry of apply worker.
> > +*/
> > +   if (OidIsValid(msg->m_subrelid))
> > +   /* Gain the apply worker stats */
> > +   subwentry = pgstat_get_subworker_entry(dbentry,
> > + msg->m_subid,
> > +
> > InvalidOid, true);
> > +
> > +   subwentry->error_count++;   /* increment the apply
> > worker's counter. */
> Your suggestion looks better.
> Also, I fixed some comments of this part
> so that we don't need to add a separate comment at the bottom
> for the increment of the apply worker.
>
>
> > 3) Since both 026_worker_stats and 027_worker_xact_stats.pl are testing
> > pg_stat_subscription_workers, can we move the tests to 026_worker_stats.pl.
> > If possible the error_count validation can be combined with the existing 
> > tests.
> > diff --git a/src/test/subscription/t/027_worker_xact_stats.pl
> > b/src/test/subscription/t/027_worker_xact_stats.pl
> > new file mode 100644
> > index 000..31dbea1
> > --- /dev/null
> > +++ b/src/test/subscription/t/027_worker_xact_stats.pl
> > @@ -0,0 +1,162 @@
> > +
> > +# Copyright (c) 2021, PostgreSQL Global Development Group
> > +
> > +# Tests for subscription worker statistics during apply.
> > +use strict;
> > +use warnings;
> > +use PostgreSQL::Test::Cluster;
> > +use PostgreSQL::Test::Utils;
> > +use Test::More tests => 1;
> > +
> > +# Create publisher node
> Right. I've integrated my tests with 026_worker_stats.pl.
> I think error_count validations are combined as you suggested.
> Another change I did is to introduce one function
> to contribute to better readability of the stats tests.
>
> Here, the 026_worker_stats.pl didn't look aligned by
> pgperltidy. This is not a serious issue at all.
> Yet, when I ran pgperltidy, the existing codes
> that required adjustments came into my patch.
> Therefore, I made a separate part for this.

Thanks for the updated patch, few comments:
1) Can we change this:
/*
+* Report the success of table sync as one commit to consolidate all
+* transaction stats into one record.
+*/
+   pgstat_report_subworker_xact_end(MyLogicalRepWorker->subid,
+
  LOGICAL_REP_MSG_COMMIT);
+
To:
/* Report the success of table sync */
+   pgstat_report_subworker_xact_end(MyLogicalRepWorker->subid,
+
  LOGICAL_REP_MSG_COMMIT);
+

2) Typo: ealier should be earlier
+   /*
+* Report ealier than the call of process_syncing_tables() not
to miss an
+* increment of commit_count in case it leads to the process exit. See
+* process_syncing_tables_for_apply().
+*/
+   pgstat_report_subworker_xact_end(MyLogicalRepWorker->subid,
+
  LOGICAL_REP_MSG_COMMIT);
+

3) Should we add an Assert for subwentry:
+   /*
+* If this is a new error reported by table sync worker,
consolidate this
+* error count into the entry of apply worker, by swapping the stats
+* entries.
+*/
+   if (OidIsValid(msg->m_subrelid))
+   subwentry = pgstat_g

Re: speed up verifying UTF-8

2021-12-13 Thread John Naylor
On Fri, Dec 10, 2021 at 2:33 PM Heikki Linnakangas  wrote:

> I had another look at this now. Looks good, just a few minor comments below:

Thanks for reviewing! I've attached v25 to address your points.

> This function assumes that the input len is a multiple of 8. There's an
> assertion for that, but it would be good to also mention it in the
> function comment. I took me a moment to realize that.

Done.

> Given that assumption, I wonder if "while (len >= 0)" would marginally
> faster. Or compute "s_end = s + len" first, and check for "while (s <
> s_end)", so that you don't need to update 'len' in the loop.

With two chunks, gcc 4.8.5/11.2 and clang 12 will unroll the inner
loop, so it doesn't matter:

L51:
mov rdx, QWORD PTR [rdi]
mov rsi, QWORD PTR [rdi+8]
lea rax, [rdx+rbx]
lea rbp, [rsi+rbx]
and rax, rbp
and rax, r11
cmp rax, r11
jne .L66
or  rdx, rsi
testrdx, r11
jne .L66
sub r8d, 16  ; refers to "len" in the caller pg_utf8_verifystr()
add rdi, 16
cmp r8d, 15
jg  .L51

I *think* these are the same instructions as from your version from
some time ago that handled two integers explicitly -- I rewrote it
like this to test different chunk sizes.

(Aside on 32-byte strides: Four chunks was within the noise level of
two chunks on the platform I tested. With 32 bytes, that increases the
chance that a mixed input would have non-ascii and defeat this
optimization, so it would need to be significantly faster to make up for that.
Along those lines, in the future we could consider SSE2 (unrolled 2 x
16 bytes) for this path. Since it's part of the spec for x86-64, we
wouldn't need a runtime check -- just #ifdef it inline. And we could
piggy-back on the CRC SSE4.2 configure test for intrinsic support, so
that would avoid adding a bunch of complexity.)
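
For illustration, a minimal sketch of such an SSE2 high-bit check over a
32-byte chunk (not part of the patch; it only uses <emmintrin.h> intrinsics,
which are part of the x86-64 baseline, and the helper name is made up):

#include <emmintrin.h>          /* SSE2 */
#include <stdbool.h>

/* Return true if none of the 32 bytes at "s" has its high bit set. */
static bool
chunk_is_ascii_sse2(const unsigned char *s)
{
    const __m128i a = _mm_loadu_si128((const __m128i *) s);
    const __m128i b = _mm_loadu_si128((const __m128i *) (s + 16));

    /* movemask gathers the high bit of each byte into a 16-bit mask */
    return _mm_movemask_epi8(_mm_or_si128(a, b)) == 0;
}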

That said, I think your suggestions are better on code clarity
grounds. I'm on the fence about "while(s < s_end)", so I went with
"while (len > 0)" because it matches the style in wchar.c.

> Also would be good to mention what exactly the return value means. I.e
> "returns false if the input contains any bytes with the high-bit set, or
> zeros".

Done.

> > + /*
> > +  * Check if any high bits in the zero accumulator got cleared.
> > +  *
> > +  * XXX: As noted above, the zero check is only valid if the chunk had 
> > no
> > +  * high bits set. However, the compiler may perform these two checks 
> > in
> > +  * any order. That's okay because if any high bits were set, we would
> > +  * return false regardless, so invalid results from the zero check 
> > don't
> > +  * matter.
> > +  */
> > + if (zero_cum != UINT64CONST(0x8080808080808080))
> > + return false;

> I don't understand the "the compiler may perform these checks in any
> order" comment. We trust the compiler to do the right thing, and only
> reorder things when it's safe to do so. What is special here, why is it
> worth mentioning here?

Ah, that's a good question, and now that you mention it, the comment
is silly. When looking at the assembly output a while back, I was a
bit astonished that it didn't match my mental model of what was
happening, so I made this note. I've removed the whole XXX comment
here and expanded the first comment in the loop to:

/*
 * Capture any zero bytes in this chunk.
 *
 * First, add 0x7f to each byte. This sets the high bit in each byte,
 * unless it was a zero. If any resulting high bits are zero, the
 * corresponding high bits in the zero accumulator will be cleared.
 *
 * If none of the bytes in the chunk had the high bit set, the max
 * value each byte can have after the addition is 0x7f + 0x7f = 0xfe,
 * and we don't need to worry about carrying over to the next byte. If
 * any input bytes did have the high bit set, it doesn't matter
 * because we check for those separately.
 */
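
Stated as standalone code (illustrative only, the helper name is made up),
the trick on a single 8-byte chunk that has no high bits set is:

/* True if the 8-byte chunk contains a zero byte (no input high bits set). */
static bool
chunk_has_zero_byte(uint64 chunk)
{
    const uint64 highbits = UINT64CONST(0x8080808080808080);

    /*
     * Adding 0x7f to each byte sets that byte's high bit unless the byte was
     * zero; with no input high bits there is no carry between bytes, so any
     * cleared high bit in the sum marks a zero byte.
     */
    return ((chunk + UINT64CONST(0x7f7f7f7f7f7f7f7f)) & highbits) != highbits;
}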

> > @@ -1721,7 +1777,7 @@ pg_gb18030_verifystr(const unsigned char *s, int len)
> >   return s - start;
> >  }
> >
> > -static int
> > +static pg_noinline int
> >  pg_utf8_verifychar(const unsigned char *s, int len)
> >  {
> >   int l;
>
> Why force it to not be inlined?

Since the only direct caller is now only using it for small inputs, I
thought about saving space, but it's not enough to matter, so I'll go
ahead and leave it out. While at it, I removed the unnecessary
"inline" declaration for utf8_advance(), since the compiler can do
that anyway.

> > + * In a shift-based DFA, the input byte is an index into array of integers
> > + * whose bit pattern encodes the state transitions. To compute the current
> > + * state, we simply right-shift the integer by the current state and apply 
> > a
> > + * mask. In this scheme, the address of the transition only depends on the
> > + * input byte, so there is better pipelining.
>
> Should be "To

Re: Remove pg_strtouint64(), use strtoull() directly

2021-12-13 Thread Robert Haas
On Mon, Dec 13, 2021 at 9:44 AM Tom Lane  wrote:
> Yeah, exactly.  That seems like a natural evolution:
> short -> 2 bytes
> int -> 4 bytes
> long -> 8 bytes
> long long -> 16 bytes
> so I'm surprised that vendors haven't done that already instead
> of inventing hacks like __int128.

I really am glad they haven't. I think it's super-annoying that we
need hacks like UINT64_FORMAT all over the place. I think it was a
mistake not to nail down the size that each type is expected to be in
the original C standard, and making more changes to the conventions
now would cause a whole bunch of unnecessary code churn, probably for
almost everybody using C. It's not like people are writing high-level
applications in C these days; it's all low-level stuff that is likely
to care about the width of a word. It seems much more sensible to
standardize on names for words of all lengths in the standard than to
do anything else. I don't really care whether the standard chooses
int128, int256, int512, etc. or long long long, long long long long,
etc. or reallylong, superlong, incrediblylong, etc. but I hope they
define new stuff instead of encouraging implementations to redefine
what's there already.
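
For comparison, C99's <inttypes.h> already standardizes exact-width names
together with matching format macros, which is roughly the kind of thing
being asked for here (a small standalone example, not PostgreSQL code):

#include <inttypes.h>
#include <stdio.h>

int
main(void)
{
    int64_t     v = INT64_C(1234567890123456789);

    /* exact-width type plus a portable format macro */
    printf("%" PRId64 "\n", v);

    /* whereas the width of "long" still varies by platform */
    printf("sizeof(long) = %zu\n", sizeof(long));
    return 0;
}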

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-12-13 Thread Shruthi Gowda
On Mon, Dec 6, 2021 at 11:25 PM Robert Haas  wrote:
>
> On Sun, Dec 5, 2021 at 11:44 PM Sadhuprasad Patro  wrote:
> > 3.
> > @@ -504,11 +525,15 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
> >   */
> >   pg_database_rel = table_open(DatabaseRelationId, RowExclusiveLock);
> >
> > - do
> > + /* Select an OID for the new database if is not explicitly configured. */
> > + if (!OidIsValid(dboid))
> >   {
> > - dboid = GetNewOidWithIndex(pg_database_rel, DatabaseOidIndexId,
> > -Anum_pg_database_oid);
> > - } while (check_db_file_conflict(dboid));
> >
> > I think we need to do 'check_db_file_conflict' for the USER given OID
> > also.. right? It may already be present.
>
> Hopefully, if that happens, we straight up fail later on.

That's right. If a database with user-specified OID exists, the
createdb fails with a "duplicate key value" error.
If just a data directory with the user-specified OID exists,
MakePGDirectory() fails to create the directory, and the cleanup
callback createdb_failure_callback() removes the directory even
though it was not created by createdb().
The subsequent CREATE DATABASE call with the same OID will then succeed.
Should we handle the case where a data directory exists and the
corresponding DB with that oid does not exist? I presume this
situation doesn't arise unless the user tries to create directories in
the data path. Any thoughts?


Thanks & Regards
Shruthi KC
EnterpriseDB: http://www.enterprisedb.com




Re: RecoveryInProgress() has critical side effects

2021-12-13 Thread Robert Haas
On Tue, Dec 7, 2021 at 5:55 PM Robert Haas  wrote:
> On Sat, Dec 4, 2021 at 7:44 PM Michael Paquier  wrote:
> > My main worry here is that this changes slightly the definition of
> > doPageWrites across stable branches at the end of recovery as there
> > could be records generated there.  Note that GetFullPageWriteInfo()
> > gets called in XLogInsert(), while Insert->fullPageWrites gets updated
> > before CleanupAfterArchiveRecovery().  And it may influence
> > the value of doPageWrites in the startup process.
>
> But ... so what? All the code that uses it retries if the value that
> was tentatively used turns out to be wrong.

Despite the fact that this question didn't get further discussion, and
the fact that nobody seems quite sure what the best way of proceeding
here is, my interpretation of the comments made so far is that nobody
thinks that what we have now is superior to either of the proposed
alternatives, and there's a weak preference for v2 over v1. So I
committed that one.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: daitch_mokotoff module

2021-12-13 Thread Andrew Dunstan


On 12/13/21 09:26, Tomas Vondra wrote:
> On 12/13/21 14:38, Dag Lem wrote:
>> Please find attached an updated patch, with the following fixes:
>>
>> * Replaced remaining malloc/free with palloc/pfree.
>> * Made "make check" pass.
>> * Updated notes on other implementations.
>>
>
> Thanks, looks interesting. A couple generic comments, based on a quick
> code review.
>
> 1) Can the extension be marked as trusted, just like fuzzystrmatch?
>
> 2) The docs really need an explanation of what the extension is for,
> not just a link to fuzzystrmatch. Also, a couple examples would be
> helpful, I guess - similarly to fuzzystrmatch. The last line in the
> docs is annoyingly long.


It's not clear to me why we need a new module for this. Wouldn't it be
better just to add the new function to fuzzystrmatch?


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: should we enable log_checkpoints out of the box?

2021-12-13 Thread Robert Haas
On Thu, Dec 9, 2021 at 11:02 AM Bharath Rupireddy
 wrote:
> May I know if the v2 patch (attached at [1] in this thread) is still
> of interest?
>
> CF entry is here - https://commitfest.postgresql.org/36/3401/
>
> [1] - 
> https://www.postgresql.org/message-id/CALj2ACU9cK4pCzcqvey71F57PTPsdxtUGmfUnQ7-GR4pTUgmeA%40mail.gmail.com

Committed.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Remove pg_strtouint64(), use strtoull() directly

2021-12-13 Thread Tom Lane
Peter Eisentraut  writes:
> On 10.12.21 16:25, Tom Lane wrote:
>> Our experience with the variable size of "long" has left a sufficiently
>> bad taste in my mouth that I'm not enthused about adding hard-wired
>> assumptions that "long long" is identical to int64.  So this seems like
>> it's going in the wrong direction, and giving up portability that we
>> might want back someday.

> What kind of scenario do you have in mind?  Someone making their long 
> long int 128 bits?

Yeah, exactly.  That seems like a natural evolution:
short -> 2 bytes
int -> 4 bytes
long -> 8 bytes
long long -> 16 bytes
so I'm surprised that vendors haven't done that already instead
of inventing hacks like __int128.

Our current hard-coded uses of long long are all written on the
assumption that it's *at least* 64 bits, so we'd survive OK on
such a platform so long as we don't start confusing it with
*exactly* 64 bits.

regards, tom lane




Re: preserving db/ts/relfilenode OIDs across pg_upgrade (was Re: storing an explicit nonce)

2021-12-13 Thread Shruthi Gowda
On Mon, Dec 6, 2021 at 10:14 AM Sadhuprasad Patro  wrote:
>
> On Tue, Oct 26, 2021 at 6:55 PM Shruthi Gowda  wrote:
> >
> >
> > I have revised the patch w.r.t the way 'create_storage' is interpreted
> > in heap_create() along with some minor changes to preserve the DBOID
> > patch.
> >
>
> Hi Shruthi,
>
> I am reviewing the attached patches and providing a few comments here
> below for patch "v5-0002-Preserve-database-OIDs-in-pg_upgrade.patch"
>
> 1.
> --- a/doc/src/sgml/ref/create_database.sgml
> +++ b/doc/src/sgml/ref/create_database.sgml
> @@ -31,7 +31,8 @@ CREATE DATABASE  class="parameter">name
> -   [ IS_TEMPLATE [=]  class="parameter">istemplate ] ]
> +   [ IS_TEMPLATE [=]  class="parameter">istemplate ]
> +   [ OID [=]  class="parameter">db_oid ] ]
>
> Replace "db_oid" with 'oid'. Below in the listitem, we have mentioned 'oid'.

Replaced "db_oid" with "oid"

>
> 2.
> --- a/src/backend/commands/dbcommands.c
> +++ b/src/backend/commands/dbcommands.c
> + if ((dboid < FirstNormalObjectId) &&
> + (strcmp(dbname, "template0") != 0) &&
> + (!IsBinaryUpgrade))
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE)),
> + errmsg("Invalid value for option \"%s\"", defel->defname),
> + errhint("The specified OID %u is less than the minimum OID for user
> objects %u.",
> + dboid, FirstNormalObjectId));
> + }
>
> Are we sure that 'IsBinaryUpgrade' will be set properly, before the
> createdb function is called? Can we recheck once ?

I believe 'IsBinaryUpgrade' will be set to true when pg_upgrade is invoked.
pg_upgrade internally does pg_dump and pg_restore for every database in
the cluster.

> 3.
> @@ -504,11 +525,15 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
>   */
>   pg_database_rel = table_open(DatabaseRelationId, RowExclusiveLock);
>
> - do
> + /* Select an OID for the new database if is not explicitly configured. */
> + if (!OidIsValid(dboid))
>   {
> - dboid = GetNewOidWithIndex(pg_database_rel, DatabaseOidIndexId,
> -Anum_pg_database_oid);
> - } while (check_db_file_conflict(dboid));
>
> I think we need to do 'check_db_file_conflict' for the USER given OID
> also.. right? It may already be present.

If a datafile with user-specified OID exists, the create database
fails with the below error.
postgres=# create database d2 oid 16452;
ERROR:  could not create directory "base/16452": File exists

> 4.
> --- a/src/bin/initdb/initdb.c
> +++ b/src/bin/initdb/initdb.c
>
> /*
> + * Create template0 database with oid Template0ObjectId i.e, 4
> + */
> +
>
> Better to mention here, why OID 4 is reserved for template0 database?.

The comment is updated to explain why template0 oid is fixed.

> 5.
> + /*
> + * Create template0 database with oid Template0ObjectId i.e, 4
> + */
> + static const char *const template0_setup[] = {
> + "CREATE DATABASE template0 IS_TEMPLATE = true ALLOW_CONNECTIONS = false OID 
> "
> + CppAsString2(Template0ObjectId) ";\n\n",
>
> Can we write something like, 'OID = CppAsString2(Template0ObjectId)'?
> mention "=".

Fixed

> 6.
> +
> + /*
> + * We use the OID of postgres to determine datlastsysoid
> + */
> + "UPDATE pg_database SET datlastsysoid = "
> + "(SELECT oid FROM pg_database "
> + "WHERE datname = 'postgres');\n\n",
> +
>
> Make the above comment a single line comment.

As Robert confirmed, this part of the code is moved from a different place.

> 7.
> There are some spelling mistakes in the comments as below, please
> correct the same
> + /*
> + * Make sure that binary upgrade propogate the database OID to the
> new => correct spelling
> + * cluster
> + */
>
> +/* OID 4 is reserved for Templete0 database */
>  > Correct spelling
> +#define Template0ObjectId 4
>

Fixed.

> I am reviewing another patch
> "v5-0001-Preserve-relfilenode-and-tablespace-OID-in-pg_upg" as well
> and will provide the comments soon if any...

Thanks. I have rebased the relfilenode OID preservation patch. You may use the
rebased patch for review.

Thanks & Regards
Shruthi K C
EnterpriseDB: http://www.enterprisedb.com


v6-0001-Preserve-relfilenode-and-tablespace-OID-in-pg_upg.patch
Description: Binary data


v6-0002-Preserve-database-OIDs-in-pg_upgrade.patch
Description: Binary data


Re: Documenting when to retry on serialization failure

2021-12-13 Thread Robert Haas
On Thu, Dec 9, 2021 at 7:43 AM Simon Riggs  wrote:
> I had a conversation with Kevin Grittner about retry some years back
> and it seemed clear that the application should re-execute application
> logic from the beginning, rather than just slavishly re-execute the
> same SQL. But that is not documented either.

Yeah, that would be good to mention somehow.

> Is *automatic* retry possible? In all cases? None? Or maybe Some?
>
> ISTM that we can't retry anything where a transaction has replied to a
> user and then the user issued a subsequent SQL statement, since we
> have no idea whether the subsequent SQL was influenced by the initial
> reply.

I agree.

> But what about the case of a single statement transaction? Can we just
> re-execute then? I guess if it didn't run anything other than
> IMMUTABLE functions then it should be OK, assuming the inputs
> themselves were immutable, which we've no way for the user to declare.
> Could we allow a user-defined auto_retry parameter?

I suppose in theory a user-defined parameter is possible, but I think
it's fundamentally better for this to be managed on the application
side. Even if the transaction is a single query, we don't know how
expensive that query is, and it's at least marginally possible that
the user might care about that. For example, if the user has set a
10-minute timeout someplace, and the query fails after 8 minutes, they
may want to retry. But if we retry automatically then they might hit
their timeout, or just be confused about why things are taking so
long. And they can always decide not to retry after all, but give up,
save it for a less busy period, or whatever.
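
For what it's worth, application-side retry typically looks something like
the sketch below (illustrative only; run_transaction() is a placeholder for
re-running the application logic from the beginning, and 40001 is the
serialization_failure SQLSTATE):

#include <string.h>
#include <stdbool.h>
#include <libpq-fe.h>

/* placeholder: runs the whole transaction (BEGIN ... COMMIT), returns its result */
extern PGresult *run_transaction(PGconn *conn);

static bool
run_with_retries(PGconn *conn, int max_retries)
{
    for (int attempt = 0; attempt <= max_retries; attempt++)
    {
        PGresult   *res = run_transaction(conn);
        const char *sqlstate = PQresultErrorField(res, PG_DIAG_SQLSTATE);

        if (PQresultStatus(res) != PGRES_FATAL_ERROR)
        {
            PQclear(res);
            return true;        /* committed */
        }

        if (sqlstate == NULL || strcmp(sqlstate, "40001") != 0)
        {
            PQclear(res);
            return false;       /* some other error: don't retry */
        }

        PQclear(res);
        PQclear(PQexec(conn, "ROLLBACK"));  /* then re-run the logic from scratch */
    }

    return false;               /* gave up after repeated serialization failures */
}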

> We don't mention that a transaction might just repeatedly fail either.

True. I think that's another good argument against an auto-retry system.

The main thing that worries me about an auto-retry system is something
else: I think it would rarely be applicable, and people would try to
apply it to situations where it won't actually work properly. I
believe most users who need to retry transactions that fail due to
serialization problems will need some real application logic to make
sure that they do the right thing. People with single-statement
transactions that can be blindly retried probably aren't using higher
isolation levels anyway, and probably won't have many failures even if
they are. SSI is really for sophisticated applications, and I think
trying to make it "just work" for people with dumb applications will,
well, just not work.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Multi-Column List Partitioning

2021-12-13 Thread Ashutosh Sharma
Hi,

Is this okay?

postgres=# CREATE TABLE t1 (a int, b int) PARTITION BY LIST ( a, a, a );
CREATE TABLE

postgres=# CREATE TABLE t1_1 PARTITION OF t1 FOR VALUES IN ((1, 2, 3), (4,
5, 6));
CREATE TABLE

postgres=# \d t1
   Partitioned table "public.t1"
 Column |  Type   | Collation | Nullable | Default
+-+---+--+-
 a  | integer |   |  |
 b  | integer |   |  |
Partition key: LIST (a, a, a)
Number of partitions: 1 (Use \d+ to list them.)

--

Also, I am getting some compiler warnings when building the source. Please check.

--
With Regards,
Ashutosh Sharma.

On Mon, Dec 6, 2021 at 7:27 PM Nitin Jadhav 
wrote:

> Thank you for reviewing the patch.
>
> > partbounds.c: In function ‘get_qual_for_list.isra.18’:
> > partbounds.c:4284:29: warning: ‘boundinfo’ may be used uninitialized
> > in this function [-Wmaybe-uninitialized]
> > datumCopy(bound_info->datums[i][j],
> >   ~~^~~~
> > partbounds.c:4335:21: note: ‘boundinfo’ was declared here
> >  PartitionBoundInfo boundinfo;
> > ^
> > partbounds.c: In function ‘partition_bounds_merge’:
> > partbounds.c:1305:12: warning: ‘inner_isnull’ may be used
> > uninitialized in this function [-Wmaybe-uninitialized]
> >   bool*inner_isnull;
> >^~~~
> > partbounds.c:1304:12: warning: ‘outer_isnull’ may be used
> > uninitialized in this function [-Wmaybe-uninitialized]
> >   bool*outer_isnull;
> >   ^~~~
>
> Fixed.
>
> > This function is unnecessarily complicated, I think you can avoid
> > inner for loops; simply replace for-loop-block with  "if
> > (equal(lfirst(cell), new_bound)) return true".
>
> Thank you for the suggestion. Fixed.
>
> > + char   **colname = (char **) palloc0(partnatts * sizeof(char *));
> > + Oid*coltype = palloc0(partnatts * sizeof(Oid));
> > + int32*coltypmod = palloc0(partnatts * sizeof(int));
> > + Oid*partcollation = palloc0(partnatts * sizeof(Oid));
> > +
> > This allocation seems to be worthless, read ahead.
> >
> > I think there is no need for this separate loop inside
> > transformPartitionListBounds, you can do that same in the next loop as
> > well. And instead of  get_partition_col_* calling and storing, simply
> > use that directly as an argument to transformPartitionBoundValue().
>
> Yes. The loop can be avoided and content of the above loop can be
> included in the next loop but the next loop iterates over a list of
> multi column datums. For each iteration, we need the information of
> all the columns. The above data (colname, coltype, coltypmod and
> partcollation) remains same for each iteration of the loop, If we
> modify as suggested, then the function to fetch these information has
> to be called every-time. To avoid this situation I have made a
> separate loop outside which only runs as many number of columns and
> stores in a variable which can be reused later. Please let me correct
> if I am wrong.
>
> > I think this should be inside the "else" block after "!IsA(rowexpr,
> > RowExpr)" error and you can avoid IsA() check too.
>
> This is required to handle the situation when one partition key is
> mentioned and multiple values are provided in the partition bound
> specification.
>
> > Looks difficult to understand at first glance, how about the following:
> >
> > if (b1->isnulls != b2->isnulls)
> >return false;
> >
> > if (b1->isnulls)
> > {
> >if (b1->isnulls[i][j] != b2->isnulls[i][j])
> >return false;
> >if (b1->isnulls[i][j])
> >continue;
> > }
> >
> > See how range partitioning infinite values are handled. Also, place
> > this before the comment block that was added for the "!datumIsEqual()"
> > case.
>
> Fixed. I feel the 'continue' block is not required and hence removed it.
>
> > Nothing wrong with this but if we could have checked "dest->isnulls"
> > instead of "src->isnulls" would be much better.
>
> Here we are copying the data from 'src' to 'dest'. If there is no data
> in 'src', it is unnecessary to copy. Hence checking 'src'.
>
> > Condition "key->strategy != PARTITION_STRATEGY_LIST" seems to be
> unnecessary.
>
> Fixed.
>
> > Can't be a single loop?
>
> Yes. Fixed.
>
>
>
> On Fri, Dec 3, 2021 at 7:26 PM Amul Sul  wrote:
> >
> > Hi,
> >
> > Few comments for v7 patch, note that I haven't been through the
> > previous discussion, if any of the review comments that has been
> > already discussed & overridden, then please ignore here too:
> >
> >
> > partbounds.c: In function ‘get_qual_for_list.isra.18’:
> > partbounds.c:4284:29: warning: ‘boundinfo’ may be used uninitialized
> > in this function [-Wmaybe-uninitialized]
> >  datumCopy(bound_info->datums[i][j],
> >~~^~~~
> > partbounds.c:4335:21: note: ‘boundinfo’ was declared here
> >   PartitionBoundInfo boundinfo;
> >  ^
> > partbounds.c: In function ‘partition_bounds_merge’:
>

Re: [Proposal] Fully WAL logged CREATE DATABASE - No Checkpoints

2021-12-13 Thread Dilip Kumar
On Mon, Dec 13, 2021 at 8:34 AM Ashutosh Sharma  wrote:
>
> +   /*
> +* If the relation is from the default tablespace then we need to
> +* create it in the destinations db's default tablespace.  Otherwise,
> +* we need to create in the same tablespace as it is in the source
> +* database.
> +*/
>
> This comment looks a bit confusing to me especially because when we say 
> destination db's default tablespace people may think of pg_default tablespace 
> (at least I think so). Basically what you are trying to say here - "If the 
> relation exists in the same tablespace as the src database, then in the 
> destination db also it should be the same or something like that.. " So, why 
> not put it that way instead of referring to it as the default tablespace. 
> It's just my view. If you disagree you can ignore it.
>
> --
>
> +   else if (src_dboid == dst_dboid)
> +   continue;
> +   else
> +   dstrnode.spcNode = srcrnode.spcNode;;
>
> There is an extra semicolon here.


Noted. I will fix them in the next version.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com




Re: Add sub-transaction overflow status in pg_stat_activity

2021-12-13 Thread Dilip Kumar
On Tue, Dec 7, 2021 at 11:11 AM Justin Pryzby  wrote:

>
> You added this to pg_stat_activity, which already has a lot of fields.
> We talked a few months ago about not adding more fields that weren't commonly
> used.
> https://www.postgresql.org/message-id/flat/20210426191811.sp3o77doinphyjhu%40alap3.anarazel.de#d96d0a116f0344301eead2676ea65b2e
>
> Since I think this field is usually not interesting to most users of
> pg_stat_activity, maybe this should instead be implemented as a function like
> pg_backend_get_subxact_status(pid).
>
> People who want to could use it like:
> SELECT * FROM pg_stat_activity psa, pg_backend_get_subxact_status(pid) sub;

I have provided two functions, one for the subtransaction count and the
other for whether the subtransaction cache has overflowed; we can use
them like this. If we think this is a better way to do it, we can also
add another function for the lastOverflowedXid

postgres[43994]=# select id, pg_stat_get_backend_pid(id) as pid,
pg_stat_get_backend_subxact_count(id) as nsubxact,
pg_stat_get_backend_subxact_overflow(id) as overflowed from
pg_stat_get_backend_idset() as id;
 id |  pid  | nsubxact | overflowed
+---+--+
  1 | 43806 |0 | f
  2 | 43983 |   64 | t
  3 | 43994 |0 | f
  4 | 44323 |   22 | f
  5 | 43802 |0 | f
  6 | 43801 |0 | f
  7 | 43804 |0 | f
(7 rows)

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
From 8a9aa7ba753f8ae77651861629266b86b276bf11 Mon Sep 17 00:00:00 2001
From: Dilip Kumar 
Date: Sun, 12 Dec 2021 17:10:55 +0530
Subject: [PATCH v2] Add functions to show subtransaction count and overflow
 status

If there are some backends having a lot of nested subtransactions
or the subtransaction cache has overflowed, there is no way to
detect that.  So this patch makes that easy by providing
functions to get that information.
---
 doc/src/sgml/monitoring.sgml| 27 +++
 src/backend/storage/ipc/sinvaladt.c | 13 +
 src/backend/utils/activity/backend_status.c |  4 +++-
 src/backend/utils/adt/pgstatfuncs.c | 23 +++
 src/include/catalog/pg_proc.dat |  8 
 src/include/storage/sinvaladt.h |  4 +++-
 src/include/utils/backend_status.h  | 11 +++
 7 files changed, 84 insertions(+), 6 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 62f2a33..77c6cfc 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -5482,6 +5482,33 @@ SELECT pg_stat_get_backend_pid(s.backendid) AS pid,
   

 
+ pg_stat_get_backend_subxact_count
+
+pg_stat_get_backend_subxact_count ( integer )
+integer
+   
+   
+Returns the number of cached subtransactions of this backend.
+   
+  
+
+  
+   
+
+ pg_stat_get_backend_subxact_overflow
+
+pg_stat_get_backend_subxact_overflow ( integer )
+boolean
+   
+   
+Returns true if the number of active subtransactions crossed per
+backend subtransaction cache limit, false otherwise.
+   
+  
+
+  
+   
+
  pg_stat_get_backend_userid
 
 pg_stat_get_backend_userid ( integer )
diff --git a/src/backend/storage/ipc/sinvaladt.c b/src/backend/storage/ipc/sinvaladt.c
index 946bd8e..876d7fe 100644
--- a/src/backend/storage/ipc/sinvaladt.c
+++ b/src/backend/storage/ipc/sinvaladt.c
@@ -395,17 +395,20 @@ BackendIdGetProc(int backendID)
 
 /*
  * BackendIdGetTransactionIds
- *		Get the xid and xmin of the backend. The result may be out of date
- *		arbitrarily quickly, so the caller must be careful about how this
- *		information is used.
+ *		Get the xid and xmin, nsubxid and overflow status of the backend. The
+ *		result may be out of date arbitrarily quickly, so the caller must be
+ *		careful about how this information is used.
  */
 void
-BackendIdGetTransactionIds(int backendID, TransactionId *xid, TransactionId *xmin)
+BackendIdGetTransactionIds(int backendID, TransactionId *xid,
+		   TransactionId *xmin, int *nsubxid, bool *overflowed)
 {
 	SISeg	   *segP = shmInvalBuffer;
 
 	*xid = InvalidTransactionId;
 	*xmin = InvalidTransactionId;
+	*nsubxid = 0;
+	*overflowed = false;
 
 	/* Need to lock out additions/removals of backends */
 	LWLockAcquire(SInvalWriteLock, LW_SHARED);
@@ -419,6 +422,8 @@ BackendIdGetTransactionIds(int backendID, TransactionId *xid, TransactionId *xmi
 		{
 			*xid = proc->xid;
 			*xmin = proc->xmin;
+			*nsubxid = proc->subxidStatus.count;
+			*overflowed = proc->subxidStatus.overflowed;
 		}
 	}
 
diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
index 7229598..9c904be 100644
--- a/src/backend/utils/activity/backend_status.c
+++ b/src/backend/utils/activity/backend_status.c
@@ -848,7 +848,9 @@ pgstat

Re: daitch_mokotoff module

2021-12-13 Thread Tomas Vondra

On 12/13/21 14:38, Dag Lem wrote:

Please find attached an updated patch, with the following fixes:

* Replaced remaining malloc/free with palloc/pfree.
* Made "make check" pass.
* Updated notes on other implementations.



Thanks, looks interesting. A couple generic comments, based on a quick 
code review.


1) Can the extension be marked as trusted, just like fuzzystrmatch?

2) The docs really need an explanation of what the extension is for, not 
just a link to fuzzystrmatch. Also, a couple examples would be helpful, 
I guess - similarly to fuzzystrmatch. The last line in the docs is 
annoyingly long.


3) What's daitch_mokotov_header.pl for? I mean, it generates the header, 
but when do we need to run it?


4) It seems to require perl-open, which is a module we did not need 
until now. Not sure how well supported it is, but maybe we can use a 
more standard module?


5) Do we need to keep DM_MAIN? It seems to be meant for some kind of 
testing, but our regression tests certainly don't need it (or the palloc 
mockup). I suggest to get rid of it.


6) I really don't understand some of the comments in daitch_mokotov.sql, 
like for example:


-- testEncodeBasic
-- Tests covered above are omitted.

Also, comments with names of Java methods seem pretty confusing. It'd be 
better to actually explain what rules are the tests checking.


7) There are almost no comments in the .c file (ignoring the comment on 
top). Short functions like initialize_node are probably fine without 
one, but e.g. update_node would deserve one.


8) Some of the lines are pretty long (e.g. the update_node signature is 
almost 170 chars). That should be wrapped. Maybe try running pgindent on 
the code, that'll show which parts need better formatting (so as not to 
get broken later).


9) I'm sure there's a better way to get the number of valid chars than this:

  for (i = 0, ilen = 0; (c = read_char(&str[i], &ilen)) && (c < 'A' || 
c > ']'); i += ilen)

  {
  }

Say, a while loop or something?


regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: WIP: WAL prefetch (another approach)

2021-12-13 Thread Robert Haas
On Fri, Nov 26, 2021 at 9:47 PM Tom Lane  wrote:
> Yeah ... on the one hand, that machine has shown signs of
> hard-to-reproduce flakiness, so it's easy to write off the failures
> I saw as hardware issues.  On the other hand, the flakiness I've
> seen has otherwise manifested as kernel crashes, which is nothing
> like the consistent test failures I was seeing with the patch.
>
> Andres speculated that maybe we were seeing a kernel bug that
> affects consistency of concurrent reads and writes.  That could
> be an explanation; but it's just evidence-free speculation so far,
> so I don't feel real convinced by that idea either.
>
> Anyway, I hope to find time to see if the issue still reproduces
> with Thomas' new patch set.

Honestly, all the reasons that Thomas articulated for the revert seem
relatively unimpressive from my point of view. Perhaps they are
sufficient justification for a revert so near to the end of the
development cycle, but that's just an argument for committing things a
little sooner so we have time to work out the kinks. This kind of work
is too valuable to get hung up for a year or three because of a couple
of minor preexisting bugs and/or preexisting maybe-bugs.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: should we document an example to set multiple libraries in shared_preload_libraries?

2021-12-13 Thread Robert Haas
On Thu, Dec 9, 2021 at 2:32 AM Maciek Sakrejda  wrote:
> > Considering the vanishingly small number of actual complaints we've
> > seen about this, that sounds ridiculously over-engineered.
> > A documentation example should be sufficient.
>
> I don't know if this will tip the scales, but I'd like to lodge a
> belated complaint. I've gotten myself in this server-fails-to-start
> situation several times (in development, for what it's worth). The
> syntax (as Bharath pointed out in the original message) is pretty
> picky, there are no guard rails, and if you got there through ALTER
> SYSTEM, you can't fix it with ALTER SYSTEM (because the server isn't
> up). If you go to fix it manually, you get a scary "Do not edit this
> file manually!" warning that you have to know to ignore in this case
> (that's if you find the file after you realize what the fairly generic
> "FATAL: ... No such file or directory" error in the log is telling
> you). Plus you have to get the (different!) quoting syntax right or
> cut your losses and delete the change.

+1. I disagree that trying to detect this kind of problem would be
"ridiculously over-engineered." I don't know whether it can be done
elegantly enough that we'd be happy with it and I don't know whether
it would end up just garden variety over-engineered. But there's
nothing ridiculous about trying to prevent people from putting their
system into a state where it won't start.

(To be clear, I also think updating the documentation is sensible,
without taking a view on exactly what that update should look like.)

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: extended stats on partitioned tables

2021-12-13 Thread Justin Pryzby
On Sun, Dec 12, 2021 at 11:23:19PM +0100, Tomas Vondra wrote:
> On 12/12/21 22:32, Justin Pryzby wrote:
> > On Sun, Dec 12, 2021 at 05:17:10AM +0100, Tomas Vondra wrote:
> > > The one thing bugging me a bit is that the regression test checks only a
> > > GROUP BY query. It'd be nice to add queries testing MCV/dependencies
> > > too, but that seems tricky because most queries will use per-partitions
> > > stats.
> > 
> > You mean because the quals are pushed down to the scan node.
> > 
> > Does that indicate a deficiency ?
> > 
> > If extended stats are collected for a parent table, selectivity estimates 
> > based
> > from the parent would be better; but instead we use uncorrected column
> > estimates from the child tables.
> > 
> > From what I see, we could come up with a way to avoid the pushdown, 
> > involving
> > volatile functions/foreign tables/RLS/window functions/SRF/wholerow 
> > vars/etc.
> > But would it be better if extended stats objects on partitioned tables were 
> > to
> > collect stats for both parent AND CHILD ?  I'm not sure.  Maybe that's the
> > wrong solution, but maybe we should still document that extended stats on
> > (empty) parent tables are often themselves not used/useful for selectivity
> > estimates, and the user should instead (or in addition) create stats on 
> > child
> > tables.
> > 
> > Or, maybe if there's no extended stats on the child tables, stats on the 
> > parent
> > table should be consulted ?
> 
> Maybe, but that seems like a mostly separate improvement. At this point I'm
> interested only in testing the behavior implemented in the current patches.

I don't want to change the scope of the patch, or this thread, but my point is
that the behaviour already changed once (the original regression) and now we're
planning to change it again to fix that, so we ought to decide on the expected
behavior before writing tests to verify it.

I think it may be impossible to use the "dependencies" statistic with inherited
stats.  Normally the quals would be pushed down to the child tables.  But, if
they weren't pushed down, they'd be attached to something other than a scan
node on the parent table, so the stats on that table wouldn't apply (right?).  

Maybe the useless stats types should have been prohibited on partitioned tables
since v10.  It's too late to change that, but perhaps now they shouldn't even
be collected during analyze.  The dependencies and MCV paths are never called
with rte->inh==true, so maybe we should Assert(!inh), or add a comment to that
effect.  Or the regression tests should "memorialize" the behavior.  I'm still
thinking about it.
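
For illustration, the push-down behavior in question can be seen with
something like this (table and statistics names are made up):

CREATE TABLE part (a int, b int) PARTITION BY RANGE (a);
CREATE TABLE part_1 PARTITION OF part FOR VALUES FROM (0) TO (1000);
CREATE STATISTICS part_deps (dependencies) ON a, b FROM part;
ANALYZE part;
-- The quals land in the scan of part_1 rather than in a node on part itself,
-- so the parent's "dependencies" statistics are not what gets consulted:
EXPLAIN SELECT * FROM part WHERE a = 1 AND b = 1;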

-- 
Justin




Re: O(n) tasks cause lengthy startups and checkpoints

2021-12-13 Thread Robert Haas
On Fri, Dec 10, 2021 at 2:03 PM Bossart, Nathan  wrote:
> Well, I haven't had a chance to look at your patch, and my patch set
> still only has handling for CheckPointSnapBuild() and
> RemovePgTempFiles(), but I thought I'd share what I have anyway.  I
> split it into 5 patches:
>
> 0001 - Adds a new "custodian" auxiliary process that does nothing.
> 0002 - During startup, remove the pgsql_tmp directories instead of
>only clearing the contents.
> 0003 - Split temporary file cleanup during startup into two stages.
>The first renames the directories, and the second clears them.
> 0004 - Moves the second stage from 0003 to the custodian process.
> 0005 - Moves CheckPointSnapBuild() to the custodian process.

I don't know whether this kind of idea is good or not.

One thing we've seen a number of times now is that entrusting the same
process with multiple responsibilities often ends poorly. Sometimes
it's busy with one thing when another thing really needs to be done
RIGHT NOW. Perhaps that won't be an issue here since all of these
things are related to checkpointing, but then the process name should
reflect that rather than making it sound like we can just keep piling
more responsibilities onto this process indefinitely. At some point
that seems bound to become an issue.

Another issue is that we don't want to increase the number of
processes without bound. Processes use memory and CPU resources and if
we run too many of them it becomes a burden on the system. Low-end
systems may not have too many resources in total, and high-end systems
can struggle to fit demanding workloads within the resources that they
have. Maybe it would be cheaper to do more things at once if we were
using threads rather than processes, but that day still seems fairly
far off.

But against all that, if these tasks are slowing down checkpoints and
that's avoidable, that seems pretty important too. Interestingly, I
can't say that I've ever seen any of these things be a problem for
checkpoint or startup speed. I wonder why you've had a different
experience.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: extended stats on partitioned tables

2021-12-13 Thread Justin Pryzby
On Sun, Dec 12, 2021 at 10:29:39PM +0100, Tomas Vondra wrote:
> > In your 0003 patch, the "if inh: break" isn't removed from 
> > examine_variable(),
> > but the corresponding thing is removed everywhere else.
> 
> Ah, you're right. And it wasn't updated in the 0002 patch either - it
> should do the relkind check too, to allow partitioned tables. Fixed.

I think you fixed it in 0002 (thanks), but it still wasn't removed from 0003?

In these comments:
+* When dealing with regular inheritance trees, ignore extended stats
+* (which were built without data from child rels, and thus do not
+* represent them). For partitioned tables data from partitions are
+* in the stats (and there's no data in the non-leaf relations), so
+* in this case we do consider extended stats.

I suggest adding a comma after "For partitioned tables".

-- 
Justin




Re: daitch_mokotoff module

2021-12-13 Thread Dag Lem
Please find attached an updated patch, with the following fixes:

* Replaced remaining malloc/free with palloc/pfree.
* Made "make check" pass.
* Updated notes on other implementations.
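
For reference, once the extension is installed the function is called like
this (just a sketch; the resulting soundex codes are not shown here, since
they depend on the coding rules described in the source comments):

CREATE EXTENSION daitch_mokotoff;
SELECT daitch_mokotoff('Schwarzenegger');
SELECT daitch_mokotoff('Peters');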

Best regards

Dag Lem

diff --git a/contrib/Makefile b/contrib/Makefile
index 87bf87ab90..5ea729 100644
--- a/contrib/Makefile
+++ b/contrib/Makefile
@@ -14,6 +14,7 @@ SUBDIRS = \
 		btree_gist	\
 		citext		\
 		cube		\
+		daitch_mokotoff	\
 		dblink		\
 		dict_int	\
 		dict_xsyn	\
diff --git a/contrib/daitch_mokotoff/Makefile b/contrib/daitch_mokotoff/Makefile
new file mode 100644
index 00..baec5e31d4
--- /dev/null
+++ b/contrib/daitch_mokotoff/Makefile
@@ -0,0 +1,25 @@
+# contrib/daitch_mokotoff/Makefile
+
+MODULE_big = daitch_mokotoff
+OBJS = \
+	$(WIN32RES) \
+	daitch_mokotoff.o
+
+EXTENSION = daitch_mokotoff
+DATA = daitch_mokotoff--1.0.sql
+PGFILEDESC = "daitch_mokotoff - Daitch-Mokotoff Soundex System"
+
+HEADERS = daitch_mokotoff.h
+
+REGRESS = daitch_mokotoff
+
+ifdef USE_PGXS
+PG_CONFIG = pg_config
+PGXS := $(shell $(PG_CONFIG) --pgxs)
+include $(PGXS)
+else
+subdir = contrib/daitch_mokotoff
+top_builddir = ../..
+include $(top_builddir)/src/Makefile.global
+include $(top_srcdir)/contrib/contrib-global.mk
+endif
diff --git a/contrib/daitch_mokotoff/daitch_mokotoff--1.0.sql b/contrib/daitch_mokotoff/daitch_mokotoff--1.0.sql
new file mode 100644
index 00..0b5a643175
--- /dev/null
+++ b/contrib/daitch_mokotoff/daitch_mokotoff--1.0.sql
@@ -0,0 +1,8 @@
+/* contrib/daitch_mokotoff/daitch_mokotoff--1.0.sql */
+
+-- complain if script is sourced in psql, rather than via CREATE EXTENSION
+\echo Use "CREATE EXTENSION daitch_mokotoff" to load this file. \quit
+
+CREATE FUNCTION daitch_mokotoff(text) RETURNS text
+AS 'MODULE_PATHNAME', 'daitch_mokotoff'
+LANGUAGE C IMMUTABLE STRICT PARALLEL SAFE;
diff --git a/contrib/daitch_mokotoff/daitch_mokotoff.c b/contrib/daitch_mokotoff/daitch_mokotoff.c
new file mode 100644
index 00..a7f1fd8541
--- /dev/null
+++ b/contrib/daitch_mokotoff/daitch_mokotoff.c
@@ -0,0 +1,549 @@
+/*
+ * Daitch-Mokotoff Soundex
+ *
+ * Copyright (c) 2021 Finance Norway
+ * Author: Dag Lem 
+ *
+ * This implementation of the Daitch-Mokotoff Soundex System aims at high
+ * performance.
+ *
+ * - The processing of each phoneme is initiated by an O(1) table lookup.
+ * - For phonemes containing more than one character, a coding tree is traversed
+ *   to process the complete phoneme.
+ * - The (alternate) soundex codes are produced digit by digit in-place in
+ *   another tree structure.
+ *
+ *  References:
+ *
+ * https://www.avotaynu.com/soundex.htm
+ * https://www.jewishgen.org/InfoFiles/Soundex.html
+ * https://familypedia.fandom.com/wiki/Daitch-Mokotoff_Soundex
+ * https://stevemorse.org/census/soundex.html (dmlat.php, dmsoundex.php)
+ * https://github.com/apache/commons-codec/ (dmrules.txt, DaitchMokotoffSoundex.java)
+ * https://metacpan.org/pod/Text::Phonetic (DaitchMokotoff.pm)
+ *
+ * A few notes on other implementations:
+ *
+ * - "J" is considered a vowel in dmlat.php
+ * - The official rules for "RS" are commented out in dmlat.php
+ * - Identical code digits for adjacent letters are not collapsed correctly in
+ *   dmsoundex.php when double digit codes are involved. E.g. "BESST" yields
+ *   744300 instead of 743000 as for "BEST".
+ * - "Y" is not considered a vowel in DaitchMokotoffSoundex.java
+ * - Both dmlat.php and dmrules.txt have the same unofficial rules for "UE".
+ * - Coding of MN/NM + M/N differs between dmsoundex.php and DaitchMokotoffSoundex.java
+ * - No other known implementation yields the correct set of codes for e.g.
+ *   "CJC" (55 54 545000 45 40 44).
+ *
+ * Permission to use, copy, modify, and distribute this software and its
+ * documentation for any purpose, without fee, and without a written agreement
+ * is hereby granted, provided that the above copyright notice and this
+ * paragraph and the following two paragraphs appear in all copies.
+ *
+ * IN NO EVENT SHALL THE AUTHOR OR DISTRIBUTORS BE LIABLE TO ANY PARTY FOR
+ * DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING
+ * LOST PROFITS, ARISING OUT OF THE USE OF THIS SOFTWARE AND ITS
+ * DOCUMENTATION, EVEN IF THE AUTHOR OR DISTRIBUTORS HAVE BEEN ADVISED OF THE
+ * POSSIBILITY OF SUCH DAMAGE.
+ *
+ * THE AUTHOR AND DISTRIBUTORS SPECIFICALLY DISCLAIM ANY WARRANTIES,
+ * INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY
+ * AND FITNESS FOR A PARTICULAR PURPOSE.  THE SOFTWARE PROVIDED HEREUNDER IS
+ * ON AN "AS IS" BASIS, AND THE AUTHOR AND DISTRIBUTORS HAS NO OBLIGATIONS TO
+ * PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
+*/
+
+#include "daitch_mokotoff.h"
+
+#ifndef DM_MAIN
+
+#include "postgres.h"
+#include "utils/builtins.h"
+#include "mb/pg_wchar.h"
+
+#else			/* DM_MAIN */
+
+#include 
+#include 
+
+void *
+palloc(size_t size)
+{
+	void	   *ptr;
+
+	ptr = malloc(size);
+
+	if (ptr == NULL)
+	{

Re: Skipping logical replication transactions on subscriber side

2021-12-13 Thread Masahiko Sawada
On Mon, Dec 13, 2021 at 1:04 PM Amit Kapila  wrote:
>
> On Mon, Dec 13, 2021 at 8:28 AM Masahiko Sawada  wrote:
> >
> > On Sat, Dec 11, 2021 at 3:29 PM Amit Kapila  wrote:
> > >
> > > 3.
> > > + * Also, we don't skip receiving the changes in streaming cases,
> > > since we decide
> > > + * whether or not to skip applying the changes when starting to apply 
> > > changes.
> > >
> > > But why so? Can't we even skip streaming (and writing to file all such
> > > messages)? If we can do this then we can avoid even collecting all
> > > messages in a file.
> >
> > IIUC in streaming cases, a transaction can be sent to the subscriber
> > while splitting into multiple chunks of changes. In the meanwhile,
> > skip_xid can be changed. If the user changed or cleared skip_xid after
> > the subscriber skips some streamed changes, the subscriber won't be able
> > to have complete changes of the transaction.
> >
>
> Yeah, I think if we want we can handle this by writing into the stream
> xid file whether the changes need to be skipped and then the
> consecutive streams can check that in the file or may be in some way
> don't allow skip_xid to be changed in worker if it is already skipping
> some xact. If we don't want to do anything for this then it is better
> to at least reflect this reasoning in the comments.

Yes. Given that we still need to apply messages other than
data-modification messages, we need to skip writing only these changes
to the stream file.

>
> > >
> > > 4.
> > > + * Also, one might think that we can skip preparing the skipped 
> > > transaction.
> > > + * But if we do that, PREPARE WAL record won’t be sent to its physical
> > > + * standbys, resulting in that users won’t be able to find the prepared
> > > + * transaction entry after a fail-over.
> > > + *
> > > ..
> > > + */
> > > + if (skipping_changes)
> > > + stop_skipping_changes(false);
> > >
> > > Why do we need such a Prepare's entry either at current subscriber or
> > > on its physical standby? I think it is to allow Commit-prepared. If
> > > so, how about if we skip even commit prepared as well? Even on
> > > physical standby, we would be having the value of skip_xid which can
> > > help us to skip there as well after failover.
> >
> > It's true that skip_xid would be set also on physical standby. When it
> > comes to preparing the skipped transaction on the current subscriber,
> > if we want to skip commit-prepared I think we need protocol changes in
> > order for subscribers to know prepare_lsn and prepare_timestamp so
> > that it can look up the prepared transaction when doing
> > commit-prepared. I proposed this idea before. This change would be
> > beneficial as of now since the publisher sends even empty transactions.
> > But considering the proposed patch[1] that makes the publisher not
> > send empty transactions, this protocol change would be an optimization
> > only for this feature.
> >
>
> I was thinking to compare the xid received as part of the
> commit_prepared message with the value of skip_xid to skip the
> commit_prepared but I guess the user would change it between prepare
> and commit prepare and then we won't be able to detect it, right? I
> think we can handle this and the streaming case if we disallow users
> to change the value of skip_xid when we are already skipping changes
> or don't let the new skip_xid to reflect in the apply worker if we are
> already skipping some other transaction. What do you think?

In streaming cases, we don’t know when stream-commit or stream-abort
comes and another conflict could occur on the subscription in the
meanwhile. But given that (we expect) this feature is used after the
apply worker enters into an error loop, this is unlikely to happen in
practice unless the user sets the wrong XID. Similarly, in 2PC cases,
we don’t know when commit-prepared or rollback-prepared comes and
another conflict could occur in the meanwhile. But this could occur in
practice even if the user specified the correct XID. Therefore, if we
disallow to change skip_xid until the subscriber receives
commit-prepared or rollback-prepared, we cannot skip the second
transaction that conflicts with data on the subscriber.
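
(To spell out the problematic sequence the proposed restriction would prevent;
the DDL spelling below is only illustrative, since the exact syntax for
setting skip_xid is still under discussion in this thread:)

ALTER SUBSCRIPTION mysub SET (skip_xid = 741);
-- the apply worker skips some streamed chunks of transaction 741 ...
ALTER SUBSCRIPTION mysub SET (skip_xid = 752);
-- ... and if 741's stream-commit has not arrived yet, the worker is now left
-- with an incomplete set of changes for that transaction.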

From the application perspective, which behavior is preferable between
skipping preparing a transaction and preparing an empty transaction,
in the first place? From the resource consumption etc., skipping
preparing transactions seems better. On the other hand, if we skipped
preparing the transaction, the application would not be able to find
the prepared transaction after a fail-over to the subscriber.

Regards,

--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: [PATCH] pg_stat_toast

2021-12-13 Thread Gunnar "Nick" Bluth

Am 13.12.21 um 00:41 schrieb Andres Freund:

Hi,

On 2021-12-13 00:00:23 +0100, Gunnar "Nick" Bluth wrote:

Regarding stats size; it adds one PgStat_BackendToastEntry
(PgStat_BackendAttrIdentifier + PgStat_ToastCounts, should be 56-64 bytes or
something in that ballpark) per TOASTable attribute, I can't see that make
any system break sweat ;-)


That's actually a lot. The problem is that all the stats data for a database
is loaded into private memory for each connection to that database, and that
the stats collector regularly writes out all the stats data for a database.


My understanding is that the stats file is only pulled into the backend 
when the SQL functions (for the view) are used (see 
"pgstat_fetch_stat_toastentry()").


Otherwise, a backend just initializes an empty hash, right?

For the tests below, I reduced the initial hash size from 512 to 32 (I guess
the "truth" lies somewhere in between), made the GUC parameter an actual GUC
parameter, and disabled the elog() calls I had scattered all over the place
;-) for the attached v0.2 patch.



A quick run comparing 1.000.000 INSERTs (2 TOASTable columns each) with and
without "pgstat_track_toast" resulted in 12792.882 ms vs. 12810.557 ms. So
at least the call overhead seems to be negligible.


Yea, you'd probably need a few more tables and a few more connections for it
to have a chance of mattering meaningfully.


So, I went ahead and
* set up 2 clusters with "track_toast" off and on resp.
* created 100 DBs
 * each with 100 tables
 * with one TOASTable column in each table
 * filling those with 32000 bytes of md5 garbage

These clusters sum up to ~ 2GB each, so differences should _start to_ 
show up, I reckon.
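
(Per database, the setup is roughly the following; table names and row counts
are made up:)

DO $$
BEGIN
    FOR t IN 1..100 LOOP
        EXECUTE format('CREATE TABLE toast_test_%s (payload text)', t);
        -- 32000 bytes of md5 "garbage" per row: 1000 x 32-character hashes
        EXECUTE format(
            'INSERT INTO toast_test_%s SELECT repeat(md5(random()::text), 1000)', t);
    END LOOP;
END $$;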


$ du -s testdb*
2161208 testdb
2163240 testdb_tracking

$ du -s testdb*/pg_stat
4448testdb/pg_stat
4856testdb_tracking/pg_stat

The db_*.stat files are 42839 vs. 48767 bytes each (so confirmed, the 
differences do show).



No idea if this is telling us anything, tbth, but the 
/proc/[pid]/smaps_rollup for a backend serving one of these DBs looks 
like this ("0 kB" lines omitted):


track_toast OFF
===
Rss:   12428 kB
Pss:5122 kB
Pss_Anon:   1310 kB
Pss_File:   2014 kB
Pss_Shmem:  1797 kB
Shared_Clean:   5864 kB
Shared_Dirty:   3500 kB
Private_Clean:  1088 kB
Private_Dirty:  1976 kB
Referenced:11696 kB
Anonymous:  2120 kB

track_toast ON (view not called yet):
=
Rss:   12300 kB
Pss:4883 kB
Pss_Anon:   1309 kB
Pss_File:   1888 kB
Pss_Shmem:  1685 kB
Shared_Clean:   6040 kB
Shared_Dirty:   3468 kB
Private_Clean:   896 kB
Private_Dirty:  1896 kB
Referenced:11572 kB
Anonymous:  2116 kB

track_toast ON (view called):
=
Rss:   15408 kB
Pss:7482 kB
Pss_Anon:   2083 kB
Pss_File:   2572 kB
Pss_Shmem:  2826 kB
Shared_Clean:   6616 kB
Shared_Dirty:   3532 kB
Private_Clean:  1472 kB
Private_Dirty:  3788 kB
Referenced:14704 kB
Anonymous:  2884 kB

That backend used some memory for displaying the result too, of course...

A backend with just two TOAST columns in one table (filled with 
1.000.001 rows) looks like this before and after calling the 
"pg_stat_toast" view:

Rss:  146208 kB
Pss:  116181 kB
Pss_Anon:   2050 kB
Pss_File:   2787 kB
Pss_Shmem:111342 kB
Shared_Clean:   6636 kB
Shared_Dirty:  45928 kB
Private_Clean:  1664 kB
Private_Dirty: 91980 kB
Referenced:   145532 kB
Anonymous:  2844 kB

Rss:  147736 kB
Pss:  103296 kB
Pss_Anon:   2430 kB
Pss_File:   3147 kB
Pss_Shmem: 97718 kB
Shared_Clean:   6992 kB
Shared_Dirty:  74056 kB
Private_Clean:  1984 kB
Private_Dirty: 64704 kB
Referenced:   147092 kB
Anonymous:  3224 kB

After creating 10.000 more tables (view shows 10.007 rows now), before 
and after calling "TABLE pg_stat_toast":

Rss:   13816 kB
Pss:4898 kB
Pss_Anon:   1314 kB
Pss_File:   1755 kB
Pss_Shmem:  1829 kB
Shared_Clean:   5972 kB
Shared_Dirty:   5760 kB
Private_Clean:   832 kB
Private_Dirty:  1252 kB
Referenced:13088 kB
Anonymous:  2124 kB

Rss:  126816 kB
Pss:   55213 kB
Pss_Anon:   5383 kB
Pss_File:   2615 kB
Pss_Shmem: 47215 kB
Shared_Clean:   6460 kB
Shared_Dirty: 113028 kB
Private_Clean:  1600 kB
Private_Dirty:  5728 kB
Referenced:   126112 kB
Anonymous:  6184 kB


That DB's stat-file is now 4.119.254 bytes (3.547.439 without track_toast).

After VACUUM ANALYZE, the size goes up to 5.919.812 (5.348.768).
The "100 tables" DBs' go to 97.910 (91.868) bytes.

In total:
$ du -s testd

Re: using extended statistics to improve join estimates

2021-12-13 Thread Tomas Vondra

On 11/22/21 02:23, Justin Pryzby wrote:

Your regression tests include two errors, which appear to be accidental, and
fixing the error shows that this case is being estimated poorly.

+-- try combining with single-column (and single-expression) statistics
+DROP STATISTICS join_test_2;
+ERROR:  statistics object "join_test_2" does not exist
...
+ERROR:  statistics object "join_stats_2" already exists



D'oh, what a silly mistake ...

You're right that fixing the DROP STATISTICS results in a worse estimate, but 
that's actually expected for a fairly simple reason. The join condition 
has expressions on both sides, and dropping the statistics means we 
don't have any MCV for the join_test_2 side. So the optimizer ends up 
not using the regular estimates, as if there were no extended stats.


A couple of lines later the script creates extended statistics on that 
expression alone, which fixes this. An expression index would do the 
trick too.
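
Roughly like this, for anyone reproducing it (the statistics name and the
expression are illustrative; the real definitions are in the stats_ext.sql
changes):

CREATE STATISTICS join_test_2_expr ON (a + 1) FROM join_test_2;
-- or, alternatively, an expression index:
CREATE INDEX ON join_test_2 ((a + 1));
ANALYZE join_test_2;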


Attached is a patch fixing the test and also the issue reported by 
Zhihong Yu some time ago.



regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From 616f7f3faa818ea89c4c1cecb9aa50dd9e4fe8e7 Mon Sep 17 00:00:00 2001
From: Tomas Vondra 
Date: Mon, 13 Dec 2021 14:05:17 +0100
Subject: [PATCH] Estimate joins using extended statistics

Use extended statistics (MCV) to improve join estimates. In general this
is similar to how we use regular statistics - we search for extended
statistics (with MCV) covering all join clauses, and if we find such MCV
on both sides of the join, we combine those two MCVs.

Extended statistics allow a couple additional improvements - e.g. if
there are baserel conditions, we can use them to restrict the part of
the MCVs combined. This means we're building conditional probability
distribution and calculating conditional probability

P(join clauses | baserel conditions)

instead of just P(join clauses).

The patch also allows combining regular and extended MCV - we don't need
extended MCVs on both sides. This helps when one of the tables does not
have extended statistics (e.g. because there are no correlations).
---
 src/backend/optimizer/path/clausesel.c|  63 +-
 src/backend/statistics/extended_stats.c   | 805 ++
 src/backend/statistics/mcv.c  | 754 
 .../statistics/extended_stats_internal.h  |  20 +
 src/include/statistics/statistics.h   |  12 +
 src/test/regress/expected/stats_ext.out   | 167 
 src/test/regress/sql/stats_ext.sql|  66 ++
 7 files changed, 1886 insertions(+), 1 deletion(-)

diff --git a/src/backend/optimizer/path/clausesel.c b/src/backend/optimizer/path/clausesel.c
index d263ecf082..709e92446b 100644
--- a/src/backend/optimizer/path/clausesel.c
+++ b/src/backend/optimizer/path/clausesel.c
@@ -50,6 +50,9 @@ static Selectivity clauselist_selectivity_or(PlannerInfo *root,
 			 JoinType jointype,
 			 SpecialJoinInfo *sjinfo,
 			 bool use_extended_stats);
+static inline bool treat_as_join_clause(PlannerInfo *root,
+		Node *clause, RestrictInfo *rinfo,
+		int varRelid, SpecialJoinInfo *sjinfo);
 
 /
  *		ROUTINES TO COMPUTE SELECTIVITIES
@@ -129,12 +132,53 @@ clauselist_selectivity_ext(PlannerInfo *root,
 	RangeQueryClause *rqlist = NULL;
 	ListCell   *l;
 	int			listidx;
+	bool		single_clause_optimization = true;
+
+	/*
+	 * The optimization of skipping to clause_selectivity_ext for single
+	 * clauses means we can't improve join estimates with a single join
+	 * clause but additional baserel restrictions. So we disable it when
+	 * estimating joins.
+	 *
+	 * XXX Not sure if this is the right way to do it, but more elaborate
+	 * checks would mostly negate the whole point of the optimization.
+	 * The (Var op Var) patch has the same issue.
+	 *
+	 * XXX An alternative might be making clause_selectivity_ext smarter
+	 * and make it use the join extended stats there. But that seems kinda
+	 * against the whole point of the optimization (skipping expensive
+	 * stuff) and it's making other parts more complex.
+	 *
+	 * XXX Maybe this should check if there are at least some restrictions
+	 * on some base relations, which seems important. But then again, that
+	 * seems to go against the idea of this check to be cheap. Moreover, it
+	 * won't work for OR clauses, which may have multiple parts but we still
+	 * see them as a single BoolExpr clause (it doesn't work later, though).
+	 */
+	if (list_length(clauses) == 1)
+	{
+		Node *clause = linitial(clauses);
+		RestrictInfo *rinfo = NULL;
+
+		if (IsA(clause, RestrictInfo))
+		{
+			rinfo = (RestrictInfo *) clause;
+			clause = (Node *) rinfo->clause;
+		}
+
+		single_clause_optimization
+			= !treat_as_join_clause(root, clause, rinfo, varRelid, sjinfo);
+	}
 
 	/*
 	 * If there's exactly one clause, just go directly to
 	 * clause_selec

Re: Probable memory leak with ECPG and AIX

2021-12-13 Thread talk to ben
Hi,

(I work with Guillaume on this case.)

On Sun, Dec 12, 2021 at 8:34 AM Noah Misch  wrote:

> That almost certainly means he's using a 32-bit binary with the default
> heap
> size.  To use more heap on AIX, build 64-bit or override the heap size.
> For
> example, "env LDR_CNTRL=MAXDATA=0x8000 ./a.out" gives 2GiB of heap.
> See
>
> https://www.postgresql.org/docs/devel/installation-platform-notes.html#INSTALLATION-NOTES-AIX
> for more ways to control heap size.  While that documentation focuses on
> the
> server, the same techniques apply to clients like your test program.
>
> That said, I don't know why your test program reaches 256MB on AIX.  On
> GNU/Linux, it uses a lot less.  What version of PostgreSQL provided your
> client libraries?
>

They use PostgreSQL 12.3 in production but have also tested on 12.9, with the
same result.
We relayed your suggestion and will get back to you with the results.

Thanks for the input !


RE: Failed transaction statistics to measure the logical replication progress

2021-12-13 Thread osumi.takami...@fujitsu.com
On Monday, December 13, 2021 6:19 PM Amit Kapila  
wrote:
> On Tue, Dec 7, 2021 at 3:12 PM osumi.takami...@fujitsu.com
>  wrote:
> 
> Few questions and comments:
Thank you for your comments !

> 
> 1.
> The pg_stat_subscription_workers view will
> contain
> one row per subscription worker on which errors have occurred, for workers
> applying logical replication changes and workers handling the initial data
> -   copy of the subscribed tables.  The statistics entry is removed when the
> -   corresponding subscription is dropped.
> +   copy of the subscribed tables. Also, the row corresponding to the apply
> +   worker shows all transaction statistics of both types of workers on the
> +   subscription. The statistics entry is removed when the corresponding
> +   subscription is dropped.
> 
> Why did you choose to show stats for both types of workers in one row?
This is because, if we had hundreds or thousands of tables for table sync,
we would need to create and store that many entries, one for each of those
tables.
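
In other words, the merged counters end up in the apply worker's row and would
be read roughly like this (column names follow the patch: commit_count,
abort_count, error_count; the WHERE clause assumes the existing convention
that a NULL subrelid identifies the apply worker):

SELECT subid, commit_count, abort_count, error_count
  FROM pg_stat_subscription_workers
 WHERE subrelid IS NULL;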


> 2.
> + PGSTAT_MTYPE_SUBWORKERXACTEND,
>  } StatMsgType;
> 
> I don't think we need a comma after the last message type.
> 4.
>   /*
> + * Cumulative transaction statistics of subscription worker */
> + PgStat_Counter commit_count; PgStat_Counter error_count;
> + PgStat_Counter abort_count;
> +
> 
> I think it is better to keep the order of columns as commit_count, 
> abort_count,
> error_count in the entire patch.
Okay, I'll fix both points in the next version.

 
> 3.
> + Oid m_subrelid;
> +
> + /* necessary to determine column to increment */ LogicalRepMsgType
> + m_command;
> +
> +} PgStat_MsgSubWorkerXactEnd;
> 
> Is m_subrelid used in this patch? If not, why did you keep it?
Absolutely, this was a leftover mistake from when I decided to merge the stats
of the table sync and apply workers.

> I think if you choose
> to show separate stats for table sync and apply worker then probably it will 
> be
> used.
Yeah, I'll fix this, but of course only after confirming that the idea of
merging the two types of worker stats is acceptable to you and others.

Best Regards,
Takamichi Osumi




Re: Question about 001_stream_rep.pl recovery test

2021-12-13 Thread Michael Paquier
On Fri, Dec 10, 2021 at 01:44:40PM -0800, David Zhang wrote:
> Inside the test script `src/test/recovery/t/001_stream_rep.pl`, a comment at
> line 30 says `my_backup` is "not mandatory",
> 
>  30 # Take backup of standby 1 (not mandatory, but useful to check if
>  31 # pg_basebackup works on a standby).
>  32 $node_standby_1->backup($backup_name);
> 
> however if remove the backup folder "my_backup" after line 32, something
> like below,

Well, the comment is not completely incorrect IMO.  In this context, I
read that as meaning we don't need a backup from a standby and could just
take a backup from the primary instead.  If I were to fix something, I
would suggest rewording the comment as follows:
# Take backup of standby 1 (could be taken from the primary, but this
# is useful to check if pg_basebackup works on a standby).

> And then the test script takes another backup `my_backup_2`, but it seems
> this backup is not used anywhere.

This had better stay around, per commit f267c1c.  And as far as I can
see, we don't have an equivalent test in a different place, where
pg_basebackup runs on a standby while its *primary* is stopped.
--
Michael


signature.asc
Description: PGP signature


Re: Assertion failure with replication origins and PREPARE TRANSACTION

2021-12-13 Thread Michael Paquier
On Mon, Dec 13, 2021 at 03:46:55PM +0530, Amit Kapila wrote:
> This is my understanding as well. I think Sawada-San's point here is
> why we need an additional check for replorigin_session_origin_lsn in the
> prepare code path. I think the way you have it in your patch is correct as
> well but it is probably better to keep the check only based on
> replorigin so as to keep this check consistent in all code paths.

Well, I don't think that it is a big deal one way or the other, as
we'd finish with InvalidXLogRecPtr for the LSN and 0 for the timestamp
anyway.  If both of you feel that just removing the assertion rather
than adding an extra check is better, that's fine by me :)
--
Michael


signature.asc
Description: PGP signature


Re: isolationtester: add session name to application name

2021-12-13 Thread Michael Paquier
On Fri, Dec 10, 2021 at 05:20:52PM -0800, Andres Freund wrote:
> These days isolationtester.c already prefixes log output with the session
> name. How about doing the same for application_name? It's a *tad* more
> complicated than I'd like because isolationtester.c currently doesn't know the
> name of the test its executing.

+1 for the idea.  Maybe it could be backpatched?  It could be really
useful to have the same amount of details across all the stable
branches, to ease any future backpatch of a test.  It does not seem to
me that many people would rely much on application_name in out-of-core
tests, but if any do, such tests would suddenly break after the next
minor upgrade.

> As attached this appends "control connection" for the control connection, but
> perhaps we should just not append anything for that?

Keeping "control connection" seems is fine for me for these.

> +  * easier to map spec file sesions to log output and

One s/sesions/sessions/ here.
--
Michael


signature.asc
Description: PGP signature


Re: Assertion failure with replication origins and PREPARE TRANSACTION

2021-12-13 Thread Amit Kapila
On Mon, Dec 13, 2021 at 2:00 PM Michael Paquier  wrote:
>
> On Mon, Dec 13, 2021 at 04:30:36PM +0900, Masahiko Sawada wrote:
> > Why do we check if replorigin_session_origin_lsn is not invalid data
> > only when PREPARE TRANSACTION?
>
> Well, it does not matter for the case of PREPARE TRANSACTION, does it?
> We would include values for the origin LSN and timestamp in
> any case as these are fixed in the 2PC file header.
>
> > Looking at commit and rollback code, we
> > don't have assertions or checks that check
> > replorigin_session_origin_lsn/timestamp is valid data. So it looks
> > like we accept also invalid data in those cases since
> > replorigin_advance doesn’t move LSN backward while applying a commit
> > or rollback record even if LSN is invalid. The same is true for
> > PREPARE records, i.g., even if replication origin LSN is invalid, it
> > doesn't go backward. If replication origin LSN and timestamp in commit
> > record and rollback record must be valid data too, I think we should
> > similar checks for commit and rollback code and I think the assertions
> > will fail in the case I reported before[1].
>
> It seems to me that the origin LSN and timestamp are optional, since
> one may choose not to call pg_replication_origin_xact_setup() (as said
> in my first message), and we would not require more sanity checks when
> advancing the replication origin in the commit and rollback code
> paths.
>

This is my understanding as well. I think Sawada-San's point here is
why we need an additional check for replorigin_session_origin_lsn in the
prepare code path. I think the way you have it in your patch is correct as
well but it is probably better to keep the check only based on
replorigin so as to keep this check consistent in all code paths.

-- 
With Regards,
Amit Kapila.




Re: Optionally automatically disable logical replication subscriptions on error

2021-12-13 Thread vignesh C
On Mon, Dec 6, 2021 at 4:22 PM osumi.takami...@fujitsu.com
 wrote:
>
> On Monday, December 6, 2021 1:16 PM Greg Nancarrow  
> wrote:
> > On Sat, Dec 4, 2021 at 12:20 AM osumi.takami...@fujitsu.com
> >  wrote:
> > >
> > > Hi, I've made a new patch v11 that incorporated suggestions described
> > above.
> > >
> >
> > Some review comments for the v11 patch:
> Thank you for your reviews !
>
> > doc/src/sgml/ref/create_subscription.sgml
> > (1) Possible wording improvement?
> >
> > BEFORE:
> > +  Specifies whether the subscription should be automatically disabled
> > + if replicating data from the publisher triggers errors. The default
> > + is false.
> > AFTER:
> > +  Specifies whether the subscription should be automatically disabled
> > + if any errors are detected by subscription workers during data
> > + replication from the publisher. The default is false.
> Fixed.
>
> > src/backend/replication/logical/worker.c
> > (2) WorkerErrorRecovery comments
> > Instead of:
> >
> > + * As a preparation for disabling the subscription, emit the error,
> > + * handle the transaction and clean up the memory context of
> > + * error. ErrorContext is reset by FlushErrorState.
> >
> > why not just say:
> >
> > + Worker error recovery processing, in preparation for disabling the
> > + subscription.
> >
> > And then comment the function's code lines:
> >
> > e.g.
> >
> > /* Emit the error */
> > ...
> > /* Abort any active transaction */
> > ...
> > /* Reset the ErrorContext */
> > ...
> Agreed. Fixed.
>
> > (3) DisableSubscriptionOnError return
> >
> > The "if (!subform->subdisableonerr)" block should probably first:
> >heap_freetuple(tup);
> >
> > (regardless of the fact the only current caller will proc_exit anyway)
> Fixed.
>
> > (4) did_error flag
> >
> > I think perhaps the previously-used flag name "disable_subscription"
> > is better, or maybe "error_recovery_done".
> > Also, I think it would look better if it was set AFTER
> > WorkerErrorRecovery() was called.
> Adopted error_recovery_done
> and changed its places accordingly.
>
> > (5) DisableSubscriptionOnError LOG message
> >
> > This version of the patch removes the LOG message:
> >
> > + ereport(LOG,
> > + errmsg("logical replication subscription \"%s\" will be disabled due
> > to error: %s",
> > +MySubscription->name, edata->message));
> >
> > Perhaps a similar error message could be logged prior to EmitErrorReport()?
> >
> > e.g.
> >  "logical replication subscription \"%s\" will be disabled due to an error"
> Added.
>
> I've attached the new version v12.

Thanks for the updated patch, few comments:
1) This is not required as it is not used in the caller.
+++ b/src/backend/replication/logical/launcher.c
@@ -132,6 +132,7 @@ get_subscription_list(void)
sub->dbid = subform->subdbid;
sub->owner = subform->subowner;
sub->enabled = subform->subenabled;
+   sub->disableonerr = subform->subdisableonerr;
sub->name = pstrdup(NameStr(subform->subname));
/* We don't fill fields we are not interested in. */

2) Should this be changed:
+   subdisableonerr bool
+  
+  
+   If true, the subscription will be disabled when subscription
+   worker detects any errors
+  
+ 
To:
+   subdisableonerr bool
+  
+  
+   If true, the subscription will be disabled when subscription's
+   worker detects any errors
+  
+ 

3) The last line can be slightly adjusted to keep within 80 chars:
+  Specifies whether the subscription should be automatically disabled
+  if any errors are detected by subscription workers during data
+  replication from the publisher. The default is
false.

4) Similarly this too can be handled:
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -1259,7 +1259,7 @@ REVOKE ALL ON pg_replication_origin_status FROM public;
 -- All columns of pg_subscription except subconninfo are publicly readable.
 REVOKE ALL ON pg_subscription FROM public;
 GRANT SELECT (oid, subdbid, subname, subowner, subenabled, subbinary,
-  substream, subtwophasestate, subslotname,
subsynccommit, subpublications)
+  substream, subtwophasestate, subdisableonerr,
subslotname, subsynccommit, subpublications)
 ON pg_subscription TO public;

5) Since the code for disabling the subscription is common to both the if
and else branches, can we move it below them:
+   if (MySubscription->disableonerr)
+   {
+   WorkerErrorRecovery();
+   error_recovery_done = true;
+   }
+   else
+   {
+   /*
+* Some work in error recovery work is
done. Switch to the old
+* memory context and rethrow.
+*/
+   MemoryCon

Re: Remove pg_strtouint64(), use strtoull() directly

2021-12-13 Thread Peter Eisentraut

On 10.12.21 16:25, Tom Lane wrote:

Our experience with the variable size of "long" has left a sufficiently
bad taste in my mouth that I'm not enthused about adding hard-wired
assumptions that "long long" is identical to int64.  So this seems like
it's going in the wrong direction, and giving up portability that we
might want back someday.


What kind of scenario do you have in mind?  Someone making their long 
long int 128 bits?






Re: Failed transaction statistics to measure the logical replication progress

2021-12-13 Thread Amit Kapila
On Tue, Dec 7, 2021 at 3:12 PM osumi.takami...@fujitsu.com
 wrote:
>

Few questions and comments:

1.
The pg_stat_subscription_workers view will contain
one row per subscription worker on which errors have occurred, for workers
applying logical replication changes and workers handling the initial data
-   copy of the subscribed tables.  The statistics entry is removed when the
-   corresponding subscription is dropped.
+   copy of the subscribed tables. Also, the row corresponding to the apply
+   worker shows all transaction statistics of both types of workers on the
+   subscription. The statistics entry is removed when the corresponding
+   subscription is dropped.

Why did you choose to show stats for both types of workers in one row?

2.
+ PGSTAT_MTYPE_SUBWORKERXACTEND,
 } StatMsgType;

I don't think we need a comma after the last message type.

3.
+ Oid m_subrelid;
+
+ /* necessary to determine column to increment */
+ LogicalRepMsgType m_command;
+
+} PgStat_MsgSubWorkerXactEnd;

Is m_subrelid used in this patch? If not, why did you keep it? I think
if you choose to show separate stats for table sync and apply worker
then probably it will be used.

4.
  /*
+ * Cumulative transaction statistics of subscription worker
+ */
+ PgStat_Counter commit_count;
+ PgStat_Counter error_count;
+ PgStat_Counter abort_count;
+

I think it is better to keep the order of columns as commit_count,
abort_count, error_count in the entire patch.

-- 
With Regards,
Amit Kapila.




WIN32 pg_import_system_collations

2021-12-13 Thread Juan José Santamaría Flecha
I want to propose an implementation of pg_import_system_collations() for
WIN32 using EnumSystemLocalesEx() [1], which is available from Windows
Server 2008 onwards.
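
For reference, the SQL-level call itself is unchanged by the patch; it only
teaches the function to enumerate the operating system's locales on Windows:

SELECT pg_import_system_collations('pg_catalog');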

The patch includes a test emulating that of collate.linux.utf8, but for
Windows-1252. The main difference is that it doesn't have the tests for
Turkish dotted and undotted 'i', since that locale is WIN1254.

I am opening an item in the commitfest for this.

[1]
https://docs.microsoft.com/en-us/windows/win32/api/winnls/nf-winnls-enumsystemlocalesex

Regards,

Juan José Santamaría Flecha
From ddf3114e82dc32edb59278d9ed2cb995222a5adf Mon Sep 17 00:00:00 2001
From: Juan Jose Santamaria Flecha 
Date: Fri, 10 Dec 2021 17:44:15 -0500
Subject: [PATCH] WIN32 pg_import_system_collations . Adds the
 pg_import_system_collations() functionality for Windows newer than 2008 using
 EnumSystemLocalesEx(). Also adding a test emulating collate.linux.utf8 but
 for Windows-1252

---
 src/backend/commands/collationcmds.c   |  197 +++-
 .../regress/expected/collate.windows.win1252.out   | 1019 
 .../regress/expected/collate.windows.win1252_1.out |   12 +
 src/test/regress/parallel_schedule |2 +-
 src/test/regress/sql/collate.windows.win1252.sql   |  418 
 src/tools/msvc/vcregress.pl|3 +-
 6 files changed, 1599 insertions(+), 52 deletions(-)
 create mode 100644 src/test/regress/expected/collate.windows.win1252.out
 create mode 100644 src/test/regress/expected/collate.windows.win1252_1.out
 create mode 100644 src/test/regress/sql/collate.windows.win1252.sql

diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 53fc579..c286113 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -522,6 +522,128 @@ get_icu_locale_comment(const char *localename)
 }
 #endif			/* USE_ICU */
 
+/* will we use EnumSystemLocalesEx in pg_import_system_collations? */
+#if defined(WIN32) && _WIN32_WINNT >= 0x0600
+#define ENUM_SYSTEM_LOCALE
+#endif
+
+#if defined(READ_LOCALE_A_OUTPUT) || defined(ENUM_SYSTEM_LOCALE)
+/*
+ * Create a collation if the input locale is valid for so.
+ * Also keeps track of the number of valid locales and collations created.
+ */
+static int
+CollationFromLocale(char *localebuf, int *nvalid, int *ncreated, int nspid)
+{
+	int			enc;
+	Oid			collid;
+
+	/*
+	 * Some systems have locale names that don't consist entirely of
+	 * ASCII letters (such as "bokmål" or "français").
+	 * This is pretty silly, since we need the locale itself to
+	 * interpret the non-ASCII characters. We can't do much with
+	 * those, so we filter them out.
+	 */
+	if (!pg_is_ascii(localebuf))
+	{
+		elog(DEBUG1, "skipping locale with non-ASCII name: \"%s\"", localebuf);
+		return -1;
+	}
+
+	enc = pg_get_encoding_from_locale(localebuf, false);
+	if (enc < 0)
+	{
+		elog(DEBUG1, "skipping locale with unrecognized encoding: \"%s\"",
+			 localebuf);
+		return -1;
+	}
+	if (!PG_VALID_BE_ENCODING(enc))
+	{
+		elog(DEBUG1, "skipping locale with client-only encoding: \"%s\"", localebuf);
+		return -1;
+	}
+	if (enc == PG_SQL_ASCII)
+		return -1;		/* C/POSIX are already in the catalog */
+
+	/* count valid locales found in operating system */
+	*nvalid += 1;
+
+	/*
+	 * Create a collation named the same as the locale, but quietly
+	 * doing nothing if it already exists.  This is the behavior we
+	 * need even at initdb time, because some versions of "locale -a"
+	 * can report the same locale name more than once.  And it's
+	 * convenient for later import runs, too, since you just about
+	 * always want to add on new locales without a lot of chatter
+	 * about existing ones.
+	 */
+	collid = CollationCreate(localebuf, nspid, GetUserId(),
+			 COLLPROVIDER_LIBC, true, enc,
+			 localebuf, localebuf,
+			 get_collation_actual_version(COLLPROVIDER_LIBC, localebuf),
+			 true, true);
+	if (OidIsValid(collid))
+	{
+		*ncreated += 1;
+
+		/* Must do CCI between inserts to handle duplicates correctly */
+		CommandCounterIncrement();
+	}
+
+	return enc;
+}
+#endif			/* defined(READ_LOCALE_A_OUTPUT) || defined(ENUM_SYSTEM_LOCALE) */
+
+#ifdef ENUM_SYSTEM_LOCALE
+/* Parameter to be passed to the callback function win32_read_locale() */
+typedef struct
+{
+	int		   *ncreated;
+	int		   *nvalid;
+	Oid			nspid;
+} CollParam;
+
+/*
+ * Callback function for EnumSystemLocalesEx() in pg_import_system_collations()
+ * Creates a collation for every valid locale.
+ */
+BOOL CALLBACK
+win32_read_locale(LPWSTR pStr, DWORD dwFlags, LPARAM lparam)
+{
+	CollParam  *param = (CollParam *) lparam;
+	char		localebuf[NAMEDATALEN];
+	int			result;
+	char	   *hyphen;
+
+	(void) dwFlags;
+
+	result = WideCharToMultiByte(CP_ACP, 0, pStr, -1, localebuf, NAMEDATALEN,
+ NULL, NULL);
+
+	if (result == 0)
+	{
+		if (GetLastError() == ERROR_INSUFFICIENT_BUFFER)
+			elog(DEBUG1, "skipping locale with too-long name: \"%s\"", localebuf);

Re: Assertion failure with replication origins and PREPARE TRANSACTION

2021-12-13 Thread Michael Paquier
On Mon, Dec 13, 2021 at 04:30:36PM +0900, Masahiko Sawada wrote:
> Why do we check if replorigin_session_origin_lsn is not invalid data
> only when PREPARE TRANSACTION?

Well, it does not matter for the case of PREPARE TRANSACTION, does it?
We would include values for the origin LSN and timestamp in
any case as these are fixed in the 2PC file header.

> Looking at commit and rollback code, we
> don't have assertions or checks that check
> replorigin_session_origin_lsn/timestamp is valid data. So it looks
> like we accept also invalid data in those cases since
> replorigin_advance doesn’t move LSN backward while applying a commit
> or rollback record even if LSN is invalid. The same is true for
> PREPARE records, i.g., even if replication origin LSN is invalid, it
> doesn't go backward. If replication origin LSN and timestamp in commit
> record and rollback record must be valid data too, I think we should
> similar checks for commit and rollback code and I think the assertions
> will fail in the case I reported before[1].

It seems to me that the origin LSN and timestamp are optional, since
one may choose not to call pg_replication_origin_xact_setup() (as said
in my first message), and we would not require more sanity checks when
advancing the replication origin in the commit and rollback code
paths.  Let's see what others think here.
--
Michael


signature.asc
Description: PGP signature