Re: Add new option 'all' to pg_stat_reset_shared()

2023-11-15 Thread Michael Paquier
On Wed, Nov 15, 2023 at 04:25:14PM +0900, Michael Paquier wrote:
> Other than that, it looks OK.

Tweaked the queries of this one slightly, and applied.  So I think
that we are now good for this thread.  Thanks, all!
--
Michael




Re: Infinite Interval

2023-11-15 Thread Ashutosh Bapat
On Tue, Nov 14, 2023 at 4:39 PM Dean Rasheed  wrote:
>
> On Thu, 9 Nov 2023 at 12:49, Dean Rasheed  wrote:
> >
> > OK, I have pushed 0001 and 0002. Here's the remaining (main) patch.
> >
>
> OK, I have now pushed the main patch.

Thanks a lot Dean.


-- 
Best Wishes,
Ashutosh Bapat




Re: retire MemoryContextResetAndDeleteChildren backwards compatibility macro

2023-11-15 Thread Amul Sul
On Wed, Nov 15, 2023 at 9:26 PM Nathan Bossart wrote:

> On Wed, Nov 15, 2023 at 09:27:18AM +0530, Amul Sul wrote:
> > Nevermind, I usually use git apply or git am, here are those errors:
> >
> > PG/ - (master) $ git apply ~/Downloads/retire_compatibility_macro_v1.patch
> > error: patch failed: src/backend/access/brin/brin.c:297
> > error: src/backend/access/brin/brin.c: patch does not apply
>
> I wonder if your mail client is modifying the patch.  Do you have the same
> issue if you download it from the archives?
>

Yes, you are correct. Surprisingly, the archive version applied cleanly.

Gmail is doing something. I usually use the web login in the Chrome browser,
and I never faced such issues with others' patches.  Anyway, I will try both
versions next time for the same kind of issue; sorry for the noise.

Regards,
Amul


Re: Adding facility for injection points (or probe points?) for more advanced tests

2023-11-15 Thread Michael Paquier
On Wed, Nov 15, 2023 at 12:21:40PM +0100, Alvaro Herrera wrote:
> On 2023-Nov-15, Michael Paquier wrote:
> Oh, I think you're overthinking what my comment was.  I was saying, just
> name it "InjectionPointsHash".  Since there appears to be no room for
> another hash table for injection points, then there's no need to specify
> that this one is the ByName hash.  I couldn't think of any other way to
> organize the injection points either.

Aha, OK.  No problem, this was itching me as well but I didn't see an
argument for changing these names, so I've renamed things a bit more.

>> Yes, still not something that's required in the core APIs or an
>> initial batch.
> 
> I agree that we can do the easy thing first and build it up later.  I
> just hope we don't get too wedded on the current interface because of
> lack of time in the current release that we get stuck with it.

One thing that I assume we will need with more advanced testing is the
possibility to pass down one (for a structure of data) or more
arguments to the callback a point is attached to.  For that, it is
possible to add more macros, like INJECTION_POINT_1ARG,
INJECTION_POINT_ARG(), etc. that use some (void *) pointers.  It would
be possible to add that even in stable branches, as need arises,
changing the structure of the shmem hash table if required to control
the way a callback is run.
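
For instance, a one-argument variant could look like the following (a sketch
only; INJECTION_POINT_ARG() and InjectionPointRunArg() are hypothetical
names, not part of the attached patches):

/*
 * Hypothetical sketch: a variant of INJECTION_POINT() that forwards a
 * single (void *) argument to whatever callback is attached to the point.
 */
typedef void (*InjectionPointCallbackArg) (const char *name, void *arg);

#ifdef USE_INJECTION_POINTS
#define INJECTION_POINT_ARG(name, arg) \
	InjectionPointRunArg(name, (void *) (arg))
#else
#define INJECTION_POINT_ARG(name, arg) ((void) 0)
#endif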

At the end, I suspect that it is one of these things where we'll need
to adapt depending on what people want to do with this stuff.  FWIW, I
can do already quite a bit with this minimalistic design and an
external extension.  Custom wait events are also very handy to monitor
a wait.

>> I am not sure that it is a good idea to enforce a specific conditional
>> logic in the backend core code.
> 
> Agreed, let's get more experience on what other types of tests people
> want to build, and how are things going to interact with each other.

I've worked more on polishing the core facility today, with 0001
introducing the backend-side facility.  One thing that I mentioned as
lacking is a local cache for processes, so that they don't load an
external callback more than once when run.  So I have added this local
cache.  When a point is scanned but not found, a previous cache entry
is cleared if any (this leaks but we don't have a way to unload stuff,
and for testing that's not a big deal).  I've renamed the routines to
use attach and detach as terms, and adopted the name you've suggested
for the macro.  The names around the hash table and its entries have
been changed to what you've suggested.  You are right, that's more
intuitive.

0002 is the test module for the basics, split into its own patch, with
a couple of tests for the local cache part.  0003 and 0004 have been
adjusted with the new SQL functions.  At the end, I'd like to propose
0004 as it's been a PITA for me and I don't want to break this case
again.  It needs more work and could be made cheaper.  One more argument in
favor of it is the addition of condition variables to wait and wake
points (perhaps with even more variables?) in the test module.

If there is interest for 0003, I'm OK to work more on it as well, but
that's less important IMV.

Thoughts and comments are welcome, with a v4 series attached.
--
Michael
From 8febfda427ae773dd3c4c66ca9c932b3fe4cbc10 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 16 Nov 2023 14:41:41 +0900
Subject: [PATCH v4 1/4] Add backend facility for injection points

This adds a set of routines allowing developers to attach, detach and
run custom code based on arbitrary code paths set with a centralized
macro called INJECTION_POINT().  Injection points are registered in a
shared hash table.  Processes also use a local cache to avoid loading
callbacks more than necessary, cleaning up their cache if a callback is
found to have been removed.
---
 src/include/pg_config.h.in|   3 +
 src/include/utils/injection_point.h   |  36 ++
 src/backend/storage/ipc/ipci.c|   3 +
 src/backend/storage/lmgr/lwlocknames.txt  |   1 +
 .../utils/activity/wait_event_names.txt   |   1 +
 src/backend/utils/misc/Makefile   |   1 +
 src/backend/utils/misc/injection_point.c  | 349 ++
 src/backend/utils/misc/meson.build|   1 +
 doc/src/sgml/installation.sgml|  30 ++
 doc/src/sgml/xfunc.sgml   |  56 +++
 configure |  34 ++
 configure.ac  |   7 +
 meson.build   |   1 +
 meson_options.txt |   3 +
 src/Makefile.global.in|   1 +
 src/tools/pgindent/typedefs.list  |   2 +
 16 files changed, 529 insertions(+)
 create mode 100644 src/include/utils/injection_point.h
 create mode 100644 src/backend/utils/misc/injection_point.c

diff --git a/src/include/pg_config.h.in b/src/include/pg_config.h.in
index d8a2985567..7a976821e5 100644
--- 

Re: Synchronizing slots from primary to standby

2023-11-15 Thread shveta malik
On Tue, Nov 14, 2023 at 7:56 PM Drouvot, Bertrand
 wrote:
>
> Hi,
>
> On 11/13/23 2:57 PM, Zhijie Hou (Fujitsu) wrote:
> > On Friday, November 10, 2023 4:16 PM Drouvot, Bertrand 
> >  wrote:
> >> Yeah, good point; agree to just error out in all the cases then (if we
> >> discard the
> >> sync_ reserved wording proposal, which seems to be the case as probably not
> >> worth the extra work).
> >
> > Thanks for the discussion!
> >
> > Here is the V33 patch set which includes the following changes:
>
> Thanks for working on it!
>
> >
> > 1) Drop slots with state 'i' in promotion flow after we shut down 
> > WalReceiver.
>
> @@ -3557,10 +3558,15 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool 
> randAccess,
>   * this only after failure, so when you promote, we still
>   * finish replaying as much as we can from archive and
>   * pg_wal before failover.
> +*
> +* Drop the slots for which sync is initiated but not yet
> +* completed i.e. they are still waiting for the primary
> +* server to catch up.
>   */
>  if (StandbyMode && CheckForStandbyTrigger())
>  {
>  XLogShutdownWalRcv();
> +   slotsync_drop_initiated_slots();
>  return XLREAD_FAIL;
>  }
>
> I had a closer look and it seems this is not located at the right place.
>
> Indeed, it's added here:
>
> switch (currentSource)
> {
> case XLOG_FROM_ARCHIVE:
> case XLOG_FROM_PG_WAL:
>
> While in our case we are in
>
> case XLOG_FROM_STREAM:
>
> So I think we should move slotsync_drop_initiated_slots() in the
> XLOG_FROM_STREAM case. Maybe before shutting down the sync slot worker?
> (the TODO item number 2 you mentioned up-thread)
>
> BTW in order to prevent any corner case, wouldn't it also be better to
>
> replace:
>
> +   /*
> +* Do not allow consumption of a "synchronized" slot until the standby
> +* gets promoted.
> +*/
> +   if (RecoveryInProgress() && (slot->data.sync_state != 
> SYNCSLOT_STATE_NONE))
>
> with something like:
>
> if ((RecoveryInProgress() && (slot->data.sync_state != SYNCSLOT_STATE_NONE)) 
> || slot->data.sync_state == SYNCSLOT_STATE_INITIATED)
>
> to ensure slots in 'i' case can never be used?
>

Yes, it makes sense. WIll do it.

> Regards,
>
> --
> Bertrand Drouvot
> PostgreSQL Contributors Team
> RDS Open Source Databases
> Amazon Web Services: https://aws.amazon.com




RE: Random pg_upgrade test failure on drongo

2023-11-15 Thread Hayato Kuroda (Fujitsu)
Dear hackers,
 
> While tracking a buildfarm, I found that drongo failed the test
> pg_upgrade/003_logical_slots [1].
> A strange point is that the test passed in the next iteration. Currently I'm
> not sure of the reason, but I will keep an eye out for it and will
> investigate if it happens again.
 
This email is just an update. We found that fairywren also failed due to
the same reason [2]. It fails inconsistently, but there might be something
bad on Windows. I'm now trying to reproduce it with my colleagues to analyze
it in more detail. I am also working with Andrew to get the logs emitted
during the upgrade. I will continue to keep an eye on it.
 
[2]: 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2023-11-08%2010%3A22%3A45

Best Regards,
Hayato Kuroda
FUJITSU LIMITED





Re: remaining sql/json patches

2023-11-15 Thread Amit Langote
Hi Erik,

On Thu, Nov 16, 2023 at 13:52 Erik Rijkers  wrote:

> Op 11/15/23 om 14:00 schreef Amit Langote:
> > Hi,
>
> [..]
>
> > Attached updated patch.  The version of 0001 that I posted on Oct 11
> > to add the error-safe version of CoerceViaIO contained many
> > unnecessary bits that are now removed.
> >
> > --
> > Thanks, Amit Langote
> > EDB: http://www.enterprisedb.com
>
>  > [v24-0001-Add-soft-error-handling-to-some-expression-nodes.patch]
>  > [v24-0002-Add-soft-error-handling-to-populate_record_field.patch]
>  > [v24-0003-SQL-JSON-query-functions.patch]
>  > [v24-0004-JSON_TABLE.patch]
>  > [v24-0005-Claim-SQL-standard-compliance-for-SQL-JSON-featu.patch]
>
> Hi Amit,
>
> Here is a statement that seems to gobble up all memory and to totally
> lock up the machine. No ctrl-C - only a power reset gets me out of that.
> It was in one of my tests, so it used to work:
>
> select json_query(
>  jsonb '"[3,4]"'
>, '$[*]' returning bigint[] empty object on error
> );
>
> Can you have a look?


Wow, will look. Thanks.



Re: remaining sql/json patches

2023-11-15 Thread Erik Rijkers

Op 11/15/23 om 14:00 schreef Amit Langote:

Hi,


[..]


Attached updated patch.  The version of 0001 that I posted on Oct 11
to add the error-safe version of CoerceViaIO contained many
unnecessary bits that are now removed.

--
Thanks, Amit Langote
EDB: http://www.enterprisedb.com


> [v24-0001-Add-soft-error-handling-to-some-expression-nodes.patch]
> [v24-0002-Add-soft-error-handling-to-populate_record_field.patch]
> [v24-0003-SQL-JSON-query-functions.patch]
> [v24-0004-JSON_TABLE.patch]
> [v24-0005-Claim-SQL-standard-compliance-for-SQL-JSON-featu.patch]

Hi Amit,

Here is a statement that seems to gobble up all memory and to totally 
lock up the machine. No ctrl-C - only a power reset gets me out of that. 
It was in one of my tests, so it used to work:


select json_query(
jsonb '"[3,4]"'
  , '$[*]' returning bigint[] empty object on error
);

Can you have a look?

Thanks,

Erik








Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

2023-11-15 Thread Peter Smith
On Thu, Nov 16, 2023 at 12:18 PM Amit Kapila  wrote:
>
> On Thu, Nov 16, 2023 at 3:48 AM Peter Smith  wrote:
> >
> > ~
> >
> > SUGGESTION (#1a and #1b)
> >
> > ereport(log_replication_commands ? LOG : DEBUG1,
> > errmsg(SlotIsLogical(s)
> >? "acquired logical replication slot \"%s\""
> >: "acquired physical replication slot \"%s\"",
> >NameStr(s->data.name)));
> >
> > ~~~
> >
>
> Personally, I prefer the way Bharath had in his patch. Did you see any
> preferred way in the existing code?

Not really. I think the errmsg combined with ternary is not so common.
I couldn't find many examples, so I wouldn't try to claim anything is
a "preferred" way

There are some existing examples, like Bharath had:

ereport(NOTICE,
        (errcode(ERRCODE_DUPLICATE_OBJECT),
         collencoding == -1
         ? errmsg("collation \"%s\" already exists, skipping",
                  collname)
         : errmsg("collation \"%s\" for encoding \"%s\" already exists, skipping",
                  collname, pg_encoding_to_char(collencoding))));

OTOH, when there are different numbers of substitution parameters in
each of the errmsg like that, you don't have much choice but to do it
that way.

I am fine with whatever is chosen -- I was only offering an alternative.

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

2023-11-15 Thread Peter Smith
On Thu, Nov 16, 2023 at 12:36 PM Amit Kapila  wrote:
>
> On Wed, Nov 15, 2023 at 3:58 PM Alvaro Herrera  
> wrote:
> >
> > Translation-wise, this doesn't work, because you're building a string.
> > There's no reason to think that the words "logical" and "physical"
> > should stay untranslated; the message would make no sense, or at least
> > would be very ugly.
> >
> > You should do something like
> >
> > if (am_walsender)
> > {
> > ereport(log_replication_commands ? LOG : DEBUG1,
> > SlotIsLogical(s) ? errmsg("acquired logical replication 
> > slot \"%s\"", NameStr(s->data.name)) :
> > errmsg("acquired physical replication slot \"%s\"", 
> > NameStr(s->data.name)));
> > }
> >
> > (Obviously, lose the "translator:" comments since they are unnecessary)
> >
> >
> > If you really want to keep the "logical"/"physical" word untranslated,
> > you need to split it out of the sentence somehow.  But it would be
> > really horrible IMO.  Like
> >
> > errmsg("acquired replication slot \"%s\" of type \"%s\"",
> >NameStr(s->data.name), SlotIsLogical(s) ? "logical" : "physical")
> >
>
> Thanks for the suggestion. I would like to clarify this a bit. What do
> you mean exactly by splitting it out of the sentence? For example, in one
> of the existing messages:
>
> ereport(LOG,
> /* translator: %s is SIGKILL or SIGABRT */
> (errmsg("issuing %s to recalcitrant children",
> send_abort_for_kill ? "SIGABRT" : "SIGKILL")));
>
> Do here words SIGABRT/SIGKILL remain untranslated due to the
> translator's comment? I thought this was similar to the message being
> proposed but seems like this message construction follows translation
> rules better.
>

IIUC, that example is different because "SIGABRT" / "SIGKILL" are not
real words, so you don't want the translator to attempt to translate
them.  You want them to appear in the message as-is.

OTOH in this patch "logical" and "physical" are just normal English
words that should be translated as part of the original message.
e.g. like in these similar messages:
- msgid "database \"%s\" is used by an active logical replication slot"
- msgstr "la base de données « %s » est utilisée par un slot de
réplication logique actif"

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: pg_upgrade and logical replication

2023-11-15 Thread Peter Smith
Here are some review comments for patch v14-0001

==
src/backend/utils/adt/pg_upgrade_support.c

1. binary_upgrade_replorigin_advance

+ /* lock to prevent the replication origin from vanishing */
+ LockRelationOid(ReplicationOriginRelationId, RowExclusiveLock);
+ originid = replorigin_by_name(originname, false);

Use uppercase for the lock comment.

==
src/bin/pg_upgrade/check.c

2. check_for_subscription_state

> > + prep_status("Checking for subscription state");
> > +
> > + snprintf(output_path, sizeof(output_path), "%s/%s",
> > + log_opts.basedir,
> > + "subscription_state.txt");
> >
> > I felt this filename ought to be more like
> > 'subscriptions_with_bad_state.txt' because the current name looks like
> > a normal logfile with nothing to indicate that it is only for the
> > states of the "bad" subscriptions.
>
> I have kept the file name intentionally short as we noticed that
> when the upgrade of the publisher patch used a longer name there were
> some buildfarm failures because of longer names.

OK, but how about some other short meaningful name like 'subs_invalid.txt'?

I also thought "state" in the original name was misleading because
this file contains not only subscriptions with bad 'state' but also
subscriptions with missing 'origin'.

~~~

3. check_new_cluster_logical_replication_slots

  int nslots_on_old;
  int nslots_on_new;
+ int nsubs_on_old = old_cluster.subscription_count;

I felt it might be better to make both these quantities 'unsigned' to
make it more obvious that there are no special meanings for negative
numbers.

~~~

4. check_new_cluster_logical_replication_slots

nslots_on_old = count_old_cluster_logical_slots();

~

IMO the 'nsubs_on_old' should be coded the same as above. AFAICT, this
is the only code where you are interested in the number of
subscribers, and furthermore, it seems you only care about that count
in the *old* cluster. This means the current implementation of
get_subscription_count() seems more generic than it needs to be and
that results in more unnecessary patch code. (I will repeat this same
review comment in the other relevant places).

SUGGESTION
nslots_on_old = count_old_cluster_logical_slots();
nsubs_on_old = count_old_cluster_subscriptions();

~~~

5.
+ /*
+ * Quick return if there are no logical slots and subscriptions to be
+ * migrated.
+ */
+ if (nslots_on_old == 0 && nsubs_on_old == 0)
  return;

/and subscriptions/and no subscriptions/

~~~

6.
- if (nslots_on_old > max_replication_slots)
+ if (nslots_on_old && nslots_on_old > max_replication_slots)
  pg_fatal("max_replication_slots (%d) must be greater than or equal
to the number of "
  "logical replication slots (%d) on the old cluster",
  max_replication_slots, nslots_on_old);

Neither nslots_on_old nor max_replication_slots can be < 0, so I don't
see why the additional check is needed here.
AFAICT "if (nslots_on_old > max_replication_slots)" acheives the same
thing that you want.

~~~

7.
+ if (nsubs_on_old && nsubs_on_old > max_replication_slots)
+ pg_fatal("max_replication_slots (%d) must be greater than or equal
to the number of "
+ "subscriptions (%d) on the old cluster",
+ max_replication_slots, nsubs_on_old);

Neither nsubs_on_old nor max_replication_slots can be < 0, so I don't
see why the additional check is needed here.
AFAICT "if (nsubs_on_old > max_replication_slots)" achieves the same
thing that you want.

==
src/bin/pg_upgrade/info.c

8. get_db_rel_and_slot_infos

+ if (cluster == &old_cluster)
+ get_subscription_count(cluster);
+

I felt this is unnecessary because you only want to know the
nsubs_on_old in one place and then only for the old cluster, so
calling this to set a generic attribute for the cluster is overkill.

~~~

9.
+/*
+ * Get the number of subscriptions in the old cluster.
+ */
+static void
+get_subscription_count(ClusterInfo *cluster)
+{
+ PGconn*conn;
+ PGresult   *res;
+
+ if (GET_MAJOR_VERSION(cluster->major_version) < 1700)
+ return;
+
+ conn = connectToServer(cluster, "template1");
+ res = executeQueryOrDie(conn,
+   "SELECT oid FROM pg_catalog.pg_subscription");
+
+ cluster->subscription_count = PQntuples(res);
+
+ PQclear(res);
+ PQfinish(conn);
+}

9a.
Currently, this is needed only for the old_cluster (like the function
comment implies), so the parameter is not required.

Also, AFAICT this number is only needed in one place
(check_new_cluster_logical_replication_slots) so IMO it would be
better to make lots of changes to simplify this code:
- change the function name to be like the other one. e.g.
count_old_cluster_subscriptions()
- function to return unsigned

SUGGESTION (something like this...)

unsigned
count_old_cluster_subscriptions(void)
{
  unsigned nsubs = 0;

  if (GET_MAJOR_VERSION(old_cluster.major_version) >= 1700)
  {
    PGconn *conn = connectToServer(&old_cluster, "template1");
    PGresult *res = executeQueryOrDie(conn,
      "SELECT oid FROM pg_catalog.pg_subscription");

    nsubs = PQntuples(res);

    PQclear(res);
    PQfinish(conn);
  }

  return nsubs;
}

Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

2023-11-15 Thread Amit Kapila
On Wed, Nov 15, 2023 at 3:58 PM Alvaro Herrera  wrote:
>
> Translation-wise, this doesn't work, because you're building a string.
> There's no reason to think that the words "logical" and "physical"
> should stay untranslated; the message would make no sense, or at least
> would be very ugly.
>
> You should do something like
>
> if (am_walsender)
> {
> ereport(log_replication_commands ? LOG : DEBUG1,
> SlotIsLogical(s) ? errmsg("acquired logical replication slot 
> \"%s\"", NameStr(s->data.name)) :
> errmsg("acquired physical replication slot \"%s\"", 
> NameStr(s->data.name)));
> }
>
> (Obviously, lose the "translator:" comments since they are unnecessary)
>
>
> If you really want to keep the "logical"/"physical" word untranslated,
> you need to split it out of the sentence somehow.  But it would be
> really horrible IMO.  Like
>
> errmsg("acquired replication slot \"%s\" of type \"%s\"",
>NameStr(s->data.name), SlotIsLogical(s) ? "logical" : "physical")
>

Thanks for the suggestion. I would like to clarify this a bit. What do
you mean exactly by splitting it out of the sentence? For example, in one
of the existing messages:

ereport(LOG,
/* translator: %s is SIGKILL or SIGABRT */
(errmsg("issuing %s to recalcitrant children",
send_abort_for_kill ? "SIGABRT" : "SIGKILL")));

Do here words SIGABRT/SIGKILL remain untranslated due to the
translator's comment? I thought this was similar to the message being
proposed but seems like this message construction follows translation
rules better.

-- 
With Regards,
Amit Kapila.




Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

2023-11-15 Thread Amit Kapila
On Thu, Nov 16, 2023 at 3:48 AM Peter Smith  wrote:
>
> ~
>
> SUGGESTION (#1a and #1b)
>
> ereport(log_replication_commands ? LOG : DEBUG1,
> errmsg(SlotIsLogical(s)
>? "acquired logical replication slot \"%s\""
>: "acquired physical replication slot \"%s\"",
>NameStr(s->data.name)));
>
> ~~~
>

Personally, I prefer the way Bharath had in his patch. Did you see any
preferred way in the existing code?

-- 
With Regards,
Amit Kapila.




Re: Remove MSVC scripts from the tree

2023-11-15 Thread Michael Paquier
On Wed, Nov 15, 2023 at 11:27:13AM +0100, Peter Eisentraut wrote:
> (Note, however, that your rebase didn't pick up commits e7814b40d0 and
> b41b1a7f49 that I did yesterday.  Please check that again.)

Indeed.  I need to absorb that properly.
--
Michael




Re: Tab completion for CREATE TABLE ... AS

2023-11-15 Thread Michael Paquier
On Wed, Nov 15, 2023 at 05:26:58PM +0300, Gilles Darold wrote:
> Right, I don't know how I have missed the sql-createtableas page in the
> documentation.
> 
> Patched v2 fixes the keyword list, I have also sorted by alphabetical order
> the CREATE TABLE completion (AS was at the end of the list).
> 
> It has also been re-based on current master.

Fun.  It has failed to apply here.

Anyway, I can see that a comment update has been forgotten.  A second
thing is that it requires two more lines to add the query keywords for
the case where a CTAS has a list of column names.  I've added both
changes, and applied the patch on HEAD.  That's not all the patterns
possible, but this covers the most useful ones.
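
For context, completion rules of this kind look roughly like the following
in tab-complete.c (a sketch only, not the exact committed text; the real
patterns may differ):

/* Sketch: offer query keywords after CREATE TABLE ... AS, including the
 * variant with a parenthesized column list. */
else if (TailMatches("CREATE", "TABLE", MatchAny, "AS") ||
         TailMatches("CREATE", "TABLE", MatchAny, "(*)", "AS"))
    COMPLETE_WITH("SELECT", "TABLE", "VALUES", "WITH", "EXECUTE");
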
--
Michael




Re: [HACKERS] Allow INSTEAD OF DELETE triggers to modify the tuple for RETURNING

2023-11-15 Thread jian he
On Fri, Nov 3, 2023 at 12:34 AM Marko Tiikkaja  wrote:
>
> I am now.  Thanks! :-)  Will try to keep an eye on the builds in the future.
>
> Attached v4 of the patch which should fix the issue.
>

The docs still seem to have an issue:
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest%2F45%2F4617

In the regress test, do we need to clean up the created objects after we
use them?  The tests passed; looking at ExecIRInsertTriggers, your changes
look sane.




Re: POC PATCH: copy from ... exceptions to: (was Re: VLDB Features)

2023-11-15 Thread jian he
On Thu, Nov 9, 2023 at 4:12 AM Tom Lane  wrote:
>
> Daniel Gustafsson  writes:
> >> On 8 Nov 2023, at 19:18, Tom Lane  wrote:
> >> I think an actually usable feature of this sort would involve
> >> copying all the failed lines to some alternate output medium,
> >> perhaps a second table with a TEXT column to receive the original
> >> data line.  (Or maybe an array of text that could receive the
> >> broken-down field values?)  Maybe we could dump the message info,
> >> line number, field name etc into additional columns.
>
> > I agree that the errors should be easily visible to the user in some way.  
> > The
> > feature is for sure interesting, especially in data warehouse type jobs 
> > where
> > dirty data is often ingested.
>
> I agree it's interesting, but we need to get it right the first time.
>
> Here is a very straw-man-level sketch of what I think might work.
> The option to COPY FROM looks something like
>
> ERRORS TO other_table_name (item [, item [, ...]])
>
> where the "items" are keywords identifying the information item
> we will insert into each successive column of the target table.
> This design allows the user to decide which items are of use
> to them.  I envision items like
>
> LINENO   bigint  COPY line number, counting from 1
> LINE     text    raw text of line (after encoding conversion)
> FIELDS   text[]  separated, de-escaped string fields (the data
>                  that was or would be fed to input functions)
> FIELD    text    name of troublesome field, if field-specific
> MESSAGE  text    error message text
> DETAIL   text    error message detail, if any
> SQLSTATE text    error SQLSTATE code
>

Just

SAVE ERRORS

could automatically create a table to hold the errors (validating the
auto-generated table name's uniqueness and the CREATE privilege), and the
table would have the above related info.  If there are no errors, the table
gets dropped.




Re: Add minimal C example and SQL registration example for custom table access methods.

2023-11-15 Thread Roberto Mello
Suggestion:

In the C example you added you mention in the comment:

+  /* Methods from TableAmRoutine omitted from example, but all
+ non-optional ones must be provided here. */

Perhaps you could provide a "see ..." pointer so the reader finding your
example knows where to find these non-optional methods that must be provided?

Nitpicking a little: your patch appears to change more lines than it
actually does, because it adds line breaks earlier in the lines.  I would
generally avoid that unless there's a good reason to do so.

Re: lazy_scan_heap() should release lock on buffer before vacuuming FSM

2023-11-15 Thread Andres Freund
Hi,

On 2023-11-15 16:32:48 -0500, Robert Haas wrote:
> On Mon, Nov 13, 2023 at 8:26 PM Andres Freund  wrote:
> > I think this undersells the situation a bit. We right now do
> > FreeSpaceMapVacuumRange() for 8GB of data (VACUUM_FSM_EVERY_PAGES) in the 
> > main
> > fork, while holding an exclusive page level lock.
> 
> That sounds fairly horrific?

It's decidedly not great, indeed. I couldn't come up with a clear risk of
deadlock, but I wouldn't want to bet against there being a deadlock risk.

I think the rarity of it does ameliorate the performance issues to some
degree.

Thoughts on whether to backpatch? It'd probably be at least a bit painful;
there have been a lot of changes in the surrounding code in the last 5 years.

- Andres




Re: WaitEventSet resource leakage

2023-11-15 Thread Tom Lane
Heikki Linnakangas  writes:
> On 09/03/2023 20:51, Tom Lane wrote:
>> After further thought that seems like a pretty ad-hoc solution.
>> We probably can do no better in the back branches, but shouldn't
>> we start treating WaitEventSets as ResourceOwner-managed resources?
>> Otherwise, transient WaitEventSets are going to be a permanent
>> source of headaches.

> Let's change it so that it's always allocated in TopMemoryContext, but 
> pass a ResourceOwner instead:
> WaitEventSet *
> CreateWaitEventSet(ResourceOwner owner, int nevents)
> And use owner == NULL to mean session lifetime.

WFM.  (I didn't study your back-branch patch.)

regards, tom lane




Re: On non-Windows, hard depend on uselocale(3)

2023-11-15 Thread Tom Lane
Thomas Munro  writes:
> Idea #1

> For output, which happens with sprintf(ptr, "%.15g%s", ...) in
> execute.c, perhaps we could use our in-tree Ryu routine instead?

> For input, which happens with strtod() in data.c, rats, we don't have
> a parser and I understand that it is not for the faint of heart

Yeah.  Getting rid of ecpg's use of uselocale() would certainly be
nice, but I'm not ready to add our own implementation of strtod()
to get there.

> Idea #2

> Perhaps we could use snprintf_l() and strtod_l() where available.
> They're not standard, but they are obvious extensions that NetBSD and
> Windows have, and those are the two systems for which we are doing
> grotty things in that code.

Oooh, shiny.  I do not see any man page for strtod_l, but I do see
that it's declared on mamba's host.  I wonder how long they've had it?
The man page for snprintf_l appears to be quite ancient, so we could
hope that strtod_l is available on all versions anyone cares about.

> That would amount to extending
> pg_locale.c's philosophy: either you must have uselocale(), or the
> full set of _l() functions (that POSIX failed to standardise, dunno
> what the history is behind that, seems weird).

Yeah.  I'd say the _l functions should be preferred over uselocale()
if available, but sadly they're not there on common systems.  (It
looks like glibc has strtod_l but not snprintf_l, which is odd.)

regards, tom lane




Re: WaitEventSet resource leakage

2023-11-15 Thread Heikki Linnakangas

(Alexander just reminded me of this off-list)

On 09/03/2023 20:51, Tom Lane wrote:

In [1] I wrote:


PG Bug reporting form  writes:

The following script:
[ leaks a file descriptor per error ]


Yeah, at least on platforms where WaitEventSets own kernel file
descriptors.  I don't think it's postgres_fdw's fault though,
but that of ExecAppendAsyncEventWait, which is ignoring the
possibility of failing partway through.  It looks like it'd be
sufficient to add a PG_CATCH or PG_FINALLY block there to make
sure the WaitEventSet is disposed of properly --- fortunately,
it doesn't need to have any longer lifespan than that one
function.


Here's a patch to do that. For back branches.


After further thought that seems like a pretty ad-hoc solution.
We probably can do no better in the back branches, but shouldn't
we start treating WaitEventSets as ResourceOwner-managed resources?
Otherwise, transient WaitEventSets are going to be a permanent
source of headaches.


Agreed. The current signature of CreateWaitEventSet is:

WaitEventSet *
CreateWaitEventSet(MemoryContext context, int nevents)

Passing MemoryContext makes little sense when the WaitEventSet also 
holds file descriptors. With anything other than TopMemoryContext, you 
need to arrange for proper cleanup with PG_TRY-PG_CATCH or by avoiding 
ereport() calls. And once you've arranged for cleanup, the memory context
doesn't matter much anymore.
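
For the back branches, that amounts to the usual pattern around a transient
set (a minimal sketch along the lines of what the attached patch does in
ExecAppendAsyncEventWait):

WaitEventSet *set = CreateWaitEventSet(CurrentMemoryContext, nevents);

PG_TRY();
{
    /* ... AddWaitEventToSet() calls, then WaitEventSetWait() ... */
}
PG_FINALLY();
{
    /* release the epoll/kqueue fd or Windows handles even on error */
    FreeWaitEventSet(set);
}
PG_END_TRY();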


Let's change it so that it's always allocated in TopMemoryContext, but 
pass a ResourceOwner instead:


WaitEventSet *
CreateWaitEventSet(ResourceOwner owner, int nevents)

And use owner == NULL to mean session lifetime.

--
Heikki Linnakangas
Neon (https://neon.tech)
From b9ea609855b838369cddb33e4045ac91603dd726 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Wed, 15 Nov 2023 23:44:56 +0100
Subject: [PATCH v1 1/1] Fix resource leak when a FDW's ForeignAsyncRequest
 function fails

If an error is thrown after calling CreateWaitEventSet(), the memory
of a WaitEventSet is free'd as it's allocated in the short-lived
memory context, but the file descriptor (on epoll- or kqueue-based
systems) or handles (on Windows) that it contains are leaked. Use
PG_TRY-FINALLY to ensure it gets freed.

In passing, fix a misleading comment about what the 'nevents' argument
to WaitEventSetWait means.

The added test doesn't check for leaking resources, so it passed even
before this commit. But at least it covers the code path.

Report by Alexander Lakhin, analysis and suggestion for the fix by
Tom Lane.

Discussion: https://www.postgresql.org/message-id/17828-122da8cba2323...@postgresql.org
Discussion: https://www.postgresql.org/message-id/472235.1678387...@sss.pgh.pa.us
---
 .../postgres_fdw/expected/postgres_fdw.out|  7 ++
 contrib/postgres_fdw/sql/postgres_fdw.sql |  6 ++
 src/backend/executor/nodeAppend.c | 66 +++
 3 files changed, 50 insertions(+), 29 deletions(-)

diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out
index 64bcc66b8d..22cae37a1e 100644
--- a/contrib/postgres_fdw/expected/postgres_fdw.out
+++ b/contrib/postgres_fdw/expected/postgres_fdw.out
@@ -10809,6 +10809,13 @@ SELECT * FROM result_tbl ORDER BY a;
 (2 rows)
 
 DELETE FROM result_tbl;
+-- Test error handling, if accessing one of the foreign partitions errors out
+CREATE FOREIGN TABLE async_p_broken PARTITION OF async_pt FOR VALUES FROM (1) TO (10001)
+  SERVER loopback OPTIONS (table_name 'non_existent_table');
+SELECT * FROM async_pt;
+ERROR:  relation "public.non_existent_table" does not exist
+CONTEXT:  remote SQL command: SELECT a, b, c FROM public.non_existent_table
+DROP FOREIGN TABLE async_p_broken;
 -- Check case where multiple partitions use the same connection
 CREATE TABLE base_tbl3 (a int, b int, c text);
 CREATE FOREIGN TABLE async_p3 PARTITION OF async_pt FOR VALUES FROM (3000) TO (4000)
diff --git a/contrib/postgres_fdw/sql/postgres_fdw.sql b/contrib/postgres_fdw/sql/postgres_fdw.sql
index 2d14eeadb5..075da4ff86 100644
--- a/contrib/postgres_fdw/sql/postgres_fdw.sql
+++ b/contrib/postgres_fdw/sql/postgres_fdw.sql
@@ -3607,6 +3607,12 @@ INSERT INTO result_tbl SELECT a, b, 'AAA' || c FROM async_pt WHERE b === 505;
 SELECT * FROM result_tbl ORDER BY a;
 DELETE FROM result_tbl;
 
+-- Test error handling, if accessing one of the foreign partitions errors out
+CREATE FOREIGN TABLE async_p_broken PARTITION OF async_pt FOR VALUES FROM (1) TO (10001)
+  SERVER loopback OPTIONS (table_name 'non_existent_table');
+SELECT * FROM async_pt;
+DROP FOREIGN TABLE async_p_broken;
+
 -- Check case where multiple partitions use the same connection
 CREATE TABLE base_tbl3 (a int, b int, c text);
 CREATE FOREIGN TABLE async_p3 PARTITION OF async_pt FOR VALUES FROM (3000) TO (4000)
diff --git a/src/backend/executor/nodeAppend.c b/src/backend/executor/nodeAppend.c
index 609df6b9e6..99818d3ebc 100644
--- a/src/backend/executor/nodeAppend.c
+++ b/src/backend/execut

Re: On non-Windows, hard depend on uselocale(3)

2023-11-15 Thread Thomas Munro
On Thu, Nov 16, 2023 at 10:17 AM Tom Lane  wrote:
> Thomas Munro  writes:
> > The other uses of uselocale() are in ECPG code that must
> > be falling back to the setlocale() path.  In other words, isn't it the
> > case that we don't require uselocale() to compile ECPG stuff, but it'll
> > probably crash or corrupt itself or give wrong answers if you push it
> > on NetBSD, so... uhh, really we do require it?
>
> Dunno.  mamba is getting through the ecpg regression tests okay,
> but we all know that doesn't prove a lot.  (AFAICS, ecpg only
> cares about this to the extent of not wanting an LC_NUMERIC
> locale where the decimal point isn't '.'.  I'm not sure that
> NetBSD supports any such locale anyway --- I think they're like
> OpenBSD in having only pro-forma locale support.)

Idea #1

For output, which happens with sprintf(ptr, "%.15g%s", ...) in
execute.c, perhaps we could use our in-tree Ryu routine instead?

For input, which happens with strtod() in data.c, rats, we don't have
a parser and I understand that it is not for the faint of heart (naive
implementation gets subtle things wrong, cf "How to read floating
point numbers accurately" by W D Clinger + whatever improvements have
happened in this space since 1990).

Idea #2

Perhaps we could use snprintf_l() and strtod_l() where available.
They're not standard, but they are obvious extensions that NetBSD and
Windows have, and those are the two systems for which we are doing
grotty things in that code.  That would amount to extending
pg_locale.c's philosophy: either you must have uselocale(), or the
full set of _l() functions (that POSIX failed to standardise, dunno
what the history is behind that, seems weird).
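
To illustrate idea #2, here is a sketch of how the two ecpg call sites could
go through a cached "C" locale handle (BSD-style argument order as on
NetBSD; Windows' _snprintf_l()/_strtod_l() differ, headers vary by platform,
and the helper names here are made up):

#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

/* Sketch only: an LC_NUMERIC="C" handle, created once and never freed. */
static locale_t
ecpg_c_locale(void)
{
    static locale_t loc = (locale_t) 0;

    if (loc == (locale_t) 0)
        loc = newlocale(LC_NUMERIC_MASK, "C", (locale_t) 0);
    return loc;
}

/* execute.c side: always print '.' as the decimal point */
static void
format_double(char *buf, size_t len, double value)
{
    snprintf_l(buf, len, ecpg_c_locale(), "%.15g", value);
}

/* data.c side: always parse '.' as the decimal point */
static double
parse_double(const char *str, char **endptr)
{
    return strtod_l(str, endptr, ecpg_c_locale());
}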




Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

2023-11-15 Thread Peter Smith
Some minor comments for v17-0001.

==

1.
+   ereport(log_replication_commands ? LOG : DEBUG1,
+   SlotIsLogical(s)
+   /* translator: %s is name of the replication slot */
+   ? errmsg("acquired logical replication slot \"%s\"",
+NameStr(s->data.name))
+   : errmsg("acquired physical replication slot \"%s\"",
+NameStr(s->data.name)));

1a.
FWIW, if the ternary was inside the errmsg, there would be less code
duplication.

~

1b.
I searched HEAD code and did not find any "translator:" comments for
just ordinary slot name substitutions like these; AFAICT that comment
is not necessary anymore.

~

SUGGESTION (#1a and #1b)

ereport(log_replication_commands ? LOG : DEBUG1,
errmsg(SlotIsLogical(s)
   ? "acquired logical replication slot \"%s\""
   : "acquired physical replication slot \"%s\"",
   NameStr(s->data.name)));

~~~

2.
Ditto for the other ereport.

==
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: Popcount optimization using AVX512

2023-11-15 Thread Nathan Bossart
On Wed, Nov 15, 2023 at 08:27:57PM +, Shankaran, Akash wrote:
> AVX512 has light and heavy instructions. While the heavy AVX512
> instructions have clock frequency implications, the light instructions
> not so much. See [0] for more details. We captured EMON data for the
> benchmark used in this work, and see that the instructions are using the
> licensing level not meant for heavy AVX512 operations. This means the
> instructions for popcount: _mm512_popcnt_epi64(),
> _mm512_reduce_add_epi64() are not going to have any significant impact on
> CPU clock frequency.
>
> Clock frequency impact aside, we measured the same benchmark for gains on
> older Intel hardware and observe up to 18% better performance on Intel
> Icelake. On older intel hardware, the popcntdq 512 instruction is not
> present so it won’t work. If clock frequency is not affected, the rest of
> the workload should not be impacted in the case of mixed workloads. 

Thanks for sharing your analysis.
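
For context, the AVX512 path under discussion reduces to roughly the
following (a minimal sketch built from the intrinsics named above, not the
proposed patch itself; a real version needs a scalar tail for any remainder
and a runtime check that the CPU supports AVX512-VPOPCNTDQ):

#include <immintrin.h>
#include <stddef.h>
#include <stdint.h>

static uint64_t
popcount_avx512(const char *buf, size_t bytes)
{
    __m512i accum = _mm512_setzero_si512();
    size_t  i;

    /* process 64 bytes per iteration; leftover bytes need a scalar tail */
    for (i = 0; i + sizeof(__m512i) <= bytes; i += sizeof(__m512i))
    {
        __m512i val = _mm512_loadu_si512((const void *) (buf + i));

        accum = _mm512_add_epi64(accum, _mm512_popcnt_epi64(val));
    }

    return (uint64_t) _mm512_reduce_add_epi64(accum);
}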

> Testing this on smaller block sizes < 8KiB shows that AVX512 compared to
> the current 64bit behavior shows slightly lower performance, but with a
> large variance. We cannot conclude much from it. The testing with ANALYZE
> benchmark by Nathan also points to no visible impact as a result of using
> AVX512. The gains on larger datasets are easily evident, with less
> variance.
>
> What are your thoughts if we introduce AVX512 popcount for smaller sizes
> as an optional feature initially, and then test it more thoroughly over
> time on this particular use case? 

I don't see any need to rush this.  At the very earliest, this feature
would go into v17, which doesn't enter feature freeze until April 2024.
That seems like enough time to complete any additional testing you'd like
to do.  However, if you are seeing worse performance with this patch, then
it seems unlikely that we'd want to proceed.

> Thoughts or feedback on the approach in the patch? This solution should
> not impact anyone who doesn’t use the feature i.e. AVX512. Open to
> additional ideas if this doesn’t seem like the right approach here.  

It's true that it wouldn't impact anyone not using the feature, but there's
also a decent chance that this code goes virtually untested.  As I've
stated elsewhere [0], I think we should ensure there's buildfarm coverage
for this kind of architecture-specific stuff.

[0] https://postgr.es/m/20230726043707.GB3211130%40nathanxps13

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: lazy_scan_heap() should release lock on buffer before vacuuming FSM

2023-11-15 Thread Robert Haas
On Mon, Nov 13, 2023 at 8:26 PM Andres Freund  wrote:
> I think this undersells the situation a bit. We right now do
> FreeSpaceMapVacuumRange() for 8GB of data (VACUUM_FSM_EVERY_PAGES) in the main
> fork, while holding an exclusive page level lock.

That sounds fairly horrific?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: lazy_scan_heap() should release lock on buffer before vacuuming FSM

2023-11-15 Thread Melanie Plageman
On Tue, Nov 14, 2023 at 7:15 PM Andres Freund  wrote:
>
> Hi,
>
> On 2023-11-14 07:46:10 -0500, Melanie Plageman wrote:
> > > FreeSpaceMapVacuumRange()'s comment says:
> > >  * As above, but assume that only heap pages between start and end-1 
> > > inclusive
> > >  * have new free-space information, so update only the upper-level slots
> > >  * covering that block range.  end == InvalidBlockNumber is equivalent to
> > >  * "all the rest of the relation".
> > >
> > > So FreeSpaceMapVacuumRange(..., blkno) will not actually process the 
> > > "effects"
> > > of the RecordPageWithFreeSpace() above it - which seems confusing.
> >
> > Ah, so shall I pass blkno + 1 as end?
>
> I think there's no actual overflow danger, because MaxBlockNumber + 1 is
> InvalidBlockNumber, which scans the rest of the relation (i.e. exactly the
> intended block). Perhaps worth noting?

Attached
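
For reference, a sketch of the suggested call shape, using the patch's
variable names (not the patch text itself):

/*
 * Vacuum the FSM up to and including blkno.  If blkno == MaxBlockNumber,
 * then blkno + 1 wraps to InvalidBlockNumber, which
 * FreeSpaceMapVacuumRange() treats as "all the rest of the relation",
 * i.e. exactly the intended block, so there is no overflow hazard.
 */
FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum, blkno + 1);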

> > > =# DELETE FROM copytest_0;
> > > =# VACUUM (VERBOSE) copytest_0;
> > > ...
> > > INFO:  0: table "copytest_0": truncated 146264 to 110934 pages
> > > ...
> > > tuples missed: 5848 dead from 89 pages not removed due to cleanup lock 
> > > contention
> > > ...
> > >
> > > A bit of debugging later I figured out that this is due to the background
> > > writer. If I SIGSTOP bgwriter, the whole relation is truncated...
> >
> > That's a bit sad. But isn't that what you would expect bgwriter to do?
> > Write out dirty buffers? It doesn't know that those pages consist of
> > only dead tuples and that you will soon truncate them.
>
> I think the issue more that it's feels wrong that a pin by bgwriter blocks
> vacuum cleaning up. I think the same happens with on-access pruning - where
> it's more likely for bgwriter to focus on those pages. Checkpointer likely
> causes the same. Even normal backends can cause this while writing out the
> page.
>
> It's not like bgwriter/checkpointer would actually have a problem if the page
> were pruned - the page is locked while being written out and neither keep
> pointers into the page after unlocking it again.
>
>
> At this point I started to wonder if we should invent a separate type of pin
> for a buffer undergoing IO. We basically have the necessary state already:
> BM_IO_IN_PROGRESS. We'd need to look at that in some places, e.g. in
> InvalidateBuffer(), instead of BUF_STATE_GET_REFCOUNT(), we'd need to also
> look at BM_IO_IN_PROGRESS.
>
>
> Except of course that that'd not change anything -
> ConditionalLockBufferForCleanup() locks the buffer conditionally, *before*
> even looking at the refcount and returns false if not.  And writing out a
> buffer takes a content lock.  Which made me realize that
>   "tuples missed: 5848 dead from 89 pages not removed due to cleanup lock 
> contention"
>
> is often kinda wrong - the contention is *not* cleanup lock specific. It's
> often just plain contention on the lwlock.

Do you think we should change the error message?

> Perhaps we should just treat IO_IN_PROGRESS buffers differently in
> lazy_scan_heap() and heap_page_prune_opt() and wait?

Hmm. This is an interesting idea. lazy_scan_heap() waiting for
checkpointer to write out a buffer could have an interesting property
of shifting who writes out the FPI. I wonder if it would matter.

- Melanie
From b1af221d22cfcfbc63497d45c2d0f8ab28c720ab Mon Sep 17 00:00:00 2001
From: Melanie Plageman 
Date: Mon, 13 Nov 2023 16:39:25 -0500
Subject: [PATCH v2] Release lock on heap buffer before vacuuming FSM

When there are no indexes on a table, we vacuum each heap block after
pruning it and then update the freespace map. Periodically, we also
vacuum the freespace map. This was done while unnecessarily holding a
lock on the heap page. Release the lock before calling
FreeSpaceMapVacuumRange() and, while we're at it, ensure the range
includes the heap block we just vacuumed.

Discussion: https://postgr.es/m/CAAKRu_YiL%3D44GvGnt1dpYouDSSoV7wzxVoXs8m3p311rp-TVQQ%40mail.gmail.com
---
 src/backend/access/heap/vacuumlazy.c | 30 +---
 1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
index 6985d299b2..59f51f40e1 100644
--- a/src/backend/access/heap/vacuumlazy.c
+++ b/src/backend/access/heap/vacuumlazy.c
@@ -1046,18 +1046,6 @@ lazy_scan_heap(LVRelState *vacrel)
 /* Forget the LP_DEAD items that we just vacuumed */
 dead_items->num_items = 0;
 
-/*
- * Periodically perform FSM vacuuming to make newly-freed
- * space visible on upper FSM pages.  Note we have not yet
- * performed FSM processing for blkno.
- */
-if (blkno - next_fsm_block_to_vacuum >= VACUUM_FSM_EVERY_PAGES)
-{
-	FreeSpaceMapVacuumRange(vacrel->rel, next_fsm_block_to_vacuum,
-			blkno);
-	next_fsm_block_to_vacuum = blkno;
-}
-
 /*
  * Now perform FSM processing for blkno, and move on to next
  * page.
@@ -1071,6 +1059,24 @@ lazy_scan_heap(LVRelState *vacrel)
 
 UnlockReleaseBuffer(b

Re: ALTER COLUMN ... SET EXPRESSION to alter stored generated column's expression

2023-11-15 Thread Peter Eisentraut

On 15.11.23 13:26, Amul Sul wrote:

Question: Why are you using AT_PASS_ADD_OTHERCONSTR?  I don't know if
it's right or wrong, but if you have a specific reason, it would be
good
to know.

I referred to ALTER COLUMN DEFAULT and used that.


Hmm, I'm not sure if that is a good comparison.  For ALTER TABLE, SET 
DEFAULT is just a catalog manipulation, it doesn't change any data, so 
it's pretty easy.  SET EXPRESSION changes data, which other phases might 
want to inspect?  For example, if you do SET EXPRESSION and add a 
constraint in the same ALTER TABLE statement, do those run in the 
correct order?



I think ATExecSetExpression() needs to lock pg_attribute?  Did you lose
that during the refactoring?

I have removed that intentionally since we were not updating anything in
pg_attribute like ALTER DROP EXPRESSION.


ok


Tiny comment: The error message in ATExecSetExpression() does not need
to mention "stored", since it would be also applicable to virtual
generated columns in the future.

I had the same thought, but later decided that when we do the virtual
column work, we could simply change it then.  I am fine with doing that
change now as well; let me know your thoughts.


Not a big deal, but I would change it now.

Another small thing I found:  In ATExecColumnDefault(), there is an 
errhint() that suggests DROP EXPRESSION instead of DROP DEFAULT.  You 
could now add another hint that suggests SET EXPRESSION instead of SET 
DEFAULT.






Re: On non-Windows, hard depend on uselocale(3)

2023-11-15 Thread Tom Lane
Thomas Munro  writes:
> Currently pg_locale.c requires systems to have *either* uselocale() or
> mbstowcs_l()/wcstombs_l(), but NetBSD satisfies the second
> requirement.

Check.

> The other uses of uselocale() are in ECPG code that must
> be falling back to the setlocale() path.  In other words, isn't it the
> case that we don't require uselocale() to compile ECPG stuff, but it'll
> probably crash or corrupt itself or give wrong answers if you push it
> on NetBSD, so... uhh, really we do require it?

Dunno.  mamba is getting through the ecpg regression tests okay,
but we all know that doesn't prove a lot.  (AFAICS, ecpg only
cares about this to the extent of not wanting an LC_NUMERIC
locale where the decimal point isn't '.'.  I'm not sure that
NetBSD supports any such locale anyway --- I think they're like
OpenBSD in having only pro-forma locale support.)

regards, tom lane




Re: typo in fallback implementation for pg_atomic_test_set_flag()

2023-11-15 Thread Nathan Bossart
On Wed, Nov 15, 2023 at 09:52:34AM -0600, Nathan Bossart wrote:
> On Tue, Nov 14, 2023 at 07:17:32PM -0800, Andres Freund wrote:
>> Are you planning to apply the fix?
> 
> Yes, I'll take care of it.

Committed and back-patched.  I probably could've skipped back-patching this
one since it doesn't seem to be causing any problems yet, but I didn't see
any reason not to back-patch, either.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: On non-Windows, hard depend on uselocale(3)

2023-11-15 Thread Thomas Munro
On Thu, Nov 16, 2023 at 9:51 AM Tom Lane  wrote:
> Thomas Munro  writes:
> > On Thu, Nov 16, 2023 at 6:45 AM Tom Lane  wrote:
> >> You would need to do some research and try to prove that that won't
> >> be a problem on any modern platform.  Presumably it once was a problem,
> >> or we'd not have bothered with a configure check.
>
> > According to data I scraped from the build farm, the last two systems
> > we had that didn't have uselocale() were curculio (OpenBSD 5.9) and
> > wrasse (Solaris 11.3), but those were both shut down (though wrasse
> > still runs old branches) as they were well out of support.
>
> AFAICS, NetBSD still doesn't have it.  They have no on-line man page
> for it, and my animal mamba shows it as not found.

Oh :-(  I see that but had missed that sidewinder was NetBSD and my
scraped data predates mamba.  Sorry for the wrong info.

Currently pg_locale.c requires systems to have *either* uselocale() or
mbstowcs_l()/wcstombs_l(), but NetBSD satisfies the second
requirement.  The other uses of uselocale() are in ECPG code that must
be falling back to the setlocale() path.  In other words, isn't it the
case that we don't require uselocale() to compile ECPG stuff, but it'll
probably crash or corrupt itself or give wrong answers if you push it
on NetBSD, so... uhh, really we do require it?




Re: On non-Windows, hard depend on uselocale(3)

2023-11-15 Thread Tom Lane
Thomas Munro  writes:
> On Thu, Nov 16, 2023 at 6:45 AM Tom Lane  wrote:
>> You would need to do some research and try to prove that that won't
>> be a problem on any modern platform.  Presumably it once was a problem,
>> or we'd not have bothered with a configure check.

> According to data I scraped from the build farm, the last two systems
> we had that didn't have uselocale() were curculio (OpenBSD 5.9) and
> wrasse (Solaris 11.3), but those were both shut down (though wrasse
> still runs old branches) as they were well out of support.

AFAICS, NetBSD still doesn't have it.  They have no on-line man page
for it, and my animal mamba shows it as not found.

regards, tom lane




Re: Allow tests to pass in OpenSSL FIPS mode

2023-11-15 Thread Tom Lane
Daniel Gustafsson  writes:
> Since the 3DES/DES deprecations aren't limited to FIPS, do we want to do
> anything for pgcrypto where we have DES/3DES encryption?  Maybe a doc patch
> which mentions the deprecation with a link to the SP could be in order?

A docs patch that marks both MD5 and 3DES as deprecated is probably
appropriate, but it seems like a matter for a separate thread and patch.

In the meantime, I've done a pass of review of Peter's v4 patches.
v4-0001 is already committed, so that's not considered here.

v4-0002: I think it is worth splitting up contrib/pgcrypto's
pgp-encrypt test, which has only one test case whose output changes,
and a bunch of others that don't.  v5-0002, attached, does it
like that.  It's otherwise the same as v4.

(It might be worth doing something similar for uuid_ossp's test,
but I have not bothered here.  That test script is stable enough
that I'm not too worried about future maintenance.)

The attached 0003, 0004, 0005 patches are identical to Peter's.
I think that it is possibly worth modifying the password test so that
we don't fail to create the roles, so as to reduce the delta between
password.out and password_1.out (and thereby ease future maintenance
of those files).  However you might disagree, so I split my proposal
out as a separate patch v5-0007-password-test-delta.patch; you can
drop that from the set if you don't like it.

v5-0006-allow-for-disabled-3DES.patch adds the necessary expected
file to make that pass on my Fedora 38 system.

With or without 0007, as you choose, I think it's committable.

regards, tom lane

diff --git a/contrib/pgcrypto/Makefile b/contrib/pgcrypto/Makefile
index 7fb59f51b7..5efa10c334 100644
--- a/contrib/pgcrypto/Makefile
+++ b/contrib/pgcrypto/Makefile
@@ -42,7 +42,7 @@ PGFILEDESC = "pgcrypto - cryptographic functions"
 REGRESS = init md5 sha1 hmac-md5 hmac-sha1 blowfish rijndael \
 	sha2 des 3des cast5 \
 	crypt-des crypt-md5 crypt-blowfish crypt-xdes \
-	pgp-armor pgp-decrypt pgp-encrypt $(CF_PGP_TESTS) \
+	pgp-armor pgp-decrypt pgp-encrypt pgp-encrypt-md5 $(CF_PGP_TESTS) \
 	pgp-pubkey-decrypt pgp-pubkey-encrypt pgp-info
 
 EXTRA_CLEAN = gen-rtab
diff --git a/contrib/pgcrypto/expected/crypt-md5_1.out b/contrib/pgcrypto/expected/crypt-md5_1.out
new file mode 100644
index 00..0ffda34ab4
--- /dev/null
+++ b/contrib/pgcrypto/expected/crypt-md5_1.out
@@ -0,0 +1,16 @@
+--
+-- crypt() and gen_salt(): md5
+--
+SELECT crypt('', '$1$Szzz0yzz');
+ERROR:  crypt(3) returned NULL
+SELECT crypt('foox', '$1$Szzz0yzz');
+ERROR:  crypt(3) returned NULL
+CREATE TABLE ctest (data text, res text, salt text);
+INSERT INTO ctest VALUES ('password', '', '');
+UPDATE ctest SET salt = gen_salt('md5');
+UPDATE ctest SET res = crypt(data, salt);
+ERROR:  crypt(3) returned NULL
+SELECT res = crypt(data, res) AS "worked"
+FROM ctest;
+ERROR:  invalid salt
+DROP TABLE ctest;
diff --git a/contrib/pgcrypto/expected/hmac-md5_1.out b/contrib/pgcrypto/expected/hmac-md5_1.out
new file mode 100644
index 00..56875b0f63
--- /dev/null
+++ b/contrib/pgcrypto/expected/hmac-md5_1.out
@@ -0,0 +1,44 @@
+--
+-- HMAC-MD5
+--
+SELECT hmac(
+'Hi There',
+'\x0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b0b'::bytea,
+'md5');
+ERROR:  Cannot use "md5": Cipher cannot be initialized
+-- 2
+SELECT hmac(
+'Jefe',
+'what do ya want for nothing?',
+'md5');
+ERROR:  Cannot use "md5": Cipher cannot be initialized
+-- 3
+SELECT hmac(
+'\x'::bytea,
+'\x'::bytea,
+'md5');
+ERROR:  Cannot use "md5": Cipher cannot be initialized
+-- 4
+SELECT hmac(
+'\xcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcdcd'::bytea,
+'\x0102030405060708090a0b0c0d0e0f10111213141516171819'::bytea,
+'md5');
+ERROR:  Cannot use "md5": Cipher cannot be initialized
+-- 5
+SELECT hmac(
+'Test With Truncation',
+'\x0c0c0c0c0c0c0c0c0c0c0c0c0c0c0c0c'::bytea,
+'md5');
+ERROR:  Cannot use "md5": Cipher cannot be initialized
+-- 6
+SELECT hmac(
+'Test Using Larger Than Block-Size Key - Hash Key First',
+'\x'::bytea,
+'md5');
+ERROR:  Cannot use "md5": Cipher cannot be initialized
+-- 7
+SELECT hmac(
+'Test Using Larger Than Block-Size Key and Larger Than One Block-Size Data',
+'\x'::bytea,
+'md5');
+ERROR:  Cannot use "md5": Cipher cannot be initialized
diff --git a/contrib/pgcrypto/expected/md5_1.out b/contrib/pgcrypto/expected/md5_1.out
new file mode 100644
index 00..decb215c48
--- /dev/null
+++ b/contrib/pgcrypto/expected/md5_1.out
@@ -0,0 +1,17 @@
+--
+-- MD5 message digest
+--
+SELECT digest('

Re: Some performance degradation in REL_16 vs REL_15

2023-11-15 Thread Andres Freund
Hi,

On 2023-11-15 10:09:06 -0500, Tom Lane wrote:
> "Anton A. Melnikov"  writes:
> > I can't understand why I get the opposite results on my PC and on the
> > server. It is clear that the absolute TPS values will be different for
> > various configurations. This is normal. But the differences?
> > It seems that some kind of reference configuration is needed to
> > accurately measure the difference in performance. Probably something is
> > wrong with my PC, but for now I cannot figure out what.
>
> > Would be very grateful for any advice or comments to clarify this problem.
>
> Benchmarking is hard :-(.

Indeed.


> IME it's absolutely typical to see variations of a couple of percent even
> when "nothing has changed", for example after modifying some code that's
> nowhere near any hot code path for the test case.  I usually attribute this
> to cache effects, such as a couple of bits of hot code now sharing or not
> sharing a cache line.

FWIW, I think we're overusing that explanation in our community. Of course you
can encounter things like this, but the replacement policies of cpu caches
have gotten a lot better and the caches have gotten bigger too.

IME this kind of thing is typically dwarfed by much bigger variations from
things like

- cpu scheduling - whether the relevant pgbench thread is colocated on the
  same core as the relevant backend can make a huge difference,
  particularly when CPU power saving modes are not disabled.  Just looking at
  tps from a fully cached readonly pgbench, with a single client:

  Power savings enabled, same core:
  37493

  Power savings enabled, different core:
  28539

  Power savings disabled, same core:
  38167

  Power savings disabled, different core:
  37365


- can transparent huge pages be used for the executable mapping, or not

  On newer kernels, Linux (and some filesystems) can use huge pages for the
  executable. To what degree that succeeds is a large factor in performance.

  Single threaded read-only pgbench

  postgres mapped without huge pages:
  37155 TPS

  with 2MB of postgres as huge pages:
  37695 TPS

  with 6MB of postgres as huge pages:
  42733 TPS

  The really annoying thing about this is that it's entirely unpredictable
  whether huge pages are used or not. Building the same way, sometimes 0,
  sometimes 2MB, sometimes 6MB are mapped huge. Even though the on-disk
  contents are precisely the same.  And it can even change without
  rebuilding, if the binary is evicted from the page cache.

  This alone makes benchmarking extremely annoying. It basically can't be
  controlled and has huge effects.
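
  A minimal Linux-only sketch of one way to check this from within a
  process (assumes a kernel new enough to report FilePmdMapped in
  /proc/PID/smaps): sum the huge-page-backed portions of the executable
  mappings.

  #include <ctype.h>
  #include <stdio.h>
  #include <string.h>

  int
  main(void)
  {
      FILE       *f = fopen("/proc/self/smaps", "r");
      char        line[1024];
      int         in_exec = 0;
      long        total_kb = 0;

      if (f == NULL)
          return 1;
      while (fgets(line, sizeof(line), f))
      {
          /* mapping header lines start with the hex address range */
          if (isxdigit((unsigned char) line[0]) && strchr(line, '-'))
              in_exec = (strstr(line, " r-xp ") != NULL);
          /* attribute lines belong to the preceding header */
          else if (in_exec && strncmp(line, "FilePmdMapped:", 14) == 0)
          {
              long        kb;

              if (sscanf(line + 14, "%ld", &kb) == 1)
                  total_kb += kb;
          }
      }
      fclose(f);
      printf("executable mappings backed by huge pages: %ld kB\n", total_kb);
      return 0;
  }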


- How long has the server been started

  If e.g. you run your benchmark on the first connection to a database one
  time, and after a restart you don't (e.g. autovacuum starts up beforehand),
  you can get a fairly different memory layout and cache situation, due to
  [not] using the relcache init file. In the one case you'll have a catcache
  that's already populated, in the other not.

  Another mean one is whether you start your benchmark within a relatively
  short time of the server starting. Readonly pgbench with a single client,
  started immediately after the server:

  progress: 12.0 s, 37784.4 tps, lat 0.026 ms stddev 0.001, 0 failed
  progress: 13.0 s, 37779.6 tps, lat 0.026 ms stddev 0.001, 0 failed
  progress: 14.0 s, 37668.2 tps, lat 0.026 ms stddev 0.001, 0 failed
  progress: 15.0 s, 32133.0 tps, lat 0.031 ms stddev 0.113, 0 failed
  progress: 16.0 s, 37564.9 tps, lat 0.027 ms stddev 0.012, 0 failed
  progress: 17.0 s, 37731.7 tps, lat 0.026 ms stddev 0.001, 0 failed

  There's a dip at 15s, odd - turns out that's due to bgwriter writing a WAL
  record, which triggers walwriter to write it out and then initialize the
  whole WAL buffers with 0s - happens once.  In this case I've exaggerated the
  effect a bit by using a 1GB wal_buffers, but it's visible otherwise too.
  Whether your benchmark period includes that dip or not adds a fair bit of
  noise.

  You can even see the effects of autovacuum workers launching - even if
  there's nothing to do!  Not as a huge dip, but enough to add some "run to
  run" variation.


- How much other dirty data is there in the kernel pagecache. If you e.g. just
  built a new binary, even with just minor changes, the kernel will need to
  flush those pages eventually, which may contend for IO and increase page
  faults.

  Rebuilding an optimized build generates something like 1GB of dirty
  data. Particularly with ccache, that'll typically not yet be flushed by the
  time you run a benchmark. That's not nothing, even with a decent NVMe SSD.

- many more, unfortunately

Greetings,

Andres Freund




Re: [PoC] Federated Authn/z with OAUTHBEARER

2023-11-15 Thread Jacob Champion
On Thu, Nov 9, 2023 at 5:43 PM Andrey Chudnovsky  wrote:
> Do you plan to support adding an extension hook to validate the token?
>
> It would allow a more efficient integration than spinning up a separate process.

I think an API in the style of archive modules might probably be a
good way to go, yeah.
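
To illustrate the shape (a sketch only; all of these names are invented
and nothing like this exists yet), such a module could hand back a
callbacks struct, much like archive modules do:

    #include <stdbool.h>

    /* hypothetical validator module API -- names invented */
    typedef struct OAuthValidatorCallbacks
    {
        /* return true if the bearer token authorizes the given role */
        bool        (*validate_token) (const char *token, const char *role);
    } OAuthValidatorCallbacks;

    /* a module named by some GUC would export this symbol */
    extern const OAuthValidatorCallbacks *_PG_oauth_validator_init(void);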

It's probably not very high on the list of priorities, though, since
the inputs and outputs are going to "look" the same whether you're
inside or outside of the server process. The client side is going to
need the bulk of the work/testing/validation. Speaking of which -- how
is the current PQauthDataHook design doing when paired with MS AAD
(er, Entra now I guess)? I haven't had an Azure test bed available for
a while.

Thanks,
--Jacob




Re: pg_dump needs SELECT privileges on irrelevant extension table

2023-11-15 Thread Jacob Champion
> commit a70f2a57f233244c0a780829baf48c624187d456
> Author: Tom Lane 
> Date:   Mon Nov 13 17:04:10 2023 -0500
>
>Don't try to dump RLS policies or security labels for extension objects.

(Thanks Tom!)

--Jacob




Re: retire MemoryContextResetAndDeleteChildren backwards compatibility macro

2023-11-15 Thread Nathan Bossart
Committed.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: On non-Windows, hard depend on uselocale(3)

2023-11-15 Thread Thomas Munro
On Thu, Nov 16, 2023 at 7:42 AM Dagfinn Ilmari Mannsåker
 wrote:
> Tom Lane  writes:
>
> > "Tristan Partin"  writes:
> >> I would like to propose removing HAVE_USELOCALE, and just have WIN32,
> >> which means that Postgres would require uselocale(3) on anything that
> >> isn't WIN32.
> >
> > You would need to do some research and try to prove that that won't
> > be a problem on any modern platform.  Presumably it once was a problem,
> > or we'd not have bothered with a configure check.
> >
> > (Some git archaeology might yield useful info about when and why
> > we added the check.)
>
> For reference, the Perl effort to use the POSIX.1-2008 thread-safe
> locale APIs has revealed several platform-specific bugs that cause it
> to disable them on FreeBSD and macOS:
>
> https://github.com/perl/perl5/commit/9cbc12c368981c56d4d8e40cc9417ac26bec2c35

Interesting that C vs C.UTF-8 has come up there, something that has
also confused us and others (in fact I still owe Daniel Vérité a
response to his complaint about how we treat the latter; I got stuck
on a logical problem with the proposal and then dumped core...).  The
idea of C.UTF-8 is relatively new, and seems to have shaken a few bugs
out in a few places.  Anyway, that in particular is a brand new
FreeBSD bug report and I am sure it will be addressed soon.

> https://github.com/perl/perl5/commit/dd4eb78c55aab441aec1639b1dd49f88bd960831

As for macOS, one thing I noticed is that the FreeBSD -> macOS
pipeline appears to have re-awoken after many years of slumber.  I
don't know anything about that other than that when I recently
upgraded my Mac to 14.1, suddenly a few userspace tools are now
running the recentish FreeBSD versions of certain userland tools (tar,
grep, ...?), instead of something from the Jurassic.  Whether that
might apply to libc, who can say... they seemed to have quite ancient
BSD locale code last time I checked.

> https://github.com/perl/perl5/commit/0f3830f3997cf7ef1531bad26d2e0f13220dd862

That linked issue appears to be fixed already.

> But Perl actually makes use of per-thread locales, because it has a
> separate interpreter per thread, each of which can have a different
> locale active.  Since Postgres isn't actually multi-threaded (yet),
> these concerns might not apply to the same degree.

ECPG might use them in multi-threaded code.  I'm not sure if it's a
problem and whose problem it is.




Re: Why do indexes and sorts use the database collation?

2023-11-15 Thread Jeff Davis
On Tue, 2023-11-14 at 13:01 +0100, Tomas Vondra wrote:

> Presumably we'd not generate incorrect
> results, but we'd not be able use an index, causing performance
> issues.

Couldn't use the index for its pathkeys or range scans, but could use
it for equality.

> AFAICS this is a trade-off between known benefits (faster equality
> searches, which are common for PK columns) vs. unknown downsides
> (performance penalty for operations with unknown frequency).

Don't forget the dependency versioning risk: changes to the collation
provider library can corrupt your indexes. That affects even small
tables where performance is not a concern.

> Not sure it's a decision we can make automatically. But it's mostly
> achievable manually, if the user specifies COLLATE "C" for the
> column.

Changing the column collation is the wrong place to do it, in my
opinion. It conflates semantics with performance considerations and
dependency risks.

We already have the catalog support for indexes with a different
collation:

  CREATE TABLE foo (t TEXT COLLATE "en_US");
  INSERT INTO foo SELECT g::TEXT FROM generate_series(1,100) g;
  CREATE INDEX foo_idx ON foo (t COLLATE "C");
  ANALYZE foo;

The problem is that:

  EXPLAIN SELECT * FROM foo WHERE t = '345678';

doesn't use the index. And also that there's no way to do it for a PK
index.

> I realize you propose to do this automatically for everyone,

I don't think I proposed that. Perhaps we nudge users in that direction
over time as the utility becomes clear, but I'm not trying to push for
a sudden radical change.

Perhaps many read $SUBJECT as a rhetorical question, but I really do
want to know if I am missing important and common use cases for indexes
in a non-"C" collation.

>  But maybe there's a way
> to make this manual approach more convenient? Say, by allowing the PK
> to
> have a different collation (which I don't think is allowed now).

Yeah, that should be fairly non-controversial.

> FWIW I wonder what the impact of doing this automatically would be in
> practice. I mean, in my experience the number of tables with TEXT (or
> types sensitive to collations) primary keys is fairly low, especially
> for tables of non-trivial size (where the performance impact might be
> measurable).

I think a lot more users would be helped than hurt. But in absolute
numbers, the latter group still represents a lot of regressions, so
let's not do anything radical.

> > 
> True. What about trying to allow a separate collation for the PK
> constraint (and the backing index)?

+1.

> > 
> OK. I personally don't recall any case where I'd see a collation on
> PK
> indexes as a performance issue. Or maybe I just didn't realize it.

Try a simple experiment building an index with the "C" collation and
then try with a different locale on the same data. Numbers vary, but
I've seen 1.5X to 4X on some simple generated data. Others have
reported much worse numbers on some versions of glibc that are
especially slow with lots of non-latin characters.

> But speaking of natural keys, I recall a couple schemas with natural
> keys in code/dimension tables, and it's not uncommon to cluster those
> slow-moving tables once in a while. I don't know if ORDER BY queries
> were very common on those tables, though.

Yeah, not exactly a common case though.

> > 
> OK, I think this answers my earlier question. Now that I think about
> this, the one confusing thing with this syntax is that it seems to
> assign the collation to the constraint, but in reality we want the
> constraint to be enforced with the column's collation and the
> alternative collation is for the index.

Yeah, let's be careful about that. It's still technically correct:
uniqueness in either collation makes sense. But it could be confusing
anyway.

> > 
Regards,
Jeff Davis





Re: On non-Windows, hard depend on uselocale(3)

2023-11-15 Thread Tom Lane
Dagfinn Ilmari Mannsåker  writes:
> Tom Lane  writes:
>> "Tristan Partin"  writes:
>>> I would like to propose removing HAVE_USELOCALE, and just have WIN32, 
>>> which means that Postgres would require uselocale(3) on anything that 
>>> isn't WIN32.

>> You would need to do some research and try to prove that that won't
>> be a problem on any modern platform.  Presumably it once was a problem,
>> or we'd not have bothered with a configure check.

> For reference, the Perl effort to use the POSIX.1-2008 thread-safe
> locale APIs has revealed several platform-specific bugs that cause it
> to disable them on FreeBSD and macOS:
> https://github.com/perl/perl5/commit/9cbc12c368981c56d4d8e40cc9417ac26bec2c35
> https://github.com/perl/perl5/commit/dd4eb78c55aab441aec1639b1dd49f88bd960831
> and work around bugs on others (e.g. OpenBSD):
> https://github.com/perl/perl5/commit/0f3830f3997cf7ef1531bad26d2e0f13220dd862
> But Perl actually makes use of per-thread locales, because it has a
> separate interpreter per thread, each of which can have a different
> locale active.  Since Postgres isn't actually multi-threaded (yet),
> these concerns might not apply to the same degree.

Interesting.  That need not stop us from dropping the configure
check for uselocale(), but it might be a problem for Tristan's
larger ambitions.

regards, tom lane




Re: On non-Windows, hard depend on uselocale(3)

2023-11-15 Thread Dagfinn Ilmari Mannsåker
Tom Lane  writes:

> "Tristan Partin"  writes:
>> I would like to propose removing HAVE_USELOCALE, and just have WIN32, 
>> which means that Postgres would require uselocale(3) on anything that 
>> isn't WIN32.
>
> You would need to do some research and try to prove that that won't
> be a problem on any modern platform.  Presumably it once was a problem,
> or we'd not have bothered with a configure check.
>
> (Some git archaeology might yield useful info about when and why
> we added the check.)

For reference, the Perl effort to use the POSIX.1-2008 thread-safe
locale APIs has revealed several platform-specific bugs that cause it
to disable them on FreeBSD and macOS:

https://github.com/perl/perl5/commit/9cbc12c368981c56d4d8e40cc9417ac26bec2c35
https://github.com/perl/perl5/commit/dd4eb78c55aab441aec1639b1dd49f88bd960831

and work around bugs on others (e.g. OpenBSD):

https://github.com/perl/perl5/commit/0f3830f3997cf7ef1531bad26d2e0f13220dd862

But Perl actually makes use of per-thread locales, because it has a
separate interpreter per thread, each of which can have a different
locale active.  Since Postgres isn't actually multi-threaded (yet),
these concerns might not apply to the same degree.

>   regards, tom lane

- ilmari




Re: On non-Windows, hard depend on uselocale(3)

2023-11-15 Thread Thomas Munro
On Thu, Nov 16, 2023 at 6:45 AM Tom Lane  wrote:
> "Tristan Partin"  writes:
> > I would like to propose removing HAVE_USELOCALE, and just have WIN32,
> > which means that Postgres would require uselocale(3) on anything that
> > isn't WIN32.
>
> You would need to do some research and try to prove that that won't
> be a problem on any modern platform.  Presumably it once was a problem,
> or we'd not have bothered with a configure check.
>
> (Some git archaeology might yield useful info about when and why
> we added the check.)

According to data I scraped from the build farm, the last two systems
we had that didn't have uselocale() were curculio (OpenBSD 5.9) and
wrasse (Solaris 11.3), but those were both shut down (though wrasse
still runs old branches) as they were well out of support.  OpenBSD
gained uselocale() in 6.2, and Solaris in 11.4, as part of the same
suite of POSIX changes that we already required in commit 8d9a9f03.

+1 for the change.

https://man.openbsd.org/uselocale.3
https://docs.oracle.com/cd/E88353_01/html/E37843/uselocale-3c.html
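
For anyone who hasn't used it, a minimal sketch of the POSIX.1-2008 API
in question (error handling omitted):

    #define _POSIX_C_SOURCE 200809L
    #include <locale.h>

    void
    with_c_numeric_locale(void (*fn) (void))
    {
        /* build a locale object with LC_NUMERIC forced to "C" */
        locale_t    loc = newlocale(LC_NUMERIC_MASK, "C", (locale_t) 0);
        /* switch only the calling thread, remembering the old locale */
        locale_t    old = uselocale(loc);

        fn();

        uselocale(old);
        freelocale(loc);
    }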




Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

2023-11-15 Thread Bharath Rupireddy
On Wed, Nov 15, 2023 at 3:58 PM Alvaro Herrera  wrote:
>
> Translation-wise, this doesn't work, because you're building a string.
> There's no reason to think that the words "logical" and "physical"
> should stay untranslated; the message would make no sense, or at least
> would be very ugly.
>
> You should do something like
>
> if (am_walsender)
> {
>     ereport(log_replication_commands ? LOG : DEBUG1,
>             SlotIsLogical(s)
>             ? errmsg("acquired logical replication slot \"%s\"", NameStr(s->data.name))
>             : errmsg("acquired physical replication slot \"%s\"", NameStr(s->data.name)));
> }

This seems better, so done that way.

> (Obviously, lose the "translator:" comments since they are unnecessary)

The translator message now indicates that the remaining %s denotes the
replication slot name.

PSA v17 patch.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com


v17-0001-Log-messages-for-replication-slot-acquisition-an.patch
Description: Binary data


Re: pg_upgrade and logical replication

2023-11-15 Thread vignesh C
On Mon, 13 Nov 2023 at 17:49, Amit Kapila  wrote:
>
> On Mon, Nov 13, 2023 at 5:01 PM Amit Kapila  wrote:
> >
> > On Fri, Nov 10, 2023 at 7:26 PM vignesh C  wrote:
> > >
> > > Thanks for the comments, the attached v13 version patch has the
> > > changes for the same.
> > >
> >
> > +
> > + ReplicationOriginNameForLogicalRep(subid, InvalidOid, originname,
> > sizeof(originname));
> > + originid = replorigin_by_name(originname, false);
> > + replorigin_advance(originid, sublsn, InvalidXLogRecPtr,
> > +false /* backward */ ,
> > +false /* WAL log */ );
> >
> > This seems to update the origin state only in memory. Is it sufficient
> > to use this here?
> >
>
> I think it is probably getting ensured by clean shutdown
> (shutdown_checkpoint) which happens on the new cluster after calling
> this function. We can probably try to add a comment for it. BTW, we
> also need to ensure that max_replication_slots is configured to a
> value higher than origins we are planning to create on the new
> cluster.

Added comments and also added the check for max_replication_slots.

The attached v14 patch at [1] has the changes for the same.
[1] - 
https://www.postgresql.org/message-id/CALDaNm20%3DBk_w9jDZXEqkJ3_NUAxOBswCn4jR-tmh-MqNpPZYw%40mail.gmail.com

Regards,
Vignesh




Re: pg_upgrade and logical replication

2023-11-15 Thread vignesh C
On Mon, 13 Nov 2023 at 17:02, Amit Kapila  wrote:
>
> On Fri, Nov 10, 2023 at 7:26 PM vignesh C  wrote:
> >
> > Thanks for the comments, the attached v13 version patch has the
> > changes for the same.
> >
>
> +
> + ReplicationOriginNameForLogicalRep(subid, InvalidOid, originname,
> sizeof(originname));
> + originid = replorigin_by_name(originname, false);
> + replorigin_advance(originid, sublsn, InvalidXLogRecPtr,
> +false /* backward */ ,
> +false /* WAL log */ );
>
> This seems to update the origin state only in memory. Is it sufficient
> to use this here? Anyway, I think using this requires us to first
> acquire RowExclusiveLock on pg_replication_origin something the patch
> is doing for some other system table.

Added the lock.

The attached v14 patch at [1] has the changes for the same.
[1] - 
https://www.postgresql.org/message-id/CALDaNm20%3DBk_w9jDZXEqkJ3_NUAxOBswCn4jR-tmh-MqNpPZYw%40mail.gmail.com

Regards,
Vignesh




Re: Some performance degradation in REL_16 vs REL_15

2023-11-15 Thread Andres Freund
Hi,

On 2023-11-15 11:33:44 +0300, Anton A. Melnikov wrote:
> The configure options and test scripts on my pc and server were the same:
> export CFLAGS="-O2"
> ./configure --enable-debug --with-perl --with-icu --enable-depend 
> --enable-tap-tests
> #reinstall
> #reinitdb
> #create database bench
> for ((i=0; i<100; i++)); do pgbench -U postgres -i -s8 bench> /dev/null 2>&1;
> psql -U postgres -d bench -c "checkpoint"; RES=$(pgbench -U postgres -c6 -T20 
> -j6 bench;

Even with scale 8 you're likely significantly impacted by contention. And
obviously WAL write latency. See below for why that matters.



> I can't understand why i get the opposite results on my pc and on the server. 
> It is clear that the absolute
> TPS values will be different for various configurations. This is normal. But 
> differences?
> Is it unlikely that some kind of reference configuration is needed to 
> accurately
> measure the difference in performance. Probably something wrong with my pc, 
> but now
> i can not figure out what's wrong.

One very common reason for symptoms like this are power-saving measures by the
CPU. In workloads where the CPU is not meaningfully utilized, the CPU will go
into a powersaving mode - which can cause workloads that are latency sensitive
to be badly affected.  Both because initially the cpu will just work at a
lower frequency and because it takes time to shift to a higher latency.


Here's an example:
I bound the server and psql to the same CPU core (nothing else is allowed to
use that core. And ran the following:

\o /dev/null
SELECT 1; SELECT 1; SELECT 1; SELECT pg_sleep(0.1); SELECT 1; SELECT 1; SELECT 
1;
Time: 0.181 ms
Time: 0.085 ms
Time: 0.071 ms
Time: 100.474 ms
Time: 0.153 ms
Time: 0.077 ms
Time: 0.069 ms

You can see how the first query timing was slower, the next two were faster,
and then after the pg_sleep() it's slow again.


# tell the CPU to optimize for performance not power
cpupower frequency-set --governor performance

# disable going to lower power states
cpupower idle-set -D0

# disable turbo mode for consistent performance
echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo

Now the timings are:
Time: 0.038 ms
Time: 0.028 ms
Time: 0.025 ms
Time: 1000.262 ms (00:01.000)
Time: 0.027 ms
Time: 0.024 ms
Time: 0.023 ms

Look, fast and reasonably consistent timings.

Switching back:
Time: 0.155 ms
Time: 0.123 ms
Time: 0.074 ms
Time: 1001.235 ms (00:01.001)
Time: 0.120 ms
Time: 0.077 ms
Time: 0.068 ms


The perverse thing is that this often means that *reducing* the number of
instructions executed yields *worse* behaviour under non-sustained
load, because from the CPU's point of view there is less need to increase clock
speed.


To show how much of a difference that can make, I ran pgbench with a single
client on one core, and the server on another (so the CPU is idle inbetween):
numactl --physcpubind 11 pgbench -n -M prepared -P1 -S -c 1 -T10

With power optimized configuration:
latency average = 0.035 ms
latency stddev = 0.002 ms
initial connection time = 5.255 ms
tps = 28434.334672 (without initial connection time)

With performance optimized configuration:
latency average = 0.025 ms
latency stddev = 0.001 ms
initial connection time = 3.544 ms
tps = 40079.995935 (without initial connection time)

That's a whopping 1.4x in throughput!


Now, the same thing, except that I used a custom workload where pgbench
transactions are executed in a pipelined fashion, 100 read-only transactions
in one script execution:

With power optimized configuration:
latency average = 1.055 ms
latency stddev = 0.125 ms
initial connection time = 6.915 ms
tps = 947.985286 (without initial connection time)

(this means we actually executed 94798.5286 readonly pgbench transactions/s)

With performance optimized configuration:
latency average = 1.376 ms
latency stddev = 0.083 ms
initial connection time = 3.759 ms
tps = 726.849018 (without initial connection time)

Suddenly the super-duper performance optimized settings are worse (but note
that stddev is down)! I suspect the problem is that now because we disabled
idle states, the cpu ends up clocking *lower*, due to power usage.

If I just change the relevant *cores* to the performance optimized
configuration:

cpupower -c 10,11 idle-set -D0; cpupower -c 10,11 frequency-set --governor 
performance

latency average = 0.940 ms
latency stddev = 0.061 ms
initial connection time = 3.311 ms
tps = 1063.719116 (without initial connection time)

It wins again.


Now, realistically you'd never use -D0 (i.e. disabling all downclocking, not
just lower states) - the power differential is quite big and as shown here it
can hurt performance as well.

On an idle system, looking at the cpu power usage with:
powerstat -D -R 5 1000

  TimeUser  Nice   Sys  IdleIO  Run Ctxt/s  IRQ/s Fork Exec Exit  Watts 
  pkg-0   dram  pkg-1
09:45:03   0.6   0.0   0.2  99.2   0.01   2861   2823000  46.84 
  24.82   3.68  18.33
09:45:08   1.0   0.0   0.1  99.0  

Re: pg_upgrade and logical replication

2023-11-15 Thread vignesh C
On Mon, 13 Nov 2023 at 13:52, Peter Smith  wrote:
>
> Here are some review comments for patch v13-0001
>
> ==
> src/bin/pg_dump/pg_dump.c
>
> 1. getSubscriptionTables
>
> + int i_srsublsn;
> + int i;
> + int cur_rel = 0;
> + int ntups;
>
> What is the difference between 'i' and 'cur_rel'?
>
> AFAICT these represent the same tuple index, in which case you might
> as well throw away 'cur_rel' and only keep 'i'.

Modified

> ~~~
>
> 2. getSubscriptionTables
>
> + for (i = 0; i < ntups; i++)
> + {
> + Oid cur_srsubid = atooid(PQgetvalue(res, i, i_srsubid));
> + Oid relid = atooid(PQgetvalue(res, i, i_srrelid));
> + TableInfo  *tblinfo;
>
> Since this is all new code, using C99 style for loop variable
> declaration of 'i' will be better.

Modified

> ==
> src/bin/pg_upgrade/check.c
>
> 3. check_for_subscription_state
>
> +check_for_subscription_state(ClusterInfo *cluster)
> +{
> + int dbnum;
> + FILE*script = NULL;
> + char output_path[MAXPGPATH];
> + int ntup;
> +
> + /* Subscription relations state can be migrated since PG17. */
> + if (GET_MAJOR_VERSION(old_cluster.major_version) < 1700)
> + return;
> +
> + prep_status("Checking for subscription state");
> +
> + snprintf(output_path, sizeof(output_path), "%s/%s",
> + log_opts.basedir,
> + "subscription_state.txt");
>
> I felt this filename ought to be more like
> 'subscriptions_with_bad_state.txt' because the current name looks like
> a normal logfile with nothing to indicate that it is only for the
> states of the "bad" subscriptions.

I have kept the file name intentionally short, as we noticed that
when the upgrade-of-the-publisher patch used a longer name there were
some buildfarm failures because of the longer names.

> ~~~
>
> 4.
> + for (dbnum = 0; dbnum < cluster->dbarr.ndbs; dbnum++)
> + {
>
> Since this is all new code, using C99 style for loop variable
> declaration of 'dbnum' will be better.

Modified

> ~~~
>
> 5.
> + * a) SUBREL_STATE_DATASYNC:A relation upgraded while in this state
> + * would retain a replication slot, which could not be dropped by the
> + * sync worker spawned after the upgrade because the subscription ID
> + * tracked by the publisher does not match anymore.
>
> missing whitespace
>
> /SUBREL_STATE_DATASYNC:A relation/SUBREL_STATE_DATASYNC: A relation/

Modified

Also added a couple of missing test cases. The attached v14 version
patch has the changes for the same.

Regards,
Vignesh
From 354137c80dfacc30bd0fa85c2f993f34ae5af4b9 Mon Sep 17 00:00:00 2001
From: Vignesh C 
Date: Mon, 30 Oct 2023 12:31:59 +0530
Subject: [PATCH v14] Preserve the full subscription's state during pg_upgrade

Previously, only the subscription metadata information was preserved.  Without
the list of relations and their state it's impossible to re-enable the
subscriptions without missing some records as the list of relations can only be
refreshed after enabling the subscription (and therefore starting the apply
worker).  Even if we added a way to refresh the subscription while enabling a
publication, we still wouldn't know which relations are new on the publication
side, and therefore should be fully synced, and which shouldn't.

To fix this problem, this patch teaches pg_dump to restore the content of
pg_subscription_rel from the old cluster by using
binary_upgrade_add_sub_rel_state SQL function. This is supported only
in binary upgrade mode.

The new SQL binary_upgrade_add_sub_rel_state function has the following
syntax:
SELECT binary_upgrade_add_sub_rel_state(subname text, relid oid, state char [,sublsn pg_lsn])

In the above, subname is the subscription name, relid is the relation
identifier, the state is the state of the relation, sublsn is subscription lsn
which is optional, and defaults to NULL/InvalidXLogRecPtr if not provided.
pg_dump will retrieve these values (subname, relid, state and sublsn) from the
old cluster.

The subscription's replication origin is needed to ensure that we don't
replicate anything twice.

To fix this problem, this patch teaches pg_dump to update the replication
origin along with CREATE SUBSCRIPTION by using the
binary_upgrade_replorigin_advance SQL function to restore the
underlying replication origin remote LSN. This is supported only in
binary upgrade mode.

The new SQL binary_upgrade_replorigin_advance function has the following
syntax:
SELECT binary_upgrade_replorigin_advance(subname text, sublsn pg_lsn)

In the above, subname is the subscription name and sublsn is subscription lsn.
pg_dump will retrieve these values (subname and sublsn) from the old cluster.

pg_upgrade will check that all the subscription relations are in 'i' (init),
's' (data sync) or 'r' (ready) state, and will error out if that's not the
case, logging the reason for the failure.

Author: Julien Rouhaud, Vignesh C
Reviewed-by: FIXME
Discussion: https://postgr.es/m/20230217075433.u5mjly4d5cr4hcfe@jrouhaud
---
 doc/src/sgml/ref/pgupgrade.sgml|  72 
 src/backend/utils/adt/pg_upgrade_support.c | 130 +++
 src/bin/pg_dum

Re: Potential use-after-free in partition related code

2023-11-15 Thread Alvaro Herrera
On 2023-Nov-15, Andres Freund wrote:

>   partConstraint = list_concat(partBoundConstraint,
>
> RelationGetPartitionQual(rel));
> 
> At this point partBoundConstraint may not be used anymore, because
> list_concat() might have reallocated.
> 
> But then a few lines later:
> 
>   /* we already hold a lock on the default partition */
>   defaultrel = table_open(defaultPartOid, NoLock);
>   defPartConstraint =
>   get_proposed_default_constraint(partBoundConstraint);
> 
> We use partBoundConstraint again.

Yeah, this is wrong if partBoundConstraint is reallocated by
list_concat.  One possible fix is to change list_concat to
list_concat_copy(), which leaves the original list unmodified.

AFAICT the bug came in with 6f6b99d1335b, which added default
partitions.
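
In sketch form (not a final patch), the spot in ATExecAttachPartition()
would become:

    /*
     * list_concat_copy() leaves both inputs unmodified, so
     * partBoundConstraint stays valid for the later call.
     */
    partConstraint = list_concat_copy(partBoundConstraint,
                                      RelationGetPartitionQual(rel));

    /* ... */

    /* partBoundConstraint can still be used safely here */
    defPartConstraint =
        get_proposed_default_constraint(partBoundConstraint);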

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Now I have my system running, not a byte was off the shelf;
It rarely breaks and when it does I fix the code myself.
It's stable, clean and elegant, and lightning fast as well,
And it doesn't cost a nickel, so Bill Gates can go to hell."




Re: On non-Windows, hard depend on uselocale(3)

2023-11-15 Thread Tom Lane
"Tristan Partin"  writes:
> I would like to propose removing HAVE_USELOCALE, and just have WIN32, 
> which means that Postgres would require uselocale(3) on anything that 
> isn't WIN32.

You would need to do some research and try to prove that that won't
be a problem on any modern platform.  Presumably it once was a problem,
or we'd not have bothered with a configure check.

(Some git archaeology might yield useful info about when and why
we added the check.)

regards, tom lane




Re: pg_basebackup check vs Windows file path limits

2023-11-15 Thread Andrew Dunstan



On 2023-11-15 We 06:34, Alvaro Herrera wrote:
> On 2023-Nov-13, Andrew Dunstan wrote:
>
>> size?
>>
>> https://learn.microsoft.com/en-us/windows/win32/api/ioapiset/nf-ioapiset-deviceiocontrol
>>
>> Hmm, here's what that page says - I can't see it saying what you're
>> suggesting here - am I missing something?:
>
> I don't think so.  I think I just confused myself.  Reading the docs it
> appears that other Windows APIs work as I described, but not this one.
>
> Anyway, after looking at it a bit more, I realized that this code uses
> MAX_PATH as basis for its buffer's length limit -- and apparently on
> Windows that's only 260, much shorter than MAXPGPATH (1024) which our
> own code uses to limit the buffers given to readlink().  So maybe fixing
> this is just a matter of doing s/MAX_PATH/MAXPGPATH/ in dirmod.c.

I'll test it.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: remaining sql/json patches

2023-11-15 Thread Andres Freund
On 2023-11-15 09:11:19 -0800, Andres Freund wrote:
> On 2023-11-15 22:00:41 +0900, Amit Langote wrote:
> > > This causes a nontrivial increase in the size of the parser (~5% in an
> > > optimized build here), I wonder if we can do better.
> > 
> > Hmm, sorry if I sound ignorant but what do you mean by the parser here?
> 
> gram.o, in an optimized build.

Or, hm, maybe I meant the size of the generated gram.c actually.

Either is worth looking at.




Re: remaining sql/json patches

2023-11-15 Thread Andres Freund
Hi,

Thanks, this looks like a substantial improvement. I don't quite have time to
look right now, but I thought I'd answer one question below.


On 2023-11-15 22:00:41 +0900, Amit Langote wrote:
> > This causes a nontrivial increase in the size of the parser (~5% in an
> > optimized build here), I wonder if we can do better.
> 
> Hmm, sorry if I sound ignorant but what do you mean by the parser here?

gram.o, in an optimized build.


> I can see that the byte-size of gram.o increases by 1.66% after the
> above additions  (1.72% with previous versions).

I'm not sure anymore how I measured it, but if you just looked at the total
file size, that might not show the full gain, because of debug symbols
etc. You can use the size command to look at just the code and data size.


> I've also checked
> using log_parser_stats that there isn't much slowdown in the
> raw-parsing speed.

What does "isn't much slowdown" mean in numbers?

Greetings,

Andres Freund




Potential use-after-free in partition related code

2023-11-15 Thread Andres Freund
Hi,

A while back I had proposed annotations for palloc() et al that let the
compiler know about which allocators pair with what freeing functions. One
thing that allows the compiler to do is to detect use after free.

One such complaint is:

../../../../../home/andres/src/postgresql/src/backend/commands/tablecmds.c: In 
function ‘ATExecAttachPartition’:
../../../../../home/andres/src/postgresql/src/backend/commands/tablecmds.c:18758:25:
 warning: pointer ‘partBoundConstraint’ may be used after ‘list_concat’ 
[-Wuse-after-free]
18758 | 
get_proposed_default_constraint(partBoundConstraint);
  | 
^~~~
../../../../../home/andres/src/postgresql/src/backend/commands/tablecmds.c:18711:26:
 note: call to ‘list_concat’ here
18711 | partConstraint = list_concat(partBoundConstraint,
  |  ^~~~
18712 |  
RelationGetPartitionQual(rel));
  |  
~~


And it seems quite right:

partConstraint = list_concat(partBoundConstraint,
 
RelationGetPartitionQual(rel));

At this point partBoundConstraint may not be used anymore, because
list_concat() might have reallocated.

But then a few lines later:

/* we already hold a lock on the default partition */
defaultrel = table_open(defaultPartOid, NoLock);
defPartConstraint =
get_proposed_default_constraint(partBoundConstraint);

We use partBoundConstraint again.

I unfortunately can't quickly enough identify what partConstraint,
defPartConstraint, partBoundConstraint are, so I don't don't really know what
the fix here is.

Greetings,

Andres Freund




Re: Explicitly skip TAP tests under Meson if disabled

2023-11-15 Thread Andres Freund
On 2023-11-15 11:02:19 +0100, Peter Eisentraut wrote:
> On 04.11.23 01:51, Andres Freund wrote:
> > I'd just use a single test() invocation here, and add an argument to 
> > testwrap
> > indicating that it should print out the skipped message. That way we a) 
> > don't
> > need two test() invocations, b) could still see the test name etc in the 
> > test
> > invocation.
> 
> Here is a patch that does it that way.

WFM!




Re: BUG #18097: Immutable expression not allowed in generated at

2023-11-15 Thread Aleksander Alekseev
Hi,

> True, but from the perspective of the affected code, the question is
> basically "did you call expression_planner() yet".  So I like this
> naming for that connection, whereas something based on "transformation"
> doesn't really connect to anything in existing function names.

Fair enough.

-- 
Best regards,
Aleksander Alekseev




Re: retire MemoryContextResetAndDeleteChildren backwards compatibility macro

2023-11-15 Thread Nathan Bossart
On Wed, Nov 15, 2023 at 09:27:18AM +0530, Amul Sul wrote:
> Nevermind, I usually use git apply or git am, here are those errors:
> 
> PG/ - (master) $ git apply ~/Downloads/retire_compatibility_macro_v1.patch
> error: patch failed: src/backend/access/brin/brin.c:297
> error: src/backend/access/brin/brin.c: patch does not apply

I wonder if your mail client is modifying the patch.  Do you have the same
issue if you download it from the archives?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: typo in fallback implementation for pg_atomic_test_set_flag()

2023-11-15 Thread Nathan Bossart
On Tue, Nov 14, 2023 at 07:17:32PM -0800, Andres Freund wrote:
> Are you planning to apply the fix?

Yes, I'll take care of it.

>> I'd ordinarily suggest removing this section of code since it doesn't seem
>> to have gotten much coverage
> 
> Which section precisely?

The lines below this:

/*
 * provide fallback for test_and_set using atomic_exchange if available
 */
#if !defined(PG_HAVE_ATOMIC_TEST_SET_FLAG) && 
defined(PG_HAVE_ATOMIC_EXCHANGE_U32)

but above this:

/*
 * provide fallback for test_and_set using atomic_compare_exchange if
 * available.
 */
#elif !defined(PG_HAVE_ATOMIC_TEST_SET_FLAG) && 
defined(PG_HAVE_ATOMIC_COMPARE_EXCHANGE_U32)

>> but I'm actually looking into adding some faster atomic-exchange
>> implementations that may activate this code for certain
>> compiler/architecture combinations.
> 
> Hm. I don't really see how adding a faster atomic-exchange implementation
> could trigger this implementation being used?

That'd define PG_HAVE_ATOMIC_EXCHANGE_U32, so this fallback might be used
if PG_HAVE_ATOMIC_TEST_SET_FLAG is not defined.  I haven't traced through
all the #ifdefs that lead to this point exhaustively, though, so perhaps
this is still unlikely.
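
For what it's worth, the shape of that fallback is simple enough; in
plain C11 terms (a sketch, not the pg_atomic naming):

    #include <stdatomic.h>
    #include <stdbool.h>

    /* set the flag; return true iff it was previously clear */
    static inline bool
    test_set_flag_via_exchange(atomic_uint *flag)
    {
        /* atomic_exchange returns the old value; 0 means it was clear */
        return atomic_exchange(flag, 1) == 0;
    }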

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: BUG #18097: Immutable expression not allowed in generated at

2023-11-15 Thread Tom Lane
Aleksander Alekseev  writes:
>>> Oh no! We encountered one of the most difficult problems in computer
>>> science [1].

>> Indeed :-(.  Looking at it again this morning, I'm thinking of
>> using "contain_mutable_functions_after_planning" --- what do you
>> think of that?

> It's better but creates an impression that the actual planning will be
> involved.

True, but from the perspective of the affected code, the question is
basically "did you call expression_planner() yet".  So I like this
naming for that connection, whereas something based on "transformation"
doesn't really connect to anything in existing function names.

regards, tom lane




Re: Fix documentation for pg_stat_statements JIT deform_counter

2023-11-15 Thread Julien Rouhaud
On Wed, Nov 15, 2023 at 01:53:13PM +0100, Daniel Gustafsson wrote:
> > On 11 Nov 2023, at 10:26, Julien Rouhaud  wrote:
>
> > I was adding support for the new pg_stat_statements JIT deform_counter in 
> > PoWA
> > when I realized that those were added after jit_generation_time in the
> > documentation while they're actually at the end of the view.
>
> Nice catch, that was indeed an omission in the original commit.  Thanks for 
> the
> patch, I'll apply that shortly.

Thanks!




Re: Some performance degradation in REL_16 vs REL_15

2023-11-15 Thread Tom Lane
"Anton A. Melnikov"  writes:
> I can't understand why i get the opposite results on my pc and on the server. 
> It is clear that the absolute
> TPS values will be different for various configurations. This is normal. But 
> differences?
> Is it unlikely that some kind of reference configuration is needed to 
> accurately
> measure the difference in performance. Probably something wrong with my pc, 
> but now
> i can not figure out what's wrong.

> Would be very grateful for any advice or comments to clarify this problem.

Benchmarking is hard :-(.  IME it's absolutely typical to see
variations of a couple of percent even when "nothing has changed",
for example after modifying some code that's nowhere near any
hot code path for the test case.  I usually attribute this to
cache effects, such as a couple of bits of hot code now sharing or
not sharing a cache line.  If you use two different compiler versions
then that situation is likely to occur all over the place even with
exactly the same source code.  NUMA creates huge reproducibility
problems too on multisocket machines (which your server is IIUC).
When I had a multisocket workstation I'd usually bind all the server
processes to one socket if I wanted more-or-less-repeatable numbers.

I wouldn't put a lot of faith in the idea that measured pgbench
differences of up to several percent are meaningful at all,
especially when comparing across different hardware and different
OS+compiler versions.  There are too many variables that have
little to do with the theoretical performance of the source code.

regards, tom lane




Re: Assert failure on 'list_member_ptr(rel->joininfo, restrictinfo)'

2023-11-15 Thread Alexander Korotkov
On Wed, Nov 15, 2023 at 8:02 AM Andres Freund  wrote:
> On 2023-11-14 19:14:57 +0800, Richard Guo wrote:
> > While working on BUG #18187 [1], I noticed that we also have issues with
> > how SJE replaces join clauses involving the removed rel.  As an example,
> > consider the query below, which would trigger an Assert.
> >
> > create table t (a int primary key, b int);
> >
> > explain (costs off)
> > select * from t t1
> >inner join t t2 on t1.a = t2.a
> > left join t t3 on t1.b > 1 and t1.b < 2;
> > server closed the connection unexpectedly
> >
> > The Assert failure happens in remove_self_join_rel() when we're trying
> > to remove t1.  The two join clauses of t1, 't1.b > 1' and 't1.b < 2',
> > share the same pointer of 'required_relids', which is {t1, t3} at first.
> > After we've performed replace_varno for the first clause, the
> > required_relids becomes {t2, t3}, which is no problem.  However, the
> > second clause's required_relids also becomes {t2, t3}, because they are
> > actually the same pointer.  So when we proceed with replace_varno on the
> > second clause, we'd trigger the Assert.
>
> Good catch.
>
>
> > Off the top of my head I'm thinking that we can fix this kind of issue
> > by bms_copying the bitmapset first before we make a substitution in
> > replace_relid(), like attached.
> >
> > Alternatively, we can revise distribute_qual_to_rels() as below so that
> > different RestrictInfos don't share the same pointer of required_relids.
>
> > --- a/src/backend/optimizer/plan/initsplan.c
> > +++ b/src/backend/optimizer/plan/initsplan.c
> > @@ -2385,7 +2385,7 @@ distribute_qual_to_rels(PlannerInfo *root, Node
> > *clause,
> >  * nonnullable-side rows failing the qual.
> >  */
> > Assert(ojscope);
> > -   relids = ojscope;
> > +   relids = bms_copy(ojscope);
> > Assert(!pseudoconstant);
> > }
> > else
> >
> > With this way, I'm worrying that there are other places where we should
> > avoid sharing the same pointer to Bitmapset structure.
>
> Indeed.
>
>
> > I'm not sure how to discover all these places.  Any thoughts?
>
> At the very least I think we should add a mode to bitmapset.c where
> every modification of a bitmapset reallocates, rather than just when the size
> actually changes. Because we only reallocate and free in relatively uncommon
> cases, particularly on 64bit systems, it's very easy to not find spots that
> continue to use the input pointer to one of the modifying bms functions.
>
> A very hacky implementation of that indeed catches this bug with the existing
> regression tests.
>
> The tests do *not* pass with just the attached applied, as the "Delete relid
> without substitution" path has the same issue. With that also copying and all
> the "reusing" bms* functions always reallocating, the tests pass - kinda.
>
>
> The kinda because there are callers to bms_(add|del)_members() that pass the
> same bms as a and b, which only works if the reallocation happens "late".

+1,
Neat idea. I'm willing to work on this. Will propose the patch soon.
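
As a first rough sketch (REALLOCATE_BITMAPSETS is an invented symbol,
not a proposal for the final shape), each modifying bms_* function
could move its result to fresh memory before returning:

    #ifdef REALLOCATE_BITMAPSETS
    static Bitmapset *
    bms_move(Bitmapset *a)
    {
        Bitmapset  *result;

        if (a == NULL)
            return NULL;
        result = bms_copy(a);
        bms_free(a);        /* stale pointers now trip the memory checkers */
        return result;
    }
    #endif

so that e.g. bms_add_member() would end with "a = bms_move(a);" under
that define -- keeping in mind the caveat above that callers passing
the same set as both arguments need the reallocation to happen late.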

--
Regards,
Alexander Korotkov




Re: Assert failure on 'list_member_ptr(rel->joininfo, restrictinfo)'

2023-11-15 Thread Alexander Korotkov
On Wed, Nov 15, 2023 at 8:04 AM Andres Freund  wrote:
>
> On 2023-11-14 14:42:13 +0200, Alexander Korotkov wrote:
> > It's possibly dumb option, but what about just removing the assert?
>
> That's not at all an option - the in-place bms_* functions can free their
> input. So a dangling pointer to the "old" version is a use-after-free waiting
> to happen - you just need a query that actually gets to bitmapsets that are a
> bit larger.

Yeah, now I got it, thank you.  I was under the wrong impression that
bitmapset has a level of indirection, so that the pointer remains valid.
Now, I see that bitmapset manipulation functions can do free/repalloc
making the previous bitmapset pointer invalid.

--
Regards,
Alexander Korotkov




Re: Tab completion for CREATE TABLE ... AS

2023-11-15 Thread Gilles Darold

On 15/11/2023 at 03:58, Michael Paquier wrote:
> On Thu, Nov 02, 2023 at 07:27:02PM +0300, Gilles Darold wrote:
>> Looks like the tab completion for CREATE TABLE ... AS is not
>> proposed.
>>
>> +   /* Complete CREATE TABLE <name> AS with list of keywords */
>> +   else if (TailMatches("CREATE", "TABLE", MatchAny, "AS") ||
>> +            TailMatches("CREATE", "TEMP|TEMPORARY|UNLOGGED", "TABLE", MatchAny, "AS"))
>> +   COMPLETE_WITH("SELECT", "WITH");
>
> There is a bit more than SELECT and WITH as possible queries for a CTAS.
> How about VALUES, TABLE or even EXECUTE (itself able to handle a
> SELECT, TABLE or VALUES)?
> --
> Michael

Right, I don't know how I missed the sql-createtableas page in the
documentation.

Patch v2 fixes the keyword list; I have also sorted the CREATE TABLE
completion into alphabetical order (AS was at the end of the list).


It has also been re-based on current master.

--
Gilles Darold
http://www.darold.net/
diff --git a/src/bin/psql/tab-complete.c b/src/bin/psql/tab-complete.c
index a1dfc11f6b..e6c88cf1a1 100644
--- a/src/bin/psql/tab-complete.c
+++ b/src/bin/psql/tab-complete.c
@@ -2501,11 +2501,15 @@ psql_completion(const char *text, int start, int end)
 	/* Complete CREATE TABLE <name> with '(', OF or PARTITION OF */
 	else if (TailMatches("CREATE", "TABLE", MatchAny) ||
 			 TailMatches("CREATE", "TEMP|TEMPORARY|UNLOGGED", "TABLE", MatchAny))
-		COMPLETE_WITH("(", "OF", "PARTITION OF");
+		COMPLETE_WITH("(", "AS", "OF", "PARTITION OF");
 	/* Complete CREATE TABLE <name> OF with list of composite types */
 	else if (TailMatches("CREATE", "TABLE", MatchAny, "OF") ||
 			 TailMatches("CREATE", "TEMP|TEMPORARY|UNLOGGED", "TABLE", MatchAny, "OF"))
 		COMPLETE_WITH_SCHEMA_QUERY(Query_for_list_of_composite_datatypes, NULL);
+	/* Complete CREATE TABLE <name> AS with list of keywords */
+	else if (TailMatches("CREATE", "TABLE", MatchAny, "AS") ||
+			 TailMatches("CREATE", "TEMP|TEMPORARY|UNLOGGED", "TABLE", MatchAny, "AS"))
+		COMPLETE_WITH("EXECUTE", "SELECT", "TABLE", "VALUES", "WITH");
 	/* Complete CREATE TABLE name (...) with supported options */
 	else if (TailMatches("CREATE", "TABLE", MatchAny, "(*)") ||
 			 TailMatches("CREATE", "UNLOGGED", "TABLE", MatchAny, "(*)"))


Re: Allow tests to pass in OpenSSL FIPS mode

2023-11-15 Thread Daniel Gustafsson
> On 15 Nov 2023, at 12:44, Peter Eisentraut  wrote:
> 
> On 15.11.23 00:07, Tom Lane wrote:
>> I'm more concerned about the 3DES situation.  Fedora might be a bit
>> ahead of the curve here, but according to the link above, everybody is
>> supposed to be in compliance by the end of 2023.  So I'd be inclined
>> to guess that the 3DES-is-rejected case is going to be mainstream
>> before v17 ships.
> 
> Right.  It is curious that I have not found any activity in the OpenSSL issue 
> trackers about this.  But if you send me your results file, then I can 
> include it in the patch as an alternative expected.

As NIST SP800-131A allows decryption with 3DES and DES, I don't think OpenSSL
will do much other than move it to the legacy module, where it can be used
opt-in like DES.  SKIPJACK has been disallowed for even longer but is still
tested with decryption during FIPS validation.

Using an alternative results file to handle platforms which explicitly remove
disallowed ciphers seems like the right choice.

Since the 3DES/DES deprecations aren't limited to FIPS, do we want to do
anything for pgcrypto where we have DES/3DES encryption?  Maybe a doc patch
which mentions the deprecation with a link to the SP could be in order?

--
Daniel Gustafsson





Re: [PATCH] pgbench log file headers

2023-11-15 Thread Adam Hendel
Hello


On Mon, Nov 13, 2023 at 6:01 PM Andres Freund  wrote:

> Hi,
>
> On 2023-11-13 11:55:07 -0600, Adam Hendel wrote:
> > Currently, pgbench will log individual transactions to a logfile when the
> > `--log` parameter flag is provided. The logfile, however, does not
> include
> > column header. It has become a fairly standard expectation of users to
> have
> > column headers present in flat files. Without the header in the pgbench
> log
> > files, new users must navigate to the docs and piece together the column
> > headers themselves. Most industry leading frameworks have tooling built
> in
> > to read column headers though, for example python/pandas read_csv().
>
> The disadvantage of doing that is that a single pgbench run with --log will
> generate many files when using -j, to avoid contention. The easiest way to
> deal with that after the run is to cat all the log files to a larger one.
> If
> you do that with headers present, you obviously have a few bogus rows (the
> headers from the various files).
>
> I guess you could avoid the "worst" of that by documenting that the
> combined
> log file should be built by
>   cat {$log_prefix}.${pid} {$log_prefix}.${pid}.*
> and only outputting the header in the file generated by the main thread.
>
>
> We can improve the experience for users by adding column headers to
> pgbench
> > logfiles with an optional command line flag, `--log-header`. This will
> keep
> > the API backwards compatible by making users opt-in to the column
> headers.
> > It follows the existing pattern of having conditional flags in pgbench’s
> > API; the `--log` option would have both –log-prefix and –log-header if
> this
> > work is accepted.
>
> > The implementation considers the column headers only when the
> > `--log-header` flag is present. The values for the columns are taken
> > directly from the “Per-Transaction Logging” section in
> > https://www.postgresql.org/docs/current/pgbench.html and takes into
> account
> > the conditional columns `schedule_lag` and `retries`.
> >
> >
> > Below is an example of what that logfile will look like:
> >
> >
> > pgbench  postgres://postgres:postgres@localhost:5432/postgres --log
> > --log-header
> >
> > client_id transaction_no time script_no time_epoch time_us
>
> Independent of your patch, but we imo ought to combine time_epoch time_us
> in
> the log output. There's no point in forcing consumers to combine those
> fields,
> and it makes logging more expensive...  And if we touch this, we should
> just
> switch to outputting nanoseconds instead of microseconds.
>
> It also is quite odd that we have "time" and "time_epoch", "time_us", where
> time is the elapsed time of an individual "transaction" and time_epoch +
> time_us together are a wall-clock timestamp.  Without differentiating
> between
> those, the column headers seem not very useful, because one needs to look
> in
> the documentation to understand the fields.


> I don't think there's all that strong a need for backward compatibility in
> pgbench. We could just change the columns as I suggest above and then
> always
> emit the header in the "main" log file.
>
>
Do you think this should be done in separate patches?

First patch: log the column headers in the "main" log file. No --log-header
flag, make it the default behavior of --log.

Next patch: collapse "time_epoch" and "time_us" in the log output and give
the "time" column a name that is more clear.
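
For the first of those, the change itself is presumably tiny -- a sketch
(field names from memory, so treat them as approximate):

    /*
     * Emit the column header once, in the main thread's log file, so
     * that concatenating the per-thread files yields a single header.
     */
    if (use_log && thread->tid == 0)
        fputs("client_id transaction_no time script_no time_epoch time_us\n",
              thread->logfile);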

Adam


> Greetings,
>
> Andres Freund
>


Re: trying again to get incremental backup

2023-11-15 Thread Jakub Wartak
Hi Robert,

[..spotted the v9 patchset..]

so I've spent some time playing still with patchset v8 (without the
6/6 testing patch related to wal_level=minimal), with the exception of
- patchset v9 - marked otherwise.

1. At compile time there were 2 warnings about shadowed variables (at
least with gcc version 10.2.1), but in v9 that is fixed:

blkreftable.c: In function ‘WriteBlockRefTable’:
blkreftable.c:520:24: warning: declaration of ‘brtentry’ shadows a
previous local [-Wshadow=compatible-local]
walsummarizer.c: In function ‘SummarizeWAL’:
walsummarizer.c:833:36: warning: declaration of ‘private_data’ shadows
a previous local [-Wshadow=compatible-local]

2. Usability thing: I hit the timeout hard: "This backup requires WAL
to be summarized up to 0/9D8, but summarizer has only reached
0/0." with summarize_wal=off (default), but apparently this is in the TODO.
Looks like an important usability thing.

3. I've verified that if the DB was in wal_level=minimal even
temporarily (and thus summarization was disabled) it is impossible to
take an incremental backup:

pg_basebackup: error: could not initiate base backup: ERROR:  WAL
summaries are required on timeline 1 from 0/7D8 to 0/1028, but
the summaries for that timeline and LSN range are incomplete
DETAIL:  The first unsummarized LSN is this range is 0/D04AE88.

4. As we have discussed off list, there is (was) this
pg_combinebackup bug in v8's reconstruct_from_incremental_file() where
it was unable to realize that - in case of combining multiple
incremental backups - it should stop looking for the previous instance
of the full file (while it was fine with v6 of the patchset). I've
checked it on v9 - it is good now.

5. On v8 I've finally played a little bit with standby(s) and this
patchset with a couple of basic scenarios while mixing sources of the
backups:

a. full on standby, incr1 on standby, full db restore (incl. incr1) on standby
# sometimes I'm getting spurious errors like these when doing
incrementals on standby with -c fast:
2023-11-15 13:49:05.721 CET [10573] LOG:  recovery restart point
at 0/A28
2023-11-15 13:49:07.591 CET [10597] WARNING:  aborting backup due
to backend exiting before pg_backup_stop was called
2023-11-15 13:49:07.591 CET [10597] ERROR:  manifest requires WAL
from final timeline 1 ending at 0/AF8, but this backup starts at
0/A28
2023-11-15 13:49:07.591 CET [10597] STATEMENT:  BASE_BACKUP (
INCREMENTAL,  LABEL 'pg_basebackup base backup',  PROGRESS,
CHECKPOINT 'fast',  WAIT 0,  MANIFEST 'yes',  TARGET 'client')
# when you retry the same pg_basebackup it goes fine (looks like
CHECKPOINT on standby/restartpoint <-> summarizer disconnect, I'll dig
deeper tomorrow. It seems that issuing "CHECKPOINT; pg_sleep(1);"
against the primary just before pg_basebackup --incr on the standby
works around it)

b. full on primary, incr1 on standby, full db restore (incl. incr1) on
standby # WORKS
c. full on standby, incr1 on standby, full db restore (incl. incr1) on
primary # WORKS*
d. full on primary, incr1 on standby, full db restore (incl. incr1) on
primary # WORKS*

* - needs pg_promote() due to the controlfile having standby bit +
potential fiddling with postgresql.auto.conf as it is having
primary_connstring GUC.

6. Sci-fi-mode-on: I was wondering about the dangers of e.g. having a
more recent pg_basebackup (e.g. from pg18 one day) running against
pg17 in the scope of this incremental backup possibility. Is it
going to be safe (currently there seem to be no safeguards against
such use), or should those things (core, pg_basebackup) be running
in version lock-step?

Regards,
-J.




Re: Fix documentation for pg_stat_statements JIT deform_counter

2023-11-15 Thread Daniel Gustafsson
> On 11 Nov 2023, at 10:26, Julien Rouhaud  wrote:

> I was adding support for the new pg_stat_statements JIT deform_counter in PoWA
> when I realized that those were added after jit_generation_time in the
> documentation while they're actually at the end of the view.

Nice catch, that was indeed an omission in the original commit.  Thanks for the
patch, I'll apply that shortly.

--
Daniel Gustafsson





Re: ALTER COLUMN ... SET EXPRESSION to alter stored generated column's expression

2023-11-15 Thread Amul Sul
On Wed, Nov 15, 2023 at 5:09 PM Peter Eisentraut 
wrote:

> On 14.11.23 11:40, Amul Sul wrote:
> > Please have a look at the attached version, updating the syntax to have
> "AS"
> > after EXPRESSION and other changes suggested previously.
>
> The code structure looks good to me now.
>

Thank you for your review.


>
> Question: Why are you using AT_PASS_ADD_OTHERCONSTR?  I don't know if
> it's right or wrong, but if you have a specific reason, it would be good
> to know.
>

I referred to ALTER COLUMN DEFAULT and used that.



> I think ATExecSetExpression() needs to lock pg_attribute?  Did you lose
> that during the refactoring?
>

I have removed that intentionally, since unlike ALTER ... DROP EXPRESSION
we are not updating anything in pg_attribute here.


>
> Tiny comment: The error message in ATExecSetExpression() does not need
> to mention "stored", since it would be also applicable to virtual
> generated columns in the future.
>

I had the same thought, but later decided that when we do the virtual
column work, we could simply change it then. I am fine with making that
change now as well; let me know your thoughts.


> Documentation additions in alter_table.sgml should use one-space indent
> consistently.  Also, "This form replaces expression" is missing a "the"?
>

Ok, will fix that.

Regards,
Amul


Re: RFC: Pluggable TOAST

2023-11-15 Thread Matthias van de Meent
On Tue, 14 Nov 2023, 14:12 Nikita Malakhov,  wrote:
>
> Hi!
>
> Matthias, regarding your message above, I have a question to ask.
> On typed TOAST implementations - we thought that the TOAST method used
> for storing data could depend not only on the data type but on the flow
> or workload, like our bytea appendable toaster, which is much (hundreds
> of times) faster on updates compared to the regular procedure. That was
> one of the ideas behind Pluggable TOAST - we can choose the most
> suitable TOAST implementation available.
>
> If we have a single TOAST entry point for data type - then we should have
> some means to control it or choose a TOAST method suitable to our needs.
> Or should not?

I'm not sure my interpretation of the question is correct, but I'll
assume it's "would you want something like STORAGE
[plain/external/...] for controlling type-specific toast operations?".

I don't see many reasons why we'd need a system to disable (some of)
those features, the only one being "the workload is mostly read-only
access to the full attributes, so any performance overhead of
type-aware detoasting is not worth the temporary space savings during
updates". So, while I do think there would be good reasons for typed
toasting to be disabled as a whole, I don't see a good reason for only
specific parts of type-specific toasting to be disabled (no reason for
'disable the append optimization for bytea, but not the splice
optimization').

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)




Re: Allow tests to pass in OpenSSL FIPS mode

2023-11-15 Thread Peter Eisentraut

On 15.11.23 00:07, Tom Lane wrote:

I'm more concerned about the 3DES situation.  Fedora might be a bit
ahead of the curve here, but according to the link above, everybody is
supposed to be in compliance by the end of 2023.  So I'd be inclined
to guess that the 3DES-is-rejected case is going to be mainstream
before v17 ships.


Right.  It is curious that I have not found any activity in the OpenSSL 
issue trackers about this.  But if you send me your results file, then I 
can include it in the patch as an alternative expected file.
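
For what it's worth, whether the active OpenSSL policy rejects 3DES can
be probed directly.  A minimal sketch, assuming OpenSSL 3's EVP fetch
API (illustrative only, not part of the patch):

#include <stdio.h>
#include <openssl/evp.h>

int
main(void)
{
	/* DES-EDE3-CBC is the 3DES variant at issue here */
	EVP_CIPHER *cipher = EVP_CIPHER_fetch(NULL, "DES-EDE3-CBC", NULL);

	printf("3DES is %s\n", cipher ? "available" : "rejected by policy");
	if (cipher)
		EVP_CIPHER_free(cipher);
	return 0;
}

Under a FIPS provider configuration that has dropped 3DES, the fetch
fails, and the alternative expected file discussed above would apply.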






Re: BUG #18097: Immutable expression not allowed in generated at

2023-11-15 Thread Aleksander Alekseev
Hi,

> > Oh no! We encountered one of the most difficult problems in computer
> > science [1].
>
> Indeed :-(.  Looking at it again this morning, I'm thinking of
> using "contain_mutable_functions_after_planning" --- what do you
> think of that?

It's better, but it creates the impression that actual planning will be
involved. According to the comments for expression_planner():

```
 * Currently, we disallow sublinks in standalone expressions, so there's no
 * real "planning" involved here.  (That might not always be true though.)
```

I'm not very familiar with the part of the code responsible for
planning, but I find this inconsistency confusing.

Since code is written for people to read, and is read more often than
it is written, I personally believe that longer and more descriptive
names are better. Something like
contain_mutable_functions_after_planner_transformations(). That being
said, in practice one should read the comments to learn about corner
cases and pre- and postconditions anyway, so maybe it's not that big a
deal. I think of contain_mutable_functions_after_transformations() as
a good compromise between length and descriptiveness.
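
For clarity, my mental model of the helper under discussion is a thin
composition, something like the sketch below (this assumes the obvious
implementation; the actual patch may differ):

#include "postgres.h"
#include "optimizer/optimizer.h"

/*
 * Sketch: test for mutable functions only after the planner's
 * expression preprocessing (constant folding, function inlining)
 * has been applied.
 */
static bool
contain_mutable_functions_after_planning(Expr *expr)
{
	/* Apply the same preprocessing the executor would see */
	expr = expression_planner(expr);

	return contain_mutable_functions((Node *) expr);
}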

-- 
Best regards,
Aleksander Alekseev




Re: ALTER COLUMN ... SET EXPRESSION to alter stored generated column's expression

2023-11-15 Thread Peter Eisentraut

On 14.11.23 11:40, Amul Sul wrote:

Please have a look at the attached version, updating the syntax to have "AS"
after EXPRESSION and other changes suggested previously.


The code structure looks good to me now.

Question: Why are you using AT_PASS_ADD_OTHERCONSTR?  I don't know if 
it's right or wrong, but if you have a specific reason, it would be good 
to know.


I think ATExecSetExpression() needs to lock pg_attribute?  Did you lose 
that during the refactoring?


Tiny comment: The error message in ATExecSetExpression() does not need 
to mention "stored", since it would be also applicable to virtual 
generated columns in the future.


Documentation additions in alter_table.sgml should use one-space indent 
consistently.  Also, "This form replaces expression" is missing a "the"?






Re: MERGE ... RETURNING

2023-11-15 Thread Dean Rasheed
On Mon, 13 Nov 2023 at 05:29, jian he  wrote:
>
> v13 works fine. all tests passed. The code is very intuitive. played
> with multi WHEN clauses, even with before/after row triggers, work as
> expected.
>

Thanks for the review and testing!

> I don't know when replace_outer_merging will be invoked. even set a
> breakpoint on it. coverage shows replace_outer_merging only called
> once.
>

It's used when MERGING() is used in a subquery in the RETURNING list.
The MergingFunc node in the subquery is replaced by a Param node,
referring to the outer MERGE query, so that the result from MERGING()
is available in the SELECT subquery (under any other circumstances,
you're not allowed to use MERGING() in a SELECT). This is similar to
what happens when a subquery contains an aggregate over columns from
an outer query only -- for example, see:

https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-AGGREGATES:~:text=When%20an%20aggregate,aggregate%20belongs%20to.

https://github.com/postgres/postgres/commit/e649796f128bd8702ba5744d36f4e8cb81f0b754

A MERGING() expression in a subquery in the RETURNING list is
analogous, in that it belongs to the outer MERGE query, not the SELECT
subquery.

> sql-merge.html miss mentioned RETURNING need select columns privilege?
> in sql-insert.html, we have:
> "Use of the RETURNING clause requires SELECT privilege on all columns
> mentioned in RETURNING. If you use the query clause to insert rows
> from a query, you of course need to have SELECT privilege on any table
> or column used in the query."
>

Ah, good point. I don't think I looked at the privileges paragraph on
the MERGE page. Currently it says:

   You will require the SELECT privilege on the data_source
   and any column(s) of the target_table_name referred to in a
   condition.

Being pedantic, there are 2 problems with that:

1. It might be taken to imply that you need the SELECT privilege on
every column of the data_source, which isn't the case.

2. It mentions conditions, but not expressions (such as those that can
appear in INSERT and UPDATE actions).

A more accurate statement would be:

   You will require the SELECT privilege on any column(s)
   of the data_source and target_table_name referred to in any
   condition or expression.

which is also consistent with the wording used on the UPDATE manual page.

Done that way, I don't think it would need to be updated to mention
RETURNING, because RETURNING just returns a list of expressions.
Again, that would be consistent with the UPDATE page, which doesn't
mention RETURNING in its discussion of privileges.

> I saw the change in src/sgml/glossary.sgml, so I looked around. In the
> "Materialized view (relation)" part: "It cannot be modified via
> INSERT, UPDATE, or DELETE operations." Do we need to put "MERGE" into
> that sentence?
> Also, there are entries for SELECT, INSERT, UPDATE, and DELETE; do we
> need to add a MERGE entry in glossary.sgml?

Yes, that makes sense.

Attached is a separate patch with those doc updates, intended to be
applied and back-patched independently of the main RETURNING patch.

Regards,
Dean


merge-docs.patch.no-cfbot
Description: Binary data


Re: pg_basebackup check vs Windows file path limits

2023-11-15 Thread Alvaro Herrera
On 2023-Nov-13, Andrew Dunstan wrote:

> > size?
> > 
> > https://learn.microsoft.com/en-us/windows/win32/api/ioapiset/nf-ioapiset-deviceiocontrol
> 
> Hmm, here's what that page says - I can't see it saying what you're
> suggesting here - am I missing something?:

I don't think so.  I think I just confused myself.  Reading the docs it
appears that other Windows APIs work as I described, but not this one.

Anyway, after looking at it a bit more, I realized that this code uses
MAX_PATH as basis for its buffer's length limit -- and apparently on
Windows that's only 260, much shorter than MAXPGPATH (1024) which our
own code uses to limit the buffers given to readlink().  So maybe fixing
this is just a matter of doing s/MAX_PATH/MAXPGPATH/ in dirmod.c.
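
For reference, the non-Windows convention this would match looks roughly
like the following (a hypothetical sketch, not the actual dirmod.c code;
read_link_target is an invented name):

#include <unistd.h>

#define MAXPGPATH 1024			/* the tree-wide path buffer limit */

static int
read_link_target(const char *path, char *buf /* MAXPGPATH bytes */ )
{
	/* was effectively limited to MAX_PATH (260) on Windows */
	ssize_t		len = readlink(path, buf, MAXPGPATH - 1);

	if (len < 0)
		return -1;
	buf[len] = '\0';			/* readlink() does not NUL-terminate */
	return 0;
}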

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/




Re: Adding facility for injection points (or probe points?) for more advanced tests

2023-11-15 Thread Alvaro Herrera
On 2023-Nov-15, Michael Paquier wrote:

> On Tue, Nov 14, 2023 at 02:11:50PM +0100, Alvaro Herrera wrote:

> > You named the hash table InjectionPointHashByName, which seems weird.
> > Is there any *other* way to locate an injection point that is not by
> > name?
> 
> I am not sure what you mean here.  Names are kind of the most
> portable and simplest thing I could think of.

Oh, I think you're overthinking what my comment was.  I was saying, just
name it "InjectionPointsHash".  Since there appears to be no room for
another hash table for injection points, then there's no need to specify
that this one is the ByName hash.  I couldn't think of any other way to
organize the injection points either.

> > In this patch, injection points are instance-wide (because the hash
> > table is in shmem).
> 
> Yes, still not something that's required in the core APIs or an
> initial batch.

I agree that we can do the easy thing first and build it up later.  I
just hope we don't get too wedded on the current interface because of
lack of time in the current release that we get stuck with it.

> I am not sure that it is a good idea to enforce a specific conditional
> logic in the backend core code.

Agreed, let's get more experience on what other types of tests people
want to build, and how are things going to interact with each other.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"No necesitamos banderas
 No reconocemos fronteras"  (Jorge González)




Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

2023-11-15 Thread Alvaro Herrera
Translation-wise, this doesn't work, because you're building a string.
There's no reason to think that the words "logical" and "physical"
should stay untranslated; the message would make no sense, or at least
would be very ugly.

You should do something like

if (am_walsender)
{
	ereport(log_replication_commands ? LOG : DEBUG1,
			SlotIsLogical(s)
			? errmsg("acquired logical replication slot \"%s\"",
					 NameStr(s->data.name))
			: errmsg("acquired physical replication slot \"%s\"",
					 NameStr(s->data.name)));
}

(Obviously, lose the "translator:" comments since they are unnecessary)


If you really want to keep the "logical"/"physical" word untranslated,
you need to split it out of the sentence somehow.  But it would be
really horrible IMO.  Like

errmsg("acquired replication slot \"%s\" of type \"%s\"",
   NameStr(s->data.name), SlotIsLogical(s) ? "logical" : "physical")

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Use it up, wear it out, make it do, or do without"




On non-Windows, hard depend on uselocale(3)

2023-11-15 Thread Tristan Partin
I have been working on using thread-safe locale APIs within Postgres 
where appropriate[0].  The patch that I originally submitted crashed 
during initdb (whoops!), so I worked on fixing the crash, which led me 
to touch some code in chklocale.c.  That became a frustrating 
experience, because chklocale.c is compiled in 3 different 
configurations.



pgport_variants = {
  '_srv': internal_lib_args + {
'dependencies': [backend_port_code],
  },
  '': default_lib_args + {
'dependencies': [frontend_port_code],
  },
  '_shlib': default_lib_args + {
'pic': true,
'dependencies': [frontend_port_code],
  },
}


This means that some APIs I added or changed in pg_locale.c can't be 
used without conditional compilation depending on which variant is being 
compiled. Additionally, I also have conditional compilation based on 
HAVE_USELOCALE and WIN32.


I would like to propose removing HAVE_USELOCALE and just having WIN32, 
which means that Postgres would require uselocale(3) on anything that 
isn't WIN32.
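
To illustrate what that buys, code which must evaluate something in the
C locale would then need only two branches instead of three.  A minimal
sketch under that assumption (parse_double_c_locale and the locale
handling here are illustrative, not from the patch):

#include <locale.h>
#include <stdlib.h>

static double
parse_double_c_locale(const char *str)
{
#ifndef WIN32
	/* non-Windows: uselocale(3) is assumed to always exist */
	locale_t	c_locale = newlocale(LC_ALL_MASK, "C", (locale_t) 0);
	locale_t	save = uselocale(c_locale);
	double		val = strtod(str, NULL);

	uselocale(save);
	freelocale(c_locale);
	return val;
#else
	/* Windows: per-call locale via the _l function variants */
	_locale_t	c_locale = _create_locale(LC_ALL, "C");
	double		val = _strtod_l(str, NULL, c_locale);

	_free_locale(c_locale);
	return val;
#endif
}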


[0]: https://www.postgresql.org/message-id/cwmw5ozbwj10.1yflqwsue5...@neon.tech

--
Tristan Partin
Neon (https://neon.tech)




Re: Remove MSVC scripts from the tree

2023-11-15 Thread Peter Eisentraut

On 15.11.23 05:49, Michael Paquier wrote:

Attached is a v4.


I'm happy with that.

(Note, however, that your rebase didn't pick up commits e7814b40d0 and 
b41b1a7f49 that I did yesterday.  Please check that again.)






Re: Fix some memory leaks in ecpg.addons

2023-11-15 Thread Tristan Partin

On Wed Nov 8, 2023 at 11:52 AM CST, Tom Lane wrote:

"Tristan Partin"  writes:
> On Wed Nov 8, 2023 at 11:18 AM CST, Michael Meskes wrote:
>> Agreed, it's not exactly uncommon for tools like ecpg to not worry
>> about memory. After all it gets freed when the program ends.

> In the default configuration of AddressSanitizer, I can't even complete 
> a full build of Postgres.


Why is the meson stuff building ecpg test cases as part of the core build?
That seems wrong for a number of reasons, not only that we don't hold
that code to the same standards as the core server.


After looking into this a tiny bit more, I see that we are building the 
dependencies of the ecpg tests.



foreach pgc_file : pgc_files
  exe_input = custom_target('@0@.c'.format(pgc_file),
input: '@0@.pgc'.format(pgc_file),
output: '@BASENAME@.c',
command: ecpg_preproc_test_command_start +
  ['-C', 'ORACLE',] +
  ecpg_preproc_test_command_end,
install: false,
build_by_default: false,
kwargs: exe_preproc_kw,
  )

  ecpg_test_dependencies += executable(pgc_file,
exe_input,
kwargs: ecpg_test_exec_kw,
  )
endforeach


This is the pattern that we have in all the ecpg/test/*/meson.build 
files. That ecpg_test_dependencies variable is then used in the actual 
ecpg tests:



tests += {
  'name': 'ecpg',
  'sd': meson.current_source_dir(),
  'bd': meson.current_build_dir(),
  'ecpg': {
'expecteddir': meson.current_source_dir(),
'inputdir': meson.current_build_dir(),
'schedule': ecpg_test_files,
'sql': [
  'sql/twophase',
],
'test_kwargs': {
  'depends': ecpg_test_dependencies,
},
'dbname': 'ecpg1_regression,ecpg2_regression',
'regress_args': ecpg_regress_args,
  },
}


So in my opinion there is nothing wrong here. The build is working as 
intended. Does this make sense to you, Tom?


--
Tristan Partin
Neon (https://neon.tech)




Re: ResourceOwner refactoring

2023-11-15 Thread Heikki Linnakangas

On 13/11/2023 01:08, Thomas Munro wrote:

On Mon, Nov 13, 2023 at 11:16 AM Heikki Linnakangas  wrote:

On 11/11/2023 14:00, Alexander Lakhin wrote:

10.11.2023 17:26, Heikki Linnakangas wrote:

I think that is surprising behavior from the DSA facility. When you make 
allocations with dsa_allocate() or just call
dsa_get_address() on an existing dsa_pointer, you wouldn't expect the current 
resource owner to matter. I think
dsa_create/attach() should store the current resource owner in the dsa_area, 
for use in subsequent operations on the
DSA, per attached patch (0002-Fix-dsa.c-with-different-resource-owners.patch).


Yeah, I agree that it is surprising and dangerous that the DSM
segments can have different owners if you're not careful, and it's not
clear that you have to be.  Interesting that no one ever reported
breakage in parallel query due to this thinko, presumably because the
current resource owner always turns out to be either the same one or
something with longer life.


Yep. I ran check-world with an extra assertion that 
"CurrentResourceOwner == area->resowner || area->resowner == NULL" to 
verify that. No failures.



With the patch 0002 applied, I'm observing another anomaly:


Ok, so the ownership of a dsa was still muddled :-(. Attached is a new
version that goes a little further. It replaces the whole
'mapping_pinned' flag in dsa_area with the 'resowner'. When a mapping is
pinned, it means that 'resowner == NULL'. This is now similar to how the
resowner field and pinning works in dsm.c.


This patch makes sense to me.

It might be tempting to delete the dsa_pin_mapping() interface
completely and say that if CurrentResourceOwner == NULL, then it's
effectively (what we used to call) pinned, but I think it's useful for
exception-safe construction of multiple objects to be able to start
out with a resource owner and then later 'commit' by clearing it.  As
seen in GetSessionDsmHandle().


Yeah that's useful. I don't like the "pinned mapping" term here. It's 
confusing because we also have dsa_pin(), which means something 
different: the area will continue to exist even after all backends have 
detached from it. I think we should rename 'dsa_pin_mapping()' to 
'dsa_forget_resowner()' or something like that.


I pushed these fixes, without that renaming. Let's do that as a separate 
patch if we like that.


I didn't backpatch this for now, because we don't seem to have a live 
bug in back branches. You could argue for backpatching because this 
could bite us in the future if we backpatch something else that uses 
dsa's, or if there are extensions using it. We can do that later after 
we've had this in 'master' for a while.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Allow tests to pass in OpenSSL FIPS mode

2023-11-15 Thread Daniel Gustafsson
> On 15 Nov 2023, at 00:07, Tom Lane  wrote:

> (In reality, people running FIPS mode are probably pretty
> accustomed to seeing this error, so maybe it's not worth the
> trouble to improve it.)

In my experience this holds a lot of truth, this is a common error pattern and
while all improvements to error messages are good, it's not a reason to hold
off this patch.

--
Daniel Gustafsson





Re: Explicitly skip TAP tests under Meson if disabled

2023-11-15 Thread Peter Eisentraut

On 04.11.23 01:51, Andres Freund wrote:

I'd just use a single test() invocation here, and add an argument to testwrap
indicating that it should print out the skipped message. That way we a) don't
need two test() invocations, b) could still see the test name etc in the test
invocation.


Here is a patch that does it that way.
From d58e65a71d2fab40ab22f047999efe06d96d8688 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 15 Nov 2023 11:00:49 +0100
Subject: [PATCH v2] Explicitly skip TAP tests under Meson if disabled

If the tap_tests option is disabled under Meson, the TAP tests are
currently not registered at all.  But this makes it harder to see what
is going on, why suddenly there are fewer tests than before.

Instead, run testwrap with an option that marks the test as skipped.
That way, the total list and count of tests is constant whether the
option is enabled or not.
---
 meson.build| 5 +++--
 src/tools/testwrap | 5 +
 2 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/meson.build b/meson.build
index 47c8fcdc53..286d7e4269 100644
--- a/meson.build
+++ b/meson.build
@@ -3252,8 +3252,9 @@ foreach test_dir : tests
 
   testport += 1
 elif kind == 'tap'
+  testwrap_tap = testwrap_base
   if not tap_tests_enabled
-continue
+testwrap_tap += ['--skip', 'TAP tests not enabled']
   endif
 
   test_command = [
@@ -3293,7 +3294,7 @@ foreach test_dir : tests
 test(test_dir['name'] / onetap_p,
   python,
   kwargs: test_kwargs,
-  args: testwrap_base + [
+  args: testwrap_tap + [
 '--testgroup', test_dir['name'],
 '--testname', onetap_p,
 '--', test_command,
diff --git a/src/tools/testwrap b/src/tools/testwrap
index 7a64fe76a2..d01e61051c 100755
--- a/src/tools/testwrap
+++ b/src/tools/testwrap
@@ -12,6 +12,7 @@ parser.add_argument('--srcdir', help='source directory of 
test', type=str)
 parser.add_argument('--basedir', help='base directory of test', type=str)
 parser.add_argument('--testgroup', help='test group', type=str)
 parser.add_argument('--testname', help='test name', type=str)
+parser.add_argument('--skip', help='skip test (with reason)', type=str)
 parser.add_argument('test_command', nargs='*')
 
 args = parser.parse_args()
@@ -23,6 +24,10 @@ print('# executing test in {} group {} test {}'.format(
 testdir, args.testgroup, args.testname))
 sys.stdout.flush()
 
+if args.skip is not None:
+print('1..0 # Skipped: ' + args.skip)
+sys.exit(0)
+
 if os.path.exists(testdir) and os.path.isdir(testdir):
 shutil.rmtree(testdir)
 os.makedirs(testdir)
-- 
2.42.1



Re: add log messages when replication slots become active and inactive (was Re: Is it worth adding ReplicationSlot active_pid to ReplicationSlotPersistentData?)

2023-11-15 Thread Amit Kapila
On Wed, Nov 15, 2023 at 11:00 AM Bharath Rupireddy
 wrote:
>
> PSA v15 patch.
>

The patch looks good to me. I have slightly modified the translator
message and commit message in the attached. I'll push this tomorrow
unless there are any comments.

-- 
With Regards,
Amit Kapila.


v16-0001-Log-messages-for-replication-slot-acquisition-an.patch
Description: Binary data


Re: Some performance degradation in REL_16 vs REL_15

2023-11-15 Thread Anton A. Melnikov

On 30.10.2023 22:51, Andres Freund wrote:


There's really no point in comparing performance with assertions enabled
(leaving aside assertions that cause extreme performance differences, making
development harder). We very well might have added assertions making things
more expensive, without affecting performance in optimized/non-assert builds.



Thanks for the advice! I repeated the measurements on my PC without asserts
and with CFLAGS="-O2".
Also I reduced the number of clients to -c6, to leave a reserve of two cores
on my 8-core CPU, and used -j6 accordingly.

The results were similar: on my PC REL_10_STABLE (c18c12c9) was faster than
REL_16_STABLE (07494a0d), but the effect became weaker:
REL_10_STABLE gives ~965+-15 TPS (+-2%) while REL_16_STABLE gives ~920+-30
TPS (+-3%) in the test: pgbench -s8 -c6 -T20 -j6
So 10 is faster than 16 by ~5%. (see raw-my-pc.txt attached for the raw data)

Then, thanks to my colleagues, I carried out similar measurements on a more
powerful 24-core standalone server.
There REL_10_STABLE gives 8260+-100 TPS (+-1%) while REL_16_STABLE gives
8580+-90 TPS (+-1%) in the same test: pgbench -s8 -c6 -T20 -j6
The test gave the opposite result!
On that server 16 is faster than 10 by ~4%.

When I scaled the test on the server to keep the same reserve of two cores,
the results became:
REL_10_STABLE gives ~16000+-300 TPS (+-2%) while REL_16_STABLE gives
~18500+-200 TPS (+-1%) in the scaled test: pgbench -s24 -c22 -T20 -j22
Here the difference is even more noticeable: 16 is faster than 10 by ~15%.
(raw-server.txt)
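
For reference, the +- figures above are a plain mean/standard-deviation
summary over the per-run TPS columns (my assumption about how the ranges
were derived); a small sketch:

#include <math.h>
#include <stdio.h>

/* mean, sample standard deviation, and relative spread of TPS runs */
static void
summarize(const double *tps, int n)
{
	double		sum = 0.0, sqdiff = 0.0, mean, sd;

	for (int i = 0; i < n; i++)
		sum += tps[i];
	mean = sum / n;

	for (int i = 0; i < n; i++)
		sqdiff += (tps[i] - mean) * (tps[i] - mean);
	sd = sqrt(sqdiff / (n - 1));

	printf("%.0f +- %.0f TPS (+-%.1f%%)\n", mean, sd, 100.0 * sd / mean);
}

int
main(void)
{
	/* the first few REL_10_STABLE runs from raw-my-pc.txt */
	double		tps[] = {940.47, 954.62, 902.32, 965.39, 959.45};

	summarize(tps, sizeof(tps) / sizeof(tps[0]));
	return 0;
}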

The configure options and test scripts on my pc and server were the same:
export CFLAGS="-O2"
./configure --enable-debug --with-perl --with-icu --enable-depend 
--enable-tap-tests
#reinstall
#reinitdb
#create database bench
for ((i=0; i<100; i++)); do pgbench -U postgres -i -s8 bench > /dev/null 2>&1;
psql -U postgres -d bench -c "checkpoint";
RES=$(pgbench -U postgres -c6 -T20 -j6 bench); done

Configurations:
my pc:  8-core AMD Ryzen 7 4700U @ 1.4GHz, 64GB RAM, NVMe M.2 SSD drive.
Linux 5.15.0-88-generic #98~20.04.1-Ubuntu SMP Mon Oct 9 16:43:45 UTC 2023 
x86_64 x86_64 x86_64 GNU/Linux
server: 2x 12-hyperthreaded cores Intel(R) Xeon(R) CPU X5675 @ 3.07GHz, 24GB 
RAM, RAID from SSD drives.
Linux 5.10.0-21-amd64 #1 SMP Debian 5.10.162-1 (2023-01-21) x86_64 GNU/Linux

I can't understand why I get opposite results on my PC and on the server.
It is clear that the absolute TPS values will be different for various
configurations; that is normal. But opposite differences?
It seems that some kind of reference configuration is needed to accurately
measure the difference in performance. Probably something is wrong with my
PC, but so far I cannot figure out what.

Would be very grateful for any advice or comments to clarify this problem.

With the best wishes!

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

REL_10_STABLE c18c12c983a84d55e58b176969782c7ffac3272b
pgbench -s8 -c6 -T20 -j6
940.47198
954.621585
902.319686
965.387517
959.44536
970.882218
922.012141
969.642272
964.549628
935.639076
958.835093
976.912892
975.618375
960.599515
981.900039
973.34447
964.563699
960.321335
962.643262
975.631214
971.78315
965.226256
961.106572
968.520002
973.825485
978.49579
963.863368
973.906058
966.676175
965.186708
954.572371
977.620229
981.419347
962.751969
963.676599
967.966257
974.68403
955.342462
957.832817
984.065968
972.364891
977.489316
957.352265
969.463156
974.320994
949.679765
969.081674
963.190554
962.394456
966.84177
975.840044
954.471689
977.764019
968.67597
963.203923
964.752374
965.957151
979.17749
915.412491
975.120789
962.105916
980.343235
957.180492
953.552183
979.783099
967.906392
966.926945
962.962301
965.53471
971.030289
954.21045
961.266889
973.367193
956.736464
980.317352
911.188865
979.274233
980.267316
982.029926
977.731543
937.327052
978.161778
978.575841
962.661776
914.896072
966.902901
973.539272
980.418576
966.073472
963.196341
962.718863
977.062467
958.303102
959.937588
959.52382
934.876632
971.899844
979.71
964.154208
960.051284

REL_16_STABLE 07494a0df9a66219e5f1029de47ecabc34c9cb55
pgbench -s8 -c6 -T20 -j6
952.061203
905.964458
921.009294
921.970342
924.810464
935.988344
917.110599
925.62075
933.423024
923.445651
932.740532
927.994569
913.773152
922.955946
917.680486
923.145074
925.133017
922.36253
907.656249
927.980182
924.249294
933.355461
923.359649
919.694726
923.178731
929.250348
921.643735
916.546247
930.960814
913.333819
773.157522
945.293028
924.839608
932.228796
912.846096
924.01411
924.341422
909.823505
882.105606
920.337305
930.297982
909.238148
880.839643
939.582115
927.263785
927.921499
932.897521
931.77316
943.261293
853.433421
921.813303
916.463003
919.652647
914.662188
912.137913
923.279822
922.967526
936.344334
946.281347
801.718759
950.571673
928.845848
888.181388
885.603875
939.763546
896.841216
934.904546
929.369005
884.065433
874.953048
933.411683
930.654935
952.833611
942.193108
930.491705
93