Re: Modernize const handling with readline

2023-10-04 Thread Peter Eisentraut

On 04.10.23 17:09, Aleksander Alekseev wrote:

Hi,


On 03.10.23 13:28, Aleksander Alekseev wrote:

While examining the code for similar places I noticed that the
following functions can also be const'ified:

- crc32_sz


I suppose this could be changed.


OK, that's a simple change. Here is the patch.
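
For illustration, the const'ification amounts to marking the read-only
input buffer as const; a minimal sketch of the adjusted declaration
(the exact signature here is an assumption, not quoted from the patch):

    /* Input buffer is only read, so it can be const-qualified. */
    extern unsigned int crc32_sz(const char *buf, int size);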


Committed this and my patch.





Re: Extract numeric filed in JSONB more effectively

2023-10-04 Thread Andy Fan
Hi,

I feel this topic has been well discussed and the only pending
issues are listed below. It would be great if a committer could have a
look at these, so I have marked this entry as "Ready for Committer".

Things not addressed yet:
> 1.  the error message handling.
>

You can check [1] for more background on this. I don't think it is
reasonable to block this feature over an error message.


> 2.  if we have chances to optimize _tz functions, I guess no.
>

Patch 002 is dedicated to this; I think it should not be committed,
for the reason described in the commit message.

> 3.  function naming issue. I think I can get it modified once
> all the other issues are addressed.
>
[1]
https://www.postgresql.org/message-id/d70280648894e56f9f0d12c75090c3d8%40anastigmatix.net


-- 
Best Regards
Andy Fan


Re: Add a new BGWORKER_BYPASS_ROLELOGINCHECK flag

2023-10-04 Thread Drouvot, Bertrand

Hi,

On 10/5/23 7:10 AM, Michael Paquier wrote:

On Wed, Oct 04, 2023 at 12:54:24PM +0200, Drouvot, Bertrand wrote:

Except the nit that I mentioned in 0001, that looks all good to me (with the
new wait in 001_worker_spi.pl).


Thanks, I've applied the refactoring, including in it the stuff to be
able to control the flags used when launching a dynamic worker.  0001
also needed some indenting.  You'll notice that the diffs in
worker_spi are minimal now.  worker_spi_main is no more, as an effect
of..  Cough..  c8e318b1b.


Thanks!


+# An error message should be issued.
+my $nb_errors = 0;
+for (my $i = 0; $i < 10 * $PostgreSQL::Test::Utils::timeout_default; $i++)
+{
+   if ($node->log_contains(
+   "role \"nologrole\" is not permitted to log in", 
$log_start))
+   {
+   $nb_errors = 1;
+   last;
+   }
+   usleep(100_000);
+}

This can be switched to use $node->wait_for_log, making the test
simpler.  No need for Time::HiRes, either.



Oh, thanks, did not know about $node->wait_for_log, good to know!


-extern void BackgroundWorkerInitializeConnection(const char *dbname,
const char *username, uint32 flags);
+extern void BackgroundWorkerInitializeConnection(const char *dbname,
const char *username, bits32 flags);
[...]
-BackgroundWorkerInitializeConnectionByOid(Oid dboid, Oid useroid,
uint32 flags)
+BackgroundWorkerInitializeConnectionByOid(Oid dboid, Oid useroid,
bits32 flags)

That's changing signatures just for the sake of breaking them.  I
would leave that alone, I guess..


Ok, switched back to uint32 in v6-0002 attached (v6-0001 is your v5-0001
unchanged).



@@ -719,6 +719,7 @@ InitPostgres(const char *in_dbname, Oid dboid,
 const char *username, Oid useroid,
 bool load_session_libraries,
 bool override_allow_connections,
+bool bypass_login_check,
 char *out_dbname)

I was not paying much attention here, but load_session_libraries gives
a good argument in favor of switching all these booleans to a single
bits32 argument as that would make three of them now, with a different
set of flags than the bgworker ones.  This can be refactored on its
own.


Yeah good point, will work on it once the current one is committed.
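
For concreteness, a minimal sketch of what that follow-up refactoring
could look like, with the three booleans folded into one flags argument
(the INIT_PG_* names and values are hypothetical, not from any posted
patch; bits32 is the existing PostgreSQL typedef):

#define INIT_PG_LOAD_SESSION_LIBS      0x0001
#define INIT_PG_OVERRIDE_ALLOW_CONNS   0x0002
#define INIT_PG_BYPASS_ROLE_LOGIN      0x0004

extern void InitPostgres(const char *in_dbname, Oid dboid,
                         const char *username, Oid useroid,
                         bits32 flags,
                         char *out_dbname);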



-#define BGWORKER_BYPASS_ALLOWCONN 1
+#define BGWORKER_BYPASS_ALLOWCONN 0x0001

Can be a change of its own as well.


Yeah, but I think it's simple enough to just keep this change here
(and I don't think it's "really" needed without introducing 0x0002)



While looking at the refactoring in worker_spi, I've noticed that it
would be simple and cheap to have some coverage for BYPASS_ALLOWCONN,
something we've never had since that flag was introduced.  Let me
suggest 0001 to add some coverage.


Good idea! I looked at 0001 and it looks ok to me.



0002 is your own patch, with the tests simplified a bit.


Thanks, LGTM.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

From 5ebe8aa15f22851ae7771137f58366ea9aa21087 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 5 Oct 2023 14:06:37 +0900
Subject: [PATCH v6 2/2] Allow background workers to bypass login check

This adds a new flag value (BGWORKER_BYPASS_ROLELOGINCHECK) to the flags
being used in BackgroundWorkerInitializeConnection() and
BackgroundWorkerInitializeConnectionByOid().  This flag allows the
background workers to bypass the login check done in
InitializeSessionUserId().
---
 doc/src/sgml/bgworker.sgml|  3 +-
 src/backend/bootstrap/bootstrap.c |  2 +-
 src/backend/postmaster/autovacuum.c   |  5 +--
 src/backend/postmaster/postmaster.c   |  2 +
 src/backend/tcop/postgres.c   |  1 +
 src/backend/utils/init/miscinit.c |  4 +-
 src/backend/utils/init/postinit.c |  5 ++-
 src/include/miscadmin.h   |  4 +-
 src/include/postmaster/bgworker.h |  6 ++-
 .../modules/worker_spi/t/001_worker_spi.pl| 37 +++
 src/test/modules/worker_spi/worker_spi.c  |  2 +
 11 files changed, 59 insertions(+), 12 deletions(-)
   9.2% doc/src/sgml/
  10.7% src/backend/postmaster/
  15.1% src/backend/utils/init/
   9.5% src/include/postmaster/
  47.4% src/test/modules/worker_spi/t/
   3.1% src/test/modules/worker_spi/
   4.8% src/

diff --git a/doc/src/sgml/bgworker.sgml b/doc/src/sgml/bgworker.sgml
index 9ad1146ba0..7335c02ed8 100644
--- a/doc/src/sgml/bgworker.sgml
+++ b/doc/src/sgml/bgworker.sgml
@@ -200,7 +200,8 @@ typedef struct BackgroundWorker
InvalidOid, the process will run as the superuser created
during initdb. If 
BGWORKER_BYPASS_ALLOWCONN
is specified as flags it is possible to bypass the 
restriction
-   to connect to databases not allowing user connections.
+   to connect to databases not allowing user connections. If 
BGWORKER_BYP

Re: pg16: XX000: could not find pathkey item to sort

2023-10-04 Thread David Rowley
On Tue, 3 Oct 2023 at 20:16, David Rowley  wrote:
> I wonder if the attached patch is too much of a special case fix.  I
> guess from the lack of complaints previously that there are no other
> cases where we could possibly have pathkeys that belong to columns
> that are aggregated.  I've not gone to much effort to see if I can
> craft a case that hits this without the ORDER BY/DISTINCT aggregate
> optimisation, however.

I spent more time on this today.  I'd been wondering if there was any
reason why create_agg_path() would receive a subpath with pathkeys
that were anything but the PlannerInfo's group_pathkeys.  I mean, how
could we do Group Aggregate if it wasn't?  I wondered if grouping sets
might change that, but it seems the group_pathkeys will be set to the
initial grouping set.

Given that, it would seem it's safe to just trim off any pathkey that
was added to the group_pathkeys by
adjust_group_pathkeys_for_groupagg().
PlannerInfo.num_groupby_pathkeys marks the number of pathkeys that
existed in group_pathkeys before adjust_group_pathkeys_for_groupagg()
made any additions, so we can just trim the list length back to that.
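
A minimal sketch of that trimming (the placement and variable names are
assumptions; list_truncate(), list_copy() and num_groupby_pathkeys are
existing PostgreSQL names):

/*
 * Drop any pathkeys appended by adjust_group_pathkeys_for_groupagg(),
 * keeping only the leading pathkeys that match real GROUP BY columns.
 */
if (list_length(pathkeys) > root->num_groupby_pathkeys)
    pathkeys = list_truncate(list_copy(pathkeys),
                             root->num_groupby_pathkeys);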

I've done this in the attached patch.  I also considered whether it
was worth adding a regression test for this and concluded that there
are better ways to test it; in particular, we could add some code to
createplan.c to check that all Path pathkeys have corresponding items
in the PathTarget.  I've included an additional patch which adds some
code in USE_ASSERT_CHECKING builds to verify this.  Without the fix
it's simple enough to trigger the problem with a query such as:

select two,count(distinct four) from tenk1 group by two order by two;

Without the fix the additional asserts cause the regression tests to
fail, but with the fix everything passes.

Justin's case is quite an obscure way to hit this as it requires
partitionwise aggregation plus a single partition so that the Append
is removed due to only having a single subplan in setrefs.c.  If there
had been 2 partitions, then the AppendPath wouldn't have inherited the
subpath's pathkeys per code at the end of create_append_path().

So in short, I propose the attached fix without any regression tests,
because I feel that any regression test would just mark that there was
a bug in create_agg_path() and not really help with ensuring we don't
end up with some similar problem in the future.

I have some concerns that the assert_pathkeys_in_target() function
might be a little heavyweight for USE_ASSERT_CHECKING builds. So I'm
not proposing to commit that without further discussion.

Does anyone feel differently?

If not, I plan to push the attached
strip_aggregate_pathkeys_from_aggpaths_v2.patch early next week.

David


strip_aggregated_pathkeys_from_aggpaths_v2.patch
Description: Binary data


assert_pathkeys_in_target.patch
Description: Binary data


Re: [PoC/RFC] Multiple passwords, interval expirations

2023-10-04 Thread Gurjeet Singh
I had an idea to simplify this feature/patch and after some validation
in internal discussions, I am posting the new approach here. I'd
appreciate any feedback and comments.

To begin with, the feature we are chasing is to make it convenient for
the users to rollover their passwords. Currently there is no easy way
to rollover passwords without increasing the risk of an application
outage. After a password change, the users/admins have to rush to
change the password in all locations where it is stored. There is a
window of time where if the application password is not changed to the
new one, and the application tries to connect/reconnect for any
reason, the application will fail authentication and lead to an outage.

I personally haven't seen any attempts by any
application/driver/framework to solve this problem in the wild, so the
following is just me theorizing how one might solve it on the
application side; others may take different approaches. The
application may be written in such a way that upon
password authentication failure, it tries again with a second
password. The application's configuration file (or environment
variables) may allow specifying 2 passwords at the same time, and the
application will keep trying these 2 passwords alternatively until it
succeeds or the user restarts it with a new configuration. With such a
logic in place in their application, the users may first change the
configuration of all the instances of the application to hold the new
password along with the old/current working password, and only then
change the password in the database. This way, in the event of an
application instance start/restart either the old password will
succeed, or the new password will.

There may be other ways to solve this problem, but I can't imagine any
of them being convenient and straightforward -- at least not as
convenient as if the database itself allowed storing both passwords,
honored both at the same time, and allowed a separate validity period
to be associated with each of them.

The patches posted in this thread so far attempt to add the ability to
allow the user to have an arbitrary number of passwords. I believe
that allowing an arbitrary number of passwords is not only unnecessary,
but the need to name passwords, the need to store them in a shared
catalog, etc. may actually create problems in the field. The
users/admins will have to choose names for passwords, which they
didn't have to previously. The need to name them may also lead to
users storing password-hints in the password names (e.g. 'mom''s
birthday', 'ex''s phone number', 'third password'), rendering the
passwords weak.

Moreover, allowing arbitrarily many passwords will
require us to provide additional infrastructure to solve problems like
observability (which passwords are currently in use, and which ones
have been effectively forgotten by applications), or create a nuisance
for admins that can create more problems than it solves.

So I propose that the feature should allow no more than 2 passwords
for a role, each with their own validity periods. This eliminates the
need to name passwords, because at any given time there are no more
than 2 passwords; current one, and old one. This also eliminates the
need for a shared catalog to hold passwords, because with the limit of
2 imposed, we can store the old password and its validity period in
additional columns in the pg_authid table.

The patches so far also add a notion of max validity period of
passwords, which only a superuser can override. I believe this is a
useful feature, but that feature can be dealt with separately,
independently of the password rollover feature. So in the newer patches I
will not include the relevant GUC and code.

With the above being said, following is the user interface I can think
of that can allow for various operations that users may need to
perform to rollover their passwords. The 'ADD PASSWORD' and 'ALL
PASSWORD' are additions to the grammar. rololdpassword and
rololdvaliduntil will be new columns in pg_authid that will hold the
old password and its valid-until value.

In essence, we create a stack that can hold 2 passwords. Pushing an
element when it's full will make it forget the bottom element. Popping
the stack makes it forget the top element, and the only remaining
element, if any, becomes the top.

-- Create a user, as usual
CREATE ROLE u1 PASSWORD 'p1' VALID UNTIL '2020/01/01';

-- Add another password that the user can use for authentication. This moves
-- the 'p1' password hash and its valid-until value to rololdpassword and
-- rololdvaliduntil, respectively.
ALTER ROLE u1 ADD PASSWORD 'p2' VALID UNTIL '2021/01/01';

-- Change the password 'p2's (current password's) validity
ALTER ROLE u1 VALID UNTIL '2022/01/01';
-- Note that currently I don't have a proposal for how to change the old
-- password's validity period, without deleting the latest/main passwo

Re: Making aggregate deserialization (and WAL receive) functions slightly faster

2023-10-04 Thread Michael Paquier
On Wed, Oct 04, 2023 at 07:47:11PM +1300, David Rowley wrote:
> The original patch had a new function in stringinfo.c which allowed a
> StringInfoData to be initialised from an existing string with some
> given length.  Tom wasn't a fan of that because there wasn't any
> protection against someone trying to use the given StringInfoData and
> then calling appendStringInfo to append another string. That can't be
> done in this case as we can't repalloc the VARDATA_ANY(state) pointer
> due to it not pointing directly to a palloc'd chunk.  Tom's complaint
> seemed to be about having a reusable function which could be abused,
> so I modified the patch to remove the reusable code.  I think your
> macro idea in stringinfo.h would put the patch in the same position as
> it was initially.

Ahem, well.  Based on this argument my own argument does not hold
much.  Perhaps I'd still use a macro at the top of array_userfuncs.c
and numeric.c, to avoid repeating the same pattern respectively two
and four times, documenting once on top of both macros that this is a
fake StringInfo because of the reasons documented in these code paths.
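
For instance, a rough sketch of such a macro (the macro name is
hypothetical; the fields match StringInfoData in stringinfo.h):

/*
 * Set up a "fake" StringInfo wrapping an existing buffer, without
 * copying.  Never append to such a StringInfo: the buffer is not a
 * palloc'd chunk owned by it, so it cannot be repalloc'd.
 */
#define InitFakeStringInfo(str, buf, buflen) \
    do { \
        (str)->data = (buf); \
        (str)->len = (buflen); \
        (str)->maxlen = (buflen); \
        (str)->cursor = 0; \
    } while (0)
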
--
Michael


signature.asc
Description: PGP signature


Re: Add a new BGWORKER_BYPASS_ROLELOGINCHECK flag

2023-10-04 Thread Michael Paquier
On Wed, Oct 04, 2023 at 12:54:24PM +0200, Drouvot, Bertrand wrote:
> Except the nit that I mentioned in 0001, that looks all good to me (with the
> new wait in 001_worker_spi.pl).

Thanks, I've applied the refactoring, including in it the stuff to be
able to control the flags used when launching a dynamic worker.  0001
also needed some indenting.  You'll notice that the diffs in
worker_spi are minimal now.  worker_spi_main is no more, as an effect
of..  Cough..  c8e318b1b.

+# An error message should be issued.
+my $nb_errors = 0;
+for (my $i = 0; $i < 10 * $PostgreSQL::Test::Utils::timeout_default; $i++)
+{
+   if ($node->log_contains(
+   "role \"nologrole\" is not permitted to log in", 
$log_start))
+   {
+   $nb_errors = 1;
+   last;
+   }
+   usleep(100_000);
+}

This can be switched to use $node->wait_for_log, making the test
simpler.  No need for Time::HiRes, either.

-extern void BackgroundWorkerInitializeConnection(const char *dbname,
const char *username, uint32 flags);
+extern void BackgroundWorkerInitializeConnection(const char *dbname,
const char *username, bits32 flags);
[...]
-BackgroundWorkerInitializeConnectionByOid(Oid dboid, Oid useroid,
uint32 flags) 
+BackgroundWorkerInitializeConnectionByOid(Oid dboid, Oid useroid,
bits32 flags)

That's changing signatures just for the sake of breaking them.  I
would leave that alone, I guess..

@@ -719,6 +719,7 @@ InitPostgres(const char *in_dbname, Oid dboid,
 const char *username, Oid useroid,
 bool load_session_libraries,
 bool override_allow_connections,
+bool bypass_login_check,
 char *out_dbname)

I was not paying much attention here, but load_session_libraries gives
a good argument in favor of switching all these booleans to a single
bits32 argument as that would make three of them now, with a different
set of flags than the bgworker ones.  This can be refactored on its
own.

-#define BGWORKER_BYPASS_ALLOWCONN 1
+#define BGWORKER_BYPASS_ALLOWCONN 0x0001

Can be a change of its own as well.

While looking at the refactoring in worker_spi, I've noticed that it
would be simple and cheap to have some coverage for BYPASS_ALLOWCONN,
something we've never had since that flag was introduced.  Let me
suggest 0001 to add some coverage.

0002 is your own patch, with the tests simplified a bit.
--
Michael
From fe8373992d2e2083d53e75ecd954ba2b71e454a1 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 5 Oct 2023 13:02:14 +0900
Subject: [PATCH v5 1/2] worker_spi: Add tests for BGWORKER_BYPASS_ALLOWCONN

---
 .../modules/worker_spi/t/001_worker_spi.pl| 25 +++
 1 file changed, 25 insertions(+)

diff --git a/src/test/modules/worker_spi/t/001_worker_spi.pl b/src/test/modules/worker_spi/t/001_worker_spi.pl
index 4b46b1336b..7e5a4b1402 100644
--- a/src/test/modules/worker_spi/t/001_worker_spi.pl
+++ b/src/test/modules/worker_spi/t/001_worker_spi.pl
@@ -104,4 +104,29 @@ postgres|myrole|WorkerSpiMain]),
 	'dynamic bgworkers all launched'
 ) or die "Timed out while waiting for dynamic bgworkers to be launched";
 
+# Check BGWORKER_BYPASS_ALLOWCONN.
+$node->safe_psql('postgres', q(ALTER DATABASE mydb ALLOW_CONNECTIONS false;));
+my $log_offset = -s $node->logfile;
+
+# bgworker cannot be launched with connection restriction.
+my $worker3_pid = $node->safe_psql('postgres',
+	qq[SELECT worker_spi_launch(12, $mydb_id, $myrole_id);]);
+$node->wait_for_log(
+	qr/database "mydb" is not currently accepting connections/, $log_offset);
+
+$result = $node->safe_psql('postgres',
+	"SELECT count(*) FROM pg_stat_activity WHERE pid = $worker3_pid;");
+is($result, '0', 'dynamic bgworker without BYPASS_ALLOWCONN not started');
+
+# bgworker bypasses the connection check, and can be launched.
+my $worker4_pid = $node->safe_psql('postgres',
+	qq[SELECT worker_spi_launch(12, $mydb_id, $myrole_id, '{"ALLOWCONN"}');]);
+ok( $node->poll_query_until(
+		'postgres',
+		qq[SELECT datname, usename, wait_event FROM pg_stat_activity
+WHERE backend_type = 'worker_spi dynamic' AND
+pid IN ($worker4_pid) ORDER BY datname;],
+		qq[mydb|myrole|WorkerSpiMain]),
+	'dynamic bgworker with BYPASS_ALLOWCONN started');
+
 done_testing();
-- 
2.42.0

From 59b1e8827c20a3e801e64b3b089efb3d06d71f42 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Thu, 5 Oct 2023 14:06:37 +0900
Subject: [PATCH v5 2/2] Allow background workers to bypass login check

This adds a new flag value (BGWORKER_BYPASS_ROLELOGINCHECK) to the flags
being used in BackgroundWorkerInitializeConnection() and
BackgroundWorkerInitializeConnectionByOid().  This flag allows the
background workers to bypass the login check done in
InitializeSessionUserId().
---
 src/include/miscadmin.h   |  4 +-
 src/include/postmaster/bgworker.h | 10 +++--
 src/backend/bootstrap/bootstrap.c

post-recovery amcheck expectations

2023-10-04 Thread Noah Misch
Suppose we start with this nbtree (subset of a diagram from verify_nbtree.c):

 *   1
 *   /   \
 *2 <-> 3

We're deleting 2, the leftmost leaf under a leftmost internal page.  After the
MARK_PAGE_HALFDEAD record, the first downlink from 1 will lead to 3, which
still has a btpo_prev pointing to 2.  bt_index_parent_check() complains here:

/* The first page we visit at the level should be leftmost */
if (first && !BlockNumberIsValid(state->prevrightlink) &&
    !P_LEFTMOST(opaque))
    ereport(ERROR,
            (errcode(ERRCODE_INDEX_CORRUPTED),
             errmsg("the first child of leftmost target page is not leftmost of its level in index \"%s\"",
                    RelationGetRelationName(state->rel)),
             errdetail_internal("Target block=%u child block=%u target page lsn=%X/%X.",
                                state->targetblock, blkno,
                                LSN_FORMAT_ARGS(state->targetlsn))));

One can encounter this if recovery ends between a MARK_PAGE_HALFDEAD record
and its corresponding UNLINK_PAGE record.  See the attached test case.  The
index is actually fine in such a state, right?  I lean toward fixing this by
having amcheck scan left; if left links reach only half-dead or deleted pages,
that's as good as the present child block being P_LEFTMOST.  There's a
different error from bt_index_check(), and I've not yet studied how to fix
that:

  ERROR:  left link/right link pair in index "not_leftmost_pk" not in agreement
  DETAIL:  Block=0 left block=0 left link from block=4294967295.
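
For concreteness, a sketch of the scan-left idea for the first
complaint (a hypothetical helper, not from any posted patch;
palloc_btree_page() and the page macros are existing amcheck/nbtree
code):

/*
 * Return true if every page to the left of blkno on this level is
 * half-dead or deleted, in which case blkno is effectively leftmost.
 * Locking and additional validation omitted for brevity.
 */
static bool
left_siblings_all_dead(BtreeCheckState *state, BlockNumber blkno)
{
    Page        page = palloc_btree_page(state, blkno);
    BlockNumber left = BTPageGetOpaque(page)->btpo_prev;

    pfree(page);
    while (left != P_NONE)
    {
        BTPageOpaque opaque;

        page = palloc_btree_page(state, left);
        opaque = BTPageGetOpaque(page);
        if (!P_ISHALFDEAD(opaque) && !P_ISDELETED(opaque))
        {
            pfree(page);
            return false;       /* a live page sits to our left */
        }
        left = opaque->btpo_prev;
        pfree(page);
    }
    return true;
}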

Alternatively, one could view this as a need for the user to VACUUM between
recovery and amcheck.  The documentation could direct users to "VACUUM
(DISABLE_PAGE_SKIPPING off, INDEX_CLEANUP on, TRUNCATE off)" if not done since
last recovery.  Does anyone prefer that or some other alternative?

For some other amcheck expectations, the comments suggest reliance on the
bt_index_parent_check() ShareLock.  I haven't tried to make test cases for
them, but perhaps recovery can trick them the same way.  Examples:

  errmsg("downlink or sibling link points to deleted block in index \"%s\"",
  errmsg("block %u is not leftmost in index \"%s\"",
  errmsg("block %u is not true root in index \"%s\"",

Thanks,
nm
Author: Noah Misch 
Commit: Noah Misch 



diff --git a/contrib/amcheck/Makefile b/contrib/amcheck/Makefile
index b82f221..9a7f4f7 100644
--- a/contrib/amcheck/Makefile
+++ b/contrib/amcheck/Makefile
@@ -13,6 +13,7 @@ PGFILEDESC = "amcheck - function for verifying relation 
integrity"
 REGRESS = check check_btree check_heap
 
 TAP_TESTS = 1
+EXTRA_INSTALL=contrib/pg_walinspect
 
 ifdef USE_PGXS
 PG_CONFIG = pg_config
diff --git a/contrib/amcheck/t/004_pitr.pl b/contrib/amcheck/t/004_pitr.pl
new file mode 100644
index 000..ec6d87e
--- /dev/null
+++ b/contrib/amcheck/t/004_pitr.pl
@@ -0,0 +1,65 @@
+
+# Copyright (c) 2021-2023, PostgreSQL Global Development Group
+
+# Test integrity of intermediate states by PITR to those states
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# origin node: generate WAL records of interest.
+my $origin = PostgreSQL::Test::Cluster->new('origin');
+$origin->init(has_archiving => 1, allows_streaming => 1);
+$origin->append_conf('postgresql.conf', 'autovacuum = off');
+$origin->start;
+$origin->backup('my_backup');
+# Create a table with each of 6 PK values spanning 1/4 of a block.  Delete the
+# first four, so one index leaf is eligible for deletion.  Make a replication
+# slot just so pg_walinspect will always have access to later WAL.
+my $setup = safe_psql('postgres', "SELECT pg_current_wal_lsn()");
+# VACUUM to delete the aforementioned leaf page.  Find the LSN of that
+# UNLINK_PAGE record.
+my $exec = stop;
+
+# replica node: amcheck at notable points in the WAL stream
+my $replica = PostgreSQL::Test::Cluster->new('replica');
+$replica->init_from_backup($origin, 'my_backup', has_restoring => 1);
+$replica->append_conf('postgresql.conf',
+   "recovery_target_lsn = '$unlink_lsn'");
+$replica->append_conf('postgresql.conf', 'recovery_target_inclusive = off');
+$replica->append_conf('postgresql.conf', 'recovery_target_action = promote');
+$replica->start;
+$replica->poll_query_until('postgres', "SELECT pg_is_in_recovery() = 'f';")
+  or die "Timed out while waiting for PITR promotion";
+# recovery done; run amcheck
+is( $replica->psql(
+   'postgres', "SELECT bt_index_parent_check('not_leftmost_pk', 
true)"),
+   0);
+is( $replica->psql(
+   'postgre

Re: Rethink the wait event names for postgres_fdw, dblink and etc

2023-10-04 Thread Michael Paquier
On Wed, Oct 04, 2023 at 05:19:40PM +0900, Michael Paquier wrote:
> I am lacking a bit of time now, but I have applied the bits for
> test_shm_mq and worker_spi.  Note that I have not added tests for
> test_shm_mq as it may be possible that the two events (for the
> bgworker startup and for a message to be queued) are never reached
> depending on the timing.  I'll handle the rest tomorrow, with likely
> some adjustments to the tests.  (I may as well just remove them, this
> API is already covered by worker_spi.)

After sleeping on it, I've taken the decision to remove the tests.  As
far as I have tested, this was stable, but this does not really
improve the test coverage as WaitEventExtensionNew() is covered in
worker_spi.  I have done tweaks to the docs and the variable names,
and applied that into its own commit.

Note as well that the docs of dblink were wrong for DblinkGetConnect:
the wait event could be seen in other functions than dblink() and
dblink_exec().
--
Michael


signature.asc
Description: PGP signature


Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers

2023-10-04 Thread James Coleman
On Wed, Oct 4, 2023 at 9:42 AM Robert Haas  wrote:
>
> On Wed, Oct 4, 2023 at 9:36 AM James Coleman  wrote:
> > Are you thinking we should simply elide the fact that there is pruning
> > that happens outside of HOT? Or add that information onto the HOT
> > page, even though it doesn't directly fit?
>
> I think we should elide it. Maybe with a much larger rewrite there
> would be a good place to include that information, but with the
> current structure, the page is about why HOT is good, and talking
> about pruning that can happen apart from HOT doesn't advance that
> message.

All right, attached is a v3 which attempts to fix the wrong
information with an economy of words. I may at some point submit a
separate patch that adds a broader pruning section, but this at least
brings the docs in line with reality insofar as they address it.

James


v3-0001-Correct-HOT-docs-to-account-for-LP_REDIRECT.patch
Description: Binary data


Re: make add_paths_to_append_rel aware of startup cost

2023-10-04 Thread Andy Fan
On Wed, Oct 4, 2023 at 8:41 AM David Rowley  wrote:

> On Sun, 1 Oct 2023 at 21:26, Andy Fan  wrote:
> >> But overall, I'm more inclined to just go with the more simple "add a
> >> cheap unordered startup append path if considering cheap startup
> >> plans" version. I see your latest patch does both. So, I'd suggest two
> >> patches as I do see the merit in keeping this simple and cheap.  If we
> >> can get the first part in and you still find cases where you're not
> >> getting the most appropriate startup plan based on the tuple fraction,
> >> then we can reconsider what extra complexity we should endure in the
> >> code based on the example query where we've demonstrated the planner
> >> is not choosing the best startup path appropriate to the given tuple
> >> fraction.
> >
> > I think this is a fair point, I agree that your first part is good
> > enough to be committed first.  Actually I tried a lot to make a test
> > case which can prove the value of cheapest fractional cost but no
> > gain so far :(
>
> I've attached a patch with the same code as the previous patch but
> this time including a regression test.
>
> I see no reason to not commit this so if anyone feels differently
> please let me know.
>
> David
>

Patch LGTM.

-- 
Best Regards
Andy Fan


Re: Pre-proposal: unicode normalized text

2023-10-04 Thread Isaac Morland
On Wed, 4 Oct 2023 at 17:37, Jeff Davis  wrote:

> On Wed, 2023-10-04 at 14:14 -0400, Isaac Morland wrote:
> > Always store only UTF-8 in the database
>
> What problem does that solve? I don't see our encoding support as a big
> source of problems, given that database-wide UTF-8 already works fine.
> In fact, some postgres features only work with UTF-8.
>

My idea is in the context of a suggestion that we support specifying the
encoding per column. I don't mean to suggest eliminating the ability to set
a server-wide encoding, although I doubt there is any use case for using
anything other than UTF-8 except for an old database that hasn’t been
converted yet.

I see no reason to write different strings using different encodings in the
data files, depending on what column they belong to. The various text types
are all abstract data types which store sequences of characters (not
bytes); if one wants bytes, then one has to encode them. Of course, if one
wants UTF-8 bytes, then the encoding is, under the covers, the identity
function, but conceptually it is still taking the characters stored in the
database and converting them to bytes according to a specific encoding.

By contrast, although I don’t see it as a top-priority use case, I can
imagine somebody wanting to restrict the characters stored in a particular
column to characters that can be encoded in a particular encoding. That is
what "CHARACTER SET LATIN1" and so on should mean.

> What about characters not in UTF-8?
>
> Honestly I'm not clear on this topic. Are the "private use" areas in
> unicode enough to cover use cases for characters not recognized by
> unicode? Which encodings in postgres can represent characters that
> can't be automatically transcoded (without failure) to unicode?
>

Here I’m just anticipating a hypothetical objection, “what about characters
that can’t be represented in UTF-8?” to my suggestion to always use UTF-8
and I’m saying we shouldn’t care about them. I believe the answers to your
questions in this paragraph are “yes”, and “none”.


Re: Pre-proposal: unicode normalized text

2023-10-04 Thread Nico Williams
On Wed, Oct 04, 2023 at 04:01:26PM -0700, Jeff Davis wrote:
> On Wed, 2023-10-04 at 16:15 -0500, Nico Williams wrote:
> > Better that than TEXT blobs w/ the encoding given by the `CREATE
> > DATABASE` or `initdb` default!
> 
> From an engineering perspective, yes, per-column encodings would be
> more flexible. But I still don't understand who exactly would use that,
> and why.

Say you have a bunch of text files in different encodings for reasons
(historical).  And now say you want to store them in a database so you
can index them and search them.  Sure, you could use a filesystem, but
you want an RDBMS.  Well, the answer to this is "convert all those files
to UTF-8".

> It would take an awful lot of effort to implement and make the code
> more complex, so we'd really need to see some serious demand for that.

Yes, it's better to just use UTF-8.

The DB could implement conversions to/from other codesets and encodings
for clients that insist on it.  Why would clients insist anyways?
Better to do the conversions at the clients.

In the middle its best to just have Unicode, and specifically UTF-8,
then push all conversions to the edges of the system.

Nico
-- 




Re: Pre-proposal: unicode normalized text

2023-10-04 Thread Jeff Davis
On Wed, 2023-10-04 at 16:15 -0500, Nico Williams wrote:
> Better that than TEXT blobs w/ the encoding given by the `CREATE
> DATABASE` or `initdb` default!

From an engineering perspective, yes, per-column encodings would be
more flexible. But I still don't understand who exactly would use that,
and why.

It would take an awful lot of effort to implement and make the code
more complex, so we'd really need to see some serious demand for that.

Regards,
Jeff Davis





Re: Add annotation syntax to pg_hba.conf entries

2023-10-04 Thread Tom Lane
Robert Haas  writes:
> You're probably not going to like this answer very much, but this
> doesn't seem particularly worthwhile to me.

Yeah, I was unconvinced about the number of use-cases too.
As you say, some support from other potential users could convince
me otherwise, but right now the evidence seems thin.

> The argument for this
> feature is not that this information needs to exist, but that it needs
> to be queryable from within PostgreSQL.

Not only that, but that it needs to be accessible via the
pg_hba_file_rules view.  Superusers could already see the
pg_hba file's contents via pg_read_file().

Again, that's not an argument that this is a bad idea.
But it's an answer that would likely satisfy some fraction
of whatever potential users are out there, which makes the
question of how many use-cases really exist even more
pressing.

regards, tom lane




Re: Pre-proposal: unicode normalized text

2023-10-04 Thread Nico Williams
On Wed, Oct 04, 2023 at 05:32:50PM -0400, Chapman Flack wrote:
> Well, for what reason does anybody run PG now with the encoding set
> to anything besides UTF-8? I don't really have my finger on that pulse.

Because they still have databases that didn't use UTF-8 10 or 20 years
ago that they haven't migrated to UTF-8?

It's harder to think of why one might _want_ to store text in any
encoding other than UTF-8 for _new_ databases.

Though too there's no reason that it should be impossible other than
lack of developer interest: as long as text is tagged with its encoding,
it should be possible to store text in any number of encodings.

> Could it be that it bloats common strings in their local script, and
> with enough of those to store, it could matter to use the local
> encoding that stores them more economically?

UTF-8 bloat is not likely worth the trouble.  UTF-8 is only clearly
bloaty when compared to encodings with 1-byte code units, like
ISO-8859-*.  For CJK UTF-8 is not much more bloaty than native
non-Unicode encodings like SHIFT_JIS.

UTF-8 is not much bloatier than UTF-16 in general either.

Bloat is not really a good reason to avoid Unicode or any specific TF.

> Also, while any Unicode transfer format can encode any Unicode code
> point, I'm unsure whether it's yet the case that {any Unicode code
> point} is a superset of every character repertoire associated with
> every non-Unicode encoding.

It's not always been the case that Unicode is a strict superset of all
currently-in-use human scripts.  Making Unicode a strict superset of all
currently-in-use human scripts seems to be the Unicode Consortium's aim.

I think you're asking why not just use UTF-8 for everything, all the
time.  It's a fair question.  I don't have a reason to answer in the
negative (maybe someone else does).  But that doesn't mean that one
couldn't want to store text in many encodings (e.g., for historical
reasons).

Nico
-- 




Re: Pre-proposal: unicode normalized text

2023-10-04 Thread Jeff Davis
On Wed, 2023-10-04 at 14:14 -0400, Isaac Morland wrote:
> Always store only UTF-8 in the database

What problem does that solve? I don't see our encoding support as a big
source of problems, given that database-wide UTF-8 already works fine.
In fact, some postgres features only work with UTF-8.

I agree that we shouldn't add a bunch of bookkeeping and type system
support for per-column encodings without a clear use case, because that
would have a cost. But right now it's just a database-wide thing.

I don't see encodings as a major area to solve problems or innovate. At
the end of the day, encodings have little semantic significance, and
therefore limited upside and limited downside. Collations and
normalization get more interesting, but those are happening at a higher
layer than the encoding.


> What about characters not in UTF-8?

Honestly I'm not clear on this topic. Are the "private use" areas in
unicode enough to cover use cases for characters not recognized by
unicode? Which encodings in postgres can represent characters that
can't be automatically transcoded (without failure) to unicode?

Obviously if we have some kind of unicode-based type, it would only
work with encodings that are a subset of unicode.

Regards,
Jeff Davis





Re: Pre-proposal: unicode normalized text

2023-10-04 Thread Chapman Flack

On 2023-10-04 16:38, Jeff Davis wrote:

On Wed, 2023-10-04 at 14:02 -0400, Chapman Flack wrote:

The SQL standard would have me able to:

CREATE TABLE foo (
   a CHARACTER VARYING CHARACTER SET UTF8,
   b CHARACTER VARYING CHARACTER SET LATIN1
)

and so on


Is there a use case for that? UTF-8 is able to encode any unicode code
point, it's relatively compact, and it's backwards-compatible with 7-
bit ASCII. If you have a variety of text data in your system (and in
many cases even if not), then UTF-8 seems like the right solution.


Well, for what reason does anybody run PG now with the encoding set
to anything besides UTF-8? I don't really have my finger on that pulse.
Could it be that it bloats common strings in their local script, and
with enough of those to store, it could matter to use the local
encoding that stores them more economically?

Also, while any Unicode transfer format can encode any Unicode code
point, I'm unsure whether it's yet the case that {any Unicode code
point} is a superset of every character repertoire associated with
every non-Unicode encoding.

The cheap glaring counterexample is SQL_ASCII. Half those code points
are *nobody knows what Unicode character* (or even *whether*). I'm not
insisting that's a good thing, but it is a thing.

It might be a very tidy future to say all text is Unicode and all
server encodings are UTF-8, but I'm not sure it wouldn't still
be a good step on the way to be able to store some things in
their own encodings. We have JSON and XML now, two data types
that are *formally defined* to accept any Unicode content, and
we hedge and mumble and say (well, as long as it goes in the
server encoding) and that makes me sad. Things like that should
be easy to handle even without declaring UTF-8 as a server-wide
encoding ... they already are their own distinct data types, and
could conceivably know their own encodings.

But there again, it's possible that going with unconditional
UTF-8 for JSON or XML documents could, in some regions, bloat them.

Regards,
-Chap




Re: Pre-proposal: unicode normalized text

2023-10-04 Thread Nico Williams
On Wed, Oct 04, 2023 at 01:38:15PM -0700, Jeff Davis wrote:
> On Wed, 2023-10-04 at 14:02 -0400, Chapman Flack wrote:
> > The SQL standard would have me able to:
> > 
> > [...]
> > _UTF8'Hello, world!' and _LATIN1'Hello, world!'
> 
> Is there a use case for that? UTF-8 is able to encode any unicode code
> point, it's relatively compact, and it's backwards-compatible with 7-
> bit ASCII. If you have a variety of text data in your system (and in
> many cases even if not), then UTF-8 seems like the right solution.
> 
> Text data encoded 17 different ways requires a lot of bookkeeping in
> the type system, and it also requires injecting a bunch of fallible
> transcoding operators around just to compare strings.

Better that than TEXT blobs w/ the encoding given by the `CREATE
DATABASE` or `initdb` default!

It'd be a lot _less_ fragile to have all text tagged with an encoding
(indirectly, via its type which then denotes the encoding).

That would be a lot of work, but starting with just a UTF-8 text type
would be an improvement.

Nico
-- 




Re: [PATCH] Add CANONICAL option to xmlserialize

2023-10-04 Thread Chapman Flack

On 2023-10-04 12:19, Jim Jones wrote:

On 04.10.23 11:39, vignesh C wrote:

1) Why was the default option chosen to be without comments? Shouldn't
it be the other way round?
I'm not sure it is the way to go. The main idea is to check if two 
documents have the same content, and comments might be different even 
if the contents of two documents are identical. What are your concerns 
regarding this default behaviour?


I hope I'm not butting in, but I too would be leery of any default
behavior that's going to say thing1 and thing2 are the same thing
but ignoring (name part of thing here). If that's the comparison
I mean to make, and it's as easy as CANONICAL WITHOUT COMMENTS
to say that's what I mean, I'd be happy to write that. It also means
that the next person reading my code will know "oh, he means
'same' in *that* way", without having to think about it.

Regards,
-Chap




Re: Opportunistically pruning page before update

2023-10-04 Thread James Coleman
On Tue, Sep 26, 2023 at 8:30 AM James Coleman  wrote:
>
> On Tue, Sep 5, 2023 at 1:40 PM Melanie Plageman
>  wrote:
> >
> > On Wed, Jun 21, 2023 at 8:51 AM James Coleman  wrote:
> > > While at PGCon I was chatting with Andres (and I think Peter G. and a
> > > few others who I can't remember at the moment, apologies) and Andres
> > > noted that while we opportunistically prune a page when inserting a
> > > tuple (before deciding we need a new page) we don't do the same for
> > > updates.
> > >
> > > Attached is a patch series to do the following:
> > >
> > > 0001: Make it possible to call heap_page_prune_opt already holding an
> > > exclusive lock on the buffer.
> > > 0002: Opportunistically prune pages on update when the current tuple's
> > > page has no free space. If this frees up enough space, then we
> > > continue to put the new tuple on that page; if not, then we take the
> > > existing code path and get a new page.
> >
> > I've reviewed these patches and have questions.
> >
> > Under what conditions would this be exercised for UPDATE? Could you
> > provide an example?
> >
> > With your patch applied, when I create a table, the first time I update
> > it heap_page_prune_opt() will return before actually doing any pruning
> > because the page prune_xid hadn't been set (it is set after pruning as
> > well as later in heap_update() after RelationGetBufferForTuple() is
> > called).
> >
> > I actually added an additional parameter to heap_page_prune() and
> > heap_page_prune_opt() to identify if heap_page_prune() was called from
> > RelationGetBufferForTuple() and logged a message when this was true.
> > Running the test suite, I didn't see any UPDATEs executing
> > heap_page_prune() from RelationGetBufferForTuple(). I did, however, see
> > other statement types doing so (see RelationGetBufferForTuple()'s other
> > callers). Was that intended?
> >
> > > I started to work on benchmarking this, but haven't had time to devote
> > > properly to that, so I'm wondering if there's anyone who might be
> > > interested in collaborating on that part.
> >
> > I'm interested in this feature and in helping with it/helping with
> > benchmarking it, but I don't yet understand the design in its current
> > form.
>
> Hi Melanie,
>
> Thanks for taking a look at this! Apologies for the long delay in
> replying: I started to take a look at your questions earlier, and it
> turned into more of a rabbit hole than I'd anticipated. I've since
> been distracted by other things. So -- I don't have any conclusions
> here yet, but I'm hoping at or after PGConf NYC that I'll be able to
> dedicate the time this deserves.

Hi,

I poked at this a decent amount last night and uncovered a couple of
things (whether or not Andres and I had discussed these details at
PGCon...I don't remember):

1. We don't ever opportunistically prune on INSERT, but we do
(somewhat, see below) on UPDATE, since we call it the first time we
read the page with the to-be-updated tuple on it.
2. The reason that original testing on v1 didn't see any real changes
is because PageClearHasFreeLinePointers() wasn't the right fastpath
gate on this; I should have been using !PageIsFull().

With the change to use !PageIsFull() I can trivially show that there
is improvement functionally. Consider the following commands:

drop table if exists foo;
create table foo(pk serial primary key, t text);
insert into foo(t) select repeat('a', 250) from generate_series(1, 27);
select pg_relation_size('foo');
delete from foo where pk <= 10;
insert into foo(t) select repeat('b', 250) from generate_series(1, 10);
select pg_relation_size('foo');

On master this will result in a final relation size of 16384 while
with the patch applied the final relation size is 8192.

I talked to Andres and Peter again today, and out of that conversation
I have some observations and ideas for future improvements.

1. The most trivial case where this is useful is INSERT: we have a
target page, and it may have dead tuples, so trying to prune may
result in us being able to use the target page rather than getting a
new page.
2. The next most trivial case is where UPDATE (potentially after
failing to find space for a HOT tuple on the source tuple's page);
much like the INSERT case our backend's target page may benefit from
pruning.
3. A more complex UPDATE case occurs when we check the tuple's page
for space in order to insert a HOT tuple and fail to find enough
space. While we've already opportunistically pruned the page on
initial read of the tuple, in complex queries this might be some time
in the past, so it may be worth attempting again. Beyond that context
is key: if we already know we could otherwise do a HOT update but for
the lack of free space on the page, then spending extra cycles
rescuing that failed attempt is easier to justify. In order to do that
we ought to invent an "aggressive" flag to heap_page_prune_opt telling
it that it doesn't need to be quite so careful about exiting fast.
Perhaps we ca

Re: Pre-proposal: unicode normalized text

2023-10-04 Thread Jeff Davis
On Wed, 2023-10-04 at 14:02 -0400, Chapman Flack wrote:
> The SQL standard would have me able to:
> 
> CREATE TABLE foo (
>    a CHARACTER VARYING CHARACTER SET UTF8,
>    b CHARACTER VARYING CHARACTER SET LATIN1
> )
> 
> and so on, and write character literals like
> 
> _UTF8'Hello, world!' and _LATIN1'Hello, world!'

Is there a use case for that? UTF-8 is able to encode any unicode code
point, it's relatively compact, and it's backwards-compatible with 7-
bit ASCII. If you have a variety of text data in your system (and in
many cases even if not), then UTF-8 seems like the right solution.

Text data encoded 17 different ways requires a lot of bookkeeping in
the type system, and it also requires injecting a bunch of fallible
transcoding operators around just to compare strings.

Regards,
Jeff Davis





Re: Add annotation syntax to pg_hba.conf entries

2023-10-04 Thread Robert Haas
On Wed, Oct 4, 2023 at 4:06 PM Jim Jones  wrote:
> Any thoughts?

You're probably not going to like this answer very much, but this
doesn't seem particularly worthwhile to me. If somebody needs to
document why they did something in pg_hba.conf, they can already put a
comment in the file to explain that. Or they can track the reasons for
what's in the file using some completely external system, like a
Google document or a git repository or whatever. The argument for this
feature is not that this information needs to exist, but that it needs
to be queryable from within PostgreSQL. And I guess I just wonder if
that is something that users in general want. It's not a terrible idea
or anything, but it would be sad if we added such a feature and you
were the only one who ever used it... and if a bunch of people now
show up and say "actually, this would be great, I would totally like
to have that," well, then, forget I said anything.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [PoC] pg_upgrade: allow to upgrade publisher node

2023-10-04 Thread Amit Kapila
On Tue, Oct 3, 2023 at 9:58 AM Bharath Rupireddy
 wrote:
>
> On Fri, Sep 29, 2023 at 5:27 PM Hayato Kuroda (Fujitsu)
>  wrote:
> >
> > Yeah, the approach forces developers to check the decodability.
> > But the benefit seems smaller than required efforts for it because the 
> > function
> > would be used only by pg_upgrade. Could you tell me if you have another use 
> > case
> > in mind? We may able to adopt if we have...
>
> I'm attaching 0002 patch (on top of v45) which implements the new
> decodable callback approach that I have in mind. IMO, this new
> approach is extensible and better than the current one taken by the
> patch (hard-coding certain WAL records that may be generated during
> pg_upgrade), and it helps deal with the issue that custom WAL
> resource managers can have with that approach.
>

Today, I discussed this problem with Andres at PGConf NYC and he
suggested as following. To verify, if there is any pending unexpected
WAL after shutdown, we can have an API like
pg_logical_replication_slot_advance() which will simply process
records without actually sending anything downstream. In this new API,
we will start with each slot's restart_lsn location and try to process
till the end of WAL, if we encounter any WAL that needs to be
processed (like we need to send the decoded WAL downstream) we can
return a false indicating that there is an unexpected WAL. The reason
to start with restart_lsn is that it is the location that we use to
start scanning the WAL anyway.

Then, we should also try to create slots before invoking pg_resetwal.
The idea is that we can write a new binary mode function that will do
exactly what pg_resetwal does to compute the next segment and use that
location as a new location (restart_lsn) to create the slots in a new
node. Then, pass it to pg_resetwal using the existing option '-l
walfile'. As we don't have any API that takes restart_lsn as input, we
can write a new API probably for binary mode to create slots that do
take restart_lsn as input. This will ensure that there is no new WAL
inserted by background processes between resetwal and the creation of
slots.

The other potential problem Andres pointed out is that during shutdown
if due to some reason, the walreceiver goes down, we won't be able to
send the required WAL and users won't be able to ensure that because
even after restart the same situation can happen. The ideal way is to
have something that puts the system in READ ONLY state during shutdown
and then we can probably allow walreceivers to reconnect and receive
the required WALs. As we don't have such functionality available and
it won't be easy to achieve the same, we can leave this for now.

Thoughts?

-- 
With Regards,
Amit Kapila.




Re: Pre-proposal: unicode normalized text

2023-10-04 Thread Jeff Davis
On Wed, 2023-10-04 at 13:16 -0400, Robert Haas wrote:
> any byte sequence at all is accepted when you try to
> put values into the database.

We support SQL_ASCII, which allows something similar.

> At any rate, if we were to go in the direction of rejecting code
> points that aren't yet assigned, or aren't yet known to the collation
> library, that's another way for data loading to fail.

A failure during data loading is either a feature or a bug, depending
on whether you are the one loading the data or the one trying to make
sense of it later ;-)

>  Which feels like
> very defensible behavior, but not what everyone wants, or is used to.

Yeah, there are many reasons someone might want to accept unassigned
code points. An obvious one is if their application is on a newer
version of unicode where the codepoint *is* assigned.

> 
> The fact that there are multiple types of normalization and multiple
> notions of equality doesn't make this easier.

NFC is really the only one that makes sense.

NFD is semantically the same as NFC, but expanded into a larger
representation. NFKC/NFKD are based on a more relaxed notion of
equality -- kind of like non-deterministic collations. These other
forms might make sense in certain cases, but not general use.

I believe that having a kind of text data type where it's stored in NFC
and compared with memcmp() would be a good place for many users to be -
- probably most users. It's got all the performance and stability
benefits of memcmp(), with slightly richer semantics. It's less likely
that someone malicious can confuse the database by using different
representations of the same character.

The problem is that it's not universally better for everyone: there are
certainly users who would prefer that the codepoints they send to the
database are preserved exactly, and also users who would like to be
able to use unassigned code points.

Regards,
Jeff Davis






Add annotation syntax to pg_hba.conf entries

2023-10-04 Thread Jim Jones

Hi,

I'm opening this thread after a brief discussion regarding a potential 
new syntax to enable annotations in pg_hba entries. [1]


This feature mainly aims to annotate pg_hba entries in a way that the 
annotations can be parsed and displayed in the pg_hba_file_rule view for 
reporting purposes. For instance, these annotations could contain 
information like tags, client (application) names or any relevant info 
regarding the granted access.


Initially I explored the possibility of using the inline comments after 
a '#', but there were a few valid concerns to this approach [2]


hostssl  db  jim  127.0.0.1/32  cert  map=foo  # comment

I had previously thought of introducing a new character to identify such 
annotations, e.g. [] ... but the necessary changes in hba.c to add 
this feature could add too much complexity to the code. [3]


Perhaps a "less controversial" option would be to add a new variable, 
just like with user name maps.


hostssl  db  jim  127.0.0.1/32  cert  map=foo  annotation=comment
hostssl  db  jim  127.0.0.1/32  cert  map=bar annotation="comment"

Any thoughts?

Thanks!

Jim

1- 
https://www.postgresql.org/message-id/flat/4d623899-36ac-71b5-311d-2a4672d75...@uni-muenster.de
2- 
https://www.postgresql.org/message-id/E543222B-DE8D-4116-BA67-3C2D3FA83110%40yesql.se
3- 
https://www.postgresql.org/message-id/flat/ZPHAiNp%2ByKMsa/vc%40paquier.xyz#05a8405be272342037538ee432d92884 






Re: --sync-method isn't documented to take an argument

2023-10-04 Thread Nathan Bossart
On Wed, Oct 04, 2023 at 12:24:36PM -0400, Tom Lane wrote:
> In any case, trying to standardize this looks like it would be a
> huge amount of churn for very little gain.  I'd recommend making
> your markup look similar to what's immediately adjacent, if possible,
> and not sweating too much otherwise.

I matched the adjacent options as you suggested.  Perhaps unsurprisingly,
the inclusion of class="parameter" is not the only inconsistency.  I also
found that pg_upgrade adds the  before the argument name!

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: trying again to get incremental backup

2023-10-04 Thread Robert Haas
On Thu, Sep 28, 2023 at 6:22 AM Jakub Wartak
 wrote:
> all those basic tests had GOOD results. Please find attached. I'll try
> to schedule some more realistic (in terms of workload and sizes) test
> in a couple of days + maybe have some fun with cross-backup-and
> restores across standbys.

That's awesome! Thanks for testing! This can definitely benefit from
any amount of beating on it that people wish to do. It's a complex,
delicate area that risks data loss.

> If that is still an area open for discussion: wouldn't it be better to
> just specify LSN as it would allow resyncing standby across major lag
> where the WAL to replay would be enormous? Given that we had
> primary->standby where standby would be stuck on some LSN, right now
> it would be:
> 1) calculate backup manifest of desynced 10TB standby (how? using
> which tool?)  - even if possible, that means reading 10TB of data
> instead of just putting a number, isn't it?
> 2) backup primary with such incremental backup >= LSN
> 3) copy the incremental backup to standby
> 4) apply it to the impaired standby
> 5) restart the WAL replay

Hmm. I wonder if this would even be a safe procedure. I admit that I
can't quite see a problem with it, but sometimes I'm kind of dumb.

> Also maybe it's too early to ask, but wouldn't it be nice if we could
> have an future option in pg_combinebackup to avoid double writes when
used from restore hosts (right now we need to first reconstruct the
> original datadir from full and incremental backups on host hosting
> backups and then TRANSFER it again and on target host?). So something
> like that could work well from restorehost: pg_combinebackup
> /tmp/backup1 /tmp/incbackup2 /tmp/incbackup3 -O tar -o - | ssh
> dbserver 'tar xvf -C /path/to/restored/cluster - ' . The bad thing is
> that such a pipe prevents parallelism from day 1 and I'm afraid I do
> not have a better easy idea on how to have both at the same time in
> the long term.

I don't think it's too early to ask for this, but I do think it's too
early for you to get it. ;-)

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: POC, WIP: OR-clause support for indexes

2023-10-04 Thread a.rybakina

On 29.09.2023 20:35, a.rybakina wrote:


I'm sorry I didn't write for a long time, but I really had a very 
difficult month; now I'm fully back to work.


*I was able to implement the patches to the end and moved the 
transformation of "OR" expressions to ANY.* I haven't seen a big 
difference between them yet; one performs the transformation before 
calculating selectivity (v7.1-Replace-OR-clause-to-ANY.patch), the 
other after it (v7.2-Replace-OR-clause-to-ANY.patch). Regression tests 
pass, I don't see any problems with selectivity, and nothing crashes 
with a core dump, but I found some incorrect transformations. I have 
not found the reason for these inaccuracies, but, to be honest, they 
look unusual. The error is given below.


In the patch, I don't like that I had to drag in three libraries from 
the parser until I found a way around it. The advantage of this 
approach compared to the other one ([1]), where the transformation was 
done at the parsing stage, is that at this stage all possible OR 
transformations are performed. For example, the transformation includes 
optimizations such as this:



I took the common element out of the brackets and the rest is converted 
to ANY, so that, as Peter Geoghegan noted, we don't get several bitmap 
scans but only one scan through the array (the rewritten form is 
sketched after the plan below).


postgres=# explain analyze SELECT p1.oid, p1.proname
FROM pg_proc as p1
WHERE prolang = 13 AND prolang=1 OR prolang = 13 AND prolang = 2 OR 
prolang = 13 AND prolang = 3;

  QUERY PLAN
---
 Seq Scan on pg_proc p1  (cost=0.00..151.66 rows=1 width=68) (actual 
time=1.167..1.168 rows=0 loops=1)
   Filter: ((prolang = '13'::oid) AND (prolang = ANY (ARRAY['1'::oid, 
'2'::oid, '3'::oid])))

   Rows Removed by Filter: 3302
 Planning Time: 0.146 ms
 Execution Time: 1.191 ms
(5 rows)
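
The rewritten form referenced above, for illustration only (this is my
reading of the plan's Filter line, not output produced by the patch):

```
-- Equivalent form implied by the Filter line: the common prolang = 13
-- condition is factored out, and the rest becomes a single ANY.
SELECT p1.oid, p1.proname
FROM pg_proc AS p1
WHERE prolang = '13'::oid
  AND prolang = ANY (ARRAY['1'::oid, '2'::oid, '3'::oid]);
```
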
*This one crashes with a core dump for me:*
explain analyze SELECT p1.oid, p1.proname
FROM pg_proc as p1
WHERE prolang = 13 OR prolang = 2 AND prolang = 2 OR prolang = 13;




Hi, all!

I fixed the core dump issue and all the regression tests pass, but I 
discovered another problem when I added my own regression tests: some 
queries that contain "OR" expressions are not converted to "ANY". I 
have described this in more detail using a diff of expected versus 
actual results:


diff -U3 
/home/alena/postgrespro__copy6/src/test/regress/expected/create_index.out 
/home/alena/postgrespro__copy6/src/test/regress/results/create_index.out
--- 
/home/alena/postgrespro__copy6/src/test/regress/expected/create_index.out 
2023-10-04 21:54:12.496282667 +0300
+++ 
/home/alena/postgrespro__copy6/src/test/regress/results/create_index.out 
2023-10-04 21:55:41.665422459 +0300

@@ -1925,17 +1925,20 @@
 EXPLAIN (COSTS OFF)
 SELECT count(*) FROM tenk1
   WHERE thousand = 42 AND (tenthous = 1 OR tenthous = 3) OR thousand = 41;
-   QUERY PLAN
-
+    QUERY PLAN
+---
  Aggregate
    ->  Bitmap Heap Scan on tenk1
- Recheck Cond: (((thousand = 42) AND (tenthous = ANY 
('{1,3}'::integer[]))) OR (thousand = 41))
+ Recheck Cond: ((((thousand = 42) AND (tenthous = 1)) OR 
((thousand = 42) AND (tenthous = 3))) OR (thousand = 41))

  ->  BitmapOr
-   ->  Bitmap Index Scan on tenk1_thous_tenthous
- Index Cond: ((thousand = 42) AND (tenthous = ANY 
('{1,3}'::integer[])))

+   ->  BitmapOr
+ ->  Bitmap Index Scan on tenk1_thous_tenthous
+   Index Cond: ((thousand = 42) AND (tenthous = 1))
+ ->  Bitmap Index Scan on tenk1_thous_tenthous
+   Index Cond: ((thousand = 42) AND (tenthous = 3))
    ->  Bitmap Index Scan on tenk1_thous_tenthous
  Index Cond: (thousand = 41)
-(8 rows)
+(11 rows)
@@ -1946,24 +1949,50 @@
 EXPLAIN (COSTS OFF)
 SELECT count(*) FROM tenk1
+  WHERE thousand = 42 OR tenthous = 1 AND thousand = 42 OR tenthous = 1;
+    QUERY PLAN
+---
+ Aggregate
+   ->  Bitmap Heap Scan on tenk1
+ Recheck Cond: ((thousand = 42) OR ((thousand = 42) AND 
(tenthous = 1)) OR (tenthous = 1))

+ ->  BitmapOr
+   ->  Bitmap Index Scan on tenk1_thous_tenthous
+ Index Cond: (thousand = 42)
+   ->  Bitmap Index Scan on tenk1_thous_tenthous
+ Index Cond: ((thousand = 42) AND (tenthous = 1))
+   ->  Bitmap Index Sca

Re: Request for comment on setting binary format output per session

2023-10-04 Thread Robert Haas
On Wed, Oct 4, 2023 at 10:17 AM Peter Eisentraut  wrote:
> I think intuitively, this facility ought to work like client_encoding.

I hadn't really considered client_encoding as a precedent for this
setting. A lot of my discomfort with the proposed mechanism also
applies to client_encoding, namely, suppose you call some function or
procedure or whatever and it changes client_encoding on your behalf
and now your communication with the server is all screwed up. That
seems very unpleasant. Yet it's also existing behavior. I think one
could conclude on these facts either that (a) client_encoding is fine
and the problems with controlling behavior using that kind of
mechanism are mostly theoretical or (b) that we messed up with
client_encoding and shouldn't add any more mistakes of the same ilk or
(c) that we should really be looking at redesigning the way
client_encoding works, too.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Request for comment on setting binary format output per session

2023-10-04 Thread Dave Cramer
On Wed, 4 Oct 2023 at 10:17, Peter Eisentraut  wrote:

> On 31.07.23 18:27, Dave Cramer wrote:
> > On Mon, 10 Jul 2023 at 03:56, Daniel Gustafsson wrote:
> >
> >  > On 25 Apr 2023, at 16:47, Dave Cramer wrote:
> >
> >  > Patch attached with comments removed
> >
> > This patch no longer applies, please submit a rebased version on top
> > of HEAD.
> >
> >
> > Rebased see attached
>
> I have studied this thread now.  It seems it has gone through the same
> progression with the same (non-)result as my original patch on the subject.
>
> I have a few intermediate conclusions:
>
> - Doing it with a GUC is challenging.  It's questionable layering to
> have the GUC system control protocol behavior.  It would allow weird
> behavior where a GUC set, maybe for a user or a database, would confuse,
> say, psql or pg_dump.  We probably should make some of those more robust
> in any case.  Also, handling of GUCs through connection poolers is a
> challenge.  It does work, but it's more like opt-in, and so can't be
> fully relied on for protocol correctness.
>
> - Doing it with a session-level protocol-level setting is challenging.
> We currently don't have that kind of thing.  It's not clear how
> connection poolers would/should handle it.  Someone would have to work
> all this out before this could be used.
>
> - In either case, there are issues like what if there is a connection
> pooler and types have different OIDs in different databases.  (Or,
> similarly, an extension is upgraded during the lifetime of a session and
> a type's OID changes.)  Also, maybe, what if types are in different
> schemas on different databases.
>
> - We could avoid some of the session-state issues by doing this per
> request, like extending the Bind message somehow by appending the list
> of types to be sent in binary.  But the JDBC driver currently lists 24
> types for which it supports binary, so that would require adding 24*4=96
> bytes per request, which seems untenable.
>
> I think intuitively, this facility ought to work like client_encoding.
> There, the client declares its capabilities, and the server has to
> format the output according to the client's capabilities.  That works,
> and it also works through connection poolers.  (It is a GUC.)  If we can
> model it like that as closely as possible, then we have a chance of
> getting it working reliably.  Notably, the value space for
> client_encoding is a globally known fixed list of strings.  We need to
> figure out what is the right way to globally identify types, like either
> by fully-qualified name, by base name, some combination, how does it
> work with extensions, or do we need a new mechanism like UUIDs.  I think
> that is something we need to work out, no matter which protocol
> mechanism we end up using.
>

So how is this different from the GUC that I proposed?

Dave


Re: Pre-proposal: unicode normalized text

2023-10-04 Thread Isaac Morland
On Wed, 4 Oct 2023 at 14:05, Chapman Flack  wrote:

> On 2023-10-04 13:47, Robert Haas wrote:
>


> The SQL standard would have me able to:
>
> CREATE TABLE foo (
>a CHARACTER VARYING CHARACTER SET UTF8,
>b CHARACTER VARYING CHARACTER SET LATIN1
> )
>
> and so on, and write character literals like
>
> _UTF8'Hello, world!' and _LATIN1'Hello, world!'
>
> and have those columns and data types independently contain what
> they can contain, without constraints imposed by one overall
> database encoding.
>
> Obviously, we're far from being able to do that. But should it
> become desirable to get closer, would it be worthwhile to also
> try to follow how the standard would have it look?
>
> Clearly, part of the job would involve making the wire protocol
> able to transmit binary values and identify their encodings.
>

I would go in the other direction (note: I’m ignoring all backward
compatibility considerations related to the current design of Postgres).

Always store only UTF-8 in the database, and send only UTF-8 on the wire
protocol. If we still want to have a concept of "client encoding", have the
client libpq take care of translating the bytes between the bytes used by
the caller and the bytes sent on the wire.

Note that you could still define columns as you say, but the character set
specification would effectively act simply as a CHECK constraint on the
characters allowed, essentially CHECK (column_name ~ '^[...all characters
in encoding...]$*'). We don't allow different on-disk representations of
dates or other data types; except when we really need to, and then we have
multiple data types (e.g. int vs. float) rather than different ways of
storing the same datatype.
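
A minimal sketch of that idea (my illustration, not part of the
proposal; it assumes a UTF-8 database, standard_conforming_strings on,
and approximates "representable in LATIN1" as code points U+0001
through U+00FF):

```
-- A "LATIN1 column" as plain UTF-8 text plus a CHECK constraint;
-- the regex range stands in for "all characters in the encoding".
CREATE TABLE foo (
    a text,                                  -- unrestricted UTF-8
    b text CHECK (b ~ '^[\u0001-\u00FF]*$')  -- "CHARACTER SET LATIN1"
);
```
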

What about characters not in UTF-8? If a character is important enough for
us to worry about in Postgres, it’s important enough to get a U+ number
from the Unicode Consortium, which automatically puts it in UTF-8. In the
modern context, "plain text" mean "UTF-8 encoded text", as far as I'm
concerned.


Re: Pre-proposal: unicode normalized text

2023-10-04 Thread Robert Haas
On Wed, Oct 4, 2023 at 2:02 PM Chapman Flack  wrote:
> Clearly, part of the job would involve making the wire protocol
> able to transmit binary values and identify their encodings.

Right. Which unfortunately is moving the goal posts into the
stratosphere compared to any other work mentioned so far. I agree it
would be great. But not if you want concrete progress any time soon.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Pre-proposal: unicode normalized text

2023-10-04 Thread Chapman Flack

On 2023-10-04 13:47, Robert Haas wrote:
On Wed, Oct 4, 2023 at 1:27 PM Nico Williams  
wrote:

A UTEXT type would be helpful for specifying that the text must be
Unicode (in which transform?) even if the character data encoding for
the database is not UTF-8.


That's actually pretty thorny ... because right now client_encoding
specifies the encoding to be used for all data sent to the client. So
would we convert the data from UTF8 to the selected client encoding?


The SQL standard would have me able to:

CREATE TABLE foo (
  a CHARACTER VARYING CHARACTER SET UTF8,
  b CHARACTER VARYING CHARACTER SET LATIN1
)

and so on, and write character literals like

_UTF8'Hello, world!' and _LATIN1'Hello, world!'

and have those columns and data types independently contain what
they can contain, without constraints imposed by one overall
database encoding.

Obviously, we're far from being able to do that. But should it
become desirable to get closer, would it be worthwhile to also
try to follow how the standard would have it look?

Clearly, part of the job would involve making the wire protocol
able to transmit binary values and identify their encodings.

Regards,
-Chap




Re: Pre-proposal: unicode normalized text

2023-10-04 Thread Robert Haas
On Wed, Oct 4, 2023 at 1:27 PM Nico Williams  wrote:
> A UTEXT type would be helpful for specifying that the text must be
> Unicode (in which transform?) even if the character data encoding for
> the database is not UTF-8.

That's actually pretty thorny ... because right now client_encoding
specifies the encoding to be used for all data sent to the client. So
would we convert the data from UTF8 to the selected client encoding?
Or what?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Pre-proposal: unicode normalized text

2023-10-04 Thread Nico Williams
On Tue, Sep 12, 2023 at 03:47:10PM -0700, Jeff Davis wrote:
> The idea is to have a new data type, say "UTEXT", that normalizes the
> input so that it can have an improved notion of equality while still
> using memcmp().

A UTEXT type would be helpful for specifying that the text must be
Unicode (in which transform?) even if the character data encoding for
the database is not UTF-8.

Maybe UTF8 might be a better name for the new type, since it would
denote the transform (and would allow for UTF16 and UTF32 some day,
though it's doubtful those would ever happen).

But it's one thing to specify Unicode (and transform) in the type and
another to specify an NF to normalize to on insert or on lookup.
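
For concreteness, the equality problem under discussion can be shown
with stock PostgreSQL in a UTF-8 database (with a deterministic
collation, text equality is effectively bytewise):

```
-- 'á' as one precomposed code point vs. 'a' plus a combining accent
SELECT U&'\00E1' = U&'\0061\0301' AS raw_equal,        -- false
       normalize(U&'\00E1', NFC) =
       normalize(U&'\0061\0301', NFC) AS nfc_equal;    -- true
```
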

How about new column constraint keywords, such as NORMALIZE (meaning
normalize on insert) and NORMALIZED (meaning reject non-canonical form
text), with an optional parenthetical by which to specify a non-default
form?  (These would apply to TEXT as well when the default encoding for
the DB is UTF-8.)

One could then ALTER TABLE to add this to existing tables.
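
As a rough approximation of what such constraints could do, PostgreSQL
13 and later (in a UTF-8 database) already offers normalize() and the
IS NORMALIZED predicate, so something similar can be spelled today.
A sketch (table, function, and trigger names are invented):

```
-- "NORMALIZED": reject input that is not already in NFC form.
CREATE TABLE msgs (
    body text CHECK (body IS NFC NORMALIZED)
);

-- "NORMALIZE": coerce input to NFC instead of rejecting it.
CREATE FUNCTION msgs_nfc() RETURNS trigger LANGUAGE plpgsql AS $$
BEGIN
    NEW.body := normalize(NEW.body, NFC);
    RETURN NEW;
END $$;

CREATE TRIGGER msgs_nfc BEFORE INSERT OR UPDATE ON msgs
    FOR EACH ROW EXECUTE FUNCTION msgs_nfc();
```
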

This would also make it easier to add a form-preserving/form-insensitive
mode later if it turns out to be useful or necessary, maybe making it
the default for Unicode text in new tables.

> Questions:
> 
>  * Would this be useful enough to justify a new data type? Would it be
> confusing about when to choose one versus the other?

Yes.  See above.  I think I'd rather have it be called UTF8, and the
normalization properties of it to be specified as column constraints.

>  * Would cross-type comparisons between TEXT and UTEXT become a major
> problem that would reduce the utility?

Maybe when the database's encoding is UTF_8 then UTEXT (or UTF8) can be an alias
of TEXT.

>  * Should "some_utext_value = some_text_value" coerce the LHS to TEXT
> or the RHS to UTEXT?

Ooh, this is nice!  If the TEXT is _not_ UTF-8 then it could be
converted to UTF-8.  So I think which is RHS and which is LHS doesn't
matter -- it's which is UTF-8, and if both are then the only thing left
to do is normalize, and for that I'd take the LHS' form if the LHS is
UTF-8, else the RHS'.

Nico
-- 




Re: Pre-proposal: unicode normalized text

2023-10-04 Thread Robert Haas
On Tue, Oct 3, 2023 at 3:54 PM Jeff Davis  wrote:
> I assume you mean because we reject invalid byte sequences? Yeah, I'm
> sure that causes a problem for some (especially migrations), but it's
> difficult for me to imagine a database working well with no rules at
> all for the basic data types.

There's a very popular commercial database where, or so I have been
led to believe, any byte sequence at all is accepted when you try to
put values into the database. The rumors I've heard -- I have not
played with it myself -- are that when you try to do anything, byte
sequences that are not valid in the configured encoding are treated as
single-byte characters or something of that sort. So like if you had
UTF-8 as the encoding and the first byte of the string is something
that can only appear as a continuation byte in UTF-8, I think that
byte is just treated as a separate character. I don't quite know how
you make all of the operations work that way, but it seems like
they've come up with a somewhat-consistent set of principles that are
applied across the board. Very different from the PG philosophy, of
course. And I'm not saying it's better. But it does eliminate the
problem of being unable to load data into the database, because in
such a model there's no such thing as invalidly-encoded data. Instead,
an encoding like UTF-8 is effectively extended so that every byte
sequence represents *something*. Whether that something is what you
wanted is another story.

At any rate, if we were to go in the direction of rejecting code
points that aren't yet assigned, or aren't yet known to the collation
library, that's another way for data loading to fail. Which feels like
very defensible behavior, but not what everyone wants, or is used to.

> At minimum I think we need to have some internal functions to check for
> unassigned code points. That belongs in core, because we generate the
> unicode tables from a specific version.

That's a good idea.

> I also think we should expose some SQL functions to check for
> unassigned code points. That sounds useful, especially since we already
> expose normalization functions.

That's a good idea, too.

> One could easily imagine a domain with CHECK(NOT
> contains_unassigned(a)). Or an extension with a data type that uses the
> internal functions.

Yeah.

> Whether we ever get to a core data type -- and more importantly,
> whether anyone uses it -- I'm not sure.

Same here.

> Yeah, I am looking for a better compromise between:
>
>   * everything is memcmp() and 'á' sometimes doesn't equal 'á'
> (depending on code point sequence)
>   * everything is constantly changing, indexes break, and text
> comparisons are slow
>
> A stable idea of unicode normalization based on using only assigned
> code points is very tempting.

The fact that there are multiple types of normalization and multiple
notions of equality doesn't make this easier.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Request for comment on setting binary format output per session

2023-10-04 Thread Merlin Moncure
On Wed, Oct 4, 2023 at 9:17 AM Peter Eisentraut 
wrote:

> I think intuitively, this facility ought to work like client_encoding.
> There, the client declares its capabilities, and the server has to
> format the output according to the client's capabilities.  That works,
> and it also works through connection poolers.  (It is a GUC.)  If we can
> model it like that as closely as possible, then we have a chance of
> getting it working reliably.  Notably, the value space for
> client_encoding is a globally known fixed list of strings.  We need to
> figure out what is the right way to globally identify types, like either
> by fully-qualified name, by base name, some combination, how does it
> work with extensions, or do we need a new mechanism like UUIDs.  I think
> that is something we need to work out, no matter which protocol
> mechanism we end up using.
>

 Fantastic write-up.

> globally known fixed list of strings
Are you suggesting that we would have a client/server negotiation with
values such as 'jdbc', 'all', etc., where each value identifies which
types are handled which way?  If we did that, why would we need to
promote names/UUIDs to a permanent global space?

merlin


Re: --sync-method isn't documented to take an argument

2023-10-04 Thread Tom Lane
Robert Haas  writes:
> On Wed, Oct 4, 2023 at 11:27 AM Nathan Bossart  
> wrote:
>> Here's a patch.  I didn't address the class="parameter" stuff at all.  I
>> figured it would be best to handle that separately.

> I guess I'll vote for including class=parameter in this addition for
> now, as that appears to be the majority position in the documentation
> today. If we get a consensus to change something, so be it. But also,
> if you don't want to do that, so be it.

FWIW, I just did a little sed hacking to count the instances of the
different cases in the docs as of today.  I found

   4038 <replaceable class="parameter">
      3 <replaceable class="command">
   4017 <replaceable>

The three with "command" are all in plpgsql.sgml, and are marking up
query strings in the synopses of EXECUTE and variants.  I'm inclined
to argue that those three are actually wrong, on the grounds that

(1) From the perspective of EXECUTE, you could equally well say that
the string to be executed is a parameter;

(2) Our general convention elsewhere is that "command" refers to a
command type such as SELECT or UPDATE, not to a complete query string.

In any case, trying to standardize this looks like it would be a
huge amount of churn for very little gain.  I'd recommend making
your markup look similar to what's immediately adjacent, if possible,
and not sweating too much otherwise.

regards, tom lane




Re: [PATCH] Add CANONICAL option to xmlserialize

2023-10-04 Thread Jim Jones

Hi Vignesh

Thanks for the thorough review!

On 04.10.23 11:39, vignesh C wrote:

Few comments:
1) Why was the default chosen to be without comments? Shouldn't it be
the other way round?
+opt_xml_serialize_format:
+   INDENT                         { $$ = XMLSERIALIZE_INDENT; }
+   | NO INDENT                    { $$ = XMLSERIALIZE_NO_FORMAT; }
+   | CANONICAL                    { $$ = XMLSERIALIZE_CANONICAL; }
+   | CANONICAL WITH NO COMMENTS   { $$ = XMLSERIALIZE_CANONICAL; }
+   | CANONICAL WITH COMMENTS      { $$ = XMLSERIALIZE_CANONICAL_WITH_COMMENTS; }
+   | /*EMPTY*/                    { $$ = XMLSERIALIZE_NO_FORMAT; }
I'm not sure it is the way to go. The main idea is to check if two 
documents have the same content, and comments might be different even if 
the contents of two documents are identical. What are your concerns 
regarding this default behaviour?

2) This should be added to typedefs.list:
+typedef enum XmlSerializeFormat
+{
+   XMLSERIALIZE_INDENT,                  /* pretty-printed xml serialization */
+   XMLSERIALIZE_CANONICAL,               /* canonical form without xml comments */
+   XMLSERIALIZE_CANONICAL_WITH_COMMENTS, /* canonical form with xml comments */
+   XMLSERIALIZE_NO_FORMAT                /* unformatted xml representation */
+} XmlSerializeFormat;

added.

3) This change is not required:
 return result;
+
  #else
 NO_XML_SUPPORT();
 return NULL;

removed.


4) This comment body needs slight reformatting:
+   /*
+   * Parse the input according to the xmloption.
+   * XML canonical expects a well-formed XML input, so here in case of
+   * XMLSERIALIZE_CANONICAL or XMLSERIALIZE_CANONICAL_WITH_COMMENTS we
+   * force xml_parse() to parse 'data' as XMLOPTION_DOCUMENT despite
+   * of the XmlOptionType given in 'xmloption_arg'. This enables the
+   * canonicalization of CONTENT fragments if they contain a singly-rooted
+   * XML - xml_parse() will thrown an error otherwise.
+   */

reformatted.

5) Similarly here too:
-   if (newline == NULL || xmlerrcxt->err_occurred)
+   * Emit declaration only if the input had one.  Note: some versions of
+   * xmlSaveToBuffer leak memory if a non-null encoding argument is
+   * passed, so don't do that.  We don't want any encoding conversion
+   * anyway.
+   */
+   if (decl_len == 0)

reformatted.

6) Similarly here too:
+   /*
+   * Deal with the case where we have non-singly-rooted XML.
+   * libxml's dump functions don't work well for that without help.
+   * We build a fake root node that serves as a container for the
+   * content nodes, and then iterate over the nodes.
+   */

reformatted.

7) Similarly here too:
+   /*
+   * We use this node to insert newlines in the dump.  Note: in at
+   * least some libxml versions, xmlNewDocText would not attach the
+   * node to the document even if we passed it.  Therefore, manage
+   * freeing of this node manually, and pass NULL here to make sure
+   * there's not a dangling link.
+   */

reformatted.

8) Should this:
+   * of the XmlOptionType given in 'xmloption_arg'. This enables the
+   * canonicalization of CONTENT fragments if they contain a singly-rooted
+   * XML - xml_parse() will thrown an error otherwise.
Be:
+   * of the XmlOptionType given in 'xmloption_arg'. This enables the
+   * canonicalization of CONTENT fragments if they contain a singly-rooted
+   * XML - xml_parse() will throw an error otherwise.


I didn't understand the suggestion in 8) :)

Thanks again for the review. Much appreciated!

v7 attached.

Best, Jim
From 4bd06615b9aa9f3f0fcebdd1bc30a0500524cdad Mon Sep 17 00:00:00 2001
From: Jim Jones 
Date: Wed, 4 Oct 2023 17:58:24 +0200
Subject: [PATCH v7] Add CANONICAL output format to xmlserialize

This patch introduces the CANONICAL option to xmlserialize, which
serializes xml documents in their canonical form - as described in
the W3C Canonical XML Version 1.1 specification. This option can
be used with the additional parameter WITH [NO] COMMENTS to keep
or remove xml comments from the canonical xml output. This feature
is based on the function xmlC14NDocDumpMemory from the C14N module
of libxml2.

This patch also includes regression tests and documentation.
---
 doc/src/s

Re: --sync-method isn't documented to take an argument

2023-10-04 Thread Nathan Bossart
On Wed, Oct 04, 2023 at 11:51:32AM -0400, Robert Haas wrote:
> On Wed, Oct 4, 2023 at 11:27 AM Nathan Bossart  
> wrote:
>> Here's a patch.  I didn't address the class="parameter" stuff at all.  I
>> figured it would be best to handle that separately.
> 
> I guess I'll vote for including class=parameter in this addition for
> now, as that appears to be the majority position in the documentation
> today. If we get a consensus to change something, so be it. But also,
> if you don't want to do that, so be it.

WFM.  I'll add it before committing, which I plan to do later today.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: --sync-method isn't documented to take an argument

2023-10-04 Thread Robert Haas
On Wed, Oct 4, 2023 at 11:27 AM Nathan Bossart  wrote:
> Here's a patch.  I didn't address the class="parameter" stuff at all.  I
> figured it would be best to handle that separately.

I guess I'll vote for including class=parameter in this addition for
now, as that appears to be the majority position in the documentation
today. If we get a consensus to change something, so be it. But also,
if you don't want to do that, so be it.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: --sync-method isn't documented to take an argument

2023-10-04 Thread Nathan Bossart
On Wed, Oct 04, 2023 at 05:00:26PM +0200, Alvaro Herrera wrote:
> On 2023-Oct-04, Robert Haas wrote:
> 
>> The original issue I reported does make a real difference, though. :-)
> 
> Yes, absolutely, and I agree that it'd be better to get it fixed.

Here's a patch.  I didn't address the class="parameter" stuff at all.  I
figured it would be best to handle that separately.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 8a09c5c438..d43c91575c 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -366,7 +366,7 @@ PostgreSQL documentation
  
 
  
-  --sync-method
+  --sync-method=method
   

 When set to fsync, which is the default,
diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 344de921e4..72290ec3e4 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -595,7 +595,7 @@ PostgreSQL documentation
  
 
  
-  --sync-method
+  --sync-method=method
   

 When set to fsync, which is the default,
diff --git a/doc/src/sgml/ref/pg_checksums.sgml b/doc/src/sgml/ref/pg_checksums.sgml
index 7b44ba71cf..db5b29505c 100644
--- a/doc/src/sgml/ref/pg_checksums.sgml
+++ b/doc/src/sgml/ref/pg_checksums.sgml
@@ -140,7 +140,7 @@ PostgreSQL documentation
  
 
  
-  --sync-method
+  --sync-method=method
   

 When set to fsync, which is the default,
diff --git a/doc/src/sgml/ref/pg_dump.sgml b/doc/src/sgml/ref/pg_dump.sgml
index 625a736eff..b61493743a 100644
--- a/doc/src/sgml/ref/pg_dump.sgml
+++ b/doc/src/sgml/ref/pg_dump.sgml
@@ -1183,7 +1183,7 @@ PostgreSQL documentation
  
 
  
-  --sync-method
+  --sync-method=method
   

 When set to fsync, which is the default,
diff --git a/doc/src/sgml/ref/pg_rewind.sgml b/doc/src/sgml/ref/pg_rewind.sgml
index 80dff16168..cf97f1ea9a 100644
--- a/doc/src/sgml/ref/pg_rewind.sgml
+++ b/doc/src/sgml/ref/pg_rewind.sgml
@@ -285,7 +285,7 @@ PostgreSQL documentation
  
 
  
-  --sync-method
+  --sync-method=method
   

 When set to fsync, which is the default,
diff --git a/doc/src/sgml/ref/pgupgrade.sgml b/doc/src/sgml/ref/pgupgrade.sgml
index 958c44b361..116b2c945d 100644
--- a/doc/src/sgml/ref/pgupgrade.sgml
+++ b/doc/src/sgml/ref/pgupgrade.sgml
@@ -191,7 +191,7 @@ PostgreSQL documentation
  
 
  
-  --sync-method
+  --sync-method=method
   

 When set to fsync, which is the default,


Re: should frontend tools use syncfs() ?

2023-10-04 Thread Nathan Bossart
On Wed, Sep 27, 2023 at 01:56:08PM +0100, Peter Eisentraut wrote:
> I think it's a bit much to add a whole appendix for that little content.

I'm inclined to agree.

> We have a collection of platform-specific notes in chapter 19, including
> file-system-related notes in section 19.2.  Maybe it could be put there?

I will give this a try.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: --sync-method isn't documented to take an argument

2023-10-04 Thread Daniel Gustafsson
> On 4 Oct 2023, at 17:07, Tom Lane  wrote:

> OK, so now I know what the possible values of "class" are, but
> I'm still not seeing a reason why we shouldn't just assume that
> "parameter" is the only one of interest.

I think "parameter" is the only one of interest for this usecase (params to
application options), but it might not necessarily be applicable to other uses
of <replaceable> we have in the tree, like for example in bki.sgml:

oprname(lefttype,righttype)

But since we don't know if anyone is rendering our docs with a custom style, or
ever will, this is all very theoretical.  Maybe skipping the class attribute is
the right choice for consistency?

--
Daniel Gustafsson





Re: Modernize const handling with readline

2023-10-04 Thread Aleksander Alekseev
Hi,

> On 03.10.23 13:28, Aleksander Alekseev wrote:
> > While examining the code for similar places I noticed that the
> > following functions can also be const'ified:
> >
> > - crc32_sz
>
> I suppose this could be changed.

OK, that's a simple change. Here is the patch.

-- 
Best regards,
Aleksander Alekseev


v1-0001-Constify-crc32_sz.patch
Description: Binary data


Re: --sync-method isn't documented to take an argument

2023-10-04 Thread Tom Lane
Daniel Gustafsson  writes:
> On 4 Oct 2023, at 16:51, Tom Lane  wrote:
>> To do that, we'd need some sort of agreement on what the possible
>> "class" values are and when to use each one.  I've never seen any
>> documentation about that.

> That's fair.  The 4.5 docbook guide isn't terribly helpful on what the
> attributes mean and how they should be used:
>   https://tdg.docbook.org/tdg/4.5/replaceable
> In the 5.2 version the text is slightly expanded but not by all that much:
>   https://tdg.docbook.org/tdg/5.2/replaceable

OK, so now I know what the possible values of "class" are, but
I'm still not seeing a reason why we shouldn't just assume that
"parameter" is the only one of interest.  We don't really have
places where "command" or "function" would be appropriate AFAIR.
As for "option", what's the distinction between that and
"parameter"?  And why wouldn't I use "" instead?

Even if we did have uses for the other class values, I'm skeptical
that we'd ever use them consistently enough that there'd be
value in rendering them differently.

regards, tom lane




Re: --sync-method isn't documented to take an argument

2023-10-04 Thread Alvaro Herrera
On 2023-Oct-04, Robert Haas wrote:

> The original issue I reported does make a real difference, though. :-)

Yes, absolutely, and I agree that it'd be better to get it fixed.

-- 
Álvaro HerreraBreisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Learn about compilers. Then everything looks like either a compiler or
a database, and now you have two problems but one of them is fun."
https://twitter.com/thingskatedid/status/1456027786158776329




Re: --sync-method isn't documented to take an argument

2023-10-04 Thread Daniel Gustafsson
> On 4 Oct 2023, at 16:51, Tom Lane  wrote:
> 
> Daniel Gustafsson  writes:
>> On 4 Oct 2023, at 16:39, Tom Lane  wrote:
>>> I concluded a long time ago that it does nothing.
> 
>> It does nothing in our current doc rendering, but if someone would like to
>> render docs with another style where it does make a difference it seems
>> unhelpful to not be consistent.
> 
> To do that, we'd need some sort of agreement on what the possible
> "class" values are and when to use each one.  I've never seen any
> documentation about that.

That's fair.  The 4.5 docbook guide isn't terribly helpful on what the
attributes mean and how they should be used:

https://tdg.docbook.org/tdg/4.5/replaceable

In the 5.2 version the text is slightly expanded but not by all that much:

https://tdg.docbook.org/tdg/5.2/replaceable

--
Daniel Gustafsson





Re: --sync-method isn't documented to take an argument

2023-10-04 Thread Nathan Bossart
On Wed, Oct 04, 2023 at 09:08:57AM -0400, Robert Haas wrote:
> The various command-line utilities that have recently acquired a
> --sync-method option document it like this:
>
> --sync-method
>
> But that is not how we document options which take an argument. We do
> it like this:
>
> --pgdata=directory
> --filenode=filenode
>
> etc.
>
> This one should be something like this:
>
> --sync-method=method

Whoops.  Thanks for pointing this out.  I'll get this fixed.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: stopgap fix for signal handling during restore_command

2023-10-04 Thread Nathan Bossart
On Sun, Oct 01, 2023 at 08:50:15PM +0200, Peter Eisentraut wrote:
> Is this still being contemplated?  What is the status of this?

I'll plan on committing this in the next couple of days.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: --sync-method isn't documented to take an argument

2023-10-04 Thread Tom Lane
Daniel Gustafsson  writes:
> On 4 Oct 2023, at 16:39, Tom Lane  wrote:
>> I concluded a long time ago that it does nothing.

> It does nothing in our current doc rendering, but if someone would like to
> render docs with another style where it does make a difference it seems
> unhelpful to not be consistent.

To do that, we'd need some sort of agreement on what the possible
"class" values are and when to use each one.  I've never seen any
documentation about that.

regards, tom lane




Re: --sync-method isn't documented to take an argument

2023-10-04 Thread Daniel Gustafsson
> On 4 Oct 2023, at 16:39, Tom Lane  wrote:
> 
> Alvaro Herrera  writes:
>> So I'm not sure that specifying the class="parameter" bit does anything in
>> reality, or that changing lines to add or remove it will have any effect.
> 
> I concluded a long time ago that it does nothing.

It does nothing in our current doc rendering, but if someone would like to
render docs with another style where it does make a difference it seems
unhelpful to not be consistent.

--
Daniel Gustafsson





Re: Add support for AT LOCAL

2023-10-04 Thread Vik Fearing

On 9/29/23 09:27, Michael Paquier wrote:

On Sat, Sep 23, 2023 at 12:54:01AM +0200, Vik Fearing wrote:

On 9/22/23 23:46, cary huang wrote:

I think this feature can be a useful addition in dealing with time
zones. I have applied and tried out the patch, The feature works as
described and seems promising. The problem with compilation failure
was probably reported on CirrusCI when compiled on different
platforms. I have run the latest patch on my own Cirrus CI environment
and everything checked out fine.


Thank you for reviewing!


+| a_expr AT LOCAL        %prec AT
+{
+    /* Use the value of the session's time zone */
+    FuncCall *tz = makeFuncCall(SystemFuncName("current_setting"),
+                                list_make1(makeStringConst("TimeZone", -1)),
+                                COERCE_SQL_SYNTAX,
+                                -1);
+    $$ = (Node *) makeFuncCall(SystemFuncName("timezone"),
+                               list_make2(tz, $1),
+                               COERCE_SQL_SYNTAX,
+                               @2);

As the deparsing code introduced by this patch is showing, this leads
to a lot of extra complexity.  And, actually, this can be quite
expensive as well with these two layers of functions.  Note also that
in comparison to SQLValueFunctions, COERCE_SQL_SYNTAX does less
inlining.  So here comes my question: why doesn't this stuff just use
one underlying function to do this job?


Okay.  Here is a v3 using that approach.
--
Vik Fearing
From e68305dafd2d4f9439cbd341e8f04745d8a945ed Mon Sep 17 00:00:00 2001
From: Vik Fearing 
Date: Wed, 4 Oct 2023 15:46:38 +0100
Subject: [PATCH v3] Add support for AT LOCAL

When converting a timestamp to/from with/without time zone, the SQL
Standard specifies an AT LOCAL variant of AT TIME ZONE which uses the
session's time zone.
---
 doc/src/sgml/func.sgml| 13 +++
 src/backend/parser/gram.y |  7 
 src/backend/utils/adt/ruleutils.c |  9 +
 src/backend/utils/adt/timestamp.c | 20 ++
 src/include/catalog/pg_proc.dat   |  6 +++
 src/test/regress/expected/timestamptz.out | 47 +++
 src/test/regress/sql/timestamptz.sql  | 21 ++
 7 files changed, 123 insertions(+)

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index f1ad64c3d6..6bc61705a9 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -10619,12 +10619,16 @@ SELECT date_bin('15 minutes', TIMESTAMP '2020-02-11 15:44:17', TIMESTAMP '2001-0

 

 AT TIME ZONE

 
+   
+AT LOCAL
+   
+

 The AT TIME ZONE operator converts time
 stamp without time zone to/from
 time stamp with time zone, and
 time with time zone values to different time
 zones.  shows its
@@ -10707,24 +10711,33 @@ SELECT date_bin('15 minutes', TIMESTAMP '2020-02-11 15:44:17', TIMESTAMP '2001-0
 In the text case, a time zone name can be specified in any of the ways
 described in .
 The interval case is only useful for zones that have fixed offsets from
 UTC, so it is not very common in practice.

 
+   
+The syntax AT LOCAL may be used as shorthand for AT TIME ZONE
+local, where local is the
+session's TimeZone value.
+   
+

 Examples (assuming the current  setting
 is America/Los_Angeles):
 
 SELECT TIMESTAMP '2001-02-16 20:38:40' AT TIME ZONE 'America/Denver';
 Result: 2001-02-16 19:38:40-08
 
 SELECT TIMESTAMP WITH TIME ZONE '2001-02-16 20:38:40-05' AT TIME ZONE 'America/Denver';
 Result: 2001-02-16 18:38:40
 
 SELECT TIMESTAMP '2001-02-16 20:38:40' AT TIME ZONE 'Asia/Tokyo' AT TIME ZONE 'America/Chicago';
 Result: 2001-02-16 05:38:40
+
+SELECT TIMESTAMP WITH TIME ZONE '2001-02-16 20:38:40-05' AT LOCAL;
+Result: 2001-02-16 17:38:40
 
 The first example adds a time zone to a value that lacks it, and
 displays the value using the current TimeZone
 setting.  The second example shifts the time stamp with time zone value
 to the specified time zone, and returns the value without a time zone.
 This allows storage and display of values different from the current
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index e56cbe77cb..50ed504e5a 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -14505,12 +14505,19 @@ a_expr:		c_expr	{ $$ = $1; }
 {
 	$$ = (Node *) makeFuncCall(SystemFuncName("timezone"),
 			   list_make2($5, $1),
 			   COERCE_SQL_SYNTAX,
 			   @2);
 }
+			| a_expr AT LOCAL		%prec AT
+{
+	$$ = (Node *) makeFuncCall(SystemFuncName("timezone"),
+			   list_make1($1),
+			   COERCE_SQL_SYNTAX,
+			   -1);
+}
 		/*
 		 * These operators must be called out explicitly i

Re: --sync-method isn't documented to take an argument

2023-10-04 Thread Tom Lane
Alvaro Herrera  writes:
> So I'm not sure that specifying the class="parameter" bit does anything in
> reality, or that changing lines to add or remove it will have any effect.

I concluded a long time ago that it does nothing.  We have a good
mix of places that write  with or without that, and
I've never detected any difference in the output.  So I tend to
leave it out, or at most make new entries look like the adjacent
ones.

regards, tom lane




Re: --sync-method isn't documented to take an argument

2023-10-04 Thread Robert Haas
On Wed, Oct 4, 2023 at 10:35 AM Alvaro Herrera  wrote:
> So I'm not sure that specifying the class="parameter" bit does anything in
> reality, or that changing lines to add or remove it will have any effect.

Interesting. I wondered whether that might be the case.

The original issue I reported does make a real difference, though. :-)

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Modernize const handling with readline

2023-10-04 Thread Peter Eisentraut

On 03.10.23 13:28, Aleksander Alekseev wrote:

While examining the code for similar places I noticed that the
following functions can also be const'ified:

- crc32_sz


I suppose this could be changed.


- pg_checksum_page (? temporary modifies the page but then restores it)


Then it's not really const?


- XLogRegisterData (?)


I don't think this would work, at least without further work elsewhere, 
because the data is stored in XLogRecData, which has no const handling.



The callers of cstring_to_text[_with_len] often cast the argument to
(char *) while in fact it's (const char *). This can be refactored
too.


These look fine to me.


Additionally there is a slight difference between XLogRegisterBlock()
declaration in xloginsert.h:

```
extern void XLogRegisterBlock(uint8 block_id, RelFileLocator *rlocator,
                              ForkNumber forknum, BlockNumber blknum,
                              char *page, uint8 flags);
```

... and xloginsert.c:

```
void
XLogRegisterBlock(uint8 block_id, RelFileLocator *rlocator, ForkNumber forknum,
   BlockNumber blknum, Page page, uint8 flags)
```


It looks like the reason here is that xloginsert.h does not have the 
Page type in scope.  I don't know how difficult it would be to change 
that, but it seems fine as is.






Re: --sync-method isn't documented to take an argument

2023-10-04 Thread Alvaro Herrera
On 2023-Oct-04, Daniel Gustafsson wrote:

> I can take a stab at tidying this up during breaks at the conference.  It 
> might
> not be the most important bit of markup, but for anyone building the docs who
> might want to use this it seems consistency will help.

So for HTML, the result of the pg_basebackup lines are these two lines:

-S slotname--slot=slotname

and 

-r rate--max-rate=rate

So I'm not sure that specifying the class="parameter" bit does anything in
reality, or that changing lines to add or remove it will have any effect.

-- 
Álvaro Herrera   48°01'N 7°57'E  —  https://www.EnterpriseDB.com/




Re: Request for comment on setting binary format output per session

2023-10-04 Thread Peter Eisentraut

On 31.07.23 18:27, Dave Cramer wrote:
On Mon, 10 Jul 2023 at 03:56, Daniel Gustafsson wrote:


 > On 25 Apr 2023, at 16:47, Dave Cramer <davecra...@gmail.com> wrote:

 > Patch attached with comments removed

This patch no longer applies, please submit a rebased version on top
of HEAD.


Rebased see attached


I have studied this thread now.  It seems it has gone through the same 
progression with the same (non-)result as my original patch on the subject.


I have a few intermediate conclusions:

- Doing it with a GUC is challenging.  It's questionable layering to 
have the GUC system control protocol behavior.  It would allow weird 
behavior where a GUC set, maybe for a user or a database, would confuse, 
say, psql or pg_dump.  We probably should make some of those more robust 
in any case.  Also, handling of GUCs through connection poolers is a 
challenge.  It does work, but it's more like opt-in, and so can't be 
fully relied on for protocol correctness.


- Doing it with a session-level protocol-level setting is challenging. 
We currently don't have that kind of thing.  It's not clear how 
connection poolers would/should handle it.  Someone would have to work 
all this out before this could be used.


- In either case, there are issues like what if there is a connection 
pooler and types have different OIDs in different databases.  (Or, 
similarly, an extension is upgraded during the lifetime of a session and 
a type's OID changes.)  Also, maybe, what if types are in different 
schemas on different databases.


- We could avoid some of the session-state issues by doing this per 
request, like extending the Bind message somehow by appending the list 
of types to be sent in binary.  But the JDBC driver currently lists 24 
types for which it supports binary, so that would require adding 24*4=96 
bytes per request, which seems untenable.


I think intuitively, this facility ought to work like client_encoding. 
There, the client declares its capabilities, and the server has to 
format the output according to the client's capabilities.  That works, 
and it also works through connection poolers.  (It is a GUC.)  If we can 
model it like that as closely as possible, then we have a chance of 
getting it working reliably.  Notably, the value space for 
client_encoding is a globally known fixed list of strings.  We need to 
figure out what is the right way to globally identify types, like either 
by fully-qualified name, by base name, some combination, how does it 
work with extensions, or do we need a new mechanism like UUIDs.  I think 
that is something we need to work out, no matter which protocol 
mechanism we end up using.
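
To illustrate the declare-capabilities pattern being referenced (this
is just standard client_encoding behavior, shown here for comparison):

```
-- The client declares what it can handle; the server converts all
-- textual output accordingly.
SET client_encoding = 'LATIN1';  -- typically sent by the driver at startup
SELECT 'déjà vu';                -- delivered to the client as LATIN1 bytes
```
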






Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers

2023-10-04 Thread Robert Haas
On Wed, Oct 4, 2023 at 9:36 AM James Coleman  wrote:
> Are you thinking we should simply elide the fact that there is pruning
> that happens outside of HOT? Or add that information onto the HOT
> page, even though it doesn't directly fit?

I think we should elide it. Maybe with a much larger rewrite there
would be a good place to include that information, but with the
current structure, the page is about why HOT is good, and talking
about pruning that can happen apart from HOT doesn't advance that
message.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Document efficient self-joins / UPDATE LIMIT techniques.

2023-10-04 Thread Laurenz Albe
On Wed, 2023-06-28 at 14:20 -0400, Corey Huinker wrote:
> This patch adds a few examples to demonstrate the following:
> 
> * The existence of the ctid column on every table
> * The utility of ctids in self joins
> * A practical usage of SKIP LOCKED

I had a look at your patch, and I am in favor of the general idea.

Style considerations:
-

I think the SQL statements should end with semicolons.  Our SQL examples
are usually written like that.

Our general style with CTEs seems to be (according to
https://www.postgresql.org/docs/current/queries-with.html):

 WITH quaxi AS (
 SELECT ...
 )
 SELECT ...;

About the DELETE example:
-

The text suggests that a single, big DELETE operation can consume
too many resources.  That may be true, but the sum of your DELETEs
will consume even more resources.

In my experience, the bigger problem with bulk deletes like that is
that you can run into deadlocks easily, so maybe that would be a
better rationale to give.  You could say that with this technique,
you can force the lock to be taken in a certain order, which will
avoid the possibility of deadlock with other such DELETEs.
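
For readers following along, the kind of batched DELETE being discussed
looks roughly like this (a sketch only, combining the elements
mentioned — ctid self-join, LIMIT, SKIP LOCKED; the table and column
names are invented):

```
-- Delete in bounded batches; SKIP LOCKED skips rows a concurrent
-- batch already holds, and ORDER BY makes the lock order deterministic.
WITH batch AS (
    SELECT ctid FROM big_table
    WHERE expired
    ORDER BY ctid
    LIMIT 10000
    FOR UPDATE SKIP LOCKED
)
DELETE FROM big_table t
USING batch
WHERE t.ctid = batch.ctid;
```
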

About the SELECT example:
-

That example belongs to UPDATE, I'd say, because that is the main
operation.

The reason you give (avoid excessive locking) is good.
Perhaps you could mention that updating in batches also avoids
excessive bloat (if you VACUUM between the batches).

About the UPDATE example:
-

I think that could go, because it is pretty similar to the previous
one.  You even use ctid in both examples.

Status set to "waiting for author".

Yours,
Laurenz Albe




Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers

2023-10-04 Thread James Coleman
On Wed, Oct 4, 2023 at 9:18 AM Robert Haas  wrote:
>
> On Tue, Oct 3, 2023 at 3:35 PM James Coleman  wrote:
> > I like your changes. Reading through this several times, and noting
> > Peter's comments about pruning being more than just HOT, I'm thinking
> > that rather than a simple fixup for this one paragraph what we
> > actually want is to split out the concept of page pruning into its own
> > section of the storage docs. Attached is a patch that does that,
> > incorporating much of your language about LP_REDIRECT, along with
> > LP_DEAD so that readers know this affects more than just heap-only
> > tuple workloads.
>
> I considered this kind of approach, but I don't think it's better. I
> think the point of this documentation is to give people a general idea
> of how HOT works, not all the technical details. If we start getting
> into the nitty gritty, we're going to have to explain a lot more
> stuff, which is going to detract from the actual purpose of this
> documentation. For example, your patch talks about LP_REDIRECT and
> LP_DEAD, which are not referenced in any other part of the
> documentation. It also uses the term line pointer instead of page item
> identifier without explaining that those are basically the same thing.
> Obviously those kinds of things can be fixed, but in my opinion, your
> version doesn't really add much information that is likely to be
> useful to a reader of this section, while at the same time it does add
> some things that might be confusing. If we wanted to have a more
> technical discussion of all of this somewhere in the documentation, we
> could, but that would be quite a bit more work to write and review,
> and it would probably duplicate some of what we've already got in
> README.HOT.
>
> I suggest that we should be looking for a patch that tries to correct
> the wrong stuff in the present wording while adding the minimum
> possible amount of text.

There's one primary thing I think this approach adds (I don't mean the
patch I proposed is the only way to address this concern): the fact
that pages get cleaned up outside of the HOT optimization. We could
add a short paragraph about that on the HOT page, but that seems a bit
confusing.

Are you thinking we should simply elide the fact that there is pruning
that happens outside of HOT? Or add that information onto the HOT
page, even though it doesn't directly fit?

Regards,
James




Re: --sync-method isn't documented to take an argument

2023-10-04 Thread Daniel Gustafsson
> On 4 Oct 2023, at 15:22, Robert Haas  wrote:
> 
> On Wed, Oct 4, 2023 at 9:15 AM Daniel Gustafsson  wrote:
>>> On 4 Oct 2023, at 15:08, Robert Haas  wrote:
>>> This one should be something like this:
>>> 
>>> --sync-method=method
>> 
>> Shouldn't it be method ?
> 
> Hmm, I think you're probably right. But look at this:
> 
>  -S <replaceable>slotname</replaceable>
>  --slot=<replaceable class="parameter">slotname</replaceable>
> 
> But then in the very same file:
> 
>  -r <replaceable class="parameter">rate</replaceable>
>  --max-rate=<replaceable class="parameter">rate</replaceable>

Hmm.. that's a bit unfortunate.

> It doesn't look to me like we're entirely consistent about this.

That (sadly) applies to a fair chunk of the docs.

I can take a stab at tidying this up during breaks at the conference.  It might
not be the most important bit of markup, but for anyone building the docs who
might want to use this it seems consistency will help.

--
Daniel Gustafsson





Re: Infinite Interval

2023-10-04 Thread Ashutosh Bapat
On Fri, Sep 29, 2023 at 12:43 PM Dean Rasheed  wrote:
>
> I think that part is now ready to commit, and I plan to push this fix
> to make_interval() separately, since it's really a bug-fix, not
> related to support for infinite intervals. In line with recent
> precedent, I don't think it's worth back-patching though, since such
> inputs are pretty unlikely in production.

The changes look good to me. I am not a fan of the goto construct, but
this looks nicer.

I think we should introduce an interval_out_of_range_error() function
along the lines of float_overflow_error(). Later patches introduce more
places where we raise that error. We can introduce the function as
part of those patches.

>
>
> 2. The various in_range() functions needed adjusting to handle
> infinite interval offsets.
>
> For timestamp values, I followed the precedent set by the equivalent
> float/numeric code. I.e., all (finite and non-finite) timestamps are
> regarded as infinitely following -infinity and infinitely preceding
> +infinity.
>
> For time values, it's a bit different because no time values precede
> or follow any other by more than 24 hours, so a window frame between
> +inf following and +inf following is empty (whereas in the timestamp
> case it contains +inf). Put another way, such a window frame is empty
> because a time value can't be infinity.
>

I will review and test this. I will also take a look at what else we
might be missing in the patch. [5] did mention that in_range()
functions need to be assessed but I don't see corresponding changes in
the subsequent patches. I will go over that list again.
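
To make the quoted in_range() behavior concrete: under the patch being
discussed (this is not valid in released versions at the time of this
thread), an infinite interval offset over timestamps would behave like
this (hypothetical table name):

```
-- Every row from the start of the ordering, including any rows at
-- -infinity, falls inside a frame with an infinite PRECEDING offset.
SELECT ts,
       count(*) OVER (ORDER BY ts
                      RANGE BETWEEN interval 'infinity' PRECEDING
                                AND CURRENT ROW) AS running_count
FROM events;
```
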

>
> 3. I got rid of interval2timestamp_no_overflow() because I don't think
> it really makes much sense to convert an interval to a timestamp, and
> it's a bit of a hack anyway (as selfuncs.c itself admits). Actually, I
> think it's OK to just leave selfuncs.c as it is. The existing code
> will cope just fine with infinite intervals, since they aren't really
> infinite, just larger than any others.
>

This looks odd next to date2timestamp_no_overflow(), which returns
-DBL_MAX/DBL_MAX for infinite values. But it's in agreement with what
we do with timestamp, i.e. we don't convert infinities to -/+DBL_MAX.
So I am fine with just adding a comment, the way you have done it.
Don't have much preference here.

>
> 4. I tested pg_upgrade on a table with an interval with INT_MAX
> months, and it was silently converted to infinity. I think that's
> probably the best outcome (better than failing).

[1] mentions that an Interval with month = INT_MAX is a valid finite
value but outside the documented range of interval [2]. The highest
documented value of Interval = 178000000 (years) * 12 = 2136000000
months, which is less than (2^31 - 1). But we do not prohibit such a
value from entering the database, however improbable it is.

> However, this means
> that we really should require all 3 fields of an interval to be
> INT_MIN/MAX for it to be considered infinite, otherwise it would be
> possible to have multiple internal representations of infinity that do
> not compare as equal.
>
> Similarly, interval_in() needs to accept such inputs, otherwise things
> like pg_dump/restore from pre-17 databases could fail. But since it
> now requires all 3 fields of the interval to be INT_MIN/MAX for it to
> be infinite, the odds of that happening by accident are vanishingly
> small in practice.
>
> This approach also means that the range of allowed finite intervals is
> only reduced by 1 microsecond at each end of the range, rather than a
> whole month.
>
> Also, it means that it is no longer necessary to change a number of
> the regression tests (such as the justify_interval() tests) for values
> near INT_MIN/MAX.

My first patch compared all three fields to determine whether
a given Interval value represents infinity. [3] changed that to use
only the month field. I guess that was based on the discussion at [4].
You may want to review that discussion if not already done. I am fine
either way. We should be able to change the comparison code later if
we see performance getting impacted.

>
>
> Overall, I think this is now pretty close to being ready for commit.

Thanks.

[1] 
https://www.postgresql.org/message-id/CAAvxfHea4%2BsPybKK7agDYOMo9N-Z3J6ZXf3BOM79pFsFNcRjwA%40mail.gmail.com
[2] Table 8.9 at
https://www.postgresql.org/docs/current/datatype-datetime.html#DATATYPE-DATETIME
[3] 
https://www.postgresql.org/message-id/CAAvxfHf0-T99i%3DOrve_xfonVCvsCuPy7C4avVm%3D%2Byu128ujSGg%40mail.gmail.com
[4] https://www.postgresql.org/message-id/26022.1545087636%40sss.pgh.pa.us
[5] 
https://www.postgresql.org/message-id/CAAvxfHdzd5JLRBXDAW7OPhsNNACvhsCP3f5R4LNhRVaDuQG0gg%40mail.gmail.com

--
Best Wishes,
Ashutosh Bapat




Re: remaining sql/json patches

2023-10-04 Thread Amit Langote
On Tue, Oct 3, 2023 at 10:11 PM Amit Langote  wrote:
> On Mon, Oct 2, 2023 at 2:26 PM Amit Langote  wrote:
> > On Mon, Oct 2, 2023 at 1:24 PM Amit Langote  wrote:
> > > Pushed this 30 min ago (no email on -committers yet!) and am looking
> > > at the following llvm crash reported by buildfarm animal pogona [1]:
> > >
> > > #4  0x7f5bceb673d5 in __assert_fail_base (fmt=0x7f5bcecdbdc8
> > > "%s%s%s:%u: %s%sAssertion `%s' failed.\\n%n",
> > > assertion=assertion@entry=0x7f5bc1336419 "(i >= FTy->getNumParams() ||
> > > FTy->getParamType(i) == Args[i]->getType()) && \\"Calling a function
> > > with a bad signature!\\"", file=file@entry=0x7f5bc1336051
> > > "/home/bf/src/llvm-project-5/llvm/lib/IR/Instructions.cpp",
> > > line=line@entry=299, function=function@entry=0x7f5bc13362af "void
> > > llvm::CallInst::init(llvm::FunctionType *, llvm::Value *,
> > > ArrayRef<llvm::Value *>, ArrayRef<llvm::OperandBundleDef>, const
> > > llvm::Twine &)") at ./assert/assert.c:92
> > > #5  0x7f5bceb763a2 in __assert_fail (assertion=0x7f5bc1336419 "(i
> > > >= FTy->getNumParams() || FTy->getParamType(i) == Args[i]->getType())
> > > && \\"Calling a function with a bad signature!\\"",
> > > file=0x7f5bc1336051
> > > "/home/bf/src/llvm-project-5/llvm/lib/IR/Instructions.cpp", line=299,
> > > function=0x7f5bc13362af "void llvm::CallInst::init(llvm::FunctionType
> > > *, llvm::Value *, ArrayRef<llvm::Value *>,
> > > ArrayRef<llvm::OperandBundleDef>, const llvm::Twine &)") at
> > > ./assert/assert.c:101
> > > #6  0x7f5bc110f138 in llvm::CallInst::init (this=0x557a91f3e508,
> > > FTy=0x557a91ed9ae0, Func=0x557a91f8be88, Args=..., Bundles=...,
> > > NameStr=...) at
> > > /home/bf/src/llvm-project-5/llvm/lib/IR/Instructions.cpp:297
> > > #7  0x7f5bc0fa579d in llvm::CallInst::CallInst
> > > (this=0x557a91f3e508, Ty=0x557a91ed9ae0, Func=0x557a91f8be88,
> > > Args=..., Bundles=..., NameStr=..., InsertBefore=0x0) at
> > > /home/bf/src/llvm-project-5/llvm/include/llvm/IR/Instructions.h:1934
> > > #8  0x7f5bc0fa538c in llvm::CallInst::Create (Ty=0x557a91ed9ae0,
> > > Func=0x557a91f8be88, Args=..., Bundles=..., NameStr=...,
> > > InsertBefore=0x0) at
> > > /home/bf/src/llvm-project-5/llvm/include/llvm/IR/Instructions.h:1444
> > > #9  0x7f5bc0fa51f9 in llvm::IRBuilder<llvm::ConstantFolder,
> > > llvm::IRBuilderDefaultInserter>::CreateCall (this=0x557a91f9c6a0,
> > > FTy=0x557a91ed9ae0, Callee=0x557a91f8be88, Args=..., Name=...,
> > > FPMathTag=0x0) at
> > > /home/bf/src/llvm-project-5/llvm/include/llvm/IR/IRBuilder.h:1669
> > > #10 0x7f5bc100edda in llvm::IRBuilder<llvm::ConstantFolder,
> > > llvm::IRBuilderDefaultInserter>::CreateCall (this=0x557a91f9c6a0,
> > > Callee=0x557a91f8be88, Args=..., Name=..., FPMathTag=0x0) at
> > > /home/bf/src/llvm-project-5/llvm/include/llvm/IR/IRBuilder.h:1663
> > > #11 0x7f5bc100714e in LLVMBuildCall (B=0x557a91f9c6a0,
> > > Fn=0x557a91f8be88, Args=0x7ffde6fa0b50, NumArgs=6, Name=0x7f5bc30b648c
> > > "funccall_iocoerce_in_safe") at
> > > /home/bf/src/llvm-project-5/llvm/lib/IR/Core.cpp:2964
> > > #12 0x7f5bc30af861 in llvm_compile_expr (state=0x557a91fbeac0) at
> > > /home/bf/bf-build/pogona/HEAD/pgsql.build/../pgsql/src/backend/jit/llvm/llvmjit_expr.c:1373
> > >
> > > This seems to me to be complaining about the following addition:
> > >
> > > +   {
> > > +   Oid ioparam = op->d.iocoerce.typioparam;
> > > +   LLVMValueRef v_params[6];
> > > +   LLVMValueRef v_success;
> > > +
> > > +   v_params[0] = l_ptr_const(op->d.iocoerce.finfo_in,
> > > + l_ptr(StructFmgrInfo));
> > > +   v_params[1] = v_output;
> > > +   v_params[2] = l_oid_const(lc, ioparam);
> > > +   v_params[3] = l_int32_const(lc, -1);
> > > +   v_params[4] = 
> > > l_ptr_const(op->d.iocoerce.escontext,
> > > +
> > > l_ptr(StructErrorSaveContext));
> > >
> > > -   LLVMBuildStore(b, v_retval, v_resvaluep);
> > > +   /*
> > > +* InputFunctionCallSafe() will write directly 
> > > into
> > > +* *op->resvalue.
> > > +*/
> > > +   v_params[5] = v_resvaluep;
> > > +
> > > +   v_success = LLVMBuildCall(b, llvm_pg_func(mod,
> > > "InputFunctionCallSafe"),
> > > + v_params, 
> > > lengthof(v_params),
> > > + 
> > > "funccall_iocoerce_in_safe");
> > > +
> > > +   /*
> > > +* Return null if InputFunctionCallSafe() 
> > > encountered
> > > +* an error.
> > > +*/
> > > +   v_resnullp = LLVMBuildICmp(b, LLVMIntEQ, 
> > > v_success,
> > > +  l_sbool_const(0), "");
> > > +   }
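
For context, the assertion that fires is LLVM verifying each argument's
IR type against the callee's declared parameter types, so one of the six
v_params above disagrees with the signature the JIT module has for
InputFunctionCallSafe. A contextual sketch (not the actual fix) of how
one could cross-check that with the pre-opaque-pointer LLVM C API, using
the variables from the hunk above:

    LLVMValueRef fn = llvm_pg_func(mod, "InputFunctionCallSafe");
    LLVMTypeRef fn_type = LLVMGetElementType(LLVMTypeOf(fn));
    LLVMTypeRef param_types[6];

    /* the call site must agree with the declared function type */
    Assert(LLVMCountParamTypes(fn_type) == lengthof(v_params));
    LLVMGetParamTypes(fn_type, param_types);
    for (int i = 0; i < lengthof(v_params); i++)
        Assert(LLVMTypeOf(v_params[i]) == param_types[i]);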
> >
> > Although most animals except p

Re: --sync-method isn't documented to take an argument

2023-10-04 Thread Robert Haas
On Wed, Oct 4, 2023 at 9:15 AM Daniel Gustafsson  wrote:
> > On 4 Oct 2023, at 15:08, Robert Haas  wrote:
> > This one should be something like this:
> >
> > --sync-method=method
>
> Shouldn't it be <replaceable>method</replaceable>?

Hmm, I think you're probably right. But look at this:

  -S slotname
  --slot=slotname

But then in the very same file:

  -r rate
  --max-rate=rate

It doesn't look to me like we're entirely consistent about this. I
also found this in vacuumlo.sgml, and there seem to be various other
examples:

-U username

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: [DOCS] HOT - correct claim about indexes not referencing old line pointers

2023-10-04 Thread Robert Haas
On Tue, Oct 3, 2023 at 3:35 PM James Coleman  wrote:
> I like your changes. Reading through this several times, and noting
> Peter's comments about pruning being more than just HOT, I'm thinking
> that rather than a simple fixup for this one paragraph what we
> actually want is to split out the concept of page pruning into its own
> section of the storage docs. Attached is a patch that does that,
> incorporating much of your language about LP_REDIRECT, along with
> LP_DEAD so that readers know this affects more than just heap-only
> tuple workloads.

I considered this kind of approach, but I don't think it's better. I
think the point of this documentation is to give people a general idea
of how HOT works, not all the technical details. If we start getting
into the nitty gritty, we're going to have to explain a lot more
stuff, which is going to detract from the actual purpose of this
documentation. For example, your patch talks about LP_REDIRECT and
LP_DEAD, which are not referenced in any other part of the
documentation. It also uses the term line pointer instead of page item
identifier without explaining that those are basically the same thing.
Obviously those kinds of things can be fixed, but in my opinion, your
version doesn't really add much information that is likely to be
useful to a reader of this section, while at the same time it does add
some things that might be confusing. If we wanted to have a more
technical discussion of all of this somewhere in the documentation, we
could, but that would be quite a bit more work to write and review,
and it would probably duplicate some of what we've already got in
README.HOT.

I suggest that we should be looking for a patch that tries to correct
the wrong stuff in the present wording while adding the minimum
possible amount of text.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: --sync-method isn't documented to take an argument

2023-10-04 Thread Daniel Gustafsson
> On 4 Oct 2023, at 15:08, Robert Haas  wrote:

> This one should be something like this:
> 
> --sync-method=method

Shouldn't it be <replaceable>method</replaceable>?

--
Daniel Gustafsson





--sync-method isn't documented to take an argument

2023-10-04 Thread Robert Haas
The various command-line utilities that have recently acquired a
--sync-method option document it like this:

--sync-method

But that is not how we document options which take an argument. We do
it like this:

--pgdata=directory
--filenode=filenode

etc.

This one should be something like this:

--sync-method=method

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Eager page freeze criteria clarification

2023-10-04 Thread Peter Geoghegan
On Mon, Oct 2, 2023 at 4:25 PM Robert Haas  wrote:
> On Mon, Oct 2, 2023 at 11:37 AM Peter Geoghegan  wrote:
> > If no vacuuming against pgbench_accounts is strictly better than some
> > vacuuming (unless it's just to advance relfrozenxid, which can't be
> > avoided), then to what extent is Melanie's freezing patch obligated to
> > not make the situation worse? I'm not saying that it doesn't matter at
> > all; just that there needs to be a sense of proportion.
>
> I agree. I think Melanie's previous test results showed no measurable
> performance regression but a significant regression in WAL volume. I
> don't remember how much of a regression it was, but it was enough to
> make me think that the extra WAL volume could probably be turned into
> a performance loss by adjusting the test scenario (a bit less
> hardware, or a bandwidth-constrained standby connection, etc.). I
> think I'd be OK to accept a small amount of additional WAL, though. Do
> you see it differently?

That's pretty much how I see it too.

> I wonder how much useless work we're doing in this area today. I mean,
> most pruning in that workload gets done by HOT rather than by vacuum,
> because vacuum simply can't keep up, but I don't think it's worthless
> work: if it wasn't done in the background by VACUUM, it would happen
> in the foreground anyway very soon after.

Yeah, but it probably wouldn't be as efficient, since only VACUUM uses
a somewhat older OldestXmin/prune cutoff.

As much as anything else, I'm arguing that we should treat the general
problem of useless work as relevant in the context of the performance
validation work/benchmarks.

> I don't have a good feeling
> for how much index cleanup we end up doing in a pgbench workload, but
> it seems to me that if we don't do index cleanup at least now and
> then, we'll lose the ability to recycle line pointers in the wake of
> non-HOT updates, and that seems bad. Maybe that's rare in pgbench
> specifically, or maybe it isn't, but it seems like you'd only have to
> change the workload a tiny bit for that to become a real problem.

There's no question that pruning that's required to go ahead in order
for index cleanup to take place isn't really ever something that we'd
want to skip.

-- 
Peter Geoghegan




Re: [PATCH] Clarify the behavior of the system when approaching XID wraparound

2023-10-04 Thread Peter Geoghegan
On Mon, Oct 2, 2023 at 1:25 PM Robert Haas  wrote:
> I'm also pretty happy with these patches and would like to see at
> least 0001 and 0002 committed, and probably 0003 as well. I am,
> however, -1 on back-patching. Perhaps that is overly cautious, but I
> don't like changing existing messages in back-branches. It will break
> translations, and potentially monitoring scripts, etc.
>
> If John's not available to take this forward, I can volunteer as
> substitute committer, unless Peter or Peter would like to handle it.

If you're willing to take over as committer here, I'll let the issue
of backpatching go.

I only ask that you note why you've not backpatched in the commit message.

-- 
Peter Geoghegan




Re: Synchronizing slots from primary to standby

2023-10-04 Thread Drouvot, Bertrand

Hi,

On 10/4/23 1:50 PM, shveta malik wrote:
> On Wed, Oct 4, 2023 at 5:00 PM Amit Kapila  wrote:
>>
>> On Wed, Oct 4, 2023 at 11:55 AM Drouvot, Bertrand
>>  wrote:
>>>
>>> On 10/4/23 6:26 AM, shveta malik wrote:
>>>> On Wed, Oct 4, 2023 at 5:36 AM Amit Kapila  wrote:
>>>>>
>>>>> How about an alternate scheme where we define sync_slot_names on
>>>>> standby but then store the physical_slot_name in the corresponding
>>>>> logical slot (ReplicationSlotPersistentData) to be synced? So, the
>>>>> standby will send the list of 'sync_slot_names' and the primary will
>>>>> add the physical standby's slot_name in each of the corresponding
>>>>> sync_slot. Now, if we do this then even after restart, we should be
>>>>> able to know for which physical slot each logical slot needs to wait.
>>>>> We can even provide an SQL API to reset the value of
>>>>> standby_slot_names in logical slots as a way to unblock decoding in
>>>>> case of emergency (for example, when the corresponding physical
>>>>> standby never comes up).
>>>>
>>>> Looks like a better approach to me. It solves most of the pain points like:
>>>> 1) Avoids the need of multiple GUCs
>>>> 2) Primary and standby need not to worry to be in sync if we maintain
>>>> sync-slot-names GUC on both
>>
>> As per my understanding of this approach, we don't want
>> 'sync-slot-names' to be set on the primary. Do you have a different
>> understanding?
>
> Same understanding. We do not need it to be set on primary by user. It
> will be GUC on standby and standby will convey it to primary.

+1, same understanding here.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Synchronizing slots from primary to standby

2023-10-04 Thread shveta malik
On Wed, Oct 4, 2023 at 12:08 PM Drouvot, Bertrand
 wrote:
>
> Hi,
>
> On 10/4/23 7:00 AM, shveta malik wrote:
> > On Wed, Oct 4, 2023 at 9:56 AM shveta malik  wrote:
>
> > The most simplistic approach would be:
> >
> > 1) maintain standby_slot_names GUC on primary
> > 2) maintain synchronize_slot_names GUC on physical standby alone.
> >
> > On primary, let all logical-walsenders wait on physical-standbys
> > configured in standby_slot_names GUC. This will work and will avoid
> > all the complexity involved in designs discussed above. But  this
> > simplistic approach comes with disadvantages like below:
> >
> > 1) Even if the associated slot of logical-walsender is not part of
> > synchronize_slot_names of any of the physical-standbys, it is still
> > waiting for all the configured standbys to finish.
>
> That's right. Currently (with walsender waiting an arbitrary amount of time)
> that sounds like a -1. But if we're going with a new CV approach (like 
> proposed
> in [1], that might not be so terrible). Though I don't feel comfortable with
> waiting for no reason (even if it is possibly for a short amount of time).
>

Agreed. Not a good idea to block each logical walsender.

> > 2) If associated slot of logical walsender is part of
> > synchronize_slot_names of standby1, it is still waiting on standby2,3
> > etc to finish i.e. waiting on rest of the standbys configured in
> > standby_slot_names which have not even marked that logical slot in
> > their synchronize_slot_names.
> >
>
> Same thoughts as above for 1)
>
> > So we need to weigh our options here.
> >
>
> With the simplistic approach, if a standby goes down that would impact non 
> related
> walsenders on the primary until the standby's associated physical slot is 
> removed
> from standby_slot_names, and I don't feel comfortable with this behavior.
>
> So, I'm +1 for the ReplicationSlotPersistentData approach proposed by Amit.

yes, +1 for ReplicationSlotPersistentData approach. Will start
detailed analysis on that approach now.

thanks
Shveta




Re: Synchronizing slots from primary to standby

2023-10-04 Thread shveta malik
On Wed, Oct 4, 2023 at 5:00 PM Amit Kapila  wrote:
>
> On Wed, Oct 4, 2023 at 11:55 AM Drouvot, Bertrand
>  wrote:
> >
> > On 10/4/23 6:26 AM, shveta malik wrote:
> > > On Wed, Oct 4, 2023 at 5:36 AM Amit Kapila  
> > > wrote:
> > >>
> > >>
> > >> How about an alternate scheme where we define sync_slot_names on
> > >> standby but then store the physical_slot_name in the corresponding
> > >> logical slot (ReplicationSlotPersistentData) to be synced? So, the
> > >> standby will send the list of 'sync_slot_names' and the primary will
> > >> add the physical standby's slot_name in each of the corresponding
> > >> sync_slot. Now, if we do this then even after restart, we should be
> > >> able to know for which physical slot each logical slot needs to wait.
> > >> We can even provide an SQL API to reset the value of
> > >> standby_slot_names in logical slots as a way to unblock decoding in
> > >> case of emergency (for example, when the corresponding physical
> > >> standby never comes up).
> > >>
> > >
> > >
> > > Looks like a better approach to me. It solves most of the pain points 
> > > like:
> > > 1) Avoids the need of multiple GUCs
> > > 2) Primary and standby need not to worry to be in sync if we maintain
> > > sync-slot-names GUC on both
>
> As per my understanding of this approach, we don't want
> 'sync-slot-names' to be set on the primary. Do you have a different
> understanding?
>

Same understanding. We do not need it to be set on primary by user. It
will be GUC on standby and standby will convey it to primary.

> > > 3) User still gets the flexibility to remove a standby from the wait-list
> > > of primary's logical-walsenders using the reset SQL API.
> > >
> >
> > Fully agree.
> >
> > > Now some initial thoughts:
> > > 1) Since each logical slot could be needed to be synched by multiple
> > > physical-standbys, so in ReplicationSlotPersistentData, we need to
> > > hold a list of standby's name. So this brings us to question as in how
> > > much shall we allocate initially in shared-memory? Shall it be for
> > > max_replication_slots (worst case scenario) in each
> > > ReplicationSlotPersistentData to hold physical-standby names?
> > >
> >
> > Yeah, and even if we do the opposite means add the 'to-sync'
> > logical replication slot in the ReplicationSlotPersistentData of the 
> > physical
> > slot(s) the questions still remain (as a physical standby could want to
> > sync multiples slots)
> >
>
> I think we don't need to allocate the entire max_replication_slots
> array in ReplicationSlotPersistentData. We should design something
> like the variable amount of data to be written on disk should be
> represented similar to what we do with variable TransactionIds in
> SnapBuildOnDisk. Now, we also need to store the list of standby's
> in-memory either shared or local memory of walsender. I think storing
> it in shared-memory say in ReplicationSlot has the advantage that we
> can easily set that via physical walsender and it may be easier to
> maintain both for manually created logical slots and logical slots
> associated with logical walsenders. But still this needs some thoughts
> as to what is the best way to store this information.
>

Thanks for the idea, I will review this.

thanks
Shveta




Is the logfile the only place to find the finish LSN?

2023-10-04 Thread pgchem pgchem
Hello all,
 
on pgsql-general this got no answers, so...

According to:

https://www.postgresql.org/docs/current/logical-replication-conflicts.html

or

https://www.postgresql.fastware.com/blog/addressing-replication-conflicts-using-alter-subscription-skip

The logfile is the _only_ place to find the transaction finish LSN that must be 
skipped if logical replication is stuck on a conflict. Is the logfile the only 
place to find this information, or can it also be found somewhere inside the 
database, e.g. in some system catalog view?

best regards

Ernst-Georg

RE: [PATCH] Fix memory leak in memoize for numeric key

2023-10-04 Thread Orlov Aleksej
I've finished testing the patch. 
I confirm that the patch solves the problem and works just as fast.

Thanks,
Alexey Orlov.


Re: Invalidate the subscription worker in cases where a user loses their superuser status

2023-10-04 Thread Shlok Kyal
On Wed, 4 Oct 2023 at 16:56, Peter Smith  wrote:
>
> On Tue, Oct 3, 2023 at 5:42 PM vignesh C  wrote:
> >
> > Thanks for the comments, the attached v6 version patch has the changes
> > for the same.
> >
>
> v6 LGTM.
>
I have verified the patch and it is working fine for me.




Re: Synchronizing slots from primary to standby

2023-10-04 Thread Amit Kapila
On Wed, Oct 4, 2023 at 11:55 AM Drouvot, Bertrand
 wrote:
>
> On 10/4/23 6:26 AM, shveta malik wrote:
> > On Wed, Oct 4, 2023 at 5:36 AM Amit Kapila  wrote:
> >>
> >>
> >> How about an alternate scheme where we define sync_slot_names on
> >> standby but then store the physical_slot_name in the corresponding
> >> logical slot (ReplicationSlotPersistentData) to be synced? So, the
> >> standby will send the list of 'sync_slot_names' and the primary will
> >> add the physical standby's slot_name in each of the corresponding
> >> sync_slot. Now, if we do this then even after restart, we should be
> >> able to know for which physical slot each logical slot needs to wait.
> >> We can even provide an SQL API to reset the value of
> >> standby_slot_names in logical slots as a way to unblock decoding in
> >> case of emergency (for example, when the corresponding physical
> >> standby never comes up).
> >>
> >
> >
> > Looks like a better approach to me. It solves most of the pain points like:
> > 1) Avoids the need of multiple GUCs
> > 2) Primary and standby need not to worry to be in sync if we maintain
> > sync-slot-names GUC on both

As per my understanding of this approach, we don't want
'sync-slot-names' to be set on the primary. Do you have a different
understanding?

> > 3) User still gets the flexibility to remove a standby from the wait-list
> > of primary's logical-walsenders using the reset SQL API.
> >
>
> Fully agree.
>
> > Now some initial thoughts:
> > 1) Since each logical slot could be needed to be synched by multiple
> > physical-standbys, so in ReplicationSlotPersistentData, we need to
> > hold a list of standby's name. So this brings us to question as in how
> > much shall we allocate initially in shared-memory? Shall it be for
> > max_replication_slots (worst case scenario) in each
> > ReplicationSlotPersistentData to hold physical-standby names?
> >
>
> Yeah, and even if we do the opposite means add the 'to-sync'
> logical replication slot in the ReplicationSlotPersistentData of the physical
> slot(s) the questions still remain (as a physical standby could want to
> sync multiples slots)
>

I think we don't need to allocate the entire max_replication_slots
array in ReplicationSlotPersistentData. We should design something
like the variable amount of data to be written on disk should be
represented similar to what we do with variable TransactionIds in
SnapBuildOnDisk. Now, we also need to store the list of standby's
in-memory either shared or local memory of walsender. I think storing
it in shared-memory say in ReplicationSlot has the advantage that we
can easily set that via physical walsender and it may be easier to
maintain both for manually created logical slots and logical slots
associated with logical walsenders. But still this needs some thoughts
as to what is the best way to store this information.
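
For illustration, a minimal sketch of such a variable-length on-disk
layout, mirroring the SnapBuildOnDisk pattern of a fixed header followed
by a variable-length array (all names here are hypothetical, not from
any posted patch):

    /* Fixed-size header followed by standby_count NameData entries. */
    typedef struct SyncSlotStandbysOnDisk
    {
        uint32      magic;          /* format identifier */
        pg_crc32c   checksum;       /* CRC over everything that follows */
        uint32      standby_count;  /* number of standby names that follow */
        NameData    standby_names[FLEXIBLE_ARRAY_MEMBER];
    } SyncSlotStandbysOnDisk;

The write path would then compute the size to write as
offsetof(SyncSlotStandbysOnDisk, standby_names) +
standby_count * sizeof(NameData), much like SnapBuildOnDisk does for its
variable number of TransactionIds.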

> > 2) If standby sends '*', then we need to update each logical-slot with
> > that standby-name. Or do we have a better way to deal with '*'? Need to
> > think more on this.
> >

I can't see any better way.

> > JFYI, on the similar line, currently in ReplicationSlotPersistentData,
> > we are maintaining a flag for slot-sync feature which is:
> >
> >  bool       synced;    /* Is this a slot created by a
> > sync-slot worker? */
> >
> > This flag currently holds significance only on physical-standby. This
> > has been added to distinguish between a slot created by user for
> > logical decoding purpose and the ones being synced from primary.
>
> BTW, what about having this "user visible" through pg_replication_slots?
>

We can do that.

-- 
With Regards,
Amit Kapila.




Re: dikkop seems unhappy because of openssl stuff (FreeBSD 14-BETA1)

2023-10-04 Thread Tomas Vondra
On 9/30/23 01:57, Tom Lane wrote:
> Thomas Munro  writes:
>> Does the image lack a /etc/localtime file/link, but perhaps one of you
>> did something to create it?
> 
> Hah!  I thought it had to be some sort of locale effect, but I failed
> to think of that as a contributor :-(.  My installation does have
> /etc/localtime, and removing it duplicates Tomas' syndrome.
> 
> I also find that if I add "-gmt 1" to the clock invocation, it's happy
> with or without /etc/localtime.  So I think we should modify the test
> case to use that to reduce its environmental sensitivity.  Will
> go make it so.
> 

FWIW I've defined the timezone (copying it into /etc/localtime), and
that seems to have resolved the issue (well, maybe it's the "-gmt 1"
tweak, not sure).

I wonder how come it worked with the earlier image - I don't recall
defining the timezone (AFAIK I only did the bare minimum to get it
working), but maybe I did.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: Add a new BGWORKER_BYPASS_ROLELOGINCHECK flag

2023-10-04 Thread Drouvot, Bertrand

Hi,

On 10/4/23 8:20 AM, Michael Paquier wrote:
> On Mon, Oct 02, 2023 at 10:53:22AM +0200, Drouvot, Bertrand wrote:
>> On 10/2/23 10:17 AM, Michael Paquier wrote:
>>> On Mon, Oct 02, 2023 at 10:01:04AM +0200, Drouvot, Bertrand wrote:
>>>> I think that would make sense to have more flexibility in the worker_spi
>>>> module. I think that could be done in a dedicated patch though. I
>>>> think it makes more sense to have the current patch "focusing" on
>>>> this new flag (while adding a test about it without too much
>>>> refactoring). What about doing the worker_spi module re-factoring
>>>> as a follow up of this one?
>>>
>>> I would do that first, as that's what I usually do,
>>
>> The reason I was thinking of not doing that first is that there is no
>> real use case in the current worker_spi module test.
>
> As a template, improving and extending it seems worth it to me as long
> as it can also improve tests.
>
>>> but I see also
>>> your point that this is not mandatory.  If you want, I could give it a
>>> shot tomorrow to see where it leads.
>>
>> Oh yeah that would be great (and maybe you already see a use case in the
>> current test). Thanks!
>
> Yes, while it shows a bit more what one can achieve with bgw_extra, it
> also helps in improving the test coverage with starting dynamic
> workers across defined databases and roles through a SQL function.

Yeah right.

> It took me a bit longer than I expected,

Thanks for having looked at it!

> but here is what I finished
> with:
> - 0001 extends worker_spi to be able to pass down database and role
> IDs for the workers to start, through MyBgworkerEntry->bgw_extra.
> Perhaps the two new arguments of worker_spi_launch() should use
> InvalidOid as default, actually, to fall back to the database and
> roles defined by the GUCs in these cases.

I'm fine with the way it's currently done in 0001 and that sounds
more logical to me. I mean we don't "really" want InvalidOid but to fall
back to the GUCs.

Just a remark here:

+   if (!OidIsValid(roleoid))
+   {
+       /*
+        * worker_spi_role is NULL by default, so just pass down an invalid
+        * OID to let the main() routine do its connection work.
+        */
+       if (worker_spi_role)
+           roleoid = get_role_oid(worker_spi_role, false);
+       else
+           roleoid = InvalidOid;

the

+   else
+   roleoid = InvalidOid;

I think it is not needed as we're already in "!OidIsValid(roleoid)".
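
For illustration, a minimal sketch of the simplified branch (same
behavior, assuming nothing else happens in the outer block):

    if (!OidIsValid(roleoid))
    {
        /*
         * worker_spi_role is NULL by default, so just pass down an invalid
         * OID to let the main() routine do its connection work.
         */
        if (worker_spi_role)
            roleoid = get_role_oid(worker_spi_role, false);
        /* roleoid is already InvalidOid here, no else branch needed */
    }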


> - 0002 is your patch, on top of which I have added handling for the
> flags in the launch() function with a text[].  The tests get much
> simpler, and don't need restarts.



Yeah, agree that's better.


> By the way, I am pretty sure that we are going to need a wait phase
> after the startup of the worker in the case where "nologrole" is not
> allowed to log in even with the original patch: the worker may not
> have started at the point where we check the logs.


I agree and it's now failing on my side.
I added this "wait" in v4-0002 attach and it's now working fine.

Please note there are more changes than adding this wait in 001_worker_spi.pl
(as compared to v3-0002) as I ran pgperltidy on it.
FWIW, the new "wait" is just the part related to "nb_errors".


> What do you think?


Except the Nit that I mentioned in 0001, that looks all good to me (with the
new wait in 001_worker_spi.pl).

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

From 64465c2b2d54ba1c316efe347f346687415d01e0 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Wed, 4 Oct 2023 15:06:25 +0900
Subject: [PATCH v4 2/2] Allow background workers to bypass login check

This adds a new flag value (BGWORKER_BYPASS_ROLELOGINCHECK) to the flags
being used in BackgroundWorkerInitializeConnection() and
BackgroundWorkerInitializeConnectionByOid().  This flag allows the
background workers to bypass the login check done in
InitializeSessionUserId().
---
 doc/src/sgml/bgworker.sgml|  3 +-
 src/backend/bootstrap/bootstrap.c |  2 +-
 src/backend/postmaster/autovacuum.c   |  5 +-
 src/backend/postmaster/postmaster.c   |  6 +-
 src/backend/tcop/postgres.c   |  1 +
 src/backend/utils/init/miscinit.c |  4 +-
 src/backend/utils/init/postinit.c |  5 +-
 src/include/miscadmin.h   |  4 +-
 src/include/postmaster/bgworker.h | 10 +--
 .../modules/worker_spi/t/001_worker_spi.pl| 68 +--
 .../modules/worker_spi/worker_spi--1.0.sql|  3 +-
 src/test/modules/worker_spi/worker_spi.c  | 49 +++--
 12 files changed, 134 insertions(+), 26 deletions(-)
   4.6% doc/src/sgml/
   6.8% src/backend/postmaster/
   7.6% src/backend/utils/init/
   7.1% src/include/postmaster/
  41.4% src/test/modules/worker_spi/t/
  29.8% src/test/modules/worker_spi/

diff --git a/doc/src/sgml/bgworker.sgml b/doc/src/sgml/bgworker.sgml
index 9ad1146ba0..7

Re: [HACKERS] make async slave to wait for lsn to be replayed

2023-10-04 Thread Alexander Korotkov
Hi, Ivan!

On Fri, Jun 30, 2023 at 11:32 AM Картышов Иван 
wrote:

> All rebased and tested
>

Thank you for continuing to work on this patch.

I see you're concentrating on the procedural version of this feature.  But
when you're calling a procedure within a normal SQL statement, the executor
gets a snapshot and holds it until the procedure finishes. If the WAL
record conflicts with this snapshot, the query will be canceled.
Alternatively, when hot_standby_feedback = on, the query and the WAL
replayer will deadlock (the WAL replayer will wait for the query to
finish, and the query will wait for the WAL to be replayed).  Do you see
this issue?  Or do you think I'm missing something?

XLogRecPtr
GetMinWaitedLSN(void)
{
return state->min_lsn.value;
}

You definitely shouldn't access directly the fields
inside pg_atomic_uint64.  In this particular case, you should
use pg_atomic_read_u64().
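
For illustration, the atomics-API version would look roughly like this:

    XLogRecPtr
    GetMinWaitedLSN(void)
    {
        /* Atomic read; never touch the internal value field directly. */
        return pg_atomic_read_u64(&state->min_lsn);
    }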

Also, I think there is a race condition.

/* Check if we already reached the needed LSN */
if (cur_lsn >= target_lsn)
return true;

AddWaitedLSN(target_lsn);

Imagine, PerformWalRecovery() will replay a record after the check, but
before AddWaitedLSN().  This code will start the waiting cycle even if the
LSN is already achieved.  Surely this cycle will end soon because it
rechecks LSN value each 100 ms.  But anyway, I think there should be
another check after AddWaitedLSN() for the sake of consistency.
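
A sketch of such a recheck (GetXLogReplayRecPtr() is the existing
recovery-progress getter; DeleteWaitedLSN() is a hypothetical counterpart
of AddWaitedLSN()):

    /* Check if we already reached the needed LSN */
    if (cur_lsn >= target_lsn)
        return true;

    AddWaitedLSN(target_lsn);

    /*
     * Recheck: a record may have been replayed between the check above
     * and AddWaitedLSN(); don't enter the wait loop if the target LSN
     * has already been reached.
     */
    if (GetXLogReplayRecPtr(NULL) >= target_lsn)
    {
        DeleteWaitedLSN(target_lsn);    /* hypothetical cleanup */
        return true;
    }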

--
Regards,
Alexander Korotkov


Re: [PATCH] Add CANONICAL option to xmlserialize

2023-10-04 Thread vignesh C
On Fri, 17 Mar 2023 at 18:01, Jim Jones  wrote:
>
> After some more testing I realized that v5 was leaking the xmlDocPtr.
>
> Now fixed in v6.

Few comments:
1) Why was the default for CANONICAL chosen as "without comments"?
Shouldn't it be the other way round?
+opt_xml_serialize_format:
+   INDENT                          { $$ = XMLSERIALIZE_INDENT; }
+   | NO INDENT                     { $$ = XMLSERIALIZE_NO_FORMAT; }
+   | CANONICAL                     { $$ = XMLSERIALIZE_CANONICAL; }
+   | CANONICAL WITH NO COMMENTS    { $$ = XMLSERIALIZE_CANONICAL; }
+   | CANONICAL WITH COMMENTS       { $$ = XMLSERIALIZE_CANONICAL_WITH_COMMENTS; }
+   | /*EMPTY*/                     { $$ = XMLSERIALIZE_NO_FORMAT; }

2) This should be added to typedefs.list:
+typedef enum XmlSerializeFormat
+{
+   XMLSERIALIZE_INDENT,                    /* pretty-printed xml serialization */
+   XMLSERIALIZE_CANONICAL,                 /* canonical form without xml comments */
+   XMLSERIALIZE_CANONICAL_WITH_COMMENTS,   /* canonical form with xml comments */
+   XMLSERIALIZE_NO_FORMAT                  /* unformatted xml representation */
+} XmlSerializeFormat;

3) This change is not required:
return result;
+
 #else
NO_XML_SUPPORT();
return NULL;

4) This comment body needs slight reformatting:
+   /*
+   * Parse the input according to the xmloption.
+   * XML canonical expects a well-formed XML input, so here in case of
+   * XMLSERIALIZE_CANONICAL or XMLSERIALIZE_CANONICAL_WITH_COMMENTS we
+   * force xml_parse() to parse 'data' as XMLOPTION_DOCUMENT despite
+   * of the XmlOptionType given in 'xmloption_arg'. This enables the
+   * canonicalization of CONTENT fragments if they contain a singly-rooted
+   * XML - xml_parse() will thrown an error otherwise.
+   */

5) Similarly here too:
-   if (newline == NULL || xmlerrcxt->err_occurred)
+   * Emit declaration only if the input had one.  Note: some versions of
+   * xmlSaveToBuffer leak memory if a non-null encoding argument is
+   * passed, so don't do that.  We don't want any encoding conversion
+   * anyway.
+   */
+   if (decl_len == 0)

6) Similarly here too:
+   /*
+   * Deal with the case where we have non-singly-rooted XML.
+   * libxml's dump functions don't work well for that without help.
+   * We build a fake root node that serves as a container for the
+   * content nodes, and then iterate over the nodes.
+   */

7) Similarly here too:
+   /*
+   * We use this node to insert newlines in the dump.  Note: in at
+   * least some libxml versions, xmlNewDocText would not attach the
+   * node to the document even if we passed it.  Therefore, manage
+   * freeing of this node manually, and pass NULL here to make sure
+   * there's not a dangling link.
+   */

8) Should this:
+   * of the XmlOptionType given in 'xmloption_arg'. This enables the
+   * canonicalization of CONTENT fragments if they contain a singly-rooted
+   * XML - xml_parse() will thrown an error otherwise.
Be:
+   * of the XmlOptionType given in 'xmloption_arg'. This enables the
+   * canonicalization of CONTENT fragments if they contain a singly-rooted
+   * XML - xml_parse() will throw an error otherwise.

Regards,
Vignesh




Re: Rethink the wait event names for postgres_fdw, dblink and etc

2023-10-04 Thread Michael Paquier
On Mon, Aug 21, 2023 at 11:04:23AM +0900, Masahiro Ikeda wrote:
> I updated the patch to v2.
> * Update a comment instead writing documentation about
>   the wait events for pg_prewarm.

Right.  It does not seem worth the addition currently, so I have
discarded this part.  It's just not worth the extra cycles for the
moment.

> * Make the name of wait events which are different code
>   path different. Add DblinkGetConnect and PgPrewarmDumpShutdown.
> * Make variable names shorter like pgfdw_we_receive.
> * Update documents.
> * Add some tests with pg_wait_events view.

Sounds like a good idea for postgres_fdw and dblink, still some of
them may not be stable?  First, PostgresFdwReceive and
PostgresFdwCleanupReceive would be registered only if the connection
is busy, but that may not always be the case depending on the timing?
PostgresFdwConnect is always OK because this code path in
connect_pg_server() is always taken.  Similarly, DblinkConnect and
DblinkGetConnect are registered in deterministic code paths, so these
will show up all the time.

I am lacking a bit of time now, but I have applied the bits for
test_shm_mq and worker_spi.  Note that I have not added tests for
test_shm_mq as it may be possible that the two events (for the
bgworker startup and for a message to be queued) are never reached
depending on the timing.  I'll handle the rest tomorrow, with likely
some adjustments to the tests.  (I may as well just remove them, this
API is already covered by worker_spi.)
--
Michael




Re: Removing unneeded self joins

2023-10-04 Thread Alexander Korotkov
Hi!

On Wed, Oct 4, 2023 at 9:56 AM Andrei Lepikhov 
wrote:
> On 4/10/2023 07:12, Alexander Korotkov wrote:
> > On Tue, Sep 12, 2023 at 4:58 PM Andrey Lepikhov
> >  wrote:
> >> On 5/7/2023 21:28, Andrey Lepikhov wrote:
> >>> During the significant code revision in v.41 I lost some replacement
> >>> operations. Here is the fix and extra tests to check this in the
future.
> >>> Also, Tom added the JoinDomain structure five months ago, and I added
> >>> code to replace relids for that place too.
> >>> One more thing, I found out that we didn't replace SJs, defined by
> >>> baserestrictinfos if no one self-join clause have existed for the
join.
> >>> Now, it is fixed, and the test has been added.
> >>> To understand changes readily, see the delta file in the attachment.
> >> Here is new patch in attachment. Rebased on current master and some
> >> minor gaffes are fixed.
> >
> > I went through the thread and I think the patch gets better shape.  A
> > couple of notes from my side.
> > 1) Why replace_relid() makes a copy of lids only on insert/replace of
> > a member, but performs deletion in-place?
>
> Shortly speaking, it was done according to the 'Paranoid' strategy.
> The main reason for copying before deletion was the case with the rinfo
> required_relids and clause_relids. They both point to the same Bitmapset
> in some cases. And we feared such things for other fields.
> Right now, it may be redundant because we resolved the issue mentioned
> above in replace_varno_walker.

OK, but my point is still that you should be paranoid in all the cases or
none of the cases.  Right now (newId < 0) branch doesn't copy source
relids, but bms_is_member(oldId, relids) does copy.  Also, I think whether
we copy or not should be reflected in the function comment.

/*
 * Substitute newId by oldId in relids.
 */
static Bitmapset *
replace_relid(Relids relids, int oldId, int newId)
{
if (oldId < 0)
return relids;

if (newId < 0)
/* Delete relid without substitution. */
return bms_del_member(relids, oldId);

if (bms_is_member(oldId, relids))
return bms_add_member(bms_del_member(bms_copy(relids), oldId),
newId);

return relids;
}
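
For illustration, a consistently paranoid variant that never scribbles
on the input set in any branch (a sketch only, with the comment
direction also fixed):

    /*
     * Substitute oldId with newId in relids, always working on a copy.
     */
    static Bitmapset *
    replace_relid(Relids relids, int oldId, int newId)
    {
        if (oldId < 0)
            return relids;

        /* Delete relid without substitution, on a copy. */
        if (newId < 0)
            return bms_del_member(bms_copy(relids), oldId);

        if (bms_is_member(oldId, relids))
            return bms_add_member(bms_del_member(bms_copy(relids), oldId),
                                  newId);

        return relids;
    }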

> Relid replacement machinery is the most contradictory code here. We used
> a utilitarian approach and implemented a simplistic variant.

> > 2) It would be nice to skip the insertion of IS NOT NULL checks when
> > they are not necessary.  [1] points that infrastructure from [2] might
> > be useful.  The patchset from [2] seems committed now.  However, I
> > can't see it is directly helpful in this matter.  Could we just skip
> > adding IS NOT NULL clause for the columns, that have
> > pg_attribute.attnotnull set?
> Thanks for the links, I will look into that case.

OK, thank you.

--
Regards,
Alexander Korotkov