Re: Postgres perl module namespace

2022-06-23 Thread Noah Misch
On Wed, Jun 22, 2022 at 11:03:22AM -0400, Andrew Dunstan wrote:
> On 2022-06-22 We 03:21, Noah Misch wrote:
> > On Tue, Apr 19, 2022 at 07:24:58PM -0400, Andrew Dunstan wrote:
> >> On 2022-04-19 Tu 18:39, Michael Paquier wrote:
> >>> +*generate_ascii_string = *TestLib::generate_ascii_string;
> >>> +*slurp_dir = *TestLib::slurp_dir;
> >>> +*slurp_file = *TestLib::slurp_file;
> >>>
> >>> I am not sure if it is possible and my perl-fu is limited in this
> >>> area, but could a failure be enforced when loading this path if a new
> >>> routine added in TestLib.pm is forgotten in this list?
> >> Not very easily that I'm aware of, but maybe some superior perl wizard
> >> will know better.
> > One can alias the symbol table, like https://metacpan.org/pod/Package::Alias
> > does.  I'm attaching what I plan to use.  Today, check-world fails after
> >
> >   sed -i 's/TestLib/PostgreSQL::Test::Utils/g; 
> > s/PostgresNode/PostgreSQL::Test::Cluster/g' **/*.pl
> >
> > on REL_14_STABLE, because today's alias list is incomplete.  With this 
> > change,
> > the same check-world passes.

The patch wasn't sufficient to make that experiment pass for REL_10_STABLE,
where 017_shm.pl uses the %params argument of get_new_node().  The problem
call stack had PostgreSQL::Test::Cluster->get_new_node calling
PostgreSQL::Test::Cluster->new, which needs v14- semantics.  Here's a fixed
version, just changing the new() hack.

I suspect v1 also misbehaved for non-core tests that subclass PostgresNode
(via the approach from commit 54dacc7) or PostgreSQL::Test::Cluster.  I expect
this version will work with subclasses written for v14- and with subclasses
written for v15+.  I didn't actually write dummy subclasses to test, and the
relevant permutations are numerous (e.g. whether or not the subclass overrides
new(), whether or not the subclass overrides get_new_node()).

> Nice. 30 years of writing perl and I'm still learning of nifty features.

Thanks for reviewing. 
commit 5155e0f
Author: Noah Misch 
AuthorDate: Thu Jun 23 15:31:41 2022 -0700
Commit: Noah Misch 
CommitDate: Thu Jun 23 15:31:41 2022 -0700

For PostgreSQL::Test compatibility, alias entire package symbol tables.

Remove the need to edit back-branch-specific code sites when
back-patching the addition of a PostgreSQL::Test::Utils symbol.  Replace
per-symbol, incomplete alias lists.  Give old and new package names the
same EXPORT and EXPORT_OK semantics.  Back-patch to v10 (all supported
versions).

Reviewed by Andrew Dunstan.

Discussion: https://postgr.es/m/20220622072144.gd4167...@rfd.leadboat.com
---
 src/test/perl/PostgreSQL/Test/Cluster.pm |  9 ---
 src/test/perl/PostgreSQL/Test/Utils.pm   | 40 +++---
 src/test/perl/PostgresNode.pm| 23 +++--
 src/test/perl/TestLib.pm | 42 
 4 files changed, 19 insertions(+), 95 deletions(-)

diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm 
b/src/test/perl/PostgreSQL/Test/Cluster.pm
index 12339c2..a855fbc 100644
--- a/src/test/perl/PostgreSQL/Test/Cluster.pm
+++ b/src/test/perl/PostgreSQL/Test/Cluster.pm
@@ -1,9 +1,9 @@
 
 # Copyright (c) 2022, PostgreSQL Global Development Group
 
-# allow use of release 15+ perl namespace in older branches
-# just 'use' the older module name.
-# See PostgresNode.pm for function implementations
+# Allow use of release 15+ Perl package name in older branches, by giving that
+# package the same symbol table as the older package.  See PostgresNode::new
+# for behavior reacting to the class name.
 
 package PostgreSQL::Test::Cluster;
 
@@ -11,5 +11,8 @@ use strict;
 use warnings;
 
 use PostgresNode;
+BEGIN { *PostgreSQL::Test::Cluster:: = \*PostgresNode::; }
+
+use Exporter 'import';
 
 1;
diff --git a/src/test/perl/PostgreSQL/Test/Utils.pm 
b/src/test/perl/PostgreSQL/Test/Utils.pm
index bdbbd6e..e743bdf 100644
--- a/src/test/perl/PostgreSQL/Test/Utils.pm
+++ b/src/test/perl/PostgreSQL/Test/Utils.pm
@@ -1,48 +1,16 @@
 # Copyright (c) 2022, PostgreSQL Global Development Group
 
-# allow use of release 15+ perl namespace in older branches
-# just 'use' the older module name.
-# We export the same names as the v15 module.
-# See TestLib.pm for alias assignment that makes this all work.
+# Allow use of release 15+ Perl package name in older branches, by giving that
+# package the same symbol table as the older package.
 
 package PostgreSQL::Test::Utils;
 
 use strict;
 use warnings;
 
-use Exporter 'import';
-
 use TestLib;
+BEGIN { *PostgreSQL::Test::Utils:: = \*TestLib::; }
 
-our @EXPORT = qw(
-  generate_ascii_string
-  slurp_dir
-  slurp_file
-  append_to_file
-  check_mode_recursive
-  chmod_recursive
-  check_pg_config
-  dir_symlink
-  system_or_bail
-  system_log
-  run_log
-  run_command
-  pump_until
-
-  command_ok
-  command_fails
-  command_exit_is
-  program_help_ok
-  program_version_ok
-  program_options_handling_ok
-  command_like
-  

Postgres does not allow creating many tables with more than a 63-symbol prefix

2022-06-23 Thread Andrey Lepikhov

Moved from the pgsql-bugs mailing list [1].

On 6/23/22 07:03, Masahiko Sawada wrote:
> Hi,
>
> On Sat, Jun 4, 2022 at 4:03 AM Andrey Lepikhov
>  wrote:
>>
>> As per the subject, you can try to create many tables (induced by the case
>> of a partitioned table) sharing a long prefix - see 6727v.sql for a
>> reproduction. Right now that is impossible because of the logic of the
>> makeUniqueTypeName() routine.
>> You get the error:
>> ERROR:  could not form array type name for type ...
>>
>> It is a very narrow corner case, of course, but the solution is easy and
>> short, so why not fix it? - See the attached patch.
>
> While this seems to be a good improvement, I think it's not a bug.
> Probably we cannot backpatch it as it will end up having type names
> defined by different naming rules. I'd suggest discussing it on
> -hackers.
Done.

> Regarding the patch, I think we can merge makeUniqueTypeName() into
> makeArrayTypeName(), as there is no caller of makeUniqueTypeName() that
> passes tryOriginal = true.
I partially agree with you, but I have one reason to keep 
makeUniqueTypeName() separate:
it may be used in other code paths with auto-generated types. For example, I 
think the DefineRelation routine should choose a unique name for the 
composite type instead of using the same name as the table.


> Also looking at other ChooseXXXName()
> functions, we don't care about integer overflow. Is it better to make
> it consistent with them?
Done.

[1] 
https://www.postgresql.org/message-id/flat/121e286f-3796-c9d7-9eab-6fb8e0b9c701%40postgrespro.ru


--
Regards
Andrey Lepikhov
Postgres Professional

From ae8552b8f9474a6fae893bce501c534cabcfaa13 Mon Sep 17 00:00:00 2001
From: "Andrey V. Lepikhov" 
Date: Fri, 3 Jun 2022 21:40:01 +0300
Subject: [PATCH] Allow postgresql to generate more relations with the same
 63-symbol-long prefix. Rewrite the makeUniqueTypeName routine - the unique
 name generator is based on the same idea as the ChooseRelationName() routine.

Authors: Dmitry Koval, Andrey Lepikhov
---
 src/backend/catalog/pg_type.c | 35 ---
 src/test/regress/expected/alter_table.out |  6 ++--
 2 files changed, 22 insertions(+), 19 deletions(-)

diff --git a/src/backend/catalog/pg_type.c b/src/backend/catalog/pg_type.c
index 03788cb975..bae6998360 100644
--- a/src/backend/catalog/pg_type.c
+++ b/src/backend/catalog/pg_type.c
@@ -26,6 +26,7 @@
 #include "catalog/pg_namespace.h"
 #include "catalog/pg_proc.h"
 #include "catalog/pg_type.h"
+#include "commands/defrem.h"
 #include "commands/typecmds.h"
 #include "mb/pg_wchar.h"
 #include "miscadmin.h"
@@ -936,17 +937,17 @@ makeMultirangeTypeName(const char *rangeTypeName, Oid typeNamespace)
  * makeUniqueTypeName
  *		Generate a unique name for a prospective new type
  *
- * Given a typeName, return a new palloc'ed name by prepending underscores
- * until a non-conflicting name results.
+ * Given a typeName, return a new palloc'ed name by prepending an underscore
+ * and (if needed) adding a suffix to the end of the type name.
  *
  * If tryOriginal, first try with zero underscores.
  */
 static char *
 makeUniqueTypeName(const char *typeName, Oid typeNamespace, bool tryOriginal)
 {
-	int			i;
-	int			namelen;
-	char		dest[NAMEDATALEN];
+	int		pass = 0;
+	char	suffix[NAMEDATALEN];
+	char   *type_name;
 
 	Assert(strlen(typeName) <= NAMEDATALEN - 1);
 
@@ -956,22 +957,24 @@ makeUniqueTypeName(const char *typeName, Oid typeNamespace, bool tryOriginal)
 			   ObjectIdGetDatum(typeNamespace)))
 		return pstrdup(typeName);
 
+	/* Prepare initial object name. Just for compatibility. */
+	type_name = makeObjectName("", typeName, NULL);
+
 	/*
-	 * The idea is to prepend underscores as needed until we make a name that
-	 * doesn't collide with anything ...
+	 * The idea of unique name generation is similar to ChooseRelationName()
+	 * implementation logic.
 	 */
-	namelen = strlen(typeName);
-	for (i = 1; i < NAMEDATALEN - 1; i++)
+	for(;;)
 	{
-		dest[i - 1] = '_';
-		strlcpy(dest + i, typeName, NAMEDATALEN - i);
-		if (namelen + i >= NAMEDATALEN)
-			truncate_identifier(dest, NAMEDATALEN, false);
-
 		if (!SearchSysCacheExists2(TYPENAMENSP,
-   CStringGetDatum(dest),
+   CStringGetDatum(type_name),
    ObjectIdGetDatum(typeNamespace)))
-			return pstrdup(dest);
+			return type_name;
+
+		/* Previous attempt failed. Prepare a new one. */
+		pfree(type_name);
+		snprintf(suffix, sizeof(suffix), "%d", ++pass);
+		type_name = makeObjectName("", typeName, suffix);
 	}
 
 	return NULL;
diff --git a/src/test/regress/expected/alter_table.out b/src/test/regress/expected/alter_table.out
index 5ede56d9b5..e634dbd86d 100644
--- a/src/test/regress/expected/alter_table.out
+++ b/src/test/regress/expected/alter_table.out
@@ -197,9 +197,9 @@ SELECT typname FROM pg_type WHERE oid = 'attmp_array[]'::regtype;
 (1 row)
 
 SELECT typname FROM pg_type WHERE oid = '_attmp_array[]'::regtype;
-typname 
-
- ___attmp_array
+ typname 
+-
+ 

Re: Handle infinite recursion in logical replication setup

2022-06-23 Thread Peter Smith
Here are my review comments for the v23* patch set.


v23-0001


No comments. LGTM


V23-0002


2.1 src/backend/commands/subscriptioncmds.c

+ opts->origin = defGetString(defel);
+
+ if ((strcmp(opts->origin, LOGICALREP_ORIGIN_LOCAL) != 0) &&
+ (strcmp(opts->origin, LOGICALREP_ORIGIN_ANY) != 0))
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized origin value: \"%s\"", opts->origin));
+ }

I thought maybe this should be ERRCODE_INVALID_PARAMETER_VALUE.

~~~

2.2 src/backend/replication/pgoutput/pgoutput.c

+ data->origin = defGetString(defel);
+ if (strcmp(data->origin, LOGICALREP_ORIGIN_LOCAL) == 0)
+ publish_local_origin = true;
+ else if (strcmp(data->origin, LOGICALREP_ORIGIN_ANY) == 0)
+ publish_local_origin = false;
+ else
+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("unrecognized origin value: \"%s\"", data->origin));
+ }

I thought maybe this should be ERRCODE_INVALID_PARAMETER_VALUE.


v23-0003


3.1 Commit message

After the subscription is created on node2, node1 will be synced to
node2 and the newly synced data will be sent to node2, this process of
node1 sending data to node2 and node2 sending data to node1 will repeat
infinitely. If table t1 has a unique key, this will lead to a unique key
violation, and replication won't proceed.

~

3.1a.
"node2, this process of" -> "node2; this process of"
OR
"node2, this process of" -> "node2. This process of"

3.1b.
Also, my grammar checker recommends removing the comma after "violation"

~~~

3.2 doc/src/sgml/ref/create_subscription.sgml

@@ -115,7 +115,8 @@ CREATE SUBSCRIPTION subscription_nameconnect
   to false with
   setting create_slot, enabled,
-  or copy_data to true.)
+  or copy_data to
+  true/force.)
  

I am not sure why that last sentence needs to be in parentheses, but
OTOH it seems to be that way already in PG15.

~~~

3.3 doc/src/sgml/ref/create_subscription.sgml

@@ -383,6 +398,15 @@ CREATE SUBSCRIPTION subscription_name

+  
+   If the subscription is created with origin = local and
+   copy_data = true, it will check if the publisher has
+   subscribed to the same table from other publishers and, if so, then throw an
+   error to prevent possible non-local data from being copied. The user can
+   override this check and continue with the copy operation by specifying
+   copy_data = force.
+  

"and, if so, then throw..." -> "and, if so, throw..."

~~~

3.4 src/backend/commands/subscriptioncmds.c

+ ereport(ERROR,
+ errcode(ERRCODE_SYNTAX_ERROR),
+ errmsg("%s requires a boolean or \"force\"", def->defname));
+ return COPY_DATA_OFF; /* keep compiler quiet */
+}

I thought maybe this should be ERRCODE_INVALID_PARAMETER_VALUE.


v23-0004


4.1 Commit message

Document the steps for the following:
a) Creating a two-node bidirectional replication when there is no data
on both nodes.
b) Adding a new node when there is no data on any of the nodes.
c) Adding a new node when data is present on the existing nodes.
d) Generic steps for adding a new node to an existing set of nodes.

~

These pgdocs titles have changed slightly. I think this commit message
text should be updated to match the current pgdocs titles.

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: pg_ls_tmpdir to show directories and shared filesets (and pg_ls_*)

2022-06-23 Thread Justin Pryzby
This thread has been going for 2.5 years, so here's a(nother) recap.

This omits the patches for recursion, since they're optional and evidently a
distraction from the main patches.

On Fri, Dec 27, 2019 at 11:02:20AM -0600, Justin Pryzby wrote:
> The goal is to somehow show tmpfiles (or at least dirs) used by parallel
> workers.

On Thu, Jan 16, 2020 at 08:38:46AM -0600, Justin Pryzby wrote:
> I think if someone wants the full generality, they can do this:
> 
> postgres=# SELECT name, s.size, s.modification, s.isdir FROM (SELECT 
> 'base/pgsql_tmp'p)p, pg_ls_dir(p)name, pg_stat_file(p||'/'||name)s;
>  name | size |      modification      | isdir 
> ------+------+------------------------+-------
>  .foo | 4096 | 2020-01-16 08:57:04-05 | t
> 
> In my mind, pg_ls_tmpdir() is for showing tmpfiles, not just a shortcut to
> SELECT pg_ls_dir((SELECT 'base/pgsql_tmp'p)); -- or, for all tablespaces:
> WITH x AS (SELECT format('/PG_%s_%s', 
> split_part(current_setting('server_version'), '.', 1), catalog_version_no) 
> suffix FROM pg_control_system()), y AS (SELECT a, pg_ls_dir(a) AS d FROM 
> (SELECT DISTINCT COALESCE(NULLIF(pg_tablespace_location(oid),'')||suffix, 
> 'base') a FROM pg_tablespace,x)a) SELECT a, pg_ls_dir(a||'/pgsql_tmp') FROM y 
> WHERE d='pgsql_tmp';

On Tue, Mar 10, 2020 at 01:30:37PM -0500, Justin Pryzby wrote:
> I took a step back, and I wondered whether we should add a generic function 
> for
> listing a dir with metadata, possibly instead of changing the existing
> functions.  Then one could do pg_ls_dir_metadata('pg_wal',false,false);
> 
> Since pg8.1, we have pg_ls_dir() to show a list of files.  Since pg10, we've
> had pg_ls_logdir and pg_ls_waldir, which show not only file names but also
> (some) metadata (size, mtime).  And since pg12, we've had pg_ls_tmpfile and
> pg_ls_archive_statusdir, which also show metadata.
> 
> ...but there's no function which lists the metadata of a directory other
> than tmp, wal, log.
> 
> One can do this:
> |SELECT b.*, c.* FROM (SELECT 'base' a)a, LATERAL (SELECT 
> a||'/'||pg_ls_dir(a.a)b)b, pg_stat_file(b)c;
> ..but that's not as helpful as allowing:
> |SELECT * FROM pg_ls_dir_metadata('.',true,true);
> 
> There's also no function which recurses into an arbitrary directory, so it
> seems shortsighted to provide a function to recursively list a tmpdir.
> 
> Also, since pg_ls_dir_metadata indicates whether the path is a dir, one can
> write a SQL function to show the dir recursively.  It'd be trivial to plug in
> wal/log/tmp (it seems like tmpdirs of other tablespaces are not entirely
> trivial).
> |SELECT * FROM pg_ls_dir_recurse('base/pgsql_tmp');

> It's pretty unfortunate if a function called
> pg_ls_tmpdir hides shared filesets, so maybe it really is best to change that
> (it's new in v12).

On Fri, Mar 13, 2020 at 08:12:32AM -0500, Justin Pryzby wrote:
> The merge conflict presents another opportunity to solicit comments on the new
> approach.  Rather than making "recurse into tmpdir" the end goal:
> 
>   - add a function to show metadata of an arbitrary dir;
>   - add isdir arguments to pg_ls_* functions (including pg_ls_tmpdir but not
> pg_ls_dir).
>   - maybe add pg_ls_dir_recurse, which satisfies the original need;
>   - retire pg_ls_dir (does this work with tuplestore?)
>   - profit
> 
> The alternative seems to be to go back to Alvaro's earlier proposal:
>  - not only add "isdir", but also recurse;
> 
> I think I would insist on adding a general function to recurse into any dir.
> And *optionally* change ps_ls_* to recurse (either by accepting an argument, 
> or
> by making that a separate patch to debate).

On Tue, Mar 31, 2020 at 03:08:12PM -0500, Justin Pryzby wrote:
> The patch intends to fix the issue of "failing to show failed filesets"
> (because dirs are skipped) while also generalizing existing functions (to show
> directories and "isdir" column) and providing some more flexible ones (to list
> file and metadata of a dir, which is currently possible [only] for "special"
> directories, or by recursively calling pg_stat_file).

On Wed, Dec 23, 2020 at 01:17:10PM -0600, Justin Pryzby wrote:
> However, pg_ls_tmpdir is special since it handles tablespace tmpdirs, which it
> seems is not trivial to get from sql:
> 
> +SELECT * FROM (SELECT DISTINCT 
> COALESCE(NULLIF(pg_tablespace_location(b.oid),'')||suffix, 'base/pgsql_tmp') 
> AS dir
> +FROM pg_tablespace b, pg_control_system() pcs,
> +LATERAL format('/PG_%s_%s', left(current_setting('server_version_num'), 2), 
> pcs.catalog_version_no) AS suffix) AS dir,
> +LATERAL pg_ls_dir_recurse(dir) AS a;
> 
> For context, the line of reasoning that led me to this patch series was
> something like this:
> 
> 0) Why can't I list shared tempfiles (dirs) using pg_ls_tmpdir() ?
> 1) Implement recursion for pg_ls_tmpdir();
> 2) Eventually realize that it's silly to implement a function to recurse into
>one particular directory when no general feature exists;
> 3) Implement generic facility;


Re: amcheck is using a wrong macro to check compressed-ness

2022-06-23 Thread Michael Paquier
On Wed, Jun 22, 2022 at 01:14:57PM -0400, Robert Haas wrote:
> Oops, I missed this thread. I think the patch is correct, so I have
> committed it.

Thanks, Robert.
--
Michael


signature.asc
Description: PGP signature


Re: NAMEDATALEN increase because of non-latin languages

2022-06-23 Thread John Naylor
On Thu, Jun 23, 2022 at 9:17 PM Andres Freund  wrote:
>
> Hi,
>
> On 2022-06-03 13:28:16 +0700, John Naylor wrote:
> > 1. That would require putting the name physically closer to the end of
> > the column list. To make this less annoying for users, we'd need to
> > separate physical order from display order (at least/at first only for
> > system catalogs).
>
> FWIW, it's not at all clear to me that this would be required. If I were to
> tackle this I'd look at breaking up the mirroring of C structs to catalog
> struct layout, by having genbki (or such) generate functions to map to/from
> tuple layout. It'll be an annoying amount of changes, but feasible, I think.

Hmm, I must have misunderstood this aspect. In my mind I was thinking
that if a varlen attribute were at the end, these functions would make
it easier to access them quickly. But from this and the follow-on
responses, these would be used to access varlen attributes wherever
they may be. I'll take a look at the current uses of GETSTRUCT().

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: Support logical replication of DDLs

2022-06-23 Thread Masahiko Sawada
On Thu, Jun 23, 2022 at 7:00 PM Amit Kapila  wrote:
>
> On Wed, Jun 22, 2022 at 11:09 AM Masahiko Sawada  
> wrote:
> >
> > I've attached a WIP patch for adding regression tests for DDL deparse.
> > The patch can be applied on
> > v9-0001-Functions-to-deparse-DDL-commands.patch Hou recently
> > submitted[1]. The basic idea is to define the event trigger to deparse
> > DDLs, run the regression tests, load the deparsed DDLs to another
> > database cluster, dump both databases, and compare the dumps.
> >
>
> Thanks for working on this. It is a good start. I think this will be
> helpful to see the missing DDL support. Do we want to run this as part
> of every regression run? Won't it increase the regression time as this
> seems to run internally the regression tests twice?

Yes, it will increase the regression test time, but we already do a
similar thing in 002_pg_upgrade.pl and 027_stream_regress.pl, and it
seems worth adding to me.

>
> Do we need a different trigger to capture drop cases as there are
> separate deparsing routines for them, for example
> deparse_drop_table()?

Right, we need to capture drop cases by another trigger.

>
> > [2] deparsing "ALTER INDEX tbl_idx ALTER COLUMN 2 SET STATISTICS
> > 1000;" causes an assertion failure.
> >
>
> Sorry, it is not clear to me whether you are talking about some
> pre-existing bug or a bug in the proposed patch?

I meant there is a bug in the v9 DDL deparse patch.

Regards,


--
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




RE: Fix instability in subscription regression test

2022-06-23 Thread houzj.f...@fujitsu.com


> -Original Message-
> From: Amit Kapila 
> Sent: Friday, June 24, 2022 10:28 AM
> To: Hou, Zhijie/侯 志杰 
> Cc: PostgreSQL Hackers 
> Subject: Re: Fix instability in subscription regression test
> 
> On Thu, Jun 23, 2022 at 4:58 PM houzj.f...@fujitsu.com
>  wrote:
> >
> > Hi,
> >
> > I noticed BF member wrasse failed in 028_row_filter.pl.
> >
> > #   Failed test 'check publish_via_partition_root behavior'
> > #   at t/028_row_filter.pl line 669.
> > #  got: ''
> > # expected: '1|100
> > #  ...
> >
> > Log:
> > 2022-06-23 11:27:42.387 CEST [20589:3] 028_row_filter.pl LOG:
> > statement: ALTER SUBSCRIPTION tap_sub REFRESH PUBLICATION WITH
> > (copy_data = true)
> > 2022-06-23 11:27:42.470 CEST [20589:4] 028_row_filter.pl LOG:
> > disconnection: session time: 0:00:00.098 user=nm database=postgres
> > host=[local]
> > 2022-06-23 11:27:42.611 CEST [20593:1] LOG: logical replication table
> > synchronization worker for subscription "tap_sub", table
> "tab_rowfilter_partitioned" has started ...
> > 2022-06-23 11:27:43.197 CEST [20610:3] 028_row_filter.pl LOG:
> > statement: SELECT a, b FROM tab_rowfilter_partitioned ORDER BY 1, 2 ...
> > 2022-06-23 11:27:43.689 CEST [20593:2] LOG: logical replication table
> > synchronization worker for subscription "tap_sub", table
> > "tab_rowfilter_partitioned" has finished
> >
> > From the Log, I can see it query the target table before the table
> > sync is over. So, I think the reason is that we didn't wait for table
> > sync to finish after refreshing the publication. Sorry for not
> > catching that earlier. Here is a patch to fix it.
> >
> 
> +# wait for initial table synchronization to finish
> +$node_subscriber->poll_query_until('postgres', $synced_query)
> 
> We could probably slightly change the comment to say: "wait for table sync to
> finish". Normally, we use initial table sync after CREATE SUBSCRIPTION. This 
> is a
> minor nitpick and I can take care of it before committing unless you think
> otherwise.

Thanks for reviewing, the suggestion looks good to me.

Best regards,
Hou zj


Re: Fix instability in subscription regression test

2022-06-23 Thread Amit Kapila
On Thu, Jun 23, 2022 at 4:58 PM houzj.f...@fujitsu.com
 wrote:
>
> Hi,
>
> I noticed BF member wrasse failed in 028_row_filter.pl.
>
> #   Failed test 'check publish_via_partition_root behavior'
> #   at t/028_row_filter.pl line 669.
> #  got: ''
> # expected: '1|100
> #  ...
>
> Log:
> 2022-06-23 11:27:42.387 CEST [20589:3] 028_row_filter.pl LOG: statement: 
> ALTER SUBSCRIPTION tap_sub REFRESH PUBLICATION WITH (copy_data = true)
> 2022-06-23 11:27:42.470 CEST [20589:4] 028_row_filter.pl LOG: disconnection: 
> session time: 0:00:00.098 user=nm database=postgres host=[local]
> 2022-06-23 11:27:42.611 CEST [20593:1] LOG: logical replication table 
> synchronization worker for subscription "tap_sub", table 
> "tab_rowfilter_partitioned" has started
> ...
> 2022-06-23 11:27:43.197 CEST [20610:3] 028_row_filter.pl LOG: statement: 
> SELECT a, b FROM tab_rowfilter_partitioned ORDER BY 1, 2
> ...
> 2022-06-23 11:27:43.689 CEST [20593:2] LOG: logical replication table 
> synchronization worker for subscription "tap_sub", table 
> "tab_rowfilter_partitioned" has finished
>
> From the Log, I can see it query the target table before the table sync is
> over. So, I think the reason is that we didn't wait for table sync to
> finish after refreshing the publication. Sorry for not catching that
> earlier. Here is a patch to fix it.
>

+# wait for initial table synchronization to finish
+$node_subscriber->poll_query_until('postgres', $synced_query)

We could probably slightly change the comment to say: "wait for table
sync to finish". Normally, we use initial table sync after CREATE
SUBSCRIPTION. This is a minor nitpick and I can take care of it before
committing unless you think otherwise.

Your analysis and patch look good to me.

-- 
With Regards,
Amit Kapila.




Re: Fix instability in subscription regression test

2022-06-23 Thread Masahiko Sawada
On Thu, Jun 23, 2022 at 8:28 PM houzj.f...@fujitsu.com
 wrote:
>
> Hi,
>
> I noticed BF member wrasse failed in 028_row_filter.pl.
>
> #   Failed test 'check publish_via_partition_root behavior'
> #   at t/028_row_filter.pl line 669.
> #  got: ''
> # expected: '1|100
> #  ...
>
> Log:
> 2022-06-23 11:27:42.387 CEST [20589:3] 028_row_filter.pl LOG: statement: 
> ALTER SUBSCRIPTION tap_sub REFRESH PUBLICATION WITH (copy_data = true)
> 2022-06-23 11:27:42.470 CEST [20589:4] 028_row_filter.pl LOG: disconnection: 
> session time: 0:00:00.098 user=nm database=postgres host=[local]
> 2022-06-23 11:27:42.611 CEST [20593:1] LOG: logical replication table 
> synchronization worker for subscription "tap_sub", table 
> "tab_rowfilter_partitioned" has started
> ...
> 2022-06-23 11:27:43.197 CEST [20610:3] 028_row_filter.pl LOG: statement: 
> SELECT a, b FROM tab_rowfilter_partitioned ORDER BY 1, 2
> ...
> 2022-06-23 11:27:43.689 CEST [20593:2] LOG: logical replication table 
> synchronization worker for subscription "tap_sub", table 
> "tab_rowfilter_partitioned" has finished
>
> From the Log, I can see it query the target table before the table sync is
> over. So, I think the reason is that we didn't wait for table sync to
> finish after refreshing the publication. Sorry for not catching that
> earlier. Here is a patch to fix it.

+1

The patch looks good to me.

Regards,

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size

2022-06-23 Thread Andres Freund
Hi,

On 2022-06-23 16:38:12 +0900, Michael Paquier wrote:
> On Tue, Jun 21, 2022 at 05:41:07PM -0400, Andrew Dunstan wrote:
> > On 2022-06-21 Tu 17:25, Andres Freund wrote:
> >> On 2022-06-21 17:11:33 -0400, Andrew Dunstan wrote:
> >>> I and a couple of colleagues have looked it over. As far as it goes the
> >>> json fix looks kosher to me. I'll play with it some more.
> >>
> >> Cool.
> >>
> >> Any chance you could look at fixing the "structure" of the generated
> >> expression "program". The recursive ExecEvalExpr() calls are really not 
> >> ok...
>
> By how much does the size of ExprEvalStep go down once you don't
> inline the JSON structures as of 0004 in [1]?  And what of 0003?

0004 gets us back to 64 bytes, if 0003 is applied first. 0003 alone doesn't
yield a size reduction, because obviously 0004 is the bigger problem. Applying
just 0004 you end up with 88 bytes.


> The JSON portions seem like the largest portion of the cake, though both are
> must-fixes.

Yep.


> > Yes, but I don't guarantee to have a fix in time for Beta2.
>
> IMHO, it would be nice to get something done for beta2.  Now the
> thread is rather fresh and I guess that more performance study is
> required even for 0004, so..

I don't think there's a whole lot of performance study needed for 0004 - the
current state is obviously wrong.

I think Andrew's beta 2 comment was more about my other architectural
complaints around the json expression eval stuff.


> Waiting for beta3 would be a better move at this stage.  Is somebody confident
> enough in the patches proposed?

0001 is the one that needs the most careful analysis, I think. 0002 I'd be fine
with pushing after reviewing it again. For 0003 David's approach might be
better or worse, it doesn't matter much I think. 0004 is ok I think, perhaps
with the exception of quibbling over some naming decisions?


Greetings,

Andres Freund




Re: Make COPY extendable in order to support Parquet and other formats

2022-06-23 Thread Andres Freund
Hi,

On 2022-06-23 11:38:29 +0300, Aleksander Alekseev wrote:
> > I know little about parquet - can it support FROM STDIN efficiently?
> 
> Parquet is a compressed binary format with data grouped by columns
> [1]. I wouldn't assume that this is a primary use case for this
> particular format.

IMO decent COPY FROM / TO STDIN support is crucial, because otherwise you
can't do COPY from/to a client. Which would make the feature unusable for
anybody not superuser, including just about all users of hosted PG.

Greetings,

Andres Freund




Re: Race condition in TransactionIdIsInProgress

2022-06-23 Thread Andres Freund
Hi,

On 2022-06-23 22:03:27 +0300, Heikki Linnakangas wrote:
> On 12/02/2022 05:42, Andres Freund wrote:
> > The only reason I'm so far not succeeding in turning it into an
> > isolationtester spec is that a transaction waiting for SyncRep doesn't count
> > as waiting for isolationtester.
> >
> > Basically
> >
> > S1: BEGIN; $xid = txid_current(); UPDATE; COMMIT; 
> > S2: SELECT pg_xact_status($xid);
> > S2: UPDATE;
> >
> > suffices, because the pg_xact_status() causes an xlog fetch, priming the xid
> > cache, which then causes the TransactionIdIsInProgress() to take the early
> > return path, despite the transaction still being in progress. Which then
> > allows the update to proceed, despite the S1 not having "properly committed"
> > yet.
>
> I started to improve isolationtester, to allow the spec file to specify a
> custom query to check for whether the backend is blocked. But then I
> realized that it's not quite enough: there is currently no way to write the
> "$xid = txid_current()" step either. Isolationtester doesn't have variables
> like that. Also, you need to set synchronous_standby_names to make it block,
> which seems a bit weird in an isolation test.

I don't think we strictly need $xid - it likely can be implemented via
something like
  SELECT pg_xact_status(backend_xid) FROM pg_stat_activity WHERE 
application_name = 'testname/s1';


> I wish we had an explicit way to inject delays or failures. We discussed it
> before 
> (https://www.postgresql.org/message-id/flat/CANXE4TdxdESX1jKw48xet-5GvBFVSq%3D4cgNeioTQff372KO45A%40mail.gmail.com),
> but I don't want to pick up that task right now.

Understandable :(


> Anyway, I wrote a TAP test for this instead. See attached. I'm not sure if
> this is worth committing, the pg_xact_status() trick is quite special for
> this particular bug.

Yea, not sure either.


> Simon's just_remove_TransactionIdIsKnownCompleted_call.v1.patch looks good
> to me. Replying to the discussion on that:
>
> On 12/02/2022 23:50, Andres Freund wrote:
> > On 2022-02-12 13:25:58 +, Simon Riggs wrote:
> > > I'm not sure it is possible to remove TransactionIdIsKnownCompleted()
> > > in backbranches.
> >
> > I think it'd be fine if we needed to. Or we could just make it always return
> > false or such.
> >
> >
> > > > > just removes the known offending call, passes make check, but IMHO
> > > > > leaves the same error just as likely by other callers.
> > > >
> > > > Which other callers are you referring to?
> > >
> > > I don't know of any, but we can't just remove a function in a
> > > backbranch, can we?
> >
> > We have in the past, if it's a "sufficiently internal" function.
>
> I think we should remove it in v15, and leave it as it is in backbranches.
> Just add a comment to point out that the name is a bit misleading, because
> it checks the clog rather than the proc array.  It's not inherently
> dangerous, and someone might have a legit use case for it. True, someone
> might also be doing this incorrect thing with it, but still seems like the
> right balance to me.
>
> I think we also need to change pg_xact_status(), to also call
> TransactionIdIsInProgress() before TransactionIdDidCommit/Abort().
> Currently, if it returns "committed", and you start a new transaction, the
> new transaction might not yet see the effects of that "committed"
> transaction. With that, you cannot reproduce the original bug with
> pg_xact_status() though, so there's no point in committing that test then.
>
> In summary, I think we should:
> - commit and backpatch Simon's
> just_remove_TransactionIdIsKnownCompleted_call.v1.patch
> - fix pg_xact_status() to check TransactionIdIsInProgress()
> - in v15, remove TransationIdIsKnownCompleted function altogether
>
> I'll try to get around to that in the next few days, unless someone beats me
> to it.

Makes sense.

Greetings,

Andres Freund




Re: SLRUs in the main buffer pool - Page Header definitions

2022-06-23 Thread Andres Freund
Hi,

On 2022-06-24 00:39:41 +, Bagga, Rishu wrote:
> >When do we need to do so? We should never need to figure out whether an
> >on-disk block is for an SLRU or something else, without also knowing which
> >relation / SLRU it is in.
> 
> You are correct that we wouldn’t need to rely on the pd_flag bit to
> determine page type for any access to a page where we come top down
> following the hierarchy. However, for the purpose of debugging “from the
> bottom up” it would be critical to know what type of page is being read in a
> system with multiple page header types.

That doesn't seem to justify using a bit on the page. Wouldn't it suffice to
add such information to the BufferDesc?

Greetings,

Andres Freund




Re: SLRUs in the main buffer pool - Page Header definitions

2022-06-23 Thread Bagga, Rishu
Hi Andres,


>When do we need to do so? We should never need to figure out whether an
>on-disk block is for an SLRU or something else, without also knowing which
>relation / SLRU it is in.

You are correct that we wouldn’t need to rely on the pd_flag bit to determine 
page type for any access to a page where we come top down following the 
hierarchy. However, for the purpose of debugging “from the bottom up” it would 
be critical to know what type of page is being read in a system with multiple 
page header types.
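
To make that concrete, here is a minimal sketch (the flag name and bit value
are hypothetical, not part of our patch) of how such a bottom-up check could
look, relying on pd_lsn, pd_checksum and pd_flags sitting at the same offsets
in both header layouts:

#define PD_BUFFERED_OBJECT	0x0008	/* hypothetical flag: page uses the short header */

static inline bool
PageIsBufferedObject(Page page)
{
	/* pd_flags is at the same offset in both header layouts */
	return (((PageHeader) page)->pd_flags & PD_BUFFERED_OBJECT) != 0;
}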

On 6/23/22, 2:22 PM, "Andres Freund"  wrote:


Hi,

On 2022-06-23 20:25:21 +, Bagga, Rishu wrote:
> >> 3. A flag to identify if the page is a relational or BufferedObject
> >Why is this needed in the page header?
>
> Now that we are dealing with two different types of page headers, we need to
> know how to interpret any given page. We need to use pd_flags to determine
> this.

When do we need to do so? We should never need to figure out whether an
on-disk block is for an SLRU or something else, without also knowing which
relation / SLRU it is in.

Greetings,

Andres Freund



Re: fix crash with Python 3.11

2022-06-23 Thread Tom Lane
Markus Wanner  writes:
> On 6/23/22 15:34, Tom Lane wrote:
>> Under what circumstances would it be OK for outside code to call
>> SPICleanup?

> For the same reasons previous Postgres versions called SPICleanup: from 
> a sigsetjmp handler that duplicates most of what Postgres does in such a 
> situation.

Does such code exist?  I don't see any other calls in Debian code search,
and I find it hard to believe that anyone would think such a thing is
maintainable.

regards, tom lane




Re: NAMEDATALEN increase because of non-latin languages

2022-06-23 Thread Tom Lane
Robert Haas  writes:
> On Thu, Jun 23, 2022 at 5:49 PM Andres Freund  wrote:
>> I was thinking we'd basically do it wherever we do a GETSTRUCT() today.

> That seems a little fraught, because you'd be turning what's now
> basically a trivial operation into a non-trivial operation involving
> memory allocation.

Nonetheless, the presence of GETSTRUCT calls should be a good guide
to where we need to do something.

regards, tom lane




Re: NAMEDATALEN increase because of non-latin languages

2022-06-23 Thread Robert Haas
On Thu, Jun 23, 2022 at 5:49 PM Andres Freund  wrote:
> I was thinking we'd basically do it wherever we do a GETSTRUCT() today.
>
> A first step could be to transform code like
> (Form_pg_attribute) GETSTRUCT(tuple)
> into
>GETSTRUCT(pg_attribute, tuple)
>
> then, in a subsequent step, we'd redefine GETSTRUCT as something
> #define GETSTRUCT(catalog, tuple) tuple_to_struct_##catalog(tuple)

That seems a little fraught, because you'd be turning what's now
basically a trivial operation into a non-trivial operation involving
memory allocation.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: NAMEDATALEN increase because of non-latin languages

2022-06-23 Thread Andres Freund
Hi,

On 2022-06-23 14:42:17 -0400, Robert Haas wrote:
> On Thu, Jun 23, 2022 at 2:07 PM Tom Lane  wrote:
> > The extra cost of the deforming step could also be repaid, in some
> > cases, by not having to use SysCacheGetAttr etc later on to fetch
> > variable-length fields.  That is, I'm imagining that the deformer
> > would extract all the fields, even varlena ones, and drop pointers
> > or whatever into fields of the C struct.

I was also thinking we'd translate all attributes. Not entirely sure whether
we'd want to use "plain" pointers though - there are some places where we rely
on being able to copy such structs around. That'd be a bit easier with
relative pointers, pointing to the end of the struct. But likely the
notational overhead of dealing with relative pointers would be far higher than
the notational overhead of having to invoke a generated "tuple struct" copy
function. Which we'd likely need anyway, because some previously statically
sized allocations would end up being variable sized?


> Yeah, if we were going to do something like this, I can't see why we
> wouldn't do it this way. It wouldn't make sense to do it for only some
> of the attributes.

Agreed.


> I'm not sure exactly where we would put this translation step, though.
> I think for the syscaches and relcache we'd want to translate when
> populating the cache so that when you do a cache lookup you get the
> data already translated. It's hard to be sure without testing, but
> that seems like it would make this cheap enough that we wouldn't have
> to be too worried, since the number of times we build new cache
> entries should be small compared to the number of times we access
> existing ones. The trickier thing might be code that uses
> systable_beginscan() et al. directly.

I was thinking we'd basically do it wherever we do a GETSTRUCT() today.

A first step could be to transform code like
(Form_pg_attribute) GETSTRUCT(tuple)
into
   GETSTRUCT(pg_attribute, tuple)

then, in a subsequent step, we'd redefine GETSTRUCT as something
#define GETSTRUCT(catalog, tuple) tuple_to_struct_##catalog(tuple)
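
A rough, hand-written sketch of the kind of deforming function that could then
be generated (illustrative only, not actual genbki output; only a few
fixed-width pg_attribute fields are shown):

static FormData_pg_attribute *
tuple_to_struct_pg_attribute(HeapTuple tuple, TupleDesc tupdesc)
{
	Datum		values[Natts_pg_attribute];
	bool		nulls[Natts_pg_attribute];
	FormData_pg_attribute *form = palloc0(sizeof(FormData_pg_attribute));

	heap_deform_tuple(tuple, tupdesc, values, nulls);

	/* fixed-width leading fields are a straight copy ... */
	form->attrelid = DatumGetObjectId(values[Anum_pg_attribute_attrelid - 1]);
	namestrcpy(&form->attname,
			   NameStr(*DatumGetName(values[Anum_pg_attribute_attname - 1])));
	form->atttypid = DatumGetObjectId(values[Anum_pg_attribute_atttypid - 1]);

	/* ... remaining fields, including varlena ones, would follow here */
	return form;
}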

Greetings,

Andres Freund




Re: SLRUs in the main buffer pool - Page Header definitions

2022-06-23 Thread Andres Freund
Hi,

On 2022-06-23 20:25:21 +, Bagga, Rishu wrote:
> >> 3. A flag to identify if the page is a relational or BufferedObject
> >Why is this needed in the page header?
> 
> Now that we are dealing with two different types of page headers, we need to
> know how to interpret any given page. We need to use pd_flags to determine
> this.

When do we need to do so? We should never need to figure out whether an
on-disk block is for an SLRU or something else, without also knowing which
relation / SLRU it is in.

Greetings,

Andres Freund




Re: CI and test improvements

2022-06-23 Thread Thomas Munro
[Resending -- sorry if you receive this twice.  Jacob's mail server
has apparently offended the list management software so emails bounce
if he's in CC.]

On Fri, Jun 24, 2022 at 7:23 AM Justin Pryzby  wrote:
> Freebsd uploads an artifact 3x smaller than the size ccache reports, because
> its ccache is old so doesn't enable compression by default.

That port/package got stuck on 3.x because of some dependency problems
when used in bootstrapping FreeBSD itself (or other ports?),
apparently.  I didn't follow the details but the recent messages here
sound hopeful and I'm keeping my eye on it to see if 4.x lands as a
separate package we'd need to install, or something.  Fingers crossed!

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=234971

> I've also sent some patches to Thomas for cfbot to help progress some of these
> patches (code coverage and documentation upload as artifacts).
> https://github.com/justinpryzby/cfbot/commits/master

Thanks, sorry for lack of action, will get to these soon.




Re: tablesync copy ignores publication actions

2022-06-23 Thread Robert Haas
On Thu, Jun 23, 2022 at 2:13 AM Amit Kapila  wrote:
> The patch looks good to me as well. I will push this patch in HEAD (as
> per option (a)) tomorrow unless I see any more suggestions/comments.

The example seems to demonstrate the point quite well but one thing
that I notice is that it is quite long. I don't really see an obvious
way of making it shorter without making it less clear, so perhaps
that's fine.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: SLRUs in the main buffer pool - Page Header definitions

2022-06-23 Thread Robert Haas
On Wed, Jun 22, 2022 at 5:06 PM Bagga, Rishu  wrote:
> We are suggesting a minimal BufferedObject page header
> to be the following, overlapping with the key fields near the beginning
> of the regular PageHeaderData:
>
> typedef struct BufferedObjectPageHeaderData
> {
> PageXLogRecPtr pd_lsn;
> uint16_t   pd_checksum;
> uint16_t   pd_flags;
> uint16_t   pd_pagesize_version;
> } BufferedObjectPageHeaderData;
>
> For reference, the regular page header looks like the following:
> typedef struct PageHeaderData
> {
> PageXLogRecPtr  pd_lsn;
> uint16_t        pd_checksum;
> uint16_t        pd_flags;
> LocationIndex   pd_lower;
> LocationIndex   pd_upper;
> LocationIndex   pd_special;
> uint16_t   pd_pagesize_version;
> TransactionId   pd_prune_xid;
> ItemIdDataCommon  pd_linp[];
> } PageHeaderData;
>
> After careful review, we have trimmed out the heap and index specific
> fields from the suggested header that do not add any value to SLRU
> components. We plan to use pd_lsn, pd_checksum, and pd_pagesize_version
> in the same way that they are in relational pages. These fields are
> needed to ensure consistency, durability and page correctness

I think that it's not worth introducing a new page header format to
save 10 bytes per page. Keeping things on the same format is likely to
save more than the minor waste of space costs.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: SLRUs in the main buffer pool - Page Header definitions

2022-06-23 Thread Bagga, Rishu
Hi Andres,
Thanks for your response.

To answer your questions:

>> 3. A flag to identify if the page is a relational or BufferedObject
>Why is this needed in the page header?

Now that we are dealing with two different types of page headers, we need to 
know how to interpret any given page. We need to use pd_flags to determine this.


>How are you proposing to deal with this in the "key" to "offset in SLRU"
>mapping? E.g. converting a xid to an offset in the pg_xact SLRU. I assume
>you're thinking to deal with this by making the conversion math a bit more
>complicated?

You’re right; we would have to account for this in the conversion math between 
the ‘key’ and ‘offset’. The change to the macros would be as following:

#define MULTIXACT_OFFSETS_PER_PAGE ((BLCKSZ - 
SizeOfBufferedObjectPageHeaderData) / sizeof(MultiXactOffset))

Additionally, we need to account for the size of the page header when reading 
and writing multixacts in memory based off of the entryno.
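
For illustration, a minimal sketch (assuming the
SizeOfBufferedObjectPageHeaderData constant used above) of the adjusted
entryno-to-address computation, which simply skips past the new header:

static MultiXactOffset *
MultiXactOffsetEntryPtr(Page page, int entryno)
{
	char	   *base = (char *) page + SizeOfBufferedObjectPageHeaderData;

	Assert(entryno >= 0 && entryno < MULTIXACT_OFFSETS_PER_PAGE);
	return ((MultiXactOffset *) base) + entryno;
}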

Rishu Bagga

Amazon Web Services (AWS)


On 6/22/22, 2:40 PM, "Andres Freund"  wrote:

Hi,

On 2022-06-22 21:06:29 +, Bagga, Rishu wrote:
> 3. A flag to identify if the page is a relational or BufferedObject

Why is this needed in the page header?


> Using the new BufferedObject page header will be space efficient but
> introduces a significant change in the codebase to now track two types
> of page header data. During upgrade, all SLRU files that exist on the
> system must be converted to the new format with page header. This will
> require rewriting all the SLRU pages with the page header as part of
> pg_upgrade.

How are you proposing to deal with this in the "key" to "offset in SLRU"
mapping? E.g. converting a xid to an offset in the pg_xact SLRU. I assume
you're thinking to deal with this by making the conversion math a bit more
complicated?

Greetings,

Andres Freund





making relfilenodes 56 bits

2022-06-23 Thread Robert Haas
[ changing subject line so nobody misses what's under discussion ]

For a quick summary of the overall idea being discussed here and some
discussion of the problems it solves, see
http://postgr.es/m/ca+tgmobm5fn5x0u3tsponvk_tzpfcdbchxsxcoy1ytn1dxr...@mail.gmail.com

For discussion of the proposed renaming of non-user-visible references
to relfilenode to either RelFileLocator or RelFileNumber as
preparatory refactoring work for that change, see
http://postgr.es/m/ca+tgmoamotxbvaqf9hwfzonuo6bhhjs6tozqd7hz-pmojta...@mail.gmail.com

On Thu, Jun 23, 2022 at 3:55 AM Dilip Kumar  wrote:
> I have worked on this renaming stuff first and once we agree with that
> then I will rebase the other patches on top of this and will also work
> on the other review comments for those patches.
> So basically in this patch
> - The "RelFileNode" structure to "RelFileLocator" and also renamed
> other internal member as below
> typedef struct RelFileLocator
> {
>   Oid spcOid; /* tablespace */
>   Oid dbOid; /* database */
>   Oid relNumber; /* relation */
> } RelFileLocator;

I like those structure member names fine, but I'd like to see this
preliminary patch also introduce the RelFileNumber typedef as an alias
for Oid. Then the main patch can change it to be uint64.
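
That is, roughly (sketch only):

/* alias for now; the main patch would widen this to a 64-bit type */
typedef Oid RelFileNumber;

typedef struct RelFileLocator
{
	Oid				spcOid;		/* tablespace */
	Oid				dbOid;		/* database */
	RelFileNumber	relNumber;	/* relation */
} RelFileLocator;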

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: fix crash with Python 3.11

2022-06-23 Thread Markus Wanner

On 6/23/22 15:34, Tom Lane wrote:

Under what circumstances would it be OK for outside code to call
SPICleanup?


For the same reasons previous Postgres versions called SPICleanup: from 
a sigsetjmp handler that duplicates most of what Postgres does in such a 
situation.


However, I think that's the wrong question to ask for a stable branch. 
Postgres did export this function in previous versions. Removing it 
altogether constitutes an API change and makes extensions that link to 
it fail to even load, which is a bad way to fail after a patch version 
upgrade. Even if its original use was not sound in the first place.


Ofc my proposed patch is not meant for master, only for stable branches.

Best Regards

Markus




Re: CI and test improvements

2022-06-23 Thread Justin Pryzby
On Sat, May 28, 2022 at 10:37:41AM -0500, Justin Pryzby wrote:
> I'm "joining" a bunch of unresolved threads hoping to present them better 
> since
> they're related and I'm maintaining them together anyway.

This resolves an error with libpq tests in an intermediate commit, probably
caused by rebasing (and maybe hidden by the fact that the tests weren't being
run).

And updates ccache to avoid CCACHE_COMPILER.

Should any of the test changes go into v15 ?

> Subject: [PATCH 02/19] cirrus/vcregress: test modules/contrib with 
> NO_INSTALLCHECK=1
> Subject: [PATCH 08/19] vcregress: add alltaptests
> Subject: [PATCH 14/19] Move libpq_pipeline test into src/interfaces/libpq.
> Subject: [PATCH 15/19] msvc: do not install libpq test tools by default

I also added: cirrus/ccache: disable compression and show stats

Since v4.0, ccache enables zstd compression by default, saving roughly 2x-3x.
But, cirrus caches are compressed as tar.gz, so we could disable ccache
compression, allowing cirrus to gzip the uncompressed data (better than
ccache's default of zstd-1).

https://cirrus-ci.com/build/5194596123672576
debian/bullseye has ccache 4.2; cirrus says 92MB cache after a single 
compilation; cache_size_kibibyte 102000
macos: has 4.5.1: 46MB cache; cache_size_kibibyte 51252
freebsd: has 3.7.12: 41MB cache; cache_size_kibibyte 130112

For some reason, mac's ccache uses 2x less space than either freebsd or linux.
Linux is ~30% larger.
Freebsd uploads an artifact 3x smaller than the size ccache reports, because
its ccache is old so doesn't enable compression by default.

I've also sent some patches to Thomas for cfbot to help progress some of these
patches (code coverage and documentation upload as artifacts).
https://github.com/justinpryzby/cfbot/commits/master

-- 
Justin
>From 61b535bf97c3910e46029f294d40f6fbe72351d5 Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Wed, 25 May 2022 21:53:22 -0500
Subject: [PATCH 01/21] cirrus/windows: add compiler_warnings_script

I'm not sure how to write this test in windows shell; it's also not easy to
write it in posix sh, since windows shell is somehow interpreting && and ||...

https://www.postgresql.org/message-id/20220212212310.f645c6vw3njkgxka%40alap3.anarazel.de

See also: 8a1ce5e54f6d144e4f8e19af7c767b026ee0c956

ci-os-only: windows

https://cirrus-ci.com/task/6183879907213312
https://cirrus-ci.com/task/4876271443247104
---
 .cirrus.yml|  8 +++-
 src/tools/ci/windows-compiler-warnings | 16 
 2 files changed, 23 insertions(+), 1 deletion(-)
 create mode 100755 src/tools/ci/windows-compiler-warnings

diff --git a/.cirrus.yml b/.cirrus.yml
index f23d6cae552..bcb8d53db78 100644
--- a/.cirrus.yml
+++ b/.cirrus.yml
@@ -370,7 +370,8 @@ task:
 # ForceNoAlign prevents msbuild from introducing line-breaks for long lines
 # disable file tracker, we're never going to rebuild, and it slows down the
 #   build
-MSBFLAGS: -m -verbosity:minimal "-consoleLoggerParameters:Summary;ForceNoAlign" /p:TrackFileAccess=false -nologo
+# -fileLoggerParameters1: write warnings to msbuild.warn.log.
+MSBFLAGS: -m -verbosity:minimal "-consoleLoggerParameters:Summary;ForceNoAlign" /p:TrackFileAccess=false -nologo -fileLoggerParameters1:warningsonly;logfile=msbuild.warn.log
 
 # If tests hang forever, cirrus eventually times out. In that case log
 # output etc is not uploaded, making the problem hard to debug. Of course
@@ -450,6 +451,11 @@ task:
 cd src/tools/msvc
 %T_C% perl vcregress.pl ecpgcheck
 
+  # These should be last, so all the important checks are always run
+  always:
+compiler_warnings_script:
+  - sh src\tools\ci\windows-compiler-warnings msbuild.warn.log
+
   on_failure:
 <<: *on_failure
 crashlog_artifacts:
diff --git a/src/tools/ci/windows-compiler-warnings b/src/tools/ci/windows-compiler-warnings
new file mode 100755
index 000..d6f9a1fc569
--- /dev/null
+++ b/src/tools/ci/windows-compiler-warnings
@@ -0,0 +1,16 @@
+#! /bin/sh
+# Success if the given file doesn't exist or is empty, else fail
+# This is a separate file only to avoid dealing with windows shell quoting and escaping.
+set -e
+
+fn=$1
+
+if [ -s "$fn" ]
+then
+	# Display the file's content, then exit indicating failure
+	cat "$fn"
+	exit 1
+else
+	# Success
+	exit 0
+fi
-- 
2.17.1

>From 3abc971e5c9ddd2c237d994c17a579d1e86869eb Mon Sep 17 00:00:00 2001
From: Justin Pryzby 
Date: Sun, 9 Jan 2022 18:25:02 -0600
Subject: [PATCH 02/21] cirrus/vcregress: test modules/contrib with
 NO_INSTALLCHECK=1

--temp-config must be specified with an "=" because otherwise vcregress runs
pg_regress --temp-config test1 test2 [...],
..which means test1 gets eaten as the argument to --temp-config

https://www.postgresql.org/message-id/20220109191649.GL14051%40telsasoft.com
https://www.postgresql.org/message-id/CA%2BhUKGLneD%2Bq%2BE7upHGwn41KGvbxhsKbJ%2BM-y9nvv7_Xjv8Qog%40mail.gmail.com
---
 .cirrus.yml  

Re: Race condition in TransactionIdIsInProgress

2022-06-23 Thread Heikki Linnakangas

On 12/02/2022 05:42, Andres Freund wrote:

On 2022-02-11 16:41:24 -0800, Andres Freund wrote:

FWIW, I've indeed reproduced this fairly easily with such a setup. A pgbench
r/w workload that's been modified to start 70 savepoints at the start shows

pgbench: error: client 22 script 0 aborted in command 12 query 0: ERROR:  t_xmin 3853739 
is uncommitted in tuple (2,159) to be updated in table "pgbench_branches"
pgbench: error: client 13 script 0 aborted in command 12 query 0: ERROR:  t_xmin 3954305 
is uncommitted in tuple (2,58) to be updated in table "pgbench_branches"
pgbench: error: client 7 script 0 aborted in command 12 query 0: ERROR:  t_xmin 4017908 
is uncommitted in tuple (3,44) to be updated in table "pgbench_branches"

after a few minutes of running with a local, not slowed down, syncrep. Without
any other artifical slowdowns or such.


And this can easily be triggered even without subtransactions, in a completely
reliable way.

The only reason I'm so far not succeeding in turning it into an
isolationtester spec is that a transaction waiting for SyncRep doesn't count
as waiting for isolationtester.

Basically

S1: BEGIN; $xid = txid_current(); UPDATE; COMMIT; 
S2: SELECT pg_xact_status($xid);
S2: UPDATE;

suffices, because the pg_xact_status() causes an xlog fetch, priming the xid
cache, which then causes the TransactionIdIsInProgress() to take the early
return path, despite the transaction still being in progress. Which then
allows the update to proceed, despite the S1 not having "properly committed"
yet.


I started to improve isolationtester, to allow the spec file to specify 
a custom query to check for whether the backend is blocked. But then I 
realized that it's not quite enough: there is currently no way to write the 
  "$xid = txid_current()" step either. Isolationtester doesn't have 
variables like that. Also, you need to set synchronous_standby_names to 
make it block, which seems a bit weird in an isolation test.


I wish we had an explicit way to inject delays or failures. We discussed 
it before 
(https://www.postgresql.org/message-id/flat/CANXE4TdxdESX1jKw48xet-5GvBFVSq%3D4cgNeioTQff372KO45A%40mail.gmail.com), 
but I don't want to pick up that task right now.


Anyway, I wrote a TAP test for this instead. See attached. I'm not sure 
if this is worth committing, the pg_xact_status() trick is quite special 
for this particular bug.


Simon's just_remove_TransactionIdIsKnownCompleted_call.v1.patch looks 
good to me. Replying to the discussion on that:


On 12/02/2022 23:50, Andres Freund wrote:

On 2022-02-12 13:25:58 +, Simon Riggs wrote:

I'm not sure it is possible to remove TransactionIdIsKnownCompleted()
in backbranches.


I think it'd be fine if we needed to. Or we could just make it always return
false or such.



just removes the known offending call, passes make check, but IMHO
leaves the same error just as likely by other callers.


Which other callers are you referring to?


I don't know of any, but we can't just remove a function in a
backbranch, can we?


We have in the past, if it's a "sufficiently internal" function.


I think we should remove it in v15, and leave it as it is in 
backbranches. Just add a comment to point out that the name is a bit 
misleading, because it checks the clog rather than the proc array.  It's 
not inherently dangerous, and someone might have a legit use case for 
it. True, someone might also be doing this incorrect thing with it, but 
still seems like the right balance to me.


I think we also need to change pg_xact_status(), to also call 
TransactionIdIsInProgress() before TransactionIdDidCommit/Abort(). 
Currently, if it returns "committed", and you start a new transaction, 
the new transaction might not yet see the effects of that "committed" 
transaction. With that, you cannot reproduce the original bug with 
pg_xact_status() though, so there's no point in committing that test then.


In summary, I think we should:
- commit and backpatch Simon's 
just_remove_TransactionIdIsKnownCompleted_call.v1.patch

- fix pg_xact_status() to check TransactionIdIsInProgress()
- in v15, remove TransationIdIsKnownCompleted function altogether

I'll try to get around to that in the next few days, unless someone 
beats me to it.


- Heikki

diff --git a/src/test/modules/test_misc/t/004_bug_xact_inprogress.pl b/src/test/modules/test_misc/t/004_bug_xact_inprogress.pl
new file mode 100644
index 00..8a651c0c60
--- /dev/null
+++ b/src/test/modules/test_misc/t/004_bug_xact_inprogress.pl
@@ -0,0 +1,86 @@
+# Test for visibility checks, when an updating transaction has already
+# written commit record but has not yet removed its entry from the
+# proc array, and thus isn't yet visible to other transactions.
+#
+# In particular, this reproduces an old bug in that area; see:
+# https://www.postgresql.org/message-id/flat/4da7913d-398c-e2ad-d777-f752cf7f0bbb%40garret.ru
+
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use 

Re: NAMEDATALEN increase because of non-latin languages

2022-06-23 Thread Robert Haas
On Thu, Jun 23, 2022 at 2:07 PM Tom Lane  wrote:
> Sounds worth investigating, anyway.  It'd also get us out from under
> C-struct-related problems such as the nearby AIX alignment issue.

Yeah.

> The extra cost of the deforming step could also be repaid, in some
> cases, by not having to use SysCacheGetAttr etc later on to fetch
> variable-length fields.  That is, I'm imagining that the deformer
> would extract all the fields, even varlena ones, and drop pointers
> or whatever into fields of the C struct.

Yeah, if we were going to do something like this, I can't see why we
wouldn't do it this way. It wouldn't make sense to do it for only some
of the attributes.

I'm not sure exactly where we would put this translation step, though.
I think for the syscaches and relcache we'd want to translate when
populating the cache so that when you do a cache lookup you get the
data already translated. It's hard to be sure without testing, but
that seems like it would make this cheap enough that we wouldn't have
to be too worried, since the number of times we build new cache
entries should be small compared to the number of times we access
existing ones. The trickier thing might be code that uses
systable_beginscan() et. al. directly.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: NAMEDATALEN increase because of non-latin languages

2022-06-23 Thread Tom Lane
Robert Haas  writes:
> On Thu, Jun 23, 2022 at 10:17 AM Andres Freund  wrote:
>> FWIW, I don't agree that this is a reasonable way to tackle changing
>> NAMEDATALEN. It'd be nice to have, but to me it seems a pretty small
>> fraction of the problem of making Names variable length. You'll still have all
>> the problems of name fields being accessed all over, but now not being
>> included in the part of the struct visible to C etc.

> Yeah, I agree. I think that being able to reorder columns logically
> without changing anything physically would be cool, but I think we
> could find some way of making the catalog columns variable-length
> without introducing that feature.

Agreed.  Bearing in mind that multiple smart people have tackled that idea
and gotten nowhere, I think setting it as a prerequisite for this project
is a good way of ensuring this won't happen.

> I'm not sure whether your idea of creating translator functions is a
> good one or not, but it doesn't seem too crazy. It'd be like a special
> purpose tuple deformer.

Sounds worth investigating, anyway.  It'd also get us out from under
C-struct-related problems such as the nearby AIX alignment issue.

The extra cost of the deforming step could also be repaid, in some
cases, by not having to use SysCacheGetAttr etc later on to fetch
variable-length fields.  That is, I'm imagining that the deformer
would extract all the fields, even varlena ones, and drop pointers
or whatever into fields of the C struct.

regards, tom lane




Re: allow specifying action when standby encounters incompatible parameter settings

2022-06-23 Thread Nathan Bossart
Thanks for chiming in.

On Thu, Jun 23, 2022 at 04:19:45PM +0100, Simon Riggs wrote:
> I don't understand why you need this patch at all.
> 
> Since you have automation, you can use that layer to automatically
> restart all standbys at once, if you choose, without any help or
> hindrance from PostgreSQL.
> 
> But I really don't want you to do that, since that results in loss of
> availability of the service. I'd like you to try a little harder and
> automate this in a way that allows the service to continue with some
> standbys available while others restart.
> 
> All this feature does is allow you to implement things in a lazy way
> that causes a loss of availability for users. I don't think that is of
> benefit to PostgreSQL users, so -1 from me, on this patch (only),
> sorry about that.

Overall, this is intended for users that care more about keeping WAL replay
caught up than a temporary loss of availability due to a restart.  Without
this, I'd need to detect that WAL replay has paused due to insufficient
parameters and restart Postgres.  If I can configure Postgres to
automatically shut down in these scenarios, my automation can skip right to
adjusting the parameters and starting Postgres up.  Of course, if you care
more about availability, you'd keep this parameter set to the default
(pause) and restart on your own schedule.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: [PoC] Let libpq reject unexpected authentication requests

2022-06-23 Thread Jacob Champion
On Wed, Jun 22, 2022 at 5:56 PM David G. Johnston
 wrote:
> That just makes me want to not implement OR'ing...
>
> The existing set of individual parameters doesn't work as an API for 
> try-and-fallback.
>
> Something like would be less problematic when it comes to setting multiple 
> related options:
>
> --auth-try 
> "1;sslmode=require;channel_binding=require;method=scram-sha-256;sslcert=/tmp/machine.cert;sslkey=/tmp/machine.key"
> --auth-try 
> "2;sslmode=require;method=cert;sslcert=/tmp/me.cert;sslkey=/tmp/me.key"
> --auth-try "3;sslmode=prefer;method=md5"

I think that's a fair point, and your --auth-try example definitely
illustrates why having require_auth try to do everything is probably
not a viable strategy. My arguments for keeping OR in spite of that
are

- the default is effectively an OR of all available methods (plus "none");
- I think NOT is an important case in practice, which is effectively a
negative OR ("anything but this/these"); and
- not providing an explicit, positive OR to complement the above seems
like it would be a frustrating user experience once you want to get
just a little bit more creative.

It's also low-hanging fruit that doesn't require multiple connections
to the server per attempt (which I think your --auth-try proposal
might, if I understand it correctly).
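
For illustration, assuming the comma-separated OR and "!" negation syntax from
the current patch, the cases above would look something like:

    # positive OR: accept either SCRAM or certificate authentication
    psql "host=db.example.com require_auth=scram-sha-256,cert"

    # the NOT case: anything except plaintext or md5 passwords
    psql "host=db.example.com require_auth=!password,!md5"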

> Absent that radical idea, require_auth probably shouldn't change any behavior 
> that exists today without having specified require_auth and having the chosen 
> method happen anyway.  So whatever happens today with an md5 password prompt 
> while channel_binding is set to require (not in the mood right now to figure 
> out how to test that on a compiled against HEAD instance).

I think the newest tests in v3 should enforce that, but let me know if
I've missed something.

--Jacob




Re: O(n) tasks cause lengthy startups and checkpoints

2022-06-23 Thread Nathan Bossart
On Thu, Jun 23, 2022 at 09:46:28AM -0400, Robert Haas wrote:
> I do agree that a general mechanism for getting cleanup tasks done in
> the background could be a useful thing to have, but I feel like it's
> hard to see exactly how to make it work well. We can't just allow it
> to spin up a million new processes, but at the same time, if it can't
> guarantee that time-critical tasks get performed relatively quickly,
> it's pretty worthless.

My intent with this new auxiliary process is to offload tasks that aren't
particularly time-critical.  They are only time-critical in the sense that
1) you might eventually run out of space and 2) you might encounter
wraparound with the logical replication files.  But AFAICT these same risks
exist today in the checkpointer approach, although maybe not to the same
extent.  In any case, 2 seems solvable to me outside of this patch set.

I'm grateful for the discussion in this thread so far, but I'm not seeing a
clear path forward.  I'm glad to see threads like the one to stop doing
end-of-recovery checkpoints [0], but I don't know if it will be possible to
solve all of these availability concerns in a piecemeal fashion.  I remain
open to exploring other suggested approaches beyond creating a new
auxiliary process.

[0] 
https://postgr.es/m/CA%2BTgmobrM2jvkiccCS9NgFcdjNSgAvk1qcAPx5S6F%2BoJT3D2mQ%40mail.gmail.com

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: O(n) tasks cause lengthy startups and checkpoints

2022-06-23 Thread Simon Riggs
On Thu, 23 Jun 2022 at 14:46, Robert Haas  wrote:
>
> On Thu, Jun 23, 2022 at 7:58 AM Simon Riggs
>  wrote:
> > Having a central cleanup process makes a lot of sense. There is a long
> > list of potential tasks for such a process. My understanding is that
> > autovacuum already has an interface for handling additional workload
> > types, which is how BRIN indexes are handled. Do we really need a new
> > process?
>
> It seems to me that if there's a long list of possible tasks for such
> a process, that's actually a trickier situation than if there were
> only one or two, because it may happen that when task X is really
> urgent, the process is already busy with task Y.
>
> I don't think that piggybacking more stuff onto autovacuum is a very
> good idea for this exact reason. We already know that autovacuum
> workers can get so busy that they can't keep up with the need to
> vacuum and analyze tables. If we give them more things to do, that
> figures to make it worse, at least on busy systems.
>
> I do agree that a general mechanism for getting cleanup tasks done in
> the background could be a useful thing to have, but I feel like it's
> hard to see exactly how to make it work well. We can't just allow it
> to spin up a million new processes, but at the same time, if it can't
> guarantee that time-critical tasks get performed relatively quickly,
> it's pretty worthless.

Most of the tasks mentioned aren't time critical.

I have no objection to a new auxiliary process to execute those tasks,
which can be spawned when needed.

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: [PATCH] Compression dictionaries for JSONB

2022-06-23 Thread Simon Riggs
On Thu, 2 Jun 2022 at 14:30, Aleksander Alekseev
 wrote:

> > I saw there was some previous discussion about dictionary size. It
> > looks like this approach would put all dictionaries into a shared OID
> > pool. Since I don't know what a "standard" use case is, is there any
> > risk of OID exhaustion for larger deployments with many dictionaries?
> > Or is 2**32 so comparatively large that it's not really a serious
> > concern?
>
> I agree, this is a drawback of the current implementation. To be honest,
> I simply followed the example of how ENUMs are implemented. I'm not 100% sure
> if we should be worried here (apparently, freed OIDs are reused). I'm OK with
> using a separate sequence if someone could second this. This is the first time
> I'm altering the catalog so I'm not certain what the best practices are.

The goal of this patch is great, thank you for working on this (and ZSON).

The approach chosen has a few downsides that I'm not happy with yet.

* Assigning OIDs for each dictionary entry is not a great idea. I
don't see why you would need to do that; just assign monotonically
ascending keys for each dictionary, as we do for AttrNums.

* There is a limit on SQL statement size, which will effectively limit
the size of dictionaries, but the examples are unrealistically small,
so this isn't clear as a limitation, but it would be in practice. It
would be better to specify a filename, which can be read in when the
DDL executes. This can be put into pg_dump output in a similar way to
the COPY data for a table is, so once read in it stays static.

* The dictionaries are only allowed for certain datatypes. This should
not be specifically limited by this patch, i.e. user defined types
should not be rejected.

* Dictionaries have no versioning. Any list of data items changes over
time, so how do we express that? Enums were also invented as static
lists originally, then had to be modified later to accommodate
additions and revisions, so let's think about that now, even if we
don't add all of the commands in one go. Currently we would have to
create a whole new dictionary if even one word changes. Ideally, we
want the dictionary to have a top-level name and then have multiple
versions over time. Let's agree how we are going to do these things, so
we can make sure the design and code allow for those future
enhancements.
i.e. how will we do ALTER TABLE ... UPGRADE DICTIONARY without causing
a table rewrite? (A hypothetical sketch of such syntax follows after this list.)

* Does the order of entries in the dictionary allow us to express a
priority? i.e. to allow Huffman coding.
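
To make the versioning point concrete, a purely hypothetical sketch of what the
SQL level could eventually look like (none of this syntax exists; it is only
meant to illustrate upgrading a dictionary without a table rewrite):

    CREATE COMPRESSION DICTIONARY my_dict
        AS ('status', 'payload', 'created_at');
    ALTER COMPRESSION DICTIONARY my_dict ADD VERSION 2
        AS ('status', 'payload', 'created_at', 'updated_at');
    -- existing rows keep referring to version 1, so no rewrite is needed
    ALTER TABLE events ALTER COLUMN doc UPGRADE DICTIONARY TO VERSION 2;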

Thanks for your efforts - this is a very important patch.

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: allow specifying action when standby encounters incompatible parameter settings

2022-06-23 Thread Simon Riggs
On Wed, 13 Apr 2022 at 22:35, Nathan Bossart  wrote:

> As of 15251c0, when a standby encounters an incompatible parameter change,
> it pauses replay so that read traffic can continue while the administrator
> fixes the parameters.  Once the server is restarted, replay can continue.
> Before this change, such incompatible parameter changes caused the standby
> to immediately shut down.
>
> I noticed that there was some suggestion in the thread associated with
> 15251c0 [0] for making this behavior configurable, but there didn't seem to
> be much interest at the time.  I am interested in allowing administrators
> to specify the behavior before 15251c0 (i.e., immediately shut down the
> standby when an incompatible parameter change is detected).  The use-case I
> have in mind is when an administrator has automation in place for adjusting
> these parameters and would like to avoid stopping replay any longer than
> necessary.  FWIW this is what we do in RDS.

I don't understand why you need this patch at all.

Since you have automation, you can use that layer to automatically
restart all standbys at once, if you choose, without any help or
hindrance from PostgreSQL.

But I really don't want you to do that, since that results in loss of
availability of the service. I'd like you to try a little harder and
automate this in a way that allows the service to continue with some
standbys available while others restart.

All this feature does is allow you to implement things in a lazy way
that causes a loss of availability for users. I don't think that is of
benefit to PostgreSQL users, so -1 from me, on this patch (only),
sorry about that.

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: some aspects of our qsort might not be ideal

2022-06-23 Thread Matthias van de Meent
On Thu, 23 Jun 2022 at 17:04, Peter Geoghegan  wrote:
>
> On Thu, Jun 23, 2022 at 7:51 AM Matthias van de Meent
>  wrote:
> > I think that mostly has to do with reliability / stability of ORDER BY
> > in combination with LIMIT and OFFSET, even though right now we cannot
> > fully guarantee such stability due to unstable results from underlying
> > plan nodes.
>
> The current qsort isn't stable.

Then I misunderstood Robert's comment, thanks for correcting me.

- Matthias




Re: some aspects of our qsort might not be ideal

2022-06-23 Thread Peter Geoghegan
On Thu, Jun 23, 2022 at 7:51 AM Matthias van de Meent
 wrote:
> I think that mostly has to do with reliability / stability of ORDER BY
> in combination with LIMIT and OFFSET, even though right now we cannot
> fully guarantee such stability due to unstable results from underlying
> plan nodes.

The current qsort isn't stable. While quicksort is never stable, our
particular implementation has fairly significant optimizations that
strongly rely on not using a stable sort. In particular, the B&M
optimizations for duplicate elements.

These optimizations make things like datum tuplesorts for
'SELECT count(DISTINCT mycol) ...' style queries on low cardinality columns
extremely fast. We're not really sorting so much as bucketing. This is
based on Dijkstra's Dutch national flag problem.
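
As a generic illustration of that bucketing idea (classic three-way "Dutch
national flag" partitioning; this is not the code in src/port/qsort.c, just the
concept):

    static void
    partition3(int *a, int n, int pivot, int *lt_end, int *gt_start)
    {
        int     lo = 0, i = 0, hi = n - 1;

        while (i <= hi)
        {
            if (a[i] < pivot)
            {
                int     tmp = a[lo];

                a[lo++] = a[i];
                a[i++] = tmp;
            }
            else if (a[i] > pivot)
            {
                int     tmp = a[hi];

                a[hi--] = a[i];
                a[i] = tmp;
            }
            else
                i++;            /* equal to pivot: leave it in the middle */
        }
        *lt_end = lo;           /* a[0 .. lo-1] < pivot */
        *gt_start = hi + 1;     /* a[hi+1 .. n-1] > pivot; the run of duplicates
                                 * in the middle needs no further sorting */
    }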

-- 
Peter Geoghegan




Re: some aspects of our qsort might not be ideal

2022-06-23 Thread Matthias van de Meent
On Thu, 23 Jun 2022 at 15:52, Robert Haas  wrote:
>
> On Thu, Jun 23, 2022 at 6:13 AM John Naylor
>  wrote:
> > Here is a *rough* first pass at dual-pivot quicksort. I haven't
> > changed the regression tests to adjust for qsort being an unstable
> > sort, ...
>
> Hmm. I thought we had some reasons for preferring a stable sort algorithm.

I think that mostly has to do with reliability / stability of ORDER BY
in combination with LIMIT and OFFSET, even though right now we cannot
fully guarantee such stability due to unstable results from underlying
plan nodes.

As an example, a table scan node under a sort node can start its scan
at an arbitrary point in the table (using synchronize_seqscans), and
because Sort nodes only sort MinimalTuple-s, each set of tuples that
have an equal sort value will be ordered by TID + y (mod tablesize),
with arbitrary values for y.
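
A generic illustration of the user-visible side of this (nothing specific to
the patch): with duplicate sort keys, which rows accompany a LIMIT is not
guaranteed unless the key is made unique, e.g.

    SELECT * FROM t ORDER BY category LIMIT 10;       -- ties broken arbitrarily
    SELECT * FROM t ORDER BY category, id LIMIT 10;   -- deterministic tiebreaker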

- Matthias




refactor some protocol message sending in walsender and basebackup

2022-06-23 Thread Peter Eisentraut
Some places in walsender.c and basebackup_copy.c open-code the sending 
of RowDescription and DataRow protocol messages.  But there are already 
more compact and robust solutions available for this, using 
DestRemoteSimple and associated machinery, already in use in walsender.c.


The attached patches 0001 and 0002 are tiny bug fixes I found during this.

Patches 0003 and 0004 are the main refactorings.  They should probably 
be combined into one patch eventually, but this way the treatment of 
RowDescription and DataRow is presented separately.

From 069b9896832412555470e30b481df8cc1e6bebec Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 19 May 2022 12:26:45 +0200
Subject: [PATCH 1/4] Fix attlen in RowDescription of BASE_BACKUP response

Should be 8 for int8, not -1.
---
 src/backend/replication/basebackup_copy.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/replication/basebackup_copy.c 
b/src/backend/replication/basebackup_copy.c
index cabb077240..1eed9d8c3f 100644
--- a/src/backend/replication/basebackup_copy.c
+++ b/src/backend/replication/basebackup_copy.c
@@ -361,7 +361,7 @@ SendXlogRecPtrResult(XLogRecPtr ptr, TimeLineID tli)
 * would not be wide enough for this, as TimeLineID is unsigned.
 */
pq_sendint32(&buf, INT8OID);/* type oid */
-   pq_sendint16(&buf, -1);
+   pq_sendint16(&buf, 8);
pq_sendint32(&buf, 0);
pq_sendint16(&buf, 0);
pq_endmessage(&buf);

base-commit: ac0e2d387a044faed310cbfe2fae78ecb0f6a4b6
-- 
2.36.1

From 045730e92bb67806f31fee033e1f87c69c8ae08e Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 23 Jun 2022 10:56:55 +0200
Subject: [PATCH 2/4] Change timeline field of IDENTIFY_SYSTEM to int8

It was int4, but in the other replication commands, timelines are
returned as int8.
---
 doc/src/sgml/protocol.sgml  | 2 +-
 src/backend/replication/walsender.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index a94743b587..c0b89a3c01 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -1838,7 +1838,7 @@ Streaming Replication Protocol

 

-timeline (int4)
+timeline (int8)
 
  
   Current timeline ID. Also useful to check that the standby is
diff --git a/src/backend/replication/walsender.c 
b/src/backend/replication/walsender.c
index e42671722a..fa60c92e13 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -444,7 +444,7 @@ IdentifySystem(void)
TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 1, "systemid",
  TEXTOID, -1, 0);
TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 2, "timeline",
- INT4OID, -1, 0);
+ INT8OID, -1, 0);
TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 3, "xlogpos",
  TEXTOID, -1, 0);
TupleDescInitBuiltinEntry(tupdesc, (AttrNumber) 4, "dbname",
-- 
2.36.1

From 9f30600e54e79a6d1e0ca4feda1511d90092148a Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Thu, 23 Jun 2022 10:55:09 +0200
Subject: [PATCH 3/4] Refactor sending of RowDescription messages in
 replication protocol

Some routines open-coded the construction of RowDescription messages.
Instead, we have support for doing this using tuple descriptors and
DestRemoteSimple, so use that instead.
---
 src/backend/access/common/tupdesc.c   |  9 +++
 src/backend/replication/basebackup_copy.c | 74 +++
 src/backend/replication/walsender.c   | 29 +++--
 3 files changed, 40 insertions(+), 72 deletions(-)

diff --git a/src/backend/access/common/tupdesc.c 
b/src/backend/access/common/tupdesc.c
index 9f41b1e854..d6fb261e20 100644
--- a/src/backend/access/common/tupdesc.c
+++ b/src/backend/access/common/tupdesc.c
@@ -739,6 +739,15 @@ TupleDescInitBuiltinEntry(TupleDesc desc,
att->attcollation = InvalidOid;
break;
 
+   case OIDOID:
+   att->attlen = 4;
+   att->attbyval = true;
+   att->attalign = TYPALIGN_INT;
+   att->attstorage = TYPSTORAGE_PLAIN;
+   att->attcompression = InvalidCompressionMethod;
+   att->attcollation = InvalidOid;
+   break;
+
default:
elog(ERROR, "unsupported type %u", oidtypeid);
}
diff --git a/src/backend/replication/basebackup_copy.c 
b/src/backend/replication/basebackup_copy.c
index 1eed9d8c3f..df0471a7a4 100644
--- a/src/backend/replication/basebackup_copy.c
+++ b/src/backend/replication/basebackup_copy.c
@@ -25,11 +25,13 @@
  */
 #include "postgres.h"
 
+#include 

Re: NAMEDATALEN increase because of non-latin languages

2022-06-23 Thread Robert Haas
On Thu, Jun 23, 2022 at 10:17 AM Andres Freund  wrote:
> > This would require:
> >
> > - changing star expansion in SELECTs (expandRTE etc)
> > - adjusting pg_dump, \d, etc
> >
> > That much seems clear and agreed upon.
>
> FWIW, I don't agree that this is a reasonable way to tackle changing
> NAMEDATALEN. It'd be nice to have, but to me it seems a pretty small
> fraction of the problem of making Names variable length. You'll still have all
> the problems of name fields being accessed all over, but now not being
> included in the part of the struct visible to C etc.

Yeah, I agree. I think that being able to reorder columns logically
without changing anything physically would be cool, but I think we
could find some way of making the catalog columns variable-length
without introducing that feature.

I'm not sure whether your idea of creating translator functions is a
good one or not, but it doesn't seem too crazy. It'd be like a special
purpose tuple deformer.

deform_pg_class_tuple(&pg_class_struct, pg_class_tuple);

The code in there could be automatically generated statements that
maybe look like this:

memcpy(&pg_class_struct.relnamespace, tuple + Aoffset_pg_class_relnamespace,
sizeof(Oid));

Once you hit the first variable-length field you'd need something more
clever, but because the code would be specialized for each catalog, it
would probably be quite fast anyway.
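
To make that concrete, here is a hand-written sketch of such a translator,
built on the generic heap_deform_tuple() just to show the shape of the output;
the DeformedPgClass struct is hypothetical, and the real thing would be
specialized per-catalog code emitted by genbki or similar:

    #include "postgres.h"
    #include "access/htup_details.h"
    #include "catalog/pg_class.h"
    #include "fmgr.h"

    typedef struct DeformedPgClass
    {
        Oid         oid;
        Oid         relnamespace;
        char       *relname;        /* would point into the tuple once names are varlena */
        text       *reloptions;     /* varlena pointer, or NULL */
    } DeformedPgClass;

    static void
    deform_pg_class_tuple(DeformedPgClass *dst, HeapTuple tuple, TupleDesc tupdesc)
    {
        Datum   values[Natts_pg_class];
        bool    nulls[Natts_pg_class];

        heap_deform_tuple(tuple, tupdesc, values, nulls);

        dst->oid = DatumGetObjectId(values[Anum_pg_class_oid - 1]);
        dst->relnamespace = DatumGetObjectId(values[Anum_pg_class_relnamespace - 1]);
        dst->relname = NameStr(*DatumGetName(values[Anum_pg_class_relname - 1]));
        dst->reloptions = nulls[Anum_pg_class_reloptions - 1] ? NULL :
            DatumGetTextPP(values[Anum_pg_class_reloptions - 1]);
    }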

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: NAMEDATALEN increase because of non-latin languages

2022-06-23 Thread Robert Haas
On Thu, Jun 23, 2022 at 6:13 AM Julien Rouhaud  wrote:
> While some problem wouldn't happen if we restricted the feature to system
> catalogs only (e.g. with renamed / dropped attributes, inheritance...), a lot
> would still exist and would have to be dealt with initially.  However I'm not
> sure what behavior would be wanted or acceptable, especially if we want
> something that can eventually be used for user relations too, with possibly
> dynamic positions.

I am not very convinced that this would make the project a whole lot
easier, but it does seem like the result would be less nice.

> I'll describe some of those problems, and just to make things easier I will 
> use
> a normal table "ab" with 2 attributes, a and b, with their physical / logical
> position reversed.
>
> Obviously
>
> SELECT * FROM ab
>
> should return a and b in that order.  The aliases should also probably match
> the logical position, as in:
>
> SELECT * FROM ab ab(aa, bb)
>
> should probably map aa to a and bb to b.

Check.

> But should COPY FROM / COPY TO / INSERT INTO use the logical position too if
> not column list is provided?  I'd think so, but it goes further from the
> original "only handle star expansion" idea.

I think yes.

> And should record_in / record_out use the logical position, as in:
> SELECT ab::text FROM ab / SELECT (a, b)::ab;
>
> I would think not, as relying on a possibly dynamic order could break things 
> if
> you store the results somewhere, but YMMV.

I think here the answer is yes again. I mean, consider that you could
also ALTER TABLE DROP COLUMN and then ALTER TABLE ADD COLUMN with the
same name. That is surely going to affect the meaning of such things.
I don't think we want to have one meaning if you reorder things that
way and a different meaning if you reorder things using whatever
commands we create for changing the display column positions.

> Note that there a variations of that, like
> SELECT ab::ab FROM (SELECT * FROM ab) ab;
>
> But those should probably work as intended (for now it doesn't) as you can't
> store a bare record, and storing a plain "ab" type wouldn't be problematic 
> with
> dynamic changes.

If the sub-SELECT and the cast both use the display order, which I
think they should, then there's no problem here, I believe.

> Then, what about joinrels expansion?  I learned that the column ordering rules
> are far from being obvious, and I didn't find those in the documentation (note
> that I don't know if that something actually described in the SQL standard).
> So for instance, if a join is using an explicit USING clause rather than an ON
> clause, the merged columns are expanded first, so:
>
> SELECT * FROM ab ab1 JOIN ab ab2 USING (b)
>
> should unexpectedly expand to (b, a, a).  Is this order a strict requirement?

I dunno, but I can't see why it creates a problem for this patch to
maintain the current behavior. I mean, just use the logical column
position instead of the physical one here and forget about the details
of how it works beyond that.

> Another problem (that probably wouldn't be a problem for system catalogs) is
> that defaults are evaluated in the physical position.  This example from the
> regression test will clearly have a different behavior if the columns are in a
> different physical order:
>
>  CREATE TABLE INSERT_TBL (
> x INT DEFAULT nextval('insert_seq'),
> y TEXT DEFAULT '-NULL-',
> z INT DEFAULT -1 * currval('insert_seq'),
> CONSTRAINT INSERT_TBL_CON CHECK (x >= 3 AND y <> 'check failed' AND x 
> < 8),
> CHECK (x + z = 0));
>
> But changing the behavior to rely on the logical position seems quite
> dangerous.

Why?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: NAMEDATALEN increase because of non-latin languages

2022-06-23 Thread Andres Freund
Hi,

On 2022-06-03 13:28:16 +0700, John Naylor wrote:
> 1. That would require putting the name physically closer to the end of
> the column list. To make this less annoying for users, we'd need to
> separate physical order from display order (at least/at first only for
> system catalogs).

FWIW, it's not at all clear to me that this would be required. If I were to
tackle this I'd look at breaking up the mirroring of C structs to catalog
struct layout, by having genbki (or such) generate functions to map to/from
tuple layout. It'll be an annoying amount of changes, but feasible, I think.


> This would require:
> 
> - changing star expansion in SELECTs (expandRTE etc)
> - adjusting pg_dump, \d, etc
> 
> That much seems clear and agreed upon.

FWIW, I don't agree that this is a reasonable way to tackle changing
NAMEDATALEN. It'd be nice to have, but to me it seems a pretty small
fraction of the problem of making Names variable length. You'll still have all
the problems of name fields being accessed all over, but now not being
included in the part of the struct visible to C etc.

Greetings,

Andres Freund




Re: Use fadvise in wal replay

2022-06-23 Thread Justin Pryzby
On Thu, Jun 23, 2022 at 09:49:31AM +, Jakub Wartak wrote:
> it would be nice if we had some advanced/deep/hidden parameters, but there 
> is no such thing.

There's DEVELOPER_OPTIONS gucs, although I don't know if this is a good fit for
that.

-- 
Justin




Re: WIP Patch: Add a function that returns binary JSONB as a bytea

2022-06-23 Thread Andres Freund
Hi,

On 2022-06-23 13:33:24 +0200, Jelte Fennema wrote:
> (reviving an old thread)
> 
> On Thu, 23 Jun 2022 at 13:29, Merlin Moncure  wrote:
> > I'll still stand other point I made though; I'd
> > really want to see some benchmarks demonstrating benefit over
> > competing approaches that work over the current formats.  That should
> > frame the argument as to whether this is a good idea.
> 
> I tried to use COPY BINARY to copy data recently from one Postgres
> server to another and it was much slower than I expected. The backend
> process on the receiving side was using close to 100% of a CPU core.
> So the COPY command was clearly CPU bound in this case. After doing a
> profile it became clear that 50% of the CPU time was spent on parsing
> JSON. This seems extremely excessive to me.

It looks like there's quite a bit of low-hanging fruit to optimize...


> I'm pretty sure any semi-decent binary format would be able to outperform
> this.

Sure. It's a decent amount of work to define one though... It's clearly not
acceptable to just dump out the internal representation, as already discussed
in this thread.

Greetings,

Andres Freund




Re: Skipping logical replication transactions on subscriber side

2022-06-23 Thread Robert Haas
On Wed, Jun 22, 2022 at 10:48 PM Noah Misch  wrote:
> Here's a more-verbose description of (2), with additions about what it does
> and doesn't achieve:
>
> 2. On systems where double alignment differs from int64 alignment, require
>NAMEDATALEN%8==0.  Modify the test from commits 79b716c and c1da0ac to stop
>treating "name" fields specially.  The test will still fail for AIX
>compatibility violations, but "name" columns no longer limit your field
>position candidates like they do today (today == option (1)).  Upgrading to
>v16 would require dump/reload for AIX users changing NAMEDATALEN to conform
>to the new restriction.  (I'm not sure pg_upgrade checks NAMEDATALEN
>compatibility, but it should require at least one of: same NAMEDATALEN, or
>absence of "name" columns in user tables.)

Doing this much seems pretty close to free to me. I doubt anyone
really cares about using a NAMEDATALEN value that is not a multiple of
8 on any platform. I also think there are few people who care about
AIX. The intersection must be very small indeed, or so I would think.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Pluggable toaster

2022-06-23 Thread Nikita Malakhov
Hi,
Alexander, thank you for your feedback and willingness to help. You can
send a suggested fixup in this thread, I'll check the issue
you've mentioned.

Best regards,
Nikita Malakhov

On Thu, Jun 23, 2022 at 4:38 PM Aleksander Alekseev <
aleksan...@timescale.com> wrote:

> Hi Nikita,
>
> > We're currently working on rebase along other TOAST improvements, hope
> to do it in time for July CF.
> > Thank you for your patience.
>
> Just to clarify, does it include the dependent "CREATE TABLE ( ..
> STORAGE .. )" patch [1]? I was considering changing the patch
> according to the feedback it got, but if you are already working on
> this I'm not going to interfere.
>
> [1]: https://postgr.es/m/de83407a-ae3d-a8e1-a788-920eb334f25b%40sigaev.ru
> --
> Best regards,
> Aleksander Alekseev


Re: some aspects of our qsort might not be ideal

2022-06-23 Thread Robert Haas
On Thu, Jun 23, 2022 at 6:13 AM John Naylor
 wrote:
> Here is a *rough* first pass at dual-pivot quicksort. I haven't
> changed the regression tests to adjust for qsort being an unstable
> sort, ...

Hmm. I thought we had some reasons for preferring a stable sort algorithm.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: O(n) tasks cause lengthy startups and checkpoints

2022-06-23 Thread Robert Haas
On Thu, Jun 23, 2022 at 7:58 AM Simon Riggs
 wrote:
> Having a central cleanup process makes a lot of sense. There is a long
> list of potential tasks for such a process. My understanding is that
> autovacuum already has an interface for handling additional workload
> types, which is how BRIN indexes are handled. Do we really need a new
> process?

It seems to me that if there's a long list of possible tasks for such
a process, that's actually a trickier situation than if there were
only one or two, because it may happen that when task X is really
urgent, the process is already busy with task Y.

I don't think that piggybacking more stuff onto autovacuum is a very
good idea for this exact reason. We already know that autovacuum
workers can get so busy that they can't keep up with the need to
vacuum and analyze tables. If we give them more things to do, that
figures to make it worse, at least on busy systems.

I do agree that a general mechanism for getting cleanup tasks done in
the background could be a useful thing to have, but I feel like it's
hard to see exactly how to make it work well. We can't just allow it
to spin up a million new processes, but at the same time, if it can't
guarantee that time-critical tasks get performed relatively quickly,
it's pretty worthless.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Pluggable toaster

2022-06-23 Thread Aleksander Alekseev
Hi Nikita,

> We're currently working on rebase along other TOAST improvements, hope to do 
> it in time for July CF.
> Thank you for your patience.

Just to clarify, does it include the dependent "CREATE TABLE ( ..
STORAGE .. )" patch [1]? I was considering changing the patch
according to the feedback it got, but if you are already working on
this I'm not going to interfere.

[1]: https://postgr.es/m/de83407a-ae3d-a8e1-a788-920eb334f25b%40sigaev.ru
--
Best regards,
Aleksander Alekseev




Re: fix crash with Python 3.11

2022-06-23 Thread Tom Lane
Markus Wanner  writes:
> Actually, the backport of 2e517818f ("Fix SPI's handling of errors") 
> already broke the API for code using SPICleanup, as that function had 
> been removed. Granted, it's not documented, but still exported.

Under what circumstances would it be OK for outside code to call
SPICleanup?

regards, tom lane




Re: Custom tuplesorts for extensions

2022-06-23 Thread Alexander Korotkov
Hi, Maxim!

On Thu, Jun 23, 2022 at 3:12 PM Maxim Orlov  wrote:
> I've reviewed the patchset and noticed some minor issues:
> - extra semicolon in macro (lead to warnings)
> - comparison of var isWorker should be done in different way
>
> Here is an upgraded version of the patchset.

Thank you for fixing this.

> Overall, I consider this patchset useful. Any opinions?

Thank you.

--
Regards,
Alexander Korotkov




Re: Custom tuplesorts for extensions

2022-06-23 Thread Alexander Korotkov
Hi, Pavel!

Thank you for your feedback.

On Thu, Jun 23, 2022 at 2:26 PM Pavel Borisov  wrote:
>> Some PostgreSQL extensions need to sort their pieces of data. Then it
>> worth to re-use our tuplesort. But despite our tuplesort having
>> extensibility, it's hidden inside tuplesort.c. There are at least a
>> couple of examples of how extensions deal with that.
>>
>> 1. RUM table access method: https://github.com/postgrespro/rum
>> RUM repository contains a copy of tuplesort.c for each major
>> PostgreSQL release. A reliable solution, but this is not how things
>> are intended to work, right?
>> 2. OrioleDB table access method: https://github.com/orioledb/orioledb
>> OrioleDB runs on patches PostgreSQL. It contains a patch, which just
>> exposes all the guts of tuplesort.c to the tuplesort.h
>> https://github.com/orioledb/postgres/commit/d42755f52c
>>
>> I think we need a proper way to let extension re-use our core
>> tuplesort facility. The attached patchset is intended to do this the
>> right way. Patches don't revise all the comments and lack code
>> beautification. The intention behind publishing this revision is to
>> verify the direction and get some feedback for further work.
>
>
> I still have one doubt about the thing: the compatibility with previous PG 
> versions requires me to support code paths that I already added into RUM 
> extension. I won't be able to drop it from extension for quite long time in 
> the future. It could be avoided if we  backpatch this, which seems doubtful 
> to me provided the volume of code changes.
>
> If we just change this thing since say v16 this will only help to extensions 
> that doesn't support earlier PG versions. I still consider the change 
> beneficial but wonder do you have some view on how should it be managed in 
> existing extensions to benefit them?

I don't think there is a way to help extensions with earlier PG
versions. This is a significant patchset, which shouldn't be subject
to backpatching. The existing extensions will benefit from simplified
maintenance for PG 16+ releases. I think that's all we can do.

--
Regards,
Alexander Korotkov




Re: Support logical replication of DDLs

2022-06-23 Thread Alvaro Herrera
On 2022-Jun-15, houzj.f...@fujitsu.com wrote:

> On Wednesday, June 15, 2022 8:14 AM Zheng Li  wrote:

> > How does the deparser deparses CREATE FUNCTION STATEMENT? Will it
> > schema qualify
> > objects inside the function definition?
> 
> The current deparser doesn't schema qualify objects inside the function
> source as we won't know the schema of inner objects until the function is
> executed. The deparser will only schema qualify the objects around
> function declaration Like:
> 
> CREATE FUNCTION [public].test_func(i [pg_catalog].int4 ) RETURNS  
> [pg_catalog].int4 LANGUAGE plpgsql

Right, this is by design.  There is no way to deparse a function body --
as far as the backend is concerned, the body is just an opaque string.
That string is to be interpreted by the language handler only.

I don't know if it's possible to do this differently for non-core PLs, but I do
not think we have to worry about them in the Postgres implementation.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"La victoria es para quien se atreve a estar solo"




Re: WIP Patch: Add a function that returns binary JSONB as a bytea

2022-06-23 Thread Jelte Fennema
(attached is the flamegraph of the profile, in case others are interested)

On Thu, 23 Jun 2022 at 13:33, Jelte Fennema  wrote:
>
> (reviving an old thread)
>
> On Thu, 23 Jun 2022 at 13:29, Merlin Moncure  wrote:
> > I'll still stand other point I made though; I'd
> > really want to see some benchmarks demonstrating benefit over
> > competing approaches that work over the current formats.  That should
> > frame the argument as to whether this is a good idea.
>
> I tried to use COPY BINARY to copy data recently from one Postgres
> server to another and it was much slower than I expected. The backend
> process on the receiving side was using close to 100% of a CPU core.
> So the COPY command was clearly CPU bound in this case. After doing a
> profile it became clear that 50% of the CPU time was spent on parsing
> JSON. This seems extremely excessive to me. I'm pretty sure any
> semi-decent binary format would be able to outperform this.
>
> FYI: The table being copied contained large JSONB blobs in one of the
> columns. These blobs were around 15kB for each row.


Re: WIP Patch: Add a function that returns binary JSONB as a bytea

2022-06-23 Thread Jelte Fennema
(reviving an old thread)

On Thu, 23 Jun 2022 at 13:29, Merlin Moncure  wrote:
> I'll still stand other point I made though; I'd
> really want to see some benchmarks demonstrating benefit over
> competing approaches that work over the current formats.  That should
> frame the argument as to whether this is a good idea.

I tried to use COPY BINARY to copy data recently from one Postgres
server to another and it was much slower than I expected. The backend
process on the receiving side was using close to 100% of a CPU core.
So the COPY command was clearly CPU bound in this case. After doing a
profile it became clear that 50% of the CPU time was spent on parsing
JSON. This seems extremely excessive to me. I'm pretty sure any
semi-decent binary format would be able to outperform this.

FYI: The table being copied contained large JSONB blobs in one of the
columns. These blobs were around 15kB for each row.




Query about free Volunteer Development for a PostgreSQL extension development.

2022-06-23 Thread sminervini.prism
I have been trying to find active interest for volunteer development of a GPL 
license style PostgreSQL extension (or default PostgreSQL additions), complete 
and available, on Sourceforge, PGXN and the public internet, that will support 
the following:

High Precision Arithmetic and Elementary Functions Support
In PostgreSQL v14 or beyond.
HPPM: High Precision PostgreSQL Mathematics.

-The introduction of support for Integer (Z) and Rational Mixed Decimal (Q) 
numbers in 64-bit PostgreSQL, via two new types, HPZ and HPQ. In this 
specification they are collectively referred to as HPX types. The two type 
names can be spelled in capital letters or lower case. They are:

HPZ, HPQ or hpz, hpq.
HPZ(n), HPQ(n), hpz(n), or hpq(n).

HPX types can be declared like TEXT (with no associated value) or with an 
associated value, like varchar(n). If HPX variables are declared with no 
associated value, the associated value for that variable defaults to 20.

There should be range and multirange types corresponding to both HPX types:

HPZRANGE, hpzrange, HPZMULTIRANGE, hpzmultirange.
HPQRANGE, hpqrange, HPQMULTIRANGE, hpqmultirange.

-The extension could be based on another 3rd party library, and will be written 
in C, in either case. There is already some support for this kind of 
mathematics, in terms of its logic, and optimisation, publicly available, in C, 
as Free Open Source Software. That can be researched and apprehended for this 
extension and for all the relevant PostgreSQL OS platforms that it (HPPM) is 
designed for.

-Real numbers are comprised of the values of Integer, non-recurring Rational 
Numbers and recurring, and/or Irrational Numbers. Recurring numbers can be 
appropriately truncated, ultimately via another positive Integer value, always 
at least 1, to obtain a limited and approximating value. The approximating 
value can really be seen as a finite Rational number, possibly with Integer or 
Decimal parts, or both. These numbers may be positive or negative, or zero, 
scalar values, and always do exist on the one dimensional number line. These 
numbers may be positive, negative, or zero exactly.

-HPX 'associated values' are really a count of relevant positive Integer 
figures (Precision) that is stored with each HPX type variable or column 
type. These are specified at type or variable declaration, before further use, 
or the default precision of 20 is applied, as already shown.

Precision can be accessed and changed by means of coded values being changed. 
Precision is always apprehended before calculation begins. Precision is used to 
control number operations, and value output, when any numeric manipulation or 
processing occurs, since things that may go towards an infinity need to be 
stopped before then, to be useful.

If an HPX value is data on its own, without any set precision, it has the 
corresponding precision amount. If it is inserted into a table column with a 
different precision, then that precision is applied, be it larger, equal, or 
less, resulting in no behaviour and an error being thrown.

If an HPX value, in a PostgreSQL code expression, is sent straight into a 
RETURN statement or a SELECT statement, without assignment or precision 
alteration, or is just specified in a comparison expression, then that datum 
will contain the highest precision value out of any of the others in its 
expression, by checking the largest one found as the expression is considered, 
from left to right, within its sub expression.

If such a precision value cannot be arrived at, since nothing has been 
specified, because the value is irrational or infinitely recurring, then the 
default precision value used for truncation will, again, be 20.
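
Purely for illustration (these types do not exist), the precision rules above
would play out something like this in PL/pgSQL:

    DO $$
    DECLARE
        a hpq(30) := 1;
        b hpq(10) := 3;
    BEGIN
        RAISE NOTICE '%', a / b;   -- widest operand wins: 30 digits of precision
        RAISE NOTICE '%', pi();    -- no precision specified anywhere: default of 20
    END $$;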

-This whole system will uphold any precision, certainly ones within a very 
large range limit, controlled by the already available type for large positive 
integers, the BIGINT. It can thereby enumerate digits within the range of
(+/-)1 to (+/-)9,223,372,036,854,775,807. This is at least between one and 
positive nine quintillion digit places. More than enough for the speed and scope 
of today, or maybe tomorrow, and the Desktop PC, as either a client or a server.

Naturally, evaluation will slow down, or not conclude in useful time frames, 
before those limits, presently. That phenomenon can be allowed, and left to the 
programmer to deal with or curtail.

A TEXT variable or table column, or even another HPX or numeric type (if the 
data of the value is in range) can be used to store digit data alone. Mixed 
Decimals can be broken down into integers and dealt with from there using 
operators, and reassembled into mixed integers again, if absolutely necessary, 
although this will be slower and inefficient internally.

--At the point of PostgreSQL code input and execution:

select pi(1001) as pi;

--HPX types can be declared like TEXT or
--like varchar(n),
--Within a table creation command:

create table example_table
(
id 

Re: doc: Clarify Savepoint Behavior

2022-06-23 Thread Simon Riggs
On Thu, 9 Jun 2022 at 16:41, David G. Johnston
 wrote:
>
> On Thu, Jun 9, 2022 at 8:36 AM David G. Johnston  
> wrote:
>>
>> Hi,
>>
>> Reposting this on its own thread.
>>
>> https://www.postgresql.org/message-id/flat/CAKFQuwby1aMsJDMeibaBaohgoaZhivAo4WcqHC1%3D9-GDZ3TSng%40mail.gmail.com
>>
>> Presently, the open item seems to be whether my novelty regarding the 
>> reworked example is too much.
>>
>
> Commentary:
>
> Per documentation comment the savepoint command lacks an example
> where the savepoint name is reused.  The suggested example didn't
> conform to the others on the page, nor did the suggested location
> in compatibility seem desirable, but the omission rang true. Add
> another example to the examples section demonstrating this case.
> Additionally, document under the description for savepoint_name
> that we allow for the name to be repeated - and note what that
> means in terms of release and rollback. It seems desirable to
> place this comment in description rather than notes for savepoint.
> For the other two commands the behavior in the presence of
> duplicate savepoint names best fits as notes.  In fact release
> already had one.  This commit copies the same verbiage over to
> rollback.

Good idea.

"The name to give to the new savepoint.  The name may already exist,
+  in which case a rollback or release to the same name will use the
+  one that was most recently defined."

instead I propose:

"The name to give to the new savepoint. If the name duplicates a
 previously defined savepoint name then only the latest savepoint with that name
 can be referenced in a later ROLLBACK TO SAVEPOINT."

+  
+   If multiple savepoints have the same name, only the one that was most
+   recently defined is released.
+  

instead I propose

"Searches backwards through previously defined savepoints until the
 a savepoint name matches the request. If the savepoint name duplicated earlier
 defined savepoints then those earlier savepoints can only be released if
 multiple ROLLBACK TO SAVEPOINT commands are issued with the same
 name, as shown in the following example."

Also, I would just call the savepoint "s" in the example, to declutter it.
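
For instance, a compact version of the reworked example, using "s", could be:

    BEGIN;
    INSERT INTO t VALUES (1);
    SAVEPOINT s;
    INSERT INTO t VALUES (2);
    SAVEPOINT s;                 -- same name again; masks the first "s"
    INSERT INTO t VALUES (3);
    ROLLBACK TO SAVEPOINT s;     -- undoes only the insert of 3
    RELEASE SAVEPOINT s;         -- discards the newer "s"
    ROLLBACK TO SAVEPOINT s;     -- now refers to the older "s"; undoes the insert of 2
    COMMIT;                      -- only the row with value 1 is committed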

-- 
Simon Riggs    http://www.EnterpriseDB.com/




TOAST - moving Compression into Toast for oversized Tuples

2022-06-23 Thread Nikita Malakhov
Hi hackers,
Along with the major TOAST improvement (Pluggable TOAST) we see a really
important smaller one - moving compression of oversized tuples into TOAST
itself, because TOAST is meant to deal with how an oversized tuple is stored,
and it is logical to make it responsible for compression too.
Currently compression is done in heap_toast_insert_or_update() (file
heaptoast.c) before the attribute is TOASTed; we suggest that the TOAST code
should decide whether the attribute must be compressed before it is stored
externally. This also allows us to make Toasters completely responsible for
TOASTed data storage - how and where the data are stored.
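
For illustration only (toast_store_external and the hook shape are made-up
names here, not the actual API), the idea is roughly:

    /*
     * Sketch: the toaster, rather than heap_toast_insert_or_update(), decides
     * whether a value gets compressed before it is stored out of line.
     * Simplified: ignores the case where compression turns out not worthwhile.
     */
    static Datum
    example_toast(Relation rel, Form_pg_attribute att, Datum value)
    {
        if (att->attstorage == TYPSTORAGE_EXTENDED &&
            VARSIZE_ANY(DatumGetPointer(value)) > TOAST_TUPLE_THRESHOLD)
            value = toast_compress_datum(value, att->attcompression);

        return toast_store_external(rel, value);    /* hypothetical helper */
    }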
Any advice or suggestion would be welcome.

--
Best regards,
Nikita Malakhov.


Re: [PATCH] Completed unaccent dictionary with many missing characters

2022-06-23 Thread Przemysław Sztoch

Michael Paquier wrote on 23.06.2022 06:39:

That'd leave just DEGREE CELSIUS and DEGREE FAHRENHEIT.  Not sure how
to kill those last two special cases -- they should be directly
replaced by their decomposition.

[1] https://unicode-org.atlassian.net/browse/CLDR-11383

In patch v3, support for Cyrillic is added.
The special-character function has been purged.
Added support for category So (Other Symbol); this category includes
characters from special_cases().

I think that we'd better split v3 into more patches to keep each
improvement isolated.  The addition of cyrillic characters in the
range of letters and the removal of the sound copyright from the
special cases can be done on their own, before considering the
original case tackled by this thread.
--
Michael

The only division that is probably possible is the one attached.

--
Przemysław Sztoch | Mobile +48 509 99 00 66
commit 84bb8cdfeda1d4bfef813870d74a6db6d595dbc0
Author: Przemyslaw Sztoch 
Date:   Thu Jun 23 13:56:09 2022 +0200

Update unaccent rules generator

add cyrillic alpha
add digits
set->dict

diff --git a/contrib/unaccent/generate_unaccent_rules.py 
b/contrib/unaccent/generate_unaccent_rules.py
index c405e231b3..71932c8224 100644
--- a/contrib/unaccent/generate_unaccent_rules.py
+++ b/contrib/unaccent/generate_unaccent_rules.py
@@ -35,13 +35,16 @@ import xml.etree.ElementTree as ET
 sys.stdout = codecs.getwriter('utf8')(sys.stdout.buffer)
 
 # The ranges of Unicode characters that we consider to be "plain letters".
-# For now we are being conservative by including only Latin and Greek.  This
-# could be extended in future based on feedback from people with relevant
-# language knowledge.
+# For now we are being conservative by including only Latin, Greek and 
Cyrillic.
+# This could be extended in future based on feedback from people with
+# relevant language knowledge.
 PLAIN_LETTER_RANGES = ((ord('a'), ord('z')),  # Latin lower case
(ord('A'), ord('Z')),  # Latin upper case
-   (0x03b1, 0x03c9),  # GREEK SMALL LETTER ALPHA, 
GREEK SMALL LETTER OMEGA
-   (0x0391, 0x03a9))  # GREEK CAPITAL LETTER ALPHA, 
GREEK CAPITAL LETTER OMEGA
+   (ord('0'), ord('9')),  # Digits
+   (0x0391, 0x03a9),  # Greek capital letters 
(ALPHA-OMEGA)
+   (0x03b1, 0x03c9),  # Greek small letters 
(ALPHA-OMEGA)
+   (0x0410, 0x044f),  # Cyrillic capital and small 
letters
+   (0x00b0, 0x00b0))  # Degree sign
 
 # Combining marks follow a "base" character, and result in a composite
 # character. Example: "U&'A\0300'" produces "À". There are three types of
@@ -139,24 +142,24 @@ def get_plain_letter(codepoint, table):
 return codepoint
 
 # Should not come here
-assert(False)
+assert False, 'Codepoint U+%0.2X' % codepoint.id
 
 
 def is_ligature(codepoint, table):
 """Return true for letters combined with letters."""
-return all(is_letter(table[i], table) for i in codepoint.combining_ids)
+return all(i in table and is_letter(table[i], table) for i in 
codepoint.combining_ids)
 
 
 def get_plain_letters(codepoint, table):
 """Return a list of plain letters from a ligature."""
-assert(is_ligature(codepoint, table))
+assert is_ligature(codepoint, table), 'Codepoint U+%0.2X' % codepoint.id
 return [get_plain_letter(table[id], table) for id in 
codepoint.combining_ids]
 
 
 def parse_cldr_latin_ascii_transliterator(latinAsciiFilePath):
 """Parse the XML file and return a set of tuples (src, trg), where "src"
 is the original character and "trg" the substitute."""
-charactersSet = set()
+charactersDict = {}
 
 # RegEx to parse rules
 rulePattern = re.compile(r'^(?:(.)|(\\u[0-9a-fA-F]{4})) \u2192 
(?:\'(.+)\'|(.+)) ;')
@@ -196,25 +199,19 @@ def 
parse_cldr_latin_ascii_transliterator(latinAsciiFilePath):
 # the parser of unaccent only accepts non-whitespace characters
 # for "src" and "trg" (see unaccent.c)
 if not src.isspace() and not trg.isspace():
-charactersSet.add((ord(src), trg))
+charactersDict[ord(src)] = trg
 
-return charactersSet
+return charactersDict
 
 
 def special_cases():
 """Returns the special cases which are not handled by other methods"""
-charactersSet = set()
+charactersDict = {}
 
-# Cyrillic
-charactersSet.add((0x0401, "\u0415"))  # CYRILLIC CAPITAL LETTER IO
-charactersSet.add((0x0451, "\u0435"))  # CYRILLIC SMALL LETTER IO
+charactersDict[0x2103] = "\xb0C"   # DEGREE CELSIUS
+charactersDict[0x2109] = "\xb0F"   # DEGREE FAHRENHEIT
 
-# Symbols of "Letterlike Symbols" Unicode Block (U+2100 to U+214F)
-charactersSet.add((0x2103, "\xb0C"))   # DEGREE CELSIUS
-charactersSet.add((0x2109, "\xb0F"))   # DEGREE FAHRENHEIT
-charactersSet.add((0x2117, "(P)")) # 

Re: Replica Identity check of partition table on subscriber

2022-06-23 Thread Amit Kapila
On Wed, Jun 22, 2022 at 5:05 PM houzj.f...@fujitsu.com
 wrote:
>
> On Wednesday, June 22, 2022 7:06 PM Amit Kapila  
> wrote:
> >
> > On Wed, Jun 22, 2022 at 10:09 AM Amit Langote 
> > wrote:
> > >
> > > On Wed, Jun 22, 2022 at 12:02 PM houzj.f...@fujitsu.com
> > >  wrote:
> > > > Since the patch has been committed. Attach the last patch to fix the
> > memory leak.
> > > >
> > > > The bug exists on PG10 ~ PG15(HEAD).
> > > >
> > > > For HEAD,PG14,PG13, to fix the memory leak, I think we should use
> > > > free_attrmap instead of pfree and release the no-longer-useful
> > > > attrmap When rebuilding the map info.
> > > >
> > > > For PG12,PG11,PG10, we only need to add the code to release the
> > > > no-longer-useful attrmap when rebuilding the map info. We can still
> > > > use
> > > > pfree() because the attrmap in back-branch is a single array like:
> > > >
> > > > entry->attrmap = palloc(desc->natts * sizeof(AttrNumber));
> > >
> > > LGTM, thank you.
> > >
> >
> > LGTM as well. One thing I am not completely sure about is whether to make 
> > this
> > change in PG10 for which the final release is in Nov?
> > AFAICS, the leak can only occur after the relcache invalidation on the 
> > subscriber
> > which may or may not be a very frequent case. What do you guys think?
> >
> > Personally, I feel it is good to fix it in all branches including PG10.
> +1
>

Pushed!

-- 
With Regards,
Amit Kapila.




Re: O(n) tasks cause lengthy startups and checkpoints

2022-06-23 Thread Simon Riggs
On Fri, 18 Feb 2022 at 20:51, Nathan Bossart  wrote:
>
> > On Thu, Feb 17, 2022 at 03:12:47PM -0800, Andres Freund wrote:
> >>> > The improvements around deleting temporary files and serialized 
> >>> > snapshots
> >>> > afaict don't require a dedicated process - they're only relevant during
> >>> > startup. We could use the approach of renaming the directory out of the 
> >>> > way as
> >>> > done in this patchset but perform the cleanup in the startup process 
> >>> > after
> >>> > we're up.
>
> BTW I know you don't like the dedicated process approach, but one
> improvement to that approach could be to shut down the custodian process
> when it has nothing to do.

Having a central cleanup process makes a lot of sense. There is a long
list of potential tasks for such a process. My understanding is that
autovacuum already has an interface for handling additional workload
types, which is how BRIN indexes are handled. Do we really need a new
process? If so, let's do this now.

Nathan's point that certain tasks are blocking fast startup is a good
one and higher availability is a critical end goal. The thought that
we should complete these tasks during checkpoint is a good one, but
checkpoints should NOT be delayed by long running tasks that are
secondary to availability.

Andres' point that it would be better to avoid long running tasks is
good, if that is possible. That can be done better over time. This
point does not block the higher level goal of better availability
asap, so I support Nathan's overall proposals.

-- 
Simon Riggs    http://www.EnterpriseDB.com/




Re: Pluggable toaster

2022-06-23 Thread Nikita Malakhov
Hi hackers,
We're currently working on rebase along other TOAST improvements, hope to
do it in time for July CF.
Thank you for your patience.

--
Best regards,
Nikita Malakhov

On Fri, Jun 17, 2022 at 5:33 PM Aleksander Alekseev <
aleksan...@timescale.com> wrote:

> Hi hackers,
>
> > For a pluggable toaster - in previous patch set part 7 patch file
> contains invalid string.
> > Fixup (v2 file should used instead of previous) patch:
> > 7) 0007_fix_alignment_of_custom_toast_pointers.patch - fixes custom
> toast pointer's
> > alignment required by bytea toaster by Nikita Glukhov;
>
> I finished digesting the thread and the referred presentations per
> Matthias (cc:'ed) suggestion in [1] discussion. Although the patchset
> got a fair amount of push-back above, I prefer to stay open minded and
> invest some of my time into this effort as a tester/reviewer during
> the July CF. Even if the patchset will not make it entirely to the
> core, some of its parts can be useful.
>
> Unfortunately, I didn't manage to find something that can be applied
> and tested. cfbot is currently not happy with the patchset.
> 0001_create_table_storage_v3.patch doesn't apply to the current
> origin/master manually either:
>
> ```
> error: patch failed: src/backend/parser/gram.y:2318
> error: src/backend/parser/gram.y: patch does not apply
> ```
>
> Any chance we can see a rebased patchset for the July CF?
>
> [1]: https://commitfest.postgresql.org/38/3626/
>
> --
> Best regards,
> Aleksander Alekseev
>


Fix instability in subscription regression test

2022-06-23 Thread houzj.f...@fujitsu.com
Hi,

I noticed BF member wrasse failed in 028_row_filter.pl.

#   Failed test 'check publish_via_partition_root behavior'
#   at t/028_row_filter.pl line 669.
#  got: ''
# expected: '1|100
#  ...

Log:
2022-06-23 11:27:42.387 CEST [20589:3] 028_row_filter.pl LOG: statement: ALTER 
SUBSCRIPTION tap_sub REFRESH PUBLICATION WITH (copy_data = true)
2022-06-23 11:27:42.470 CEST [20589:4] 028_row_filter.pl LOG: disconnection: 
session time: 0:00:00.098 user=nm database=postgres host=[local]
2022-06-23 11:27:42.611 CEST [20593:1] LOG: logical replication table 
synchronization worker for subscription "tap_sub", table 
"tab_rowfilter_partitioned" has started
...
2022-06-23 11:27:43.197 CEST [20610:3] 028_row_filter.pl LOG: statement: SELECT 
a, b FROM tab_rowfilter_partitioned ORDER BY 1, 2
...
2022-06-23 11:27:43.689 CEST [20593:2] LOG: logical replication table 
synchronization worker for subscription "tap_sub", table 
"tab_rowfilter_partitioned" has finished 

From the log, I can see that it queries the target table before the table sync
is over. So, I think the reason is that we didn't wait for the table sync to
finish after refreshing the publication. Sorry for not catching that
earlier. Here is a patch to fix it.


Best regards,
Hou zj


0001-Fix-instability-in-subscription-regression-test.patch
Description:  0001-Fix-instability-in-subscription-regression-test.patch


Re: Custom tuplesorts for extensions

2022-06-23 Thread Pavel Borisov
>
> Some PostgreSQL extensions need to sort their pieces of data. Then it
> worth to re-use our tuplesort. But despite our tuplesort having
> extensibility, it's hidden inside tuplesort.c. There are at least a
> couple of examples of how extensions deal with that.
>
> 1. RUM table access method: https://github.com/postgrespro/rum
> RUM repository contains a copy of tuplesort.c for each major
> PostgreSQL release. A reliable solution, but this is not how things
> are intended to work, right?
> 2. OrioleDB table access method: https://github.com/orioledb/orioledb
> OrioleDB runs on patches PostgreSQL. It contains a patch, which just
> exposes all the guts of tuplesort.c to the tuplesort.h
> https://github.com/orioledb/postgres/commit/d42755f52c
>
> I think we need a proper way to let extension re-use our core
> tuplesort facility. The attached patchset is intended to do this the
> right way. Patches don't revise all the comments and lack code
> beautification. The intention behind publishing this revision is to
> verify the direction and get some feedback for further work.
>

I still have one doubt about this: compatibility with previous PG versions
requires me to keep supporting the code paths I already added to the RUM
extension. I won't be able to drop them from the extension for quite a long
time. That could be avoided if we backpatched this, which seems doubtful to me
given the volume of code changes.

If we only make this change starting from, say, v16, it will only help
extensions that don't support earlier PG versions. I still consider the change
beneficial, but I wonder whether you have a view on how it should be managed
in existing extensions so that they benefit too?

-- 
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com 


Re: some aspects of our qsort might not be ideal

2022-06-23 Thread John Naylor
On Fri, Jun 3, 2022 at 10:44 AM Peter Geoghegan  wrote:

> What about dual-pivot quicksort, which is used in Java 7+? That is the
> defacto successor to Bentley & McIlroy. In fact, Jon Bentley himself
> collaborated with its author, and provided benchmarking input. The
> underlying philosophy is essentially the same as the original -- it
> is supposed to be an "industrial strength" quicksort, with various
> adversarial cases considered directly.

Here is a *rough* first pass at dual-pivot quicksort. I haven't
changed the regression tests to adjust for qsort being an unstable
sort, and there is some dead code. I looked to a couple JDKs for
examples of design decisions as a first approximation. It includes a
module with a few debugging printouts, run like so: select
debug_qsort_int(array[7,6,5,4,3,2,1]);

Pivot selection: A common way is to pick five candidates (here: mid,
+/- 1/7, +/- 2/7), sort them in place, then pick the 2nd and 4th of
them, or "tertile of five". If they are all evenly spaced, we can do
insertion sort.

Fall back to single-pivot: If any of the pivot candidates are equal,
JDK assumes there could be a large number of duplicates and falls back
to single-pivot, using the median of the five. I've done the same
here. Despite having two code paths in all of our template instances,
the VM binary is only about 5kb bigger, since MED3 is no longer built.
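
To make the selection concrete, here is a minimal sketch of the idea on a
plain array of ints (illustrative only; the real template sorts the five
candidate elements in place in the array and works on arbitrary element
types):

#include <stdbool.h>
#include <stddef.h>

/* Tiny insertion sort used only for the five pivot candidates. */
static void
sort5(int *a)
{
    for (int i = 1; i < 5; i++)
        for (int j = i; j > 0 && a[j] < a[j - 1]; j--)
        {
            int     tmp = a[j];

            a[j] = a[j - 1];
            a[j - 1] = tmp;
        }
}

/*
 * Pick two pivots from five evenly spaced candidates ("tertile of five").
 * Returns false if any candidates compare equal, in which case the caller
 * should fall back to single-pivot quicksort using the median candidate.
 * Assumes n is above the insertion-sort threshold, so the offsets are valid.
 */
static bool
choose_dual_pivots(const int *a, size_t n, int *pivot1, int *pivot2)
{
    size_t  seventh = n / 7;
    size_t  mid = n / 2;
    int     cand[5];

    cand[0] = a[mid - 2 * seventh];
    cand[1] = a[mid - seventh];
    cand[2] = a[mid];
    cand[3] = a[mid + seventh];
    cand[4] = a[mid + 2 * seventh];

    sort5(cand);

    for (int i = 1; i < 5; i++)
        if (cand[i] == cand[i - 1])
            return false;       /* likely many duplicates */

    *pivot1 = cand[1];          /* 2nd of five */
    *pivot2 = cand[3];          /* 4th of five */
    return true;
}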

Performance testing: Not started yet. I'm thinking an apples-to-apples
comparison is not insightful enough, since the current insertion sort
threshold 7 is already a bit constrained for single-pivot, and would
be even more so for dual pivot, especially since 5 of the elements are
pre-sorted to choose the pivots. My plan is to retest the threshold on
HEAD using my latest tests, then use that as a starting point to test
thresholds with dual-pivot.

-- 
John Naylor
EDB: http://www.enterprisedb.com
From 5e920d9d3e8d2a2a75e63ade8bc73c8322c1934b Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Mon, 30 May 2022 10:09:17 +0700
Subject: [PATCH v1 1/2] Create internal workhorse for ST_SORT

No functional changes.
---
 src/include/lib/sort_template.h | 36 -
 1 file changed, 31 insertions(+), 5 deletions(-)

diff --git a/src/include/lib/sort_template.h b/src/include/lib/sort_template.h
index 3122a93009..54921de568 100644
--- a/src/include/lib/sort_template.h
+++ b/src/include/lib/sort_template.h
@@ -191,6 +191,7 @@ ST_SCOPE void ST_SORT(ST_ELEMENT_TYPE * first, size_t n
 
 /* sort private helper functions */
 #define ST_MED3 ST_MAKE_NAME(ST_SORT, med3)
+#define ST_SORT_INTERNAL ST_MAKE_NAME(ST_SORT, internal)
 #define ST_SWAP ST_MAKE_NAME(ST_SORT, swap)
 #define ST_SWAPN ST_MAKE_NAME(ST_SORT, swapn)
 
@@ -217,7 +218,7 @@ ST_SCOPE void ST_SORT(ST_ELEMENT_TYPE * first, size_t n
 			ST_SORT_INVOKE_COMPARE		\
 			ST_SORT_INVOKE_ARG)
 #define DO_SORT(a_, n_)	\
-	ST_SORT((a_), (n_)	\
+	ST_SORT_INTERNAL((a_), (n_)	\
 			ST_SORT_INVOKE_ELEMENT_SIZE	\
 			ST_SORT_INVOKE_COMPARE		\
 			ST_SORT_INVOKE_ARG)
@@ -273,15 +274,15 @@ ST_SWAPN(ST_POINTER_TYPE * a, ST_POINTER_TYPE * b, size_t n)
 }
 
 /*
- * Sort an array.
+ * Workhorse for ST_SORT
  */
-ST_SCOPE void
-ST_SORT(ST_ELEMENT_TYPE * data, size_t n
+static void
+ST_SORT_INTERNAL(ST_POINTER_TYPE * data, size_t n
 		ST_SORT_PROTO_ELEMENT_SIZE
 		ST_SORT_PROTO_COMPARE
 		ST_SORT_PROTO_ARG)
 {
-	ST_POINTER_TYPE *a = (ST_POINTER_TYPE *) data,
+	ST_POINTER_TYPE *a = data,
 			   *pa,
 			   *pb,
 			   *pc,
@@ -399,6 +400,30 @@ loop:
 		}
 	}
 }
+
+/*
+ * Sort an array.
+ */
+ST_SCOPE void
+ST_SORT(ST_ELEMENT_TYPE * data, size_t n
+		ST_SORT_PROTO_ELEMENT_SIZE
+		ST_SORT_PROTO_COMPARE
+		ST_SORT_PROTO_ARG)
+{
+	ST_POINTER_TYPE *begin = (ST_POINTER_TYPE *) data;
+
+	DO_SORT(begin, n);
+
+#ifdef USE_ASSERT_CHECKING
+	/* WIP: verify the sorting worked */
+	for (ST_POINTER_TYPE *pm = begin + ST_POINTER_STEP; pm < begin + n * ST_POINTER_STEP;
+		 pm += ST_POINTER_STEP)
+	{
+		if (DO_COMPARE(pm, pm - ST_POINTER_STEP) < 0)
+			Assert(false);
+	}
+#endif			/* USE_ASSERT_CHECKING */
+}
 #endif
 
 #undef DO_CHECK_FOR_INTERRUPTS
@@ -422,6 +447,7 @@ loop:
 #undef ST_POINTER_TYPE
 #undef ST_SCOPE
 #undef ST_SORT
+#undef ST_SORT_INTERNAL
 #undef ST_SORT_INVOKE_ARG
 #undef ST_SORT_INVOKE_COMPARE
 #undef ST_SORT_INVOKE_ELEMENT_SIZE
-- 
2.36.1

From ac7f8aaa5b7257fb84f819b27525e00a5a6ced84 Mon Sep 17 00:00:00 2001
From: John Naylor 
Date: Thu, 23 Jun 2022 16:07:10 +0700
Subject: [PATCH v1 2/2] Implement dual-pivot quicksort

Choose pivots by running insertion sort on five candidates and choosing
the 2nd and 4th, ("tertile of five"). If any of the five are equal, we
assume the input has many duplicates and fall back to B since it's
optimized for that case.
---
 src/include/lib/sort_template.h   | 191 +-
 src/test/modules/debug_qsort/Makefile |  18 ++
 .../modules/debug_qsort/debug_qsort--1.0.sql  |   1 +
 

Re: Custom tuplesorts for extensions

2022-06-23 Thread Pavel Borisov
I've bumped into this case in the RUM extension. The need to build it with the
tuplesort changes in different PG versions led me to reluctantly include
different tuplesort.c versions in the extension code. So I totally
support the intention of this patch, and I'm planning to invest some time in
reviewing it.

-- 
Best regards,
Pavel Borisov

Postgres Professional: http://postgrespro.com 


Re: Handle infinite recursion in logical replication setup

2022-06-23 Thread vignesh C
On Thu, Jun 23, 2022 at 7:05 AM shiy.f...@fujitsu.com
 wrote:
>
> On Mon, Jun 20, 2022 7:55 PM vignesh C  wrote:
> >
> > Thanks for the comment, the v22 patch attached has the changes for the
> > same.
>
> Thanks for updating the patch, here are some comments on 0003 patch.
>
> 1. 032_origin.pl
> +###
> +# Join 3rd node (node_C) to the existing 2 nodes(node_A & node_B) 
> bidirectional
> +# replication setup when the existing nodes (node_A & node_B) and the new 
> node
> +# (node_C) does not have any data.
> +###
>
> "does not have any data" should be "do not have any data" I think.

I felt the existing wording is OK; I did not find it grammatically wrong. I
did not make any changes for this.

> 2.
> The comment for 032_origin.pl:
>
> # Test the CREATE SUBSCRIPTION 'origin' parameter.
>
> After applying this patch, this file tests no only 'origin' parameter, but 
> also
> "copy_data" parameter, so should we modify this comment?

Modified

> Besides, should we change the file name in this patch? It looks more like test
> cases for bidirectional logical replication.

I felt we should keep it as origin, since most of the tests are based on the
'origin' parameter; also, later features will be able to filter a
particular origin, and those tests can easily fit in the same file.

> 3. subscriptioncmds.c
> /* Set default values for the boolean supported options. */
> ...
> if (IsSet(supported_opts, SUBOPT_CREATE_SLOT))
> opts->create_slot = true;
> if (IsSet(supported_opts, SUBOPT_COPY_DATA))
> -   opts->copy_data = true;
> +   opts->copy_data = COPY_DATA_ON;
> if (IsSet(supported_opts, SUBOPT_REFRESH))
> opts->refresh = true;
> if (IsSet(supported_opts, SUBOPT_BINARY))
>
> "copy_data" option is not Boolean now, which is inconsistent with the comment.
> So maybe we can change the comment here? ("the boolean supported options" ->
> "the supported options")

Modified

Thanks for the comments, the v23 patch attached at [1] has the changes
for the same.
[1] - 
https://www.postgresql.org/message-id/CALDaNm1-ZrG%3DhaAoiB2yFKYc%2Bckcd1NLaU8QB3SWs32wPsph4w%40mail.gmail.com

Regards,
Vignesh




Re: Support logical replication of DDLs

2022-06-23 Thread Amit Kapila
On Wed, Jun 22, 2022 at 11:09 AM Masahiko Sawada  wrote:
>
> I've attached a WIP patch for adding regression tests for DDL deparse.
> The patch can be applied on
> v9-0001-Functions-to-deparse-DDL-commands.patch Hou recently
> submitted[1]. The basic idea is to define the event trigger to deparse
> DDLs, run the regression tests, load the deparsed DDLs to another
> database cluster, dump both databases, and compare the dumps.
>

Thanks for working on this. It is a good start. I think this will be
helpful for spotting the missing DDL support. Do we want to run this as part
of every regression run? Won't it increase the regression time, as this
seems to internally run the regression tests twice?

Do we need a different trigger to capture drop cases as there are
separate deparsing routines for them, for example
deparse_drop_table()?

> [2] deparsing "ALTER INDEX tbl_idx ALTER COLUMN 2 SET STATISTICS
> 1000;" causes an assertion failure.
>

Sorry, it is not clear to me whether you are talking about some
pre-existing bug or a bug in the proposed patch.

-- 
With Regards,
Amit Kapila.




Re: [PATCH] audo-detect and use -moutline-atomics compilation flag for aarch64

2022-06-23 Thread Alexander Korotkov
Hi!

On Mon, Sep 7, 2020 at 1:12 AM Zidenberg, Tsahi  wrote:
> First, I apologize for taking so long to answer. This e-mail regretfully got 
> lost in my inbox.
>
> On 24/07/2020, 4:17, "Andres Freund"  wrote:
>
> > What does "not significantly affected" exactly mean? Could you post the
> > raw numbers?
>
> The following tests show benchmark behavior on m6g.8xl instance (32-core with 
> LSE support)
> and a1.4xlarge (16-core, no LSE support) with and without the patch, based on 
> postgresql 12.4.
> Tests are pgbench select-only/simple-update, and sysbench read-only/write 
> only.
>
>                     select-only  simple-update  read-only  write-only
> m6g.8xlarge/vanila       482130          56275     273327       33364
> m6g.8xlarge/patch        493748          59681     262702       33024
> a1.4xlarge/vanila         82437          13978      62489        2928
> a1.4xlarge/patch          79499          13932      62796        2945
>
> Results obviously change with OS / parameters / etc. I have attempted to
> ensure a fair comparison,
> but I don't think these numbers should be taken as absolute.
> As reference points, m6g instance compiled with -march=native flag, and m5g 
> (x86) instances:
>
> m6g.8xlarge/native       522771          60354     261366       33582
> m5.8xlarge               362908          58732     147730       32750

I'd like to resurrect this thread, if there is still interest from
your side.  What number of clients and jobs did you use with pgbench?

I've noticed a far more dramatic effect with a high number of clients.
Could you verify that?
https://akorotkov.github.io/blog/2021/04/30/arm/

--
Regards,
Alexander Korotkov




RE: Use fadvise in wal replay

2022-06-23 Thread Jakub Wartak
Hey Andrey,

> > 23 июня 2022 г., в 13:50, Jakub Wartak 
> написал(а):
> >
> > Thoughts?
> The patch leaves the 1st 128KB chunk unprefetched. Is it worth adding an extra
> branch for 120KB after the 1st block when readOff==0?
> Or maybe do
> + posix_fadvise(readFile, readOff + XLOG_BLCKSZ, RACHUNK,
> POSIX_FADV_WILLNEED);
> instead of
> + posix_fadvise(readFile, readOff + RACHUNK, RACHUNK,
> POSIX_FADV_WILLNEED);
> ?

> > Notes:
> > - no GUC, as the default/identical value seems to be the best
> I think adding this performance boost on most systems definitely worth 1 
> syscall
> per 16 pages. And I believe 128KB to be optimal for most storages. And having
> no GUCs sounds great.
> 
> But storage systems might be different, far beyond benchmarks.
> All in all, I don't have strong opinion on having 1 or 0 GUCs to configure 
> this.
> 
> I've added patch to the CF.

Cool. As for a GUC, I'm afraid there's going to be resistance to adding yet
another GUC (to avoid too many knobs). Ideally it would be nice if we had some
advanced/deep/hidden parameters, but there is no such thing.
Maybe another option would be to use (N * maintenance_io_concurrency *
XLOG_BLCKSZ), so with N=1 that's 80kB and with N=2 160kB (pretty close to the
default value, and still tweakable by the end user). Let's wait and see what
others say?
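
For illustration, the kind of hint being discussed boils down to something
like this (a sketch only; the names, the chunk size, and where it hooks into
the recovery read path are assumptions, not the attached patch):

#include <sys/types.h>
#include <fcntl.h>          /* posix_fadvise, POSIX_FADV_WILLNEED */

#define WAL_BLCKSZ  8192                /* stand-in for XLOG_BLCKSZ */
#define RACHUNK     (16 * WAL_BLCKSZ)   /* 128kB readahead chunk (illustrative) */

/*
 * Ask the kernel to read ahead the next chunk of the WAL segment whenever
 * replay crosses a chunk boundary.  In the real patch this would sit in the
 * WAL page-read callback, with readFile/readOff being the open segment file
 * descriptor and the current read offset.
 */
static void
maybe_prefetch_next_chunk(int readFile, off_t readOff)
{
#ifdef POSIX_FADV_WILLNEED
    if (readOff % RACHUNK == 0)
        (void) posix_fadvise(readFile, readOff + RACHUNK, RACHUNK,
                             POSIX_FADV_WILLNEED);
#endif
}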

-J.


Re: Use fadvise in wal replay

2022-06-23 Thread Andrey Borodin



> 23 июня 2022 г., в 13:50, Jakub Wartak  написал(а):
> 
> Thoughts?
The patch leaves the 1st 128KB chunk unprefetched. Is it worth adding an extra
branch for 120KB after the 1st block when readOff==0?
Or maybe do
+   posix_fadvise(readFile, readOff + XLOG_BLCKSZ, RACHUNK, 
POSIX_FADV_WILLNEED);
instead of
+   posix_fadvise(readFile, readOff + RACHUNK, RACHUNK, 
POSIX_FADV_WILLNEED);
?

> Notes:
> - no GUC, as the default/identical value seems to be the best
I think adding this performance boost on most systems definitely worth 1 
syscall per 16 pages. And I believe 128KB to be optimal for most storages. And 
having no GUCs sounds great.

But storage systems might be different, far beyond benchmarks.
All in all, I don't have strong opinion on having 1 or 0 GUCs to configure this.

I've added patch to the CF.

Thanks!

Best regards, Andrey Borodin.



RE: Use fadvise in wal replay

2022-06-23 Thread Jakub Wartak
>> > On 21 Jun 2022, at 16:59, Jakub Wartak  wrote:
>> Oh, wow, your benchmarks show really impressive improvement.
>> 
>> > I think that 1 additional syscall is not going to be cheap just for
>> > non-standard OS configurations
>> Also we can reduce number of syscalls by something like
>> 
>> #if defined(USE_POSIX_FADVISE) && defined(POSIX_FADV_WILLNEED)
>> if ((readOff % (8 * XLOG_BLCKSZ)) == 0)
>> posix_fadvise(readFile, readOff + XLOG_BLCKSZ, XLOG_BLCKSZ * 8,
>> POSIX_FADV_WILLNEED);
>> #endif
>> 
>> and maybe define\reuse the some GUC to control number of prefetched pages
>> at once.

Hi, I was thinking the same, so I got the patch (attached) to the point where
it gets identical performance with and without readahead enabled:

baseline, master, default Linux readahead (128kb):
33.979, 0.478
35.137, 0.504
34.649, 0.518

master+patched, readahead disabled:
34.338, 0.528
34.568, 0.575
34.007, 1.136

master+patched, readahead enabled (as default):
33.935, 0.523
34.109, 0.501
33.408, 0.557

Thoughts?

Notes:
- no GUC, as the default/identical value seems to be the best
- POSIX_FADV_SEQUENTIAL is apparently much slower and doesn't seem to have any
effect from xlogreader.c at all, while _WILLNEED does (testing again contradicts
"common wisdom"?)

-J.


0001-Use-fadvise-to-prefetch-WAL-in-xlogrecovery.patch
Description: 0001-Use-fadvise-to-prefetch-WAL-in-xlogrecovery.patch


Re: Perform streaming logical transactions by background workers and parallel apply

2022-06-23 Thread Amit Kapila
On Thu, Jun 23, 2022 at 12:51 PM wangw.f...@fujitsu.com
 wrote:
>
> On Mon, Jun 20, 2022 at 11:00 AM Amit Kapila  wrote:
> > I have improved the comments in this and other related sections of the
> > patch. See attached.
> Thanks for your comments and patch!
> Improved the comments as you suggested.
>
> > > > 3.
> > > > +
> > > > +  
> > > > +   Setting streaming mode to apply could export 
> > > > invalid
> > LSN
> > > > +   as finish LSN of failed transaction. Changing the streaming mode and
> > making
> > > > +   the same conflict writes the finish LSN of the failed transaction 
> > > > in the
> > > > +   server log if required.
> > > > +  
> > > >
> > > > How will the user identify that this is an invalid LSN value and she
> > > > shouldn't use it to SKIP the transaction? Can we change the second
> > > > sentence to: "User should change the streaming mode to 'on' if they
> > > > would instead wish to see the finish LSN on error. Users can use
> > > > finish LSN to SKIP applying the transaction." I think we can give
> > > > reference to docs where the SKIP feature is explained.
> > > Improved the sentence as suggested.
> > >
> >
> > You haven't answered first part of the comment: "How will the user
> > identify that this is an invalid LSN value and she shouldn't use it to
> > SKIP the transaction?". Have you checked what value it displays? For
> > example, in one of the case in apply_error_callback as shown in below
> > code, we don't even display finish LSN if it is invalid.
> > else if (XLogRecPtrIsInvalid(errarg->finish_lsn))
> > errcontext("processing remote data for replication origin \"%s\"
> > during \"%s\" in transaction %u",
> >errarg->origin_name,
> >logicalrep_message_type(errarg->command),
> >errarg->remote_xid);
> I am sorry that I missed something in my previous reply.
> The invalid LSN value here is to say InvalidXLogRecPtr (0/0).
> Here is an example :
> ```
> 2022-06-23 14:30:11.343 CST [822333] logical replication worker CONTEXT:  
> processing remote data for replication origin "pg_16389" during "INSERT" for 
> replication target relation "public.tab" in transaction 727 finished at 0/0
> ```
>

I don't think it is a good idea to display invalid values. We can mask
this as we do in other cases in the function apply_error_callback.
The ideal way would be to provide a view/system table for users to
check these errors, but that is a matter for another patch. So users
probably need to check the logs to see whether the error came from a
background apply worker when deciding whether or not to switch the streaming
mode.
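
Concretely, the masking amounts to printing "finished at %X/%X" only when the
LSN is valid, mirroring the branch quoted above (a sketch of the suggestion,
not an actual patch):

    if (XLogRecPtrIsInvalid(errarg->finish_lsn))
        errcontext("processing remote data for replication origin \"%s\" during \"%s\" in transaction %u",
                   errarg->origin_name,
                   logicalrep_message_type(errarg->command),
                   errarg->remote_xid);
    else
        errcontext("processing remote data for replication origin \"%s\" during \"%s\" in transaction %u finished at %X/%X",
                   errarg->origin_name,
                   logicalrep_message_type(errarg->command),
                   errarg->remote_xid,
                   LSN_FORMAT_ARGS(errarg->finish_lsn));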

-- 
With Regards,
Amit Kapila.




Re: Make COPY extendable in order to support Parquet and other formats

2022-06-23 Thread Aleksander Alekseev
Andres, Tom,

> > I suspect that we'd first need a patch to refactor the existing copy code a
> > good bit to clean things up. After that it hopefully will be possible to 
> > plug
> > in a new format without being too intrusive.
>
> I think that step 1 ought to be to convert the existing formats into
> plug-ins, and demonstrate that there's no significant loss of performance.

Yep, this looks like a promising strategy to me too.

> I know little about parquet - can it support FROM STDIN efficiently?

Parquet is a compressed binary format with data grouped by columns
[1]. I wouldn't assume that this is a primary use case for this
particular format.

[1]: https://parquet.apache.org/docs/file-format/

-- 
Best regards,
Aleksander Alekseev




Re: pltcl crash on recent macOS

2022-06-23 Thread Peter Eisentraut



On 20.06.22 12:36, Peter Eisentraut wrote:

On 14.06.22 05:05, Tom Lane wrote:

I'd be okay with just dropping the -lc from pl/tcl/Makefile and seeing
what the buildfarm says.  The fact that we needed it in 1998 doesn't
mean that we still need it on supported versions of Tcl; nor was it
ever anything but a hack for us to be overriding what TCL_LIBS says.


Ok, I propose to proceed with the attached patch (with a bit more 
explanation added) for the master branch (for now) and see how it goes.


done




Re: Fix typo in pg_publication.c

2022-06-23 Thread Michael Paquier
On Thu, Jun 23, 2022 at 05:35:37PM +1000, Peter Smith wrote:
> PSA trivial patch fixing a harmless #define typo.

Thanks, done.
--
Michael


signature.asc
Description: PGP signature


Re: fix crash with Python 3.11

2022-06-23 Thread Markus Wanner


On 6/21/22 18:33, Tom Lane wrote:

My inclination at this point is to not back-patch the second change
12d768e70 ("Don't use static storage for SaveTransactionCharacteristics").
It's not clear that the benefit would be worth even a small risk of
somebody being unhappy about the API break.


Actually, the backport of 2e517818f ("Fix SPI's handling of errors") 
already broke the API for code using SPICleanup, as that function had 
been removed. Granted, it's not documented, but still exported.


I propose to re-introduce a no-op placeholder similar to what we have 
for SPI_start_transaction, somewhat like the attached patch.


Regards

Markus

diff --git a/src/backend/executor/spi.c b/src/backend/executor/spi.c
index dd5ef762707..f73c1e79e18 100644
--- a/src/backend/executor/spi.c
+++ b/src/backend/executor/spi.c
@@ -422,6 +422,16 @@ SPI_rollback_and_chain(void)
 	_SPI_rollback(true);
 }
 
+/*
+ * SPICleanup is a no-op, kept for backwards compatibility. We rely on
+ * AtEOXact_SPI to cleanup. Extensions should not (need to) fiddle with the
+ * internal SPI state directly.
+ */
+void
+SPICleanup(void)
+{
+}
+
 /*
  * Clean up SPI state at transaction commit or abort.
  */
diff --git a/src/include/executor/spi.h b/src/include/executor/spi.h
index 153eb5c7ad5..1e66a7d2ea0 100644
--- a/src/include/executor/spi.h
+++ b/src/include/executor/spi.h
@@ -205,6 +205,7 @@ extern void SPI_commit_and_chain(void);
 extern void SPI_rollback(void);
 extern void SPI_rollback_and_chain(void);
 
+extern void SPICleanup(void);
 extern void AtEOXact_SPI(bool isCommit);
 extern void AtEOSubXact_SPI(bool isCommit, SubTransactionId mySubid);
 extern bool SPI_inside_nonatomic_context(void);


Re: PG 15 (and to a smaller degree 14) regression due to ExprEvalStep size

2022-06-23 Thread Michael Paquier
On Tue, Jun 21, 2022 at 05:41:07PM -0400, Andrew Dunstan wrote:
> On 2022-06-21 Tu 17:25, Andres Freund wrote:
>> On 2022-06-21 17:11:33 -0400, Andrew Dunstan wrote:
>>> I and a couple of colleagues have looked it over. As far as it goes the
>>> json fix looks kosher to me. I'll play with it some more.
>>
>> Cool.
>>
>> Any chance you could look at fixing the "structure" of the generated
>> expression "program". The recursive ExecEvalExpr() calls are really not ok...

By how much does the size of ExprEvalStep go down once you don't
inline the JSON structures as of 0004 in [1]?  And what of 0003?  The
JSON portions seem like the largest portion of the cake, though both
are must-fixes.

> Yes, but I don't guarantee to have a fix in time for Beta2.

IMHO, it would be nice to get something done for beta2.  Now, the
thread is rather fresh and I guess that more performance study is
required even for 0004, so waiting for beta3 would be a better move at
this stage.  Is somebody confident enough in the patches proposed?
0004 looks rather sane, seen from here, at least.

[1]: 
https://www.postgresql.org/message-id/20220617200605.3moq7dtxua5cx...@alap3.anarazel.de
--
Michael


signature.asc
Description: PGP signature


Fix typo in pg_publication.c

2022-06-23 Thread Peter Smith
PSA trivial patch fixing a harmless #define typo.

--
Kind Regards,
Peter Smith.
Fujitsu Australia


v1-0001-Fix-typo-in-pg_publication.c.patch
Description: Binary data


Re: gcc -ftabstop option

2022-06-23 Thread Peter Eisentraut

On 22.06.22 21:48, Daniel Gustafsson wrote:

On 21 Jun 2022, at 12:49, Peter Eisentraut  
wrote:



Second, it enables the compiler's detection of confusingly indented code to 
work more correctly.


Wouldn't we also need to add -Wmisleading-indentation for this to trigger any
warnings?


That is included in -Wall.
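
For context, this is the sort of construct that warning is meant to catch (a
toy example; with tab-indented sources, -ftabstop is what lets the compiler
judge the columns correctly):

#include <stdio.h>

static void
finish(int found)
{
    if (found)
        printf("cleaning up\n");
        printf("reporting\n");  /* runs unconditionally, despite the indentation */
}

int
main(void)
{
    finish(0);
    return 0;
}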




RE: Support logical replication of DDLs

2022-06-23 Thread houzj.f...@fujitsu.com
On Thursday, June 23, 2022 6:22 AM Zheng Li  wrote:

Hi,

> 
> > Here are some points in my mind about the two approaches discussed here.
> >
> > 1) search_patch vs schema qualify
> >
> > Again, I still think it will bring more flexibility and security by
> > schema qualify the objects in DDL command as mentioned before[1].
> 
> I wonder what security concerns you have? We certainly don't want to log the
> search_path if there are serious security issues.

I was thinking of the case where the publisher has two schemas "s1, s2" while
the subscriber only has schema "s2". Suppose we set the publisher's
search_path to 's1, s2' and execute CREATE TABLE xxx ();. If we replicate the
original SQL along with the search_path to the subscriber, it would silently
create the table in schema s2 instead of reporting an error "schema s1 doesn't
exist", which looks dangerous to me.


> 
> > > "Create Table As .." is already handled by setting the skipData flag
> > > of the statement parsetreee before replay:
> >
> > 2) About the handling of CREATE TABLE AS:
> >
> > I think it's not a appropriate approach to set the skipdata flag on
> > subscriber as it cannot handle EXECUTE command in CTAS.
> >
> > CREATE TABLE q5_prep_results AS EXECUTE q5(200, 'DT');
> >
> > The Prepared statement is a temporary object which we don't replicate.
> > So if you directly execute the original SQL on subscriber, even if you
> > set skipdata it will fail.
> >
> > I think it difficult to make this work as you need handle the
> > create/drop of this prepared statement. And even if we extended
> > subscriber's code to make it work, it doesn't seems like a standard and
> elegant approach.
> 
> This is indeed an interesting case, thanks for pointing this out. One light 
> weight
> solution I can think of is to directly deparse the parsetree on the publisher 
> into
> a simple CREATE TABLE statement without the prepared statement and then
> replicate the simple CREATE TABLE statement .
> This doesn't have to involve the json format though.

I thought about this solution as well, but I am not very sure about it;
it feels a bit hacky to do this directly instead of using a standard
event trigger (or introducing a new type of event trigger).


> > > "Alter Table .. " that rewrites with volatile expressions can also
> > > be handled without any syntax change, by enabling the table rewrite
> > > replication and converting the rewrite inserts to updates. ZJ's patch
> introduced this solution.
> >
> > 3) About the handling of ALTER TABLE rewrite.
> >
> > The approach I proposed before is based on the event trigger +
> > deparser approach. We were able to improve that approach as we don't
> > need to replicate the rewrite in many cases. For example: we don't
> > need to replicate rewrite dml if there is no volatile/mutable
> > function. We should check and filter these case at publisher (e.g. via
> deparser) instead of checking that at subscriber.
> 
> Surely we can make the check about volatile/mutable functions on the
> publisher side as well. It doesn't have to be done via the deparser.
> 
> > Besides, as discussed, we need to give warning or error for the cases
> > when DDL contains volatile function which would be executed[2]. We
> > should check this at publisher as well(via deparser).
> 
> Again, I think the check doesn't have to be done via the deparser.

Personally, I think it's not great to add lots of logical-replication-related
code (checks for rewrite DDL, checks for function volatility) in
utility.c or tablecmds.c, which seems a bit ad hoc.


Best regards,
Hou zj



Re: Perform streaming logical transactions by background workers and parallel apply

2022-06-23 Thread Peter Smith
Here are some review comments for v12-0002

==

1. Commit message

"streaming" option -> "streaming" parameter

~~~

2. General (every file in this patch)

"streaming" option -> "streaming" parameter

~~~

3. .../subscription/t/022_twophase_cascade.pl

For every test file in this patch the new function is passed $is_apply
= 0/1 to indicate whether to use the 'on' or 'apply' parameter value. But in
this test file the parameter is passed as $streaming_mode = 'on'/'apply'.

I was wondering if (for the sake of consistency) it might be better to
use the same parameter kind for all of the test files. Actually, I
don't care if you choose to do nothing and leave this as-is; I am just
posting this review comment in case it was not a deliberate decision
to implement them differently.

e.g.
+ my ($node_publisher, $node_subscriber, $appname, $is_apply) = @_;

versus
+ my ($node_A, $node_B, $node_C, $appname_B, $appname_C, $streaming_mode) =
+   @_;

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: tablesync copy ignores publication actions

2022-06-23 Thread Amit Kapila
On Thu, Jun 23, 2022 at 8:43 AM shiy.f...@fujitsu.com
 wrote:
>
> On Wed, Jun 22, 2022 4:49 PM Peter Smith  wrote:
> >
> > >
> > > This patch looks mostly good to me except for a few minor comments
> > > which are mentioned below. It is not very clear in which branch(es) we
> > > should commit this patch? As per my understanding, this is a
> > > pre-existing behavior but we want to document it because (a) It was
> > > not already documented, and (b) we followed it for row filters in
> > > PG-15 it seems that should be explained. So, we have the following
> > > options (a) commit it only for PG-15, (b) commit for PG-15 and
> > > backpatch the relevant sections, or (c) commit it when branch opens
> > > for PG-16. What do you or others think?
> >
> > Even though this is a very old docs omission, AFAIK nobody ever raised
> > it as a problem before. It only became more important because of the
> > PG15 row-filters. So I think option (a) is ok.
> >
>
> I also think option (a) is ok.
>
> >
> > PSA patch v4 to address all the above review comments.
> >
>
> Thanks for updating the patch. It looks good to me.
>

The patch looks good to me as well. I will push this patch in HEAD (as
per option (a)) tomorrow unless I see any more suggestions/comments.

-- 
With Regards,
Amit Kapila.