enable/disable broken for statement triggers on partitioned tables

2022-05-23 Thread Amit Langote
Hi,

Simon reported $subject off-list.

For triggers on partitioned tables, the various enable/disable trigger
variants recurse to also process partitions' triggers by way of
ATSimpleRecursion() done in the "prep" phase.  While that way of
recursing is fine for row-level triggers, which are cloned to
partitions, it isn't for statement-level triggers, which are not
cloned, so you get an unexpected error as follows:

create table p (a int primary key) partition by list (a);
create table p1 partition of p for values in (1);
create function trigfun () returns trigger language plpgsql as $$
begin raise notice 'insert on p'; end; $$;
create trigger trig before insert on p for statement execute function trigfun();
alter table p disable trigger trig;
ERROR:  trigger "trig" for table "p1" does not exist

The problem is that ATPrepCmd() is too early to perform the recursion
in this case, because at that stage it's not yet known whether the
trigger being enabled/disabled is row-level or statement-level, so it's
better to perform it during ATExecCmd().  Actually, that is how it used
to be done before bbb927b4db9b changed things to use ATSimpleRecursion()
to fix a related problem, which was that the ONLY specification was
ignored by the earlier implementation.  Whether ONLY is specified in a
given command is only directly available in the "prep" phase, so it must
be remembered somehow if the recursion is to be handled in the "exec"
phase.  The way that's typically done in tablecmds.c is to have
ATPrepCmd() change the AlterTableCmd.subtype to a recursive variant of
the given sub-command; for example, AT_ValidateConstraint is replaced by
AT_ValidateConstraintRecurse if ONLY is not specified, as sketched
below.
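
As an illustration, the pattern could look like this in ATPrepCmd()
(just a sketch modeled on the existing AT_ValidateConstraint handling;
the AT_EnableTrigRecurse subtype name is hypothetical, not necessarily
what the patch uses):

    case AT_EnableTrig:
        /* Recursion to partitions is deferred to the execution phase */
        if (recurse)
            cmd->subtype = AT_EnableTrigRecurse;    /* hypothetical */
        pass = AT_PASS_MISC;
        break;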

So, I think we should do something like the attached.  A lot of
boilerplate is needed given that the various enable/disable trigger
variants are represented as separate sub-commands (AlterTableCmd
subtypes); that could perhaps be avoided by inventing an
EnableDisableTrigStmt sub-command node that stores (only?) the recurse
flag.
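
For instance, something along these lines (nothing of this exists
today; it's only to sketch the idea):

    typedef struct EnableDisableTrigStmt
    {
        NodeTag     type;
        char       *trigname;       /* trigger name, or NULL for ALL/USER */
        char        fires_when;     /* target enable/disable state */
        bool        recurse;        /* ONLY was not specified */
    } EnableDisableTrigStmt;

A single AlterTableCmd subtype carrying such a node could then replace
the various AT_{Enable,Disable}*Trig* subtypes.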

-- 
Thanks, Amit Langote
EDB: http://www.enterprisedb.com


v1-0001-Fix-ENABLE-DISABLE-TRIGGER-to-handle-recursion-co.patch
Description: Binary data


re: partition wise aggregate wrong rows cost

2022-05-23 Thread bucoo
> We try fairly hard to ensure that the row count estimate for a given
> relation does not vary across paths, so I concur with the OP that this
> is a bug.  Having said that, I'm not sure that the consequences are
> significant.  As you say, the estimates seem to get a lot closer as
> soon as the table has some statistics.  (But nonetheless, they are not
> identical, so it's still a bug.)

Yes, the estimates seem to get a lot closer as soon as the table has
some statistics.

> I'm not sure that the consequences are significant.
At least it doesn't make any practical difference for me for now.  I
noticed this problem while testing aggregation; it looked a little odd,
so I emailed the list.

Thanks, everyone.





Improve TAP tests of pg_upgrade for cross-version tests

2022-05-23 Thread Michael Paquier
Hi all,
(Adding Andrew Dunstan in CC.)

I have been toying with $subject, trying to improve the way we test
pg_upgrade across different major versions, as Perl makes that easier.
The buildfarm does three things to allow such tests to work (see
TestUpgradeXversion.pm):
- Apply a set of regexp filters to the generated SQL dumps to make them
comparable (removal of empty lines and comments, for example, before
running them through a diff command).
- Adapt the dumps by removing or modifying objects that would cause
inconsistencies, which is what upgrade_adapt.sql does in the tree.
- Tweak the dump commands used, like adding --extra-float-digits=0
when testing with a version <= 11 as origin (aka c6f9464b in the
buildfarm client).

Attached is a basic patch to show what can be used to improve the TAP
tests of pg_upgrade in this area, with contents coming mostly from the
buildfarm client.  The end picture would be to allow all those tests
to use the core code rather than duplicating it in the buildfarm
client.  This greatly reduces the amount of noise seen when comparing
the dumps taken (the tests pass with v14) while remaining simple, down
to v11, so that could be a first step.  There are a
couple of things where I am not sure how the buildfarm handles things,
but perhaps the dumps of installcheck have been tweaked to ignore such
cases?  Here is an exhaustive list:
- multirange_type_name when using PG <= v13 as origin, for CREATE
TYPE.
- CREATE/ALTER PROCEDURE appends IN to the list of parameters dumped
when using PG <= v13 as origin.
- CREATE OPERATOR CLASS and ALTER OPERATOR FAMILY, where FUNCTION 2 is
moved from one command to the other.

Thoughts?
--
Michael
From a0876368a15a980e57e057481b47be8dd35cdbf9 Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 24 May 2022 15:00:00 +0900
Subject: [PATCH] Add more filtering capabilities in the dumps.

This adds support for some filtering capabilities, for some tests done
down to v11.  This allows the tests to pass with v14, while v11~13 still
generate a few diffs.
---
 src/bin/pg_upgrade/t/002_pg_upgrade.pl | 80 --
 1 file changed, 63 insertions(+), 17 deletions(-)

diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 75ac768a96..3d9b443048 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -26,6 +26,39 @@ sub generate_db
 	$node->command_ok([ 'createdb', $dbname ]);
 }
 
+# Filter the contents of a dump before its use in a content comparison.
+# This returns the path to the filtered dump.
+sub filter_dump
+{
+	my ($node, $tempdir, $dump_file_name) = @_;
+	my $dump_file = "$tempdir/$dump_file_name";
+	my $dump_contents = slurp_file($dump_file);
+
+	# Remove the comments.
+	$dump_contents =~ s/^\-\-.*//mgx;
+	# Remove empty lines.
+	$dump_contents =~ s/^\n//mgx;
+
+	# Locale setup has changed for collations in 15~.
+	$dump_contents =~
+	  s/(^CREATE\sCOLLATION\s.*?)\slocale\s=\s'und'/$1 locale = ''/mgx
+	  if ($node->pg_version < 15);
+
+	# Dumps taken from <= 11 use EXECUTE PROCEDURE.  Replace it
+	# with EXECUTE FUNCTION.
+	$dump_contents =~
+	  s/(^CREATE\sTRIGGER\s.*?)\sEXECUTE\sPROCEDURE/$1 EXECUTE FUNCTION/mgx
+	  if ($node->pg_version < 12);
+
+	my $dump_file_filtered = "$tempdir/${dump_file_name}_filter";
+	open(my $dh, '>', $dump_file_filtered)
+	  || die "opening $dump_file_filtered";
+	print $dh $dump_contents;
+	close($dh);
+
+	return $dump_file_filtered;
+}
+
 # The test of pg_upgrade requires two clusters, an old one and a new one
 # that gets upgraded.  Before running the upgrade, a logical dump of the
 # old cluster is taken, and a second logical dump of the new one is taken
@@ -134,7 +167,7 @@ if (defined($ENV{oldinstall}))
 	$oldnode->command_ok(
 		[
 			'psql', '-X',
-			'-f', "$srcdir/src/bin/pg_upgrade/upgrade_adapt.sql",
+			'-f',   "$srcdir/src/bin/pg_upgrade/upgrade_adapt.sql",
 			'regression'
 		]);
 }
@@ -147,13 +180,13 @@ my $oldbindir = $oldnode->config_data('--bindir');
 
 # Take a dump before performing the upgrade as a base comparison. Note
 # that we need to use pg_dumpall from the new node here.
-$newnode->command_ok(
-	[
-		'pg_dumpall', '--no-sync',
-		'-d', $oldnode->connstr('postgres'),
-		'-f', "$tempdir/dump1.sql"
-	],
-	'dump before running pg_upgrade');
+my @dump_command = (
+	'pg_dumpall', '--no-sync', '-d', $oldnode->connstr('postgres'),
+	'-f', "$tempdir/dump1.sql");
+# --extra-float-digits is needed when upgrading from a version older than 11.
+push(@dump_command, '--extra-float-digits', '0')
+  if ($oldnode->pg_version < 12);
+$newnode->command_ok(\@dump_command, 'dump before running pg_upgrade');
 
 # After dumping, update references to the old source tree's regress.so
 # to point to the new tree.
@@ -228,23 +261,36 @@ if (-d $log_path)
 }
 
 # Second dump from the upgraded instance.
-$newnode->command_ok(
-	[
-		'pg_dumpall', '--no-sync',
-		'-d', 

RE: bogus: logical replication rows/cols combinations

2022-05-23 Thread houzj.f...@fujitsu.com
On Friday, May 20, 2022 11:06 AM Amit Kapila  wrote:
> 
> On Thu, May 19, 2022 at 7:54 PM Tom Lane  wrote:
> 
> >  Even more to the point, how can we
> > have a subscriber do that by relying on view columns that don't exist
> > in older versions?
> >
> 
> We need a version check like (if
> (walrcv_server_version(LogRepWorkerWalRcvConn) >= 15)) for that.

Thanks for pointing it out.  Here is a new version of the patch which
adds this version check.

Best regards,
Hou zj


v4-0002-Prohibit-combining-publications-with-different-colum.patch
Description:  v4-0002-Prohibit-combining-publications-with-different-colum.patch


Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

2022-05-23 Thread Amit Kapila
On Tue, May 24, 2022 at 7:58 AM Masahiko Sawada  wrote:
>
> On Mon, May 23, 2022 at 2:39 PM Amit Kapila  wrote:
> >
> > On Mon, May 23, 2022 at 10:03 AM Kyotaro Horiguchi
> >  wrote:
> > >
> > > At Sat, 21 May 2022 15:35:58 +0530, Amit Kapila  
> > > wrote in
> > > > I think if we don't have any better ideas then we should go with
> > > > either this or one of the other proposals in this thread. The other
> > > > idea that occurred to me is whether we can somehow update the snapshot
> > > > we have serialized on disk about this information. On each
> > > > running_xact record when we serialize the snapshot, we also try to
> > > > purge the committed xacts (via SnapBuildPurgeCommittedTxn). So, during
> > > > that we can check if there are committed xacts to be purged and if we
> > > > have previously serialized the snapshot for the prior running xact
> > > > record, if so, we can update it with the list of xacts that have
> > > > catalog changes. If this is feasible then I think we need to somehow
> > > > remember the point where we last serialized the snapshot (maybe by
> > > > using builder->last_serialized_snapshot). Even, if this is feasible we
> > > > may not be able to do this in back-branches because of the disk-format
> > > > change required for this.
> > > >
> > > > Thoughts?
>
> It seems to work; could you draft the patch?
>

I can help with the review and discussion.

-- 
With Regards,
Amit Kapila.




Re: First draft of the PG 15 release notes

2022-05-23 Thread John Naylor
On Mon, May 16, 2022 at 9:18 PM Bruce Momjian  wrote:
>
> Newer wording:
>
> Improve validation of UTF-8 text (even if only ASCII) by processing
> 16 bytes at a time (John Naylor)

Thanks! I also think Heikki should be mentioned as a coauthor here --
the ASCII coding was his work in large part.

-- 
John Naylor
EDB: http://www.enterprisedb.com




Re: PostgreSQL 15 Beta 1 release announcement draft

2022-05-23 Thread Jeff Janes
On Sat, May 14, 2022 at 2:52 PM Jonathan S. Katz 
wrote:

> Hi,
>
> Attached is a draft of the release announcement for the PostgreSQL 15
> Beta 1 release. The goal of this announcement is to raise awareness
> around many of the new features appearing in PostgreSQL 15 and to
> encourage people to test. The success of the PostgreSQL 15 GA depends
> heavily on people testing during the Beta period!
>

I have some belated feedback.  I was excited to try this on Windows (I
don't have a build system for that) and so followed the first link in the
message, to https://www.postgresql.org/download/.  At first glance there is
nothing about beta there, but there is a prominent Windows icon, so I
clicked on that.  That led to EDB, but there was no apparent way to download
the beta, just the released versions.  I poked around EDB a bit but didn't find
anything promising, then backed out of all that poking around and
eventually all the way back to /download, where I scrolled down and finally
found the link to https://www.postgresql.org/download/snapshots/ which
tells me what I need to know.  But at this point I was more annoyed than
excited.

An invitation to download the beta should take me directly to the page
relevant to doing that.  I shouldn't have to read the page backwards, or do
a breadth-first traversal, to get to the right place efficiently.  People
will click on the first link which seems relevant, and "Windows" on the
generic download page certainly seems relevant to Beta for Windows, until
after you have scrolled down to find the beta/RC-specific link instead.  (I
now recall being annoyed by this in a prior year as well; I guess I have a
bad memory for avoiding mistakes but a good memory for recalling them.)
Also, the download page should probably say "binary packages and
installers" where it currently says "There are source code and binary
packages of beta and release candidates", although I guess that is not
about the announcement itself.

Cheers,

Jeff


Re: partition wise aggregate wrong rows cost

2022-05-23 Thread Tom Lane
David Rowley  writes:
> On Tue, 24 May 2022 at 15:38, bucoo  wrote:
>> Normal aggregation and partition-wise aggregation get very different
>> row estimates:

> I wouldn't say this is a bug. Could you not say that they're both
> wrong given that your tables are empty?

We try fairly hard to ensure that the rowcount estimate for a given
relation does not vary across paths, so I concur with the OP that
this is a bug.  Having said that, I'm not sure that the consequences
are significant.  As you say, the estimates seem to get a lot closer
as soon as the table has some statistics.  (But nonetheless, they
are not identical, so it's still a bug.)

regards, tom lane




Re: partition wise aggregate wrong rows cost

2022-05-23 Thread David Rowley
On Tue, 24 May 2022 at 15:38, bucoo  wrote:
>
> Normal aggregation and partition-wise aggregation get very different row estimates:

> explain (verbose)
> select count(1) from t1 group by id;
> HashAggregate  (cost=106.20..108.20 rows=200 width=12) --here rows is 200

> set enable_partitionwise_aggregate = on;
> explain (verbose)
> select count(1) from t1 group by id;
> Append  (cost=29.05..96.15 rows=600 width=12) --here rows is 600

I wouldn't say this is a bug. Could you not say that they're both
wrong given that your tables are empty?

What's going on here is that estimate_num_groups() is just returning
200, which is what it returns when there are no statistics to give any
indication of a better value.  200 is returned regardless of whether the
estimate is for a single partition or the whole partitioned table.  For the
partition-wise aggregate case, the 3 individual 200 estimates are just
summed up by the Append costing code to give 600.

The only way we could really do anything different here would be to
have estimate_num_groups() return a default value based on the number
of input rows.  However, that 200 default is pretty long-standing. We'd
need to consider quite a bit more than this case before we could
realistically consider changing it.
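
As a simplified illustration of the behavior (the helper below is made
up, but DEFAULT_NUM_DISTINCT really is defined as 200 in selfuncs.h):

    #define DEFAULT_NUM_DISTINCT 200    /* src/include/utils/selfuncs.h */

    /* roughly what the group estimate degrades to with no statistics */
    static double
    groups_estimate_sketch(bool have_stats, double stats_ndistinct)
    {
        if (!have_stats)
            return DEFAULT_NUM_DISTINCT;    /* table or partition alike */
        return stats_ndistinct;
    }

With partition-wise aggregation, each of the 3 partitions gets that
same 200, and the Append costing sums them: 3 * 200 = 600.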

For tables that are created and then queried shortly afterwards, we
normally tell people to run ANALYZE on the given tables to prevent
this sort of thing.

David




Re: Patch: Don't set LoadedSSL unless secure_initialize succeeds

2022-05-23 Thread Tom Lane
Daniel Gustafsson  writes:
>> On 22 May 2022, at 08:41, Gurjeet Singh  wrote:
>> The initialization in PostmasterMain() blindly turns on LoadedSSL,
>> irrespective of the outcome of secure_initialize().

> This call is invoked with isServerStart set to true, so any error in
> secure_initialize should error out with ereport FATAL (in be_tls_init()).
> That could be explained in a comment though, which it currently isn't.

The comments for secure_initialize() and be_tls_init() both explain
this already.

It's not great that be_tls_init() implements two different error
handling behaviors, perhaps.  One could imagine separating those.
But we've pretty much bought into such messes with the very fact
that elog/ereport sometimes return and sometimes don't.

regards, tom lane




partition wise aggregate wrong rows cost

2022-05-23 Thread bucoo
Normal aggregate and partition wise aggregate have a big difference rows cost:

begin;
create table t1(id integer, name text) partition by hash(id);
create table t1_0 partition of t1 for values with(modulus 3, remainder 0);
create table t1_1 partition of t1 for values with(modulus 3, remainder 1);
create table t1_2 partition of t1 for values with(modulus 3, remainder 2);
commit;

The normal aggregate's row estimate is 200:
explain (verbose)
select count(1) from t1 group by id;
HashAggregate  (cost=106.20..108.20 rows=200 width=12) --here rows is 200
   Output: count(1), t1.id
   Group Key: t1.id
   ->  Append  (cost=0.00..87.15 rows=3810 width=4)
 ->  Seq Scan on public.t1_0 t1_1  (cost=0.00..22.70 rows=1270 width=4)
   Output: t1_1.id
 ->  Seq Scan on public.t1_1 t1_2  (cost=0.00..22.70 rows=1270 width=4)
   Output: t1_2.id
 ->  Seq Scan on public.t1_2 t1_3  (cost=0.00..22.70 rows=1270 width=4)
   Output: t1_3.id

And the partition-wise aggregate's row estimate is 600:
set enable_partitionwise_aggregate = on;
explain (verbose)
select count(1) from t1 group by id;
Append  (cost=29.05..96.15 rows=600 width=12) --here rows is 600
   ->  HashAggregate  (cost=29.05..31.05 rows=200 width=12)  --this row count
looks the same as in the normal aggregate
 Output: count(1), t1.id
 Group Key: t1.id
 ->  Seq Scan on public.t1_0 t1  (cost=0.00..22.70 rows=1270 width=4)
   Output: t1.id
   ->  HashAggregate  (cost=29.05..31.05 rows=200 width=12)
 Output: count(1), t1_1.id
 Group Key: t1_1.id
 ->  Seq Scan on public.t1_1  (cost=0.00..22.70 rows=1270 width=4)
   Output: t1_1.id
   ->  HashAggregate  (cost=29.05..31.05 rows=200 width=12)
 Output: count(1), t1_2.id
 Group Key: t1_2.id
 ->  Seq Scan on public.t1_2  (cost=0.00..22.70 rows=1270 width=4)
   Output: t1_2.id

Source code is 15beta1(7fdbdf204920ac279f280d0a8e96946fdaf41aef)





Re: allow building trusted languages without the untrusted versions

2022-05-23 Thread Jonathan S. Katz

On 5/23/22 8:04 PM, Nathan Bossart wrote:

> On Mon, May 23, 2022 at 07:09:03PM -0400, Stephen Frost wrote:
>> Instead, I'd argue that we should be continuing to work in the direction
>> of splitting up what can only be done by a superuser today using
>> predefined roles and other methods along those lines.  How that lines up
>> with this latest ask around untrusted languages is something I'm not
>> exactly sure about, but a magic configure option that is
>> "--don't-allow-what-AWS-doesn't-want-to-allow" certainly doesn't seem
>> like it's going in the right direction (and, no, not every cloud
>> provider is going to want the exact same thing when it comes to whatever
>> this option is that we're talking about, so we'd end up having to have
>> configure options for each if we start going down this road...).
> 
> I guess I'd like to do both.  I agree with continuing the work with
> predefined roles, etc., but I also think there is value in being able to
> compile out things that allow arbitrary disk/network access.  My intent
> with this thread is the latter, and I'm trying to tackle this in a way that
> is generically useful even beyond the cloud provider use case.


(+1 on continuing to split up superuser into other predefined roles and 
other privs)


For other use cases, I suggest considering PostgreSQL deployments in 
environments that run on restricted filesystems, e.g. containers. When 
configured properly, a containerized filesystem will disallow writes 
outside of the data directory. However, a) they typically only restrict 
writes (which is often good enough) and b) this model holds so long as 
there are no exploits in the container itself.


The latter may not be our problem, but we can provide folks who deploy
PostgreSQL in containers or other restricted environments with an
additional risk mitigation: the option to compile out features that do
allow arbitrary disk access.


I agree with a bunch of the upthread sentiment, but I would ask: does
the current proposal provide acceptable risk mitigation for PostgreSQL
deployments that want to restrict users' access to the filesystem?


Jonathan


OpenPGP_signature
Description: OpenPGP digital signature


RE: Handle infinite recursion in logical replication setup

2022-05-23 Thread shiy.f...@fujitsu.com
On Tue, May 24, 2022 1:34 AM vignesh C  wrote:
> 
> Thanks for the comments, the attached v15 patch has the fixes for the same.
> 

Thanks for updating the patch. I have a comment on the documentation in the 0002 patch.

@@ -300,6 +310,11 @@ CREATE SUBSCRIPTION subscription_namefalse.
  
+ 
+  There is some interaction between the only_local
+  option and copy_data option. Refer to the
+   for details.
+ 
 


This change is related to the "only_local" and "copy_data" options, so I
think it should be put together with "only_local" instead of
"disable_on_error".

Besides, I tested some cross-version cases with pg_dump and the describe
command, and the results are all as expected.

Regards,
Shi yu


Re: Patch: Don't set LoadedSSL unless secure_initialize succeeds

2022-05-23 Thread Michael Paquier
On Sun, May 22, 2022 at 09:17:37AM +0200, Daniel Gustafsson wrote:
> This call is invoked with isServerStart set to true, so any error in
> secure_initialize should error out with ereport FATAL (in be_tls_init()).
> That could be explained in a comment though, which it currently isn't.

All the inner routines of be_tls_init() would bail out with a FATAL
"goto error", and it does not look like we have a hole here, so I am a
bit surprised by what's proposed, TBH.
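
For reference, the pattern in be-secure-openssl.c looks roughly like
this (paraphrased):

    ereport(isServerStart ? FATAL : LOG,
            (errcode(ERRCODE_CONFIG_FILE_ERROR),
             errmsg("could not load server certificate file \"%s\": %s",
                    ssl_cert_file, SSLerrmessage(ERR_get_error()))));

So at server start any failure is promoted to FATAL, and control never
returns to PostmasterMain() at all in that case.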
--
Michael


signature.asc
Description: PGP signature


Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns

2022-05-23 Thread Masahiko Sawada
On Mon, May 23, 2022 at 2:39 PM Amit Kapila  wrote:
>
> On Mon, May 23, 2022 at 10:03 AM Kyotaro Horiguchi
>  wrote:
> >
> > At Sat, 21 May 2022 15:35:58 +0530, Amit Kapila  
> > wrote in
> > > I think if we don't have any better ideas then we should go with
> > > either this or one of the other proposals in this thread. The other
> > > idea that occurred to me is whether we can somehow update the snapshot
> > > we have serialized on disk about this information. On each
> > > running_xact record when we serialize the snapshot, we also try to
> > > purge the committed xacts (via SnapBuildPurgeCommittedTxn). So, during
> > > that we can check if there are committed xacts to be purged and if we
> > > have previously serialized the snapshot for the prior running xact
> > > record, if so, we can update it with the list of xacts that have
> > > catalog changes. If this is feasible then I think we need to somehow
> > > remember the point where we last serialized the snapshot (maybe by
> > > using builder->last_serialized_snapshot). Even, if this is feasible we
> > > may not be able to do this in back-branches because of the disk-format
> > > change required for this.
> > >
> > > Thoughts?

It seems to work; could you draft the patch?

> >
> > I didn't look at it closely, but it seems to work. I'm not sure how
> > much the spurious invalidations at replication start impact
> > performance, but it is promising if the impact is significant.
> >
>
> It seems Sawada-San's patch is doing this at each commit, not at the
> start of replication, and I think that is required because we need this
> each time for a replication restart. So, I feel this will be an ongoing
> overhead for spurious cases with the current approach.
>
> > That being said, I'm a bit negative about doing that in the
> > post-beta1 stage.
> >
>
> Fair point. We can do it early in PG 16 if the approach is feasible,
> and backpatch something along the lines of what Sawada-San or you
> proposed.

+1.

I proposed two approaches, [1] and [2], and I prefer [1].
Horiguchi-san's idea [3] also looks good, but I think it's better to
somehow deal with the problem he mentioned:

> One problem with this is that change creates the case where multiple
> ReorderBufferTXNs share the same first_lsn.  I haven't come up with a
> clean idea to avoid relaxing the restriction of AssertTXNLsnOrder..

Regards,

[1] 
https://www.postgresql.org/message-id/CAD21AoAn-k6OpZ6HSAH_G91tpTXR6KYvkf663kg6EqW-f6sz1w%40mail.gmail.com
[2] 
https://www.postgresql.org/message-id/CAD21AoD00wV4gt-53ze%2BZB8n4bqJrdH8J_UnDHddy8S2A%2Ba25g%40mail.gmail.com
[3] 
https://www.postgresql.org/message-id/20211008.165055.1621145185927268721.horikyota.ntt%40gmail.com

-- 
Masahiko Sawada
EDB:  https://www.enterprisedb.com/




Re: Enforce "max_wal_size/ min_wal_size must be at least twice wal_segment_size" limit while setting GUCs

2022-05-23 Thread Michael Paquier
On Mon, May 23, 2022 at 10:08:54AM -0400, Tom Lane wrote:
> In general, you can't do that (i.e. try to enforce constraints between
> GUC values via check hooks).  It's been tried in the past and failed
> miserably, because the hooks can't know whether the other value is
> about to be changed.

+1.  Aka, cough, 41aadee followed by 414c2fd.
--
Michael


signature.asc
Description: PGP signature


Re: Add --{no-,}bypassrls flags to createuser

2022-05-23 Thread Michael Paquier
On Mon, May 23, 2022 at 09:37:35AM -0700, Nathan Bossart wrote:
> Michael also requested a test for multiple -a switches and for fancier
> string patterns.  Once that is taken care of, I think this can be marked as
> ready-for-committer.

Looking at v7, this means extending the tests to process lists for
--admin with more name patterns.  And while on it, we could do the
same for the existing command for --role, but this one is on me for
being overly pedantic while looking at the patch :)
--
Michael


signature.asc
Description: PGP signature


Re: fix typos in storing statistics in shared memory

2022-05-23 Thread Michael Paquier
On Mon, May 23, 2022 at 10:22:36AM -0700, Nathan Bossart wrote:
> On Mon, May 23, 2022 at 10:46:22PM +0900, torikoshia wrote:
>> It seems like there are some duplications of 'the' in pgstat.c and
>> pgstat_internal.h.
>> Attaching a tiny patch to fix them.
> 
> LGTM

Thanks Torikoshi-san, fixed.
--
Michael


signature.asc
Description: PGP signature


Re: Enforce "max_wal_size/ min_wal_size must be at least twice wal_segment_size" limit while setting GUCs

2022-05-23 Thread Kyotaro Horiguchi
At Tue, 24 May 2022 10:19:53 +0900 (JST), Kyotaro Horiguchi 
 wrote in 
> At Mon, 23 May 2022 10:08:54 -0400, Tom Lane  wrote in 
> > Bharath Rupireddy  writes:
> > > Thanks. Having check hooks for min_wal_size and max_wal_size that
> > > throw errors if they aren't at least twice the wal_segment_size has a
> > > "BIG" problem - ./initdb -D data --wal-segsize=1 (or a value < 16)
> > > fails.
> > 
> > In general, you can't do that (i.e. try to enforce constraints between
> > GUC values via check hooks).  It's been tried in the past and failed
> > miserably, because the hooks can't know whether the other value is
> > about to be changed.
> 
> I thought that wal_segment_size was a semi-constant for a server's
> lifetime.  But looking at the startup sequence closely, the postmaster
> tries to change max_wal_size before reading the control file.
> 
> Couldn't we use PGC_S_TEST for this purpose?  AlterSystemSetConfigFile
> is calling parse_and_validate_value() with PGC_S_FILE, but it is
> actually a "value to be used later" (@guc.h:93).  So it can be argued
> that PGC_S_TEST is the right choice there.  If it still does not work
> perfectly, we could have a new source value, say PGC_S_ALTER_SYSTEM,
> exactly for this use (but I don't see further users of it coming in
> the future...).

This discussion is based on the assumption that "wal_segment_size can
be assumed to be a constant when a check function is called with
PGC_S_FILE".

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Allow file inclusion in pg_hba and pg_ident files

2022-05-23 Thread Michael Paquier
On Mon, May 23, 2022 at 07:43:08PM +0800, Julien Rouhaud wrote:
> That being said, I'd gladly drop that enum and only handle a single error
> message, as the rest of the error context (including the owning file name and
> line) should provide enough information to users.
> 
> If so, should I use "included authentication file" everywhere, or use 
> something
> else?

With the file name provided in the context, "authentication file"?

> The detail should be provided in the logs so it should disambiguate the
> situation.  While you're mentioning this message, AFAICS it can already be
> entirely wrong as-is, as it doesn't take into account the hba_file
> configuration:
> 
> ALTER SYSTEM SET hba_file = '/tmp/myfile';

Hmm, indeed.  And we load the GUCs before the HBA rules, obviously.
This could be made less confusing.  Do you think that this can be
improved independently of the main patch, building the include
structure on top of it?

>> The set of HBA rules is very sensitive to its
>> ordering, and ReadDir() depends on readdir(), which provides a list
>> of files depending on its FS implementation.  That does not seem like
>> a sane concept to me.  I can see this order (tracked by rule_number)
>> being strictly enforced when it comes to the loading of files, though,
>> so I would recommend focusing on the implementation of "include"
>> rather than "include_dir".
> 
> But the same problem already exists for the postgresql.conf's include_dir.
> 
> Having an hba rule masking another isn't really better than having a GUC
> value overridden by another one.  Usually people rely on sane naming (like
> 01_file.conf and so on) to get predictable behavior, and we even document
> that very thoroughly for the postgresql.conf part.  Improving the
> documentation, as you already pointed out a bit above, should be enough?

Ah.  ParseConfigDirectory() does a qsort() before processing each
file included in a folder.  Unless I am missing something, your patch
reads the entries in the directory, but does not sort the files by
name to ensure a strict ordering of them.

While on it, it looks like you should have sanity checks similar to
ParseConfigDirectory() for CRLFs & friends, as of:
if (strspn(includedir, " \t\r\n") == strlen(includedir))

All this logic is very similar, so this could be perhaps refactored.

>> +  
>> +   Rule number, in priority order, of this rule if the rule is valid,
>> +   otherwise null
>> +  
>> This is a very important field.  I think that this explanation should
>> be extended and explain why relying on this number counts (aka this is
>> the order of the rules loaded across files).  Btw, this could be added
>> as a separate patch, even if this maps to the line number when it
>> comes to the ordering.
> 
> Agreed, I will improve the documentation to outline the importance of that
> information.
> 
> Do you mean a separate patch to ease review or to eventually commit both
> separately?  FWIW I don't think that adding any form of inclusion in the hba
> files should be done without a way for users to check the results.  Any test
> facility would also probably rely on this field.

Yeah, agreed that rule_number is important for the sake of the
inclusions.  Still, I was wondering if rule_number makes sense on its
own, meaning that we could add it before working on the inclusion
logic.  Having it without the inclusion is less interesting if you
have the line numbers to order the HBA rules, of course, as this is
enough to guess the priority of the HBA entries.

>> +   /*
>> +* Only parse files with names ending in ".conf".
>> +* Explicitly reject files starting with ".". This
>> +* excludes things like "." and "..", as well as typical
>> +* hidden files, backup files, and editor debris.
>> +*/
>> I don't think that there is any need to restrict that to files ending
>> with .conf.  We don't do that for postgresql.conf's include, for one.
> 
> I'm confused.  Are you talking about postgresql.conf's include or include_dir
> option?  I'm pretty sure that I borrowed this logic from
> src/backend/utils/misc/guc-file.l / ParseConfigDirectory(), which implements
> the include_dir logic for the postgresql.conf file.

Ah, ParseConfigDirectory() does that for the sake of avoiding any
debris files.  The reasoning (2a0c81a, aka
https://www.postgresql.org/message-id/4EC341E1.1060908%402ndQuadrant.com)
relates to debris files like the ones created by editors and such.  So
you are right to do so.

>> Finally I also added 0003, which is a POC for a new pg_hba_matches()
>> function that can help DBAs understand why their configuration isn't
>> working as they expect.  This is only to start the discussion on that
>> topic; the code is for now really hackish, as I don't know how much this
>> is wanted and/or if some other behavior would be better, and there's also
>> no documentation or test.  T

Re: Enforce "max_wal_size/ min_wal_size must be at least twice wal_segment_size" limit while setting GUCs

2022-05-23 Thread Kyotaro Horiguchi
At Mon, 23 May 2022 10:08:54 -0400, Tom Lane  wrote in 
> Bharath Rupireddy  writes:
> > Thanks. Having check hooks for min_wal_size and max_wal_size that
> > throw errors if they aren't at least twice the wal_segment_size has a
> > "BIG" problem - ./initdb -D data --wal-segsize=1 (or a value < 16)
> > fails.
> 
> In general, you can't do that (i.e. try to enforce constraints between
> GUC values via check hooks).  It's been tried in the past and failed
> miserably, because the hooks can't know whether the other value is
> about to be changed.

I thought that wal_segment_size was a semi-constant for a server's
lifetime.  But looking at the startup sequence closely, the postmaster
tries to change max_wal_size before reading the control file.

Couldn't we use PGC_S_TEST for this purpose?  AlterSystemSetConfigFile
is calling parse_and_validate_value() with PGC_S_FILE, but it is
actually a "value to be used later" (@guc.h:93).  So it can be argued
that PGC_S_TEST is the right choice there.  If it still does not work
perfectly, we could have a new source value, say PGC_S_ALTER_SYSTEM,
exactly for this use (but I don't see further users of it coming in
the future...).
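
To make the problem concrete, a naive check hook would look something
like this (only a sketch, not actual or proposed code; the unit
conversion assumes max_wal_size in MB and wal_segment_size in bytes):

    static bool
    check_max_wal_size(int *newval, void **extra, GucSource source)
    {
        /*
         * wal_segment_size may not have its final value yet, e.g. before
         * the control file has been read, so this can reject settings
         * that would actually be valid for the cluster.
         */
        if ((Size) *newval * 1024 * 1024 < 2 * (Size) wal_segment_size)
            return false;
        return true;
    }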

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: PG15 beta1 sort performance regression due to Generation context change

2022-05-23 Thread Jonathan S. Katz

On 5/20/22 1:56 AM, David Rowley wrote:

> Hackers,
>
> Over the past few days I've been gathering some benchmark results
> together to show the sort performance improvements in PG15 [1].
>
> [...]
>
> So it basically looks like I discovered a very bad case that causes a
> significant slowdown.  Yet other cases that are not an exact power of
> 2 stand to gain significantly from this change.
>
> I'm unsure exactly what I should do about this right now.


While this is being actively investigated, I have added this to the open items list.

Thanks,

Jonathan


OpenPGP_signature
Description: OpenPGP digital signature


Re: allow building trusted languages without the untrusted versions

2022-05-23 Thread Nathan Bossart
On Mon, May 23, 2022 at 07:09:03PM -0400, Stephen Frost wrote:
> Instead, I'd argue that we should be continuing to work in the direction
> of splitting up what can only be done by a superuser today using
> predefined roles and other methods along those lines.  How that lines up
> with this latest ask around untrusted languages is something I'm not
> exactly sure about, but a magic configure option that is
> "--don't-allow-what-AWS-doesn't-want-to-allow" certainly doesn't seem
> like it's going in the right direction (and, no, not every cloud
> provider is going to want the exact same thing when it comes to whatever
> this option is that we're talking about, so we'd end up having to have
> configure options for each if we start going down this road...).

I guess I'd like to do both.  I agree with continuing the work with
predefined roles, etc., but I also think there is value in being able to
compile out things that allow arbitrary disk/network access.  My intent
with this thread is the latter, and I'm trying to tackle this in a way that
is generically useful even beyond the cloud provider use case.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: allow building trusted languages without the untrusted versions

2022-05-23 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Robert Haas  writes:
> > It's not just a question of whether it was meant to be SQL-callable --
> > it's also a question of what arguments it was expecting to be called
> > with. At the very least, you can cause the server to core dump if you
> > pass something that isn't a valid pointer to a function that is
> > expecting a pointer, which is something a CSP very likely does not
> > want a customer to be able to do. I think, however, that there's every
> > possibility that you can create more havoc than that. You can
> > basically call a function that's expecting a pointer with a pointer to
> > anything you can find or guess the memory address of. Maybe that's not
> > enough control to cause anything worse than a server crash, but I sure
> > wouldn't bet on it. There's a lot of functions floating around, and if
> > none of them can be tricked into doing filesystem access today, well
> > someone might add a new one tomorrow.
> 
> [ shrug... ]  So is your point that we shouldn't bother to do anything?
> I don't personally have a problem with leaving things where they stand
> in this area.  However, if we're going to do something, I think at
> minimum it should involve blocking off everything we can identify as
> straightforward reproducible methods to get disk access.

I have a hard time seeing the value in allowing catalog hacking, even
for a cloud provider, and making sure to cut off all possible ways that
could be abused strikes me as unlikely to be successful.

Instead, I'd argue that we should be continuing to work in the direction
of splitting up what can only be done by a superuser today using
predefined roles and other methods along those lines.  How that lines up
with this latest ask around untrusted languages is something I'm not
exactly sure about, but a magic configure option that is
"--don't-allow-what-AWS-doesn't-want-to-allow" certainly doesn't seem
like it's going in the right direction (and, no, not every cloud
provider is going to want the exact same thing when it comes to whatever
this option is that we're talking about, so we'd end up having to have
configure options for each if we start going down this road...).

I agree with the general idea of "has all of today's superuser rights
except the ability to hack catalogs or do disk access" being one
use-case we should be thinking about, along with "also can't do network
access" and "allowed to do network or disk access but can't directly
hack up the catalog", but I don't see us growing configure options for
all these things and would much rather we have a way to let folks
configure their systems along these different lines, ideally without
having to make that decision at build time.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: PG15 beta1 sort performance regression due to Generation context change

2022-05-23 Thread Tom Lane
I had another, possibly-crazy idea.  I think that the API requirement
that the word before a chunk's start point to a MemoryContext is
overly strong.  What we need is that it point to something in which
a MemoryContextMethods pointer can be found (at a predefined offset).
Thus, if generation.c is willing to add the overhead of a
MemoryContextMethods pointer in GenerationBlock, it could dispense
with the per-chunk context field and just have the GenerationBlock
link there.  I guess most likely we'd also need a back-link to
the GenerationContext from GenerationBlock.  Still, two more
pointers per GenerationBlock is an easy tradeoff to make to save
one pointer per GenerationChunk.
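
In sketch form (field names and placement are illustrative only, not
from any patch):

    typedef struct GenerationBlock
    {
        dlist_node  node;           /* in the context's list of blocks */
        const MemoryContextMethods *methods;    /* what chunks point at */
        GenerationContext *context; /* back-link to the owning context */
        /* ... blksize, nchunks, nfree, freeptr, endptr as today ... */
    } GenerationBlock;

    typedef struct GenerationChunk
    {
        Size        size;
        GenerationBlock *block;     /* the word before the chunk start */
        /* per-chunk GenerationContext pointer no longer needed */
    } GenerationChunk;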

I've not trawled the code to make sure that *only* the methods
pointer is touched by context-type-independent code, but it
seems like we could get to that even if we're not there today.

Whether this idea is something we could implement post-beta,
I'm not sure.  I'm inclined to think that we can't change the layout
of MemoryContextData post-beta, as people may already be building
extensions for production use.  We could bloat GenerationBlock even
more so that the methods pointer is at the right offset for today's
layout of MemoryContextData, though obviously going forward it'd be
better if there wasn't extra overhead needed to make this happen.

regards, tom lane




Re: allow building trusted languages without the untrusted versions

2022-05-23 Thread Tom Lane
Robert Haas  writes:
> It's not just a question of whether it was meant to be SQL-callable --
> it's also a question of what arguments it was expecting to be called
> with. At the very least, you can cause the server to core dump if you
> pass something that isn't a valid pointer to a function that is
> expecting a pointer, which is something a CSP very likely does not
> want a customer to be able to do. I think, however, that there's every
> possibility that you can create more havoc than that. You can
> basically call a function that's expecting a pointer with a pointer to
> anything you can find or guess the memory address of. Maybe that's not
> enough control to cause anything worse than a server crash, but I sure
> wouldn't bet on it. There's a lot of functions floating around, and if
> none of them can be tricked into doing filesystem access today, well
> someone might add a new one tomorrow.

[ shrug... ]  So is your point that we shouldn't bother to do anything?
I don't personally have a problem with leaving things where they stand
in this area.  However, if we're going to do something, I think at
minimum it should involve blocking off everything we can identify as
straightforward reproducible methods to get disk access.

regards, tom lane




Re: PG15 beta1 sort performance regression due to Generation context change

2022-05-23 Thread Tom Lane
David Rowley  writes:
> On Tue, 24 May 2022 at 10:02, Tom Lane  wrote:
>> BTW, shouldn't GenerationCheck be ifdef'd out if MEMORY_CONTEXT_CHECKING
>> isn't set?  aset.c does things that way.

> Isn't it done in generation.c:954?

Ah, sorry, didn't look that far up ...

regards, tom lane




Re: PG15 beta1 sort performance regression due to Generation context change

2022-05-23 Thread David Rowley
On Tue, 24 May 2022 at 10:02, Tom Lane  wrote:
> BTW, shouldn't GenerationCheck be ifdef'd out if MEMORY_CONTEXT_CHECKING
> isn't set?  aset.c does things that way.

Isn't it done in generation.c:954?

David




Re: PG15 beta1 sort performance regression due to Generation context change

2022-05-23 Thread Tom Lane
David Rowley  writes:
> GenerationRealloc: uses "size" to figure out if the new size is
> smaller than the old size.  Maybe we could just always move to a new
> chunk regardless of whether the new size is smaller or larger than the
> old size.

I had the same idea ... but we need to know the old size to know how much
to copy.

> GenerationGetChunkSpace: uses "size" to figure out the amount of
> memory used by the chunk. I'm not sure how we'd work around the
> fact that USEMEM() uses that extensively in tuplesort.c to figure out
> how much memory we're using.

Ugh, that seems like a killer.

regards, tom lane




Re: PG15 beta1 sort performance regression due to Generation context change

2022-05-23 Thread Tom Lane
I wrote:
> However, here's a different idea: how badly do we need the "size"
> field in GenerationChunk?  We're not going to ever recycle the
> chunk, IIUC, so it doesn't matter exactly how big it is.  When
> doing MEMORY_CONTEXT_CHECKING we'll still want requested_size,
> but that's not relevant to performance-critical cases.

Refining that a bit: we could provide the size field only when
MEMORY_CONTEXT_CHECKING and/or CLOBBER_FREED_MEMORY are defined.
That would leave us with GenerationRealloc and GenerationGetChunkSpace
not being supportable operations, but I wonder how much we need either.
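
That is, something like the following (a sketch; the real layout may
differ in detail):

    typedef struct GenerationChunk
    {
    #if defined(MEMORY_CONTEXT_CHECKING) || defined(CLOBBER_FREED_MEMORY)
        Size        size;           /* usable space in the chunk */
    #endif
    #ifdef MEMORY_CONTEXT_CHECKING
        Size        requested_size; /* original request size */
    #endif
        GenerationBlock *block;     /* block owning this chunk */
        GenerationContext *context; /* owning context */
    } GenerationChunk;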

BTW, shouldn't GenerationCheck be ifdef'd out if MEMORY_CONTEXT_CHECKING
isn't set?  aset.c does things that way.

regards, tom lane




Re: PG15 beta1 sort performance regression due to Generation context change

2022-05-23 Thread Ranier Vilela
Em seg., 23 de mai. de 2022 às 18:01, Tomas Vondra <
tomas.von...@enterprisedb.com> escreveu:

> FYI not sure why, but your responses seem to break threading quite
> often, due to missing headers identifying the message you're responding
> to (In-Reply-To, References). Not sure why or how to fix it, but this
> makes it much harder to follow the discussion.
>
> On 5/22/22 21:11, Ranier Vilela wrote:
> > Hi David,
> >
> >>Over the past few days I've been gathering some benchmark results
> >>together to show the sort performance improvements in PG15 [1].
> >
> >>One of the test cases I did was to demonstrate Heikki's change to use
> >>a k-way merge (65014000b).
> >
> >>The test I did to try this out was along the lines of:
> >
> >>set max_parallel_workers_per_gather = 0;
> >>create table t (a bigint not null, b bigint not null, c bigint not
> >>null, d bigint not null, e bigint not null, f bigint not null);
> >
> >>insert into t select x,x,x,x,x,x from generate_Series(1,140247142) x;
> > -- 10GB!
> >>vacuum freeze t;
> >
> >>The query I ran was:
> >
> >>select * from t order by a offset 140247142;
> >
> > I redid this test here:
> > Windows 10 64 bits
> > msvc 2019 64 bits
> > RAM 8GB
> > SSD 256 GB
> >
> > HEAD (default configuration)
> > Time: 229396,551 ms (03:49,397)
> > PATCHED:
> > Time: 220887,346 ms (03:40,887)
> >
>
> This is 10x longer than reported by David. Presumably David used a
> machine with a lot of RAM, while your system has 8GB and so is I/O bound.
>
Probably, but Windows is slower than Linux, certainly.

>
> Also, what exactly does "patched" mean? The patch you attached?
>
It means the results of the benchmark with the patch applied.


>
> >>I tested various sizes of work_mem starting at 4MB and doubled that
> >>all the way to 16GB. For many of the smaller values of work_mem the
> >>performance is vastly improved by Heikki's change, however for
> >>work_mem = 64MB I detected quite a large slowdown. PG14 took 20.9
> >>seconds and PG15 beta 1 took 29 seconds!
> >
> >>I've been trying to get to the bottom of this today and finally have
> >>discovered this is due to the tuple size allocations in the sort being
> >>exactly 64 bytes. Prior to 40af10b57 (Use Generation memory contexts
> >>to store tuples in sorts) the tuple for the sort would be stored in an
> >>aset context. After 40af10b57 we'll use a generation context. The
> >>idea with that change is that the generation context does no
> >>power-of-2 round ups for allocations, so we save memory in most cases.
> >>However, due to this particular test having a tuple size of 64-bytes,
> >>there was no power-of-2 wastage with aset.
> >
> >>The problem is that generation chunks have a larger chunk header than
> >>aset do due to having to store the block pointer that the chunk
> >>belongs to so that GenerationFree() can increment the nfree chunks in
> >>the block. aset.c does not require this as freed chunks just go onto a
> >>freelist that's global to the entire context.
> >
> >>Basically, for my test query, the slowdown is because instead of being
> >>able to store 620702 tuples per tape over 226 tapes with an aset
> >>context, we can now only store 576845 tuples per tape resulting in
> >>requiring 244 tapes when using the generation context.
> >
> >>If I had added column "g" to make the tuple size 72 bytes causing
> >>aset's code to round allocations up to 128 bytes and generation.c to
> >>maintain the 72 bytes then the sort would have stored 385805 tuples
> >>over 364 batches for aset and 538761 tuples over 261 batches using the
> >>generation context. That would have been a huge win.
> >
> >>So it basically looks like I discovered a very bad case that causes a
> >>significant slowdown. Yet other cases that are not an exact power of
> >>2 stand to gain significantly from this change.
> >
> >>One thing 40af10b57 does is stops those terrible performance jumps
> >>when the tuple size crosses a power-of-2 boundary. The performance
> >>should be more aligned to the size of the data being sorted now...
> >>Unfortunately, that seems to mean regressions for large sorts with
> >>power-of-2 sized tuples.
> >
> > It seems to me that the solution would be to use aset allocations
> >
> > when the size of the tuples is power-of-2?
> >
> > if (state->sortopt & TUPLESORT_ALLOWBOUNDED ||
> > (state->memtupsize & (state->memtupsize - 1)) == 0)
> > state->tuplecontext = AllocSetContextCreate(state->sortcontext,
> > "Caller tuples",  ALLOCSET_DEFAULT_SIZES);
> > else
> > state->tuplecontext = GenerationContextCreate(state->sortcontext,
> >  "Caller tuples",  ALLOCSET_DEFAULT_SIZES);
> >
>
> I'm pretty sure this is pointless, because memtupsize is the size of the
> memtuples array, but the issue is about the size of the tuples. After all,
> David was talking about 64B chunks, and the array is always at least
> 1024 elements, so it obviously can't be the same thing.
>
It was more of a guess.


> How would we even know how large the tuples will be at this point,
>

Re: PG15 beta1 sort performance regression due to Generation context change

2022-05-23 Thread David Rowley
On Tue, 24 May 2022 at 09:36, Tom Lane  wrote:
> However, here's a different idea: how badly do we need the "size"
> field in GenerationChunk?  We're not going to ever recycle the
> chunk, IIUC, so it doesn't matter exactly how big it is.  When
> doing MEMORY_CONTEXT_CHECKING we'll still want requested_size,
> but that's not relevant to performance-critical cases.

Interesting idea. However, I do see a couple of usages of the "size"
field away from MEMORY_CONTEXT_CHECKING builds:

GenerationRealloc: uses "size" to figure out if the new size is
smaller than the old size.  Maybe we could just always move to a new
chunk regardless of whether the new size is smaller or larger than the
old size.

GenerationGetChunkSpace: uses "size" to figure out the amount of
memory used by the chunk. I'm not sure how we'd work around the
fact that USEMEM() uses that extensively in tuplesort.c to figure out
how much memory we're using.

David




Re: allow building trusted languages without the untrusted versions

2022-05-23 Thread Robert Haas
On Mon, May 23, 2022 at 4:46 PM Tom Lane  wrote:
> Maybe I shouldn't be putting words into Nathan's mouth, but I think
> what he is after is a mode intended for use by cloud service providers,
> who would like to offer locked-down database services where there's
> just no way to get to the disk from inside the DB, superuser or no.

The cloud service provider use case is also what I was thinking about.

> What you're talking about is perhaps interesting to a different set of
> people, but it doesn't offer any guarantees because it's always possible
> that $attacker manages to hack his way into access to a superuser role.

I mean, you can hypothesize that any sort of restriction can be
bypassed, regardless of how they're implemented. I don't think this is
a valid way of discriminating among possible solutions.

> The main flaw I'm aware of in that argument is that it used to be possible
> for superusers to create C-language pg_proc entries pointing at random C
> entry point symbols, eg open(2) or write(2), and then invoke those
> functions from SQL --- maybe with only restricted possibilities for the
> arguments, but you just need to find one combination that works.
> When we got rid of v0 function call support, that became at least far
> more difficult to exploit, but I'm not sure if it's entirely impossible.
> A component of this exercise would need to be making sure that that's
> bulletproof, ie you can't make a usable pg_proc entry that points at
> something that wasn't meant to be a SQL-callable function.

It's not just a question of whether it was meant to be SQL-callable --
it's also a question of what arguments it was expecting to be called
with. At the very least, you can cause the server to core dump if you
pass something that isn't a valid pointer to a function that is
expecting a pointer, which is something a CSP very likely does not
want a customer to be able to do. I think, however, that there's every
possibility that you can create more havoc than that. You can
basically call a function that's expecting a pointer with a pointer to
anything you can find or guess the memory address of. Maybe that's not
enough control to cause anything worse than a server crash, but I sure
wouldn't bet on it. There's a lot of functions floating around, and if
none of them can be tricked into doing filesystem access today, well
someone might add a new one tomorrow.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: PG15 beta1 sort performance regression due to Generation context change

2022-05-23 Thread Tom Lane
David Rowley  writes:
> Hand-wavy idea: It's probably too complex for now, and it also might
> be too much overhead, but having GenerationPointerGetChunk() do a
> binary search on a sorted-by-memory-address array of block pointers
> might be a fast enough way to find the block that the pointer belongs
> to. There should be far fewer blocks now, since generation.c now grows
> the block sizes, so the N in the O(log2 N) search complexity should
> never be excessively high.

> However, it would mean a binary search for every pfree, which feels
> pretty horrible. My feeling is that it seems unlikely that saving 8
> bytes by not storing the GenerationBlock would be a net win here.

I think probably that could be made to work, since a Generation
context should not contain all that many live blocks at any one time.

However, here's a different idea: how badly do we need the "size"
field in GenerationChunk?  We're not going to ever recycle the
chunk, IIUC, so it doesn't matter exactly how big it is.  When
doing MEMORY_CONTEXT_CHECKING we'll still want requested_size,
but that's not relevant to performance-critical cases.

regards, tom lane




Re: PG15 beta1 sort performance regression due to Generation context change

2022-05-23 Thread David Rowley
On Tue, 24 May 2022 at 08:52, Tomas Vondra
 wrote:
>
> On 5/23/22 22:47, Tom Lane wrote:
> > How would you know which context type to consult for that?
> >
>
> D'oh! I knew there has to be some flaw in that idea, but I forgot about
> this chicken-or-egg issue.

Hand-wavy idea: It's probably too complex for now, and it also might
be too much overhead, but having GenerationPointerGetChunk() do a
binary search on a sorted-by-memory-address array of block pointers
might be a fast enough way to find the block that the pointer belongs
to. There should be far fewer blocks now, since generation.c now grows
the block sizes, so the N in the O(log2 N) search complexity should
never be excessively high.
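
For illustration, the lookup could be as simple as the following
sketch, which assumes an address-sorted array of blocks and a per-block
size field (both assumptions, not existing code):

    #include <stddef.h>

    typedef struct BlockSpan
    {
        char       *start;          /* block start address */
        size_t      blksize;        /* total size of the block */
    } BlockSpan;

    /* Return the block containing ptr, or NULL if no block does. */
    static BlockSpan *
    find_owning_block(BlockSpan *blocks, int nblocks, const void *ptr)
    {
        int         lo = 0;
        int         hi = nblocks - 1;

        while (lo <= hi)
        {
            int         mid = lo + (hi - lo) / 2;

            if ((const char *) ptr < blocks[mid].start)
                hi = mid - 1;
            else if ((const char *) ptr >= blocks[mid].start + blocks[mid].blksize)
                lo = mid + 1;
            else
                return &blocks[mid];    /* ptr falls inside this block */
        }
        return NULL;
    }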

However, it would mean a binary search for every pfree, which feels
pretty horrible. My feeling is that it seems unlikely that saving 8
bytes by not storing the GenerationBlock would be a net win here.
There may be something to claw back for the tuplesort.c case, though:
I think the heap_free_minimal_tuple() call in writetup_heap() is not
needed in many cases, since we reset the tuple context directly after
writing the tuples out.

David




Re: PG15 beta1 sort performance regression due to Generation context change

2022-05-23 Thread Tomas Vondra
FYI not sure why, but your responses seem to break threading quite
often, due to missing headers identifying the message you're responding
to (In-Reply-To, References). Not sure why or how to fix it, but this
makes it much harder to follow the discussion.

On 5/22/22 21:11, Ranier Vilela wrote:
> Hi David,
> 
>>Over the past few days I've been gathering some benchmark results
>>together to show the sort performance improvements in PG15 [1].
> 
>>One of the test cases I did was to demonstrate Heikki's change to use
>>a k-way merge (65014000b).
> 
>>The test I did to try this out was along the lines of:
> 
>>set max_parallel_workers_per_gather = 0;
>>create table t (a bigint not null, b bigint not null, c bigint not
>>null, d bigint not null, e bigint not null, f bigint not null);
> 
>>insert into t select x,x,x,x,x,x from generate_Series(1,140247142) x;
> -- 10GB!
>>vacuum freeze t;
> 
>>The query I ran was:
> 
>>select * from t order by a offset 140247142;
> 
> I redid this test here:
> Windows 10 64 bits
> msvc 2019 64 bits
> RAM 8GB
> SSD 256 GB
> 
> HEAD (default configuration)
> Time: 229396,551 ms (03:49,397)
> PATCHED:
> Time: 220887,346 ms (03:40,887)
> 

This is 10x longer than reported by David. Presumably David used a
machine with a lot of RAM, while your system has 8GB and so is I/O bound.

Also, what exactly does "patched" mean? The patch you attached?


>>I tested various sizes of work_mem starting at 4MB and doubled that
>>all the way to 16GB. For many of the smaller values of work_mem the
>>performance is vastly improved by Heikki's change, however for
>>work_mem = 64MB I detected quite a large slowdown. PG14 took 20.9
>>seconds and PG15 beta 1 took 29 seconds!
> 
>>I've been trying to get to the bottom of this today and finally have
>>discovered this is due to the tuple size allocations in the sort being
>>exactly 64 bytes. Prior to 40af10b57 (Use Generation memory contexts
>>to store tuples in sorts) the tuple for the sort would be stored in an
>>aset context. After 40af10b57 we'll use a generation context. The
>>idea with that change is that the generation context does no
>>power-of-2 round ups for allocations, so we save memory in most cases.
>>However, due to this particular test having a tuple size of 64-bytes,
>>there was no power-of-2 wastage with aset.
> 
>>The problem is that generation chunks have a larger chunk header than
>>aset do due to having to store the block pointer that the chunk
>>belongs to so that GenerationFree() can increment the nfree chunks in
>>the block. aset.c does not require this as freed chunks just go onto a
>>freelist that's global to the entire context.
> 
>>Basically, for my test query, the slowdown is because instead of being
>>able to store 620702 tuples per tape over 226 tapes with an aset
>>context, we can now only store 576845 tuples per tape resulting in
>>requiring 244 tapes when using the generation context.
> 
>>If I had added column "g" to make the tuple size 72 bytes causing
>>aset's code to round allocations up to 128 bytes and generation.c to
>>maintain the 72 bytes then the sort would have stored 385805 tuples
>>over 364 batches for aset and 538761 tuples over 261 batches using the
>>generation context. That would have been a huge win.
> 
>>So it basically looks like I discovered a very bad case that causes a
>>significant slowdown. Yet other cases that are not an exact power of
>>2 stand to gain significantly from this change.
> 
>>One thing 40af10b57 does is stops those terrible performance jumps
>>when the tuple size crosses a power-of-2 boundary. The performance
>>should be more aligned to the size of the data being sorted now...
>>Unfortunately, that seems to mean regressions for large sorts with
>>power-of-2 sized tuples.
> 
> It seems to me that the solution would be to use aset allocations
> 
> when the size of the tuples is power-of-2?
> 
> if (state->sortopt & TUPLESORT_ALLOWBOUNDED ||
> (state->memtupsize & (state->memtupsize - 1)) == 0)
> state->tuplecontext = AllocSetContextCreate(state->sortcontext,
> "Caller tuples",  ALLOCSET_DEFAULT_SIZES);
> else
> state->tuplecontext = GenerationContextCreate(state->sortcontext,
>  "Caller tuples",  ALLOCSET_DEFAULT_SIZES);
> 

I'm pretty sure this is pointless, because memtupsize is the size of the
memtuples array, while the issue is about the size of the tuples. After
all, David was talking about 64B chunks, but the array always has at
least 1024 elements, so it obviously can't be the same thing.

How would we even know how large the tuples will be at this point,
before we even see the first of them?

> I took a look and tried some improvements to see if I had a better result.
> 

IMHO special-casing should be the last resort, because it makes the
behavior much harder to follow. Also, we're talking about sorting here,
but don't other places using a Generation context have the same issue?
It would be preferable to find a fix addressing all those places.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: PG15 beta1 sort performance regression due to Generation context change

2022-05-23 Thread Tomas Vondra



On 5/23/22 22:47, Tom Lane wrote:
> Tomas Vondra  writes:
>> On 5/20/22 12:01, Heikki Linnakangas wrote:
>>> Could the 'context' field be moved from GenerationChunk to GenerationBlock?
> 
>> Not easily, because GetMemoryChunkContext() expects the context to be
>> stored right before the chunk. In principle we could add "get context"
>> callback to MemoryContextMethods, so that different implementations can
>> override that.
> 
> How would you know which context type to consult for that?
> 

D'oh! I knew there has to be some flaw in that idea, but I forgot about
this chicken-or-egg issue.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: PG15 beta1 sort performance regression due to Generation context change

2022-05-23 Thread David Rowley
On Tue, 24 May 2022 at 08:32, Tomas Vondra
 wrote:
>
> On 5/20/22 12:01, Heikki Linnakangas wrote:
> > Could the 'context' field be moved from GenerationChunk to GenerationBlock?
> >
>
> Not easily, because GetMemoryChunkContext() expects the context to be
> stored right before the chunk. In principle we could add "get context"
> callback to MemoryContextMethods, so that different implementations can
> override that.

hmm, but we need to know the context first before we can know which
callback to call.

David




Re: PG15 beta1 sort performance regression due to Generation context change

2022-05-23 Thread Tom Lane
Tomas Vondra  writes:
> On 5/20/22 12:01, Heikki Linnakangas wrote:
>> Could the 'context' field be moved from GenerationChunk to GenerationBlock?

> Not easily, because GetMemoryChunkContext() expects the context to be
> stored right before the chunk. In principle we could add "get context"
> callback to MemoryContextMethods, so that different implementations can
> override that.

How would you know which context type to consult for that?

regards, tom lane




Re: allow building trusted languages without the untrusted versions

2022-05-23 Thread Tom Lane
Robert Haas  writes:
> I definitely think there's a need for a user who can manipulate
> objects in the database much like a superuser (i.e. access all
> objects, grant and revoke all privileges, etc.) but who can't break
> out into the OS user account and assume its privileges. I'm not sure
> whether it's best to try to get there by creating a mode where the
> superuser's privileges are trimmed back, or to get there by still
> having a super-user who is just as super as at present but then also
> have the ability to create other users who are not superusers but have
> many of the same privileges with respect to in-database objects.

Maybe I shouldn't be putting words into Nathan's mouth, but I think
what he is after is a mode intended for use by cloud service providers,
who would like to offer locked-down database services where there's
just no way to get to the disk from inside the DB, superuser or no.

What you're talking about is perhaps interesting to a different set of
people, but it doesn't offer any guarantees because it's always possible
that $attacker manages to hack his way into access to a superuser role.

> It seems to me that you've got to think not only about vectors for
> executing arbitrary C code and/or shell commands, but also the
> superuser's power to mess with the catalogs. If you can UPDATE
> pg_proc, you can certainly hack the system, I think.

I think if all the functions that would let you get to the disk are
disabled at the C-code level, it doesn't much matter what the catalogs
say about them.

The main flaw I'm aware of in that argument is that it used to be possible
for superusers to create C-language pg_proc entries pointing at random C
entry point symbols, eg open(2) or write(2), and then invoke those
functions from SQL --- maybe with only restricted possibilities for the
arguments, but you just need to find one combination that works.
When we got rid of v0 function call support, that became at least far
more difficult to exploit, but I'm not sure if it's entirely impossible.
A component of this exercise would need to be making sure that that's
bulletproof, ie you can't make a usable pg_proc entry that points at
something that wasn't meant to be a SQL-callable function.

> In my mind, it's
> reasonable as a matter of security policy to decide that you don't
> ever want plperlu on your system, only plperl.

Absolutely, but for that you can just not install plperlu's extension
support files.

If you're concerned about whether that decision is un-hackable, then
you soon realize that you need a bulletproof no-disk-access restriction.

regards, tom lane




Re: PG15 beta1 sort performance regression due to Generation context change

2022-05-23 Thread Tomas Vondra
On 5/20/22 12:01, Heikki Linnakangas wrote:
> On 20/05/2022 08:56, David Rowley wrote:
>> The problem is that generation chunks have a larger chunk header than
>> aset do due to having to store the block pointer that the chunk
>> belongs to so that GenerationFree() can increment the nfree chunks in
>> the block. aset.c does not require this as freed chunks just go onto a
>> freelist that's global to the entire context.
> 
> Could the 'context' field be moved from GenerationChunk to GenerationBlock?
> 

Not easily, because GetMemoryChunkContext() expects the context to be
stored right before the chunk. In principle we could add "get context"
callback to MemoryContextMethods, so that different implementations can
override that.

I wonder how expensive the extra redirect would be, but probably not
much because the places touching chunk->context deal with the block too
(e.g. GenerationFree has to tweak block->nfree).
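
For context, the existing lookup is essentially this (simplified from
src/include/utils/memutils.h; the real function adds assertions):

static inline MemoryContext
GetMemoryChunkContext(void *pointer)
{
	/*
	 * The context is always stored immediately before the chunk, whatever
	 * the context type -- which is also why a per-type "get context"
	 * callback is circular: you'd need the context to find the callback.
	 */
	return *(MemoryContext *) (((char *) pointer) - sizeof(void *));
}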


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: allow building trusted languages without the untrusted versions

2022-05-23 Thread Robert Haas
On Mon, May 23, 2022 at 1:17 PM Tom Lane  wrote:
> There would probably be some interest in a "--disable-disk-access"
> configure option that did all of this stuff (and some more things
> too), with the aim of locking down *all* known paths to filesystem
> access.  I don't see much value in retail options that do some of that.
> In fact, what they might mostly accomplish is to give people a false
> sense of security.

I definitely think there's a need for a user who can manipulate
objects in the database much like a superuser (i.e. access all
objects, grant and revoke all privileges, etc.) but who can't break
out into the OS user account and assume its privileges. I'm not sure
whether it's best to try to get there by creating a mode where the
superuser's privileges are trimmed back, or to get there by still
having a super-user who is just as super as at present but then also
have the ability to create other users who are not superusers but have
many of the same privileges with respect to in-database objects.

It seems to me that you've got to think not only about vectors for
executing arbitrary C code and/or shell commands, but also the
superuser's power to mess with the catalogs. If you can UPDATE
pg_proc, you can certainly hack the system, I think. But that isn't
really implied by --disable-disk-access, which makes me think that's
not really the right way of thinking about it. In my mind, it's
reasonable as a matter of security policy to decide that you don't
ever want plperlu on your system, only plperl. And it's reasonable to
decide whether or not you also need some kind of restricted super-user
facility. They're just two different issues.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: allow building trusted languages without the untrusted versions

2022-05-23 Thread Nathan Bossart
On Mon, May 23, 2022 at 02:20:02PM -0400, Tom Lane wrote:
> Nathan Bossart  writes:
>> That's a reasonable point.  I'll go ahead and explore some options for
>> something along those lines.  A couple of questions immediately come to
>> mind.  For example, should this configuration option just cause these
>> functions to ERROR, or should it compile them out?
> 
> Letting them be present but throw error is likely to be far less
> painful than the other way, because then you don't need a separate
> set of SQL-visible object definitions.  You could, in fact, imagine
> jacking up an existing database and driving a set of locked-down
> binaries under it --- or vice versa.  If there have to be different
> versions of the extension SQL files for the two cases then everything
> gets way hairier, both for developers and users.

Agreed.  I'll do it that way.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: allow building trusted languages without the untrusted versions

2022-05-23 Thread Tom Lane
Nathan Bossart  writes:
> That's a reasonable point.  I'll go ahead and explore some options for
> something along those lines.  A couple of questions immediately come to
> mind.  For example, should this configuration option just cause these
> functions to ERROR, or should it compile them out?

Letting them be present but throw error is likely to be far less
painful than the other way, because then you don't need a separate
set of SQL-visible object definitions.  You could, in fact, imagine
jacking up an existing database and driving a set of locked-down
binaries under it --- or vice versa.  If there have to be different
versions of the extension SQL files for the two cases then everything
gets way hairier, both for developers and users.
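
A minimal sketch of the "present but throwing error" approach, using
pg_read_file() as an example and a hypothetical DISABLE_DISK_ACCESS
configure symbol:

Datum
pg_read_file(PG_FUNCTION_ARGS)
{
#ifdef DISABLE_DISK_ACCESS		/* hypothetical configure symbol */
	ereport(ERROR,
			(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
			 errmsg("disk access is disabled in this build")));
	PG_RETURN_NULL();			/* unreachable; keeps the compiler quiet */
#else
	/* ... normal implementation ... */
#endif
}

The pg_proc entry and the extension SQL files stay identical either
way, which is the point.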

regards, tom lane




Re: allow building trusted languages without the untrusted versions

2022-05-23 Thread Nathan Bossart
On Mon, May 23, 2022 at 01:17:08PM -0400, Tom Lane wrote:
> There would probably be some interest in a "--disable-disk-access"
> configure option that did all of this stuff (and some more things
> too), with the aim of locking down *all* known paths to filesystem
> access.  I don't see much value in retail options that do some of that.
> In fact, what they might mostly accomplish is to give people a false
> sense of security.

That's a reasonable point.  I'll go ahead and explore some options for
something along those lines.  A couple of questions immediately come to
mind.  For example, should this configuration option just cause these
functions to ERROR, or should it compile them out?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: fix typos in storing statistics in shared memory

2022-05-23 Thread Nathan Bossart
On Mon, May 23, 2022 at 10:46:22PM +0900, torikoshia wrote:
> It seems like there are some duplications of 'the' in pgstat.c and
> pgstat_internal.h.
> Attaching a tiny patch to fix them.

LGTM

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: allow building trusted languages without the untrusted versions

2022-05-23 Thread Tom Lane
Nathan Bossart  writes:
> On Fri, May 20, 2022 at 08:20:11PM -0400, Tom Lane wrote:
>> Lastly, you've offered no reason to think this would provide any real
>> security improvement.  Someone who's gained the ability to issue CREATE
>> EXTENSION on untrusted extensions has already got all the privileges he
>> needs; leaving out a few extension files is at most going to slow him
>> down a bit on the way to full filesystem access.  (See, eg, COPY TO
>> PROGRAM.)

> I'd like to provide the ability to disallow these other things, too.  This
> is intended to be a first step in that direction.

There would probably be some interest in a "--disable-disk-access"
configure option that did all of this stuff (and some more things
too), with the aim of locking down *all* known paths to filesystem
access.  I don't see much value in retail options that do some of that.
In fact, what they might mostly accomplish is to give people a false
sense of security.

regards, tom lane




Re: allow building trusted languages without the untrusted versions

2022-05-23 Thread Nathan Bossart
On Fri, May 20, 2022 at 08:20:11PM -0400, Tom Lane wrote:
> Nathan Bossart  writes:
>> Presently, if you want to only build trusted PL/Perl and PL/Tcl, you need
>> to make a couple of code changes to compile out the untrusted parts.  I
>> suspect many users (e.g., anyone who wants to disallow file system access)
>> would benefit from a better supported way to do this.  Thus, I've attached
>> some patches that introduce an optional argument for the --with-perl and
>> --with-tcl configuration options.  This new argument can be used to build
>> only the trusted or untrusted version of the language.  If the argument is
>> not provided, both the trusted and untrusted versions are built, so this
>> change is backward compatible.
> 
> I do not believe that this is worth the extra complication.  Nobody has
> ever asked for it before, so I estimate the number of people who would
> use it as near zero, and those folk are entirely capable of removing
> the relevant extension files from their installations.

Of course, if there is no interest in this feature, I'll withdraw the patch
from consideration.  However, I will note that moving the extension files
aside is not sufficient for blocking all use of untrusted languages, since
the symbols for their handler/validator functions will still be present.

> Moreover, if we accept this as a useful configure option, what other
> things will we be called on to change?  It surely makes no sense to
> install contrib/adminpack, for instance, if you're afraid of having
> plperlu installed.
> 
> Lastly, you've offered no reason to think this would provide any real
> security improvement.  Someone who's gained the ability to issue CREATE
> EXTENSION on untrusted extensions has already got all the privileges he
> needs; leaving out a few extension files is at most going to slow him
> down a bit on the way to full filesystem access.  (See, eg, COPY TO
> PROGRAM.)

I'd like to provide the ability to disallow these other things, too.  This
is intended to be a first step in that direction.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: docs: mention "pg_read_all_stats" in "track_activities" description

2022-05-23 Thread Nathan Bossart
On Mon, May 23, 2022 at 08:53:24AM +0900, Michael Paquier wrote:
> On Sun, May 22, 2022 at 01:26:08PM -0700, Nathan Bossart wrote:
>>  ... superusers, roles with privileges of the pg_read_all_stats role,
>>  and roles with privileges of the user owning the session being reported
>>  on (including the session owner).
> 
> Yeah, that sounds better to me.  monitoring.sgml has a different way
> of wording what looks like the same thing for pg_stat_xact_*_tables:
> "Ordinary users can only see all the information about their own
> sessions (sessions belonging to a role that they are a member of)".
> 
> So you could say instead something like: this information is only
> visible to superusers, roles with privileges of the pg_read_all_stats
> role, and the user owning the sessionS being reported on (including
> sessions belonging to a role that they are a member of).

I think we need to be careful about saying "member of" when we really mean
"roles with privileges of."  Unless I am mistaken, role membership alone is
not sufficient for viewing this information.  You also need to inherit the
role's privileges via INHERIT.
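
A minimal illustration:

CREATE ROLE watcher LOGIN NOINHERIT;
GRANT pg_read_all_stats TO watcher;
-- watcher is a member of pg_read_all_stats, but with NOINHERIT it does
-- not automatically exercise that role's privileges; it has to run
-- SET ROLE pg_read_all_stats first.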

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Add --{no-,}bypassrls flags to createuser

2022-05-23 Thread Nathan Bossart
On Mon, May 23, 2022 at 11:55:43PM +0900, Shinya Kato wrote:
> On 2022-05-23 16:29, Michael Paquier wrote:
>> May I ask for the addition of tests when one specifies multiple
>> switches for --admin and --member?  This would check the code path
>> where you build a list of role names.  You could check fancier string
>> patterns, while on it, to look after the use of fmtId(), say with
>> role names that include whitespaces or such.
> 
> Thanks!
> I changed to the test that describes multiple "-m".
> It seems to be working without any problems, how about it?

Michael also requested a test for multiple -a switches and for fancier
string patterns.  Once that is taken care of, I think this can be marked as
ready-for-committer.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com




Re: Support logical replication of DDLs

2022-05-23 Thread Zheng Li
> Now, say, the user has added a bar column with "ALTER TABLE foo ADD
> COLUMN bar double precision NOT NULL DEFAULT random();" If we compare
> with replication of DMLs like (UPDATE ddl_test SET bar = random();),
> the replication won't update rows with values (3 and 4) on subscriber
> as they don't exist on the publisher. However, if we follow the same
> here for DDL replication of Alter, it will fail because of NOT NULL
> constraint. So, it seems we should update all the existing rows on the
> subscriber to make replication of such constraints successful. It
> seems that IBM's replication solution allows replication of such DDLs
> and does update all existing rows on the target table [1][2].
>
> I think it would be tricky to update the rows in subscriber that
> doesn't exist in the publisher as we need to distinguish/find such
> rows during apply but I think we can find some way to achieve this if
> we decide to go this way.

I think the behavior that makes most sense here is to replicate the default
values of the new column for existing rows from the publisher, and generate
default values for the additional rows on the subscriber.

> We can also conclude that we want to restrict the replication of Alter
> Table for such cases but as other databases seem to support this, I
> think it is worth trying to support such an operation. If it turns out
> to be too complex or not at all feasible then we can always exclude it
> from the first version.

I think it might be OK to exclude such cases (failing the apply worker
when they occur) for the first version. I've spent some time on it and
it seems to be a high-effort, low-reward task at the moment.

Regards,
Zheng


Re: WIP Patch: Precalculate stable functions, infrastructure v1

2022-05-23 Thread David Geier
Hi hackers!

I would like to revive this thread. At ServiceNow we regularly encounter
queries that are much slower than they need to be because of frequent
calls to uncached stable functions with constant arguments (mostly
to_date()). We've seen queries get more than 8x faster simply by
temporarily changing to_date() from stable to immutable.
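
For illustration, the problematic pattern looks like this ("orders" is
a hypothetical table; to_date() is stable, so the planner cannot
constant-fold it, and depending on the plan it may be re-evaluated at
run time rather than computed once):

SELECT count(*)
  FROM orders
 WHERE created_at < to_date('20220101', 'YYYYMMDD');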

I would be glad to help bring this effort forward. Is there more work
left on the patch than rebasing it onto latest master?
@Marina: do you have any plans to continue with this?

For reference here are all existing mailing list discussions I could find
on this topic:

- [WIP] Caching constant stable expressions per execution (Marti, 2011),
https://www.postgresql.org/message-id/flat/CABRT9RC-1wGxZC_Z5mwkdk70fgY2DRX3sLXzdP4voBKuKPZDow%40mail.gmail.com
- Caching for stable expressions with constant arguments v6 (Marti, 2012),
https://www.postgresql.org/message-id/flat/CABRT9RA-RomVS-yzQ2wUtZ=m-ev61lcbrl1p1j3jydpsttf...@mail.gmail.com
- WIP Patch: Precalculate stable functions (Marina, 2017),
https://www.postgresql.org/message-id/flat/ba261b9fc25dea4069d8ba9a8fcad...@postgrespro.ru
- WIP Patch: Precalculate stable functions, infrastructure v1 (Marina,
2017),
https://www.postgresql.org/message-id/flat/da87bb6a014e029176a04f6e50033cfb%40postgrespro.ru

--
David Geier
(ServiceNow)

On Mon, 23 May 2022 at 17:06, Andres Freund  wrote:

> On 2018-11-29 18:00:15 +0100, Dmitry Dolgov wrote:
> > > On Tue, Oct 2, 2018 at 4:22 AM Michael Paquier 
> wrote:
> > >
> > > On Thu, May 24, 2018 at 04:00:33PM +0300, Marina Polyakova wrote:
> > > > Here there's a 9-th version of the patches for the precalculation of
> stable
> > > > or immutable functions, stable or immutable operators and other
> nonvolatile
> > > > expressions. This is a try to execute cached expressions as
> PARAM_EXEC,
> > > > thanks to the comments of Tom Lane and Andres Freund [1].
> > >
> > > Please note that v9-0004 fails to apply, so a rebase is needed.  This
> > > patch is moved to next CF, waiting on author.
> >
> > Unfortunately, patch still has some conflicts, could you please post an
> updated
> > version?
>
> As nothing has happened since, I'm marking this as returned with
> feedback.
>
> Greetings,
>
> Andres Freund
>
>
>
>


Re: Add --{no-,}bypassrls flags to createuser

2022-05-23 Thread Shinya Kato

On 2022-05-23 16:29, Michael Paquier wrote:

+$node->issues_sql_like(
+   [ 'createuser', 'regress_role2', '-a', 'regress_user1' ],
+	qr/statement: CREATE ROLE regress_role2 NOSUPERUSER NOCREATEDB 
NOCREATEROLE INHERIT LOGIN ADMIN regress_user1;/,
+	'add a role as a member with admin option of the newly created 
role');

+$node->issues_sql_like(
+   [ 'createuser', 'regress_role3', '-m', 'regress_user1' ],
+	qr/statement: CREATE ROLE regress_role3 NOSUPERUSER NOCREATEDB 
NOCREATEROLE INHERIT LOGIN ROLE regress_user1;/,

+   'add a role as a member of the newly created role');


May I ask for the addition of tests when one specifies multiple
switches for --admin and --member?  This would check the code path
where you build a list of role names.  You could check fancier string
patterns, while on it, to look after the use of fmtId(), say with
role names that include whitespaces or such.


Thanks!
I changed to the test that describes multiple "-m".
It seems to be working without any problems, how about it?


--
Regards,

--
Shinya Kato
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

diff --git a/doc/src/sgml/ref/createuser.sgml b/doc/src/sgml/ref/createuser.sgml
index 17579e50af..27efebbfe2 100644
--- a/doc/src/sgml/ref/createuser.sgml
+++ b/doc/src/sgml/ref/createuser.sgml
@@ -76,6 +76,20 @@ PostgreSQL documentation
   
  
 
+ 
+  -a role
+  --admin=role
+  
+   
+Indicates role that will be immediately added as a member of the new
+role with admin option, giving it the right to grant membership in the
+new role to others.  Multiple roles to add as members (with admin
+option) of the new role can be specified by writing multiple
+-a switches.
+   
+  
+ 
+
  
   -c number
   --connection-limit=number
@@ -204,6 +218,18 @@ PostgreSQL documentation
   
  
 
+ 
+  -m role
+  --member=role
+  
+   
+Indicates role that will be immediately added as a member of the new
+role.  Multiple roles to add as members of the new role can be specified
+by writing multiple -m switches.
+   
+  
+ 
+
  
   -P
   --pwprompt
@@ -258,6 +284,17 @@ PostgreSQL documentation
   
  
 
+ 
+  -v timestamp
+  --valid-until=timestamp
+  
+   
+Set a date and time after which the role's password is no longer valid.
+The default is to set no password expiry date.
+   
+  
+ 
+
  
-V
--version
@@ -290,6 +327,25 @@ PostgreSQL documentation
   
  
 
+ 
+  --bypassrls
+  
+   
+The new user will bypass every row-level security (RLS) policy.
+   
+  
+ 
+
+ 
+  --no-bypassrls
+  
+   
+The new user will not bypass row-level security (RLS) policies.  This is
+the default.
+   
+  
+ 
+
  
-?
--help
diff --git a/src/bin/scripts/createuser.c b/src/bin/scripts/createuser.c
index bfba0d09d1..7e3e61a9c8 100644
--- a/src/bin/scripts/createuser.c
+++ b/src/bin/scripts/createuser.c
@@ -51,6 +51,11 @@ main(int argc, char *argv[])
 		{"connection-limit", required_argument, NULL, 'c'},
 		{"pwprompt", no_argument, NULL, 'P'},
 		{"encrypted", no_argument, NULL, 'E'},
+		{"bypassrls", no_argument, NULL, 4},
+		{"no-bypassrls", no_argument, NULL, 5},
+		{"valid-until", required_argument, NULL, 'v'},
+		{"member", required_argument, NULL, 'm'},
+		{"admin", required_argument, NULL, 'a'},
 		{NULL, 0, NULL, 0}
 	};
 
@@ -62,6 +67,8 @@ main(int argc, char *argv[])
 	char	   *port = NULL;
 	char	   *username = NULL;
 	SimpleStringList roles = {NULL, NULL};
+	SimpleStringList members = {NULL, NULL};
+	SimpleStringList admins = {NULL, NULL};
 	enum trivalue prompt_password = TRI_DEFAULT;
 	ConnParams	cparams;
 	bool		echo = false;
@@ -69,6 +76,7 @@ main(int argc, char *argv[])
 	int			conn_limit = -2;	/* less than minimum valid value */
 	bool		pwprompt = false;
 	char	   *newpassword = NULL;
+	char	   *pwexpiry = NULL;
 
 	/* Tri-valued variables.  */
 	enum trivalue createdb = TRI_DEFAULT,
@@ -76,7 +84,8 @@ main(int argc, char *argv[])
 createrole = TRI_DEFAULT,
 inherit = TRI_DEFAULT,
 login = TRI_DEFAULT,
-replication = TRI_DEFAULT;
+replication = TRI_DEFAULT,
+bypassrls = TRI_DEFAULT;
 
 	PQExpBufferData sql;
 
@@ -89,7 +98,7 @@ main(int argc, char *argv[])
 
 	handle_help_version_opts(argc, argv, "createuser", help);
 
-	while ((c = getopt_long(argc, argv, "h:p:U:g:wWedDsSrRiIlLc:PE",
+	while ((c = getopt_long(argc, argv, "h:p:U:g:wWedDsSrRiIlLc:PEv:m:a:",
 			long_options, &optindex)) != -1)
 	{
 		switch (c)
@@ -165,6 +174,21 @@ main(int argc, char *argv[])
 			case 3:
 interactive = true;
 break;
+			case 4:
+bypassrls = TRI_YES;
+break;
+			case 5:
+bypassrls = TRI_NO;
+break;
+			case 'v':
+pwexp

Re: Enforce "max_wal_size/ min_wal_size must be at least twice wal_segment_size" limit while setting GUCs

2022-05-23 Thread Tom Lane
Bharath Rupireddy  writes:
> Thanks. Having check hooks for min_wal_size and max_wal_size that
> throw errors if they aren't at least twice the wal_segment_size has a
> "BIG" problem - ./initdb -D data --wal-segsize=1 (or a value < 16)
> fails.

In general, you can't do that (i.e. try to enforce constraints between
GUC values via check hooks).  It's been tried in the past and failed
miserably, because the hooks can't know whether the other value is
about to be changed.
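
For illustration, here is the shape of the hook that runs into this
(a simplified sketch, not the actual patch):

static bool
check_min_wal_size(int *newval, void **extra, GucSource source)
{
	/*
	 * Unreliable: wal_segment_size may still hold the compiled-in default
	 * (e.g. the control file hasn't been read yet), or it may be about to
	 * change in the same configuration load.
	 */
	int			segsize_mb = wal_segment_size / (1024 * 1024);

	return *newval >= 2 * segsize_mb;
}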

regards, tom lane




fix typos in storing statistics in shared memory

2022-05-23 Thread torikoshia

Hi,

It seems like there are some duplications of 'the' in pgstat.c and 
pgstat_internal.h.

Attaching a tiny patch to fix them.

--
Regards,

--
Atsushi Torikoshi
NTT DATA CORPORATION

From fabf3bc5da8e5ebd4f600b6ba762d4413cbdcd6b Mon Sep 17 00:00:00 2001
From: Atsushi Torikoshi 
Date: Mon, 23 May 2022 17:11:54 +0900
Subject: [PATCH v1] Fix duplications of 'the' in pgstat.c and
 pgstat_internal.h

---
 src/backend/utils/activity/pgstat.c | 4 ++--
 src/include/utils/pgstat_internal.h | 6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/src/backend/utils/activity/pgstat.c b/src/backend/utils/activity/pgstat.c
index 3c3fd0e9b7..05f2d9e9bd 100644
--- a/src/backend/utils/activity/pgstat.c
+++ b/src/backend/utils/activity/pgstat.c
@@ -32,9 +32,9 @@
  * backend-local hashtable (pgStatEntryRefHash) in front of the shared
  * hashtable, containing references (PgStat_EntryRef) to shared hashtable
  * entries. The shared hashtable only needs to be accessed when no prior
- * reference is found in the local hashtable. Besides pointing to the the
+ * reference is found in the local hashtable. Besides pointing to the
  * shared hashtable entry (PgStatShared_HashEntry) PgStat_EntryRef also
- * contains a pointer to the the shared statistics data, as a process-local
+ * contains a pointer to the shared statistics data, as a process-local
  * address, to reduce access costs.
  *
  * The names for structs stored in shared memory are prefixed with
diff --git a/src/include/utils/pgstat_internal.h b/src/include/utils/pgstat_internal.h
index 0eccaea6c1..7d13310fe6 100644
--- a/src/include/utils/pgstat_internal.h
+++ b/src/include/utils/pgstat_internal.h
@@ -296,9 +296,9 @@ static const char *const slru_names[] = {
  * values in a copy of the stats data, which is protected by ->lock. See
  * pgstat_fetch_stat_(archiver|bgwriter|checkpointer) for the reader side.
  *
- * The only exception to that is the the stat_reset_timestamp in these
- * structs, which is protected by ->lock, because it has to be written by
- * another backend while resetting
+ * The only exception to that is the stat_reset_timestamp in these structs,
+ * which is protected by ->lock, because it has to be written by another
+ * backend while resetting
  * --
  */
 
-- 
2.27.0



Re: Zstandard support for toast compression

2022-05-23 Thread Robert Haas
On Fri, May 20, 2022 at 4:17 PM Stephen Frost  wrote:
> A thought I've had before is that it'd be nice to specify a particular
> compression method on a data type basis.  Wasn't the direction that this
> was taken, for reasons, but I wonder about perhaps still having a data
> type compression method and perhaps one of these bits might be "the data
> type's (default?) compression method".  Even so though, having an
> extensible way to add new compression methods would be a good thing.

If we look at pglz vs. LZ4, there's no argument that it makes more
sense to use LZ4 for some data types and PGLZ for others. Indeed, it's
unclear why you would ever use PGLZ if you had LZ4 as an option. Even
if we imagine a world in which a full spectrum of modern compressors -
Zstandard, bzip2, gzip, and whatever else you want - it's basically a
time/space tradeoff. You will either want a fast compressor or a good
one.

The situation in which this sort of thing might make sense is if we
had a compressor that is specifically designed to work well on a
certain data type, and especially if the code for that data type could
perform some operations directly on the compressed representation.
From what I understand, the ideas that people have in this area around
jsonb require that there be a dictionary available. For instance, you
might scan a jsonb column, collect all the keys that occur frequently,
put them in a dictionary, and then use them to compress the column. I
can see that being effective, but the infrastructure to store that
dictionary someplace is infrastructure we have not got.

It may be better to try to handle these use cases by building the
compression into the data type representation proper, perhaps
disabling the general-purpose TOAST compression stuff, rather than by
making it part of TOAST compression. We found during the
implementation of LZ4 TOAST compression that it's basically impossible
to keep a compressed datum from "leaking out" into other parts of the
system. We have to assume that any datum we create by TOAST
compression may continue to exist somewhere in the system long after
the table in which it was originally stored is gone. So, while a
dictionary could be used for compression, it would have to be done in
a way where that dictionary wasn't required to decompress, unless
we're prepared to prohibit ever dropping a dictionary, which sounds
like not a lot of fun. If the compression were part of the data type
instead of part of TOAST compression, we would dodge this problem.

I think that might be a better way to go.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Zstandard support for toast compression

2022-05-23 Thread Robert Haas
On Mon, May 23, 2022 at 12:33 AM Michael Paquier  wrote:
> On Thu, May 19, 2022 at 04:12:01PM -0400, Robert Haas wrote:
> > On Thu, May 19, 2022 at 4:20 AM Michael Paquier  wrote:
> >> Btw, shouldn't we have something a bit more, err, extensible for the
> >> design of an extensible varlena header?  If we keep it down to some
> >> bitwise information, we'd be fine for a long time but it would be
> >> annoying to review again an extended design if we need to extend it
> >> with more data.
> >
> > What do you have in mind?
>
> A per-varlena checksum was one thing that came into my mind.

It's a bit hard for me to believe that such a thing would be
desirable. I think it makes more sense to checksum blocks than datums,
because:

(1) There might be a lot of really small datums, and storing checksums
for all of them could be costly, or
(2) The datums could on the other hand be really big, and then the
checksum is pretty non-specific about where the problem has happened.

YMMV, of course.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: Limiting memory allocation

2022-05-23 Thread Jan Wieck

On 5/20/22 19:08, Tomas Vondra wrote:

Well, we already have the memory-accounting built into the memory
context infrastructure. It kinda does the same thing as the malloc()
wrapper, except that it does not publish the information anywhere and
it's per-context (so we have to walk the context recursively).

So maybe we could make this on-request somehow? Say, we'd send a signal to
the process, and it'd run MemoryContextMemAllocated() on the top memory
context and store the result somewhere.


One remaining problem with all this is that we don't get any feedback
from calling free() telling us whether any memory has been returned to
the OS or not.


How portable would using sbrk() with a zero size be? If that is an
option, then I could envision counting actual calls to malloc() and,
whenever a GUC-configurable amount is reached, using sbrk() to find out
how much memory is actually in use, doing the accounting in shared
memory, and reacting accordingly.
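
A rough sketch of such a probe (glibc/Linux behavior assumed; note
that large requests are served by mmap() and never move the break, and
some platforms deprecate sbrk() entirely, which bears on the
portability question):

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(void)
{
	char	   *before = sbrk(0);		/* current program break */
	void	   *p = malloc(64 * 1024);	/* small enough to come from brk */
	char	   *after_alloc = sbrk(0);

	free(p);
	printf("break grew by %ld bytes; free() moved it by %ld\n",
		   (long) (after_alloc - before),
		   (long) ((char *) sbrk(0) - after_alloc));
	return 0;
}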


This way we would not be creating another highly contended lock, and we
would be using authoritative information.



Regards, Jan




Re: Enforce "max_wal_size/ min_wal_size must be at least twice wal_segment_size" limit while setting GUCs

2022-05-23 Thread Bharath Rupireddy
On Mon, May 23, 2022 at 10:45 AM Kyotaro Horiguchi
 wrote:
>
> At Sat, 21 May 2022 19:08:06 +0530, Bharath Rupireddy 
>  wrote in
> > How about we add GUC check hooks for both max_wal_size and
> > min_wal_size where we can either emit ERROR or WARNING if values are
> > not "at least twice as wal_segment_size"?
>
> It should be ERROR.
>
> As you say, it should have been changed when the unit of them is
> changed to MB and wal_segment_size became variable.

Thanks. Having check hooks for min_wal_size and max_wal_size that
throw errors if they aren't at least twice the wal_segment_size has a
"BIG" problem - ./initdb -D data --wal-segsize=1 (or a value < 16)
fails. This is because during the bootstrap mode the min_wal_size is
calculated using the supplied wal-segsize and written to
postgresql.conf file, but in the "post-bootstrap initialization" in
single user mode, the min_wal_size's default value is calculated as
80MB using default wal_segment_size 16MB
(PostmasterMain->InitializeGUCOptions->InitializeOneGUCOption->check_hook)
as wal_segment_size isn't read from control file yet, see [1] and [2]
for reference.

Maybe we have a fundamental problem here in that single-user mode
doesn't read the control file. I have no further thoughts to offer at
the moment.

[1]
foobaralicebob@foobaralicebob:~/postgres/inst/bin$ ./initdb -D data
--wal-segsize=1
The files belonging to this database system will be owned by user
"foobaralicebob".
This user must also own the server process.

The database cluster will be initialized with locale "C.UTF-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

creating directory data ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... Etc/UTC
creating configuration files ... ok
running bootstrap script ... 2022-05-23 11:57:35.999 GMT [3277331]
LOG:  min_wal_size 80, wal_segment_size 16777216
2022-05-23 11:57:36.000 GMT [3277331] LOG:  min_wal_size 5,
wal_segment_size 1048576
ok
performing post-bootstrap initialization ... 2022-05-23 11:57:36.178
GMT [3277333] LOG:  min_wal_size 80, wal_segment_size 16777216
2022-05-23 11:57:36.179 GMT [3277333] LOG:  min_wal_size 5,
wal_segment_size 16777216
2022-05-23 11:57:36.179 GMT [3277333] LOG:  "min_wal_size" must be at
least twice "wal_segment_size"
2022-05-23 11:57:36.179 UTC [3277333] FATAL:  configuration file
"/home/foobaralicebob/postgres/inst/bin/data/postgresql.conf" contains
errors
child process exited with exit code 1
initdb: removing data directory "data"

[2]
(gdb) bt
#0  0x7f879105cfaa in __GI___select (nfds=0, readfds=0x0,
writefds=0x0, exceptfds=0x0, timeout=0x7ffd31e040c0) at
../sysdeps/unix/sysv/linux/select.c:41
#1  0x556cee068326 in pg_usleep (microsec=100) at pgsleep.c:56
#2  0x556ced9cc06e in check_min_wal_size (newval=0x7ffd31e04240,
extra=0x7ffd31e04248, source=PGC_S_FILE) at xlog.c:4327
#3  0x556cee016e2e in call_int_check_hook (conf=0x556cee365c58
, newval=0x7ffd31e04240, extra=0x7ffd31e04248,
source=PGC_S_FILE, elevel=15) at guc.c:11786
#4  0x556cee00eb28 in parse_and_validate_value
(record=0x556cee365c58 , name=0x556cef5a9778
"min_wal_size", value=0x556cef5a97a0 "5MB", source=PGC_S_FILE,
elevel=15, newval=0x7ffd31e04240, newextra=0x7ffd31e04248) at guc.c:7413
#5  0x556cee00f908 in set_config_option (name=0x556cef5a9778
"min_wal_size", value=0x556cef5a97a0 "5MB", context=PGC_POSTMASTER,
source=PGC_S_FILE, action=GUC_ACTION_SET,
changeVal=true, elevel=15, is_reload=false) at guc.c:7922
#6  0x556cee01b1b2 in ProcessConfigFileInternal
(context=PGC_POSTMASTER, applySettings=true, elevel=15) at
guc-file.l:441
#7  0x556cee01ab2d in ProcessConfigFile (context=PGC_POSTMASTER)
at guc-file.l:155
#8  0x556cee00c859 in SelectConfigFiles (userDoption=0x0,
progname=0x556cef584eb0 "postgres") at guc.c:6196
#9  0x556cede249c6 in PostgresSingleUserMain (argc=12,
argv=0x556cef585800, username=0x556cef58cc60 "foobaralicebob") at
postgres.c:3991
#10 0x556cedc34a72 in main (argc=12, argv=0x556cef585800) at main.c:199
(gdb)

Regards,
Bharath Rupireddy.




Re: Support logical replication of DDLs

2022-05-23 Thread Amit Kapila
On Thu, Apr 14, 2022 at 7:45 PM Euler Taveira  wrote:
>
> You should forbid it. Unless you can decompose the command into multiple SQL
> commands to make it a safe operation for logical replication.
>
> Let's say you want to add a column with a volatile default.
>
> ALTER TABLE foo ADD COLUMN bar double precision DEFAULT random();
>
> If you replicate the DDL command as is, you will have different data
> downstream. You should forbid it. However, this operation can be supported if
> the DDL command is decomposed in multiple steps.
>
> -- add a new column without DEFAULT to avoid rewrite
> ALTER TABLE foo ADD COLUMN bar double precision;
>
> -- future rows could use the DEFAULT expression
> -- it also doesn't rewrite the table
> ALTER TABLE foo ALTER COLUMN bar SET DEFAULT random();
>
> -- it effectively rewrites the table
> -- all rows are built from one source node
> -- data will be the same on all nodes
> UPDATE foo SET bar = random();
>

While thinking about this, I see more to it than this. Say, we are
able to decompose/split the DDL command with the help of deparser, do
we want to update the additional rows on the subscriber that didn't
exist on the publisher? For example,

A table on the publisher side has rows:
ddl_test(foo)
a

1
2

The same table on the subscriber side has rows:
ddl_test(foo)
a

1
2
3
4

Now, say, the user has added a bar column with "ALTER TABLE foo ADD
COLUMN bar double precision NOT NULL DEFAULT random();" If we compare
with replication of DMLs like (UPDATE ddl_test SET bar = random();),
the replication won't update rows with values (3 and 4) on subscriber
as they don't exist on the publisher. However, if we follow the same
here for DDL replication of Alter, it will fail because of NOT NULL
constraint. So, it seems we should update all the existing rows on the
subscriber to make replication of such constraints successful. It
seems that IBM's replication solution allows replication of such DDLs
and does update all existing rows on the target table [1][2].

I think it would be tricky to update the rows in subscriber that
doesn't exist in the publisher as we need to distinguish/find such
rows during apply but I think we can find some way to achieve this if
we decide to go this way.

We can also conclude that we want to restrict the replication of Alter
Table for such cases but as other databases seem to support this, I
think it is worth trying to support such an operation. If it turns out
to be too complex or not at all feasible then we can always exclude it
from the first version.

Thoughts?

[1] - 
https://www.ibm.com/docs/en/idr/10.2.1?topic=replication-adding-existing-columns-subscription-unidirectional
(... Columns are added to the target table with the same data type,
null characteristic, and default value as the matching columns in the
source table... Rows that existed in the target table before the new
column is added will have a NULL or default value for the new column.)
[2] - 
https://www.ibm.com/docs/en/idr/11.4.0?topic=replication-alter-add-column-command-multidirectional

-- 
With Regards,
Amit Kapila.




Re: Allow file inclusion in pg_hba and pg_ident files

2022-05-23 Thread Julien Rouhaud
Hi,

On Mon, May 23, 2022 at 03:53:06PM +0900, Michael Paquier wrote:
>
> +   switch (kind)
> +   {
> +   case SecondaryAuthFile:
> +   msglog = "could not open secondary authentication file 
> \"@%s\" as \"%s\": %m";
> +   msgview = "could not open secondary authentication file 
> \"@%s\" as \"%s\": %s";
> +   break;
> +   case IncludedAuthFile:
> +   msglog = "could not open included authentication file \"%s\" 
> as \"%s\": %m";
> +   msgview = "could not open included authentication file \"%s\" 
> as \"%s\": %s";
> +   break;
> +   default:
> +   elog(ERROR, "unknown HbaIncludeKind: %d", kind);
> +   break;
> +   }
> I don't think that HbaIncludeKind is necessary, considering that we
> could rely on the file name to provide enough context about the type
> involved in the error string generated.

I'm not sure it would always be the case.  For instance, why wouldn't someone
use something like

peer all @app_users ident

rather than

@include app_users

or a mix of both?

That being said, I'd gladly drop that enum and only handle a single error
message, as the rest of the error context (including the owning file name and
line) should provide enough information to users.

If so, should I use "included authentication file" everywhere, or use something
else?

> The "default" clause in the
> switch could just be removed, btw, to generate warnings if a new value
> is added in the kind enum.

Right.

> +   /* relative path is relative to dir of calling file */
> There are many relatives here.  It seems to me that this is the kind
> of behavior we should document precisely

Agreed.  I will try to mimic how it's done for
https://www.postgresql.org/docs/current/config-setting.html#CONFIG-INCLUDES.

> (and test!).

Sure.  As I said previously I'm waiting for some agreement on the feature (and
the syntax), before adding tests.  After that I will definitely include full
test coverage!

> I am
> understanding the following for cascading configurations:
> - pg_hba.conf has "include_dir path1".
> - path1/ has file1.conf that has "include file2.conf".  This means
> that file2.conf has to be also included in path1/.

Right, same rules as for postgresql.conf's include_dir.
>
> postmaster.c and postinit.c do that:
> /*
>  * Load configuration files for client authentication.
>  */
> if (!load_hba())
> {
> /*
>  * It makes no sense to continue if we fail to load the HBA file,
>  * since there is no way to connect to the database in this case.
>  */
> ereport(FATAL,
> (errmsg("could not load pg_hba.conf")));
> }
> This could be confusing as a different file may fail to load, while
> pg_hba.conf was able to do its work outside an include clause.

Well, even included files are still part of the configuration, and a
failure in any of them leads to a failure to load the main file.

The detail should be provided in the logs so it should disambiguate the
situation.  While you're mentioning this message, AFAICS it can already be
entirely wrong as-is, as it doesn't take into account the hba_file
configuration:

ALTER SYSTEM SET hba_file = '/tmp/myfile';

After a restart:
2022-05-23 18:52:08.081 CST [41890] LOG:  could not open configuration file
"/tmp/myfile": No such file or directory
2022-05-23 18:52:08.081 CST [41890] FATAL:  could not load pg_hba.conf

> How does include_dir work with the ordering of the HBA entries across
> multiple files?

Bugs apart, same as for postgresql.conf's include_dir.

> The set of HBA rules are very sensitive to their
> ordering, and ReadDir() depends on readdir(), which provides a list
> of files depending on its FS implementation.  That does not seem like
> a sane concept to me.  I can this this order (tracked by rule_number)
> being strictly enforced when it comes to the loading of files, though,
> so I would recommend to focus on the implementation of "include"
> rather than "include_dir".

But the same problem already exists for the postgresql.conf's include_dir.

Having an HBA rule masked by another isn't really better than having a
GUC value overridden by another one. Usually people rely on careful
naming (like 01_file.conf and so on) to get predictable behavior, and we
even document that very thoroughly for the postgresql.conf part.
Improving the documentation, as you already pointed out a bit above,
should be enough?
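
For instance, with the include_dir syntax being discussed, the ordering
could be made explicit the same way (file names are illustrative):

# pg_hba.conf
include_dir hba.d

# hba.d/01_admins.conf -- sorts first, so its rules are matched first
local   all   postgres                   peer

# hba.d/02_apps.conf
host    all   all        10.0.0.0/8      scram-sha-256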

> +  
> +   Rule number, in priority order, of this rule if the rule is valid,
> +   otherwise null
> +  
> This is a very important field.  I think that this explanation should
> be extended and explain why relying on this number counts (aka this is
> the order of the rules loaded across files).  Btw, this could be added
> as a separate patch, even if this maps to the line number when it
> comes to the ordering.

Agreed, I will improve the documentation to outline the importance of
this ordering.

Re: create_help.pl treats as replaceable

2022-05-23 Thread Peter Eisentraut

On 19.05.22 04:12, Kyotaro Horiguchi wrote:

At Wed, 18 May 2022 18:23:57 +0200, Peter Eisentraut 
 wrote in

"Boolean" is correct; see  for
example.


Ok, so, don't we in turn need to replace "boolean"s with "Boolean"?

"only boolean operators can have negators"
"only boolean operators can have restriction selectivity"
...

And I'm not sure what to do with "bool". Should it be "Boolean" instead,
for the sake of uniformity?

errmsg("only bool, numeric, and text types could be "


The SQL data type is called BOOLEAN, and we typically lower-case type 
names in PostgreSQL, so messages should be like


column %s should be of type integer
column %s should be of type boolean

As an adjective, not a type, it should be spelled Boolean, because 
that's how it's in the dictionary (cf. Gaussian).


%s should have a string value
%s should have a Boolean value

"bool" should normally not appear in user-facing messages, unless we are 
dealing with internal type names (cf. int4) or C types.


Of course, the lines between all of the above are blurry.




Re: create_help.pl treats as replaceable

2022-05-23 Thread Peter Eisentraut

On 18.05.22 18:29, Tom Lane wrote:

I think you should have upper-cased MATCH while at it, to make it clear
that it acts like a keyword in this context.  The current situation is
quite unreadable in plain-ASCII output:

regression=# \help copy
Command: COPY
...
 HEADER [ boolean | match ]
...

Since "boolean" is a metasyntactic variable here, it's absolutely
not obvious that "match" isn't.


done




Re: Frontend error logging style

2022-05-23 Thread Peter Eisentraut

On 29.03.22 16:24, Peter Eisentraut wrote:
I think I would want the program name/location also in front of the 
detail and hint lines.  I need to think about this a bit more.  This 
shouldn't hold up this patch; it would be a quick localized change.


After experiencing this for a bit now, I propose to make this change.
It lines up better and allows easier filtering of messages by program.

Example:

Before:

initdb: error: locale "zh_HK.Big5HKSCS" requires unsupported encoding "BIG5"
detail: Encoding "BIG5" is not allowed as a server-side encoding.
hint: Rerun initdb with a different locale selection.

After:

initdb: error: locale "zh_HK.Big5HKSCS" requires unsupported encoding "BIG5"
initdb: detail: Encoding "BIG5" is not allowed as a server-side encoding.
initdb: hint: Rerun initdb with a different locale selection.

From 565fe56ebf9449cd9143424ef1b8083d99019d75 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Mon, 23 May 2022 11:44:32 +0200
Subject: [PATCH] logging: Also add the command prefix to detail and hint
 messages

This makes the output line up better and allows filtering messages by
command.
---
 src/bin/pg_dump/t/003_pg_dump_with_server.pl | 2 +-
 src/common/logging.c | 3 +--
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/src/bin/pg_dump/t/003_pg_dump_with_server.pl b/src/bin/pg_dump/t/003_pg_dump_with_server.pl
index a0b23aae0f..8cc9da0659 100644
--- a/src/bin/pg_dump/t/003_pg_dump_with_server.pl
+++ b/src/bin/pg_dump/t/003_pg_dump_with_server.pl
@@ -30,7 +30,7 @@
 
 command_fails_like(
[ "pg_dump", '-p', $port, '--include-foreign-data=s0', 'postgres' ],
-   qr/foreign-data wrapper \"dummy\" has no handler\r?\ndetail: Query was: .*t0/,
+   qr/foreign-data wrapper \"dummy\" has no handler\r?\npg_dump: detail: Query was: .*t0/,
"correctly fails to dump a foreign table from a dummy FDW");
 
 command_ok(
diff --git a/src/common/logging.c b/src/common/logging.c
index 0b5bcb1a17..64604c5209 100644
--- a/src/common/logging.c
+++ b/src/common/logging.c
@@ -246,8 +246,7 @@ pg_log_generic_v(enum pg_log_level level, enum pg_log_part part,
 
fmt = _(fmt);
 
-   if (part == PG_LOG_PRIMARY &&
-   (!(log_flags & PG_LOG_FLAG_TERSE) || filename))
+   if (!(log_flags & PG_LOG_FLAG_TERSE) || filename)
{
if (sgr_locus)
fprintf(stderr, ANSI_ESCAPE_FMT, sgr_locus);
-- 
2.36.1



Re: psql now shows zero elapsed time after an error

2022-05-23 Thread Fabien COELHO



Probably it would be appropriate to add a test case. I'll propose something 
later.


committed with a test


Thanks!

--
Fabien.




Re: PG15 beta1 fix pg_database view document

2022-05-23 Thread Peter Eisentraut

On 23.05.22 06:25, Michael Paquier wrote:

Indeed.  I have checked the rest of the catalog headers for any
inconsistencies with the docs and what you have noticed here is the
only one.


Committed.  Thanks for checking.


+   datlocprovider char
+  
+  
+   Provider of the collation for this database:
c = libc, i = icu
+  

ICU needs to be upper-case if you are referring to the library name.
Here, my guess is that you are referring to the value that can be
passed down to the command?  You could use lower-case terms like on
the CREATE COLLATION page, but I would put these within a 
markup.  Anyway, this formulation is incorrect.


This is presumably copied from pg_collation.collprovider, which uses the 
same markup.  If we want to improve it, they should be changed together.





Use Dblink without column definition

2022-05-23 Thread Chirag Karkera
Hi Team,

I appreciate you taking the time to look into this.

To select from another database I try to use dblink or fdw extension of
Postgres, like this:

mesods=> CREATE EXTENSION dblink;
CREATE EXTENSION
mesods=> CREATE EXTENSION postgres_fdw;
CREATE EXTENSION

mesods=> select dblink_connect('conn_db_link','foreign_server');
 dblink_connect
----------------
 OK
(1 row)



mesods=> select * from dblink('foreign_server','select * from ods_sch.emp')
AS x(a int,b text);
 a |    b
---+---------
 1 | Gohan
 1 | Piccolo
 1 | Tien
(3 rows)


This works fine when I specify which columns I want to select.


Does Postgres provide something that lets us fetch the data from dblink
without specifying the column names?


Awaiting your reply.


Thank you.


Regards,

Chirag Karkera


Re: psql now shows zero elapsed time after an error

2022-05-23 Thread Peter Eisentraut

On 10.05.22 15:42, Fabien COELHO wrote:


Hello,

Thanks for the catch and the proposed fix! Indeed, on errors the timing 
is not updated appropriately.


ISTM that the best course is to update the elapsed time whenever a 
result is obtained, so that a sensible value is always available.


See attached patch which is a variant of Richard's version.

  fabien=# SELECT 1 as one \; SELECT 1/0 \; SELECT 2 as two;
  ┌─┐
  │ one │
  ├─┤
  │   1 │
  └─┘
  (1 row)

  ERROR:  division by zero
  Time: 0,352 ms

Probably it would be appropriate to add a test case. I'll propose 
something later.


committed with a test





Re: Add --{no-,}bypassrls flags to createuser

2022-05-23 Thread Michael Paquier
On Fri, May 20, 2022 at 02:45:19PM -0700, Nathan Bossart wrote:
> Thanks for the new patch!  I attached a new version with a few small
> changes.  What do you think?

So you have settled down to --member to emulate the clause ROLE.
Well, this choice is fine by me at the end.

> +$node->issues_sql_like(
> + [ 'createuser', 'regress_role2', '-a', 'regress_user1' ],
> + qr/statement: CREATE ROLE regress_role2 NOSUPERUSER NOCREATEDB 
> NOCREATEROLE INHERIT LOGIN ADMIN regress_user1;/,
> + 'add a role as a member with admin option of the newly created role');
> +$node->issues_sql_like(
> + [ 'createuser', 'regress_role3', '-m', 'regress_user1' ],
> + qr/statement: CREATE ROLE regress_role3 NOSUPERUSER NOCREATEDB 
> NOCREATEROLE INHERIT LOGIN ROLE regress_user1;/,
> + 'add a role as a member of the newly created role');

May I ask for the addition of tests when one specifies multiple
switches for --admin and --member?  This would check the code path
where you build a list of role names.  You could check fancier string
patterns, while on it, to look after the use of fmtId(), say with
role names that include whitespaces or such.
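
Something along these lines, perhaps (an untested sketch; the exact
statement the regex must match depends on how fmtId() quotes the
names):

$node->issues_sql_like(
	[
		'createuser', 'regress_role4',
		'-a', 'regress_user1',
		'-a', 'regress user2'
	],
	qr/statement: CREATE ROLE regress_role4 NOSUPERUSER NOCREATEDB NOCREATEROLE INHERIT LOGIN ADMIN regress_user1, "regress user2";/,
	'multiple --admin switches, including a role name needing quoting');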
--
Michael


signature.asc
Description: PGP signature