Re: Add regular expression testing for user name mapping in the peer authentication TAP test

2022-10-17 Thread Drouvot, Bertrand

Hi,

On 10/17/22 4:07 AM, Michael Paquier wrote:

On Sat, Oct 15, 2022 at 07:54:30AM +0200, Drouvot, Bertrand wrote:

Right. Giving a second thought to the non matching case, I think I'd prefer
to concatenate the system_user to the system_user instead. This is what v2
does.


Fine by me, so applied v2.  Thanks!


Thanks!

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Add regular expression testing for user name mapping in the peer authentication TAP test

2022-10-14 Thread Drouvot, Bertrand

Hi,

On 10/15/22 5:11 AM, Michael Paquier wrote:

On Fri, Oct 14, 2022 at 06:31:15PM +0200, Drouvot, Bertrand wrote:

while working on [1], I thought it could also be useful to add regular
expression testing for user name mapping in the peer authentication TAP
test.


Good idea now that we have a bit more coverage in the authentication
tests.


Thanks for looking at it!


+# Test with regular expression in user name map.
+my $last_system_user_char = substr($system_user, -1);


This would attach to the regex the last character of the system user.


Right.


I would perhaps have used more characters than that (-3?), as substr()
with a negative number larger than the string given in input would
give the entire string.  That's a nit, though.


I don't have a strong opinion on this, so let's extract the last 3 
characters. This is what v2 attached does.





+# The regular expression does not match.
+reset_pg_ident($node, 'mypeermap', '/^$', 'testmapuser');


This matches only an empty string, my brain gets that right?


Right. Giving a second thought to the non matching case, I think I'd 
prefer to concatenate the system_user to the system_user instead. This 
is what v2 does.


Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.comdiff --git a/src/test/authentication/t/003_peer.pl 
b/src/test/authentication/t/003_peer.pl
index fc951dea06..e719b05d0f 100644
--- a/src/test/authentication/t/003_peer.pl
+++ b/src/test/authentication/t/003_peer.pl
@@ -23,18 +23,34 @@ sub reset_pg_hba
return;
 }
 
+# Delete pg_ident.conf from the given node, add a new entry to it
+# and then execute a reload to refresh it.
+sub reset_pg_ident
+{
+   my $node= shift;
+   my $map_name= shift;
+   my $system_user = shift;
+   my $pg_user = shift;
+
+   unlink($node->data_dir . '/pg_ident.conf');
+   $node->append_conf('pg_ident.conf', "$map_name $system_user $pg_user");
+   $node->reload;
+   return;
+}
+
 # Test access for a single role, useful to wrap all tests into one.
 sub test_role
 {
local $Test::Builder::Level = $Test::Builder::Level + 1;
 
-   my ($node, $role, $method, $expected_res, %params) = @_;
+   my ($node, $role, $method, $expected_res, $test_details, %params) = @_;
my $status_string = 'failed';
$status_string = 'success' if ($expected_res eq 0);
 
my $connstr = "user=$role";
my $testname =
- "authentication $status_string for method $method, role $role";
+ "authentication $status_string for method $method, role $role "
+ . $test_details;
 
if ($expected_res eq 0)
{
@@ -87,16 +103,50 @@ my $system_user =
 # Tests without the user name map.
 # Failure as connection is attempted with a database role not mapping
 # to an authorized system user.
-test_role($node, qq{testmapuser}, 'peer', 2,
+test_role(
+   $node, qq{testmapuser}, 'peer', 2,
+   'without user name map',
log_like => [qr/Peer authentication failed for user "testmapuser"/]);
 
 # Tests with a user name map.
-$node->append_conf('pg_ident.conf', qq{mypeermap $system_user testmapuser});
+reset_pg_ident($node, 'mypeermap', $system_user, 'testmapuser');
 reset_pg_hba($node, 'peer map=mypeermap');
 
 # Success as the database role matches with the system user in the map.
-test_role($node, qq{testmapuser}, 'peer', 0,
+test_role($node, qq{testmapuser}, 'peer', 0, 'with user name map',
log_like =>
  [qr/connection authenticated: identity="$system_user" method=peer/]);
 
+# Test with regular expression in user name map.
+# Extract the last 3 characters from the system_user
+# or the entire system_user (if its length is <= -3).
+my $regex_test_string = substr($system_user, -3);
+
+# The regular expression matches.
+reset_pg_ident($node, 'mypeermap', qq{/^.*$regex_test_string\$},
+   'testmapuser');
+test_role(
+   $node,
+   qq{testmapuser},
+   'peer',
+   0,
+   'with regular expression in user name map',
+   log_like =>
+ [qr/connection authenticated: identity="$system_user" method=peer/]);
+
+
+# Concatenate system_user to system_user.
+$regex_test_string = $system_user . $system_user;
+
+# The regular expression does not match.
+reset_pg_ident($node, 'mypeermap', qq{/^.*$regex_test_string\$},
+   'testmapuser');
+test_role(
+   $node,
+   qq{testmapuser},
+   'peer',
+   2,
+   'with regular expression in user name map',
+   log_like => [qr/no match in usermap "mypeermap" for user 
"testmapuser"/]);
+
 done_testing();


Add regular expression testing for user name mapping in the peer authentication TAP test

2022-10-14 Thread Drouvot, Bertrand

Hi hackers,

while working on [1], I thought it could also be useful to add regular 
expression testing for user name mapping in the peer authentication TAP 
test.


This kind of test already exists in kerberos/t/001_auth.pl but the 
proposed one in the peer authentication testing would probably be more 
widely tested.


Please find attached a patch proposal to do so.

[1]: 
https://www.postgresql.org/message-id/4f55303e-62c1-1072-61db-fbfb30bd66c8%40gmail.com


Looking forward to your feedback,

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.comdiff --git a/src/test/authentication/t/003_peer.pl 
b/src/test/authentication/t/003_peer.pl
index fc951dea06..4e2efbe5e3 100644
--- a/src/test/authentication/t/003_peer.pl
+++ b/src/test/authentication/t/003_peer.pl
@@ -23,18 +23,34 @@ sub reset_pg_hba
return;
 }
 
+# Delete pg_ident.conf from the given node, add a new entry to it
+# and then execute a reload to refresh it.
+sub reset_pg_ident
+{
+   my $node= shift;
+   my $map_name= shift;
+   my $system_user = shift;
+   my $pg_user = shift;
+
+   unlink($node->data_dir . '/pg_ident.conf');
+   $node->append_conf('pg_ident.conf', "$map_name $system_user $pg_user");
+   $node->reload;
+   return;
+}
+
 # Test access for a single role, useful to wrap all tests into one.
 sub test_role
 {
local $Test::Builder::Level = $Test::Builder::Level + 1;
 
-   my ($node, $role, $method, $expected_res, %params) = @_;
+   my ($node, $role, $method, $expected_res, $test_details, %params) = @_;
my $status_string = 'failed';
$status_string = 'success' if ($expected_res eq 0);
 
my $connstr = "user=$role";
my $testname =
- "authentication $status_string for method $method, role $role";
+ "authentication $status_string for method $method, role $role "
+ . $test_details;
 
if ($expected_res eq 0)
{
@@ -87,16 +103,43 @@ my $system_user =
 # Tests without the user name map.
 # Failure as connection is attempted with a database role not mapping
 # to an authorized system user.
-test_role($node, qq{testmapuser}, 'peer', 2,
+test_role(
+   $node, qq{testmapuser}, 'peer', 2,
+   'without user name map',
log_like => [qr/Peer authentication failed for user "testmapuser"/]);
 
 # Tests with a user name map.
-$node->append_conf('pg_ident.conf', qq{mypeermap $system_user testmapuser});
+reset_pg_ident($node, 'mypeermap', $system_user, 'testmapuser');
 reset_pg_hba($node, 'peer map=mypeermap');
 
 # Success as the database role matches with the system user in the map.
-test_role($node, qq{testmapuser}, 'peer', 0,
+test_role($node, qq{testmapuser}, 'peer', 0, 'with user name map',
log_like =>
  [qr/connection authenticated: identity="$system_user" method=peer/]);
 
+# Test with regular expression in user name map.
+my $last_system_user_char = substr($system_user, -1);
+
+# The regular expression matches.
+reset_pg_ident($node, 'mypeermap', qq{/^.*$last_system_user_char\$},
+   'testmapuser');
+test_role(
+   $node,
+   qq{testmapuser},
+   'peer',
+   0,
+   'with regular expression in user name map',
+   log_like =>
+ [qr/connection authenticated: identity="$system_user" method=peer/]);
+
+# The regular expression does not match.
+reset_pg_ident($node, 'mypeermap', '/^$', 'testmapuser');
+test_role(
+   $node,
+   qq{testmapuser},
+   'peer',
+   2,
+   'with regular expression in user name map',
+   log_like => [qr/no match in usermap "mypeermap" for user 
"testmapuser"/]);
+
 done_testing();


Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-10-14 Thread Drouvot, Bertrand

Hi,

On 10/14/22 7:30 AM, Michael Paquier wrote:

On Wed, Oct 12, 2022 at 08:17:14AM +0200, Drouvot, Bertrand wrote:

Indeed, ;-)


So, I have spent the last two days looking at all that, studying the
structure of the patch and the existing HEAD code,


Thanks!


The code could be split to tackle things step-by-step:
- One refactoring patch to introduce token_regcomp() and
token_regexec(), with the introduction of a new structure that
includes the compiled regexes.  (Feel free to counterargue about the
use of AuthToken for this purpose, of course!)
- Plug in the refactored logic for the lists of role names and
database names in pg_hba.conf.
- Handle the case of single host entries in pg_hba.conf.
--


I agree to work step-by-step.

While looking at it again now, I discovered that the new TAP test for 
the regexp on the hostname in ssl/002_scram.pl is failing on some of my 
tests environment (and not all..).


So, I agree with the dedicated steps you are proposing and that the 
"host case" needs a dedicated attention.


I'm not ignoring all the remarks you've just done up-thread, I'll 
address them and/or provide my feedback on them when I'll come back with 
the step-by-step sub patches.


Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-10-14 Thread Drouvot, Bertrand

Hi,

On 10/14/22 8:18 AM, Michael Paquier wrote:

On Fri, Oct 14, 2022 at 02:30:25PM +0900, Michael Paquier wrote:

First, as of HEAD, AuthToken is only used for elements in a list of
role and database names in hba.conf before filling in each HbaLine,
hence we limit its usage to the initial parsing.  The patch assigns an
optional regex_t to it, then extends the use of AuthToken for single
hostname entries in pg_hba.conf.  Things going first: shouldn't we
combine ident_user and "re" together in the same structure?  Even if
we finish by not using AuthToken to store the computed regex, it seems
to me that we'd better use the same base structure for pg_ident.conf
and pg_hba.conf.  While looking closely at the patch, we would expand
the use of AuthToken outside its original context.  I have also looked
at make_auth_token(), and wondered if it could be possible to have this
routine compile the regexes.  This approach would not stick with
pg_ident.conf though, as we validate the fields in each line when we
put our hands on ident_user and after the base validation of a line
(number of fields, etc.).  So with all that in mind, it feels right to
not use AuthToken at all when building each HbaLine and each
IdentLine, but a new, separate, structure.  We could call that an
AuthItem (string, its compiled regex) perhaps?  It could have its own
make() routine, taking in input an AuthToken and process
pg_regcomp().  Better ideas for this new structure would be welcome,
and the idea is that we'd store the post-parsing state of an
AuthToken to something that has a compiled regex.  We could finish by
using AuthToken at the end and expand its use, but it does not feel
completely right either to have a make() routine but not be able to
compile its regular expression when creating the AuthToken.


I have have sent this part too quickly.  As AuthTokens are used in
check_db() and check_role() when matching entries, it is more
intuitive to store the regex_t directly in it. 


Yeah, I also think this is the right place for it.


Changing IdentLine to
use a AuthToken makes the "quoted" part useless in this case, still it
could be used in Assert()s to make sure that the data is shaped as
expected at check-time, enforced at false when creating it in
parse_ident_line()?


I agree, that makes sense. I'll work on that.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-10-12 Thread Drouvot, Bertrand

Hi,

On 10/11/22 8:29 AM, Michael Paquier wrote:

On Mon, Oct 10, 2022 at 09:00:06AM +0200, Drouvot, Bertrand wrote:

foreach(cell, tokens)
{
[...]
+   tokreg = lfirst(cell);
+   if (!token_is_regexp(tokreg))
{
-   if (strcmp(dbname, role) == 0)
+   if (am_walsender && !am_db_walsender)
+   {
+   /*
+* physical replication walsender connections 
can only match
+* replication keyword
+*/
+   if (token_is_keyword(tokreg->authtoken, 
"replication"))
+   return true;
+   }
+   else if (token_is_keyword(tokreg->authtoken, "all"))
return true;


When checking the list of databases in check_db(), physical WAL
senders (aka am_walsender && !am_db_walsender) would be able to accept
regexps, but these should only accept "replication" and never a
regexp, no?  


Oh right, good catch, thanks! Please find attached v6 fixing it.



This is kind of special in the HBA logic, coming back to 9.0 where
physical replication and this special role property have been
introduced.  WAL senders have gained an actual database property later
on in 9.4 with logical decoding, keeping "replication" for
compatibility (connection strings can use replication=database to
connect as a non-physical WAL sender and connect to a specific
database).



Thanks for the explanation!


+typedef struct AuthToken
+{
+   char   *string;
+   boolquoted;
+} AuthToken;
+
+/*
+ * Distinguish the case a token has to be treated as a regular
+ * expression or not.
+ */
+typedef struct AuthTokenOrRegex
+{
+   boolis_regex;
+
+   /*
+* Not an union as we still need the token string for fill_hba_line().
+*/
+   AuthToken  *authtoken;
+   regex_t*regex;
+} AuthTokenOrRegex;


Hmm.  With is_regex to check if a regex_t exists, both structures may
not be necessary. 


Agree that both struct are not necessary. In v6, AuthTokenOrRegex has 
been removed and the regex has been moved to AuthToken. There is no 
is_regex bool anymore, as it's enough to test whether regex is NULL or not.



I have not put my hands on that directly, but if
I guess that I would shape things to have only AuthToken with
(enforcing regex_t in priority if set in the list of elements to check
for a match):
- the string
- quoted
- regex_t
A list member should never have (regex_t != NULL && quoted), right?


The patch does allow that. For example it happens for the test where we 
add a comma in the role name. As we don't rely on a dedicated char to 
mark the end of a reg exp (we only rely on / to mark its start) then 
allowing (regex_t != NULL && quoted) seems reasonable to me.



+# test with a comma in the regular expression
+reset_pg_hba($node, 'all', '"/^.*5,.*e$"', 'password');
+test_conn($node, 'user=md5,role', 'password', 'matching regexp for username',
+   0);


So, we check here that the role includes "5," in its name.  This is
getting fun to parse ;)



Indeed, ;-)



  elsif ($ENV{PG_TEST_EXTRA} !~ /\bssl\b/)
  {
-   plan skip_all => 'Potentially unsafe test SSL not enabled in 
PG_TEST_EXTRA';
+   plan skip_all =>
+ 'Potentially unsafe test SSL not enabled in PG_TEST_EXTRA';
  }


Unrelated noise from perltidy.


Right.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.comdiff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
index c6f1b70fd3..406628ef35 100644
--- a/doc/src/sgml/client-auth.sgml
+++ b/doc/src/sgml/client-auth.sgml
@@ -235,8 +235,9 @@ hostnogssenc  database  
userPostgreSQL database.
-   Multiple database names can be supplied by separating them with
-   commas.  A separate file containing database names can be specified by
+   Multiple database names and/or regular expressions preceded by 
/
+   can be supplied by separating them with commas.
+   A separate file containing database names can be specified by
preceding the file name with @.
   
  
@@ -249,7 +250,8 @@ hostnogssenc  database  
userall specifies that it
matches all users.  Otherwise, this is either the name of a specific
-   database user, or a group name preceded by +.
+   database user, a regular expression preceded by /
+   or a group name preceded by +.
(Recall that there is no real distinction between users and groups
in PostgreSQL; a + mark 
really means
match any of the roles that are directly or indirectly members
@@ -258,7 +260,8 @@ hostnogssenc  database  
user/
+

Re: Query Jumbling for CALL and SET utility statements

2022-10-10 Thread Drouvot, Bertrand

Hi,

On 10/7/22 6:13 AM, Michael Paquier wrote:

On Thu, Oct 06, 2022 at 11:51:52PM -0400, Tom Lane wrote:

I've been thinking since the beginning of this thread that there
was no coherent, defensible rationale being offered for jumbling
some utility statements and not others.


Yeah.  The potential performance impact of all the TransactionStmts
worries me a bit, though.


I wonder if the answer is to jumble them all.  We avoided that
up to now because it would imply a ton of manual effort and
future code maintenance ... but now that the backend/nodes/
infrastructure is largely auto-generated, could we auto-generate
the jumbling code?


I think that's a good idea.



Probably.  One part that may be tricky though is the location of the
constants we'd like to make generic, but perhaps this could be handled
by using a dedicated variable type that just maps to int?  


It looks to me that we'd also need to teach the perl parser which fields 
per statements struct need to be jumbled (or more probably which ones to 
exclude from the jumbling, for example funccall in CallStmt). Not sure 
yet how to teach the perl parser, but I'll build this list first to help 
decide for a right approach, unless you already have some thoughts about it?


Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-10-10 Thread Drouvot, Bertrand

Hi,

On 10/6/22 2:53 AM, Michael Paquier wrote:

On Wed, Oct 05, 2022 at 03:32:20PM +0200, Drouvot, Bertrand wrote:

On 10/5/22 9:24 AM, Michael Paquier wrote:

- test_role() -> test_conn() to be able to pass down a database name.
- reset_pg_hba() to control the host, db and user parts.  The host
part does not really apply after moving the hosts checks to a more
secure location, so I guess that this had better be extended just for
the user and database, keeping host=local all the time.
I am planning to apply 0001 attached independently,


0001 looks good to me.


Thanks.  I have applied this refactoring, leaving the host part out of
the equation as we should rely only on local connections for this
part of the test. 


Thanks!


Thanks! I'll look at it and the comments you just made up-thread.


Cool, thanks.  One thing that matters a lot IMO (that I forgot to
mention previously) is to preserve the order of the items parsed from
the configuration files.


Fully agree, all the versions that have been submitted in this thread 
preserves the ordering.




Also, I am wondering whether we'd better have some regression tests
where a regex includes a comma and a role name itself has a comma,
actually, just to stress more the parsing of individual items in the
HBA file.


Good idea, it has been added in v5 just shared up-thread.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-10-10 Thread Drouvot, Bertrand

Hi,

On 10/5/22 9:24 AM, Michael Paquier wrote:

On Tue, Sep 20, 2022 at 01:33:09PM +0200, Drouvot, Bertrand wrote:
Anyway, I have looked at the patch.

+   List   *roles_re;
+   List   *databases_re;
+   regex_thostname_re;
I am surprised by the approach of using separate lists for the regular
expressions and the raw names.  Wouldn't it be better to store
everything in a single list but assign an entry type?  In this case it
would be either regex or plain string.  This would minimize the
footprint of the changes (no extra arguments *_re in the routines
checking for a match on the roles, databases or hosts).  And it seems
to me that this would make unnecessary the use of re_num here and
there. 


Please find attached v5 addressing this. I started with an union but it 
turns out that we still need the plain string when a regex is used. This 
is not needed for the authentication per say but for fill_hba_line(). So 
I ended up creating a new struct without union in v5.



The hostname is different, of course, requiring only an extra
field for its type, or something like that.


I'm using the same new struct as described above for the hostname.



Perhaps the documentation would gain in clarity if there were more
examples, like a set of comma-separated examples (mix of regex and raw
strings for example, for all the field types that gain support for
regexes)?



Right, I added more examples in v5.


-$node->append_conf('postgresql.conf', "log_connections = on\n");
+$node->append_conf(
+'postgresql.conf', qq{
+listen_addresses = '127.0.0.1'
+log_connections = on
+});
Hmm.  I think that we may need to reconsider the location of the tests
for the regexes with the host name, as the "safe" regression tests
should not switch listen_addresses.  One location where we already do
that is src/test/ssl/, so these could be moved there. 


Good point, I moved the hostname related tests in src/test/ssl.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.comdiff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
index c6f1b70fd3..406628ef35 100644
--- a/doc/src/sgml/client-auth.sgml
+++ b/doc/src/sgml/client-auth.sgml
@@ -235,8 +235,9 @@ hostnogssenc  database  
userPostgreSQL database.
-   Multiple database names can be supplied by separating them with
-   commas.  A separate file containing database names can be specified by
+   Multiple database names and/or regular expressions preceded by 
/
+   can be supplied by separating them with commas.
+   A separate file containing database names can be specified by
preceding the file name with @.
   
  
@@ -249,7 +250,8 @@ hostnogssenc  database  
userall specifies that it
matches all users.  Otherwise, this is either the name of a specific
-   database user, or a group name preceded by +.
+   database user, a regular expression preceded by /
+   or a group name preceded by +.
(Recall that there is no real distinction between users and groups
in PostgreSQL; a + mark 
really means
match any of the roles that are directly or indirectly members
@@ -258,7 +260,8 @@ hostnogssenc  database  
user/
+   can be supplied by separating them with commas.
A separate file containing user names can be specified by preceding the
file name with @.
   
@@ -270,8 +273,9 @@ hostnogssenc  database  
user
   
Specifies the client machine address(es) that this record
-   matches.  This field can contain either a host name, an IP
-   address range, or one of the special key words mentioned below.
+   matches.  This field can contain either a host name, a regular 
expression
+   preceded by / representing host names, an IP address 
range,
+   or one of the special key words mentioned below.
   
 
   
@@ -739,6 +743,24 @@ hostall all ::1/128
 trust
 # TYPE  DATABASEUSERADDRESS METHOD
 hostall all localhost   trust
 
+# The same using a regular expression for host name, which allows connection 
for
+# host name ending with "test".
+#
+# TYPE  DATABASEUSERADDRESS METHOD
+hostall all /^.*test$   trust
+
+# The same using regular expression for DATABASE, which allows connection to 
the
+# db1 and testdb databases and any database with a name ending with "test".
+#
+# TYPE  DATABASE   USERADDRESS METHOD
+local   db1,/^.*test$,testdb   all /^.*test$   trust
+
+# The same using regular expression for USER, which allows connection to the
+# user1 and testuser users and any user with a name ending with "test".
+#
+# TYPE  DATABASE  

Re: Record SET session in VariableSetStmt

2022-10-06 Thread Drouvot, Bertrand




On 10/6/22 2:28 PM, Julien Rouhaud wrote:

On Thu, Oct 06, 2022 at 02:19:32PM +0200, Drouvot, Bertrand wrote:


On 10/6/22 1:18 PM, Julien Rouhaud wrote:


so
nothing should rely on how exactly someone spelled it.  This is also the case
for our core jumbling code, where we guarantee (or at least try to) that two
semantically identical statements will get the same queryid, and therefore
don't distinguish eg. LIKE vs ~~.


Agree, but on the other hand currently SET and SET SESSION are recorded with
distinct queryid:

postgres=# select calls, query, queryid from pg_stat_statements;
  calls |query|   queryid
---+-+--
  2 | select calls, query from pg_stat_statements | -6345508659980235519
  1 | set session enable_seqscan=1| -3921418831612111986
  1 | create extension pg_stat_statements | -1739183385080879393
  1 | set enable_seqscan=1|  7925920505912025406
(4 rows)

and this behavior would change with the Jumbling work in progress in [1]
(mentioned up-thread) if we don't record "SET SESSION".

I think that would make sense to keep the same behavior, what do you think?


It's because until now jumbling of utility statements was just hashing the
query text, which is quite terrible.  This was also implying getting different
queryids for things like this:

=# select query, queryid from pg_stat_statements where query like '%work_mem%';;
  query |   queryid
---+--
  SeT work_mem = 123465 | -1114638544275583196
  Set work_mem = 123465 | -1966597613643458788
  SET work_mem = 123465 |  4161441071081149574
  seT work_mem = 123465 |  8327271737593275474
(4 rows)

If we move to a real jumbling of VariableSetStmt, we should keep the rules
consistent with the rest of the jumble code and ignore an explicit "SESSION" in
the original command.


Understood, so I agree that it makes sense to keep the jumbling behavior 
consistent and so keep the same queryid for statements that are 
semantically identical.


Thanks for your feedback!

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Record SET session in VariableSetStmt

2022-10-06 Thread Drouvot, Bertrand

Hi,

On 10/6/22 1:18 PM, Julien Rouhaud wrote:

Hi,

On Thu, Oct 06, 2022 at 12:57:17PM +0200, Drouvot, Bertrand wrote:

Hi hackers,

"SET local" is currently recorded in VariableSetStmt (with the boolean
is_local) but "SET session" is not.

Please find attached a patch proposal to also record "SET session" so that
VariableSetStmt records all the cases.

Remark: Recording "SET session" will also help for the Jumbling work being
done in [1].


I don't think it's necessary.  SET and SET SESSION are semantically the same


Thanks for your feedback!


so
nothing should rely on how exactly someone spelled it.  This is also the case
for our core jumbling code, where we guarantee (or at least try to) that two
semantically identical statements will get the same queryid, and therefore
don't distinguish eg. LIKE vs ~~.


Agree, but on the other hand currently SET and SET SESSION are recorded 
with distinct queryid:


postgres=# select calls, query, queryid from pg_stat_statements;
 calls |query|   queryid
---+-+--
 2 | select calls, query from pg_stat_statements | -6345508659980235519
 1 | set session enable_seqscan=1| -3921418831612111986
 1 | create extension pg_stat_statements | -1739183385080879393
 1 | set enable_seqscan=1|  7925920505912025406
(4 rows)

and this behavior would change with the Jumbling work in progress in [1] 
(mentioned up-thread) if we don't record "SET SESSION".


I think that would make sense to keep the same behavior, what do you think?

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Record SET session in VariableSetStmt

2022-10-06 Thread Drouvot, Bertrand

Hi hackers,

"SET local" is currently recorded in VariableSetStmt (with the boolean 
is_local) but "SET session" is not.


Please find attached a patch proposal to also record "SET session" so 
that VariableSetStmt records all the cases.


Remark: Recording "SET session" will also help for the Jumbling work 
being done in [1].


[1]: 
https://www.postgresql.org/message-id/66be1104-164f-dcb8-6c43-f03a68a139a7%40gmail.com


Looking forward to your feedback,

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.comcommit 2401087366432b0a520ac45947c4584924099617
Author: bdrouvotAWS 
Date:   Thu Oct 6 10:20:18 2022 +

v1-0001-record-set-session-in-VariableSetStmt.patch

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 94d5142a4a..809e617d18 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -1563,6 +1563,7 @@ VariableSetStmt:
VariableSetStmt *n = $2;
 
n->is_local = false;
+   n->is_session = false;
$$ = (Node *) n;
}
| SET LOCAL set_rest
@@ -1570,6 +1571,7 @@ VariableSetStmt:
VariableSetStmt *n = $3;
 
n->is_local = true;
+   n->is_session = false;
$$ = (Node *) n;
}
| SET SESSION set_rest
@@ -1577,6 +1579,7 @@ VariableSetStmt:
VariableSetStmt *n = $3;
 
n->is_local = false;
+   n->is_session = true;
$$ = (Node *) n;
}
;
diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h
index 633e7671b3..c3f33e7c4a 100644
--- a/src/include/nodes/parsenodes.h
+++ b/src/include/nodes/parsenodes.h
@@ -2228,6 +2228,7 @@ typedef struct VariableSetStmt
char   *name;   /* variable to be set */
List   *args;   /* List of A_Const nodes */
boolis_local;   /* SET LOCAL? */
+   boolis_session; /* SET SESSION? */
 } VariableSetStmt;
 
 /* --


Re: Query Jumbling for CALL and SET utility statements

2022-10-06 Thread Drouvot, Bertrand

Hi,

On 10/6/22 8:39 AM, Michael Paquier wrote:

On Mon, Sep 19, 2022 at 08:29:22AM +0200, Drouvot, Bertrand wrote:

Please find attached v6 taking care of the remarks mentioned above.
+case T_VariableSetStmt:
+{
+VariableSetStmt *stmt = (VariableSetStmt *) node;
+
+/* stmt->name is NULL for RESET ALL */
+if (stmt->name)
+{
+APP_JUMB(stmt->kind);
+APP_JUMB_STRING(stmt->name);
+JumbleExpr(jstate, (Node *) stmt->args);
+}
+}
+break;


Hmm.  If VariableSetStmt->is_local is not added to the jumble, then
aren't "SET foo = $1" and "SET LOCAL foo = $1" counted as the same
query?



Good catch, thanks!
While at it let's also jumble "SET SESSION foo =". For this one, we 
would need to record another bool in VariableSetStmt: I'll create a 
dedicated patch for that.




I am not seeing SAVEPOINT, RELEASE, ROLLBACK .. TO SAVEPOINT
mentioned on this thread.  Would these be worth considering in what
gets compiled?  That would cover the remaining bits of
TransactionStmt.  The ODBC driver abuses of savepoints, for example,
so this could be useful for monitoring purposes in such cases.


Agree. I'll look at those too.



As of the code stands, it could be cleaner to check
IsJumbleUtilityAllowed() in compute_utility_query_id(), falling back
to a default in JumbleQuery().  Not that what your patch does is
incorrect, of course.


Will look at it.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Query Jumbling for CALL and SET utility statements

2022-10-06 Thread Drouvot, Bertrand

Hi,

On 9/26/22 12:40 PM, Drouvot, Bertrand wrote:

let's add it in V7 attached
(that's safer should the code change later on).


Attached a tiny rebase needed due to 249b0409b1.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.comdiff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out 
b/contrib/pg_stat_statements/expected/pg_stat_statements.out
index 9ac5c87c3a..5d3330a87d 100644
--- a/contrib/pg_stat_statements/expected/pg_stat_statements.out
+++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out
@@ -311,7 +311,7 @@ FROM pg_stat_statements ORDER BY query COLLATE "C";
  wal_records > $2 as wal_records_generated,   +|   |  |
 |   | 
  wal_records >= rows as wal_records_ge_rows   +|   |  |
 |   | 
  FROM pg_stat_statements ORDER BY query COLLATE "C"|   |  |
 |   | 
- SET pg_stat_statements.track_utility = FALSE  | 1 |0 | f  
 | f | t
+ SET pg_stat_statements.track_utility = $1 | 1 |0 | f  
 | f | t
  UPDATE pgss_test SET b = $1 WHERE a > $2  | 1 |3 | t  
 | t | t
 (7 rows)
 
@@ -462,6 +462,111 @@ SELECT query, calls, rows FROM pg_stat_statements ORDER 
BY query COLLATE "C";
  SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C" 
| 0 |0
 (6 rows)
 
+-- PL/pgSQL procedure and pg_stat_statements.track = top
+CREATE PROCEDURE MINUS_TWO(i INTEGER) AS $$
+DECLARE
+  r INTEGER;
+BEGIN
+  SELECT (i - 1 - 1.0)::INTEGER INTO r;
+END; $$ LANGUAGE plpgsql;
+CREATE PROCEDURE SUM_TWO(i INTEGER, j INTEGER) AS $$
+DECLARE
+  r INTEGER;
+BEGIN
+  SELECT (j + j)::INTEGER INTO r;
+END; $$ LANGUAGE plpgsql;
+SET pg_stat_statements.track = 'top';
+SET pg_stat_statements.track_utility = FALSE;
+SELECT pg_stat_statements_reset();
+ pg_stat_statements_reset 
+--
+ 
+(1 row)
+
+CALL MINUS_TWO(3);
+CALL MINUS_TWO(7);
+CALL SUM_TWO(3, 8);
+CALL SUM_TWO(7, 5);
+SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
+query 
| calls | rows 
+--+---+--
+ CALL MINUS_TWO($1)   
| 2 |0
+ CALL SUM_TWO($1, $2) 
| 2 |0
+ SELECT pg_stat_statements_reset()
| 1 |1
+ SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C" 
| 0 |0
+(4 rows)
+
+--
+-- pg_stat_statements.track = all
+--
+SET pg_stat_statements.track = 'all';
+SELECT pg_stat_statements_reset();
+ pg_stat_statements_reset 
+--
+ 
+(1 row)
+
+-- PL/pgSQL procedure and pg_stat_statements.track = all
+-- we drop and recreate the procedures to avoid any caching funnies
+DROP PROCEDURE MINUS_TWO(INTEGER);
+DROP PROCEDURE SUM_TWO(INTEGER, INTEGER);
+CREATE PROCEDURE MINUS_TWO(i INTEGER) AS $$
+DECLARE
+  r INTEGER;
+BEGIN
+  SELECT (i - 1 - 1.0)::INTEGER INTO r;
+END; $$ LANGUAGE plpgsql;
+CREATE PROCEDURE SUM_TWO(i INTEGER, j INTEGER) AS $$
+DECLARE
+  r INTEGER;
+BEGIN
+  SELECT (j + j)::INTEGER INTO r;
+END; $$ LANGUAGE plpgsql;
+CALL MINUS_TWO(3);
+CALL MINUS_TWO(7);
+CALL SUM_TWO(3, 8);
+CALL SUM_TWO(7, 5);
+SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
+query 
| calls | rows 
+--+---+--
+ CALL MINUS_TWO($1)   
| 2 |0
+ CALL SUM_TWO($1, $2) 
| 2 |0
+ SELECT (i - $2 - $3)::INTEGER
| 2 |2
+ SELECT (j + j)::INTEGER  
| 2 |2
+ SELECT pg_stat_statements_reset()
| 1 |1
+ SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C" 
| 0 |0
+(6 rows)
+
+SET pg_stat_statements.track_utility = TRUE;
+-- SET
+SELECT pg_stat_statements_reset();
+ pg_stat_statements_reset 
+--
+ 
+(1 row)
+
+set enable_seqscan=false;
+set enable_seqscan=true;
+set seq_page_cost=2.0;
+set seq_page_cost=1.0;
+set enable_seqscan to default;
+set seq_page_cost to default;
+reset seq_page_cost;
+reset 

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-10-05 Thread Drouvot, Bertrand

Hi,

On 10/5/22 9:24 AM, Michael Paquier wrote:

Something that stood out on a first review is the refactoring of
001_password.pl that can be done independently of the main patch:


Good idea, thanks for the proposal.


- test_role() -> test_conn() to be able to pass down a database name.
- reset_pg_hba() to control the host, db and user parts.  The host
part does not really apply after moving the hosts checks to a more
secure location, so I guess that this had better be extended just for
the user and database, keeping host=local all the time.
I am planning to apply 0001 attached independently, 


0001 looks good to me.


reducing the
footprint of 0002, which is your previous patch left untouched
(mostly!).


Thanks! I'll look at it and the comments you just made up-thread.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: [PATCH] Add peer authentication TAP test

2022-10-03 Thread Drouvot, Bertrand

Hi,

On 10/3/22 9:46 AM, Michael Paquier wrote:

On Fri, Sep 30, 2022 at 07:51:29PM +0200, Drouvot, Bertrand wrote:

Agree that it could be simplified, thanks for the hints!

Attached a simplified version.


While looking at that, I have noticed that it is possible to reduce
the number of connection attempts (for example no need to re-test that
the connection works when the map is not set, and the authn log would
be the same with the map in place).


Yeah that's right, thanks for simplifying further.


 Note that a path's meson.build
needs a refresh for any new file added into the tree, with 003_peer.pl
missing so this new test was not running in the recent CI runs.  The
indentation was also a bit wrong and I have tweaked a few comments,
before finally applying it.


Thanks!

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: [PATCH] Add peer authentication TAP test

2022-09-30 Thread Drouvot, Bertrand

Hi,

On 9/30/22 2:00 AM, Michael Paquier wrote:

On Wed, Sep 28, 2022 at 04:24:44PM +0900, Michael Paquier wrote:

Hmm, indeed.  It would be more reliable to rely on the contents
returned by getpeereid()/getpwuid() after one successful peer
connection, then use it in the map.  I was wondering whether using
stuff like getpwuid() in the perl script itself would be better, but
it sounds less of a headache in terms of portability to just rely on
authn_id via SYSTEM_USER to generate the contents of the correct map.


By the way, on an extra read I have found a few things that can be
simplified
- I think that test_role() should be reworked so as the log patterns
expected are passed down to connect_ok() and connect_fails() rather
than involving find_in_log().  You still need find_in_log() to skip
properly the case where peer is not supported by the platform, of
course.
- get_log_size() is not necessary.  You should be able to get the same
information with "-s $self->logfile".
- Nit: a newline should be added at the end of 003_peer.pl.
--


Agree that it could be simplified, thanks for the hints!

Attached a simplified version.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.comdiff --git a/src/test/authentication/t/003_peer.pl 
b/src/test/authentication/t/003_peer.pl
new file mode 100644
index 00..57bf2343fc
--- /dev/null
+++ b/src/test/authentication/t/003_peer.pl
@@ -0,0 +1,108 @@
+
+# Copyright (c) 2022, PostgreSQL Global Development Group
+
+# Tests for peer authentication and user name map.
+# The test is skipped if the platform does not support peer authentication.
+
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Delete pg_hba.conf from the given node, add a new entry to it
+# and then execute a reload to refresh it.
+sub reset_pg_hba
+{
+   my $node   = shift;
+   my $hba_method = shift;
+
+   unlink($node->data_dir . '/pg_hba.conf');
+   # just for testing purposes, use a continuation line.
+   $node->append_conf('pg_hba.conf', "local all all\\\n $hba_method");
+   $node->reload;
+   return;
+}
+
+# Test access for a single role, useful to wrap all tests into one.
+sub test_role
+{
+  local $Test::Builder::Level = $Test::Builder::Level + 1;
+
+  my ($node, $role, $method, $expected_res, %params) = @_;
+  my $status_string = 'failed';
+  $status_string = 'success' if ($expected_res eq 0);
+
+  my $connstr = "user=$role";
+  my $testname =
+"authentication $status_string for method $method, role $role";
+
+  if ($expected_res eq 0)
+  {
+$node->connect_ok($connstr, $testname, %params);
+  }
+  else
+  {
+  # No checks of the error message, only the status code.
+  $node->connect_fails($connstr, $testname, %params);
+  }
+}
+
+# find $pat in logfile of $node.
+sub find_in_log
+{
+  my ($node, $pat) = @_;
+
+  my $log = PostgreSQL::Test::Utils::slurp_file($node->logfile);
+  return 0 if (length($log) <= 0);
+
+  return $log =~ m/$pat/;
+}
+
+# Initialize primary node.
+my $node = PostgreSQL::Test::Cluster->new('primary');
+$node->init;
+$node->append_conf('postgresql.conf', "log_connections = on\n");
+$node->start;
+
+# Set pg_hba.conf with the peer authentication.
+reset_pg_hba($node, 'peer');
+
+# Check that peer authentication is supported on this platform.
+$node->psql('postgres');
+
+# If not supported, then skip the rest of the test.
+if (find_in_log($node, qr/peer authentication is not supported on this 
platform/))
+{
+plan skip_all => 'peer authentication is not supported on this platform';
+}
+
+# Let's create a new user for the user name map test.
+$node->safe_psql('postgres',
+  qq{CREATE USER testmapuser});
+
+# Get the system_user to define the user name map test.
+my $system_user =
+  $node->safe_psql('postgres', q(select (string_to_array(SYSTEM_USER, 
':'))[2]));
+
+# This connection should succeed.
+test_role($node, $system_user, 'peer', 0,
+log_like =>
+  [qr/connection authenticated: identity="$system_user" method=peer/]);
+
+# This connection should failed.
+test_role($node, qq{testmapuser}, 'peer', 2,
+log_like => [qr/Peer authentication failed for user "testmapuser"/]);
+
+# Define a user name map.
+$node->append_conf('pg_ident.conf', qq{mypeermap $system_user testmapuser});
+
+# Set pg_hba.conf with the peer authentication and the user name map.
+reset_pg_hba($node, 'peer map=mypeermap');
+
+# This connection should now succeed.
+test_role($node, qq{testmapuser}, 'peer', 0,
+log_like =>
+  [qr/connection authenticated: identity="$system_user" method=peer/]);
+
+done_testing();


Re: log_heap_visible(): remove unused parameter and update comment

2022-09-30 Thread Drouvot, Bertrand

Hi,

On 9/30/22 1:32 PM, Bharath Rupireddy wrote:

On Fri, Sep 30, 2022 at 1:07 PM Drouvot, Bertrand
 wrote:


Hi,

While resuming the work on [1] I noticed that:

- there is an unused parameter in log_heap_visible()
- the comment associated to the function is not in "sync" with the
current implementation (referring a "block" that is not involved anymore)

Attached a tiny patch as an attempt to address the above remarks.

[1]: https://commitfest.postgresql.org/39/3740/


It looks like that parameter was originally introduced and used in PG
9.4 where xl_heap_visible structure was having RelFileNode, which was
later removed in PG 9.5, since then the RelFileNode rnode parameter is
left out. This parameter got renamed to RelFileLocator rlocator by the
commit b0a55e43299c4ea2a9a8c757f9c26352407d0ccc in HEAD.

The attached patch LGTM.


Thanks for looking at it!



We recently committed another patch to remove an unused function
parameter - 65b158ae4e892c2da7a5e31e2d2645e5e79a0bfd.

It makes me think that why can't we remove the unused function
parameters once and for all, say with a compiler flag such as
-Wunused-parameter [1]? We might have to be careful in removing
certain parameters which are not being used right now, but might be
used in the near future though.

[1] https://man7.org/linux/man-pages/man1/gcc.1.html

  -Wunused-parameter
Warn whenever a function parameter is unused aside from its
declaration.

To suppress this warning use the "unused" attribute.



That's right. I have the feeling this will be somehow time consuming and 
I'm not sure the added value is worth it (as compare to fix them when 
"accidentally" cross their paths).


Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Minimal logical decoding on standbys

2022-09-30 Thread Drouvot, Bertrand

Hi,

On 7/6/22 3:30 PM, Drouvot, Bertrand wrote:

Hi,

On 10/28/21 11:07 PM, Andres Freund wrote:

Hi,

On 2021-10-28 16:24:22 -0400, Robert Haas wrote:
On Wed, Oct 27, 2021 at 2:56 AM Drouvot, Bertrand 
 wrote:
So you have in mind to check for XLogLogicalInfoActive() first, and 
if true, then open the relation and call

RelationIsAccessibleInLogicalDecoding()?

I think 0001 is utterly unacceptable. We cannot add calls to
table_open() in low-level functions like this. Suppose for example
that _bt_getbuf() calls _bt_log_reuse_page() which with 0001 applied
would call get_rel_logical_catalog(). _bt_getbuf() will have acquired
a buffer lock on the page. The idea that it's safe to call
table_open() while holding a buffer lock cannot be taken seriously.
Yes - that's pretty clearly a deadlock hazard. It shouldn't too hard 
to fix, I

think. Possibly a bit more verbose than nice, but...

Alternatively we could propagate the information whether a relcache 
entry is
for a catalog from the table to the index. Then we'd not need to 
change the

btree code to pass the table down.


Looking closer at RelationIsAccessibleInLogicalDecoding() It seems to me 
that the missing part to be able to tell whether or not an index is for 
a catalog is the rd_options->user_catalog_table value of its related 
heap relation.


Then, a way to achieve that could be to:

- Add to Relation a new "heap_rd_options" representing the rd_options of 
the related heap relation when appropriate


- Trigger the related indexes relcache invalidations when an 
ATExecSetRelOptions() is triggered on a heap relation


- Write an equivalent of RelationIsUsedAsCatalogTable() for indexes that 
would make use of the heap_rd_options instead


Does that sound like a valid option to you or do you have another idea 
in mind to propagate the information whether a relcache entry is for a 
catalog from the table to the index?




I ended up with the attached proposal to propagate the catalog 
information to the indexes.


The attached adds a new field "isusercatalog" in pg_index to indicate 
whether or not the index is linked to a table that has the storage 
parameter user_catalog_table set to true.


Then it defines new macros, including 
"IndexIsAccessibleInLogicalDecoding" making use of this new field.


This new macro replaces get_rel_logical_catalog() that was part of the 
previous patch version.


What do you think about this approach and the attached?

If that sounds reasonable, then I'll add tap tests for it and try to 
improve the way isusercatalog is propagated to the index(es) in case a 
reset is done on user_catalog_table on the table (currently in this POC 
patch, it's hardcoded to "false" which is the default value for 
user_catalog_table in boolRelOpts[]) (A better approach would be 
probably to retrieve the value from the table once the reset is done and 
then propagate it to the index(es).)


Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.comFrom 0dc838a0fdb0f8daccc398bea4f11152b68e0c3a Mon Sep 17 00:00:00 2001
From: bdrouvotAWS 
Date: Fri, 30 Sep 2022 09:28:09 +
Subject: [PATCH v26] Add info in WAL records in preparation for logical slot
 conflict handling.

When a WAL replay on standby indicates that a catalog table tuple is
to be deleted by an xid that is greater than a logical slot's
catalog_xmin, then that means the slot's catalog_xmin conflicts with
the xid, and we need to handle the conflict.  While subsequent commits
will do the actual conflict handling, this commit adds a new field
onCatalogTable in such WAL records, that is true for catalog tables,
so as to arrange for conflict handling.

Author: Andres Freund (in an older version), Amit Khandekar, Bertrand
Drouvot
Reviewed-By: Bertrand Drouvot, Andres Freund, Robert Haas, Fabrizio de
Royes Mello
---
 doc/src/sgml/catalogs.sgml  | 11 +
 src/backend/access/common/reloptions.c  |  3 +-
 src/backend/access/gist/gistxlog.c  |  1 +
 src/backend/access/hash/hashinsert.c|  1 +
 src/backend/access/heap/heapam.c|  4 +-
 src/backend/access/heap/pruneheap.c |  1 +
 src/backend/access/heap/visibilitymap.c |  3 +-
 src/backend/access/nbtree/nbtpage.c |  3 ++
 src/backend/access/spgist/spgvacuum.c   |  1 +
 src/backend/catalog/index.c | 14 --
 src/backend/commands/tablecmds.c| 57 +
 src/include/access/gistxlog.h   |  2 +
 src/include/access/hash_xlog.h  |  1 +
 src/include/access/heapam_xlog.h|  5 ++-
 src/include/access/nbtxlog.h|  2 +
 src/include/access/spgxlog.h|  1 +
 src/include/catalog/pg_index.h  |  2 +
 src/include/utils/rel.h | 33 ++
 18 files changed, 137 insertions(+), 8 deletions(-)
   8.3% doc/src/sgml/
   9.5% src/backend/access/heap/
   8.1% src/backend/access/
   7.7% src/backend/catalog

log_heap_visible(): remove unused parameter and update comment

2022-09-30 Thread Drouvot, Bertrand

Hi,

While resuming the work on [1] I noticed that:

- there is an unused parameter in log_heap_visible()
- the comment associated to the function is not in "sync" with the 
current implementation (referring a "block" that is not involved anymore)


Attached a tiny patch as an attempt to address the above remarks.


[1]: https://commitfest.postgresql.org/39/3740/

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.comdiff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 75b214824d..7465334a83 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -8188,17 +8188,17 @@ log_heap_freeze(Relation reln, Buffer buffer, 
TransactionId cutoff_xid,
 }
 
 /*
- * Perform XLogInsert for a heap-visible operation.  'block' is the block
- * being marked all-visible, and vm_buffer is the buffer containing the
- * corresponding visibility map block.  Both should have already been modified
- * and dirtied.
+ * Perform XLogInsert for a heap-visible operation. heap_buffer is the buffer
+ * containing the block being marked all-visible, and vm_buffer is the buffer
+ * containing the corresponding visibility map block. Both should have already
+ * been modified and dirtied.
  *
  * If checksums are enabled, we also generate a full-page image of
  * heap_buffer, if necessary.
  */
 XLogRecPtr
-log_heap_visible(RelFileLocator rlocator, Buffer heap_buffer, Buffer vm_buffer,
-TransactionId cutoff_xid, uint8 vmflags)
+log_heap_visible(Buffer heap_buffer, Buffer vm_buffer, TransactionId 
cutoff_xid,
+uint8 vmflags)
 {
xl_heap_visible xlrec;
XLogRecPtr  recptr;
diff --git a/src/backend/access/heap/visibilitymap.c 
b/src/backend/access/heap/visibilitymap.c
index d62761728b..ccc97e3b09 100644
--- a/src/backend/access/heap/visibilitymap.c
+++ b/src/backend/access/heap/visibilitymap.c
@@ -283,8 +283,7 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer 
heapBuf,
if (XLogRecPtrIsInvalid(recptr))
{
Assert(!InRecovery);
-   recptr = log_heap_visible(rel->rd_locator, 
heapBuf, vmBuf,
-   
  cutoff_xid, flags);
+   recptr = log_heap_visible(heapBuf, vmBuf, 
cutoff_xid, flags);
 
/*
 * If data checksums are enabled (or 
wal_log_hints=on), we
diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h
index 34220d93cf..4b6bdf6955 100644
--- a/src/include/access/heapam_xlog.h
+++ b/src/include/access/heapam_xlog.h
@@ -415,7 +415,7 @@ extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple,
  
MultiXactId *relminmxid_out);
 extern void heap_execute_freeze_tuple(HeapTupleHeader tuple,
  
xl_heap_freeze_tuple *frz);
-extern XLogRecPtr log_heap_visible(RelFileLocator rlocator, Buffer heap_buffer,
-  Buffer 
vm_buffer, TransactionId cutoff_xid, uint8 vmflags);
+extern XLogRecPtr log_heap_visible(Buffer heap_buffer, Buffer vm_buffer,
+  
TransactionId cutoff_xid, uint8 vmflags);
 
 #endif /* HEAPAM_XLOG_H */


Re: SYSTEM_USER reserved word implementation

2022-09-29 Thread Drouvot, Bertrand

Hi,

On 9/29/22 8:12 AM, Michael Paquier wrote:

On Wed, Sep 28, 2022 at 12:58:48PM +0200, Drouvot, Bertrand wrote:

I had a look at v5 and it does look good to me.


Okay, cool.  I have spent some time today doing a last pass over it
and an extra round of tests.  Things looked fine, so applied.
--


Thanks for your precious help!

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: SYSTEM_USER reserved word implementation

2022-09-28 Thread Drouvot, Bertrand

Hi,

On 9/28/22 5:28 AM, Michael Paquier wrote:

On Tue, Sep 27, 2022 at 03:38:49PM -0700, Jacob Champion wrote:

On 9/26/22 06:29, Drouvot, Bertrand wrote:
Since there are only internal clients to the API, I'd argue this makes
more sense as an Assert(authn_id != NULL), but I don't think it's a
dealbreaker.


Using an assert() looks like a good idea from here.  If this is called
with a NULL authn, this could reflect a problem in the authentication
logic.



Agree, thanks for pointing out.


As far the assertion failure mentioned by Michael when moving the
SVFOP_SYSTEM_USER from NAMEOID to TEXTOID: V4 is assuming that it is
safe to force the collation to C_COLLATION_OID for SQLValueFunction
having a TEXT type, but I would be happy to also hear your thoughts
about it.


Unfortunately I don't have much to add here; I don't know enough about
the underlying problems.


I have been looking at that, and after putting my hands on that this
comes down to the facility introduced in 40c24bf.  So, I think that
we'd better use COERCE_SQL_SYNTAX so as there is no need to worry
about the shortcuts this patch is trying to use with the collation
setup.


Nice!


 And there are a few tests for get_func_sql_syntax() in
create_view.sql.  Note that this makes the patch slightly shorter, and
simpler.



Agree that it does look simpler that way and that making use of 
COERCE_SQL_SYNTAX does looks like a better approach. Nice catch!




The docs still mentioned "name", and not "text".



Oups, thanks for pointing out.

I had a look at v5 and it does look good to me.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: [PATCH] Add peer authentication TAP test

2022-09-28 Thread Drouvot, Bertrand

Hi,

On 9/28/22 7:52 AM, Michael Paquier wrote:

On Fri, Aug 26, 2022 at 10:43:43AM +0200, Drouvot, Bertrand wrote:

During the work in [1] we created a new TAP test to test the SYSTEM_USER
behavior with peer authentication.

It turns out that there is currently no TAP test for the peer
authentication, so we think (thanks Michael for the suggestion [2]) that
it's better to split the work in [1] between "pure" SYSTEM_USER related work
and the "pure" peer authentication TAP test work.

That's the reason of this new thread, please find attached a patch to add a
new TAP test for the peer authentication.


+# Get the session_user to define the user name map test.
+my $session_user =
+  $node->safe_psql('postgres', 'select session_user');
[...]
+# Define a user name map.
+$node->append_conf('pg_ident.conf', qq{mypeermap $session_user 
testmap$session_user});
+
+# Set pg_hba.conf with the peer authentication and the user name map.
+reset_pg_hba($node, 'peer map=mypeermap');

A map consists of a "MAPNAME SYSTEM_USER PG_USER".  Why does this test
use a Postgres role (from session_user) as the system user for the
peer map?


Thanks for looking at it!

Initially selecting the session_user with a "local" connection and no 
user provided during the connection is a way I came up to retrieve the 
"SYSTEM_USER" to be used later on in the map.


Maybe the variable name should be system_user instead or should we use 
another way to get the "SYSTEM_USER" to be used in the map?


Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: SYSTEM_USER reserved word implementation

2022-09-26 Thread Drouvot, Bertrand

Hi,

On 9/8/22 3:17 AM, Michael Paquier wrote:

On Wed, Sep 07, 2022 at 08:48:43AM -0700, Jacob Champion wrote:


We could pass the bare auth_method index, or update the documentation
for auth_method to state that it's guaranteed to be zero if authn_id is
NULL (and then enforce that).


 case SVFOP_CURRENT_USER:
 case SVFOP_USER:
 case SVFOP_SESSION_USER:
+   case SVFOP_SYSTEM_USER:
 case SVFOP_CURRENT_CATALOG:
 case SVFOP_CURRENT_SCHEMA:
 svf->type = NAMEOID;


Should this be moved to use TEXTOID instead?


Yeah, it should.  There is actually a second and much deeper issue
here, in the shape of a collation problem.  See the assertion failure
in exprSetCollation(), because we expect SQLValueFunction nodes to
return a name or a non-collatable type.  However, for this case, we'd
require a text to get rid of the 63-character limit, and that's
a collatable type.  This reminds me of the recent thread to work on
getting rid of this limit for the name type..


Please find attached V4 taking care of Jacob's previous comments.

As far the assertion failure mentioned by Michael when moving the 
SVFOP_SYSTEM_USER from NAMEOID to TEXTOID: V4 is assuming that it is 
safe to force the collation to C_COLLATION_OID for SQLValueFunction 
having a TEXT type, but I would be happy to also hear your thoughts 
about it.


Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.comdiff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index e1fe4fec57..fe99e65dd5 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -22623,6 +22623,25 @@ SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS 
t(ls,n);

   
 
+  
+   
+
+ system_user
+
+system_user
+name
+   
+   
+Returns the authentication method and the identity (if any) that the
+user presented during the authentication cycle before they were
+assigned a database role. It is represented as
+auth_method:identity or
+NULL if the user has not been authenticated (for
+example if Trust authentication has
+been used).
+   
+  
+
   

 
diff --git a/src/backend/access/transam/parallel.c 
b/src/backend/access/transam/parallel.c
index bc93101ff7..c2a08e9414 100644
--- a/src/backend/access/transam/parallel.c
+++ b/src/backend/access/transam/parallel.c
@@ -1496,6 +1496,14 @@ ParallelWorkerMain(Datum main_arg)

 false);
RestoreClientConnectionInfo(clientconninfospace);
 
+   /*
+* Initialize SystemUser now that MyClientConnectionInfo is restored.
+* Also ensure that auth_method is actually valid, aka authn_id is not 
NULL.
+*/
+   if (MyClientConnectionInfo.authn_id)
+   InitializeSystemUser(MyClientConnectionInfo.authn_id,
+
hba_authname(MyClientConnectionInfo.auth_method));
+
/* Attach to the leader's serializable transaction, if SERIALIZABLE. */
AttachSerializableXact(fps->serializable_xact_handle);
 
diff --git a/src/backend/executor/execExprInterp.c 
b/src/backend/executor/execExprInterp.c
index 9b9bbf00a9..c51578c0b9 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -2537,6 +2537,11 @@ ExecEvalSQLValueFunction(ExprState *state, ExprEvalStep 
*op)
*op->resvalue = session_user(fcinfo);
*op->resnull = fcinfo->isnull;
break;
+   case SVFOP_SYSTEM_USER:
+   InitFunctionCallInfoData(*fcinfo, NULL, 0, InvalidOid, 
NULL, NULL);
+   *op->resvalue = system_user(fcinfo);
+   *op->resnull = fcinfo->isnull;
+   break;
case SVFOP_CURRENT_CATALOG:
InitFunctionCallInfoData(*fcinfo, NULL, 0, InvalidOid, 
NULL, NULL);
*op->resvalue = current_database(fcinfo);
diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c
index 3bac350bf5..453ba84494 100644
--- a/src/backend/nodes/nodeFuncs.c
+++ b/src/backend/nodes/nodeFuncs.c
@@ -1142,7 +1142,8 @@ exprSetCollation(Node *expr, Oid collation)
((MinMaxExpr *) expr)->minmaxcollid = collation;
break;
case T_SQLValueFunction:
-   AssertSQLValueFunction *) expr)->type == NAMEOID) ?
+   AssertSQLValueFunction *) expr)->type == NAMEOID ||
+   ((SQLValueFunction *) expr)->type == 
TEXTOID) ?
   (collation == C_COLLATION_OID) :
   (collation == InvalidOid));

Re: pg_stat_have_stats() returns true for dropped indexes (or for index creation transaction rolled back)

2022-09-26 Thread Drouvot, Bertrand

Hi,

On 9/26/22 10:23 AM, Drouvot, Bertrand wrote:

Hi,

On 9/23/22 10:45 PM, Andres Freund wrote:





Hi,

On 2022-09-01 08:40:54 +0200, Drouvot, Bertrand wrote:
Thanks for the suggestion, I'm coming up with this proposal in v4 
attached:


I pushed the bugfix / related test portion to 15, master. Thanks!


Thanks!



Forgot to say that with that being fixed, I'll come back with a patch 
proposal for the tables/indexes stats split (discovered the issue fixed 
in this current thread while working on the split patch.)


Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: SYSTEM_USER reserved word implementation

2022-09-26 Thread Drouvot, Bertrand

Hi,

On 9/7/22 5:48 PM, Jacob Champion wrote:

On 9/7/22 07:46, Drouvot, Bertrand wrote:

Except the Nit above, that looks all good to me.


A few additional comments:


+assigned a database role. It is represented as
+auth_method:identity or
+NULL if the user has not been authenticated (for
+example if  has been used).
+   


This is rendered as

 ... (for example if Section 21.4 has been used).

which IMO isn't too helpful. Maybe a  would read better than an
?


Thanks for looking at it!
Good catch, V4 coming soon will make use of  instead.



Also, this function's placement in the docs (with the System Catalog
Information Functions) seems wrong to me. I think it should go up above
in the Session Information Functions, with current_user et al.


Agree, will move it to the Session Information Functions in V4.




+   /* Build system user as auth_method:authn_id */
+   char   *system_user;
+   Sizeauthname_len = strlen(auth_method);
+   Sizeauthn_id_len = strlen(authn_id);
+
+   system_user = palloc0(authname_len + authn_id_len + 2);
+   strcat(system_user, auth_method);
+   strcat(system_user, ":");
+   strcat(system_user, authn_id);


If we're palloc'ing anyway, can this be replaced with a single psprintf()?


Fair point, V4 will make use of psprintf().




+   /* Initialize SystemUser now that MyClientConnectionInfo is restored. */
+   InitializeSystemUser(MyClientConnectionInfo.authn_id,
+
hba_authname(MyClientConnectionInfo.auth_method));


It makes me a little nervous to call hba_authname(auth_method) without
checking to see that auth_method is actually valid (which is only true
if authn_id is not NULL).


Will add additional check for safety in V4.




We could pass the bare auth_method index, or update the documentation
for auth_method to state that it's guaranteed to be zero if authn_id is
NULL (and then enforce that).


 case SVFOP_CURRENT_USER:
 case SVFOP_USER:
 case SVFOP_SESSION_USER:
+   case SVFOP_SYSTEM_USER:
 case SVFOP_CURRENT_CATALOG:
 case SVFOP_CURRENT_SCHEMA:
 svf->type = NAMEOID;


Should this be moved to use TEXTOID instead?


Good catch, will do in V4.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Query Jumbling for CALL and SET utility statements

2022-09-26 Thread Drouvot, Bertrand


Hi,

On 9/21/22 6:07 PM, Fujii Masao wrote:



On 2022/09/19 15:29, Drouvot, Bertrand wrote:

Please find attached v6 taking care of the remarks mentioned above.


Thanks for updating the patch!

+SET pg_stat_statements.track_utility = TRUE;
+
+-- PL/pgSQL procedure and pg_stat_statements.track = all
+-- we drop and recreate the procedures to avoid any caching funnies
+SET pg_stat_statements.track_utility = FALSE;

Could you tell me why track_utility is enabled just before it's disabled?


Thanks for looking at the new version!

No real reason, I removed those useless SET in the new V7 attached.



Could you tell me what actually happens if we don't drop and
recreate the procedures? I'd like to know what "any caching funnies" are.


Without the drop/recreate the procedure body does not appear normalized 
(while the CALL itself is) when switching from track = top to track = all.


I copy-pasted this comment from the already existing "function" section 
in the pg_stat_statements.sql file. This comment has been introduced for 
the function section in commit 83f2061dd0. Note that the behavior would 
be the same for the function case (function body does not appear 
normalized without the drop/recreate).




+SELECT pg_stat_statements_reset();
+CALL MINUS_TWO(3);
+CALL MINUS_TWO(7);
+CALL SUM_TWO(3, 8);
+CALL SUM_TWO(7, 5);
+
+SELECT query, calls, rows FROM pg_stat_statements ORDER BY query 
COLLATE "C";


This test set for the procedures is executed with the following
four conditions, respectively. Do we really need all of these tests?

track = top, track_utility = true
track = top, track_utility = false
track = all, track_utility = true
track = all, track_utility = false


Oh right, the track_utility = false cases have been added when we 
decided up-thread to force track CALL.


But now that's probably not needed to test with track_utility = true. So 
I'm just keeping track_utility = off with track = top or all in the new 
V7 attached (like this is the case for the "function" section).




+begin;
+prepare transaction 'tx1';
+insert into test_tx values (1);
+commit prepared 'tx1';

The test set of 2PC commands is also executed with track_utility = on
and off, respectively. But why do we need to run that test when
track_utility = off?


That's useless, thanks for pointing out. Removed in V7 attached.



-    if (query->utilityStmt)
+    if (query->utilityStmt && !jstate)
  {
  if (pgss_track_utility && 
!PGSS_HANDLED_UTILITY(query->utilityStmt))


"pgss_track_utility" should be
"pgss_track_utility || FORCE_TRACK_UTILITY(parsetree)" theoretically?


Good catch! That's not needed (in practice) with the current code but 
that is "theoretically" needed indeed, let's add it in V7 attached 
(that's safer should the code change later on).



Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.comdiff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out 
b/contrib/pg_stat_statements/expected/pg_stat_statements.out
index ff0166fb9d..f5fc2f1f38 100644
--- a/contrib/pg_stat_statements/expected/pg_stat_statements.out
+++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out
@@ -272,7 +272,7 @@ FROM pg_stat_statements ORDER BY query COLLATE "C";
  wal_records > $2 as wal_records_generated,   +|   |  |
 |   | 
  wal_records >= rows as wal_records_ge_rows   +|   |  |
 |   | 
  FROM pg_stat_statements ORDER BY query COLLATE "C"|   |  |
 |   | 
- SET pg_stat_statements.track_utility = FALSE  | 1 |0 | f  
 | f | t
+ SET pg_stat_statements.track_utility = $1 | 1 |0 | f  
 | f | t
  UPDATE pgss_test SET b = $1 WHERE a > $2  | 1 |3 | t  
 | t | t
 (7 rows)
 
@@ -423,6 +423,111 @@ SELECT query, calls, rows FROM pg_stat_statements ORDER 
BY query COLLATE "C";
  SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C" 
| 0 |0
 (6 rows)
 
+-- PL/pgSQL procedure and pg_stat_statements.track = top
+CREATE PROCEDURE MINUS_TWO(i INTEGER) AS $$
+DECLARE
+  r INTEGER;
+BEGIN
+  SELECT (i - 1 - 1.0)::INTEGER INTO r;
+END; $$ LANGUAGE plpgsql;
+CREATE PROCEDURE SUM_TWO(i INTEGER, j INTEGER) AS $$
+DECLARE
+  r INTEGER;
+BEGIN
+  SELECT (j + j)::INTEGER INTO r;
+END; $$ LANGUAGE plpgsql;
+SET pg_stat_statements.track = 'top';
+SET pg_stat_statements.track_utility = FALSE;
+SELECT pg_stat_statements_reset();
+ pg_stat_statements_reset 
+--
+ 
+(1 row)
+
+CALL MINUS_TW

Re: pg_stat_have_stats() returns true for dropped indexes (or for index creation transaction rolled back)

2022-09-26 Thread Drouvot, Bertrand

Hi,

On 9/23/22 10:45 PM, Andres Freund wrote:





Hi,

On 2022-09-01 08:40:54 +0200, Drouvot, Bertrand wrote:

Thanks for the suggestion, I'm coming up with this proposal in v4 attached:


I pushed the bugfix / related test portion to 15, master. Thanks!


Thanks!



I left the replication stuff out - it seemed somewhat independent.


Yeah.


Probably
will just push that to master, unless somebody thinks it should be in both?


Sounds good to me as this is not a bug and that seems unlikely to me 
that an issue in this area will be introduced later on on 15 without 
being introduced on master too.


Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-09-20 Thread Drouvot, Bertrand

Hi,

On 9/20/22 6:30 AM, Michael Paquier wrote:

On Tue, Sep 20, 2022 at 12:09:33AM -0400, Tom Lane wrote:

You have to assume that somebody (a) has a role or DB name starting
with slash, (b) has an explicit reference to that name in their
pg_hba.conf, (c) doesn't read the release notes, and (d) doesn't
notice that things are misbehaving until after some hacker manages
to break into their installation on the strength of the misbehaving
entry.  OK, I'll grant that the probability of (c) is depressingly
close to unity; but each of the other steps seems quite low probability.
All four of them happening in one installation is something I doubt
will happen.


It is the kind of things that could blow up as a CVE and some bad PR
for the project, so I cannot get excited about enforcing this new rule
in an authentication file (aka before a role is authenticated) while
we are talking about 3~4 code paths (?) that would need an extra check
to make sure that no instances have such object names.


I also have the feeling that having (a), (b) and (d) is low probability.

That said, If the user "/john" already exists and has a hba entry then 
this entry will still match with the patch. Problem is that all the 
users that contain "john" would also now match.


But things get worst if say /a is an existing user and hba entry as the 
entry would match any users that contains "a" with the patch.


I assume (maybe i should not) that if objects starting with / already 
exist there is very good reason(s) behind. Then I don't think that 
preventing their creation in the DDL would help (quite the contrary for 
the ones that really need them).


It looks to me that adding a GUC (off by default) to enable/disable the 
regexp usage in the hba could be a fair compromise. It won't block any 
creation starting with a / and won't open more doors (if such objects 
exist) by default.


Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Patch to address creation of PgStat* contexts with null parent context

2022-09-19 Thread Drouvot, Bertrand

Hi,

On 9/17/22 6:10 PM, Andres Freund wrote:

Hi,

On 2022-09-07 11:11:11 +0200, Drouvot, Bertrand wrote:

On 9/6/22 7:53 AM, Kyotaro Horiguchi wrote:

At Mon, 5 Sep 2022 14:46:55 +0200, "Drouvot, Bertrand"  
wrote in

Looks like that both approaches have their pros and cons. I'm tempted
to vote +1 on "just changing" the parent context to TopMemoryContext
and not changing the allocations locations.

Yeah. It is safe more than anything and we don't have a problem there.

So, I'm fine with just replacing the parent context at the three places.


Attached a patch proposal to do so.


Pushed. Thanks for the report and the fix!


Thanks! I just marked the corresponding CF entry as Committed.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-09-19 Thread Drouvot, Bertrand

Hi,

On 9/17/22 8:53 AM, Michael Paquier wrote:

On Fri, Sep 16, 2022 at 06:24:07PM +0200, Drouvot, Bertrand wrote:

The CF bot is failing for Windows (all other tests are green) and only for
the new tap test related to the regular expression on the host name (the
ones on database and role are fine).

The issue is not related to the patch. The issue is that the Windows Cirrus
test does not like when a host name is provided for a "host" entry in
pg_hba.conf (while it works fine when a CIDR is provided).

You can see an example in [1] where the only change is to replace the CIDR
by "localhost" in 002_scram.pl. As you can see the Cirrus tests are failing
on Windows only (its log file is here [2]).

I'll look at this "Windows" related issue but would appreciate any
guidance/help if someone has experience in this area on windows.


I recall that being able to do a reverse lookup of a hostname on
Windows for localhost requires a few extra setup steps as that's not
guaranteed to be set in all environments by default, which is why we
go at great length to use 127.0.0.1 in the TAP test setup for example
(see Cluster.pm).  Looking at your patch, the goal is to test the
mapping of regular expression for host names, user names and database
names.  If the first case is not guaranteed, my guess is that it is
fine to skip this portion of the tests on Windows.


Thanks for looking at it!

That sounds reasonable, v3 attached is skipping the regular expression 
tests for the hostname on Windows.





While reading the patch, I am a bit confused about token_regcomp() and
token_regexec().  It would help the review a lot if these were
documented with proper comments, even if these act roughly as wrappers
for pg_regexec() and pg_regcomp().


Fully agree, comments were missing. They've been added in v3 attached.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.comdiff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
index c6f1b70fd3..28343445be 100644
--- a/doc/src/sgml/client-auth.sgml
+++ b/doc/src/sgml/client-auth.sgml
@@ -235,8 +235,9 @@ hostnogssenc  database  
userPostgreSQL database.
-   Multiple database names can be supplied by separating them with
-   commas.  A separate file containing database names can be specified by
+   Multiple database names and/or regular expressions preceded by 
/
+   can be supplied by separating them with commas.
+   A separate file containing database names can be specified by
preceding the file name with @.
   
  
@@ -249,7 +250,8 @@ hostnogssenc  database  
userall specifies that it
matches all users.  Otherwise, this is either the name of a specific
-   database user, or a group name preceded by +.
+   database user, a regular expression preceded by /
+   or a group name preceded by +.
(Recall that there is no real distinction between users and groups
in PostgreSQL; a + mark 
really means
match any of the roles that are directly or indirectly members
@@ -258,7 +260,8 @@ hostnogssenc  database  
user/
+   can be supplied by separating them with commas.
A separate file containing user names can be specified by preceding the
file name with @.
   
@@ -270,8 +273,9 @@ hostnogssenc  database  
user
   
Specifies the client machine address(es) that this record
-   matches.  This field can contain either a host name, an IP
-   address range, or one of the special key words mentioned below.
+   matches.  This field can contain either a host name, a regular 
expression
+   preceded by / representing host names, an IP address 
range,
+   or one of the special key words mentioned below.
   
 
   
@@ -785,16 +789,18 @@ hostall all 192.168.12.10/32  
  gss
 # TYPE  DATABASEUSERADDRESS METHOD
 hostall all 192.168.0.0/16  ident 
map=omicron
 
-# If these are the only three lines for local connections, they will
+# If these are the only four lines for local connections, they will
 # allow local users to connect only to their own databases (databases
-# with the same name as their database user name) except for administrators
-# and members of role "support", who can connect to all databases.  The file
-# $PGDATA/admins contains a list of names of administrators.  Passwords
+# with the same name as their database user name) except for administrators,
+# users finishing with "helpdesk" and members of role "support",
+# who can connect to all databases.
+# The file$PGDATA/admins contains a list of names of administrators.  Passwords
 # are required in all cases.
 #
 # TYPE  DATABASEUSERADDRESS METHOD
 local   sameuserall 

Re: Query Jumbling for CALL and SET utility statements

2022-09-19 Thread Drouvot, Bertrand

Hi,

On 9/16/22 5:47 PM, Drouvot, Bertrand wrote:

Hi,

On 9/16/22 2:53 PM, Fujii Masao wrote:




Attached v5 to normalize 2PC commands too, so that we get things like:


+    case T_VariableSetStmt:
+    {
+    VariableSetStmt *stmt = (VariableSetStmt *) node;
+
+    /* stmt->name is NULL for RESET ALL */
+    if (stmt->name)
+    {
+    APP_JUMB_STRING(stmt->name);
+    JumbleExpr(jstate, (Node *) stmt->args);

With the patch, "SET ... TO DEFAULT" and "RESET ..." are counted as 
the same query.
Is this intentional? 


Thanks for looking at the patch!
No, it is not intentional, good catch!


Which might be ok because their behavior is basically the same.
But I'm afaid which may cause users to be confused. For example, they 
may fail to
find the pgss entry for RESET command they ran and just wonder why the 
command was
not recorded. To avoid such confusion, how about appending stmt->kind 
to the jumble?

Thought?


I think that's a good idea and will provide a new version taking care of 
it (and also Sami's comments up-thread).


Please find attached v6 taking care of the remarks mentioned above.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.comdiff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out 
b/contrib/pg_stat_statements/expected/pg_stat_statements.out
index ff0166fb9d..ad8cebcbad 100644
--- a/contrib/pg_stat_statements/expected/pg_stat_statements.out
+++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out
@@ -272,7 +272,7 @@ FROM pg_stat_statements ORDER BY query COLLATE "C";
  wal_records > $2 as wal_records_generated,   +|   |  |
 |   | 
  wal_records >= rows as wal_records_ge_rows   +|   |  |
 |   | 
  FROM pg_stat_statements ORDER BY query COLLATE "C"|   |  |
 |   | 
- SET pg_stat_statements.track_utility = FALSE  | 1 |0 | f  
 | f | t
+ SET pg_stat_statements.track_utility = $1 | 1 |0 | f  
 | f | t
  UPDATE pgss_test SET b = $1 WHERE a > $2  | 1 |3 | t  
 | t | t
 (7 rows)
 
@@ -423,6 +423,155 @@ SELECT query, calls, rows FROM pg_stat_statements ORDER 
BY query COLLATE "C";
  SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C" 
| 0 |0
 (6 rows)
 
+-- PL/pgSQL procedure and pg_stat_statements.track = top
+CREATE PROCEDURE MINUS_TWO(i INTEGER) AS $$
+DECLARE
+  r INTEGER;
+BEGIN
+  SELECT (i - 1 - 1.0)::INTEGER INTO r;
+END; $$ LANGUAGE plpgsql;
+CREATE PROCEDURE SUM_TWO(i INTEGER, j INTEGER) AS $$
+DECLARE
+  r INTEGER;
+BEGIN
+  SELECT (j + j)::INTEGER INTO r;
+END; $$ LANGUAGE plpgsql;
+SET pg_stat_statements.track = 'top';
+SET pg_stat_statements.track_utility = TRUE;
+SELECT pg_stat_statements_reset();
+ pg_stat_statements_reset 
+--
+ 
+(1 row)
+
+CALL MINUS_TWO(3);
+CALL MINUS_TWO(7);
+CALL SUM_TWO(3, 8);
+CALL SUM_TWO(7, 5);
+SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
+query 
| calls | rows 
+--+---+--
+ CALL MINUS_TWO($1)   
| 2 |0
+ CALL SUM_TWO($1, $2) 
| 2 |0
+ SELECT pg_stat_statements_reset()
| 1 |1
+ SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C" 
| 0 |0
+(4 rows)
+
+-- Ensure CALL is tracked even if pg_stat_statements.track_utility is FALSE
+SET pg_stat_statements.track_utility = FALSE;
+SELECT pg_stat_statements_reset();
+ pg_stat_statements_reset 
+--
+ 
+(1 row)
+
+CALL MINUS_TWO(3);
+CALL MINUS_TWO(7);
+CALL SUM_TWO(3, 8);
+CALL SUM_TWO(7, 5);
+SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C";
+query 
| calls | rows 
+--+---+--
+ CALL MINUS_TWO($1)   
| 2 |0
+ CALL SUM_TWO($1, $2) 
| 2 |0
+ SELECT pg_stat_statements_reset()
| 1 |1
+ SELECT query, calls, rows FROM pg_stat_statements

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-09-16 Thread Drouvot, Bertrand

Hi,

On 9/12/22 9:55 AM, Drouvot, Bertrand wrote:

Hi,

On 9/10/22 1:21 AM, Jacob Champion wrote:

On 8/19/22 01:12, Drouvot, Bertrand wrote:

+   wstr = palloc((strlen(tok->string + 1) + 1) * sizeof(pg_wchar));
+   wlen = pg_mb2wchar_with_len(tok->string + 1,
+   wstr, strlen(tok->string + 1));

The (tok->string + 1) construction comes up often enough that I think it
should be put in a `regex` variable or similar. That would help my eyes
with the (strlen(tok->string + 1) + 1) construction, especially.

I noticed that for pg_ident, we precompile the regexes per-line and
reuse those in child processes. Whereas here we're compiling, using, and
then discarding the regex for each check. I think the example set by the
pg_ident code is probably the one to follow, unless you have a good
reason not to.


Thanks for the feedback.

Yeah fully agree. I'll provide a new version that follow the same logic 
as the pg_ident code.



+# Testing with regular expression for username
+reset_pg_hba($node, '/^.*md.*$', 'password');
+test_role($node, 'md5_role', 'password from pgpass and regular expression for 
username', 0);
+

IMO the coverage for this patch needs to be filled out. Negative test
cases are more important than positive ones for security-related code.


Agree, will do.


Other than that, and Tom's note on potentially expanding this to other
areas,


I'll add regexp usage for the database column and also the for the 
address one when non CIDR is provided (so host name(s)) (I think it also 
makes sense specially as we don't allow multiple values for this column).





Please find attached v2 addressing the comments mentioned above.

v2 also provides regular expression usage for the database and the 
address columns (when a host name is being used).


Remark:

The CF bot is failing for Windows (all other tests are green) and only 
for the new tap test related to the regular expression on the host name 
(the ones on database and role are fine).


The issue is not related to the patch. The issue is that the Windows 
Cirrus test does not like when a host name is provided for a "host" 
entry in pg_hba.conf (while it works fine when a CIDR is provided).


You can see an example in [1] where the only change is to replace the 
CIDR by "localhost" in 002_scram.pl. As you can see the Cirrus tests are 
failing on Windows only (its log file is here [2]).


I'll look at this "Windows" related issue but would appreciate any 
guidance/help if someone has experience in this area on windows.





[1]: https://github.com/bdrouvot/postgres/branches on branch “host_non_cidr”

[2]: 
https://api.cirrus-ci.com/v1/artifact/task/6507279833890816/log/src/test/ssl/tmp_check/log/002_scram_primary.log


Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.comdiff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
index c6f1b70fd3..28343445be 100644
--- a/doc/src/sgml/client-auth.sgml
+++ b/doc/src/sgml/client-auth.sgml
@@ -235,8 +235,9 @@ hostnogssenc  database  
userPostgreSQL database.
-   Multiple database names can be supplied by separating them with
-   commas.  A separate file containing database names can be specified by
+   Multiple database names and/or regular expressions preceded by 
/
+   can be supplied by separating them with commas.
+   A separate file containing database names can be specified by
preceding the file name with @.
   
  
@@ -249,7 +250,8 @@ hostnogssenc  database  
userall specifies that it
matches all users.  Otherwise, this is either the name of a specific
-   database user, or a group name preceded by +.
+   database user, a regular expression preceded by /
+   or a group name preceded by +.
(Recall that there is no real distinction between users and groups
in PostgreSQL; a + mark 
really means
match any of the roles that are directly or indirectly members
@@ -258,7 +260,8 @@ hostnogssenc  database  
user/
+   can be supplied by separating them with commas.
A separate file containing user names can be specified by preceding the
file name with @.
   
@@ -270,8 +273,9 @@ hostnogssenc  database  
user
   
Specifies the client machine address(es) that this record
-   matches.  This field can contain either a host name, an IP
-   address range, or one of the special key words mentioned below.
+   matches.  This field can contain either a host name, a regular 
expression
+   preceded by / representing host names, an IP address 
range,
+   or one of the special key words mentioned below.
   
 
   
@@ -785,16 +789,18 @@ hostall all 192.168.12.10/32  
  gss
 # TYPE  DATABASEUSERADDRESS METHOD
 hostall all  

Re: Query Jumbling for CALL and SET utility statements

2022-09-16 Thread Drouvot, Bertrand

Hi,

On 9/16/22 2:53 PM, Fujii Masao wrote:


 

Attached v5 to normalize 2PC commands too, so that we get things like:


+    case T_VariableSetStmt:
+    {
+    VariableSetStmt *stmt = (VariableSetStmt *) node;
+
+    /* stmt->name is NULL for RESET ALL */
+    if (stmt->name)
+    {
+    APP_JUMB_STRING(stmt->name);
+    JumbleExpr(jstate, (Node *) stmt->args);

With the patch, "SET ... TO DEFAULT" and "RESET ..." are counted as the 
same query.
Is this intentional? 


Thanks for looking at the patch!
No, it is not intentional, good catch!

Which might be ok because their behavior is 
basically the same.
But I'm afaid which may cause users to be confused. For example, they 
may fail to
find the pgss entry for RESET command they ran and just wonder why the 
command was
not recorded. To avoid such confusion, how about appending stmt->kind to 
the jumble?

Thought?


I think that's a good idea and will provide a new version taking care of 
it (and also Sami's comments up-thread).


Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Query Jumbling for CALL and SET utility statements

2022-09-12 Thread Drouvot, Bertrand

Hi,

On 9/13/22 6:33 AM, Julien Rouhaud wrote:

Hi,

On Tue, Sep 13, 2022 at 11:43:52AM +0900, bt22kawamotok wrote:

I found that utility statement is counted separately in upper and lower
case.
For example BEGIN and begin are counted separately.
Is it difficult to fix this problem?

This is a known behavior, utility command aren't normalized (apart from the few
that will be with this patch) and the queryid is just a hash of the provided
query string.

It seems unrelated to this patch though.  While it can be a bit annoying, it's
unlikely that the application will have thousands of way to ask for a new
transaction (mixing case, adding a random number of spaces between BEGIN and
TRANSACTION and so on), so in real life it won't cause any problem.


Agree that it seems unlikely to cause any problem (as compare to the 
utility statements that are handled in this patch).



  Fixing it
would require to fully jumble all utility statements, which would require a
separate discussion.


Agree.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Amazon Web Services EMEA SARL, 38 avenue John F. Kennedy, L-1855 Luxembourg, 
R.C.S. Luxembourg B186284

Amazon Web Services EMEA SARL, succursale francaise, 31 Place des Corolles, 
Tour Carpe Diem, F-92400 Courbevoie, SIREN 831 001 334, RCS Nanterre, APE 
6311Z,  TVA FR30831001334




[PATCH] Add peer authentication TAP test

2022-08-26 Thread Drouvot, Bertrand

Hi hackers,

During the work in [1] we created a new TAP test to test the SYSTEM_USER 
behavior with peer authentication.


It turns out that there is currently no TAP test for the peer 
authentication, so we think (thanks Michael for the suggestion [2]) that 
it's better to split the work in [1] between "pure" SYSTEM_USER related 
work and the "pure" peer authentication TAP test work.


That's the reason of this new thread, please find attached a patch to 
add a new TAP test for the peer authentication.


[1]: 
https://www.postgresql.org/message-id/flat/7e692b8c-0b11-45db-1cad-3afc5b57409f%40amazon.com


[2]: https://www.postgresql.org/message-id/YwgboqQUV1%2BY/k6z%40paquier.xyz

Regards,

--
Bertrand Drouvot
Amazon Web Services: https://aws.amazon.com
diff --git a/src/test/authentication/t/003_peer.pl 
b/src/test/authentication/t/003_peer.pl
new file mode 100644
index 00..006c4405a5
--- /dev/null
+++ b/src/test/authentication/t/003_peer.pl
@@ -0,0 +1,140 @@
+
+# Copyright (c) 2022, PostgreSQL Global Development Group
+
+# Tests for peer authentication and user name map.
+# The test is skipped if the platform does not support peer authentication.
+
+use strict;
+use warnings;
+use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Utils;
+use Test::More;
+
+# Delete pg_hba.conf from the given node, add a new entry to it
+# and then execute a reload to refresh it.
+sub reset_pg_hba
+{
+   my $node   = shift;
+   my $hba_method = shift;
+
+   unlink($node->data_dir . '/pg_hba.conf');
+   # just for testing purposes, use a continuation line.
+   $node->append_conf('pg_hba.conf', "local all all\\\n $hba_method");
+   $node->reload;
+   return;
+}
+
+# Test access for a single role, useful to wrap all tests into one.
+sub test_role
+{
+  local $Test::Builder::Level = $Test::Builder::Level + 1;
+
+  my ($node, $role, $method, $expected_res) = @_;
+  my $status_string = 'failed';
+  $status_string = 'success' if ($expected_res eq 0);
+
+  my $connstr = "user=$role";
+  my $testname =
+"authentication $status_string for method $method, role $role";
+
+  if ($expected_res eq 0)
+  {
+$node->connect_ok($connstr, $testname);
+  }
+  else
+  {
+  # No checks of the error message, only the status code.
+  $node->connect_fails($connstr, $testname);
+  }
+}
+
+# return the size of logfile of $node in bytes.
+sub get_log_size
+{
+  my ($node) = @_;
+
+  return (stat $node->logfile)[7];
+}
+
+# find $pat in logfile of $node after $off-th byte.
+sub find_in_log
+{
+  my ($node, $pat, $off) = @_;
+
+  $off = 0 unless defined $off;
+  my $log = PostgreSQL::Test::Utils::slurp_file($node->logfile);
+  return 0 if (length($log) <= $off);
+
+  $log = substr($log, $off);
+
+  return $log =~ m/$pat/;
+}
+
+# Initialize primary node.
+my $node = PostgreSQL::Test::Cluster->new('primary');
+$node->init;
+$node->append_conf('postgresql.conf', "log_connections = on\n");
+$node->start;
+
+# Get the session_user to define the user name map test.
+my $session_user =
+  $node->safe_psql('postgres', 'select session_user');
+
+# Create a new user for the user name map test.
+$node->safe_psql('postgres',
+  qq{CREATE USER testmap$session_user});
+
+# Set pg_hba.conf with the peer authentication.
+reset_pg_hba($node, 'peer');
+
+# Check that peer authentication is supported on this platform.
+my $logstart = get_log_size($node);
+
+$node->psql('postgres', undef, connstr  => "user=$session_user");
+
+# If not supported, then skip the rest of the test.
+if (find_in_log($node, qr/peer authentication is not supported on this 
platform/, $logstart))
+{
+plan skip_all => 'peer authentication is not supported on this platform';
+}
+
+# It's supported so let's test peer authentication.
+# This connection should succeed.
+$logstart = get_log_size($node);
+test_role($node, $session_user, 'peer', 0);
+
+ok( find_in_log(
+  $node,
+  qr/connection authenticated: identity="$session_user" method=peer/, 
$logstart),
+  "logfile: connection authenticated for method peer and identity 
$session_user");
+
+# This connection should failed.
+$logstart = get_log_size($node);
+test_role($node, qq{testmap$session_user}, 'peer', 2);
+
+ok( find_in_log(
+  $node,
+  qr/Peer authentication failed for user "testmap$session_user"/, 
$logstart),
+  "logfile: Peer authentication failed for user testmap$session_user");
+
+# Define a user name map.
+$node->append_conf('pg_ident.conf', qq{mypeermap $session_user 
testmap$session_user});
+
+# Set pg_hba.conf with the peer authentication and the user name map.
+reset_pg_hba($node, 'peer map=mypeermap');
+
+# This connection should now succeed.
+$logstart = get_log_size($node);
+test_role($node, qq{testmap$session_user}, 'peer', 0);
+
+ok( find_in_log(
+  $node,
+  qr/connection authenticated: identity="$session_user" method=peer/, 
$logstart),
+  "logfile: connection authenticated for method peer and identity 
$session_user");
+
+ok( 

Re: shared-memory based stats collector - v70

2022-08-19 Thread Drouvot, Bertrand

Hi,

On 8/18/22 9:51 PM, Andres Freund wrote:

Hi,

On 2022-08-18 15:26:31 -0400, Greg Stark wrote:

And indexes of course. It's a bit frustrating since without the
catalog you won't know what table the index actually is for... But
they're pretty important stats.

FWIW, I think we should split relation stats into table and index
stats. Historically it'd have added a lot of complexity to separate the two,
but I don't think that's the case anymore. And we waste space for index stats
by having lots of table specific fields.


It seems to me that we should work on that first then, what do you 
think? (If so I can try to have a look at it).


And once done then resume the work to provide the APIs to get all 
tables/indexes from all the databases.


That way we'll be able to provide one API for the tables and one for the 
indexes (instead of one API for both like my current POC is doing).



On that note though... What do you think about having the capability
to add other stats kinds to the stats infrastructure?


I think that's a good idea and that would be great to have.


Getting closer to that was one of my goals working on the shared memory stats
stuff.



It would make a lot of sense for pg_stat_statements to add its entries here
instead of having to reimplement a lot of the same magic.

Yes, we should move pg_stat_statements over.

It's pretty easy to get massive contention on stats entries with
pg_stat_statements, because it doesn't have support for "batching" updates to
shared stats. And reimplementing the same logic in pg_stat_statements.c
doesn't make sense.

And the set of normalized queries could probably stored in DSA as well - the
file based thing we have right now is problematic.



To do that I guess more of the code needs to be moved to be table
driven from the kind structs either with callbacks or with other meta
data.

Pretty much all of it already is. The only substantial missing bit is
reading/writing of stats files, but that should be pretty easy. And of course
making the callback array extensible.



So the kind record could contain tupledesc and the code to construct the
returned tuple so that these functions could return any custom entry as well
as the standard entries.

I don't see how this would work well - we don't have functions returning
variable kinds of tuples. And what would convert a struct to a tuple?

Nor do I think it's needed - if you have an extension providing a new stats
kind it can also provide accessors.


I think the same (the extension should be able to do that).

I really like the idea of being able to provide new stats kind.

Regards,

--
Bertrand Drouvot
Amazon Web Services: https://aws.amazon.com





Re: Patch proposal: New hooks in the connection path

2022-08-16 Thread Drouvot, Bertrand

Hi,

On 8/16/22 10:10 AM, Bharath Rupireddy wrote:

On Tue, Aug 16, 2022 at 1:31 PM Drouvot, Bertrand  wrote:

On 8/14/22 7:52 AM, Gurjeet Singh wrote:

On Mon, Aug 8, 2022 at 3:51 AM Drouvot, Bertrand  wrote:
I think we can reduce the number of places the hook is called, if we
call the hook from proc_exit(), and at all the other places we simply set
a global variable to signify the reason for the failure. The case of
_exit(1) from the signal-handler cannot use such a mechanism, but I
think all the other cases of interest can simply register one of the
FCET_* values, and let the call from proc_exit() pass that value
to the hook.

That looks like a good idea to me. I'm tempted to rewrite the patch that
way (and addressing the first comment in the same time).

Curious to hear about others hackers thoughts too.

IMO, calling the hook from proc_exit() is not a good design as
proc_exit() is a generic code called from many places in the source
code, even the simple code of kind  if(call_failed_conn_hook) {
falied_conn_hook(params);} can come in the way of many exit code paths
which is undesirable, and the likelihood of introducing new bugs may
increase.


Thanks for the feedback.

What do you think about calling the hook only if the new global variable 
is not equal to its default value (which would mean don't trigger the 
hook)?


Regards,

--
Bertrand Drouvot
Amazon Web Services: https://aws.amazon.com





Re: [PATCH] Expose port->authn_id to extensions and triggers

2022-08-12 Thread Drouvot, Bertrand

Hi,

On 8/12/22 12:28 AM, Jacob Champion wrote:

On Wed, Aug 10, 2022 at 10:48 PM Drouvot, Bertrand  wrote:

What do you think about adding a second field in ClientConnectionInfo
for the auth method (as suggested by Michael upthread)?

Sure -- without a followup patch, it's not really tested, though.

v2 adjusts set_authn_id() to copy the auth_method over as well. It
"passes tests" but is otherwise unexercised.


Thank you!

To help with the testing I've just provided a new version (aka 
v2-0004-system_user-implementation.patch) of the SYSTEM_USER patch in 
[1] that can be applied on top of 
"v2-0001-Allow-parallel-workers-to-read-authn_id.patch".


But for this to work, the first comment below on your patch needs to be 
addressed.


Once the first comment is addressed and the new SYSTEM_USER patch 
applied (that adds new tap tests) then we can test the propagation to 
the parallel workers with:


make -C src/test/kerberos check PROVE_TESTS=t/001_auth.pl PROVE_FLAGS=-v

and

make -C src/test/authentication check PROVE_TESTS=t/001_password.pl 
PROVE_FLAGS=-v


Both are currently successful.

Regarding the comments on 
v2-0001-Allow-parallel-workers-to-read-authn_id.patch:


1)

This is the one to be applied before adding the new SYSTEM_USER one on top:

+typedef struct
+{
+   /*
+    * Authenticated identity.  The meaning of this identifier is 
dependent on


has to be replaced by:

+typedef struct ClientConnectionInfo
+{
+   /*
+    * Authenticated identity.  The meaning of this identifier is 
dependent on


2)

+    * Authenticated identity.  The meaning of this identifier is 
dependent on


There is one extra space before "The"

3)

+SerializeClientConnectionInfo(Size maxsize, char *start_address)
+{
+   /*
+    * First byte is an indication of whether or not authn_id has 
been set to

+    * non-NULL, to differentiate that case from the empty string.
+    */

is authn_id being an empty string possible?

4)

+ */
+
+ClientConnectionInfo MyClientConnectionInfo;
+
+/*
+ * Calculate the space needed to serialize MyClientConnectionInfo.
+ */
+Size
+EstimateClientConnectionInfoSpace(void)

From a coding style point of view, shouldn't "ClientConnectionInfo 
MyClientConnectionInfo;" be moved to the top of the file?



[1]: https://commitfest.postgresql.org/39/3703/

Regards,

--
Bertrand Drouvot
Amazon Web Services: https://aws.amazon.com





Re: shared-memory based stats collector - v70

2022-08-11 Thread Drouvot, Bertrand

Hi,

On 8/10/22 11:25 PM, Greg Stark wrote:

On Wed, 10 Aug 2022 at 04:05, Drouvot, Bertrand  wrote:

Hi,

On 8/9/22 6:00 PM, Greg Stark wrote:

On Tue, 9 Aug 2022 at 06:19, Drouvot, Bertrand  wrote:

What do you think about adding a function in core PG to provide such
functionality? (means being able to retrieve all the stats (+ eventually
add some filtering) without the need to connect to each database).

I'm working on it myself too. I'll post a patch for discussion in a bit.

Great! Thank you!

So I was adding the code to pgstat.c because I had thought there were
some data types I needed and/or static functions I needed. However you
and Andres encouraged me to check again now. And indeed I was able,
after fixing a couple things, to make the code work entirely
externally.


Nice!

Though I still think to have an SQL API in core could be useful to.

As Andres was not -1 about that idea (as it should be low cost to add 
and maintain) as long as somebody cares enough to write something: then 
I'll give it a try and submit a patch for it.




This is definitely not polished and there's a couple obvious things
missing. But at the risk of embarrassment I've attached my WIP. Please
be gentle :) I'll post the github link in a bit when I've fixed up
some meta info.


Thanks! I will have a look at it on github (once you share the link).

Regards,

--
Bertrand Drouvot
Amazon Web Services: https://aws.amazon.com





Re: Patch proposal: New hooks in the connection path

2022-07-06 Thread Drouvot, Bertrand

Hi,

On 7/6/22 12:11 AM, Joe Conway wrote:


On 7/5/22 03:37, Bharath Rupireddy wrote:
On Mon, Jul 4, 2022 at 6:23 PM Drouvot, Bertrand 
 wrote:

On 7/2/22 1:00 AM, Nathan Bossart wrote:
> Could we model this after fmgr_hook?  The first argument in that hook
> indicates where it is being called from.  This doesn't alleviate 
the need
> for several calls to the hook in the authentication logic, but 
extension

> authors would only need to define one hook.

I like the idea and indeed fmgr.h looks a good place to model it.

Attached a new patch version doing so.


I was thinking along the same lines, so +1 for the general approach


Thanks for the review!




Thanks for the patch. Can we think of enhancing
ClientAuthentication_hook_type itself i.e. make it a generic hook for
all sorts of authentication metrics, info etc. with the type parameter
embedded to it instead of new hook FailedConnection_hook?We can either
add a new parameter for the "event" (the existing
ClientAuthentication_hook_type implementers will have problems), or
embed/multiplex the "event" info to existing Port structure or status
variable (macro or enum) (existing implementers will not have
compatibility problems).  IMO, this looks cleaner going forward.


Not sure I like this though -- I'll have to think about that


Not sure about this one neither.

The "enhanced" ClientAuthentication_hook will have to be fired at the 
same places as the new FailedConnection_hook is, but i think those 
places are not necessary linked to real authentication per say (making 
the name confusing).





On the v2 patch:

1. Why do we need to place the hook and structures in fmgr.h? Why not 
in auth.h?


agreed -- it does not belong in fmgr.h


Moved to auth.h.




2. Timeout Handler is a signal handler, called as part of SIGALRM
signal handler, most of the times, signal handlers ought to be doing
small things, now that we are handing off the control to hook, which
can do any long running work (writing to a remote storage, file,
aggregate etc.), I don't think it's the right thing to do here.
  static void
  StartupPacketTimeoutHandler(void)
  {
+ if (FailedConnection_hook)
+ (*FailedConnection_hook) (FCET_STARTUP_PACKET_TIMEOUT, MyProcPort);
+ ereport(COMMERROR,
+ (errcode(ERRCODE_PROTOCOL_VIOLATION),
+ errmsg("timeout while processing startup packet")));


Why add the ereport()?


removed it.



But more to Bharath's point, perhaps this is a case that is better
served by incrementing a stat counter and not exposed as a hook?


I think that the advantage of the hook is that it gives the extension 
author the ability/flexibility to aggregate the counter based on 
information available in the Port Struct (say the client addr for 
example) at this stage.


What about to aggregate the stat counter based on the client addr? (Not 
sure if there is more useful information (than the client addr) at this 
stage though)


That said, i agree that having a hook in a time out handler might not be 
the right thing to do (even if at the end that would be to the extension 
author responsibility to do "small things" in it), so it has been 
removed in the new attached version.




Also, a teeny nit:
8<--
+   if (status != STATUS_OK) {
+   if (FailedConnection_hook)
8<--

does not follow usual practice and probably should be:

8<--
+   if (status != STATUS_OK)
+   {
+   if (FailedConnection_hook)
8<--



Thanks!, fixed.

--

Bertrand Drouvot
Amazon Web Services: https://aws.amazon.com
diff --git a/src/backend/postmaster/postmaster.c 
b/src/backend/postmaster/postmaster.c
index d7257e4056..d9e1e3b4c1 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -567,6 +567,8 @@ int postmaster_alive_fds[2] = {-1, -1};
 HANDLE PostmasterHandle;
 #endif
 
+FailedConnection_hook_type FailedConnection_hook = NULL;
+
 /*
  * Postmaster main entry point
  */
@@ -4462,7 +4464,11 @@ BackendInitialize(Port *port)
 * already did any appropriate error reporting.
 */
if (status != STATUS_OK)
+   {
+   if (FailedConnection_hook)
+   (*FailedConnection_hook) (FCET_BAD_STARTUP_PACKET, 
port);
proc_exit(0);
+   }
 
/*
 * Now that we have the user and database name, we can set the process
diff --git a/src/backend/utils/init/postinit.c 
b/src/backend/utils/init/postinit.c
index 6b9082604f..e9bbd185f4 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -360,10 +360,14 @@ CheckMyDatabase(const char *name, bool am_superuser, bool 
override_allow_connect
if (!am_superuser &&
pg_database_aclcheck(MyDatabaseId, GetUserId(),
 ACL_CONNECT) 
!

Patch proposal: New hooks in the connection path

2022-06-30 Thread Drouvot, Bertrand

Hi hackers,

While commit 960869da08 added some information about connections that 
have been successfully authenticated, there is no metrics for 
connections that have not (or did not reached the authentication stage).


Adding metrics about failed connections attempts could also help, for 
example with proper sampling, to:


 * detect spikes in failed login attempts
 * check if there is a correlation between spikes in successful and
   failed connection attempts

While the number of successful connections could also already been 
tracked with the ClientAuthentication_hook (and also the ones that 
failed the authentication) we are missing metrics about:


 * why the connection failed (could be bad password, bad database, bad
   user, missing CONNECT privilege...)
 * number of times the authentication stage has not been reached
 * why the authentication stage has not been reached (bad startup
   packets, timeout while processing startup packet,...)

Those missing metrics (in addition to the ones that can be already 
gathered) could provide value for:


 * security investigations
 * anomalies detections
 * tracking application misconfigurations

In an attempt to be able to provide those metrics, please find attached 
a patch proposal to add new hooks in the connection path, that would be 
fired if:


 * there is a bad startup packet
 * there is a timeout while processing the startup packet
 * user does not have CONNECT privilege
 * database does not exist

For safety those hooks request the use of a const Port parameter, so 
that they could be used only for reporting purpose (for example, we are 
working on an extension to record detailed login metrics counters).


Another option could be to add those metrics in the engine itself 
(instead of providing new hooks to get them), but the new hooks option 
gives more flexibility on how to render and exploit them (there is a lot 
of information in the Port Struct that one could be interested with).


I’m adding this patch proposal to the commitfest.
Looking forward to your feedback,

Regards,
Bertrand
diff --git a/src/backend/postmaster/postmaster.c 
b/src/backend/postmaster/postmaster.c
index dde4bc25b1..8e00327a5d 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -567,6 +567,9 @@ int postmaster_alive_fds[2] = {-1, -1};
 HANDLE PostmasterHandle;
 #endif
 
+StartupPacketTimeout_hook_type StartupPacketTimeout_hook = NULL;
+BadConnPacket_hook_type BadConnPacket_hook = NULL;
+
 /*
  * Postmaster main entry point
  */
@@ -4462,8 +4465,11 @@ BackendInitialize(Port *port)
 * Stop here if it was bad or a cancel packet.  ProcessStartupPacket
 * already did any appropriate error reporting.
 */
-   if (status != STATUS_OK)
+   if (status != STATUS_OK) {
+   if (BadConnPacket_hook)
+   (*BadConnPacket_hook) (port);
proc_exit(0);
+   }
 
/*
 * Now that we have the user and database name, we can set the process
@@ -5323,6 +5329,11 @@ dummy_handler(SIGNAL_ARGS)
 static void
 StartupPacketTimeoutHandler(void)
 {
+   if (StartupPacketTimeout_hook)
+   (*StartupPacketTimeout_hook) (MyProcPort);
+   ereport(COMMERROR,
+   (errcode(ERRCODE_PROTOCOL_VIOLATION),
+errmsg("timeout while processing startup packet")));
_exit(1);
 }
 
diff --git a/src/backend/utils/init/postinit.c 
b/src/backend/utils/init/postinit.c
index 6b9082604f..562ed331bf 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -67,6 +67,14 @@
 #include "utils/syscache.h"
 #include "utils/timeout.h"
 
+/*
+ * Hooks to be used when a connection has been refused in case of bad
+ * database name, bad database oid or bad permissions.
+ */
+BadDb_hook_type baddbname_hook = NULL;
+BadDb_hook_type baddboid_hook = NULL;
+BadDb_hook_type baddbperm_hook = NULL;
+
 static HeapTuple GetDatabaseTuple(const char *dbname);
 static HeapTuple GetDatabaseTupleByOid(Oid dboid);
 static void PerformAuthentication(Port *port);
@@ -360,10 +368,14 @@ CheckMyDatabase(const char *name, bool am_superuser, bool 
override_allow_connect
if (!am_superuser &&
pg_database_aclcheck(MyDatabaseId, GetUserId(),
 ACL_CONNECT) 
!= ACLCHECK_OK)
+   {
+   if (baddbperm_hook)
+   (*baddbperm_hook) (MyProcPort);
ereport(FATAL,

(errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
 errmsg("permission denied for database 
\"%s\"", name),
 errdetail("User does not have CONNECT 
privilege.")));
+   }
 
/*
 * Check connection limit for this database.
@@ 

SYSTEM_USER reserved word implementation

2022-06-22 Thread Drouvot, Bertrand

Hi hackers,

The SYSTEM_USER is a sql reserved word as mentioned in [1] and is 
currently not implemented.


Please find attached a patch proposal to make use of the SYSTEM_USER so 
that it returns the authenticated identity (if any) (aka authn_id in the 
Port struct).


Indeed in some circumstances, the authenticated identity is not the 
SESSION_USER and then the information is lost from the connection point 
of view (it could still be retrieved thanks to commit 9afffcb833 and 
log_connections set to on).


_Example 1, using the gss authentification._

Say we have this entry in pg_hba.conf:

host all all 0.0.0.0/0 gss map=mygssmap

and the related mapping in pg_ident.conf

mygssmap   /^(.*@.*)\.LOCAL$    mary

Then, connecting with a valid Kerberos Ticket that contains 
“bertrand@BDTFOREST.LOCAL” as the default principal that way: psql -U 
mary -h myhostname -d postgres,


we will get:

postgres=> select current_user, session_user;
 current_user | session_user
--+--
 mary | mary
(1 row)

While the SYSTEM_USER would produce the Kerberos principal:

postgres=> select system_user;
   system_user
--
bertrand@BDTFOREST.LOCAL
(1 row)

_Example 2, using the peer authentification._

Say we have this entry in pg_hba.conf:

local all john peer map=mypeermap

and the related mapping in pg_ident.conf

mypeermap postgres john

Then connected localy as the system user postgres and connecting to the 
database that way: psql -U john -d postgres, we will get:


postgres=> select current_user, session_user;
 current_user | session_user
--+--
 john | john
(1 row)

While the SYSTEM_USER would produce the system user that requested the 
connection:


postgres=> select system_user;
 system_user
-
 postgres
(1 row)

Thanks to those examples we have seen some situations where the 
information related to the authenticated identity has been lost from the 
connection point of view (means not visible in the current_session or in 
the session_user).


The purpose of this patch is to make it visible through the SYSTEM_USER 
sql reserved word.


_Remarks: _

- In case port->authn_id is NULL then the patch is returning the 
SESSION_USER for the SYSTEM_USER. Perhaps it should return NULL instead.


- There is another thread [2] to expose port->authn_id to extensions and 
triggers thanks to a new API. This thread [2] leads to discussions about 
providing this information to the parallel workers too. While the new 
MyClientConnectionInfo being discussed in [2] could be useful to hold 
the client information that needs to be shared between the backend and 
any parallel workers, it does not seem to be needed in the case 
port->authn_id is exposed through SYSTEM_USER (like it is not for 
CURRENT_USER and SESSION_USER).


I will add this patch to the next commitfest.
I look forward to your feedback.

Bertrand

[1]: https://www.postgresql.org/docs/current/sql-keywords-appendix.html
[2]: 
https://www.postgresql.org/message-id/flat/793d990837ae5c06a558d58d62de9378ab525d83.camel%40vmware.com



diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 478a216dbb..07bb333a3b 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -23660,6 +23660,20 @@ SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS 
t(ls,n);

   
 
+  
+   
+
+ system_user
+
+system_user
+name
+   
+   
+Returns the identity (if any) that the user presented during the
+authentication cycle, before they were assigned a database role.
+   
+  
+
   

 
diff --git a/src/backend/executor/execExprInterp.c 
b/src/backend/executor/execExprInterp.c
index e44ad68cda..63cf3d8bb7 100644
--- a/src/backend/executor/execExprInterp.c
+++ b/src/backend/executor/execExprInterp.c
@@ -2543,6 +2543,11 @@ ExecEvalSQLValueFunction(ExprState *state, ExprEvalStep 
*op)
*op->resvalue = session_user(fcinfo);
*op->resnull = fcinfo->isnull;
break;
+   case SVFOP_SYSTEM_USER:
+   InitFunctionCallInfoData(*fcinfo, NULL, 0, InvalidOid, 
NULL, NULL);
+   *op->resvalue = system_user(fcinfo);
+   *op->resnull = fcinfo->isnull;
+   break;
case SVFOP_CURRENT_CATALOG:
InitFunctionCallInfoData(*fcinfo, NULL, 0, InvalidOid, 
NULL, NULL);
*op->resvalue = current_database(fcinfo);
diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 969c9c158f..29dcc77724 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y
@@ -842,7 +842,7 @@ static Node *makeRecursiveViewSelect(char *relname, List 
*aliases, Node *query);
SEQUENCE SEQUENCES SERIALIZABLE SERVER SESSION SESSION_USER SET SETS 
SETOF
SHARE SHOW SIMILAR SIMPLE 

Re: logical decoding bug: segfault in ReorderBufferToastReplace()

2019-12-11 Thread Drouvot, Bertrand
On 12/9/19, 10:10 AM, "Tomas Vondra"  wrote:
>On Wed, Dec 04, 2019 at 05:36:16PM -0800, Jeremy Schneider wrote:
>>On 9/8/19 14:01, Tom Lane wrote:
>>> Fix RelationIdGetRelation calls that weren't bothering with error 
checks.
>>>
>>> ...
>>>
>>> Details
>>> ---
>>> 
https://git.postgresql.org/pg/commitdiff/69f883fef14a3fc5849126799278abcc43f40f56
>>
>>We had two different databases this week (with the same schema) both
>>independently hit the condition of this recent commit from Tom. It's on
>>11.5 so we're actually segfaulting and restarting rather than just
>>causing the walsender process to ERROR, but regardless there's still
>>some underlying bug here.
>>
>>We have core files and we're still working to see if we can figure out
>>what's going on, but I thought I'd report now in case anyone has extra
>>ideas or suggestions.  The segfault is on line 3034 of reorderbuffer.c.
>>

>>https://github.com/postgres/postgres/blob/REL_11_5/src/backend/replication/logical/reorderbuffer.c#L3034
>>
>>3033 toast_rel = 
RelationIdGetRelation(relation->rd_rel->reltoastrelid);
>>3034 toast_desc = RelationGetDescr(toast_rel);
>>
>>We'll keep looking; let me know any feedback! Would love to track down
>>whatever bug is in the logical decoding code, if that's what it is.
>>
>>==
>>
>>backtrace showing the call stack...
>>
>>Core was generated by `postgres: walsender 
>>(31712)'.
>>Program terminated with signal 11, Segmentation fault.
>>#0  ReorderBufferToastReplace (rb=0x3086af0, txn=0x3094a78,
>>relation=0x2b79177249c8, relation=0x2b79177249c8, change=0x30ac938)
>>at reorderbuffer.c:3034
>>3034reorderbuffer.c: No such file or directory.
>>...
>>(gdb) #0  ReorderBufferToastReplace (rb=0x3086af0, txn=0x3094a78,
>>relation=0x2b79177249c8, relation=0x2b79177249c8, change=0x30ac938)
>>at reorderbuffer.c:3034
>>#1  ReorderBufferCommit (rb=0x3086af0, xid=xid@entry=1358809,
>>commit_lsn=9430473346032, end_lsn=,
>>commit_time=commit_time@entry=628712466364268,
>>origin_id=origin_id@entry=0, origin_lsn=origin_lsn@entry=0) at
>>reorderbuffer.c:1584
>>#2  0x00716248 in DecodeCommit (xid=1358809,
>>parsed=0x7ffc4ce123f0, buf=0x7ffc4ce125b0, ctx=0x3068f70) at decode.c:637
>>#3  DecodeXactOp (ctx=0x3068f70, buf=buf@entry=0x7ffc4ce125b0) at
>>decode.c:245
>>#4  0x0071655a in LogicalDecodingProcessRecord (ctx=0x3068f70,
>>record=0x3069208) at decode.c:117
>>#5  0x00727150 in XLogSendLogical () at walsender.c:2886
>>#6  0x00729192 in WalSndLoop (send_data=send_data@entry=0x7270f0
>>) at walsender.c:2249
>>#7  0x00729f91 in StartLogicalReplication (cmd=0x30485a0) at
>>walsender.c:
>>#8  exec_replication_command (
>>cmd_string=cmd_string@entry=0x2f968b0 "START_REPLICATION SLOT
>>\"\" LOGICAL 893/38002B98 (proto_version '1',
>>publication_names '\"\"')") at walsender.c:1628
>>#9  0x0076e939 in PostgresMain (argc=,
>>argv=argv@entry=0x2fea168, dbname=0x2fea020 "",
>>username=) at postgres.c:4182
>>#10 0x004bdcb5 in BackendRun (port=0x2fdec50) at postmaster.c:4410
>>#11 BackendStartup (port=0x2fdec50) at postmaster.c:4082
>>#12 ServerLoop () at postmaster.c:1759
>>#13 0x007062f9 in PostmasterMain (argc=argc@entry=7,
>>argv=argv@entry=0x2f92540) at postmaster.c:1432
>>#14 0x004be73b in main (argc=7, argv=0x2f92540) at main.c:228
>>
>>==
>>
>>Some additional context...
>>
>># select * from pg_publication_rel;
>> prpubid | prrelid
>>-+-
>>   71417 |   16453
>>   71417 |   54949
>>(2 rows)
>>
>>(gdb) print toast_rel
>>$4 = (struct RelationData *) 0x0
>>
>>(gdb) print *relation->rd_rel
>>$11 = {relname = {data = "", '\000' },
>>relnamespace = 16402, reltype = 16430, reloftype = 0,
>>relowner = 16393, relam = 0, relfilenode = 16428, reltablespace = 0,
>>relpages = 0, reltuples = 0, relallvisible = 0, reltoastrelid = 0,

>Hmmm, so reltoastrelid = 0, i.e. the relation does not have a TOAST
>relation. Yet we're calling ReorderBufferToastReplace on the decoded
>record ... interesting.
>
>Can you share structure of the relation causing the issue?

   Here it is:

\d+ rel_having_issue
 Table 
"public.rel_having_issue"
 Column |   Type   | Collation | Nullable | 
Default | Storage  | Stats target | Description
+--+---+--+-+--+--+-
 id | integer  

<    1   2   3   4   5