Re: Add regular expression testing for user name mapping in the peer authentication TAP test
Hi, On 10/17/22 4:07 AM, Michael Paquier wrote: On Sat, Oct 15, 2022 at 07:54:30AM +0200, Drouvot, Bertrand wrote: Right. Giving a second thought to the non matching case, I think I'd prefer to concatenate the system_user to the system_user instead. This is what v2 does. Fine by me, so applied v2. Thanks! Thanks! Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Add regular expression testing for user name mapping in the peer authentication TAP test
Hi, On 10/15/22 5:11 AM, Michael Paquier wrote: On Fri, Oct 14, 2022 at 06:31:15PM +0200, Drouvot, Bertrand wrote: while working on [1], I thought it could also be useful to add regular expression testing for user name mapping in the peer authentication TAP test. Good idea now that we have a bit more coverage in the authentication tests. Thanks for looking at it! +# Test with regular expression in user name map. +my $last_system_user_char = substr($system_user, -1); This would attach to the regex the last character of the system user. Right. I would perhaps have used more characters than that (-3?), as substr() with a negative number larger than the string given in input would give the entire string. That's a nit, though. I don't have a strong opinion on this, so let's extract the last 3 characters. This is what v2 attached does. +# The regular expression does not match. +reset_pg_ident($node, 'mypeermap', '/^$', 'testmapuser'); This matches only an empty string, my brain gets that right? Right. Giving a second thought to the non matching case, I think I'd prefer to concatenate the system_user to the system_user instead. This is what v2 does. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.comdiff --git a/src/test/authentication/t/003_peer.pl b/src/test/authentication/t/003_peer.pl index fc951dea06..e719b05d0f 100644 --- a/src/test/authentication/t/003_peer.pl +++ b/src/test/authentication/t/003_peer.pl @@ -23,18 +23,34 @@ sub reset_pg_hba return; } +# Delete pg_ident.conf from the given node, add a new entry to it +# and then execute a reload to refresh it. +sub reset_pg_ident +{ + my $node= shift; + my $map_name= shift; + my $system_user = shift; + my $pg_user = shift; + + unlink($node->data_dir . '/pg_ident.conf'); + $node->append_conf('pg_ident.conf', "$map_name $system_user $pg_user"); + $node->reload; + return; +} + # Test access for a single role, useful to wrap all tests into one. sub test_role { local $Test::Builder::Level = $Test::Builder::Level + 1; - my ($node, $role, $method, $expected_res, %params) = @_; + my ($node, $role, $method, $expected_res, $test_details, %params) = @_; my $status_string = 'failed'; $status_string = 'success' if ($expected_res eq 0); my $connstr = "user=$role"; my $testname = - "authentication $status_string for method $method, role $role"; + "authentication $status_string for method $method, role $role " + . $test_details; if ($expected_res eq 0) { @@ -87,16 +103,50 @@ my $system_user = # Tests without the user name map. # Failure as connection is attempted with a database role not mapping # to an authorized system user. -test_role($node, qq{testmapuser}, 'peer', 2, +test_role( + $node, qq{testmapuser}, 'peer', 2, + 'without user name map', log_like => [qr/Peer authentication failed for user "testmapuser"/]); # Tests with a user name map. -$node->append_conf('pg_ident.conf', qq{mypeermap $system_user testmapuser}); +reset_pg_ident($node, 'mypeermap', $system_user, 'testmapuser'); reset_pg_hba($node, 'peer map=mypeermap'); # Success as the database role matches with the system user in the map. -test_role($node, qq{testmapuser}, 'peer', 0, +test_role($node, qq{testmapuser}, 'peer', 0, 'with user name map', log_like => [qr/connection authenticated: identity="$system_user" method=peer/]); +# Test with regular expression in user name map. +# Extract the last 3 characters from the system_user +# or the entire system_user (if its length is <= -3). +my $regex_test_string = substr($system_user, -3); + +# The regular expression matches. +reset_pg_ident($node, 'mypeermap', qq{/^.*$regex_test_string\$}, + 'testmapuser'); +test_role( + $node, + qq{testmapuser}, + 'peer', + 0, + 'with regular expression in user name map', + log_like => + [qr/connection authenticated: identity="$system_user" method=peer/]); + + +# Concatenate system_user to system_user. +$regex_test_string = $system_user . $system_user; + +# The regular expression does not match. +reset_pg_ident($node, 'mypeermap', qq{/^.*$regex_test_string\$}, + 'testmapuser'); +test_role( + $node, + qq{testmapuser}, + 'peer', + 2, + 'with regular expression in user name map', + log_like => [qr/no match in usermap "mypeermap" for user "testmapuser"/]); + done_testing();
Add regular expression testing for user name mapping in the peer authentication TAP test
Hi hackers, while working on [1], I thought it could also be useful to add regular expression testing for user name mapping in the peer authentication TAP test. This kind of test already exists in kerberos/t/001_auth.pl but the proposed one in the peer authentication testing would probably be more widely tested. Please find attached a patch proposal to do so. [1]: https://www.postgresql.org/message-id/4f55303e-62c1-1072-61db-fbfb30bd66c8%40gmail.com Looking forward to your feedback, Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.comdiff --git a/src/test/authentication/t/003_peer.pl b/src/test/authentication/t/003_peer.pl index fc951dea06..4e2efbe5e3 100644 --- a/src/test/authentication/t/003_peer.pl +++ b/src/test/authentication/t/003_peer.pl @@ -23,18 +23,34 @@ sub reset_pg_hba return; } +# Delete pg_ident.conf from the given node, add a new entry to it +# and then execute a reload to refresh it. +sub reset_pg_ident +{ + my $node= shift; + my $map_name= shift; + my $system_user = shift; + my $pg_user = shift; + + unlink($node->data_dir . '/pg_ident.conf'); + $node->append_conf('pg_ident.conf', "$map_name $system_user $pg_user"); + $node->reload; + return; +} + # Test access for a single role, useful to wrap all tests into one. sub test_role { local $Test::Builder::Level = $Test::Builder::Level + 1; - my ($node, $role, $method, $expected_res, %params) = @_; + my ($node, $role, $method, $expected_res, $test_details, %params) = @_; my $status_string = 'failed'; $status_string = 'success' if ($expected_res eq 0); my $connstr = "user=$role"; my $testname = - "authentication $status_string for method $method, role $role"; + "authentication $status_string for method $method, role $role " + . $test_details; if ($expected_res eq 0) { @@ -87,16 +103,43 @@ my $system_user = # Tests without the user name map. # Failure as connection is attempted with a database role not mapping # to an authorized system user. -test_role($node, qq{testmapuser}, 'peer', 2, +test_role( + $node, qq{testmapuser}, 'peer', 2, + 'without user name map', log_like => [qr/Peer authentication failed for user "testmapuser"/]); # Tests with a user name map. -$node->append_conf('pg_ident.conf', qq{mypeermap $system_user testmapuser}); +reset_pg_ident($node, 'mypeermap', $system_user, 'testmapuser'); reset_pg_hba($node, 'peer map=mypeermap'); # Success as the database role matches with the system user in the map. -test_role($node, qq{testmapuser}, 'peer', 0, +test_role($node, qq{testmapuser}, 'peer', 0, 'with user name map', log_like => [qr/connection authenticated: identity="$system_user" method=peer/]); +# Test with regular expression in user name map. +my $last_system_user_char = substr($system_user, -1); + +# The regular expression matches. +reset_pg_ident($node, 'mypeermap', qq{/^.*$last_system_user_char\$}, + 'testmapuser'); +test_role( + $node, + qq{testmapuser}, + 'peer', + 0, + 'with regular expression in user name map', + log_like => + [qr/connection authenticated: identity="$system_user" method=peer/]); + +# The regular expression does not match. +reset_pg_ident($node, 'mypeermap', '/^$', 'testmapuser'); +test_role( + $node, + qq{testmapuser}, + 'peer', + 2, + 'with regular expression in user name map', + log_like => [qr/no match in usermap "mypeermap" for user "testmapuser"/]); + done_testing();
Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
Hi, On 10/14/22 7:30 AM, Michael Paquier wrote: On Wed, Oct 12, 2022 at 08:17:14AM +0200, Drouvot, Bertrand wrote: Indeed, ;-) So, I have spent the last two days looking at all that, studying the structure of the patch and the existing HEAD code, Thanks! The code could be split to tackle things step-by-step: - One refactoring patch to introduce token_regcomp() and token_regexec(), with the introduction of a new structure that includes the compiled regexes. (Feel free to counterargue about the use of AuthToken for this purpose, of course!) - Plug in the refactored logic for the lists of role names and database names in pg_hba.conf. - Handle the case of single host entries in pg_hba.conf. -- I agree to work step-by-step. While looking at it again now, I discovered that the new TAP test for the regexp on the hostname in ssl/002_scram.pl is failing on some of my tests environment (and not all..). So, I agree with the dedicated steps you are proposing and that the "host case" needs a dedicated attention. I'm not ignoring all the remarks you've just done up-thread, I'll address them and/or provide my feedback on them when I'll come back with the step-by-step sub patches. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
Hi, On 10/14/22 8:18 AM, Michael Paquier wrote: On Fri, Oct 14, 2022 at 02:30:25PM +0900, Michael Paquier wrote: First, as of HEAD, AuthToken is only used for elements in a list of role and database names in hba.conf before filling in each HbaLine, hence we limit its usage to the initial parsing. The patch assigns an optional regex_t to it, then extends the use of AuthToken for single hostname entries in pg_hba.conf. Things going first: shouldn't we combine ident_user and "re" together in the same structure? Even if we finish by not using AuthToken to store the computed regex, it seems to me that we'd better use the same base structure for pg_ident.conf and pg_hba.conf. While looking closely at the patch, we would expand the use of AuthToken outside its original context. I have also looked at make_auth_token(), and wondered if it could be possible to have this routine compile the regexes. This approach would not stick with pg_ident.conf though, as we validate the fields in each line when we put our hands on ident_user and after the base validation of a line (number of fields, etc.). So with all that in mind, it feels right to not use AuthToken at all when building each HbaLine and each IdentLine, but a new, separate, structure. We could call that an AuthItem (string, its compiled regex) perhaps? It could have its own make() routine, taking in input an AuthToken and process pg_regcomp(). Better ideas for this new structure would be welcome, and the idea is that we'd store the post-parsing state of an AuthToken to something that has a compiled regex. We could finish by using AuthToken at the end and expand its use, but it does not feel completely right either to have a make() routine but not be able to compile its regular expression when creating the AuthToken. I have have sent this part too quickly. As AuthTokens are used in check_db() and check_role() when matching entries, it is more intuitive to store the regex_t directly in it. Yeah, I also think this is the right place for it. Changing IdentLine to use a AuthToken makes the "quoted" part useless in this case, still it could be used in Assert()s to make sure that the data is shaped as expected at check-time, enforced at false when creating it in parse_ident_line()? I agree, that makes sense. I'll work on that. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
Hi, On 10/11/22 8:29 AM, Michael Paquier wrote: On Mon, Oct 10, 2022 at 09:00:06AM +0200, Drouvot, Bertrand wrote: foreach(cell, tokens) { [...] + tokreg = lfirst(cell); + if (!token_is_regexp(tokreg)) { - if (strcmp(dbname, role) == 0) + if (am_walsender && !am_db_walsender) + { + /* +* physical replication walsender connections can only match +* replication keyword +*/ + if (token_is_keyword(tokreg->authtoken, "replication")) + return true; + } + else if (token_is_keyword(tokreg->authtoken, "all")) return true; When checking the list of databases in check_db(), physical WAL senders (aka am_walsender && !am_db_walsender) would be able to accept regexps, but these should only accept "replication" and never a regexp, no? Oh right, good catch, thanks! Please find attached v6 fixing it. This is kind of special in the HBA logic, coming back to 9.0 where physical replication and this special role property have been introduced. WAL senders have gained an actual database property later on in 9.4 with logical decoding, keeping "replication" for compatibility (connection strings can use replication=database to connect as a non-physical WAL sender and connect to a specific database). Thanks for the explanation! +typedef struct AuthToken +{ + char *string; + boolquoted; +} AuthToken; + +/* + * Distinguish the case a token has to be treated as a regular + * expression or not. + */ +typedef struct AuthTokenOrRegex +{ + boolis_regex; + + /* +* Not an union as we still need the token string for fill_hba_line(). +*/ + AuthToken *authtoken; + regex_t*regex; +} AuthTokenOrRegex; Hmm. With is_regex to check if a regex_t exists, both structures may not be necessary. Agree that both struct are not necessary. In v6, AuthTokenOrRegex has been removed and the regex has been moved to AuthToken. There is no is_regex bool anymore, as it's enough to test whether regex is NULL or not. I have not put my hands on that directly, but if I guess that I would shape things to have only AuthToken with (enforcing regex_t in priority if set in the list of elements to check for a match): - the string - quoted - regex_t A list member should never have (regex_t != NULL && quoted), right? The patch does allow that. For example it happens for the test where we add a comma in the role name. As we don't rely on a dedicated char to mark the end of a reg exp (we only rely on / to mark its start) then allowing (regex_t != NULL && quoted) seems reasonable to me. +# test with a comma in the regular expression +reset_pg_hba($node, 'all', '"/^.*5,.*e$"', 'password'); +test_conn($node, 'user=md5,role', 'password', 'matching regexp for username', + 0); So, we check here that the role includes "5," in its name. This is getting fun to parse ;) Indeed, ;-) elsif ($ENV{PG_TEST_EXTRA} !~ /\bssl\b/) { - plan skip_all => 'Potentially unsafe test SSL not enabled in PG_TEST_EXTRA'; + plan skip_all => + 'Potentially unsafe test SSL not enabled in PG_TEST_EXTRA'; } Unrelated noise from perltidy. Right. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.comdiff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml index c6f1b70fd3..406628ef35 100644 --- a/doc/src/sgml/client-auth.sgml +++ b/doc/src/sgml/client-auth.sgml @@ -235,8 +235,9 @@ hostnogssenc database userPostgreSQL database. - Multiple database names can be supplied by separating them with - commas. A separate file containing database names can be specified by + Multiple database names and/or regular expressions preceded by / + can be supplied by separating them with commas. + A separate file containing database names can be specified by preceding the file name with @. @@ -249,7 +250,8 @@ hostnogssenc database userall specifies that it matches all users. Otherwise, this is either the name of a specific - database user, or a group name preceded by +. + database user, a regular expression preceded by / + or a group name preceded by +. (Recall that there is no real distinction between users and groups in PostgreSQL; a + mark really means match any of the roles that are directly or indirectly members @@ -258,7 +260,8 @@ hostnogssenc database user/ +
Re: Query Jumbling for CALL and SET utility statements
Hi, On 10/7/22 6:13 AM, Michael Paquier wrote: On Thu, Oct 06, 2022 at 11:51:52PM -0400, Tom Lane wrote: I've been thinking since the beginning of this thread that there was no coherent, defensible rationale being offered for jumbling some utility statements and not others. Yeah. The potential performance impact of all the TransactionStmts worries me a bit, though. I wonder if the answer is to jumble them all. We avoided that up to now because it would imply a ton of manual effort and future code maintenance ... but now that the backend/nodes/ infrastructure is largely auto-generated, could we auto-generate the jumbling code? I think that's a good idea. Probably. One part that may be tricky though is the location of the constants we'd like to make generic, but perhaps this could be handled by using a dedicated variable type that just maps to int? It looks to me that we'd also need to teach the perl parser which fields per statements struct need to be jumbled (or more probably which ones to exclude from the jumbling, for example funccall in CallStmt). Not sure yet how to teach the perl parser, but I'll build this list first to help decide for a right approach, unless you already have some thoughts about it? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
Hi, On 10/6/22 2:53 AM, Michael Paquier wrote: On Wed, Oct 05, 2022 at 03:32:20PM +0200, Drouvot, Bertrand wrote: On 10/5/22 9:24 AM, Michael Paquier wrote: - test_role() -> test_conn() to be able to pass down a database name. - reset_pg_hba() to control the host, db and user parts. The host part does not really apply after moving the hosts checks to a more secure location, so I guess that this had better be extended just for the user and database, keeping host=local all the time. I am planning to apply 0001 attached independently, 0001 looks good to me. Thanks. I have applied this refactoring, leaving the host part out of the equation as we should rely only on local connections for this part of the test. Thanks! Thanks! I'll look at it and the comments you just made up-thread. Cool, thanks. One thing that matters a lot IMO (that I forgot to mention previously) is to preserve the order of the items parsed from the configuration files. Fully agree, all the versions that have been submitted in this thread preserves the ordering. Also, I am wondering whether we'd better have some regression tests where a regex includes a comma and a role name itself has a comma, actually, just to stress more the parsing of individual items in the HBA file. Good idea, it has been added in v5 just shared up-thread. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
Hi, On 10/5/22 9:24 AM, Michael Paquier wrote: On Tue, Sep 20, 2022 at 01:33:09PM +0200, Drouvot, Bertrand wrote: Anyway, I have looked at the patch. + List *roles_re; + List *databases_re; + regex_thostname_re; I am surprised by the approach of using separate lists for the regular expressions and the raw names. Wouldn't it be better to store everything in a single list but assign an entry type? In this case it would be either regex or plain string. This would minimize the footprint of the changes (no extra arguments *_re in the routines checking for a match on the roles, databases or hosts). And it seems to me that this would make unnecessary the use of re_num here and there. Please find attached v5 addressing this. I started with an union but it turns out that we still need the plain string when a regex is used. This is not needed for the authentication per say but for fill_hba_line(). So I ended up creating a new struct without union in v5. The hostname is different, of course, requiring only an extra field for its type, or something like that. I'm using the same new struct as described above for the hostname. Perhaps the documentation would gain in clarity if there were more examples, like a set of comma-separated examples (mix of regex and raw strings for example, for all the field types that gain support for regexes)? Right, I added more examples in v5. -$node->append_conf('postgresql.conf', "log_connections = on\n"); +$node->append_conf( +'postgresql.conf', qq{ +listen_addresses = '127.0.0.1' +log_connections = on +}); Hmm. I think that we may need to reconsider the location of the tests for the regexes with the host name, as the "safe" regression tests should not switch listen_addresses. One location where we already do that is src/test/ssl/, so these could be moved there. Good point, I moved the hostname related tests in src/test/ssl. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.comdiff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml index c6f1b70fd3..406628ef35 100644 --- a/doc/src/sgml/client-auth.sgml +++ b/doc/src/sgml/client-auth.sgml @@ -235,8 +235,9 @@ hostnogssenc database userPostgreSQL database. - Multiple database names can be supplied by separating them with - commas. A separate file containing database names can be specified by + Multiple database names and/or regular expressions preceded by / + can be supplied by separating them with commas. + A separate file containing database names can be specified by preceding the file name with @. @@ -249,7 +250,8 @@ hostnogssenc database userall specifies that it matches all users. Otherwise, this is either the name of a specific - database user, or a group name preceded by +. + database user, a regular expression preceded by / + or a group name preceded by +. (Recall that there is no real distinction between users and groups in PostgreSQL; a + mark really means match any of the roles that are directly or indirectly members @@ -258,7 +260,8 @@ hostnogssenc database user/ + can be supplied by separating them with commas. A separate file containing user names can be specified by preceding the file name with @. @@ -270,8 +273,9 @@ hostnogssenc database user Specifies the client machine address(es) that this record - matches. This field can contain either a host name, an IP - address range, or one of the special key words mentioned below. + matches. This field can contain either a host name, a regular expression + preceded by / representing host names, an IP address range, + or one of the special key words mentioned below. @@ -739,6 +743,24 @@ hostall all ::1/128 trust # TYPE DATABASEUSERADDRESS METHOD hostall all localhost trust +# The same using a regular expression for host name, which allows connection for +# host name ending with "test". +# +# TYPE DATABASEUSERADDRESS METHOD +hostall all /^.*test$ trust + +# The same using regular expression for DATABASE, which allows connection to the +# db1 and testdb databases and any database with a name ending with "test". +# +# TYPE DATABASE USERADDRESS METHOD +local db1,/^.*test$,testdb all /^.*test$ trust + +# The same using regular expression for USER, which allows connection to the +# user1 and testuser users and any user with a name ending with "test". +# +# TYPE DATABASE
Re: Record SET session in VariableSetStmt
On 10/6/22 2:28 PM, Julien Rouhaud wrote: On Thu, Oct 06, 2022 at 02:19:32PM +0200, Drouvot, Bertrand wrote: On 10/6/22 1:18 PM, Julien Rouhaud wrote: so nothing should rely on how exactly someone spelled it. This is also the case for our core jumbling code, where we guarantee (or at least try to) that two semantically identical statements will get the same queryid, and therefore don't distinguish eg. LIKE vs ~~. Agree, but on the other hand currently SET and SET SESSION are recorded with distinct queryid: postgres=# select calls, query, queryid from pg_stat_statements; calls |query| queryid ---+-+-- 2 | select calls, query from pg_stat_statements | -6345508659980235519 1 | set session enable_seqscan=1| -3921418831612111986 1 | create extension pg_stat_statements | -1739183385080879393 1 | set enable_seqscan=1| 7925920505912025406 (4 rows) and this behavior would change with the Jumbling work in progress in [1] (mentioned up-thread) if we don't record "SET SESSION". I think that would make sense to keep the same behavior, what do you think? It's because until now jumbling of utility statements was just hashing the query text, which is quite terrible. This was also implying getting different queryids for things like this: =# select query, queryid from pg_stat_statements where query like '%work_mem%';; query | queryid ---+-- SeT work_mem = 123465 | -1114638544275583196 Set work_mem = 123465 | -1966597613643458788 SET work_mem = 123465 | 4161441071081149574 seT work_mem = 123465 | 8327271737593275474 (4 rows) If we move to a real jumbling of VariableSetStmt, we should keep the rules consistent with the rest of the jumble code and ignore an explicit "SESSION" in the original command. Understood, so I agree that it makes sense to keep the jumbling behavior consistent and so keep the same queryid for statements that are semantically identical. Thanks for your feedback! Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Record SET session in VariableSetStmt
Hi, On 10/6/22 1:18 PM, Julien Rouhaud wrote: Hi, On Thu, Oct 06, 2022 at 12:57:17PM +0200, Drouvot, Bertrand wrote: Hi hackers, "SET local" is currently recorded in VariableSetStmt (with the boolean is_local) but "SET session" is not. Please find attached a patch proposal to also record "SET session" so that VariableSetStmt records all the cases. Remark: Recording "SET session" will also help for the Jumbling work being done in [1]. I don't think it's necessary. SET and SET SESSION are semantically the same Thanks for your feedback! so nothing should rely on how exactly someone spelled it. This is also the case for our core jumbling code, where we guarantee (or at least try to) that two semantically identical statements will get the same queryid, and therefore don't distinguish eg. LIKE vs ~~. Agree, but on the other hand currently SET and SET SESSION are recorded with distinct queryid: postgres=# select calls, query, queryid from pg_stat_statements; calls |query| queryid ---+-+-- 2 | select calls, query from pg_stat_statements | -6345508659980235519 1 | set session enable_seqscan=1| -3921418831612111986 1 | create extension pg_stat_statements | -1739183385080879393 1 | set enable_seqscan=1| 7925920505912025406 (4 rows) and this behavior would change with the Jumbling work in progress in [1] (mentioned up-thread) if we don't record "SET SESSION". I think that would make sense to keep the same behavior, what do you think? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Record SET session in VariableSetStmt
Hi hackers, "SET local" is currently recorded in VariableSetStmt (with the boolean is_local) but "SET session" is not. Please find attached a patch proposal to also record "SET session" so that VariableSetStmt records all the cases. Remark: Recording "SET session" will also help for the Jumbling work being done in [1]. [1]: https://www.postgresql.org/message-id/66be1104-164f-dcb8-6c43-f03a68a139a7%40gmail.com Looking forward to your feedback, Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.comcommit 2401087366432b0a520ac45947c4584924099617 Author: bdrouvotAWS Date: Thu Oct 6 10:20:18 2022 + v1-0001-record-set-session-in-VariableSetStmt.patch diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index 94d5142a4a..809e617d18 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -1563,6 +1563,7 @@ VariableSetStmt: VariableSetStmt *n = $2; n->is_local = false; + n->is_session = false; $$ = (Node *) n; } | SET LOCAL set_rest @@ -1570,6 +1571,7 @@ VariableSetStmt: VariableSetStmt *n = $3; n->is_local = true; + n->is_session = false; $$ = (Node *) n; } | SET SESSION set_rest @@ -1577,6 +1579,7 @@ VariableSetStmt: VariableSetStmt *n = $3; n->is_local = false; + n->is_session = true; $$ = (Node *) n; } ; diff --git a/src/include/nodes/parsenodes.h b/src/include/nodes/parsenodes.h index 633e7671b3..c3f33e7c4a 100644 --- a/src/include/nodes/parsenodes.h +++ b/src/include/nodes/parsenodes.h @@ -2228,6 +2228,7 @@ typedef struct VariableSetStmt char *name; /* variable to be set */ List *args; /* List of A_Const nodes */ boolis_local; /* SET LOCAL? */ + boolis_session; /* SET SESSION? */ } VariableSetStmt; /* --
Re: Query Jumbling for CALL and SET utility statements
Hi, On 10/6/22 8:39 AM, Michael Paquier wrote: On Mon, Sep 19, 2022 at 08:29:22AM +0200, Drouvot, Bertrand wrote: Please find attached v6 taking care of the remarks mentioned above. +case T_VariableSetStmt: +{ +VariableSetStmt *stmt = (VariableSetStmt *) node; + +/* stmt->name is NULL for RESET ALL */ +if (stmt->name) +{ +APP_JUMB(stmt->kind); +APP_JUMB_STRING(stmt->name); +JumbleExpr(jstate, (Node *) stmt->args); +} +} +break; Hmm. If VariableSetStmt->is_local is not added to the jumble, then aren't "SET foo = $1" and "SET LOCAL foo = $1" counted as the same query? Good catch, thanks! While at it let's also jumble "SET SESSION foo =". For this one, we would need to record another bool in VariableSetStmt: I'll create a dedicated patch for that. I am not seeing SAVEPOINT, RELEASE, ROLLBACK .. TO SAVEPOINT mentioned on this thread. Would these be worth considering in what gets compiled? That would cover the remaining bits of TransactionStmt. The ODBC driver abuses of savepoints, for example, so this could be useful for monitoring purposes in such cases. Agree. I'll look at those too. As of the code stands, it could be cleaner to check IsJumbleUtilityAllowed() in compute_utility_query_id(), falling back to a default in JumbleQuery(). Not that what your patch does is incorrect, of course. Will look at it. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Query Jumbling for CALL and SET utility statements
Hi, On 9/26/22 12:40 PM, Drouvot, Bertrand wrote: let's add it in V7 attached (that's safer should the code change later on). Attached a tiny rebase needed due to 249b0409b1. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.comdiff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out index 9ac5c87c3a..5d3330a87d 100644 --- a/contrib/pg_stat_statements/expected/pg_stat_statements.out +++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out @@ -311,7 +311,7 @@ FROM pg_stat_statements ORDER BY query COLLATE "C"; wal_records > $2 as wal_records_generated, +| | | | | wal_records >= rows as wal_records_ge_rows +| | | | | FROM pg_stat_statements ORDER BY query COLLATE "C"| | | | | - SET pg_stat_statements.track_utility = FALSE | 1 |0 | f | f | t + SET pg_stat_statements.track_utility = $1 | 1 |0 | f | f | t UPDATE pgss_test SET b = $1 WHERE a > $2 | 1 |3 | t | t | t (7 rows) @@ -462,6 +462,111 @@ SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C"; SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C" | 0 |0 (6 rows) +-- PL/pgSQL procedure and pg_stat_statements.track = top +CREATE PROCEDURE MINUS_TWO(i INTEGER) AS $$ +DECLARE + r INTEGER; +BEGIN + SELECT (i - 1 - 1.0)::INTEGER INTO r; +END; $$ LANGUAGE plpgsql; +CREATE PROCEDURE SUM_TWO(i INTEGER, j INTEGER) AS $$ +DECLARE + r INTEGER; +BEGIN + SELECT (j + j)::INTEGER INTO r; +END; $$ LANGUAGE plpgsql; +SET pg_stat_statements.track = 'top'; +SET pg_stat_statements.track_utility = FALSE; +SELECT pg_stat_statements_reset(); + pg_stat_statements_reset +-- + +(1 row) + +CALL MINUS_TWO(3); +CALL MINUS_TWO(7); +CALL SUM_TWO(3, 8); +CALL SUM_TWO(7, 5); +SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C"; +query | calls | rows +--+---+-- + CALL MINUS_TWO($1) | 2 |0 + CALL SUM_TWO($1, $2) | 2 |0 + SELECT pg_stat_statements_reset() | 1 |1 + SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C" | 0 |0 +(4 rows) + +-- +-- pg_stat_statements.track = all +-- +SET pg_stat_statements.track = 'all'; +SELECT pg_stat_statements_reset(); + pg_stat_statements_reset +-- + +(1 row) + +-- PL/pgSQL procedure and pg_stat_statements.track = all +-- we drop and recreate the procedures to avoid any caching funnies +DROP PROCEDURE MINUS_TWO(INTEGER); +DROP PROCEDURE SUM_TWO(INTEGER, INTEGER); +CREATE PROCEDURE MINUS_TWO(i INTEGER) AS $$ +DECLARE + r INTEGER; +BEGIN + SELECT (i - 1 - 1.0)::INTEGER INTO r; +END; $$ LANGUAGE plpgsql; +CREATE PROCEDURE SUM_TWO(i INTEGER, j INTEGER) AS $$ +DECLARE + r INTEGER; +BEGIN + SELECT (j + j)::INTEGER INTO r; +END; $$ LANGUAGE plpgsql; +CALL MINUS_TWO(3); +CALL MINUS_TWO(7); +CALL SUM_TWO(3, 8); +CALL SUM_TWO(7, 5); +SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C"; +query | calls | rows +--+---+-- + CALL MINUS_TWO($1) | 2 |0 + CALL SUM_TWO($1, $2) | 2 |0 + SELECT (i - $2 - $3)::INTEGER | 2 |2 + SELECT (j + j)::INTEGER | 2 |2 + SELECT pg_stat_statements_reset() | 1 |1 + SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C" | 0 |0 +(6 rows) + +SET pg_stat_statements.track_utility = TRUE; +-- SET +SELECT pg_stat_statements_reset(); + pg_stat_statements_reset +-- + +(1 row) + +set enable_seqscan=false; +set enable_seqscan=true; +set seq_page_cost=2.0; +set seq_page_cost=1.0; +set enable_seqscan to default; +set seq_page_cost to default; +reset seq_page_cost; +reset
Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
Hi, On 10/5/22 9:24 AM, Michael Paquier wrote: Something that stood out on a first review is the refactoring of 001_password.pl that can be done independently of the main patch: Good idea, thanks for the proposal. - test_role() -> test_conn() to be able to pass down a database name. - reset_pg_hba() to control the host, db and user parts. The host part does not really apply after moving the hosts checks to a more secure location, so I guess that this had better be extended just for the user and database, keeping host=local all the time. I am planning to apply 0001 attached independently, 0001 looks good to me. reducing the footprint of 0002, which is your previous patch left untouched (mostly!). Thanks! I'll look at it and the comments you just made up-thread. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: [PATCH] Add peer authentication TAP test
Hi, On 10/3/22 9:46 AM, Michael Paquier wrote: On Fri, Sep 30, 2022 at 07:51:29PM +0200, Drouvot, Bertrand wrote: Agree that it could be simplified, thanks for the hints! Attached a simplified version. While looking at that, I have noticed that it is possible to reduce the number of connection attempts (for example no need to re-test that the connection works when the map is not set, and the authn log would be the same with the map in place). Yeah that's right, thanks for simplifying further. Note that a path's meson.build needs a refresh for any new file added into the tree, with 003_peer.pl missing so this new test was not running in the recent CI runs. The indentation was also a bit wrong and I have tweaked a few comments, before finally applying it. Thanks! -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: [PATCH] Add peer authentication TAP test
Hi, On 9/30/22 2:00 AM, Michael Paquier wrote: On Wed, Sep 28, 2022 at 04:24:44PM +0900, Michael Paquier wrote: Hmm, indeed. It would be more reliable to rely on the contents returned by getpeereid()/getpwuid() after one successful peer connection, then use it in the map. I was wondering whether using stuff like getpwuid() in the perl script itself would be better, but it sounds less of a headache in terms of portability to just rely on authn_id via SYSTEM_USER to generate the contents of the correct map. By the way, on an extra read I have found a few things that can be simplified - I think that test_role() should be reworked so as the log patterns expected are passed down to connect_ok() and connect_fails() rather than involving find_in_log(). You still need find_in_log() to skip properly the case where peer is not supported by the platform, of course. - get_log_size() is not necessary. You should be able to get the same information with "-s $self->logfile". - Nit: a newline should be added at the end of 003_peer.pl. -- Agree that it could be simplified, thanks for the hints! Attached a simplified version. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.comdiff --git a/src/test/authentication/t/003_peer.pl b/src/test/authentication/t/003_peer.pl new file mode 100644 index 00..57bf2343fc --- /dev/null +++ b/src/test/authentication/t/003_peer.pl @@ -0,0 +1,108 @@ + +# Copyright (c) 2022, PostgreSQL Global Development Group + +# Tests for peer authentication and user name map. +# The test is skipped if the platform does not support peer authentication. + +use strict; +use warnings; +use PostgreSQL::Test::Cluster; +use PostgreSQL::Test::Utils; +use Test::More; + +# Delete pg_hba.conf from the given node, add a new entry to it +# and then execute a reload to refresh it. +sub reset_pg_hba +{ + my $node = shift; + my $hba_method = shift; + + unlink($node->data_dir . '/pg_hba.conf'); + # just for testing purposes, use a continuation line. + $node->append_conf('pg_hba.conf', "local all all\\\n $hba_method"); + $node->reload; + return; +} + +# Test access for a single role, useful to wrap all tests into one. +sub test_role +{ + local $Test::Builder::Level = $Test::Builder::Level + 1; + + my ($node, $role, $method, $expected_res, %params) = @_; + my $status_string = 'failed'; + $status_string = 'success' if ($expected_res eq 0); + + my $connstr = "user=$role"; + my $testname = +"authentication $status_string for method $method, role $role"; + + if ($expected_res eq 0) + { +$node->connect_ok($connstr, $testname, %params); + } + else + { + # No checks of the error message, only the status code. + $node->connect_fails($connstr, $testname, %params); + } +} + +# find $pat in logfile of $node. +sub find_in_log +{ + my ($node, $pat) = @_; + + my $log = PostgreSQL::Test::Utils::slurp_file($node->logfile); + return 0 if (length($log) <= 0); + + return $log =~ m/$pat/; +} + +# Initialize primary node. +my $node = PostgreSQL::Test::Cluster->new('primary'); +$node->init; +$node->append_conf('postgresql.conf', "log_connections = on\n"); +$node->start; + +# Set pg_hba.conf with the peer authentication. +reset_pg_hba($node, 'peer'); + +# Check that peer authentication is supported on this platform. +$node->psql('postgres'); + +# If not supported, then skip the rest of the test. +if (find_in_log($node, qr/peer authentication is not supported on this platform/)) +{ +plan skip_all => 'peer authentication is not supported on this platform'; +} + +# Let's create a new user for the user name map test. +$node->safe_psql('postgres', + qq{CREATE USER testmapuser}); + +# Get the system_user to define the user name map test. +my $system_user = + $node->safe_psql('postgres', q(select (string_to_array(SYSTEM_USER, ':'))[2])); + +# This connection should succeed. +test_role($node, $system_user, 'peer', 0, +log_like => + [qr/connection authenticated: identity="$system_user" method=peer/]); + +# This connection should failed. +test_role($node, qq{testmapuser}, 'peer', 2, +log_like => [qr/Peer authentication failed for user "testmapuser"/]); + +# Define a user name map. +$node->append_conf('pg_ident.conf', qq{mypeermap $system_user testmapuser}); + +# Set pg_hba.conf with the peer authentication and the user name map. +reset_pg_hba($node, 'peer map=mypeermap'); + +# This connection should now succeed. +test_role($node, qq{testmapuser}, 'peer', 0, +log_like => + [qr/connection authenticated: identity="$system_user" method=peer/]); + +done_testing();
Re: log_heap_visible(): remove unused parameter and update comment
Hi, On 9/30/22 1:32 PM, Bharath Rupireddy wrote: On Fri, Sep 30, 2022 at 1:07 PM Drouvot, Bertrand wrote: Hi, While resuming the work on [1] I noticed that: - there is an unused parameter in log_heap_visible() - the comment associated to the function is not in "sync" with the current implementation (referring a "block" that is not involved anymore) Attached a tiny patch as an attempt to address the above remarks. [1]: https://commitfest.postgresql.org/39/3740/ It looks like that parameter was originally introduced and used in PG 9.4 where xl_heap_visible structure was having RelFileNode, which was later removed in PG 9.5, since then the RelFileNode rnode parameter is left out. This parameter got renamed to RelFileLocator rlocator by the commit b0a55e43299c4ea2a9a8c757f9c26352407d0ccc in HEAD. The attached patch LGTM. Thanks for looking at it! We recently committed another patch to remove an unused function parameter - 65b158ae4e892c2da7a5e31e2d2645e5e79a0bfd. It makes me think that why can't we remove the unused function parameters once and for all, say with a compiler flag such as -Wunused-parameter [1]? We might have to be careful in removing certain parameters which are not being used right now, but might be used in the near future though. [1] https://man7.org/linux/man-pages/man1/gcc.1.html -Wunused-parameter Warn whenever a function parameter is unused aside from its declaration. To suppress this warning use the "unused" attribute. That's right. I have the feeling this will be somehow time consuming and I'm not sure the added value is worth it (as compare to fix them when "accidentally" cross their paths). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Minimal logical decoding on standbys
Hi, On 7/6/22 3:30 PM, Drouvot, Bertrand wrote: Hi, On 10/28/21 11:07 PM, Andres Freund wrote: Hi, On 2021-10-28 16:24:22 -0400, Robert Haas wrote: On Wed, Oct 27, 2021 at 2:56 AM Drouvot, Bertrand wrote: So you have in mind to check for XLogLogicalInfoActive() first, and if true, then open the relation and call RelationIsAccessibleInLogicalDecoding()? I think 0001 is utterly unacceptable. We cannot add calls to table_open() in low-level functions like this. Suppose for example that _bt_getbuf() calls _bt_log_reuse_page() which with 0001 applied would call get_rel_logical_catalog(). _bt_getbuf() will have acquired a buffer lock on the page. The idea that it's safe to call table_open() while holding a buffer lock cannot be taken seriously. Yes - that's pretty clearly a deadlock hazard. It shouldn't too hard to fix, I think. Possibly a bit more verbose than nice, but... Alternatively we could propagate the information whether a relcache entry is for a catalog from the table to the index. Then we'd not need to change the btree code to pass the table down. Looking closer at RelationIsAccessibleInLogicalDecoding() It seems to me that the missing part to be able to tell whether or not an index is for a catalog is the rd_options->user_catalog_table value of its related heap relation. Then, a way to achieve that could be to: - Add to Relation a new "heap_rd_options" representing the rd_options of the related heap relation when appropriate - Trigger the related indexes relcache invalidations when an ATExecSetRelOptions() is triggered on a heap relation - Write an equivalent of RelationIsUsedAsCatalogTable() for indexes that would make use of the heap_rd_options instead Does that sound like a valid option to you or do you have another idea in mind to propagate the information whether a relcache entry is for a catalog from the table to the index? I ended up with the attached proposal to propagate the catalog information to the indexes. The attached adds a new field "isusercatalog" in pg_index to indicate whether or not the index is linked to a table that has the storage parameter user_catalog_table set to true. Then it defines new macros, including "IndexIsAccessibleInLogicalDecoding" making use of this new field. This new macro replaces get_rel_logical_catalog() that was part of the previous patch version. What do you think about this approach and the attached? If that sounds reasonable, then I'll add tap tests for it and try to improve the way isusercatalog is propagated to the index(es) in case a reset is done on user_catalog_table on the table (currently in this POC patch, it's hardcoded to "false" which is the default value for user_catalog_table in boolRelOpts[]) (A better approach would be probably to retrieve the value from the table once the reset is done and then propagate it to the index(es).) Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.comFrom 0dc838a0fdb0f8daccc398bea4f11152b68e0c3a Mon Sep 17 00:00:00 2001 From: bdrouvotAWS Date: Fri, 30 Sep 2022 09:28:09 + Subject: [PATCH v26] Add info in WAL records in preparation for logical slot conflict handling. When a WAL replay on standby indicates that a catalog table tuple is to be deleted by an xid that is greater than a logical slot's catalog_xmin, then that means the slot's catalog_xmin conflicts with the xid, and we need to handle the conflict. While subsequent commits will do the actual conflict handling, this commit adds a new field onCatalogTable in such WAL records, that is true for catalog tables, so as to arrange for conflict handling. Author: Andres Freund (in an older version), Amit Khandekar, Bertrand Drouvot Reviewed-By: Bertrand Drouvot, Andres Freund, Robert Haas, Fabrizio de Royes Mello --- doc/src/sgml/catalogs.sgml | 11 + src/backend/access/common/reloptions.c | 3 +- src/backend/access/gist/gistxlog.c | 1 + src/backend/access/hash/hashinsert.c| 1 + src/backend/access/heap/heapam.c| 4 +- src/backend/access/heap/pruneheap.c | 1 + src/backend/access/heap/visibilitymap.c | 3 +- src/backend/access/nbtree/nbtpage.c | 3 ++ src/backend/access/spgist/spgvacuum.c | 1 + src/backend/catalog/index.c | 14 -- src/backend/commands/tablecmds.c| 57 + src/include/access/gistxlog.h | 2 + src/include/access/hash_xlog.h | 1 + src/include/access/heapam_xlog.h| 5 ++- src/include/access/nbtxlog.h| 2 + src/include/access/spgxlog.h| 1 + src/include/catalog/pg_index.h | 2 + src/include/utils/rel.h | 33 ++ 18 files changed, 137 insertions(+), 8 deletions(-) 8.3% doc/src/sgml/ 9.5% src/backend/access/heap/ 8.1% src/backend/access/ 7.7% src/backend/catalog
log_heap_visible(): remove unused parameter and update comment
Hi, While resuming the work on [1] I noticed that: - there is an unused parameter in log_heap_visible() - the comment associated to the function is not in "sync" with the current implementation (referring a "block" that is not involved anymore) Attached a tiny patch as an attempt to address the above remarks. [1]: https://commitfest.postgresql.org/39/3740/ Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.comdiff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c index 75b214824d..7465334a83 100644 --- a/src/backend/access/heap/heapam.c +++ b/src/backend/access/heap/heapam.c @@ -8188,17 +8188,17 @@ log_heap_freeze(Relation reln, Buffer buffer, TransactionId cutoff_xid, } /* - * Perform XLogInsert for a heap-visible operation. 'block' is the block - * being marked all-visible, and vm_buffer is the buffer containing the - * corresponding visibility map block. Both should have already been modified - * and dirtied. + * Perform XLogInsert for a heap-visible operation. heap_buffer is the buffer + * containing the block being marked all-visible, and vm_buffer is the buffer + * containing the corresponding visibility map block. Both should have already + * been modified and dirtied. * * If checksums are enabled, we also generate a full-page image of * heap_buffer, if necessary. */ XLogRecPtr -log_heap_visible(RelFileLocator rlocator, Buffer heap_buffer, Buffer vm_buffer, -TransactionId cutoff_xid, uint8 vmflags) +log_heap_visible(Buffer heap_buffer, Buffer vm_buffer, TransactionId cutoff_xid, +uint8 vmflags) { xl_heap_visible xlrec; XLogRecPtr recptr; diff --git a/src/backend/access/heap/visibilitymap.c b/src/backend/access/heap/visibilitymap.c index d62761728b..ccc97e3b09 100644 --- a/src/backend/access/heap/visibilitymap.c +++ b/src/backend/access/heap/visibilitymap.c @@ -283,8 +283,7 @@ visibilitymap_set(Relation rel, BlockNumber heapBlk, Buffer heapBuf, if (XLogRecPtrIsInvalid(recptr)) { Assert(!InRecovery); - recptr = log_heap_visible(rel->rd_locator, heapBuf, vmBuf, - cutoff_xid, flags); + recptr = log_heap_visible(heapBuf, vmBuf, cutoff_xid, flags); /* * If data checksums are enabled (or wal_log_hints=on), we diff --git a/src/include/access/heapam_xlog.h b/src/include/access/heapam_xlog.h index 34220d93cf..4b6bdf6955 100644 --- a/src/include/access/heapam_xlog.h +++ b/src/include/access/heapam_xlog.h @@ -415,7 +415,7 @@ extern bool heap_prepare_freeze_tuple(HeapTupleHeader tuple, MultiXactId *relminmxid_out); extern void heap_execute_freeze_tuple(HeapTupleHeader tuple, xl_heap_freeze_tuple *frz); -extern XLogRecPtr log_heap_visible(RelFileLocator rlocator, Buffer heap_buffer, - Buffer vm_buffer, TransactionId cutoff_xid, uint8 vmflags); +extern XLogRecPtr log_heap_visible(Buffer heap_buffer, Buffer vm_buffer, + TransactionId cutoff_xid, uint8 vmflags); #endif /* HEAPAM_XLOG_H */
Re: SYSTEM_USER reserved word implementation
Hi, On 9/29/22 8:12 AM, Michael Paquier wrote: On Wed, Sep 28, 2022 at 12:58:48PM +0200, Drouvot, Bertrand wrote: I had a look at v5 and it does look good to me. Okay, cool. I have spent some time today doing a last pass over it and an extra round of tests. Things looked fine, so applied. -- Thanks for your precious help! -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: SYSTEM_USER reserved word implementation
Hi, On 9/28/22 5:28 AM, Michael Paquier wrote: On Tue, Sep 27, 2022 at 03:38:49PM -0700, Jacob Champion wrote: On 9/26/22 06:29, Drouvot, Bertrand wrote: Since there are only internal clients to the API, I'd argue this makes more sense as an Assert(authn_id != NULL), but I don't think it's a dealbreaker. Using an assert() looks like a good idea from here. If this is called with a NULL authn, this could reflect a problem in the authentication logic. Agree, thanks for pointing out. As far the assertion failure mentioned by Michael when moving the SVFOP_SYSTEM_USER from NAMEOID to TEXTOID: V4 is assuming that it is safe to force the collation to C_COLLATION_OID for SQLValueFunction having a TEXT type, but I would be happy to also hear your thoughts about it. Unfortunately I don't have much to add here; I don't know enough about the underlying problems. I have been looking at that, and after putting my hands on that this comes down to the facility introduced in 40c24bf. So, I think that we'd better use COERCE_SQL_SYNTAX so as there is no need to worry about the shortcuts this patch is trying to use with the collation setup. Nice! And there are a few tests for get_func_sql_syntax() in create_view.sql. Note that this makes the patch slightly shorter, and simpler. Agree that it does look simpler that way and that making use of COERCE_SQL_SYNTAX does looks like a better approach. Nice catch! The docs still mentioned "name", and not "text". Oups, thanks for pointing out. I had a look at v5 and it does look good to me. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: [PATCH] Add peer authentication TAP test
Hi, On 9/28/22 7:52 AM, Michael Paquier wrote: On Fri, Aug 26, 2022 at 10:43:43AM +0200, Drouvot, Bertrand wrote: During the work in [1] we created a new TAP test to test the SYSTEM_USER behavior with peer authentication. It turns out that there is currently no TAP test for the peer authentication, so we think (thanks Michael for the suggestion [2]) that it's better to split the work in [1] between "pure" SYSTEM_USER related work and the "pure" peer authentication TAP test work. That's the reason of this new thread, please find attached a patch to add a new TAP test for the peer authentication. +# Get the session_user to define the user name map test. +my $session_user = + $node->safe_psql('postgres', 'select session_user'); [...] +# Define a user name map. +$node->append_conf('pg_ident.conf', qq{mypeermap $session_user testmap$session_user}); + +# Set pg_hba.conf with the peer authentication and the user name map. +reset_pg_hba($node, 'peer map=mypeermap'); A map consists of a "MAPNAME SYSTEM_USER PG_USER". Why does this test use a Postgres role (from session_user) as the system user for the peer map? Thanks for looking at it! Initially selecting the session_user with a "local" connection and no user provided during the connection is a way I came up to retrieve the "SYSTEM_USER" to be used later on in the map. Maybe the variable name should be system_user instead or should we use another way to get the "SYSTEM_USER" to be used in the map? Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: SYSTEM_USER reserved word implementation
Hi, On 9/8/22 3:17 AM, Michael Paquier wrote: On Wed, Sep 07, 2022 at 08:48:43AM -0700, Jacob Champion wrote: We could pass the bare auth_method index, or update the documentation for auth_method to state that it's guaranteed to be zero if authn_id is NULL (and then enforce that). case SVFOP_CURRENT_USER: case SVFOP_USER: case SVFOP_SESSION_USER: + case SVFOP_SYSTEM_USER: case SVFOP_CURRENT_CATALOG: case SVFOP_CURRENT_SCHEMA: svf->type = NAMEOID; Should this be moved to use TEXTOID instead? Yeah, it should. There is actually a second and much deeper issue here, in the shape of a collation problem. See the assertion failure in exprSetCollation(), because we expect SQLValueFunction nodes to return a name or a non-collatable type. However, for this case, we'd require a text to get rid of the 63-character limit, and that's a collatable type. This reminds me of the recent thread to work on getting rid of this limit for the name type.. Please find attached V4 taking care of Jacob's previous comments. As far the assertion failure mentioned by Michael when moving the SVFOP_SYSTEM_USER from NAMEOID to TEXTOID: V4 is assuming that it is safe to force the collation to C_COLLATION_OID for SQLValueFunction having a TEXT type, but I would be happy to also hear your thoughts about it. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.comdiff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index e1fe4fec57..fe99e65dd5 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -22623,6 +22623,25 @@ SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS t(ls,n); + + + + system_user + +system_user +name + + +Returns the authentication method and the identity (if any) that the +user presented during the authentication cycle before they were +assigned a database role. It is represented as +auth_method:identity or +NULL if the user has not been authenticated (for +example if Trust authentication has +been used). + + + diff --git a/src/backend/access/transam/parallel.c b/src/backend/access/transam/parallel.c index bc93101ff7..c2a08e9414 100644 --- a/src/backend/access/transam/parallel.c +++ b/src/backend/access/transam/parallel.c @@ -1496,6 +1496,14 @@ ParallelWorkerMain(Datum main_arg) false); RestoreClientConnectionInfo(clientconninfospace); + /* +* Initialize SystemUser now that MyClientConnectionInfo is restored. +* Also ensure that auth_method is actually valid, aka authn_id is not NULL. +*/ + if (MyClientConnectionInfo.authn_id) + InitializeSystemUser(MyClientConnectionInfo.authn_id, + hba_authname(MyClientConnectionInfo.auth_method)); + /* Attach to the leader's serializable transaction, if SERIALIZABLE. */ AttachSerializableXact(fps->serializable_xact_handle); diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c index 9b9bbf00a9..c51578c0b9 100644 --- a/src/backend/executor/execExprInterp.c +++ b/src/backend/executor/execExprInterp.c @@ -2537,6 +2537,11 @@ ExecEvalSQLValueFunction(ExprState *state, ExprEvalStep *op) *op->resvalue = session_user(fcinfo); *op->resnull = fcinfo->isnull; break; + case SVFOP_SYSTEM_USER: + InitFunctionCallInfoData(*fcinfo, NULL, 0, InvalidOid, NULL, NULL); + *op->resvalue = system_user(fcinfo); + *op->resnull = fcinfo->isnull; + break; case SVFOP_CURRENT_CATALOG: InitFunctionCallInfoData(*fcinfo, NULL, 0, InvalidOid, NULL, NULL); *op->resvalue = current_database(fcinfo); diff --git a/src/backend/nodes/nodeFuncs.c b/src/backend/nodes/nodeFuncs.c index 3bac350bf5..453ba84494 100644 --- a/src/backend/nodes/nodeFuncs.c +++ b/src/backend/nodes/nodeFuncs.c @@ -1142,7 +1142,8 @@ exprSetCollation(Node *expr, Oid collation) ((MinMaxExpr *) expr)->minmaxcollid = collation; break; case T_SQLValueFunction: - AssertSQLValueFunction *) expr)->type == NAMEOID) ? + AssertSQLValueFunction *) expr)->type == NAMEOID || + ((SQLValueFunction *) expr)->type == TEXTOID) ? (collation == C_COLLATION_OID) : (collation == InvalidOid));
Re: pg_stat_have_stats() returns true for dropped indexes (or for index creation transaction rolled back)
Hi, On 9/26/22 10:23 AM, Drouvot, Bertrand wrote: Hi, On 9/23/22 10:45 PM, Andres Freund wrote: Hi, On 2022-09-01 08:40:54 +0200, Drouvot, Bertrand wrote: Thanks for the suggestion, I'm coming up with this proposal in v4 attached: I pushed the bugfix / related test portion to 15, master. Thanks! Thanks! Forgot to say that with that being fixed, I'll come back with a patch proposal for the tables/indexes stats split (discovered the issue fixed in this current thread while working on the split patch.) Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: SYSTEM_USER reserved word implementation
Hi, On 9/7/22 5:48 PM, Jacob Champion wrote: On 9/7/22 07:46, Drouvot, Bertrand wrote: Except the Nit above, that looks all good to me. A few additional comments: +assigned a database role. It is represented as +auth_method:identity or +NULL if the user has not been authenticated (for +example if has been used). + This is rendered as ... (for example if Section 21.4 has been used). which IMO isn't too helpful. Maybe a would read better than an ? Thanks for looking at it! Good catch, V4 coming soon will make use of instead. Also, this function's placement in the docs (with the System Catalog Information Functions) seems wrong to me. I think it should go up above in the Session Information Functions, with current_user et al. Agree, will move it to the Session Information Functions in V4. + /* Build system user as auth_method:authn_id */ + char *system_user; + Sizeauthname_len = strlen(auth_method); + Sizeauthn_id_len = strlen(authn_id); + + system_user = palloc0(authname_len + authn_id_len + 2); + strcat(system_user, auth_method); + strcat(system_user, ":"); + strcat(system_user, authn_id); If we're palloc'ing anyway, can this be replaced with a single psprintf()? Fair point, V4 will make use of psprintf(). + /* Initialize SystemUser now that MyClientConnectionInfo is restored. */ + InitializeSystemUser(MyClientConnectionInfo.authn_id, + hba_authname(MyClientConnectionInfo.auth_method)); It makes me a little nervous to call hba_authname(auth_method) without checking to see that auth_method is actually valid (which is only true if authn_id is not NULL). Will add additional check for safety in V4. We could pass the bare auth_method index, or update the documentation for auth_method to state that it's guaranteed to be zero if authn_id is NULL (and then enforce that). case SVFOP_CURRENT_USER: case SVFOP_USER: case SVFOP_SESSION_USER: + case SVFOP_SYSTEM_USER: case SVFOP_CURRENT_CATALOG: case SVFOP_CURRENT_SCHEMA: svf->type = NAMEOID; Should this be moved to use TEXTOID instead? Good catch, will do in V4. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Query Jumbling for CALL and SET utility statements
Hi, On 9/21/22 6:07 PM, Fujii Masao wrote: On 2022/09/19 15:29, Drouvot, Bertrand wrote: Please find attached v6 taking care of the remarks mentioned above. Thanks for updating the patch! +SET pg_stat_statements.track_utility = TRUE; + +-- PL/pgSQL procedure and pg_stat_statements.track = all +-- we drop and recreate the procedures to avoid any caching funnies +SET pg_stat_statements.track_utility = FALSE; Could you tell me why track_utility is enabled just before it's disabled? Thanks for looking at the new version! No real reason, I removed those useless SET in the new V7 attached. Could you tell me what actually happens if we don't drop and recreate the procedures? I'd like to know what "any caching funnies" are. Without the drop/recreate the procedure body does not appear normalized (while the CALL itself is) when switching from track = top to track = all. I copy-pasted this comment from the already existing "function" section in the pg_stat_statements.sql file. This comment has been introduced for the function section in commit 83f2061dd0. Note that the behavior would be the same for the function case (function body does not appear normalized without the drop/recreate). +SELECT pg_stat_statements_reset(); +CALL MINUS_TWO(3); +CALL MINUS_TWO(7); +CALL SUM_TWO(3, 8); +CALL SUM_TWO(7, 5); + +SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C"; This test set for the procedures is executed with the following four conditions, respectively. Do we really need all of these tests? track = top, track_utility = true track = top, track_utility = false track = all, track_utility = true track = all, track_utility = false Oh right, the track_utility = false cases have been added when we decided up-thread to force track CALL. But now that's probably not needed to test with track_utility = true. So I'm just keeping track_utility = off with track = top or all in the new V7 attached (like this is the case for the "function" section). +begin; +prepare transaction 'tx1'; +insert into test_tx values (1); +commit prepared 'tx1'; The test set of 2PC commands is also executed with track_utility = on and off, respectively. But why do we need to run that test when track_utility = off? That's useless, thanks for pointing out. Removed in V7 attached. - if (query->utilityStmt) + if (query->utilityStmt && !jstate) { if (pgss_track_utility && !PGSS_HANDLED_UTILITY(query->utilityStmt)) "pgss_track_utility" should be "pgss_track_utility || FORCE_TRACK_UTILITY(parsetree)" theoretically? Good catch! That's not needed (in practice) with the current code but that is "theoretically" needed indeed, let's add it in V7 attached (that's safer should the code change later on). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.comdiff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out index ff0166fb9d..f5fc2f1f38 100644 --- a/contrib/pg_stat_statements/expected/pg_stat_statements.out +++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out @@ -272,7 +272,7 @@ FROM pg_stat_statements ORDER BY query COLLATE "C"; wal_records > $2 as wal_records_generated, +| | | | | wal_records >= rows as wal_records_ge_rows +| | | | | FROM pg_stat_statements ORDER BY query COLLATE "C"| | | | | - SET pg_stat_statements.track_utility = FALSE | 1 |0 | f | f | t + SET pg_stat_statements.track_utility = $1 | 1 |0 | f | f | t UPDATE pgss_test SET b = $1 WHERE a > $2 | 1 |3 | t | t | t (7 rows) @@ -423,6 +423,111 @@ SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C"; SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C" | 0 |0 (6 rows) +-- PL/pgSQL procedure and pg_stat_statements.track = top +CREATE PROCEDURE MINUS_TWO(i INTEGER) AS $$ +DECLARE + r INTEGER; +BEGIN + SELECT (i - 1 - 1.0)::INTEGER INTO r; +END; $$ LANGUAGE plpgsql; +CREATE PROCEDURE SUM_TWO(i INTEGER, j INTEGER) AS $$ +DECLARE + r INTEGER; +BEGIN + SELECT (j + j)::INTEGER INTO r; +END; $$ LANGUAGE plpgsql; +SET pg_stat_statements.track = 'top'; +SET pg_stat_statements.track_utility = FALSE; +SELECT pg_stat_statements_reset(); + pg_stat_statements_reset +-- + +(1 row) + +CALL MINUS_TW
Re: pg_stat_have_stats() returns true for dropped indexes (or for index creation transaction rolled back)
Hi, On 9/23/22 10:45 PM, Andres Freund wrote: Hi, On 2022-09-01 08:40:54 +0200, Drouvot, Bertrand wrote: Thanks for the suggestion, I'm coming up with this proposal in v4 attached: I pushed the bugfix / related test portion to 15, master. Thanks! Thanks! I left the replication stuff out - it seemed somewhat independent. Yeah. Probably will just push that to master, unless somebody thinks it should be in both? Sounds good to me as this is not a bug and that seems unlikely to me that an issue in this area will be introduced later on on 15 without being introduced on master too. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
Hi, On 9/20/22 6:30 AM, Michael Paquier wrote: On Tue, Sep 20, 2022 at 12:09:33AM -0400, Tom Lane wrote: You have to assume that somebody (a) has a role or DB name starting with slash, (b) has an explicit reference to that name in their pg_hba.conf, (c) doesn't read the release notes, and (d) doesn't notice that things are misbehaving until after some hacker manages to break into their installation on the strength of the misbehaving entry. OK, I'll grant that the probability of (c) is depressingly close to unity; but each of the other steps seems quite low probability. All four of them happening in one installation is something I doubt will happen. It is the kind of things that could blow up as a CVE and some bad PR for the project, so I cannot get excited about enforcing this new rule in an authentication file (aka before a role is authenticated) while we are talking about 3~4 code paths (?) that would need an extra check to make sure that no instances have such object names. I also have the feeling that having (a), (b) and (d) is low probability. That said, If the user "/john" already exists and has a hba entry then this entry will still match with the patch. Problem is that all the users that contain "john" would also now match. But things get worst if say /a is an existing user and hba entry as the entry would match any users that contains "a" with the patch. I assume (maybe i should not) that if objects starting with / already exist there is very good reason(s) behind. Then I don't think that preventing their creation in the DDL would help (quite the contrary for the ones that really need them). It looks to me that adding a GUC (off by default) to enable/disable the regexp usage in the hba could be a fair compromise. It won't block any creation starting with a / and won't open more doors (if such objects exist) by default. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Patch to address creation of PgStat* contexts with null parent context
Hi, On 9/17/22 6:10 PM, Andres Freund wrote: Hi, On 2022-09-07 11:11:11 +0200, Drouvot, Bertrand wrote: On 9/6/22 7:53 AM, Kyotaro Horiguchi wrote: At Mon, 5 Sep 2022 14:46:55 +0200, "Drouvot, Bertrand" wrote in Looks like that both approaches have their pros and cons. I'm tempted to vote +1 on "just changing" the parent context to TopMemoryContext and not changing the allocations locations. Yeah. It is safe more than anything and we don't have a problem there. So, I'm fine with just replacing the parent context at the three places. Attached a patch proposal to do so. Pushed. Thanks for the report and the fix! Thanks! I just marked the corresponding CF entry as Committed. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
Hi, On 9/17/22 8:53 AM, Michael Paquier wrote: On Fri, Sep 16, 2022 at 06:24:07PM +0200, Drouvot, Bertrand wrote: The CF bot is failing for Windows (all other tests are green) and only for the new tap test related to the regular expression on the host name (the ones on database and role are fine). The issue is not related to the patch. The issue is that the Windows Cirrus test does not like when a host name is provided for a "host" entry in pg_hba.conf (while it works fine when a CIDR is provided). You can see an example in [1] where the only change is to replace the CIDR by "localhost" in 002_scram.pl. As you can see the Cirrus tests are failing on Windows only (its log file is here [2]). I'll look at this "Windows" related issue but would appreciate any guidance/help if someone has experience in this area on windows. I recall that being able to do a reverse lookup of a hostname on Windows for localhost requires a few extra setup steps as that's not guaranteed to be set in all environments by default, which is why we go at great length to use 127.0.0.1 in the TAP test setup for example (see Cluster.pm). Looking at your patch, the goal is to test the mapping of regular expression for host names, user names and database names. If the first case is not guaranteed, my guess is that it is fine to skip this portion of the tests on Windows. Thanks for looking at it! That sounds reasonable, v3 attached is skipping the regular expression tests for the hostname on Windows. While reading the patch, I am a bit confused about token_regcomp() and token_regexec(). It would help the review a lot if these were documented with proper comments, even if these act roughly as wrappers for pg_regexec() and pg_regcomp(). Fully agree, comments were missing. They've been added in v3 attached. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.comdiff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml index c6f1b70fd3..28343445be 100644 --- a/doc/src/sgml/client-auth.sgml +++ b/doc/src/sgml/client-auth.sgml @@ -235,8 +235,9 @@ hostnogssenc database userPostgreSQL database. - Multiple database names can be supplied by separating them with - commas. A separate file containing database names can be specified by + Multiple database names and/or regular expressions preceded by / + can be supplied by separating them with commas. + A separate file containing database names can be specified by preceding the file name with @. @@ -249,7 +250,8 @@ hostnogssenc database userall specifies that it matches all users. Otherwise, this is either the name of a specific - database user, or a group name preceded by +. + database user, a regular expression preceded by / + or a group name preceded by +. (Recall that there is no real distinction between users and groups in PostgreSQL; a + mark really means match any of the roles that are directly or indirectly members @@ -258,7 +260,8 @@ hostnogssenc database user/ + can be supplied by separating them with commas. A separate file containing user names can be specified by preceding the file name with @. @@ -270,8 +273,9 @@ hostnogssenc database user Specifies the client machine address(es) that this record - matches. This field can contain either a host name, an IP - address range, or one of the special key words mentioned below. + matches. This field can contain either a host name, a regular expression + preceded by / representing host names, an IP address range, + or one of the special key words mentioned below. @@ -785,16 +789,18 @@ hostall all 192.168.12.10/32 gss # TYPE DATABASEUSERADDRESS METHOD hostall all 192.168.0.0/16 ident map=omicron -# If these are the only three lines for local connections, they will +# If these are the only four lines for local connections, they will # allow local users to connect only to their own databases (databases -# with the same name as their database user name) except for administrators -# and members of role "support", who can connect to all databases. The file -# $PGDATA/admins contains a list of names of administrators. Passwords +# with the same name as their database user name) except for administrators, +# users finishing with "helpdesk" and members of role "support", +# who can connect to all databases. +# The file$PGDATA/admins contains a list of names of administrators. Passwords # are required in all cases. # # TYPE DATABASEUSERADDRESS METHOD local sameuserall
Re: Query Jumbling for CALL and SET utility statements
Hi, On 9/16/22 5:47 PM, Drouvot, Bertrand wrote: Hi, On 9/16/22 2:53 PM, Fujii Masao wrote: Attached v5 to normalize 2PC commands too, so that we get things like: + case T_VariableSetStmt: + { + VariableSetStmt *stmt = (VariableSetStmt *) node; + + /* stmt->name is NULL for RESET ALL */ + if (stmt->name) + { + APP_JUMB_STRING(stmt->name); + JumbleExpr(jstate, (Node *) stmt->args); With the patch, "SET ... TO DEFAULT" and "RESET ..." are counted as the same query. Is this intentional? Thanks for looking at the patch! No, it is not intentional, good catch! Which might be ok because their behavior is basically the same. But I'm afaid which may cause users to be confused. For example, they may fail to find the pgss entry for RESET command they ran and just wonder why the command was not recorded. To avoid such confusion, how about appending stmt->kind to the jumble? Thought? I think that's a good idea and will provide a new version taking care of it (and also Sami's comments up-thread). Please find attached v6 taking care of the remarks mentioned above. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.comdiff --git a/contrib/pg_stat_statements/expected/pg_stat_statements.out b/contrib/pg_stat_statements/expected/pg_stat_statements.out index ff0166fb9d..ad8cebcbad 100644 --- a/contrib/pg_stat_statements/expected/pg_stat_statements.out +++ b/contrib/pg_stat_statements/expected/pg_stat_statements.out @@ -272,7 +272,7 @@ FROM pg_stat_statements ORDER BY query COLLATE "C"; wal_records > $2 as wal_records_generated, +| | | | | wal_records >= rows as wal_records_ge_rows +| | | | | FROM pg_stat_statements ORDER BY query COLLATE "C"| | | | | - SET pg_stat_statements.track_utility = FALSE | 1 |0 | f | f | t + SET pg_stat_statements.track_utility = $1 | 1 |0 | f | f | t UPDATE pgss_test SET b = $1 WHERE a > $2 | 1 |3 | t | t | t (7 rows) @@ -423,6 +423,155 @@ SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C"; SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C" | 0 |0 (6 rows) +-- PL/pgSQL procedure and pg_stat_statements.track = top +CREATE PROCEDURE MINUS_TWO(i INTEGER) AS $$ +DECLARE + r INTEGER; +BEGIN + SELECT (i - 1 - 1.0)::INTEGER INTO r; +END; $$ LANGUAGE plpgsql; +CREATE PROCEDURE SUM_TWO(i INTEGER, j INTEGER) AS $$ +DECLARE + r INTEGER; +BEGIN + SELECT (j + j)::INTEGER INTO r; +END; $$ LANGUAGE plpgsql; +SET pg_stat_statements.track = 'top'; +SET pg_stat_statements.track_utility = TRUE; +SELECT pg_stat_statements_reset(); + pg_stat_statements_reset +-- + +(1 row) + +CALL MINUS_TWO(3); +CALL MINUS_TWO(7); +CALL SUM_TWO(3, 8); +CALL SUM_TWO(7, 5); +SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C"; +query | calls | rows +--+---+-- + CALL MINUS_TWO($1) | 2 |0 + CALL SUM_TWO($1, $2) | 2 |0 + SELECT pg_stat_statements_reset() | 1 |1 + SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C" | 0 |0 +(4 rows) + +-- Ensure CALL is tracked even if pg_stat_statements.track_utility is FALSE +SET pg_stat_statements.track_utility = FALSE; +SELECT pg_stat_statements_reset(); + pg_stat_statements_reset +-- + +(1 row) + +CALL MINUS_TWO(3); +CALL MINUS_TWO(7); +CALL SUM_TWO(3, 8); +CALL SUM_TWO(7, 5); +SELECT query, calls, rows FROM pg_stat_statements ORDER BY query COLLATE "C"; +query | calls | rows +--+---+-- + CALL MINUS_TWO($1) | 2 |0 + CALL SUM_TWO($1, $2) | 2 |0 + SELECT pg_stat_statements_reset() | 1 |1 + SELECT query, calls, rows FROM pg_stat_statements
Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf
Hi, On 9/12/22 9:55 AM, Drouvot, Bertrand wrote: Hi, On 9/10/22 1:21 AM, Jacob Champion wrote: On 8/19/22 01:12, Drouvot, Bertrand wrote: + wstr = palloc((strlen(tok->string + 1) + 1) * sizeof(pg_wchar)); + wlen = pg_mb2wchar_with_len(tok->string + 1, + wstr, strlen(tok->string + 1)); The (tok->string + 1) construction comes up often enough that I think it should be put in a `regex` variable or similar. That would help my eyes with the (strlen(tok->string + 1) + 1) construction, especially. I noticed that for pg_ident, we precompile the regexes per-line and reuse those in child processes. Whereas here we're compiling, using, and then discarding the regex for each check. I think the example set by the pg_ident code is probably the one to follow, unless you have a good reason not to. Thanks for the feedback. Yeah fully agree. I'll provide a new version that follow the same logic as the pg_ident code. +# Testing with regular expression for username +reset_pg_hba($node, '/^.*md.*$', 'password'); +test_role($node, 'md5_role', 'password from pgpass and regular expression for username', 0); + IMO the coverage for this patch needs to be filled out. Negative test cases are more important than positive ones for security-related code. Agree, will do. Other than that, and Tom's note on potentially expanding this to other areas, I'll add regexp usage for the database column and also the for the address one when non CIDR is provided (so host name(s)) (I think it also makes sense specially as we don't allow multiple values for this column). Please find attached v2 addressing the comments mentioned above. v2 also provides regular expression usage for the database and the address columns (when a host name is being used). Remark: The CF bot is failing for Windows (all other tests are green) and only for the new tap test related to the regular expression on the host name (the ones on database and role are fine). The issue is not related to the patch. The issue is that the Windows Cirrus test does not like when a host name is provided for a "host" entry in pg_hba.conf (while it works fine when a CIDR is provided). You can see an example in [1] where the only change is to replace the CIDR by "localhost" in 002_scram.pl. As you can see the Cirrus tests are failing on Windows only (its log file is here [2]). I'll look at this "Windows" related issue but would appreciate any guidance/help if someone has experience in this area on windows. [1]: https://github.com/bdrouvot/postgres/branches on branch “host_non_cidr” [2]: https://api.cirrus-ci.com/v1/artifact/task/6507279833890816/log/src/test/ssl/tmp_check/log/002_scram_primary.log Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.comdiff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml index c6f1b70fd3..28343445be 100644 --- a/doc/src/sgml/client-auth.sgml +++ b/doc/src/sgml/client-auth.sgml @@ -235,8 +235,9 @@ hostnogssenc database userPostgreSQL database. - Multiple database names can be supplied by separating them with - commas. A separate file containing database names can be specified by + Multiple database names and/or regular expressions preceded by / + can be supplied by separating them with commas. + A separate file containing database names can be specified by preceding the file name with @. @@ -249,7 +250,8 @@ hostnogssenc database userall specifies that it matches all users. Otherwise, this is either the name of a specific - database user, or a group name preceded by +. + database user, a regular expression preceded by / + or a group name preceded by +. (Recall that there is no real distinction between users and groups in PostgreSQL; a + mark really means match any of the roles that are directly or indirectly members @@ -258,7 +260,8 @@ hostnogssenc database user/ + can be supplied by separating them with commas. A separate file containing user names can be specified by preceding the file name with @. @@ -270,8 +273,9 @@ hostnogssenc database user Specifies the client machine address(es) that this record - matches. This field can contain either a host name, an IP - address range, or one of the special key words mentioned below. + matches. This field can contain either a host name, a regular expression + preceded by / representing host names, an IP address range, + or one of the special key words mentioned below. @@ -785,16 +789,18 @@ hostall all 192.168.12.10/32 gss # TYPE DATABASEUSERADDRESS METHOD hostall all
Re: Query Jumbling for CALL and SET utility statements
Hi, On 9/16/22 2:53 PM, Fujii Masao wrote: Attached v5 to normalize 2PC commands too, so that we get things like: + case T_VariableSetStmt: + { + VariableSetStmt *stmt = (VariableSetStmt *) node; + + /* stmt->name is NULL for RESET ALL */ + if (stmt->name) + { + APP_JUMB_STRING(stmt->name); + JumbleExpr(jstate, (Node *) stmt->args); With the patch, "SET ... TO DEFAULT" and "RESET ..." are counted as the same query. Is this intentional? Thanks for looking at the patch! No, it is not intentional, good catch! Which might be ok because their behavior is basically the same. But I'm afaid which may cause users to be confused. For example, they may fail to find the pgss entry for RESET command they ran and just wonder why the command was not recorded. To avoid such confusion, how about appending stmt->kind to the jumble? Thought? I think that's a good idea and will provide a new version taking care of it (and also Sami's comments up-thread). Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Re: Query Jumbling for CALL and SET utility statements
Hi, On 9/13/22 6:33 AM, Julien Rouhaud wrote: Hi, On Tue, Sep 13, 2022 at 11:43:52AM +0900, bt22kawamotok wrote: I found that utility statement is counted separately in upper and lower case. For example BEGIN and begin are counted separately. Is it difficult to fix this problem? This is a known behavior, utility command aren't normalized (apart from the few that will be with this patch) and the queryid is just a hash of the provided query string. It seems unrelated to this patch though. While it can be a bit annoying, it's unlikely that the application will have thousands of way to ask for a new transaction (mixing case, adding a random number of spaces between BEGIN and TRANSACTION and so on), so in real life it won't cause any problem. Agree that it seems unlikely to cause any problem (as compare to the utility statements that are handled in this patch). Fixing it would require to fully jumble all utility statements, which would require a separate discussion. Agree. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com Amazon Web Services EMEA SARL, 38 avenue John F. Kennedy, L-1855 Luxembourg, R.C.S. Luxembourg B186284 Amazon Web Services EMEA SARL, succursale francaise, 31 Place des Corolles, Tour Carpe Diem, F-92400 Courbevoie, SIREN 831 001 334, RCS Nanterre, APE 6311Z, TVA FR30831001334
[PATCH] Add peer authentication TAP test
Hi hackers, During the work in [1] we created a new TAP test to test the SYSTEM_USER behavior with peer authentication. It turns out that there is currently no TAP test for the peer authentication, so we think (thanks Michael for the suggestion [2]) that it's better to split the work in [1] between "pure" SYSTEM_USER related work and the "pure" peer authentication TAP test work. That's the reason of this new thread, please find attached a patch to add a new TAP test for the peer authentication. [1]: https://www.postgresql.org/message-id/flat/7e692b8c-0b11-45db-1cad-3afc5b57409f%40amazon.com [2]: https://www.postgresql.org/message-id/YwgboqQUV1%2BY/k6z%40paquier.xyz Regards, -- Bertrand Drouvot Amazon Web Services: https://aws.amazon.com diff --git a/src/test/authentication/t/003_peer.pl b/src/test/authentication/t/003_peer.pl new file mode 100644 index 00..006c4405a5 --- /dev/null +++ b/src/test/authentication/t/003_peer.pl @@ -0,0 +1,140 @@ + +# Copyright (c) 2022, PostgreSQL Global Development Group + +# Tests for peer authentication and user name map. +# The test is skipped if the platform does not support peer authentication. + +use strict; +use warnings; +use PostgreSQL::Test::Cluster; +use PostgreSQL::Test::Utils; +use Test::More; + +# Delete pg_hba.conf from the given node, add a new entry to it +# and then execute a reload to refresh it. +sub reset_pg_hba +{ + my $node = shift; + my $hba_method = shift; + + unlink($node->data_dir . '/pg_hba.conf'); + # just for testing purposes, use a continuation line. + $node->append_conf('pg_hba.conf', "local all all\\\n $hba_method"); + $node->reload; + return; +} + +# Test access for a single role, useful to wrap all tests into one. +sub test_role +{ + local $Test::Builder::Level = $Test::Builder::Level + 1; + + my ($node, $role, $method, $expected_res) = @_; + my $status_string = 'failed'; + $status_string = 'success' if ($expected_res eq 0); + + my $connstr = "user=$role"; + my $testname = +"authentication $status_string for method $method, role $role"; + + if ($expected_res eq 0) + { +$node->connect_ok($connstr, $testname); + } + else + { + # No checks of the error message, only the status code. + $node->connect_fails($connstr, $testname); + } +} + +# return the size of logfile of $node in bytes. +sub get_log_size +{ + my ($node) = @_; + + return (stat $node->logfile)[7]; +} + +# find $pat in logfile of $node after $off-th byte. +sub find_in_log +{ + my ($node, $pat, $off) = @_; + + $off = 0 unless defined $off; + my $log = PostgreSQL::Test::Utils::slurp_file($node->logfile); + return 0 if (length($log) <= $off); + + $log = substr($log, $off); + + return $log =~ m/$pat/; +} + +# Initialize primary node. +my $node = PostgreSQL::Test::Cluster->new('primary'); +$node->init; +$node->append_conf('postgresql.conf', "log_connections = on\n"); +$node->start; + +# Get the session_user to define the user name map test. +my $session_user = + $node->safe_psql('postgres', 'select session_user'); + +# Create a new user for the user name map test. +$node->safe_psql('postgres', + qq{CREATE USER testmap$session_user}); + +# Set pg_hba.conf with the peer authentication. +reset_pg_hba($node, 'peer'); + +# Check that peer authentication is supported on this platform. +my $logstart = get_log_size($node); + +$node->psql('postgres', undef, connstr => "user=$session_user"); + +# If not supported, then skip the rest of the test. +if (find_in_log($node, qr/peer authentication is not supported on this platform/, $logstart)) +{ +plan skip_all => 'peer authentication is not supported on this platform'; +} + +# It's supported so let's test peer authentication. +# This connection should succeed. +$logstart = get_log_size($node); +test_role($node, $session_user, 'peer', 0); + +ok( find_in_log( + $node, + qr/connection authenticated: identity="$session_user" method=peer/, $logstart), + "logfile: connection authenticated for method peer and identity $session_user"); + +# This connection should failed. +$logstart = get_log_size($node); +test_role($node, qq{testmap$session_user}, 'peer', 2); + +ok( find_in_log( + $node, + qr/Peer authentication failed for user "testmap$session_user"/, $logstart), + "logfile: Peer authentication failed for user testmap$session_user"); + +# Define a user name map. +$node->append_conf('pg_ident.conf', qq{mypeermap $session_user testmap$session_user}); + +# Set pg_hba.conf with the peer authentication and the user name map. +reset_pg_hba($node, 'peer map=mypeermap'); + +# This connection should now succeed. +$logstart = get_log_size($node); +test_role($node, qq{testmap$session_user}, 'peer', 0); + +ok( find_in_log( + $node, + qr/connection authenticated: identity="$session_user" method=peer/, $logstart), + "logfile: connection authenticated for method peer and identity $session_user"); + +ok(
Re: shared-memory based stats collector - v70
Hi, On 8/18/22 9:51 PM, Andres Freund wrote: Hi, On 2022-08-18 15:26:31 -0400, Greg Stark wrote: And indexes of course. It's a bit frustrating since without the catalog you won't know what table the index actually is for... But they're pretty important stats. FWIW, I think we should split relation stats into table and index stats. Historically it'd have added a lot of complexity to separate the two, but I don't think that's the case anymore. And we waste space for index stats by having lots of table specific fields. It seems to me that we should work on that first then, what do you think? (If so I can try to have a look at it). And once done then resume the work to provide the APIs to get all tables/indexes from all the databases. That way we'll be able to provide one API for the tables and one for the indexes (instead of one API for both like my current POC is doing). On that note though... What do you think about having the capability to add other stats kinds to the stats infrastructure? I think that's a good idea and that would be great to have. Getting closer to that was one of my goals working on the shared memory stats stuff. It would make a lot of sense for pg_stat_statements to add its entries here instead of having to reimplement a lot of the same magic. Yes, we should move pg_stat_statements over. It's pretty easy to get massive contention on stats entries with pg_stat_statements, because it doesn't have support for "batching" updates to shared stats. And reimplementing the same logic in pg_stat_statements.c doesn't make sense. And the set of normalized queries could probably stored in DSA as well - the file based thing we have right now is problematic. To do that I guess more of the code needs to be moved to be table driven from the kind structs either with callbacks or with other meta data. Pretty much all of it already is. The only substantial missing bit is reading/writing of stats files, but that should be pretty easy. And of course making the callback array extensible. So the kind record could contain tupledesc and the code to construct the returned tuple so that these functions could return any custom entry as well as the standard entries. I don't see how this would work well - we don't have functions returning variable kinds of tuples. And what would convert a struct to a tuple? Nor do I think it's needed - if you have an extension providing a new stats kind it can also provide accessors. I think the same (the extension should be able to do that). I really like the idea of being able to provide new stats kind. Regards, -- Bertrand Drouvot Amazon Web Services: https://aws.amazon.com
Re: Patch proposal: New hooks in the connection path
Hi, On 8/16/22 10:10 AM, Bharath Rupireddy wrote: On Tue, Aug 16, 2022 at 1:31 PM Drouvot, Bertrand wrote: On 8/14/22 7:52 AM, Gurjeet Singh wrote: On Mon, Aug 8, 2022 at 3:51 AM Drouvot, Bertrand wrote: I think we can reduce the number of places the hook is called, if we call the hook from proc_exit(), and at all the other places we simply set a global variable to signify the reason for the failure. The case of _exit(1) from the signal-handler cannot use such a mechanism, but I think all the other cases of interest can simply register one of the FCET_* values, and let the call from proc_exit() pass that value to the hook. That looks like a good idea to me. I'm tempted to rewrite the patch that way (and addressing the first comment in the same time). Curious to hear about others hackers thoughts too. IMO, calling the hook from proc_exit() is not a good design as proc_exit() is a generic code called from many places in the source code, even the simple code of kind if(call_failed_conn_hook) { falied_conn_hook(params);} can come in the way of many exit code paths which is undesirable, and the likelihood of introducing new bugs may increase. Thanks for the feedback. What do you think about calling the hook only if the new global variable is not equal to its default value (which would mean don't trigger the hook)? Regards, -- Bertrand Drouvot Amazon Web Services: https://aws.amazon.com
Re: [PATCH] Expose port->authn_id to extensions and triggers
Hi, On 8/12/22 12:28 AM, Jacob Champion wrote: On Wed, Aug 10, 2022 at 10:48 PM Drouvot, Bertrand wrote: What do you think about adding a second field in ClientConnectionInfo for the auth method (as suggested by Michael upthread)? Sure -- without a followup patch, it's not really tested, though. v2 adjusts set_authn_id() to copy the auth_method over as well. It "passes tests" but is otherwise unexercised. Thank you! To help with the testing I've just provided a new version (aka v2-0004-system_user-implementation.patch) of the SYSTEM_USER patch in [1] that can be applied on top of "v2-0001-Allow-parallel-workers-to-read-authn_id.patch". But for this to work, the first comment below on your patch needs to be addressed. Once the first comment is addressed and the new SYSTEM_USER patch applied (that adds new tap tests) then we can test the propagation to the parallel workers with: make -C src/test/kerberos check PROVE_TESTS=t/001_auth.pl PROVE_FLAGS=-v and make -C src/test/authentication check PROVE_TESTS=t/001_password.pl PROVE_FLAGS=-v Both are currently successful. Regarding the comments on v2-0001-Allow-parallel-workers-to-read-authn_id.patch: 1) This is the one to be applied before adding the new SYSTEM_USER one on top: +typedef struct +{ + /* + * Authenticated identity. The meaning of this identifier is dependent on has to be replaced by: +typedef struct ClientConnectionInfo +{ + /* + * Authenticated identity. The meaning of this identifier is dependent on 2) + * Authenticated identity. The meaning of this identifier is dependent on There is one extra space before "The" 3) +SerializeClientConnectionInfo(Size maxsize, char *start_address) +{ + /* + * First byte is an indication of whether or not authn_id has been set to + * non-NULL, to differentiate that case from the empty string. + */ is authn_id being an empty string possible? 4) + */ + +ClientConnectionInfo MyClientConnectionInfo; + +/* + * Calculate the space needed to serialize MyClientConnectionInfo. + */ +Size +EstimateClientConnectionInfoSpace(void) From a coding style point of view, shouldn't "ClientConnectionInfo MyClientConnectionInfo;" be moved to the top of the file? [1]: https://commitfest.postgresql.org/39/3703/ Regards, -- Bertrand Drouvot Amazon Web Services: https://aws.amazon.com
Re: shared-memory based stats collector - v70
Hi, On 8/10/22 11:25 PM, Greg Stark wrote: On Wed, 10 Aug 2022 at 04:05, Drouvot, Bertrand wrote: Hi, On 8/9/22 6:00 PM, Greg Stark wrote: On Tue, 9 Aug 2022 at 06:19, Drouvot, Bertrand wrote: What do you think about adding a function in core PG to provide such functionality? (means being able to retrieve all the stats (+ eventually add some filtering) without the need to connect to each database). I'm working on it myself too. I'll post a patch for discussion in a bit. Great! Thank you! So I was adding the code to pgstat.c because I had thought there were some data types I needed and/or static functions I needed. However you and Andres encouraged me to check again now. And indeed I was able, after fixing a couple things, to make the code work entirely externally. Nice! Though I still think to have an SQL API in core could be useful to. As Andres was not -1 about that idea (as it should be low cost to add and maintain) as long as somebody cares enough to write something: then I'll give it a try and submit a patch for it. This is definitely not polished and there's a couple obvious things missing. But at the risk of embarrassment I've attached my WIP. Please be gentle :) I'll post the github link in a bit when I've fixed up some meta info. Thanks! I will have a look at it on github (once you share the link). Regards, -- Bertrand Drouvot Amazon Web Services: https://aws.amazon.com
Re: Patch proposal: New hooks in the connection path
Hi, On 7/6/22 12:11 AM, Joe Conway wrote: On 7/5/22 03:37, Bharath Rupireddy wrote: On Mon, Jul 4, 2022 at 6:23 PM Drouvot, Bertrand wrote: On 7/2/22 1:00 AM, Nathan Bossart wrote: > Could we model this after fmgr_hook? The first argument in that hook > indicates where it is being called from. This doesn't alleviate the need > for several calls to the hook in the authentication logic, but extension > authors would only need to define one hook. I like the idea and indeed fmgr.h looks a good place to model it. Attached a new patch version doing so. I was thinking along the same lines, so +1 for the general approach Thanks for the review! Thanks for the patch. Can we think of enhancing ClientAuthentication_hook_type itself i.e. make it a generic hook for all sorts of authentication metrics, info etc. with the type parameter embedded to it instead of new hook FailedConnection_hook?We can either add a new parameter for the "event" (the existing ClientAuthentication_hook_type implementers will have problems), or embed/multiplex the "event" info to existing Port structure or status variable (macro or enum) (existing implementers will not have compatibility problems). IMO, this looks cleaner going forward. Not sure I like this though -- I'll have to think about that Not sure about this one neither. The "enhanced" ClientAuthentication_hook will have to be fired at the same places as the new FailedConnection_hook is, but i think those places are not necessary linked to real authentication per say (making the name confusing). On the v2 patch: 1. Why do we need to place the hook and structures in fmgr.h? Why not in auth.h? agreed -- it does not belong in fmgr.h Moved to auth.h. 2. Timeout Handler is a signal handler, called as part of SIGALRM signal handler, most of the times, signal handlers ought to be doing small things, now that we are handing off the control to hook, which can do any long running work (writing to a remote storage, file, aggregate etc.), I don't think it's the right thing to do here. static void StartupPacketTimeoutHandler(void) { + if (FailedConnection_hook) + (*FailedConnection_hook) (FCET_STARTUP_PACKET_TIMEOUT, MyProcPort); + ereport(COMMERROR, + (errcode(ERRCODE_PROTOCOL_VIOLATION), + errmsg("timeout while processing startup packet"))); Why add the ereport()? removed it. But more to Bharath's point, perhaps this is a case that is better served by incrementing a stat counter and not exposed as a hook? I think that the advantage of the hook is that it gives the extension author the ability/flexibility to aggregate the counter based on information available in the Port Struct (say the client addr for example) at this stage. What about to aggregate the stat counter based on the client addr? (Not sure if there is more useful information (than the client addr) at this stage though) That said, i agree that having a hook in a time out handler might not be the right thing to do (even if at the end that would be to the extension author responsibility to do "small things" in it), so it has been removed in the new attached version. Also, a teeny nit: 8<-- + if (status != STATUS_OK) { + if (FailedConnection_hook) 8<-- does not follow usual practice and probably should be: 8<-- + if (status != STATUS_OK) + { + if (FailedConnection_hook) 8<-- Thanks!, fixed. -- Bertrand Drouvot Amazon Web Services: https://aws.amazon.com diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index d7257e4056..d9e1e3b4c1 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -567,6 +567,8 @@ int postmaster_alive_fds[2] = {-1, -1}; HANDLE PostmasterHandle; #endif +FailedConnection_hook_type FailedConnection_hook = NULL; + /* * Postmaster main entry point */ @@ -4462,7 +4464,11 @@ BackendInitialize(Port *port) * already did any appropriate error reporting. */ if (status != STATUS_OK) + { + if (FailedConnection_hook) + (*FailedConnection_hook) (FCET_BAD_STARTUP_PACKET, port); proc_exit(0); + } /* * Now that we have the user and database name, we can set the process diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c index 6b9082604f..e9bbd185f4 100644 --- a/src/backend/utils/init/postinit.c +++ b/src/backend/utils/init/postinit.c @@ -360,10 +360,14 @@ CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connect if (!am_superuser && pg_database_aclcheck(MyDatabaseId, GetUserId(), ACL_CONNECT) !
Patch proposal: New hooks in the connection path
Hi hackers, While commit 960869da08 added some information about connections that have been successfully authenticated, there is no metrics for connections that have not (or did not reached the authentication stage). Adding metrics about failed connections attempts could also help, for example with proper sampling, to: * detect spikes in failed login attempts * check if there is a correlation between spikes in successful and failed connection attempts While the number of successful connections could also already been tracked with the ClientAuthentication_hook (and also the ones that failed the authentication) we are missing metrics about: * why the connection failed (could be bad password, bad database, bad user, missing CONNECT privilege...) * number of times the authentication stage has not been reached * why the authentication stage has not been reached (bad startup packets, timeout while processing startup packet,...) Those missing metrics (in addition to the ones that can be already gathered) could provide value for: * security investigations * anomalies detections * tracking application misconfigurations In an attempt to be able to provide those metrics, please find attached a patch proposal to add new hooks in the connection path, that would be fired if: * there is a bad startup packet * there is a timeout while processing the startup packet * user does not have CONNECT privilege * database does not exist For safety those hooks request the use of a const Port parameter, so that they could be used only for reporting purpose (for example, we are working on an extension to record detailed login metrics counters). Another option could be to add those metrics in the engine itself (instead of providing new hooks to get them), but the new hooks option gives more flexibility on how to render and exploit them (there is a lot of information in the Port Struct that one could be interested with). I’m adding this patch proposal to the commitfest. Looking forward to your feedback, Regards, Bertrand diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index dde4bc25b1..8e00327a5d 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -567,6 +567,9 @@ int postmaster_alive_fds[2] = {-1, -1}; HANDLE PostmasterHandle; #endif +StartupPacketTimeout_hook_type StartupPacketTimeout_hook = NULL; +BadConnPacket_hook_type BadConnPacket_hook = NULL; + /* * Postmaster main entry point */ @@ -4462,8 +4465,11 @@ BackendInitialize(Port *port) * Stop here if it was bad or a cancel packet. ProcessStartupPacket * already did any appropriate error reporting. */ - if (status != STATUS_OK) + if (status != STATUS_OK) { + if (BadConnPacket_hook) + (*BadConnPacket_hook) (port); proc_exit(0); + } /* * Now that we have the user and database name, we can set the process @@ -5323,6 +5329,11 @@ dummy_handler(SIGNAL_ARGS) static void StartupPacketTimeoutHandler(void) { + if (StartupPacketTimeout_hook) + (*StartupPacketTimeout_hook) (MyProcPort); + ereport(COMMERROR, + (errcode(ERRCODE_PROTOCOL_VIOLATION), +errmsg("timeout while processing startup packet"))); _exit(1); } diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c index 6b9082604f..562ed331bf 100644 --- a/src/backend/utils/init/postinit.c +++ b/src/backend/utils/init/postinit.c @@ -67,6 +67,14 @@ #include "utils/syscache.h" #include "utils/timeout.h" +/* + * Hooks to be used when a connection has been refused in case of bad + * database name, bad database oid or bad permissions. + */ +BadDb_hook_type baddbname_hook = NULL; +BadDb_hook_type baddboid_hook = NULL; +BadDb_hook_type baddbperm_hook = NULL; + static HeapTuple GetDatabaseTuple(const char *dbname); static HeapTuple GetDatabaseTupleByOid(Oid dboid); static void PerformAuthentication(Port *port); @@ -360,10 +368,14 @@ CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connect if (!am_superuser && pg_database_aclcheck(MyDatabaseId, GetUserId(), ACL_CONNECT) != ACLCHECK_OK) + { + if (baddbperm_hook) + (*baddbperm_hook) (MyProcPort); ereport(FATAL, (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), errmsg("permission denied for database \"%s\"", name), errdetail("User does not have CONNECT privilege."))); + } /* * Check connection limit for this database. @@
SYSTEM_USER reserved word implementation
Hi hackers, The SYSTEM_USER is a sql reserved word as mentioned in [1] and is currently not implemented. Please find attached a patch proposal to make use of the SYSTEM_USER so that it returns the authenticated identity (if any) (aka authn_id in the Port struct). Indeed in some circumstances, the authenticated identity is not the SESSION_USER and then the information is lost from the connection point of view (it could still be retrieved thanks to commit 9afffcb833 and log_connections set to on). _Example 1, using the gss authentification._ Say we have this entry in pg_hba.conf: host all all 0.0.0.0/0 gss map=mygssmap and the related mapping in pg_ident.conf mygssmap /^(.*@.*)\.LOCAL$ mary Then, connecting with a valid Kerberos Ticket that contains “bertrand@BDTFOREST.LOCAL” as the default principal that way: psql -U mary -h myhostname -d postgres, we will get: postgres=> select current_user, session_user; current_user | session_user --+-- mary | mary (1 row) While the SYSTEM_USER would produce the Kerberos principal: postgres=> select system_user; system_user -- bertrand@BDTFOREST.LOCAL (1 row) _Example 2, using the peer authentification._ Say we have this entry in pg_hba.conf: local all john peer map=mypeermap and the related mapping in pg_ident.conf mypeermap postgres john Then connected localy as the system user postgres and connecting to the database that way: psql -U john -d postgres, we will get: postgres=> select current_user, session_user; current_user | session_user --+-- john | john (1 row) While the SYSTEM_USER would produce the system user that requested the connection: postgres=> select system_user; system_user - postgres (1 row) Thanks to those examples we have seen some situations where the information related to the authenticated identity has been lost from the connection point of view (means not visible in the current_session or in the session_user). The purpose of this patch is to make it visible through the SYSTEM_USER sql reserved word. _Remarks: _ - In case port->authn_id is NULL then the patch is returning the SESSION_USER for the SYSTEM_USER. Perhaps it should return NULL instead. - There is another thread [2] to expose port->authn_id to extensions and triggers thanks to a new API. This thread [2] leads to discussions about providing this information to the parallel workers too. While the new MyClientConnectionInfo being discussed in [2] could be useful to hold the client information that needs to be shared between the backend and any parallel workers, it does not seem to be needed in the case port->authn_id is exposed through SYSTEM_USER (like it is not for CURRENT_USER and SESSION_USER). I will add this patch to the next commitfest. I look forward to your feedback. Bertrand [1]: https://www.postgresql.org/docs/current/sql-keywords-appendix.html [2]: https://www.postgresql.org/message-id/flat/793d990837ae5c06a558d58d62de9378ab525d83.camel%40vmware.com diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml index 478a216dbb..07bb333a3b 100644 --- a/doc/src/sgml/func.sgml +++ b/doc/src/sgml/func.sgml @@ -23660,6 +23660,20 @@ SELECT * FROM pg_ls_dir('.') WITH ORDINALITY AS t(ls,n); + + + + system_user + +system_user +name + + +Returns the identity (if any) that the user presented during the +authentication cycle, before they were assigned a database role. + + + diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c index e44ad68cda..63cf3d8bb7 100644 --- a/src/backend/executor/execExprInterp.c +++ b/src/backend/executor/execExprInterp.c @@ -2543,6 +2543,11 @@ ExecEvalSQLValueFunction(ExprState *state, ExprEvalStep *op) *op->resvalue = session_user(fcinfo); *op->resnull = fcinfo->isnull; break; + case SVFOP_SYSTEM_USER: + InitFunctionCallInfoData(*fcinfo, NULL, 0, InvalidOid, NULL, NULL); + *op->resvalue = system_user(fcinfo); + *op->resnull = fcinfo->isnull; + break; case SVFOP_CURRENT_CATALOG: InitFunctionCallInfoData(*fcinfo, NULL, 0, InvalidOid, NULL, NULL); *op->resvalue = current_database(fcinfo); diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y index 969c9c158f..29dcc77724 100644 --- a/src/backend/parser/gram.y +++ b/src/backend/parser/gram.y @@ -842,7 +842,7 @@ static Node *makeRecursiveViewSelect(char *relname, List *aliases, Node *query); SEQUENCE SEQUENCES SERIALIZABLE SERVER SESSION SESSION_USER SET SETS SETOF SHARE SHOW SIMILAR SIMPLE
Re: logical decoding bug: segfault in ReorderBufferToastReplace()
On 12/9/19, 10:10 AM, "Tomas Vondra" wrote: >On Wed, Dec 04, 2019 at 05:36:16PM -0800, Jeremy Schneider wrote: >>On 9/8/19 14:01, Tom Lane wrote: >>> Fix RelationIdGetRelation calls that weren't bothering with error checks. >>> >>> ... >>> >>> Details >>> --- >>> https://git.postgresql.org/pg/commitdiff/69f883fef14a3fc5849126799278abcc43f40f56 >> >>We had two different databases this week (with the same schema) both >>independently hit the condition of this recent commit from Tom. It's on >>11.5 so we're actually segfaulting and restarting rather than just >>causing the walsender process to ERROR, but regardless there's still >>some underlying bug here. >> >>We have core files and we're still working to see if we can figure out >>what's going on, but I thought I'd report now in case anyone has extra >>ideas or suggestions. The segfault is on line 3034 of reorderbuffer.c. >> >>https://github.com/postgres/postgres/blob/REL_11_5/src/backend/replication/logical/reorderbuffer.c#L3034 >> >>3033 toast_rel = RelationIdGetRelation(relation->rd_rel->reltoastrelid); >>3034 toast_desc = RelationGetDescr(toast_rel); >> >>We'll keep looking; let me know any feedback! Would love to track down >>whatever bug is in the logical decoding code, if that's what it is. >> >>== >> >>backtrace showing the call stack... >> >>Core was generated by `postgres: walsender >>(31712)'. >>Program terminated with signal 11, Segmentation fault. >>#0 ReorderBufferToastReplace (rb=0x3086af0, txn=0x3094a78, >>relation=0x2b79177249c8, relation=0x2b79177249c8, change=0x30ac938) >>at reorderbuffer.c:3034 >>3034reorderbuffer.c: No such file or directory. >>... >>(gdb) #0 ReorderBufferToastReplace (rb=0x3086af0, txn=0x3094a78, >>relation=0x2b79177249c8, relation=0x2b79177249c8, change=0x30ac938) >>at reorderbuffer.c:3034 >>#1 ReorderBufferCommit (rb=0x3086af0, xid=xid@entry=1358809, >>commit_lsn=9430473346032, end_lsn=, >>commit_time=commit_time@entry=628712466364268, >>origin_id=origin_id@entry=0, origin_lsn=origin_lsn@entry=0) at >>reorderbuffer.c:1584 >>#2 0x00716248 in DecodeCommit (xid=1358809, >>parsed=0x7ffc4ce123f0, buf=0x7ffc4ce125b0, ctx=0x3068f70) at decode.c:637 >>#3 DecodeXactOp (ctx=0x3068f70, buf=buf@entry=0x7ffc4ce125b0) at >>decode.c:245 >>#4 0x0071655a in LogicalDecodingProcessRecord (ctx=0x3068f70, >>record=0x3069208) at decode.c:117 >>#5 0x00727150 in XLogSendLogical () at walsender.c:2886 >>#6 0x00729192 in WalSndLoop (send_data=send_data@entry=0x7270f0 >>) at walsender.c:2249 >>#7 0x00729f91 in StartLogicalReplication (cmd=0x30485a0) at >>walsender.c: >>#8 exec_replication_command ( >>cmd_string=cmd_string@entry=0x2f968b0 "START_REPLICATION SLOT >>\"\" LOGICAL 893/38002B98 (proto_version '1', >>publication_names '\"\"')") at walsender.c:1628 >>#9 0x0076e939 in PostgresMain (argc=, >>argv=argv@entry=0x2fea168, dbname=0x2fea020 "", >>username=) at postgres.c:4182 >>#10 0x004bdcb5 in BackendRun (port=0x2fdec50) at postmaster.c:4410 >>#11 BackendStartup (port=0x2fdec50) at postmaster.c:4082 >>#12 ServerLoop () at postmaster.c:1759 >>#13 0x007062f9 in PostmasterMain (argc=argc@entry=7, >>argv=argv@entry=0x2f92540) at postmaster.c:1432 >>#14 0x004be73b in main (argc=7, argv=0x2f92540) at main.c:228 >> >>== >> >>Some additional context... >> >># select * from pg_publication_rel; >> prpubid | prrelid >>-+- >> 71417 | 16453 >> 71417 | 54949 >>(2 rows) >> >>(gdb) print toast_rel >>$4 = (struct RelationData *) 0x0 >> >>(gdb) print *relation->rd_rel >>$11 = {relname = {data = "", '\000' }, >>relnamespace = 16402, reltype = 16430, reloftype = 0, >>relowner = 16393, relam = 0, relfilenode = 16428, reltablespace = 0, >>relpages = 0, reltuples = 0, relallvisible = 0, reltoastrelid = 0, >Hmmm, so reltoastrelid = 0, i.e. the relation does not have a TOAST >relation. Yet we're calling ReorderBufferToastReplace on the decoded >record ... interesting. > >Can you share structure of the relation causing the issue? Here it is: \d+ rel_having_issue Table "public.rel_having_issue" Column | Type | Collation | Nullable | Default | Storage | Stats target | Description +--+---+--+-+--+--+- id | integer