Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-10-24 Thread Drouvot, Bertrand

Hi,

On 10/24/22 5:34 AM, Michael Paquier wrote:

On Fri, Oct 21, 2022 at 02:10:37PM +0200, Drouvot, Bertrand wrote:

On 10/21/22 2:58 AM, Michael Paquier wrote:


I have spent a couple of hours doing a pass over v2, playing manually
with regex patterns, reloads, the system views and item lists.  The
logic was fine, but I have adjusted a few things related to the
comments and the documentation (particularly with the examples,
removing one example and updating one with a regex that has a comma,
needing double quotes).  The CI and all my machines were green, and
the test coverage looked sufficient.  So, applied. 

Thanks!

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-10-23 Thread Michael Paquier
On Fri, Oct 21, 2022 at 02:10:37PM +0200, Drouvot, Bertrand wrote:
> On 10/21/22 2:58 AM, Michael Paquier wrote:
>> The same approach with keywords first, regex, and exact match could be
>> applied as well for the databases?  Perhaps it is just mainly a matter
>> of taste,
> 
> Yeah, I think it is.

;)

Still it looks that this makes for less confusion with a minimal
footprint once the new additions are in place.

>> In the same fashion as load_ident(), it seems to me that we
>> need two extra things for this patch:
>> - if !ok (see where we do MemoryContextDelete(hbacxt)), we need to go
>> through new_parsed_lines and free for each line the AuthTokens for the
>> database and user lists.
>> - if ok and new_parsed_lines != NIL, the same cleanup needs to
>> happen.
> 
> Right, but I think that should be "parsed_hba_lines != NIL".

For the second case, where we need to free the past contents after a
success, yes.

> Right. To avoid code duplication in the !ok/ok cases, the function
> free_hba_line() has been added in v2: it goes through the list of databases
> and roles tokens and call free_auth_token() for each of them.

Having a small-ish routine for that is fine.

I have spent a couple of hours doing a pass over v2, playing manually
with regex patterns, reloads, the system views and item lists.  The
logic was fine, but I have adjusted a few things related to the
comments and the documentation (particularly with the examples,
removing one example and updating one with a regex that has a comma,
needing double quotes).  The CI and all my machines were green, and
the test coverage looked sufficient.  So, applied.  I'll keep an eye
on the buildfarm.
--
Michael


signature.asc
Description: PGP signature


Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-10-21 Thread Drouvot, Bertrand

Hi,

On 10/21/22 2:58 AM, Michael Paquier wrote:

On Wed, Oct 19, 2022 at 10:45:44AM +0200, Drouvot, Bertrand wrote:

Please find attached v1-0001-regex-handling-for-db-and-roles-in-hba.patch to
implement regexes for databases and roles in hba.

It does also contain new regexes related TAP tests and doc updates.


Thanks for the updated version.  This is really easy to look at now.


It relies on the refactoring made in fc579e11c6 (but changes the
regcomp_auth_token() parameters so that it is now responsible for emitting
the compilation error message (if any), to avoid code duplication in
parse_hba_line() and parse_ident_line() for roles, databases and user name
mapping).


@@ -652,13 +670,18 @@ check_role(const char *role, Oid roleid, List *tokens)
[...]
-   if (!tok->quoted && tok->string[0] == '+')
+   if (!token_has_regexp(tok))
 {
Hmm.  Do we need token_has_regexp() here for all the cases?  We know
that the string can begin with a '+', hence it is no regex.  The same
applies for the case of "all".  The remaining case is the one where
the user name matches exactly the AuthToken string, which should be
last as we want to treat anything beginning with a '/' as a regex.  It
seems like we could do an order like that?  Say:
if (!tok->quoted && tok->string[0] == '+')
 //do
else if (token_is_keyword(tok, "all"))
 //do
else if (token_has_regexp(tok))
 //do regex compilation, handling failures
else if (token_matches(tok, role))
 //do exact match

The same approach with keywords first, regex, and exact match could be
applied as well for the databases?  Perhaps it is just mainly a matter
of taste, 


Yeah, I think it is.


and it depends on how much you want to prioritize the place
of the regex over the rest but that could make the code easier to
understand in the long-run and this is a very sensitive code area, 


And agree that your proposal tastes better ;-): it is easier to 
understand, v2 attached has been done that way.



Compiling the expressions for the user and database lists one-by-one
in parse_hba_line() as you do is correct.  However there is a gotcha
that you are forgetting here: the memory allocations done for the
regexp compilations are not linked to the memory context where each
line is processed (named hbacxt in load_hba()) and need a separate
cleanup. 


Oops, right, thanks for the call out!


In the same fashion as load_ident(), it seems to me that we
need two extra things for this patch:
- if !ok (see where we do MemoryContextDelete(hbacxt)), we need to go
through new_parsed_lines and free for each line the AuthTokens for the
database and user lists.
- if ok and new_parsed_lines != NIL, the same cleanup needs to
happen.


Right, but I think that should be "parsed_hba_lines != NIL".


My guess is that you could do both the same way as load_ident() does,
keeping some symmetry between the two code paths. 


Right. To avoid code duplication in the !ok/ok cases, the function 
free_hba_line() has been added in v2: it goes through the list of 
databases and roles tokens and call free_auth_token() for each of them.



Unifying both into
a common routine would be sort of annoying as HbaLines uses lists
within the lists of parsed lines, and IdentLine can have one at most
in each line.


I agree, and v2 is not attempting to unify them.


For now, I have made your last patch a bit shorter by applying the
refactoring of regcomp_auth_token() separately with a few tweaks to
the comments.


Thanks! v2 attached does apply on top of that.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.comdiff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
index c6f1b70fd3..30753003ba 100644
--- a/doc/src/sgml/client-auth.sgml
+++ b/doc/src/sgml/client-auth.sgml
@@ -234,10 +234,13 @@ hostnogssenc  database  
userPostgreSQL database.
-   Multiple database names can be supplied by separating them with
-   commas.  A separate file containing database names can be specified by
-   preceding the file name with @.
+   a specific PostgreSQL database or a regular
+   expression preceded by /.
+   Multiple database names and/or regular expressions preceded by 
/
+   can be supplied by separating them with commas.
+   A separate file containing database names and/or regular expressions 
preceded
+   by / can be specified by preceding the file name
+   with @.
   
  
 
@@ -249,18 +252,20 @@ hostnogssenc  database  
userall specifies that it
matches all users.  Otherwise, this is either the name of a specific
-   database user, or a group name preceded by +.
+   database user, a regular expression preceded by /,
+   or a group name preceded by +.
(Recall that there is no real distinction between users and groups
in PostgreSQL; a + mark 
really means
match any of the roles that are directly or indirectly members
 

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-10-20 Thread Michael Paquier
On Wed, Oct 19, 2022 at 10:45:44AM +0200, Drouvot, Bertrand wrote:
> Please find attached v1-0001-regex-handling-for-db-and-roles-in-hba.patch to
> implement regexes for databases and roles in hba.
> 
> It does also contain new regexes related TAP tests and doc updates.

Thanks for the updated version.  This is really easy to look at now.

> It relies on the refactoring made in fc579e11c6 (but changes the
> regcomp_auth_token() parameters so that it is now responsible for emitting
> the compilation error message (if any), to avoid code duplication in
> parse_hba_line() and parse_ident_line() for roles, databases and user name
> mapping).

@@ -652,13 +670,18 @@ check_role(const char *role, Oid roleid, List *tokens)
[...]
-   if (!tok->quoted && tok->string[0] == '+')
+   if (!token_has_regexp(tok))
{
Hmm.  Do we need token_has_regexp() here for all the cases?  We know
that the string can begin with a '+', hence it is no regex.  The same
applies for the case of "all".  The remaining case is the one where
the user name matches exactly the AuthToken string, which should be
last as we want to treat anything beginning with a '/' as a regex.  It
seems like we could do an order like that?  Say:
if (!tok->quoted && tok->string[0] == '+')
//do
else if (token_is_keyword(tok, "all"))
//do
else if (token_has_regexp(tok))
//do regex compilation, handling failures
else if (token_matches(tok, role))
//do exact match

The same approach with keywords first, regex, and exact match could be
applied as well for the databases?  Perhaps it is just mainly a matter
of taste, and it depends on how much you want to prioritize the place
of the regex over the rest but that could make the code easier to
understand in the long-run and this is a very sensitive code area, and
the case of physical walsenders (in short specific process types)
requiringx specific conditions is also something to take into account. 

foreach(tokencell, tokens)
{
-   parsedline->roles = lappend(parsedline->roles,
-   copy_auth_token(lfirst(tokencell)));
+   AuthToken  *tok = copy_auth_token(lfirst(tokencell));
+
+   /*
+* Compile a regex from the role token, if necessary.
+*/
+   if (regcomp_auth_token(tok, HbaFileName, line_num, err_msg, elevel))
+   return NULL;
+
+   parsedline->roles = lappend(parsedline->roles, tok);
}

Compiling the expressions for the user and database lists one-by-one
in parse_hba_line() as you do is correct.  However there is a gotcha
that you are forgetting here: the memory allocations done for the
regexp compilations are not linked to the memory context where each
line is processed (named hbacxt in load_hba()) and need a separate
cleanup.  In the same fashion as load_ident(), it seems to me that we
need two extra things for this patch:
- if !ok (see where we do MemoryContextDelete(hbacxt)), we need to go
through new_parsed_lines and free for each line the AuthTokens for the
database and user lists.
- if ok and new_parsed_lines != NIL, the same cleanup needs to
happen.
My guess is that you could do both the same way as load_ident() does,
keeping some symmetry between the two code paths.  Unifying both into
a common routine would be sort of annoying as HbaLines uses lists
within the lists of parsed lines, and IdentLine can have one at most
in each line.

I am wondering whether we should link the regexp code to not use
malloc(), actually..  This would require a separate analysis, though,
and I suspect that palloc() would be very expensive for this job. 

For now, I have made your last patch a bit shorter by applying the
refactoring of regcomp_auth_token() separately with a few tweaks to
the comments.
--
Michael


signature.asc
Description: PGP signature


Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-10-19 Thread Drouvot, Bertrand

Hi,

On 10/14/22 7:30 AM, Michael Paquier wrote:

On Wed, Oct 12, 2022 at 08:17:14AM +0200, Drouvot, Bertrand wrote:

Indeed, ;-)



The code could be split to tackle things step-by-step:
- One refactoring patch to introduce token_regcomp() and
token_regexec(), with the introduction of a new structure that
includes the compiled regexes.  (Feel free to counterargue about the
use of AuthToken for this purpose, of course!)
- Plug in the refactored logic for the lists of role names and
database names in pg_hba.conf.


Please find attached 
v1-0001-regex-handling-for-db-and-roles-in-hba.patch to implement 
regexes for databases and roles in hba.


It does also contain new regexes related TAP tests and doc updates.

It relies on the refactoring made in fc579e11c6 (but changes the 
regcomp_auth_token() parameters so that it is now responsible for 
emitting the compilation error message (if any), to avoid code 
duplication in parse_hba_line() and parse_ident_line() for roles, 
databases and user name mapping).


Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.comdiff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
index c6f1b70fd3..30753003ba 100644
--- a/doc/src/sgml/client-auth.sgml
+++ b/doc/src/sgml/client-auth.sgml
@@ -234,10 +234,13 @@ hostnogssenc  database  
userPostgreSQL database.
-   Multiple database names can be supplied by separating them with
-   commas.  A separate file containing database names can be specified by
-   preceding the file name with @.
+   a specific PostgreSQL database or a regular
+   expression preceded by /.
+   Multiple database names and/or regular expressions preceded by 
/
+   can be supplied by separating them with commas.
+   A separate file containing database names and/or regular expressions 
preceded
+   by / can be specified by preceding the file name
+   with @.
   
  
 
@@ -249,18 +252,20 @@ hostnogssenc  database  
userall specifies that it
matches all users.  Otherwise, this is either the name of a specific
-   database user, or a group name preceded by +.
+   database user, a regular expression preceded by /,
+   or a group name preceded by +.
(Recall that there is no real distinction between users and groups
in PostgreSQL; a + mark 
really means
match any of the roles that are directly or indirectly members
of this role, while a name without a + mark 
matches
-   only that specific role.) For this purpose, a superuser is only
+   only that specific role or regular expression.) For this purpose, a 
superuser is only
considered to be a member of a role if they are explicitly a member
of the role, directly or indirectly, and not just by virtue of
being a superuser.
-   Multiple user names can be supplied by separating them with commas.
-   A separate file containing user names can be specified by preceding the
-   file name with @.
+   Multiple user names and/or regular expressions preceded by 
/
+   can be supplied by separating them with commas. A separate file 
containing
+   user names and/or regular expressions preceded by /
+   can be specified by preceding the file name with @.
   
  
 
@@ -739,6 +744,18 @@ hostall all ::1/128
 trust
 # TYPE  DATABASEUSERADDRESS METHOD
 hostall all localhost   trust
 
+# The same using regular expression for DATABASE, which allows connection to 
the
+# db1 and testdb databases and any database with a name ending with "test".
+#
+# TYPE  DATABASE   USERADDRESS  METHOD
+local   db1,/^.*test$,testdb   all localhosttrust
+
+# The same using regular expression for USER, which allows connection to the
+# user1 and testuser users and any user with a name ending with "test".
+#
+# TYPE  DATABASE USER  ADDRESS 
 METHOD
+local   db1,/^.*test$,testdb user1,/^.*test$,testuser  localhost   
 trust
+
 # Allow any user from any host with IP address 192.168.93.x to connect
 # to database "postgres" as the same user name that ident reports for
 # the connection (typically the operating system user name).
@@ -785,15 +802,17 @@ hostall all 192.168.12.10/32  
  gss
 # TYPE  DATABASEUSERADDRESS METHOD
 hostall all 192.168.0.0/16  ident 
map=omicron
 
-# If these are the only three lines for local connections, they will
+# If these are the only four lines for local connections, they will
 # allow local users to connect only to their own databases (databases
-# with the same name as their database user name) except for administrators
-# and 

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-10-19 Thread Drouvot, Bertrand

Hi,

On 10/19/22 3:18 AM, Michael Paquier wrote:

On Tue, Oct 18, 2022 at 09:14:21AM +0200, Drouvot, Bertrand wrote:

BTW, what about adding a new TAP test (dedicated patch) to test the behavior
in case of errors during the regexes compilation in pg_ident.conf and
pg_hba.conf (not done yet)? (we could add it once this  patch series is
done).


Perhaps, that may become tricky when it comes to -DEXEC_BACKEND (for
cases where no fork() implementation is available, aka Windows).  But
a postmaster restart failure would generate logs that could be picked
for a pattern check?


Right, that's how I'd see it. I'll give it a look.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-10-18 Thread Michael Paquier
On Tue, Oct 18, 2022 at 09:14:21AM +0200, Drouvot, Bertrand wrote:
> BTW, what about adding a new TAP test (dedicated patch) to test the behavior
> in case of errors during the regexes compilation in pg_ident.conf and
> pg_hba.conf (not done yet)? (we could add it once this  patch series is
> done).

Perhaps, that may become tricky when it comes to -DEXEC_BACKEND (for
cases where no fork() implementation is available, aka Windows).  But
a postmaster restart failure would generate logs that could be picked
for a pattern check?

>> While putting my hands on that, I was also wondering whether we should
>> have the error string generated after compilation within the internal
>> regcomp() routine, but that would require more arguments to
>> pg_regcomp() (as of file name, line number, **err_string), and that
>> looks more invasive than necessary.  Perhaps the follow-up steps will
>> prove me wrong, though :)
> 
> I've had the same thought (and that was what the previous global patch was
> doing). I'm tempted to think that the follow-steps will prove you right ;-)
> (specially if at the end those will be the same error messages for databases
> and roles).

Avoiding three times the same error message seems like a good thing in
the long run, but let's think about this part later as needed.  All
these routines are static to hba.c so even if we finish by not
finishing the whole job for this development cycle we can still be
very flexible.
--
Michael


signature.asc
Description: PGP signature


Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-10-18 Thread Drouvot, Bertrand

Hi,

On 10/18/22 7:51 AM, Michael Paquier wrote:

On Mon, Oct 17, 2022 at 07:56:02PM +0200, Drouvot, Bertrand wrote:

On 10/14/22 7:30 AM, Michael Paquier wrote:

This approach would not stick with
pg_ident.conf though, as we validate the fields in each line when we
put our hands on ident_user and after the base validation of a line
(number of fields, etc.).


I'm not sure to get the issue here with the proposed approach and
pg_ident.conf.


My point is about parse_ident_line(), where we need to be careful in
the order of the operations.  The macros IDENT_MULTI_VALUE() and
IDENT_FIELD_ABSENT() need to be applied on all the fields first, and
the regex computation needs to be last.  Your patch had a subtile
issue here, as users may get errors on the computed regex before the
ordering of the fields as the computation was used *before* the "Get
the PG rolename token" part of the logic.


Gotcha, thanks! I was wondering if we shouldn't add a comment about that 
and I see that you've added one in v2, thanks!


BTW, what about adding a new TAP test (dedicated patch) to test the 
behavior in case of errors during the regexes compilation in 
pg_ident.conf and pg_hba.conf (not done yet)? (we could add it once this 
 patch series is done).



While putting my hands on that, I was also wondering whether we should
have the error string generated after compilation within the internal
regcomp() routine, but that would require more arguments to
pg_regcomp() (as of file name, line number, **err_string), and that
looks more invasive than necessary.  Perhaps the follow-up steps will
prove me wrong, though :)


I've had the same thought (and that was what the previous global patch 
was doing). I'm tempted to think that the follow-steps will prove you 
right ;-) (specially if at the end those will be the same error messages 
for databases and roles).




A last thing is the introduction of a free() routine for AuthTokens,
to minimize the number of places where we haev pg_regfree().  The gain
is minimal, but that looks more consistent with the execution and
compilation paths.


Agree, that looks better.

I had a look at your v2, did a few tests and it looks good to me.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-10-17 Thread Michael Paquier
On Mon, Oct 17, 2022 at 07:56:02PM +0200, Drouvot, Bertrand wrote:
> On 10/14/22 7:30 AM, Michael Paquier wrote:
>> This approach would not stick with
>> pg_ident.conf though, as we validate the fields in each line when we
>> put our hands on ident_user and after the base validation of a line
>> (number of fields, etc.).
> 
> I'm not sure to get the issue here with the proposed approach and
> pg_ident.conf.

My point is about parse_ident_line(), where we need to be careful in
the order of the operations.  The macros IDENT_MULTI_VALUE() and
IDENT_FIELD_ABSENT() need to be applied on all the fields first, and
the regex computation needs to be last.  Your patch had a subtile
issue here, as users may get errors on the computed regex before the
ordering of the fields as the computation was used *before* the "Get
the PG rolename token" part of the logic.

>> About this last point, token_regexec() does not include
>> check_ident_usermap() in its logic, and it seems to me that it should.
>> The difference is with the expected regmatch_t structures, so
>> shouldn't token_regexec be extended with two arguments as of an array
>> of regmatch_t and the number of elements in the array?
> 
> You are right, not using token_regexec() in check_ident_usermap() in the
> previous patch versions was not right. That's fixed in the attached, though
> the substitution (if any) is still outside of token_regexec(), do you think
> it should be part of it? (I think that makes sense to keep it outside of it
> as we wont use the substitution logic for roles, databases and hostname)

Keeping the substition done with the IdentLine's Authtokens outside of
the internal execution routine is fine by me.


While putting my hands on that, I was also wondering whether we should
have the error string generated after compilation within the internal
regcomp() routine, but that would require more arguments to
pg_regcomp() (as of file name, line number, **err_string), and that
looks more invasive than necessary.  Perhaps the follow-up steps will
prove me wrong, though :)

A last thing is the introduction of a free() routine for AuthTokens,
to minimize the number of places where we haev pg_regfree().  The gain
is minimal, but that looks more consistent with the execution and
compilation paths.
--
Michael
From 89a7ce5bb2338f65ebd42703eed11033881646cb Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Tue, 18 Oct 2022 14:51:03 +0900
Subject: [PATCH v2] Refactor regex handling for pg_ident.conf in hba.c

---
 src/include/libpq/hba.h  |  28 +++---
 src/backend/libpq/hba.c  | 162 ---
 src/backend/utils/adt/hbafuncs.c |   2 +-
 3 files changed, 121 insertions(+), 71 deletions(-)

diff --git a/src/include/libpq/hba.h b/src/include/libpq/hba.h
index d06da81806..cec2e2665f 100644
--- a/src/include/libpq/hba.h
+++ b/src/include/libpq/hba.h
@@ -77,6 +77,20 @@ typedef enum ClientCertName
 	clientCertDN
 } ClientCertName;
 
+/*
+ * A single string token lexed from an authentication configuration file
+ * (pg_ident.conf or pg_hba.conf), together with whether the token has
+ * been quoted.  If "string" begins with a slash, it may optionally
+ * contain a regular expression (currently used for pg_ident.conf when
+ * building IdentLines).
+ */
+typedef struct AuthToken
+{
+	char	   *string;
+	bool		quoted;
+	regex_t*regex;
+} AuthToken;
+
 typedef struct HbaLine
 {
 	int			linenumber;
@@ -127,22 +141,10 @@ typedef struct IdentLine
 	int			linenumber;
 
 	char	   *usermap;
-	char	   *ident_user;
 	char	   *pg_role;
-	regex_t		re;
+	AuthToken  *token;
 } IdentLine;
 
-/*
- * A single string token lexed from an authentication configuration file
- * (pg_ident.conf or pg_hba.conf), together with whether the token has
- * been quoted.
- */
-typedef struct AuthToken
-{
-	char	   *string;
-	bool		quoted;
-} AuthToken;
-
 /*
  * TokenizedAuthLine represents one line lexed from an authentication
  * configuration file.  Each item in the "fields" list is a sub-list of
diff --git a/src/backend/libpq/hba.c b/src/backend/libpq/hba.c
index 4637426d62..ea4e955d51 100644
--- a/src/backend/libpq/hba.c
+++ b/src/backend/libpq/hba.c
@@ -66,6 +66,7 @@ typedef struct check_network_data
 } check_network_data;
 
 
+#define token_has_regexp(t)	(t->regex != NULL)
 #define token_is_keyword(t, k)	(!t->quoted && strcmp(t->string, k) == 0)
 #define token_matches(t, k)  (strcmp(t->string, k) == 0)
 
@@ -80,9 +81,10 @@ static MemoryContext parsed_hba_context = NULL;
  * pre-parsed content of ident mapping file: list of IdentLine structs.
  * parsed_ident_context is the memory context where it lives.
  *
- * NOTE: the IdentLine structs can contain pre-compiled regular expressions
- * that live outside the memory context. Before destroying or resetting the
- * memory context, they need to be explicitly free'd.
+ * NOTE: the IdentLine structs can contain AuthTokens with pre-compiled
+ * regular expressions that live outside the memory context. Before
+

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-10-17 Thread Drouvot, Bertrand

Hi,

On 10/14/22 7:30 AM, Michael Paquier wrote:

On Wed, Oct 12, 2022 at 08:17:14AM +0200, Drouvot, Bertrand wrote:

Indeed, ;-)



I have also looked
at make_auth_token(), and wondered if it could be possible to have this
routine compile the regexes. 


I think that it makes sense.


This approach would not stick with
pg_ident.conf though, as we validate the fields in each line when we
put our hands on ident_user and after the base validation of a line
(number of fields, etc.).


I'm not sure to get the issue here with the proposed approach and 
pg_ident.conf.


The new attached patch proposal is making use of make_auth_token() 
(through copy_auth_token()) in parse_ident_line(), do you see any issue?




The logic in charge of compiling the regular expressions could be
consolidated more.  The patch refactors the logic with
token_regcomp(), uses it for the user names (ident_user in
parse_ident_line() from pg_ident.conf), then extended to the hostnames
(single item) and the role/database names (list possible in these
cases).  This approach looks right to me.  Once we plug in an AuthItem
to IdentLine, token_regcomp could be changed so as it takes an
AuthToken in input


Right, did it that way in the attached.



- Only one code path of hba.c should call pg_regcomp() (the patch does
that), and only one code path should call pg_regexec() (two code paths
of hba.c do that with the patch, as of the need to store matching
expression).  This should minimize the areas where we call
pg_mb2wchar_with_len(), for one.


Right.


About this last point, token_regexec() does not include
check_ident_usermap() in its logic, and it seems to me that it should.
The difference is with the expected regmatch_t structures, so
shouldn't token_regexec be extended with two arguments as of an array
of regmatch_t and the number of elements in the array? 


You are right, not using token_regexec() in check_ident_usermap() in the 
previous patch versions was not right. That's fixed in the attached, 
though the substitution (if any) is still outside of token_regexec(), do 
you think it should be part of it? (I think that makes sense to keep it 
outside of it as we wont use the substitution logic for roles, databases 
and hostname)




The code could be split to tackle things step-by-step:
- One refactoring patch to introduce token_regcomp() and
token_regexec()


Agree. Please find attached v1-0001-token_reg-functions.patch for this 
first step.


Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.comdiff --git a/src/backend/libpq/hba.c b/src/backend/libpq/hba.c
index 4637426d62..d9ef98141c 100644
--- a/src/backend/libpq/hba.c
+++ b/src/backend/libpq/hba.c
@@ -66,6 +66,7 @@ typedef struct check_network_data
 } check_network_data;
 
 
+#define token_is_regexp(t) (t->regex)
 #define token_is_keyword(t, k) (!t->quoted && strcmp(t->string, k) == 0)
 #define token_matches(t, k)  (strcmp(t->string, k) == 0)
 
@@ -117,6 +118,9 @@ static List *tokenize_inc_file(List *tokens, const char 
*outer_filename,
   const char 
*inc_filename, int elevel, char **err_msg);
 static bool parse_hba_auth_opt(char *name, char *val, HbaLine *hbaline,
   int elevel, char 
**err_msg);
+static void token_regcomp(AuthToken *token);
+static int token_regexec(const char *match, regex_t *re, size_t nmatch,
+ regmatch_t pmatch[]);
 
 
 /*
@@ -267,11 +271,21 @@ make_auth_token(const char *token, bool quoted)
 
toklen = strlen(token);
/* we copy string into same palloc block as the struct */
-   authtoken = (AuthToken *) palloc(sizeof(AuthToken) + toklen + 1);
+   authtoken = (AuthToken *) palloc0(sizeof(AuthToken) + toklen + 1);
authtoken->string = (char *) authtoken + sizeof(AuthToken);
authtoken->quoted = quoted;
+   authtoken->regcomp_rc = 0;
memcpy(authtoken->string, token, toklen + 1);
 
+   if (authtoken->string[0] == '/')
+   {
+   /*
+* When the string starts with a slash, treat it as a regular
+* expression. Pre-compile it.
+*/
+   token_regcomp(authtoken);
+   }
+
return authtoken;
 }
 
@@ -2326,52 +2340,39 @@ parse_ident_line(TokenizedAuthLine *tok_line, int 
elevel)
tokens = lfirst(field);
IDENT_MULTI_VALUE(tokens);
token = linitial(tokens);
-   parsedline->ident_user = pstrdup(token->string);
 
-   /* Get the PG rolename token */
-   field = lnext(tok_line->fields, field);
-   IDENT_FIELD_ABSENT(field);
-   tokens = lfirst(field);
-   IDENT_MULTI_VALUE(tokens);
-   token = linitial(tokens);
-   parsedline->pg_role = pstrdup(token->string);
+   /* Copy the ident user token */
+   parsedline->token = copy_auth_token(toke

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-10-14 Thread Drouvot, Bertrand

Hi,

On 10/14/22 7:30 AM, Michael Paquier wrote:

On Wed, Oct 12, 2022 at 08:17:14AM +0200, Drouvot, Bertrand wrote:

Indeed, ;-)


So, I have spent the last two days looking at all that, studying the
structure of the patch and the existing HEAD code,


Thanks!


The code could be split to tackle things step-by-step:
- One refactoring patch to introduce token_regcomp() and
token_regexec(), with the introduction of a new structure that
includes the compiled regexes.  (Feel free to counterargue about the
use of AuthToken for this purpose, of course!)
- Plug in the refactored logic for the lists of role names and
database names in pg_hba.conf.
- Handle the case of single host entries in pg_hba.conf.
--


I agree to work step-by-step.

While looking at it again now, I discovered that the new TAP test for 
the regexp on the hostname in ssl/002_scram.pl is failing on some of my 
tests environment (and not all..).


So, I agree with the dedicated steps you are proposing and that the 
"host case" needs a dedicated attention.


I'm not ignoring all the remarks you've just done up-thread, I'll 
address them and/or provide my feedback on them when I'll come back with 
the step-by-step sub patches.


Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-10-14 Thread Drouvot, Bertrand

Hi,

On 10/14/22 8:18 AM, Michael Paquier wrote:

On Fri, Oct 14, 2022 at 02:30:25PM +0900, Michael Paquier wrote:

First, as of HEAD, AuthToken is only used for elements in a list of
role and database names in hba.conf before filling in each HbaLine,
hence we limit its usage to the initial parsing.  The patch assigns an
optional regex_t to it, then extends the use of AuthToken for single
hostname entries in pg_hba.conf.  Things going first: shouldn't we
combine ident_user and "re" together in the same structure?  Even if
we finish by not using AuthToken to store the computed regex, it seems
to me that we'd better use the same base structure for pg_ident.conf
and pg_hba.conf.  While looking closely at the patch, we would expand
the use of AuthToken outside its original context.  I have also looked
at make_auth_token(), and wondered if it could be possible to have this
routine compile the regexes.  This approach would not stick with
pg_ident.conf though, as we validate the fields in each line when we
put our hands on ident_user and after the base validation of a line
(number of fields, etc.).  So with all that in mind, it feels right to
not use AuthToken at all when building each HbaLine and each
IdentLine, but a new, separate, structure.  We could call that an
AuthItem (string, its compiled regex) perhaps?  It could have its own
make() routine, taking in input an AuthToken and process
pg_regcomp().  Better ideas for this new structure would be welcome,
and the idea is that we'd store the post-parsing state of an
AuthToken to something that has a compiled regex.  We could finish by
using AuthToken at the end and expand its use, but it does not feel
completely right either to have a make() routine but not be able to
compile its regular expression when creating the AuthToken.


I have have sent this part too quickly.  As AuthTokens are used in
check_db() and check_role() when matching entries, it is more
intuitive to store the regex_t directly in it. 


Yeah, I also think this is the right place for it.


Changing IdentLine to
use a AuthToken makes the "quoted" part useless in this case, still it
could be used in Assert()s to make sure that the data is shaped as
expected at check-time, enforced at false when creating it in
parse_ident_line()?


I agree, that makes sense. I'll work on that.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-10-13 Thread Michael Paquier
On Fri, Oct 14, 2022 at 02:30:25PM +0900, Michael Paquier wrote:
> First, as of HEAD, AuthToken is only used for elements in a list of
> role and database names in hba.conf before filling in each HbaLine,
> hence we limit its usage to the initial parsing.  The patch assigns an
> optional regex_t to it, then extends the use of AuthToken for single
> hostname entries in pg_hba.conf.  Things going first: shouldn't we
> combine ident_user and "re" together in the same structure?  Even if
> we finish by not using AuthToken to store the computed regex, it seems
> to me that we'd better use the same base structure for pg_ident.conf
> and pg_hba.conf.  While looking closely at the patch, we would expand
> the use of AuthToken outside its original context.  I have also looked
> at make_auth_token(), and wondered if it could be possible to have this
> routine compile the regexes.  This approach would not stick with
> pg_ident.conf though, as we validate the fields in each line when we
> put our hands on ident_user and after the base validation of a line
> (number of fields, etc.).  So with all that in mind, it feels right to
> not use AuthToken at all when building each HbaLine and each
> IdentLine, but a new, separate, structure.  We could call that an
> AuthItem (string, its compiled regex) perhaps?  It could have its own
> make() routine, taking in input an AuthToken and process
> pg_regcomp().  Better ideas for this new structure would be welcome,
> and the idea is that we'd store the post-parsing state of an
> AuthToken to something that has a compiled regex.  We could finish by
> using AuthToken at the end and expand its use, but it does not feel
> completely right either to have a make() routine but not be able to
> compile its regular expression when creating the AuthToken.

I have have sent this part too quickly.  As AuthTokens are used in
check_db() and check_role() when matching entries, it is more
intuitive to store the regex_t directly in it.  Changing IdentLine to
use a AuthToken makes the "quoted" part useless in this case, still it
could be used in Assert()s to make sure that the data is shaped as
expected at check-time, enforced at false when creating it in
parse_ident_line()?
--
Michael


signature.asc
Description: PGP signature


Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-10-13 Thread Michael Paquier
On Wed, Oct 12, 2022 at 08:17:14AM +0200, Drouvot, Bertrand wrote:
> Indeed, ;-)

So, I have spent the last two days looking at all that, studying the
structure of the patch and the existing HEAD code, and it looks like
that a few things could be consolidated.

First, as of HEAD, AuthToken is only used for elements in a list of
role and database names in hba.conf before filling in each HbaLine,
hence we limit its usage to the initial parsing.  The patch assigns an
optional regex_t to it, then extends the use of AuthToken for single
hostname entries in pg_hba.conf.  Things going first: shouldn't we
combine ident_user and "re" together in the same structure?  Even if
we finish by not using AuthToken to store the computed regex, it seems
to me that we'd better use the same base structure for pg_ident.conf
and pg_hba.conf.  While looking closely at the patch, we would expand
the use of AuthToken outside its original context.  I have also looked
at make_auth_token(), and wondered if it could be possible to have this
routine compile the regexes.  This approach would not stick with
pg_ident.conf though, as we validate the fields in each line when we
put our hands on ident_user and after the base validation of a line
(number of fields, etc.).  So with all that in mind, it feels right to
not use AuthToken at all when building each HbaLine and each
IdentLine, but a new, separate, structure.  We could call that an
AuthItem (string, its compiled regex) perhaps?  It could have its own
make() routine, taking in input an AuthToken and process
pg_regcomp().  Better ideas for this new structure would be welcome,
and the idea is that we'd store the post-parsing state of an
AuthToken to something that has a compiled regex.  We could finish by
using AuthToken at the end and expand its use, but it does not feel
completely right either to have a make() routine but not be able to
compile its regular expression when creating the AuthToken.

The logic in charge of compiling the regular expressions could be
consolidated more.  The patch refactors the logic with
token_regcomp(), uses it for the user names (ident_user in
parse_ident_line() from pg_ident.conf), then extended to the hostnames
(single item) and the role/database names (list possible in these
cases).  This approach looks right to me.  Once we plug in an AuthItem
to IdentLine, token_regcomp could be changed so as it takes an
AuthToken in input, saving directly the compiled regex_t in the input
structure.

At the end, the global structure of the code should, I guess, respect
the following rules:
- The number of places where we check if a string is a regex should be
minimal (aka string beginning by '/').
- Only one code path of hba.c should call pg_regcomp() (the patch does
that), and only one code path should call pg_regexec() (two code paths
of hba.c do that with the patch, as of the need to store matching
expression).  This should minimize the areas where we call
pg_mb2wchar_with_len(), for one.

About this last point, token_regexec() does not include
check_ident_usermap() in its logic, and it seems to me that it should.
The difference is with the expected regmatch_t structures, so
shouldn't token_regexec be extended with two arguments as of an array
of regmatch_t and the number of elements in the array?  This would
save a bit some of the logic around pg_mb2wchar_with_len(), for
example.  To make all that work, token_regexec() should return an int,
coming from pg_regexec, but no specific error strings as we don't want
to spam the logs when checking hosts, roles and databases in
pg_hba.conf.

   /* Check if it has a CIDR suffix and if so isolate it */
-  cidr_slash = strchr(str, '/');
-  if (cidr_slash)
-  *cidr_slash = '\0';
+  if (!is_regexp)
+  {
+  cidr_slash = strchr(str, '/');
+  if (cidr_slash)
+  *cidr_slash = '\0';
+  }
[...]
/* Get the netmask */
-   if (cidr_slash)
+   if (cidr_slash && !is_regexp)
{
Some of the code handling regexes for hostnames itches me a bit, like
this one.  Perhaps it would be better to evaluate this interaction
with regular expressions separately.  The database and role names
don't have this need, so their cases are much simpler to think about.

The code could be split to tackle things step-by-step:
- One refactoring patch to introduce token_regcomp() and
token_regexec(), with the introduction of a new structure that
includes the compiled regexes.  (Feel free to counterargue about the
use of AuthToken for this purpose, of course!)
- Plug in the refactored logic for the lists of role names and
database names in pg_hba.conf.
- Handle the case of single host entries in pg_hba.conf.
--
Michael


signature.asc
Description: PGP signature


Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-10-11 Thread Drouvot, Bertrand

Hi,

On 10/11/22 8:29 AM, Michael Paquier wrote:

On Mon, Oct 10, 2022 at 09:00:06AM +0200, Drouvot, Bertrand wrote:

foreach(cell, tokens)
{
[...]
+   tokreg = lfirst(cell);
+   if (!token_is_regexp(tokreg))
{
-   if (strcmp(dbname, role) == 0)
+   if (am_walsender && !am_db_walsender)
+   {
+   /*
+* physical replication walsender connections 
can only match
+* replication keyword
+*/
+   if (token_is_keyword(tokreg->authtoken, 
"replication"))
+   return true;
+   }
+   else if (token_is_keyword(tokreg->authtoken, "all"))
return true;


When checking the list of databases in check_db(), physical WAL
senders (aka am_walsender && !am_db_walsender) would be able to accept
regexps, but these should only accept "replication" and never a
regexp, no?  


Oh right, good catch, thanks! Please find attached v6 fixing it.



This is kind of special in the HBA logic, coming back to 9.0 where
physical replication and this special role property have been
introduced.  WAL senders have gained an actual database property later
on in 9.4 with logical decoding, keeping "replication" for
compatibility (connection strings can use replication=database to
connect as a non-physical WAL sender and connect to a specific
database).



Thanks for the explanation!


+typedef struct AuthToken
+{
+   char   *string;
+   boolquoted;
+} AuthToken;
+
+/*
+ * Distinguish the case a token has to be treated as a regular
+ * expression or not.
+ */
+typedef struct AuthTokenOrRegex
+{
+   boolis_regex;
+
+   /*
+* Not an union as we still need the token string for fill_hba_line().
+*/
+   AuthToken  *authtoken;
+   regex_t*regex;
+} AuthTokenOrRegex;


Hmm.  With is_regex to check if a regex_t exists, both structures may
not be necessary. 


Agree that both struct are not necessary. In v6, AuthTokenOrRegex has 
been removed and the regex has been moved to AuthToken. There is no 
is_regex bool anymore, as it's enough to test whether regex is NULL or not.



I have not put my hands on that directly, but if
I guess that I would shape things to have only AuthToken with
(enforcing regex_t in priority if set in the list of elements to check
for a match):
- the string
- quoted
- regex_t
A list member should never have (regex_t != NULL && quoted), right?


The patch does allow that. For example it happens for the test where we 
add a comma in the role name. As we don't rely on a dedicated char to 
mark the end of a reg exp (we only rely on / to mark its start) then 
allowing (regex_t != NULL && quoted) seems reasonable to me.



+# test with a comma in the regular expression
+reset_pg_hba($node, 'all', '"/^.*5,.*e$"', 'password');
+test_conn($node, 'user=md5,role', 'password', 'matching regexp for username',
+   0);


So, we check here that the role includes "5," in its name.  This is
getting fun to parse ;)



Indeed, ;-)



  elsif ($ENV{PG_TEST_EXTRA} !~ /\bssl\b/)
  {
-   plan skip_all => 'Potentially unsafe test SSL not enabled in 
PG_TEST_EXTRA';
+   plan skip_all =>
+ 'Potentially unsafe test SSL not enabled in PG_TEST_EXTRA';
  }


Unrelated noise from perltidy.


Right.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.comdiff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
index c6f1b70fd3..406628ef35 100644
--- a/doc/src/sgml/client-auth.sgml
+++ b/doc/src/sgml/client-auth.sgml
@@ -235,8 +235,9 @@ hostnogssenc  database  
userPostgreSQL database.
-   Multiple database names can be supplied by separating them with
-   commas.  A separate file containing database names can be specified by
+   Multiple database names and/or regular expressions preceded by 
/
+   can be supplied by separating them with commas.
+   A separate file containing database names can be specified by
preceding the file name with @.
   
  
@@ -249,7 +250,8 @@ hostnogssenc  database  
userall specifies that it
matches all users.  Otherwise, this is either the name of a specific
-   database user, or a group name preceded by +.
+   database user, a regular expression preceded by /
+   or a group name preceded by +.
(Recall that there is no real distinction between users and groups
in PostgreSQL; a + mark 
really means
match any of the roles that are directly or indirectly members
@@ -258,7 +260,8 @@ hostnogssenc  database  
user/
+   can be supplied by separating them with commas.
A separate file containing user names can be

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-10-10 Thread Michael Paquier
On Mon, Oct 10, 2022 at 09:00:06AM +0200, Drouvot, Bertrand wrote:
>   foreach(cell, tokens)
>   {
> [...]
> + tokreg = lfirst(cell);
> + if (!token_is_regexp(tokreg))
>   {
> - if (strcmp(dbname, role) == 0)
> + if (am_walsender && !am_db_walsender)
> + {
> + /*
> +  * physical replication walsender connections 
> can only match
> +  * replication keyword
> +  */
> + if (token_is_keyword(tokreg->authtoken, 
> "replication"))
> + return true;
> + }
> + else if (token_is_keyword(tokreg->authtoken, "all"))
>   return true;

When checking the list of databases in check_db(), physical WAL
senders (aka am_walsender && !am_db_walsender) would be able to accept
regexps, but these should only accept "replication" and never a
regexp, no?  The second check on "replication" placed in the branch
for token_is_regexp() in your patch would be correctly placed, though.
This is kind of special in the HBA logic, coming back to 9.0 where
physical replication and this special role property have been
introduced.  WAL senders have gained an actual database property later
on in 9.4 with logical decoding, keeping "replication" for
compatibility (connection strings can use replication=database to
connect as a non-physical WAL sender and connect to a specific
database).

> +typedef struct AuthToken
> +{
> + char   *string;
> + boolquoted;
> +} AuthToken;
> +
> +/*
> + * Distinguish the case a token has to be treated as a regular
> + * expression or not.
> + */
> +typedef struct AuthTokenOrRegex
> +{
> + boolis_regex;
> +
> + /*
> +  * Not an union as we still need the token string for fill_hba_line().
> +  */
> + AuthToken  *authtoken;
> + regex_t*regex;
> +} AuthTokenOrRegex;

Hmm.  With is_regex to check if a regex_t exists, both structures may
not be necessary.  I have not put my hands on that directly, but if
I guess that I would shape things to have only AuthToken with
(enforcing regex_t in priority if set in the list of elements to check
for a match):
- the string
- quoted
- regex_t
A list member should never have (regex_t != NULL && quoted), right?
Hostnames would never be quoted, as well.

> +# test with a comma in the regular expression
> +reset_pg_hba($node, 'all', '"/^.*5,.*e$"', 'password');
> +test_conn($node, 'user=md5,role', 'password', 'matching regexp for username',
> + 0);

So, we check here that the role includes "5," in its name.  This is
getting fun to parse ;)

>  elsif ($ENV{PG_TEST_EXTRA} !~ /\bssl\b/)
>  {
> - plan skip_all => 'Potentially unsafe test SSL not enabled in 
> PG_TEST_EXTRA';
> + plan skip_all =>
> +   'Potentially unsafe test SSL not enabled in PG_TEST_EXTRA';
>  }

Unrelated noise from perltidy.
--
Michael


signature.asc
Description: PGP signature


Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-10-10 Thread Drouvot, Bertrand

Hi,

On 10/6/22 2:53 AM, Michael Paquier wrote:

On Wed, Oct 05, 2022 at 03:32:20PM +0200, Drouvot, Bertrand wrote:

On 10/5/22 9:24 AM, Michael Paquier wrote:

- test_role() -> test_conn() to be able to pass down a database name.
- reset_pg_hba() to control the host, db and user parts.  The host
part does not really apply after moving the hosts checks to a more
secure location, so I guess that this had better be extended just for
the user and database, keeping host=local all the time.
I am planning to apply 0001 attached independently,


0001 looks good to me.


Thanks.  I have applied this refactoring, leaving the host part out of
the equation as we should rely only on local connections for this
part of the test. 


Thanks!


Thanks! I'll look at it and the comments you just made up-thread.


Cool, thanks.  One thing that matters a lot IMO (that I forgot to
mention previously) is to preserve the order of the items parsed from
the configuration files.


Fully agree, all the versions that have been submitted in this thread 
preserves the ordering.




Also, I am wondering whether we'd better have some regression tests
where a regex includes a comma and a role name itself has a comma,
actually, just to stress more the parsing of individual items in the
HBA file.


Good idea, it has been added in v5 just shared up-thread.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-10-10 Thread Drouvot, Bertrand

Hi,

On 10/5/22 9:24 AM, Michael Paquier wrote:

On Tue, Sep 20, 2022 at 01:33:09PM +0200, Drouvot, Bertrand wrote:
Anyway, I have looked at the patch.

+   List   *roles_re;
+   List   *databases_re;
+   regex_thostname_re;
I am surprised by the approach of using separate lists for the regular
expressions and the raw names.  Wouldn't it be better to store
everything in a single list but assign an entry type?  In this case it
would be either regex or plain string.  This would minimize the
footprint of the changes (no extra arguments *_re in the routines
checking for a match on the roles, databases or hosts).  And it seems
to me that this would make unnecessary the use of re_num here and
there. 


Please find attached v5 addressing this. I started with an union but it 
turns out that we still need the plain string when a regex is used. This 
is not needed for the authentication per say but for fill_hba_line(). So 
I ended up creating a new struct without union in v5.



The hostname is different, of course, requiring only an extra
field for its type, or something like that.


I'm using the same new struct as described above for the hostname.



Perhaps the documentation would gain in clarity if there were more
examples, like a set of comma-separated examples (mix of regex and raw
strings for example, for all the field types that gain support for
regexes)?



Right, I added more examples in v5.


-$node->append_conf('postgresql.conf', "log_connections = on\n");
+$node->append_conf(
+'postgresql.conf', qq{
+listen_addresses = '127.0.0.1'
+log_connections = on
+});
Hmm.  I think that we may need to reconsider the location of the tests
for the regexes with the host name, as the "safe" regression tests
should not switch listen_addresses.  One location where we already do
that is src/test/ssl/, so these could be moved there. 


Good point, I moved the hostname related tests in src/test/ssl.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.comdiff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
index c6f1b70fd3..406628ef35 100644
--- a/doc/src/sgml/client-auth.sgml
+++ b/doc/src/sgml/client-auth.sgml
@@ -235,8 +235,9 @@ hostnogssenc  database  
userPostgreSQL database.
-   Multiple database names can be supplied by separating them with
-   commas.  A separate file containing database names can be specified by
+   Multiple database names and/or regular expressions preceded by 
/
+   can be supplied by separating them with commas.
+   A separate file containing database names can be specified by
preceding the file name with @.
   
  
@@ -249,7 +250,8 @@ hostnogssenc  database  
userall specifies that it
matches all users.  Otherwise, this is either the name of a specific
-   database user, or a group name preceded by +.
+   database user, a regular expression preceded by /
+   or a group name preceded by +.
(Recall that there is no real distinction between users and groups
in PostgreSQL; a + mark 
really means
match any of the roles that are directly or indirectly members
@@ -258,7 +260,8 @@ hostnogssenc  database  
user/
+   can be supplied by separating them with commas.
A separate file containing user names can be specified by preceding the
file name with @.
   
@@ -270,8 +273,9 @@ hostnogssenc  database  
user
   
Specifies the client machine address(es) that this record
-   matches.  This field can contain either a host name, an IP
-   address range, or one of the special key words mentioned below.
+   matches.  This field can contain either a host name, a regular 
expression
+   preceded by / representing host names, an IP address 
range,
+   or one of the special key words mentioned below.
   
 
   
@@ -739,6 +743,24 @@ hostall all ::1/128
 trust
 # TYPE  DATABASEUSERADDRESS METHOD
 hostall all localhost   trust
 
+# The same using a regular expression for host name, which allows connection 
for
+# host name ending with "test".
+#
+# TYPE  DATABASEUSERADDRESS METHOD
+hostall all /^.*test$   trust
+
+# The same using regular expression for DATABASE, which allows connection to 
the
+# db1 and testdb databases and any database with a name ending with "test".
+#
+# TYPE  DATABASE   USERADDRESS METHOD
+local   db1,/^.*test$,testdb   all /^.*test$   trust
+
+# The same using regular expression for USER, which allows connection to the
+# user1 and testuser users and any user with a name ending with "test".
+#
+# TYPE  DATABASE USER  ADDRESS 
METHO

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-10-05 Thread Michael Paquier
On Wed, Oct 05, 2022 at 03:32:20PM +0200, Drouvot, Bertrand wrote:
> On 10/5/22 9:24 AM, Michael Paquier wrote:
>> - test_role() -> test_conn() to be able to pass down a database name.
>> - reset_pg_hba() to control the host, db and user parts.  The host
>> part does not really apply after moving the hosts checks to a more
>> secure location, so I guess that this had better be extended just for
>> the user and database, keeping host=local all the time.
>> I am planning to apply 0001 attached independently,
> 
> 0001 looks good to me.

Thanks.  I have applied this refactoring, leaving the host part out of
the equation as we should rely only on local connections for this
part of the test.  The best fit I can think about for the checks on
the hostname patterns would be either the ssl, ldap or krb5 tests.
SSL is more widely tested than the two others.

>> reducing the
>> footprint of 0002, which is your previous patch left untouched
>> (mostly!).
> 
> Thanks! I'll look at it and the comments you just made up-thread.

Cool, thanks.  One thing that matters a lot IMO (that I forgot to
mention previously) is to preserve the order of the items parsed from
the configuration files.

Also, I am wondering whether we'd better have some regression tests
where a regex includes a comma and a role name itself has a comma,
actually, just to stress more the parsing of individual items in the
HBA file.
--
Michael


signature.asc
Description: PGP signature


Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-10-05 Thread Drouvot, Bertrand

Hi,

On 10/5/22 9:24 AM, Michael Paquier wrote:

Something that stood out on a first review is the refactoring of
001_password.pl that can be done independently of the main patch:


Good idea, thanks for the proposal.


- test_role() -> test_conn() to be able to pass down a database name.
- reset_pg_hba() to control the host, db and user parts.  The host
part does not really apply after moving the hosts checks to a more
secure location, so I guess that this had better be extended just for
the user and database, keeping host=local all the time.
I am planning to apply 0001 attached independently, 


0001 looks good to me.


reducing the
footprint of 0002, which is your previous patch left untouched
(mostly!).


Thanks! I'll look at it and the comments you just made up-thread.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-10-05 Thread Michael Paquier
On Tue, Sep 20, 2022 at 01:33:09PM +0200, Drouvot, Bertrand wrote:
> I assume (maybe i should not) that if objects starting with / already exist
> there is very good reason(s) behind. Then I don't think that preventing
> their creation in the DDL would help (quite the contrary for the ones that
> really need them).

I have been pondering on this point for the last few weeks, and I'd
like to change my opinion and side with Tom on this one as per the
very unlikeliness of this being a problem in the wild.  I have studied
the places that would require restrictions but that was just feeling
adding a bit more bloat into the CREATE/ALTER ROLE paths for what's
aimed at providing a consistent experience for the user across
pg_hba.conf and pg_ident.conf.

> It looks to me that adding a GUC (off by default) to enable/disable the
> regexp usage in the hba could be a fair compromise. It won't block any
> creation starting with a / and won't open more doors (if such objects exist)
> by default.

Enforcing a behavior change in HBA policies with a GUC does not strike
me as a good thing in the long term.  I am ready to bet that it would
just sit around for nothing like the compatibility GUCs.

Anyway, I have looked at the patch.

+   List   *roles_re;
+   List   *databases_re;
+   regex_thostname_re;
I am surprised by the approach of using separate lists for the regular
expressions and the raw names.  Wouldn't it be better to store
everything in a single list but assign an entry type?  In this case it 
would be either regex or plain string.  This would minimize the
footprint of the changes (no extra arguments *_re in the routines
checking for a match on the roles, databases or hosts).  And it seems
to me that this would make unnecessary the use of re_num here and
there.  The hostname is different, of course, requiring only an extra
field for its type, or something like that.

Perhaps the documentation would gain in clarity if there were more
examples, like a set of comma-separated examples (mix of regex and raw
strings for example, for all the field types that gain support for
regexes)?

-$node->append_conf('postgresql.conf', "log_connections = on\n");
+$node->append_conf(
+'postgresql.conf', qq{
+listen_addresses = '127.0.0.1'
+log_connections = on
+});
Hmm.  I think that we may need to reconsider the location of the tests
for the regexes with the host name, as the "safe" regression tests
should not switch listen_addresses.  One location where we already do
that is src/test/ssl/, so these could be moved there.  Keeping the
database and user name parts in src/test/authentication/ is fine.

Something that stood out on a first review is the refactoring of
001_password.pl that can be done independently of the main patch:
- test_role() -> test_conn() to be able to pass down a database name.
- reset_pg_hba() to control the host, db and user parts.  The host
part does not really apply after moving the hosts checks to a more
secure location, so I guess that this had better be extended just for
the user and database, keeping host=local all the time.
I am planning to apply 0001 attached independently, reducing the
footprint of 0002, which is your previous patch left untouched
(mostly!).
--
Michael
From 203122ff1eeae942fc78b5d7ad85122547a2a6ed Mon Sep 17 00:00:00 2001
From: Michael Paquier 
Date: Wed, 5 Oct 2022 15:14:41 +0900
Subject: [PATCH v4 1/2] Refactor TAP test 001_password.pl

---
 src/test/authentication/t/001_password.pl | 60 ---
 1 file changed, 31 insertions(+), 29 deletions(-)

diff --git a/src/test/authentication/t/001_password.pl 
b/src/test/authentication/t/001_password.pl
index 58e4176e80..9e02697355 100644
--- a/src/test/authentication/t/001_password.pl
+++ b/src/test/authentication/t/001_password.pl
@@ -24,28 +24,30 @@ if (!$use_unix_sockets)
 sub reset_pg_hba
 {
my $node   = shift;
+   my $host   = shift;
+   my $database   = shift;
+   my $role   = shift;
my $hba_method = shift;
 
unlink($node->data_dir . '/pg_hba.conf');
# just for testing purposes, use a continuation line
-   $node->append_conf('pg_hba.conf', "local all all\\\n $hba_method");
+   $node->append_conf('pg_hba.conf', "$host $database $role\\\n 
$hba_method");
$node->reload;
return;
 }
 
-# Test access for a single role, useful to wrap all tests into one.  Extra
-# named parameters are passed to connect_ok/fails as-is.
-sub test_role
+# Test access for a connection string, useful to wrap all tests into one.
+# Extra named parameters are passed to connect_ok/fails as-is.
+sub test_conn
 {
local $Test::Builder::Level = $Test::Builder::Level + 1;
 
-   my ($node, $role, $method, $expected_res, %params) = @_;
+   my ($node, $connstr, $method, $expected_res, %params) = @_;
my $status_string = 'failed';
$status_string = 'success' if ($expected_res eq 0);
 
-   my $connstr = "user=$role";
my $testn

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-09-20 Thread Jacob Champion
On Mon, Sep 19, 2022 at 9:09 PM Tom Lane  wrote:
> You have to assume that somebody (a) has a role or DB name starting
> with slash, (b) has an explicit reference to that name in their
> pg_hba.conf, (c) doesn't read the release notes, and (d) doesn't
> notice that things are misbehaving until after some hacker manages
> to break into their installation on the strength of the misbehaving
> entry.  OK, I'll grant that the probability of (c) is depressingly
> close to unity; but each of the other steps seems quite low probability.
> All four of them happening in one installation is something I doubt
> will happen.

I can't argue with (a) or (b), but (d) seems decently likely to me. If
your normal user base consists of people who are authorized to access
your system, what clues would you have that your HBA is silently
failing open?

--Jacob




Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-09-20 Thread Drouvot, Bertrand

Hi,

On 9/20/22 6:30 AM, Michael Paquier wrote:

On Tue, Sep 20, 2022 at 12:09:33AM -0400, Tom Lane wrote:

You have to assume that somebody (a) has a role or DB name starting
with slash, (b) has an explicit reference to that name in their
pg_hba.conf, (c) doesn't read the release notes, and (d) doesn't
notice that things are misbehaving until after some hacker manages
to break into their installation on the strength of the misbehaving
entry.  OK, I'll grant that the probability of (c) is depressingly
close to unity; but each of the other steps seems quite low probability.
All four of them happening in one installation is something I doubt
will happen.


It is the kind of things that could blow up as a CVE and some bad PR
for the project, so I cannot get excited about enforcing this new rule
in an authentication file (aka before a role is authenticated) while
we are talking about 3~4 code paths (?) that would need an extra check
to make sure that no instances have such object names.


I also have the feeling that having (a), (b) and (d) is low probability.

That said, If the user "/john" already exists and has a hba entry then 
this entry will still match with the patch. Problem is that all the 
users that contain "john" would also now match.


But things get worst if say /a is an existing user and hba entry as the 
entry would match any users that contains "a" with the patch.


I assume (maybe i should not) that if objects starting with / already 
exist there is very good reason(s) behind. Then I don't think that 
preventing their creation in the DDL would help (quite the contrary for 
the ones that really need them).


It looks to me that adding a GUC (off by default) to enable/disable the 
regexp usage in the hba could be a fair compromise. It won't block any 
creation starting with a / and won't open more doors (if such objects 
exist) by default.


Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com




Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-09-19 Thread Michael Paquier
On Tue, Sep 20, 2022 at 12:09:33AM -0400, Tom Lane wrote:
> You have to assume that somebody (a) has a role or DB name starting
> with slash, (b) has an explicit reference to that name in their
> pg_hba.conf, (c) doesn't read the release notes, and (d) doesn't
> notice that things are misbehaving until after some hacker manages
> to break into their installation on the strength of the misbehaving
> entry.  OK, I'll grant that the probability of (c) is depressingly
> close to unity; but each of the other steps seems quite low probability.
> All four of them happening in one installation is something I doubt
> will happen.

It is the kind of things that could blow up as a CVE and some bad PR
for the project, so I cannot get excited about enforcing this new rule
in an authentication file (aka before a role is authenticated) while
we are talking about 3~4 code paths (?) that would need an extra check
to make sure that no instances have such object names.

> On the contrary side, if we make this work differently from the
> pg_ident.conf precedent, or install weird rules to try to prevent
> accidental misinterpretations, that could also lead to security
> problems because things don't work as someone would expect.  I see
> no a-priori reason to believe that this risk is negligible compared
> to the other one.

I also do like a lot the idea of making things consistent across all
the auth configuration files for all the fields where this can be
applied.
--
Michael


signature.asc
Description: PGP signature


Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-09-19 Thread Tom Lane
Michael Paquier  writes:
>> On Thu, Sep 8, 2022 at 5:46 PM Tom Lane  wrote:
>>> Meh ... that concern seems overblown to me.  I guess it's possible
>>> that somebody has an HBA entry that looks like that, but it doesn't
>>> seem very plausible.  Note that we made this exact same change in
>>> pg_ident.conf years ago, and AFAIR we got zero complaints.

> This concern does not sound overblown to me.

You have to assume that somebody (a) has a role or DB name starting
with slash, (b) has an explicit reference to that name in their
pg_hba.conf, (c) doesn't read the release notes, and (d) doesn't
notice that things are misbehaving until after some hacker manages
to break into their installation on the strength of the misbehaving
entry.  OK, I'll grant that the probability of (c) is depressingly
close to unity; but each of the other steps seems quite low probability.
All four of them happening in one installation is something I doubt
will happen.

On the contrary side, if we make this work differently from the
pg_ident.conf precedent, or install weird rules to try to prevent
accidental misinterpretations, that could also lead to security
problems because things don't work as someone would expect.  I see
no a-priori reason to believe that this risk is negligible compared
to the other one.

regards, tom lane




Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-09-19 Thread Michael Paquier
On Fri, Sep 09, 2022 at 03:05:18PM -0700, Jacob Champion wrote:
> On Thu, Sep 8, 2022 at 5:46 PM Tom Lane  wrote:
>> Jacob Champion  writes:
>> > I think you're going to have to address backwards compatibility
>> > concerns. Today, I can create a role named "/a", and I can put that
>> > into the HBA without quoting it. I'd be unamused if, after an upgrade,
>> > my rule suddenly matched any role name containing an 'a'.
>>
>> Meh ... that concern seems overblown to me.  I guess it's possible
>> that somebody has an HBA entry that looks like that, but it doesn't
>> seem very plausible.  Note that we made this exact same change in
>> pg_ident.conf years ago, and AFAIR we got zero complaints.
> 
> What percentage of users actually use pg_ident maps? My assumption
> would be that a change to pg_hba would affect many more people, but
> then I don't have any proof that there are users with role names that
> look like that to begin with. I won't pound the table with it.

This concern does not sound overblown to me.  A change in pg_hba.conf
impacts everybody.  I was just looking at this patch, and the logic
with usernames and databases is changed so as we would *always* treat
*any* entries beginning with a slash as a regular expression, skipping
them if they don't match with an error fed to the logs and
pg_hba_file_rules.  This could lead to silent security issues as 
stricter HBA policies need to be located first in pg_hba.conf and
these could suddenly fail to load.  It would be much safer to me if we
had in place some restrictions to avoid such problems to happen,
meaning some extra checks in the DDL code paths for such object names
and perhaps even something in pg_upgrade with a scan at pg_database
and pg_authid.

On the bright side, I have been looking at some of the RFCs covering
the set of characters allowed in DNS names and slashes are not
authorized in hostnames, making this change rather safe AFAIK.
--
Michael


signature.asc
Description: PGP signature


Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-09-19 Thread Drouvot, Bertrand

Hi,

On 9/17/22 8:53 AM, Michael Paquier wrote:

On Fri, Sep 16, 2022 at 06:24:07PM +0200, Drouvot, Bertrand wrote:

The CF bot is failing for Windows (all other tests are green) and only for
the new tap test related to the regular expression on the host name (the
ones on database and role are fine).

The issue is not related to the patch. The issue is that the Windows Cirrus
test does not like when a host name is provided for a "host" entry in
pg_hba.conf (while it works fine when a CIDR is provided).

You can see an example in [1] where the only change is to replace the CIDR
by "localhost" in 002_scram.pl. As you can see the Cirrus tests are failing
on Windows only (its log file is here [2]).

I'll look at this "Windows" related issue but would appreciate any
guidance/help if someone has experience in this area on windows.


I recall that being able to do a reverse lookup of a hostname on
Windows for localhost requires a few extra setup steps as that's not
guaranteed to be set in all environments by default, which is why we
go at great length to use 127.0.0.1 in the TAP test setup for example
(see Cluster.pm).  Looking at your patch, the goal is to test the
mapping of regular expression for host names, user names and database
names.  If the first case is not guaranteed, my guess is that it is
fine to skip this portion of the tests on Windows.


Thanks for looking at it!

That sounds reasonable, v3 attached is skipping the regular expression 
tests for the hostname on Windows.





While reading the patch, I am a bit confused about token_regcomp() and
token_regexec().  It would help the review a lot if these were
documented with proper comments, even if these act roughly as wrappers
for pg_regexec() and pg_regcomp().


Fully agree, comments were missing. They've been added in v3 attached.

Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.comdiff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
index c6f1b70fd3..28343445be 100644
--- a/doc/src/sgml/client-auth.sgml
+++ b/doc/src/sgml/client-auth.sgml
@@ -235,8 +235,9 @@ hostnogssenc  database  
userPostgreSQL database.
-   Multiple database names can be supplied by separating them with
-   commas.  A separate file containing database names can be specified by
+   Multiple database names and/or regular expressions preceded by 
/
+   can be supplied by separating them with commas.
+   A separate file containing database names can be specified by
preceding the file name with @.
   
  
@@ -249,7 +250,8 @@ hostnogssenc  database  
userall specifies that it
matches all users.  Otherwise, this is either the name of a specific
-   database user, or a group name preceded by +.
+   database user, a regular expression preceded by /
+   or a group name preceded by +.
(Recall that there is no real distinction between users and groups
in PostgreSQL; a + mark 
really means
match any of the roles that are directly or indirectly members
@@ -258,7 +260,8 @@ hostnogssenc  database  
user/
+   can be supplied by separating them with commas.
A separate file containing user names can be specified by preceding the
file name with @.
   
@@ -270,8 +273,9 @@ hostnogssenc  database  
user
   
Specifies the client machine address(es) that this record
-   matches.  This field can contain either a host name, an IP
-   address range, or one of the special key words mentioned below.
+   matches.  This field can contain either a host name, a regular 
expression
+   preceded by / representing host names, an IP address 
range,
+   or one of the special key words mentioned below.
   
 
   
@@ -785,16 +789,18 @@ hostall all 192.168.12.10/32  
  gss
 # TYPE  DATABASEUSERADDRESS METHOD
 hostall all 192.168.0.0/16  ident 
map=omicron
 
-# If these are the only three lines for local connections, they will
+# If these are the only four lines for local connections, they will
 # allow local users to connect only to their own databases (databases
-# with the same name as their database user name) except for administrators
-# and members of role "support", who can connect to all databases.  The file
-# $PGDATA/admins contains a list of names of administrators.  Passwords
+# with the same name as their database user name) except for administrators,
+# users finishing with "helpdesk" and members of role "support",
+# who can connect to all databases.
+# The file$PGDATA/admins contains a list of names of administrators.  Passwords
 # are required in all cases.
 #
 # TYPE  DATABASEUSERADDRESS METHOD
 local   sameuserall md5
 local   all @admins

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-09-16 Thread Michael Paquier
On Fri, Sep 16, 2022 at 06:24:07PM +0200, Drouvot, Bertrand wrote:
> The CF bot is failing for Windows (all other tests are green) and only for
> the new tap test related to the regular expression on the host name (the
> ones on database and role are fine).
> 
> The issue is not related to the patch. The issue is that the Windows Cirrus
> test does not like when a host name is provided for a "host" entry in
> pg_hba.conf (while it works fine when a CIDR is provided).
> 
> You can see an example in [1] where the only change is to replace the CIDR
> by "localhost" in 002_scram.pl. As you can see the Cirrus tests are failing
> on Windows only (its log file is here [2]).
> 
> I'll look at this "Windows" related issue but would appreciate any
> guidance/help if someone has experience in this area on windows.

I recall that being able to do a reverse lookup of a hostname on
Windows for localhost requires a few extra setup steps as that's not
guaranteed to be set in all environments by default, which is why we
go at great length to use 127.0.0.1 in the TAP test setup for example
(see Cluster.pm).  Looking at your patch, the goal is to test the
mapping of regular expression for host names, user names and database
names.  If the first case is not guaranteed, my guess is that it is
fine to skip this portion of the tests on Windows.

While reading the patch, I am a bit confused about token_regcomp() and
token_regexec().  It would help the review a lot if these were
documented with proper comments, even if these act roughly as wrappers
for pg_regexec() and pg_regcomp().
--
Michael


signature.asc
Description: PGP signature


Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-09-16 Thread Drouvot, Bertrand

Hi,

On 9/12/22 9:55 AM, Drouvot, Bertrand wrote:

Hi,

On 9/10/22 1:21 AM, Jacob Champion wrote:

On 8/19/22 01:12, Drouvot, Bertrand wrote:

+   wstr = palloc((strlen(tok->string + 1) + 1) * sizeof(pg_wchar));
+   wlen = pg_mb2wchar_with_len(tok->string + 1,
+   wstr, strlen(tok->string + 1));

The (tok->string + 1) construction comes up often enough that I think it
should be put in a `regex` variable or similar. That would help my eyes
with the (strlen(tok->string + 1) + 1) construction, especially.

I noticed that for pg_ident, we precompile the regexes per-line and
reuse those in child processes. Whereas here we're compiling, using, and
then discarding the regex for each check. I think the example set by the
pg_ident code is probably the one to follow, unless you have a good
reason not to.


Thanks for the feedback.

Yeah fully agree. I'll provide a new version that follow the same logic 
as the pg_ident code.



+# Testing with regular expression for username
+reset_pg_hba($node, '/^.*md.*$', 'password');
+test_role($node, 'md5_role', 'password from pgpass and regular expression for 
username', 0);
+

IMO the coverage for this patch needs to be filled out. Negative test
cases are more important than positive ones for security-related code.


Agree, will do.


Other than that, and Tom's note on potentially expanding this to other
areas,


I'll add regexp usage for the database column and also the for the 
address one when non CIDR is provided (so host name(s)) (I think it also 
makes sense specially as we don't allow multiple values for this column).





Please find attached v2 addressing the comments mentioned above.

v2 also provides regular expression usage for the database and the 
address columns (when a host name is being used).


Remark:

The CF bot is failing for Windows (all other tests are green) and only 
for the new tap test related to the regular expression on the host name 
(the ones on database and role are fine).


The issue is not related to the patch. The issue is that the Windows 
Cirrus test does not like when a host name is provided for a "host" 
entry in pg_hba.conf (while it works fine when a CIDR is provided).


You can see an example in [1] where the only change is to replace the 
CIDR by "localhost" in 002_scram.pl. As you can see the Cirrus tests are 
failing on Windows only (its log file is here [2]).


I'll look at this "Windows" related issue but would appreciate any 
guidance/help if someone has experience in this area on windows.





[1]: https://github.com/bdrouvot/postgres/branches on branch “host_non_cidr”

[2]: 
https://api.cirrus-ci.com/v1/artifact/task/6507279833890816/log/src/test/ssl/tmp_check/log/002_scram_primary.log


Regards,

--
Bertrand Drouvot
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.comdiff --git a/doc/src/sgml/client-auth.sgml b/doc/src/sgml/client-auth.sgml
index c6f1b70fd3..28343445be 100644
--- a/doc/src/sgml/client-auth.sgml
+++ b/doc/src/sgml/client-auth.sgml
@@ -235,8 +235,9 @@ hostnogssenc  database  
userPostgreSQL database.
-   Multiple database names can be supplied by separating them with
-   commas.  A separate file containing database names can be specified by
+   Multiple database names and/or regular expressions preceded by 
/
+   can be supplied by separating them with commas.
+   A separate file containing database names can be specified by
preceding the file name with @.
   
  
@@ -249,7 +250,8 @@ hostnogssenc  database  
userall specifies that it
matches all users.  Otherwise, this is either the name of a specific
-   database user, or a group name preceded by +.
+   database user, a regular expression preceded by /
+   or a group name preceded by +.
(Recall that there is no real distinction between users and groups
in PostgreSQL; a + mark 
really means
match any of the roles that are directly or indirectly members
@@ -258,7 +260,8 @@ hostnogssenc  database  
user/
+   can be supplied by separating them with commas.
A separate file containing user names can be specified by preceding the
file name with @.
   
@@ -270,8 +273,9 @@ hostnogssenc  database  
user
   
Specifies the client machine address(es) that this record
-   matches.  This field can contain either a host name, an IP
-   address range, or one of the special key words mentioned below.
+   matches.  This field can contain either a host name, a regular 
expression
+   preceded by / representing host names, an IP address 
range,
+   or one of the special key words mentioned below.
   
 
   
@@ -785,16 +789,18 @@ hostall all 192.168.12.10/32  
  gss
 # TYPE  DATABASEUSERADDRESS METHOD
 hostall all 192.168.0.0/16  ident 
map=om

Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-09-09 Thread Jacob Champion
On 8/19/22 01:12, Drouvot, Bertrand wrote:
> +   wstr = palloc((strlen(tok->string + 1) + 1) * sizeof(pg_wchar));  
>
> +   wlen = pg_mb2wchar_with_len(tok->string + 1,  
>
> +   wstr, strlen(tok->string + 1));

The (tok->string + 1) construction comes up often enough that I think it
should be put in a `regex` variable or similar. That would help my eyes
with the (strlen(tok->string + 1) + 1) construction, especially.

I noticed that for pg_ident, we precompile the regexes per-line and
reuse those in child processes. Whereas here we're compiling, using, and
then discarding the regex for each check. I think the example set by the
pg_ident code is probably the one to follow, unless you have a good
reason not to.

> +# Testing with regular expression for username   
>
> +reset_pg_hba($node, '/^.*md.*$', 'password');
>
> +test_role($node, 'md5_role', 'password from pgpass and regular expression 
> for username', 0);
> +  

IMO the coverage for this patch needs to be filled out. Negative test
cases are more important than positive ones for security-related code.

Other than that, and Tom's note on potentially expanding this to other
areas, I think this is a pretty straightforward patch.

Thanks,
--Jacob




Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-09-09 Thread Jacob Champion
On Thu, Sep 8, 2022 at 5:46 PM Tom Lane  wrote:
> Jacob Champion  writes:
> > I think you're going to have to address backwards compatibility
> > concerns. Today, I can create a role named "/a", and I can put that
> > into the HBA without quoting it. I'd be unamused if, after an upgrade,
> > my rule suddenly matched any role name containing an 'a'.
>
> Meh ... that concern seems overblown to me.  I guess it's possible
> that somebody has an HBA entry that looks like that, but it doesn't
> seem very plausible.  Note that we made this exact same change in
> pg_ident.conf years ago, and AFAIR we got zero complaints.

What percentage of users actually use pg_ident maps? My assumption
would be that a change to pg_hba would affect many more people, but
then I don't have any proof that there are users with role names that
look like that to begin with. I won't pound the table with it.

> > Speaking of partial matches, should this feature allow them? Maybe
> > rules should have to match the entire username instead, and sidestep
> > the inevitable "I forgot to anchor my regex" problems?
>
> I think the pg_ident.conf precedent is binding on us here.  If we
> make this one work differently, nobody's going to thank us for it,
> they're just going to wonder "did the left hand not know what the
> right hand already did?"

Hmm... yeah, I suppose. From the other direction, it'd be bad to train
users that unanchored regexes are safe in pg_hba only to take those
guardrails off in pg_ident. I will tuck that away as a potential
behavior change, for a different thread.

Thanks,
--Jacob




Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-09-09 Thread Tom Lane
"Drouvot, Bertrand"  writes:
> Agree that it seems unlikely but maybe we could add a new GUC to turn 
> the regex usage on the hba file on/off (and use off as the default)?

I think that will just add useless complication.

regards, tom lane




Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-09-08 Thread Tom Lane
Jacob Champion  writes:
> On Fri, Aug 19, 2022 at 1:13 AM Drouvot, Bertrand  wrote:
>> This is why I think username filtering with regular expressions would
>> provide its own advantages.

> I think your motivation for the feature is solid.

Yeah.  I'm not sure that I buy the argument that this is more useful
than writing a role name and controlling things with GRANT ROLE, but
it's a plausible alternative with properties that might win in some
use-cases.  So I see little reason not to allow it.

I'd actually ask why stop here?  In particular, why not do the same
with the database-name column, especially since that does *not*
have the ability to use roles as a substitute for a wildcard entry?

> I think you're going to have to address backwards compatibility
> concerns. Today, I can create a role named "/a", and I can put that
> into the HBA without quoting it. I'd be unamused if, after an upgrade,
> my rule suddenly matched any role name containing an 'a'.

Meh ... that concern seems overblown to me.  I guess it's possible
that somebody has an HBA entry that looks like that, but it doesn't
seem very plausible.  Note that we made this exact same change in
pg_ident.conf years ago, and AFAIR we got zero complaints.

> Speaking of partial matches, should this feature allow them? Maybe
> rules should have to match the entire username instead, and sidestep
> the inevitable "I forgot to anchor my regex" problems?

I think the pg_ident.conf precedent is binding on us here.  If we
make this one work differently, nobody's going to thank us for it,
they're just going to wonder "did the left hand not know what the
right hand already did?"

regards, tom lane




Re: Patch proposal: make use of regular expressions for the username in pg_hba.conf

2022-09-08 Thread Jacob Champion
On Fri, Aug 19, 2022 at 1:13 AM Drouvot, Bertrand  wrote:
> This is why I think username filtering with regular expressions would
> provide its own advantages.
>
> Thoughts? Looking forward to your feedback,

I think your motivation for the feature is solid. It is killing me a
bit that this is making it easier to switch authentication methods
based on the role name, when I suspect what someone might really want
is to switch authentication methods based on the ID the user is trying
to authenticate with. But that's not your fault or problem to fix,
because the startup packet doesn't currently have that information.
(It does make me wonder whether I withdrew my PGAUTHUSER proposal [1]
a month too early. And man, do I wish that pg_ident and pg_hba were
one file.)

I think you're going to have to address backwards compatibility
concerns. Today, I can create a role named "/a", and I can put that
into the HBA without quoting it. I'd be unamused if, after an upgrade,
my rule suddenly matched any role name containing an 'a'.

Speaking of partial matches, should this feature allow them? Maybe
rules should have to match the entire username instead, and sidestep
the inevitable "I forgot to anchor my regex" problems?

Thanks,
--Jacob

[1] https://commitfest.postgresql.org/38/3314/