Subject: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

On Fri, Jan 29, 2021 at 1:17 PM Fujii Masao  wrote:
> >> But if the issue is only the inconsistency of test results,
> >> we can go with option (2)? Even with (2), we can make the test
> >> stable by removing the "valid" column and executing
> >> postgres_fdw_get_connections() within the transaction?
> >
> > Hmmm, and we should have the tests at the start of the file
> > postgres_fdw.sql before we even make any foreign server connections.
>
> We don't need to move the test if we always call
> postgres_fdw_disconnect_all() just before starting a new transaction and
> calling postgres_fdw_get_connections() as follows?
>
> SELECT 1 FROM postgres_fdw_disconnect_all();
> BEGIN;
> ...
> SELECT * FROM postgres_fdw_get_connections();
> ...

Yes, that works, but we cannot show true/false for the
postgres_fdw_disconnect_all output.
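For illustration, here is a toy C model of the sequence suggested above. It is not postgres_fdw code: `Entry`, `run_test()` and the other helpers are hypothetical, the cache follows option (2), the current behavior discussed in this thread (a flush closes idle entries with xact_depth == 0 and only marks entries in use as invalidated), and CLOBBER_CACHE_ALWAYS is approximated by one flush before every simulated step. Under those assumptions, the row count seen inside the transaction block is the same with and without flushes once postgres_fdw_disconnect_all() runs first, whereas the boolean that postgres_fdw_disconnect_all() returns is not, which is why its result is hidden behind `SELECT 1 FROM`.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of the postgres_fdw connection cache under option (2).
 * All names are hypothetical; this is not connection.c.
 */
#define NSERVERS 2

typedef struct
{
    bool open;        /* stands in for entry->conn != NULL */
    int  xact_depth;  /* > 0 once the entry is used in the current xact */
    bool invalidated; /* stands in for entry->invalidated */
} Entry;

typedef struct
{
    bool disconnect_result; /* what postgres_fdw_disconnect_all() returned */
    int  rows_in_block;     /* rows from postgres_fdw_get_connections() */
} TestResult;

static Entry cache[NSERVERS];
static bool  clobber; /* approximates CLOBBER_CACHE_ALWAYS */

/* Option (2): close idle entries right away, only mark entries in use. */
static void inval_callback(void)
{
    for (int i = 0; i < NSERVERS; i++)
    {
        if (!cache[i].open)
            continue;
        if (cache[i].xact_depth == 0)
            cache[i].open = false;
        else
            cache[i].invalidated = true;
    }
}

/* With clobbering on, a cache flush happens before every step. */
static void maybe_flush(void)
{
    if (clobber)
        inval_callback();
}

static bool disconnect_all(void)
{
    bool closed_any = false;

    maybe_flush();
    for (int i = 0; i < NSERVERS; i++)
    {
        if (cache[i].open)
            closed_any = true;
        cache[i].open = false;
    }
    return closed_any;
}

static void use_server(int i) /* scan a foreign table on server i */
{
    maybe_flush();
    cache[i].open = true;
    cache[i].xact_depth = 1;
}

static int get_connections(void) /* row count only, no "valid" column */
{
    int rows = 0;

    maybe_flush();
    for (int i = 0; i < NSERVERS; i++)
        if (cache[i].open)
            rows++;
    return rows;
}

static void commit_xact(void) /* end of the main transaction */
{
    for (int i = 0; i < NSERVERS; i++)
    {
        if (cache[i].open && cache[i].invalidated)
            cache[i].open = false;
        cache[i].xact_depth = 0;
        cache[i].invalidated = false;
    }
}

/* Both servers start with idle cached connections from earlier tests. */
static TestResult run_test(bool clobber_mode, bool disconnect_first)
{
    TestResult r = { false, 0 };

    clobber = clobber_mode;
    for (int i = 0; i < NSERVERS; i++)
        cache[i] = (Entry){ .open = true };

    if (disconnect_first)
        r.disconnect_result = disconnect_all();
    use_server(0);                       /* BEGIN; SELECT ... FROM ft1; */
    r.rows_in_block = get_connections(); /* SELECT * FROM ...get_connections() */
    commit_xact();                       /* COMMIT; */
    return r;
}
```

In the model, skipping the leading disconnect makes the two builds diverge (2 rows versus 1), because the idle connections left over from earlier tests survive without flushes but are closed by the first flush otherwise.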

I will post the patch soon. Thanks a lot.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

On 2021/01/29 16:12, Bharath Rupireddy wrote:
> On Fri, Jan 29, 2021 at 12:36 PM Fujii Masao wrote:
> > On 2021/01/29 15:44, Bharath Rupireddy wrote:
> >> On Fri, Jan 29, 2021 at 11:54 AM Fujii Masao wrote:
> >>>>> IIRC, when we were finding a way to close the invalidated connections
> >>>>> so that they don't leak, we had two options:
> >>>>>
> >>>>> 1) let those connections (whether currently being used in the xact or
> >>>>> not) get marked invalidated in pgfdw_inval_callback and closed in
> >>>>> pgfdw_xact_callback at the main txn end, as shown below:
> >>>>>
> >>>>>    if (PQstatus(entry->conn) != CONNECTION_OK ||
> >>>>>        PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
> >>>>>        entry->changing_xact_state ||
> >>>>>        entry->invalidated)    /* <-- by adding this */
> >>>>>    {
> >>>>>        elog(DEBUG3, "discarding connection %p", entry->conn);
> >>>>>        disconnect_pg_server(entry);
> >>>>>    }
> >>>>>
> >>>>> 2) close the unused connections right away in pgfdw_inval_callback
> >>>>> instead of marking them invalidated. Mark used connections as
> >>>>> invalidated in pgfdw_inval_callback and close them in
> >>>>> pgfdw_xact_callback at the main txn end.
> >>>>>
> >>>>> We went with option (2) because we thought this would ease the burden
> >>>>> on pgfdw_xact_callback of closing a lot of invalid connections at once.
> >>>>
> >>>> Also, note that the original patch for the connection leak issue does
> >>>> just option (1), see [1]. But in [2] and [3], we chose option (2).
> >>>>
> >>>> I feel we can go for option (1), with the patch attached in [1], i.e.
> >>>> setting have_invalid_connections whenever any connection gets
> >>>> invalidated so that we don't quickly exit in pgfdw_xact_callback and
> >>>> the invalidated connections get closed properly. Thoughts?
> >>>
> >>> Before going for (1) or something, I'd like to understand what the
> >>> actual issue with (2), i.e., the current code, is. Otherwise other
> >>> approaches might have the same issue.
> >>
> >> The problem with option (2) is that because of CLOBBER_CACHE_ALWAYS,
> >> pgfdw_inval_callback is getting called many times and the connections
> >> that are not used, i.e. xact_depth == 0, are getting disconnected
> >> there, so we are not seeing consistent results for the
> >> postgres_fdw_get_connections test cases. If the connections are being
> >> used within the xact, then the valid column for those connections is
> >> shown as false, again making the postgres_fdw_get_connections output
> >> inconsistent. This is what happened on the buildfarm member with a
> >> CLOBBER_CACHE_ALWAYS build.
> >
> > But if the issue is only the inconsistency of test results,
> > we can go with option (2)? Even with (2), we can make the test
> > stable by removing the "valid" column and executing
> > postgres_fdw_get_connections() within the transaction?
>
> Hmmm, and we should have the tests at the start of the file
> postgres_fdw.sql before we even make any foreign server connections.

We don't need to move the test if we always call postgres_fdw_disconnect_all()
just before starting a new transaction and calling postgres_fdw_get_connections(),
as follows?

SELECT 1 FROM postgres_fdw_disconnect_all();
BEGIN;
...
SELECT * FROM postgres_fdw_get_connections();
...

> If okay, I can prepare the patch and run with clobber cache build locally.

Many thanks!

> >> So if we go with option (1), get rid of the valid state from the
> >> postgres_fdw_get_connections test, and have the test cases inside an
> >> explicit xact block at the beginning of the postgres_fdw.sql test
> >> file, we don't see CLOBBER_CACHE_ALWAYS inconsistencies. I'm not sure
> >> if this is the correct way.
> >>
> >>> Regarding (1), if I understand correctly, even when the transaction
> >>> doesn't use foreign tables at all, it needs to scan the connection cache
> >>> entries if necessary. I was thinking to avoid this. I guess that this
> >>> doesn't work with at least the postgres_fdw 2PC patch that Sawada-san is
> >>> proposing because with the patch the commit/rollback callback is
> >>> performed only for the connections used in the transaction.
> >>
> >> You mean to say, pgfdw_xact_callback will not get called when the xact
> >> uses no foreign server connection, or is it that pgfdw_xact_callback
> >> gets called but exits quickly from it? I'm not sure what the 2PC patch
> >> does.
> >
> > Maybe it's a chance to review the patch! ;P
> >
> > BTW his patch tries to add new callback interfaces for commit/rollback of
> > foreign transactions, and make postgres_fdw use them instead of
> > XactCallback. And those new interfaces are executed only when
> > the transaction has started the foreign transactions.
>
> IMHO, it's better to keep it as a separate discussion.

Yes, of course!
Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

On Fri, Jan 29, 2021 at 12:36 PM Fujii Masao
 wrote:
> On 2021/01/29 15:44, Bharath Rupireddy wrote:
> > On Fri, Jan 29, 2021 at 11:54 AM Fujii Masao
> >  wrote:
> >>>> IIRC, when we were finding a way to close the invalidated connections
> >>>> so that they don't leak, we had two options:
> >>>>
> >>>> 1) let those connections (whether currently being used in the xact or
> >>>> not) get marked invalidated in pgfdw_inval_callback and closed in
> >>>> pgfdw_xact_callback at the main txn end, as shown below:
> >>>>
> >>>>    if (PQstatus(entry->conn) != CONNECTION_OK ||
> >>>>        PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
> >>>>        entry->changing_xact_state ||
> >>>>        entry->invalidated)    /* <-- by adding this */
> >>>>    {
> >>>>        elog(DEBUG3, "discarding connection %p", entry->conn);
> >>>>        disconnect_pg_server(entry);
> >>>>    }
> >>>>
> >>>> 2) close the unused connections right away in pgfdw_inval_callback
> >>>> instead of marking them invalidated. Mark used connections as
> >>>> invalidated in pgfdw_inval_callback and close them in
> >>>> pgfdw_xact_callback at the main txn end.
> >>>>
> >>>> We went with option (2) because we thought this would ease the burden
> >>>> on pgfdw_xact_callback of closing a lot of invalid connections at once.
> >>>
> >>> Also, note that the original patch for the connection leak issue does
> >>> just option (1), see [1]. But in [2] and [3], we chose option (2).
> >>>
> >>> I feel we can go for option (1), with the patch attached in [1], i.e.
> >>> setting have_invalid_connections whenever any connection gets
> >>> invalidated so that we don't quickly exit in pgfdw_xact_callback and
> >>> the invalidated connections get closed properly. Thoughts?
> >>
> >> Before going for (1) or something, I'd like to understand what the
> >> actual issue with (2), i.e., the current code, is. Otherwise other
> >> approaches might have the same issue.
> >
> > The problem with option (2) is that because of CLOBBER_CACHE_ALWAYS,
> > pgfdw_inval_callback is getting called many times and the connections
> > that are not used, i.e. xact_depth == 0, are getting disconnected
> > there, so we are not seeing consistent results for the
> > postgres_fdw_get_connections test cases. If the connections are being
> > used within the xact, then the valid column for those connections is
> > shown as false, again making the postgres_fdw_get_connections output
> > inconsistent. This is what happened on the buildfarm member with a
> > CLOBBER_CACHE_ALWAYS build.
>
> But if the issue is only the inconsistency of test results,
> we can go with option (2)? Even with (2), we can make the test
> stable by removing the "valid" column and executing
> postgres_fdw_get_connections() within the transaction?

Hmmm, and we should have the tests at the start of the file
postgres_fdw.sql before we even make any foreign server connections.

If okay, I can prepare the patch and run with clobber cache build locally.

> >
> > So if we go with option (1), get rid of the valid state from the
> > postgres_fdw_get_connections test, and have the test cases inside an
> > explicit xact block at the beginning of the postgres_fdw.sql test
> > file, we don't see CLOBBER_CACHE_ALWAYS inconsistencies. I'm not sure
> > if this is the correct way.
> >
> >> Regarding (1), if I understand correctly, even when the transaction
> >> doesn't use foreign tables at all, it needs to scan the connection cache
> >> entries if necessary. I was thinking to avoid this. I guess that this
> >> doesn't work with at least the postgres_fdw 2PC patch that Sawada-san is
> >> proposing because with the patch the commit/rollback callback is
> >> performed only for the connections used in the transaction.
> >
> > You mean to say, pgfdw_xact_callback will not get called when the xact
> > uses no foreign server connection or is it that pgfdw_xact_callback
> > gets called but exits quickly from it? I'm not sure what the 2PC patch
> > does.
>
> Maybe it's a chance to review the patch! ;P
>
> BTW his patch tries to add new callback interfaces for commit/rollback of
> foreign transactions, and make postgres_fdw use them instead of
> XactCallback. And those new interfaces are executed only when
> the transaction has started the foreign transactions.

IMHO, it's better to keep it as a separate discussion. I will try to
review that patch later.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

On 2021/01/29 15:44, Bharath Rupireddy wrote:
> On Fri, Jan 29, 2021 at 11:54 AM Fujii Masao wrote:
> >>> IIRC, when we were finding a way to close the invalidated connections
> >>> so that they don't leak, we had two options:
> >>>
> >>> 1) let those connections (whether currently being used in the xact or
> >>> not) get marked invalidated in pgfdw_inval_callback and closed in
> >>> pgfdw_xact_callback at the main txn end, as shown below:
> >>>
> >>>    if (PQstatus(entry->conn) != CONNECTION_OK ||
> >>>        PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
> >>>        entry->changing_xact_state ||
> >>>        entry->invalidated)    /* <-- by adding this */
> >>>    {
> >>>        elog(DEBUG3, "discarding connection %p", entry->conn);
> >>>        disconnect_pg_server(entry);
> >>>    }
> >>>
> >>> 2) close the unused connections right away in pgfdw_inval_callback
> >>> instead of marking them invalidated. Mark used connections as
> >>> invalidated in pgfdw_inval_callback and close them in
> >>> pgfdw_xact_callback at the main txn end.
> >>>
> >>> We went with option (2) because we thought this would ease the burden
> >>> on pgfdw_xact_callback of closing a lot of invalid connections at once.
> >>
> >> Also, note that the original patch for the connection leak issue does
> >> just option (1), see [1]. But in [2] and [3], we chose option (2).
> >>
> >> I feel we can go for option (1), with the patch attached in [1], i.e.
> >> setting have_invalid_connections whenever any connection gets
> >> invalidated so that we don't quickly exit in pgfdw_xact_callback and
> >> the invalidated connections get closed properly. Thoughts?
> >
> > Before going for (1) or something, I'd like to understand what the
> > actual issue with (2), i.e., the current code, is. Otherwise other
> > approaches might have the same issue.
>
> The problem with option (2) is that because of CLOBBER_CACHE_ALWAYS,
> pgfdw_inval_callback is getting called many times and the connections
> that are not used, i.e. xact_depth == 0, are getting disconnected
> there, so we are not seeing consistent results for the
> postgres_fdw_get_connections test cases. If the connections are being
> used within the xact, then the valid column for those connections is
> shown as false, again making the postgres_fdw_get_connections output
> inconsistent. This is what happened on the buildfarm member with a
> CLOBBER_CACHE_ALWAYS build.

But if the issue is only the inconsistency of test results,
we can go with option (2)? Even with (2), we can make the test
stable by removing the "valid" column and executing
postgres_fdw_get_connections() within the transaction?

> So if we go with option (1), get rid of the valid state from the
> postgres_fdw_get_connections test, and have the test cases inside an
> explicit xact block at the beginning of the postgres_fdw.sql test
> file, we don't see CLOBBER_CACHE_ALWAYS inconsistencies. I'm not sure
> if this is the correct way.
>
> > Regarding (1), if I understand correctly, even when the transaction
> > doesn't use foreign tables at all, it needs to scan the connection cache
> > entries if necessary. I was thinking to avoid this. I guess that this
> > doesn't work with at least the postgres_fdw 2PC patch that Sawada-san is
> > proposing because with the patch the commit/rollback callback is
> > performed only for the connections used in the transaction.
>
> You mean to say, pgfdw_xact_callback will not get called when the xact
> uses no foreign server connection, or is it that pgfdw_xact_callback
> gets called but exits quickly from it? I'm not sure what the 2PC patch
> does.

Maybe it's a chance to review the patch! ;P

BTW his patch tries to add new callback interfaces for commit/rollback of
foreign transactions, and make postgres_fdw use them instead of
XactCallback. And those new interfaces are executed only when
the transaction has started the foreign transactions.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

On Fri, Jan 29, 2021 at 11:54 AM Fujii Masao
 wrote:
> >> IIRC, when we were finding a way to close the invalidated connections
> >> so that they don't leak, we had two options:
> >>
> >> 1) let those connections (whether currently being used in the xact or
> >> not) get marked invalidated in pgfdw_inval_callback and closed in
> >> pgfdw_xact_callback at the main txn end, as shown below:
> >>
> >>    if (PQstatus(entry->conn) != CONNECTION_OK ||
> >>        PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
> >>        entry->changing_xact_state ||
> >>        entry->invalidated)    /* <-- by adding this */
> >>    {
> >>        elog(DEBUG3, "discarding connection %p", entry->conn);
> >>        disconnect_pg_server(entry);
> >>    }
> >>
> >> 2) close the unused connections right away in pgfdw_inval_callback
> >> instead of marking them invalidated. Mark used connections as
> >> invalidated in pgfdw_inval_callback and close them in
> >> pgfdw_xact_callback at the main txn end.
> >>
> >> We went with option (2) because we thought this would ease the burden
> >> on pgfdw_xact_callback of closing a lot of invalid connections at once.
> >
> > Also, note that the original patch for the connection leak issue does
> > just option (1), see [1]. But in [2] and [3], we chose option (2).
> >
> > I feel we can go for option (1), with the patch attached in [1], i.e.
> > setting have_invalid_connections whenever any connection gets
> > invalidated so that we don't quickly exit in pgfdw_xact_callback and
> > the invalidated connections get closed properly. Thoughts?
>
> Before going for (1) or something, I'd like to understand what the
> actual issue with (2), i.e., the current code, is. Otherwise other
> approaches might have the same issue.

The problem with option (2) is that because of CLOBBER_CACHE_ALWAYS,
pgfdw_inval_callback is getting called many times and the connections
that are not used, i.e. xact_depth == 0, are getting disconnected
there, so we are not seeing consistent results for the
postgres_fdw_get_connections test cases. If the connections are being
used within the xact, then the valid column for those connections is
shown as false, again making the postgres_fdw_get_connections output
inconsistent. This is what happened on the buildfarm member with a
CLOBBER_CACHE_ALWAYS build.
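A minimal C sketch of the symptom described above, assuming a simplified cache entry. The `Entry` struct and both functions are hypothetical stand-ins, not the real ConnCacheEntry or pgfdw_inval_callback(). Inside an explicit transaction, a single flush under option (2) makes the idle cached connection disappear from the output and flips the used one to valid = false.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a cache entry; not the real ConnCacheEntry. */
typedef struct
{
    bool open;        /* entry->conn != NULL */
    int  xact_depth;  /* > 0 while used by the current transaction */
    bool invalidated; /* surfaced to the user as valid = !invalidated */
} Entry;

/* Option (2): disconnect unused entries, only mark entries in use. */
static void inval_callback_opt2(Entry *e, int n)
{
    for (int i = 0; i < n; i++)
    {
        if (!e[i].open)
            continue;
        if (e[i].xact_depth == 0)
            e[i].open = false;
        else
            e[i].invalidated = true;
    }
}

/*
 * Inside an explicit transaction: entry 0 was used by an earlier query in
 * the block, entry 1 is an idle connection cached by a previous transaction.
 * Reports the rows postgres_fdw_get_connections() would return and how many
 * of them would have valid = true.
 */
static void get_connections_in_xact(bool clobber, int *rows, int *valid_rows)
{
    Entry e[2] = {
        { .open = true, .xact_depth = 1, .invalidated = false },
        { .open = true, .xact_depth = 0, .invalidated = false },
    };

    if (clobber)
        inval_callback_opt2(e, 2); /* one flush is enough to change output */

    *rows = 0;
    *valid_rows = 0;
    for (int i = 0; i < 2; i++)
    {
        if (!e[i].open)
            continue;
        (*rows)++;
        if (!e[i].invalidated)
            (*valid_rows)++;
    }
}
```

Without flushes the model reports two rows, both valid; with one flush it reports a single row whose valid flag is false.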

So if we go with option (1), get rid of the valid state from the
postgres_fdw_get_connections test, and have the test cases inside an
explicit xact block at the beginning of the postgres_fdw.sql test
file, we don't see CLOBBER_CACHE_ALWAYS inconsistencies. I'm not sure
if this is the correct way.

> Regarding (1), if I understand correctly, even when the transaction
> doesn't use foreign tables at all, it needs to scan the connection cache
> entries if necessary. I was thinking to avoid this. I guess that this doesn't
> work with at least the postgres_fdw 2PC patch that Sawada-san is proposing
> because with the patch the commit/rollback callback is performed only
> for the connections used in the transaction.

You mean to say, pgfdw_xact_callback will not get called when the xact
uses no foreign server connection or is it that pgfdw_xact_callback
gets called but exits quickly from it? I'm not sure what the 2PC patch
does.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

On 2021/01/29 14:46, Bharath Rupireddy wrote:
> On Fri, Jan 29, 2021 at 11:08 AM Bharath Rupireddy wrote:
> > On Fri, Jan 29, 2021 at 10:55 AM Fujii Masao wrote:
> >> On 2021/01/29 14:12, Bharath Rupireddy wrote:
> >>> On Fri, Jan 29, 2021 at 10:28 AM Fujii Masao wrote:
> >>>> On 2021/01/29 11:09, Tom Lane wrote:
> >>>>> Bharath Rupireddy writes:
> >>>>>> On Fri, Jan 29, 2021 at 1:52 AM Tom Lane wrote:
> >>>>>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=trilobite&dt=2021-01-26%2019%3A59%3A40
> >>>>>>> This is a CLOBBER_CACHE_ALWAYS build, so I suspect what it's
> >>>>>>> telling us is that the patch's behavior is unstable in the face
> >>>>>>> of unexpected cache flushes.
> >>>>>>
> >>>>>> Thanks a lot! It looks like the syscache invalidation messages are
> >>>>>> generated too frequently with a -DCLOBBER_CACHE_ALWAYS build, due to
> >>>>>> which pgfdw_inval_callback gets called many times; there, the cached
> >>>>>> entries are marked as invalid and closed if they are not used in the
> >>>>>> txn. The new function postgres_fdw_get_connections outputs
> >>>>>> information about the cached connections, such as the server name (if
> >>>>>> the connection is still open) and its validity. Hence the output of
> >>>>>> postgres_fdw_get_connections became unstable on the buildfarm member.
> >>>>>> I will further analyze making the tests stable; meanwhile, any
> >>>>>> suggestions are welcome.
> >>>>>
> >>>>> I do not think you should regard this as "we need to hack the test
> >>>>> to make it stable".  I think you should regard this as "this is a
> >>>>> bug".  A cache flush should not cause user-visible state changes.
> >>>>> In particular, the above analysis implies that you think a cache
> >>>>> flush is equivalent to end-of-transaction, which it absolutely
> >>>>> is not.
> >>>>>
> >>>>> Also, now that I've looked at pgfdw_inval_callback, it scares
> >>>>> the heck out of me.  Actually disconnecting a connection during
> >>>>> a cache inval callback seems quite unsafe --- what if that happens
> >>>>> while we're using the connection?
> >>>>
> >>>> If the connection is still used in the transaction,
> >>>> pgfdw_inval_callback() marks it as invalidated and doesn't close it.
> >>>> So I was not thinking that this is so unsafe.
> >>>>
> >>>> The disconnection code in pgfdw_inval_callback() was added in commit
> >>>> e3ebcca843 to fix the connection leak issue, and it's back-patched. If
> >>>> this change is really unsafe, we need to revert it immediately, at
> >>>> least from the back branches, because the next minor release is
> >>>> scheduled soon.
> >>>
> >>> I think we can remove disconnect_pg_server in pgfdw_inval_callback and
> >>> only mark the entries as invalidated. Anyway, those connections can get
> >>> closed at the end of the main txn in pgfdw_xact_callback. Thoughts?
> >>
> >> But this revives the connection leak issue. So isn't it better
> >> to do that after we confirm that the current code is really unsafe?
> >
> > IMO, connections will not leak, because the invalidated connections
> > will eventually get closed in pgfdw_xact_callback at the main txn end.
> >
> > IIRC, when we were finding a way to close the invalidated connections
> > so that they don't leak, we had two options:
> >
> > 1) let those connections (whether currently being used in the xact or
> > not) get marked invalidated in pgfdw_inval_callback and closed in
> > pgfdw_xact_callback at the main txn end, as shown below:
> >
> >    if (PQstatus(entry->conn) != CONNECTION_OK ||
> >        PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
> >        entry->changing_xact_state ||
> >        entry->invalidated)    /* <-- by adding this */
> >    {
> >        elog(DEBUG3, "discarding connection %p", entry->conn);
> >        disconnect_pg_server(entry);
> >    }
> >
> > 2) close the unused connections right away in pgfdw_inval_callback
> > instead of marking them invalidated. Mark used connections as
> > invalidated in pgfdw_inval_callback and close them in
> > pgfdw_xact_callback at the main txn end.
> >
> > We went with option (2) because we thought this would ease the burden
> > on pgfdw_xact_callback of closing a lot of invalid connections at once.
>
> Also, note that the original patch for the connection leak issue does
> just option (1), see [1]. But in [2] and [3], we chose option (2).
>
> I feel we can go for option (1), with the patch attached in [1], i.e.
> setting have_invalid_connections whenever any connection gets
> invalidated so that we don't quickly exit in pgfdw_xact_callback and
> the invalidated connections get closed properly. Thoughts?

Before going for (1) or something, I'd like to understand what the
actual issue with (2), i.e., the current code, is. Otherwise other
approaches might have the same issue.

Regarding (1), if I understand correctly, even when the transaction
doesn't use foreign tables at all, it needs to scan the connection cache
entries if necessary. I was thinking to avoid this. I guess that this
doesn't work with at least the postgres_fdw 2PC patch that Sawada-san is
proposing because with the patch the commit/rollback callback is
performed only for the connections used in the transaction.
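A rough C sketch of option (1) and of the cost raised above. `xact_got_connection` mirrors the static flag of that name that pgfdw_xact_callback() uses for its quick exit in postgres_fdw's connection.c; `have_invalid_connections` is only modeled on the name used earlier in the thread, and how it is set and cleared here, like the rest of this toy cache, is an assumption. Once any entry is invalidated, a transaction that used no foreign table can no longer take the quick exit, so the callback scans every cache entry.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; field and function names are hypothetical. */
#define NENTRIES 3

typedef struct
{
    bool open;
    int  xact_depth;
    bool invalidated;
} Entry;

static Entry cache[NENTRIES];
static bool  xact_got_connection;      /* mirrors the flag in connection.c */
static bool  have_invalid_connections; /* assumed flag for option (1) */
static int   entries_scanned;          /* instrumentation for the sketch */

/* Option (1): never disconnect in the callback, only mark entries. */
static void inval_callback_opt1(void)
{
    for (int i = 0; i < NENTRIES; i++)
    {
        if (cache[i].open)
        {
            cache[i].invalidated = true;
            have_invalid_connections = true;
        }
    }
}

static void xact_callback_opt1(void)
{
    /* Quick exit only if no connection was used and none was invalidated. */
    if (!xact_got_connection && !have_invalid_connections)
        return;

    for (int i = 0; i < NENTRIES; i++)
    {
        entries_scanned++;
        if (!cache[i].open)
            continue;
        cache[i].xact_depth = 0;
        if (cache[i].invalidated)
        {
            cache[i].open = false; /* disconnect_pg_server() in real code */
            cache[i].invalidated = false;
        }
    }
    xact_got_connection = false;
    have_invalid_connections = false;
}

/*
 * One transaction that uses no foreign tables, with two idle cached
 * connections. Returns how many connections are still open afterwards.
 */
static int local_only_xact(bool cache_flush_happens)
{
    int still_open = 0;

    for (int i = 0; i < NENTRIES; i++)
        cache[i] = (Entry){ .open = (i < 2) };
    xact_got_connection = false;
    have_invalid_connections = false;
    entries_scanned = 0;

    if (cache_flush_happens)
        inval_callback_opt1();
    xact_callback_opt1(); /* runs at commit */

    for (int i = 0; i < NENTRIES; i++)
        if (cache[i].open)
            still_open++;
    return still_open;
}
```

Under option (2), the idle entries would instead be closed inside the invalidation callback itself, so the quick exit would remain available for such transactions.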

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

On Fri, Jan 29, 2021 at 11:38 AM Fujii Masao
 wrote:
> On 2021/01/29 14:53, Bharath Rupireddy wrote:
> > On Fri, Jan 29, 2021 at 10:55 AM Fujii Masao
> >  wrote:
> >>>> BTW, even if we change pgfdw_inval_callback() so that it doesn't close
> >>>> the connection at all, ISTM that the results of
> >>>> postgres_fdw_get_connections() would not be stable because
> >>>> entry->invalidated would vary based on whether CLOBBER_CACHE_ALWAYS is
> >>>> used or not.
> >>>
> >>> Yes, after the above change (removing disconnect_pg_server in
> >>> pgfdw_inval_callback), our tests still aren't stable because
> >>> postgres_fdw_get_connections shows the valid state of the connections.
> >>> I think we can change postgres_fdw_get_connections so that it only
> >>> shows the server names of the active connections but not the valid
> >>> state, because the valid state depends on internal state changes and
> >>> is not consistent with user expectations, yet we are exposing it to
> >>> the user. Thoughts?
> >>
> >> I don't think that's enough because even the following simple
> >> queries return different results, depending on whether
> >> CLOBBER_CACHE_ALWAYS is used or not.
> >>
> >>   SELECT * FROM ft6;  -- ft6 is the foreign table
> >>   SELECT server_name FROM postgres_fdw_get_connections();
> >>
> >> When CLOBBER_CACHE_ALWAYS is used, postgres_fdw_get_connections()
> >> returns no records because the connection is marked as invalidated,
> >> and then closed by the xact callback at the end of the SELECT query.
> >> Otherwise, postgres_fdw_get_connections() returns at least one
> >> connection that was established in the SELECT query.
> >
> > Right. In that case, after changing postgres_fdw_get_connections() so
> > that it doesn't output the valid state of the connections at all, we
>
> Are you thinking of getting rid of the "valid" column? Or of hiding it from
> the test query (e.g., SELECT server_name from postgres_fdw_get_connections())?

I'm thinking we can get rid of the "valid" column from the
postgres_fdw_get_connections() function itself, not just hide it in the
tests. It seems like we are exposing some internal state (whether the
connection is valid or not) which can change because of internal events.
Also, with the existing postgres_fdw_get_connections(), valid will always
be true if the user calls postgres_fdw_get_connections() outside an
explicit xact block; it can become false only when it's called inside an
explicit txn block. So the valid column may not be very useful to the user.
Thoughts?
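The claim about the valid column can be checked in a toy C model of the current behavior, option (2). All names are hypothetical and the cache is reduced to three entries. At the end of every transaction each invalidated entry is closed, and during a standalone get-connections query no entry has xact_depth > 0, so any flush closes entries instead of marking them; valid = false can only be observed inside the block that used the connection.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of option (2); names are hypothetical, not connection.c. */
#define N 3

typedef struct
{
    bool open;
    int  xact_depth;
    bool invalidated;
} Entry;

static Entry cache[N];

static void inval_opt2(void)
{
    for (int i = 0; i < N; i++)
    {
        if (!cache[i].open)
            continue;
        if (cache[i].xact_depth == 0)
            cache[i].open = false;       /* idle: closed immediately */
        else
            cache[i].invalidated = true; /* in use: closed at xact end */
    }
}

static void end_xact(void)
{
    for (int i = 0; i < N; i++)
    {
        if (cache[i].open && cache[i].invalidated)
            cache[i].open = false;
        cache[i].xact_depth = 0;
        cache[i].invalidated = false;
    }
}

static int count_invalid(void) /* rows that would show valid = false */
{
    int bad = 0;

    for (int i = 0; i < N; i++)
        if (cache[i].open && cache[i].invalidated)
            bad++;
    return bad;
}

/*
 * An autocommit query that uses all N servers, hit by flushes_during
 * cache flushes, then a separate autocommit get-connections query hit
 * by flushes_after flushes.
 */
static int invalid_rows(int flushes_during, int flushes_after)
{
    int bad;

    for (int i = 0; i < N; i++)
        cache[i] = (Entry){ .open = true, .xact_depth = 1 };
    for (int f = 0; f < flushes_during; f++)
        inval_opt2();
    end_xact();

    for (int f = 0; f < flushes_after; f++)
        inval_opt2();
    bad = count_invalid();
    end_xact();
    return bad;
}

/* Same flushes, but the lookup runs inside the block that used the servers. */
static int invalid_rows_in_block(int flushes)
{
    int bad;

    for (int i = 0; i < N; i++)
        cache[i] = (Entry){ .open = true, .xact_depth = 1 };
    for (int f = 0; f < flushes; f++)
        inval_opt2();
    bad = count_invalid();
    end_xact();
    return bad;
}
```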

> > can have all the new function test cases inside an explicit txn block.
> > So even if the cache clobbering invalidates the connections, they don't
> > get closed until the end of the main xact, and the tests will be stable.
> > Thoughts?
>
> Also if there are cached connections before starting that transaction,
> they should be closed or established again before executing
> postgres_fdw_get_connections(). Otherwise, those connections are
> returned from postgres_fdw_get_connections() when
> CLOBBER_CACHE_ALWAYS is not used, but not when it's used.

Yes, we need to move the test to a place where the cache wouldn't have
been initialized yet, or no foreign connection has been made yet in the
session, such as the spot marked <<<>>> below:

ALTER FOREIGN TABLE ft2 ALTER COLUMN c1 OPTIONS (column_name 'C 1');
\det+

<<<>>>

-- Test that alteration of server options causes reconnection
-- Remote's errors might be non-English, so hide them to ensure stable results
\set VERBOSITY terse
SELECT c3, c4 FROM ft1 ORDER BY c3, c1 LIMIT 1;  -- should work
ALTER SERVER loopback OPTIONS (SET dbname 'no such database');
SELECT c3, c4 FROM ft1 ORDER BY c3, c1 LIMIT 1;  -- should fail
DO $d$

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

On 2021/01/29 14:53, Bharath Rupireddy wrote:
> On Fri, Jan 29, 2021 at 10:55 AM Fujii Masao wrote:
> >>> BTW, even if we change pgfdw_inval_callback() so that it doesn't close
> >>> the connection at all, ISTM that the results of
> >>> postgres_fdw_get_connections() would not be stable because
> >>> entry->invalidated would vary based on whether CLOBBER_CACHE_ALWAYS is
> >>> used or not.
> >>
> >> Yes, after the above change (removing disconnect_pg_server in
> >> pgfdw_inval_callback), our tests still aren't stable because
> >> postgres_fdw_get_connections shows the valid state of the connections.
> >> I think we can change postgres_fdw_get_connections so that it only
> >> shows the server names of the active connections but not the valid
> >> state, because the valid state depends on internal state changes and
> >> is not consistent with user expectations, yet we are exposing it to
> >> the user. Thoughts?
> >
> > I don't think that's enough because even the following simple
> > queries return different results, depending on whether
> > CLOBBER_CACHE_ALWAYS is used or not.
> >
> >   SELECT * FROM ft6;  -- ft6 is the foreign table
> >   SELECT server_name FROM postgres_fdw_get_connections();
> >
> > When CLOBBER_CACHE_ALWAYS is used, postgres_fdw_get_connections()
> > returns no records because the connection is marked as invalidated,
> > and then closed by the xact callback at the end of the SELECT query.
> > Otherwise, postgres_fdw_get_connections() returns at least one
> > connection that was established in the SELECT query.
>
> Right. In that case, after changing postgres_fdw_get_connections() so
> that it doesn't output the valid state of the connections at all, we

Are you thinking of getting rid of the "valid" column? Or of hiding it from
the test query (e.g., SELECT server_name from postgres_fdw_get_connections())?

> can have all the new function test cases inside an explicit txn block.
> So even if the cache clobbering invalidates the connections, they don't
> get closed until the end of the main xact, and the tests will be stable.
> Thoughts?

Also, if there are cached connections before starting that transaction,
they should be closed or established again before executing
postgres_fdw_get_connections(). Otherwise, those connections are
returned from postgres_fdw_get_connections() when
CLOBBER_CACHE_ALWAYS is not used, but not when it's used.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

On Fri, Jan 29, 2021 at 10:55 AM Fujii Masao
 wrote:
> >> BTW, even if we change pgfdw_inval_callback() so that it doesn't close
> >> the connection at all, ISTM that the results of
> >> postgres_fdw_get_connections() would not be stable because
> >> entry->invalidated would vary based on whether CLOBBER_CACHE_ALWAYS is
> >> used or not.
> >
> > Yes, after the above change (removing disconnect_pg_server in
> > pgfdw_inval_callback), our tests still aren't stable because
> > postgres_fdw_get_connections shows the valid state of the connections.
> > I think we can change postgres_fdw_get_connections so that it only
> > shows the server names of the active connections but not the valid
> > state, because the valid state depends on internal state changes and
> > is not consistent with user expectations, yet we are exposing it to
> > the user. Thoughts?
>
> I don't think that's enough because even the following simple
> queries return different results, depending on whether
> CLOBBER_CACHE_ALWAYS is used or not.
>
>  SELECT * FROM ft6;  -- ft6 is the foreign table
>  SELECT server_name FROM postgres_fdw_get_connections();
>
> When CLOBBER_CACHE_ALWAYS is used, postgres_fdw_get_connections()
> returns no records because the connection is marked as invalidated,
> and then closed by the xact callback at the end of the SELECT query.
> Otherwise, postgres_fdw_get_connections() returns at least one
> connection that was established in the SELECT query.

Right. In that case, after changing postgres_fdw_get_connections() so
that it doesn't output the valid state of the connections at all, we
can have all the new function test cases inside an explicit txn block.
So even if the cache clobbering invalidates the connections, they don't
get closed until the end of the main xact, and the tests will be stable.
Thoughts?
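A toy C model of the two cases above, again assuming option (2) and approximating CLOBBER_CACHE_ALWAYS with a flush at each step. The single-entry cache and all function names are hypothetical. The two autocommit queries from the ft6 example return 1 row without flushes and 0 with them, while the same queries inside one explicit transaction return 1 row either way, as long as only the server name is shown.

```c
#include <assert.h>
#include <stdbool.h>

/* Single-entry toy model of the connection cache under option (2). */
typedef struct
{
    bool open;
    int  xact_depth;
    bool invalidated;
} Entry;

static Entry conn;    /* cached connection to ft6's server */
static bool  clobber; /* approximates CLOBBER_CACHE_ALWAYS */

static void flush(void)
{
    if (!clobber || !conn.open)
        return;
    if (conn.xact_depth == 0)
        conn.open = false;       /* idle: disconnect right away */
    else
        conn.invalidated = true; /* in use: close at main xact end */
}

static void scan_ft6(void) /* SELECT * FROM ft6 */
{
    conn.open = true;
    conn.xact_depth = 1;
    flush(); /* flush while the query is running */
}

static int get_connections(void) /* SELECT server_name FROM ... */
{
    flush();
    return conn.open ? 1 : 0;
}

static void commit_xact(void) /* end of the main transaction */
{
    if (conn.open && conn.invalidated)
        conn.open = false;
    conn.xact_depth = 0;
    conn.invalidated = false;
}

static int run(bool clobber_mode, bool explicit_block)
{
    int rows;

    clobber = clobber_mode;
    conn = (Entry){ false, 0, false };

    if (explicit_block)
    {
        /* BEGIN; SELECT * FROM ft6; SELECT server_name ...; COMMIT; */
        scan_ft6();
        rows = get_connections();
        commit_xact();
    }
    else
    {
        /* two autocommit statements, each its own transaction */
        scan_ft6();
        commit_xact();
        rows = get_connections();
        commit_xact();
    }
    return rows;
}
```

The model does not cover connections cached before the block starts; those would still make the explicit-block variant differ between the two builds.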

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

On Fri, Jan 29, 2021 at 11:08 AM Bharath Rupireddy
 wrote:
>
> On Fri, Jan 29, 2021 at 10:55 AM Fujii Masao
>  wrote:
> > On 2021/01/29 14:12, Bharath Rupireddy wrote:
> > > On Fri, Jan 29, 2021 at 10:28 AM Fujii Masao
> > >  wrote:
> > >> On 2021/01/29 11:09, Tom Lane wrote:
> > >>> Bharath Rupireddy  writes:
> > >>>> On Fri, Jan 29, 2021 at 1:52 AM Tom Lane wrote:
> > >>>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=trilobite&dt=2021-01-26%2019%3A59%3A40
> > >>>>> This is a CLOBBER_CACHE_ALWAYS build, so I suspect what it's
> > >>>>> telling us is that the patch's behavior is unstable in the face
> > >>>>> of unexpected cache flushes.
> > >>>>
> > >>>> Thanks a lot! It looks like the syscache invalidation messages are
> > >>>> generated too frequently with a -DCLOBBER_CACHE_ALWAYS build, due to
> > >>>> which pgfdw_inval_callback gets called many times; there, the cached
> > >>>> entries are marked as invalid and closed if they are not used in the
> > >>>> txn. The new function postgres_fdw_get_connections outputs
> > >>>> information about the cached connections, such as the server name (if
> > >>>> the connection is still open) and its validity. Hence the output of
> > >>>> postgres_fdw_get_connections became unstable on the buildfarm member.
> > >>>> I will further analyze making the tests stable; meanwhile, any
> > >>>> suggestions are welcome.
> > >>>
> > >>> I do not think you should regard this as "we need to hack the test
> > >>> to make it stable".  I think you should regard this as "this is a
> > >>> bug".  A cache flush should not cause user-visible state changes.
> > >>> In particular, the above analysis implies that you think a cache
> > >>> flush is equivalent to end-of-transaction, which it absolutely
> > >>> is not.
> > >>>
> > >>> Also, now that I've looked at pgfdw_inval_callback, it scares
> > >>> the heck out of me.  Actually disconnecting a connection during
> > >>> a cache inval callback seems quite unsafe --- what if that happens
> > >>> while we're using the connection?
> > >>
> > >> If the connection is still used in the transaction, pgfdw_inval_callback()
> > >> marks it as invalidated and doesn't close it. So I was not thinking that
> > >> this is so unsafe.
> > >>
> > >> The disconnection code in pgfdw_inval_callback() was added in commit
> > >> e3ebcca843 to fix connection leak issue, and it's back-patched. If this
> > >> change is really unsafe, we need to revert it immediately at least from back
> > >> branches because the next minor release is scheduled soon.
> > >
> > > I think we can remove disconnect_pg_server in pgfdw_inval_callback and
> > > make entries only invalidated. Anyways, those connections can get
> > > closed at the end of main txn in pgfdw_xact_callback. Thoughts?
> >
> > But this revives the connection leak issue. So isn't it better to
> > do that after we confirm that the current code is really unsafe?
>
> IMO, connections will not leak, because the invalidated connections
> eventually will get closed in pgfdw_xact_callback at the main txn end.
>
> IIRC, when we were finding a way to close the invalidated connections
> so that they don't leak, we had two options:
>
> 1) let those connections (whether currently being used in the xact or
> not) get marked invalidated in pgfdw_inval_callback and closed in
> pgfdw_xact_callback at the main txn end as shown below
>
>     if (PQstatus(entry->conn) != CONNECTION_OK ||
>         PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
>         entry->changing_xact_state ||
>         entry->invalidated)   /* <-- by adding this */
>     {
>         elog(DEBUG3, "discarding connection %p", entry->conn);
>         disconnect_pg_server(entry);
>     }
>
> 2) close the unused connections right away in pgfdw_inval_callback
> instead of marking them invalidated. Mark used connections as
> invalidated in pgfdw_inval_callback and close them in
> pgfdw_xact_callback at the main txn end.
>
> We went with option (2) because we thought this would ease some burden
> on pgfdw_xact_callback closing a lot of invalid connections at once.

Also, note that the original patch for the connection leak issue did just
option (1), see [1]. But in [2] and [3], we chose option (2).

I feel we can go for option (1) with the patch attached in [1], i.e.
setting have_invalid_connections whenever any connection gets invalidated,
so that we don't quickly exit in pgfdw_xact_callback and the
invalidated connections get closed properly. Thoughts?

static void
pgfdw_xact_callback(XactEvent event, void *arg)
{
    HASH_SEQ_STATUS scan;
    ConnCacheEntry *entry;

    /* Quick exit if no connections were touched in this transaction. */
    if (!xact_got_connection)
        return;

[1] https://www.postgresql.org/message-id/CALj2ACVNcGH_6qLY-4_tXz8JLvA%2B4yeBThRfxMz7Oxbk1aHcpQ%40mail.gmail.com
[2] https://www.postgresql.org/message-id/f57dd9c3-0664-5f4c-41f0-0713047ae7b7%40oss.nttdata.com
[3] https://www.postgresql
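Here is a toy C model of the quick-exit problem described above, assuming option (1): the callback marks connections and closes them at transaction end. The names xact_got_connection and have_invalid_connections come from this thread. Everything else (the single global connection, the toy_* functions) is hypothetical. If an invalidation arrives in a transaction that never used a remote connection, the quick exit skips the cleanup. The extra flag lets the callback reach it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical simplified state; only the two flag names come from the thread. */
static bool xact_got_connection = false;      /* any connection used in this xact? */
static bool have_invalid_connections = false; /* proposed extra flag, see [1] */
static bool conn_open = true;                 /* one cached, idle connection */
static bool conn_invalidated = false;

/* Simulated inval callback under option (1): mark only, never close here. */
static void toy_inval(void)
{
    if (conn_open) {
        conn_invalidated = true;
        have_invalid_connections = true;
    }
}

/* Simulated xact callback; use_flag selects the proposed behavior. */
static void toy_xact_callback(bool use_flag)
{
    /* Quick exit as in the excerpt above, unless the flag says there is work. */
    if (!xact_got_connection && !(use_flag && have_invalid_connections))
        return;

    if (conn_invalidated) {
        conn_open = false;
        conn_invalidated = false;
    }
    xact_got_connection = false;
    have_invalid_connections = false;
}

/*
 * One transaction that uses no remote connection but receives a cache
 * invalidation. Returns whether the cached connection is still open afterwards.
 */
static bool toy_still_open_after(bool use_flag)
{
    conn_open = true;
    conn_invalidated = false;
    xact_got_connection = false;
    have_invalid_connections = false;

    toy_inval();
    toy_xact_callback(use_flag);
    return conn_open;
}
```

Without the flag, toy_still_open_after(false) leaves the invalidated connection open, which is the lingering case. With the flag, toy_still_open_after(true) closes it.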

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

On Fri, Jan 29, 2021 at 10:55 AM Fujii Masao
 wrote:
> On 2021/01/29 14:12, Bharath Rupireddy wrote:
> > On Fri, Jan 29, 2021 at 10:28 AM Fujii Masao
> >  wrote:
> >> On 2021/01/29 11:09, Tom Lane wrote:
> >>> Bharath Rupireddy  writes:
>  On Fri, Jan 29, 2021 at 1:52 AM Tom Lane  wrote:
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=trilobite&dt=2021-01-26%2019%3A59%3A40
> > This is a CLOBBER_CACHE_ALWAYS build, so I suspect what it's
> > telling us is that the patch's behavior is unstable in the face
> > of unexpected cache flushes.
> >>>
>  Thanks a lot! It looks like the syscache invalidation messages are
>  generated too frequently with -DCLOBBER_CACHE_ALWAYS build due to
>  which pgfdw_inval_callback gets called many times in which the cached
>  entries are marked as invalid and closed if they are not used in the
>  txn. The new function postgres_fdw_get_connections outputs the
>  information of the cached connections such as name if the connection
>  is still open and their validity. Hence the output of the
>  postgres_fdw_get_connections became unstable in the buildfarm member.
>  I will further analyze making tests stable, meanwhile any suggestions
>  are welcome.
> >>>
> >>> I do not think you should regard this as "we need to hack the test
> >>> to make it stable".  I think you should regard this as "this is a
> >>> bug".  A cache flush should not cause user-visible state changes.
> >>> In particular, the above analysis implies that you think a cache
> >>> flush is equivalent to end-of-transaction, which it absolutely
> >>> is not.
> >>>
> >>> Also, now that I've looked at pgfdw_inval_callback, it scares
> >>> the heck out of me.  Actually disconnecting a connection during
> >>> a cache inval callback seems quite unsafe --- what if that happens
> >>> while we're using the connection?
> >>
> >> If the connection is still used in the transaction, pgfdw_inval_callback()
> >> marks it as invalidated and doesn't close it. So I was not thinking that
> >> this is so unsafe.
> >>
> >> The disconnection code in pgfdw_inval_callback() was added in commit
> >> e3ebcca843 to fix connection leak issue, and it's back-patched. If this
> >> change is really unsafe, we need to revert it immediately at least from back
> >> branches because the next minor release is scheduled soon.
> >
> > I think we can remove disconnect_pg_server in pgfdw_inval_callback and
> > make entries only invalidated. Anyways, those connections can get
> > closed at the end of main txn in pgfdw_xact_callback. Thoughts?
>
> But this revives the connection leak issue. So isn't it better to
> do that after we confirm that the current code is really unsafe?

IMO, connections will not leak, because the invalidated connections
eventually will get closed in pgfdw_xact_callback at the main txn end.

IIRC, when we were finding a way to close the invalidated connections
so that they don't leak, we had two options:

1) let those connections (whether currently being used in the xact or
not) get marked invalidated in pgfdw_inval_callback and closed in
pgfdw_xact_callback at the main txn end as shown below

    if (PQstatus(entry->conn) != CONNECTION_OK ||
        PQtransactionStatus(entry->conn) != PQTRANS_IDLE ||
        entry->changing_xact_state ||
        entry->invalidated)   /* <-- by adding this */
    {
        elog(DEBUG3, "discarding connection %p", entry->conn);
        disconnect_pg_server(entry);
    }

2) close the unused connections right away in pgfdw_inval_callback
instead of marking them invalidated. Mark used connections as
invalidated in pgfdw_inval_callback and close them in
pgfdw_xact_callback at the main txn end.

We went with option (2) because we thought this would ease some burden
on pgfdw_xact_callback closing a lot of invalid connections at once.
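To make the two options concrete, here is a toy C model of when an idle (not in use) cached connection gets closed after a cache flush under each option. ToyEntry, its fields, and the toy_* functions are hypothetical simplifications of the callbacks named above.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    bool open;
    bool in_use;      /* stand-in for "used in the current transaction" */
    bool invalidated;
} ToyEntry;

typedef enum { OPTION_1 = 1, OPTION_2 = 2 } ToyOption;

/* Simulated pgfdw_inval_callback under each option. */
static void toy_inval(ToyEntry *e, ToyOption opt)
{
    if (!e->open)
        return;
    if (opt == OPTION_2 && !e->in_use)
        e->open = false;        /* option (2): close an unused entry right away */
    else
        e->invalidated = true;  /* option (1), or an in-use entry: defer */
}

/* Simulated pgfdw_xact_callback at main-transaction end. */
static void toy_xact_end(ToyEntry *e)
{
    if (e->open && e->invalidated) {
        e->open = false;
        e->invalidated = false;
    }
    e->in_use = false;
}

/*
 * Returns 1 if the idle entry was closed by the flush itself,
 * 2 if it survived the flush and was closed at transaction end,
 * 0 if it is still open after both.
 */
static int toy_when_closed(ToyOption opt)
{
    ToyEntry idle = {true, false, false};

    toy_inval(&idle, opt);
    if (!idle.open)
        return 1;
    toy_xact_end(&idle);
    return idle.open ? 0 : 2;
}
```

Under option (2), the flush itself closes the idle connection. Under option (1), the connection stays open until the main transaction ends. This is also why option (1) can leave more connections for pgfdw_xact_callback to close at once.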

Hope that's fine.

I will respond to postgres_fdw_get_connections issue separately.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit





On 2021/01/29 14:12, Bharath Rupireddy wrote:

On Fri, Jan 29, 2021 at 10:28 AM Fujii Masao
 wrote:

On 2021/01/29 11:09, Tom Lane wrote:

Bharath Rupireddy  writes:

On Fri, Jan 29, 2021 at 1:52 AM Tom Lane  wrote:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=trilobite&dt=2021-01-26%2019%3A59%3A40
This is a CLOBBER_CACHE_ALWAYS build, so I suspect what it's
telling us is that the patch's behavior is unstable in the face
of unexpected cache flushes.



Thanks a lot! It looks like the syscache invalidation messages are
generated too frequently with -DCLOBBER_CACHE_ALWAYS build due to
which pgfdw_inval_callback gets called many times in which the cached
entries are marked as invalid and closed if they are not used in the
txn. The new function postgres_fdw_get_connections outputs the
information of the cached connections such as name if the connection
is still open and their validity. Hence the output of the
postgres_fdw_get_connections became unstable in the buildfarm member.
I will further analyze making tests stable, meanwhile any suggestions
are welcome.


I do not think you should regard this as "we need to hack the test
to make it stable".  I think you should regard this as "this is a
bug".  A cache flush should not cause user-visible state changes.
In particular, the above analysis implies that you think a cache
flush is equivalent to end-of-transaction, which it absolutely
is not.

Also, now that I've looked at pgfdw_inval_callback, it scares
the heck out of me.  Actually disconnecting a connection during
a cache inval callback seems quite unsafe --- what if that happens
while we're using the connection?


If the connection is still used in the transaction, pgfdw_inval_callback()
marks it as invalidated and doesn't close it. So I was not thinking that
this is so unsafe.

The disconnection code in pgfdw_inval_callback() was added in commit
e3ebcca843 to fix connection leak issue, and it's back-patched. If this
change is really unsafe, we need to revert it immediately at least from back
branches because the next minor release is scheduled soon.


I think we can remove disconnect_pg_server in pgfdw_inval_callback and
make entries only invalidated. Anyways, those connections can get
closed at the end of main txn in pgfdw_xact_callback. Thoughts?


But this revives the connection leak issue. So isn't it better to
do that after we confirm that the current code is really unsafe?



If okay, I can make a patch for this.


BTW, even if we change pgfdw_inval_callback() so that it doesn't close
the connection at all, ISTM that the results of postgres_fdw_get_connections()
would not be stable because entry->invalidated would vary based on
whether CLOBBER_CACHE_ALWAYS is used or not.


Yes, after the above change (removing disconnect_pg_server in
pgfdw_inval_callback), our tests don't get stable because
postgres_fdw_get_connections shows the valid state of the connections.
I think we can change postgres_fdw_get_connections so that it only
shows the active connections server name but not valid state. Because,
the valid state is something dependent on the internal state change
and is not consistent with the user expectation but we are exposing it
to the user.  Thoughts?


I don't think that's enough because even the following simple
queries return different results, depending on whether
CLOBBER_CACHE_ALWAYS is used or not.

SELECT * FROM ft6;  -- ft6 is the foreign table
SELECT server_name FROM postgres_fdw_get_connections();

When CLOBBER_CACHE_ALWAYS is used, postgres_fdw_get_connections()
returns no records because the connection is marked as invalidated,
and then closed by the xact callback at the end of the SELECT query. Otherwise,
postgres_fdw_get_connections() returns at least one connection that
was established in the SELECT query.
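Here is a toy C model of this scenario. It assumes that the inval callback only marks the connection, and that the SELECT commits on its own because it runs outside an explicit transaction block. All names are hypothetical, not postgres_fdw code.

```c
#include <assert.h>
#include <stdbool.h>

static bool conn_open = false;
static bool conn_invalidated = false;

/* Simulated cache flush: marks the connection, does not close it. */
static void toy_flush(void)
{
    if (conn_open)
        conn_invalidated = true;
}

/* Simulated end of the statement's implicit transaction. */
static void toy_commit(void)
{
    if (conn_invalidated) {
        conn_open = false;
        conn_invalidated = false;
    }
}

/*
 * "SELECT * FROM ft6;" in autocommit mode: the connection is opened,
 * a CLOBBER_CACHE_ALWAYS-style flush may arrive during the query, then
 * the statement commits. Returns how many servers a following
 * server-name-only listing would report.
 */
static int toy_listed_after_select(bool clobber)
{
    conn_open = true;          /* connection established to scan ft6 */
    conn_invalidated = false;
    if (clobber)
        toy_flush();           /* flush arrives while the query runs */
    toy_commit();              /* implicit commit of the autocommit SELECT */
    return conn_open ? 1 : 0;
}
```

Here toy_listed_after_select(false) returns 1 and toy_listed_after_select(true) returns 0. The second statement runs after the first has already committed, so deferring the close alone does not hide the difference.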

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

On Fri, Jan 29, 2021 at 10:42 AM Bharath Rupireddy
 wrote:
> > > Also, now that I've looked at pgfdw_inval_callback, it scares
> > > the heck out of me.  Actually disconnecting a connection during
> > > a cache inval callback seems quite unsafe --- what if that happens
> > > while we're using the connection?
> >
> > If the connection is still used in the transaction, pgfdw_inval_callback()
> > marks it as invalidated and doesn't close it. So I was not thinking that
> > this is so unsafe.
> >
> > The disconnection code in pgfdw_inval_callback() was added in commit
> > e3ebcca843 to fix connection leak issue, and it's back-patched. If this
> > change is really unsafe, we need to revert it immediately at least from back
> > branches because the next minor release is scheduled soon.
>
> I think we can remove disconnect_pg_server in pgfdw_inval_callback and
> make entries only invalidated. Anyways, those connections can get
> closed at the end of main txn in pgfdw_xact_callback. Thoughts?
>
> If okay, I can make a patch for this.

Attaching a patch for this, which can be back-patched.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


v1-0001-Fix-connection-closure-issue-in-pgfdw_inval_callb.patch

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

On Fri, Jan 29, 2021 at 10:28 AM Fujii Masao
 wrote:
> On 2021/01/29 11:09, Tom Lane wrote:
> > Bharath Rupireddy  writes:
> >> On Fri, Jan 29, 2021 at 1:52 AM Tom Lane  wrote:
> >>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=trilobite&dt=2021-01-26%2019%3A59%3A40
> >>> This is a CLOBBER_CACHE_ALWAYS build, so I suspect what it's
> >>> telling us is that the patch's behavior is unstable in the face
> >>> of unexpected cache flushes.
> >
> >> Thanks a lot! It looks like the syscache invalidation messages are
> >> generated too frequently with -DCLOBBER_CACHE_ALWAYS build due to
> >> which pgfdw_inval_callback gets called many times in which the cached
> >> entries are marked as invalid and closed if they are not used in the
> >> txn. The new function postgres_fdw_get_connections outputs the
> >> information of the cached connections such as name if the connection
> >> is still open and their validity. Hence the output of the
> >> postgres_fdw_get_connections became unstable in the buildfarm member.
> >> I will further analyze making tests stable, meanwhile any suggestions
> >> are welcome.
> >
> > I do not think you should regard this as "we need to hack the test
> > to make it stable".  I think you should regard this as "this is a
> > bug".  A cache flush should not cause user-visible state changes.
> > In particular, the above analysis implies that you think a cache
> > flush is equivalent to end-of-transaction, which it absolutely
> > is not.
> >
> > Also, now that I've looked at pgfdw_inval_callback, it scares
> > the heck out of me.  Actually disconnecting a connection during
> > a cache inval callback seems quite unsafe --- what if that happens
> > while we're using the connection?
>
> If the connection is still used in the transaction, pgfdw_inval_callback()
> marks it as invalidated and doesn't close it. So I was not thinking that
> this is so unsafe.
>
> The disconnection code in pgfdw_inval_callback() was added in commit
> e3ebcca843 to fix connection leak issue, and it's back-patched. If this
> change is really unsafe, we need to revert it immediately at least from back
> branches because the next minor release is scheduled soon.

I think we can remove disconnect_pg_server in pgfdw_inval_callback and
only mark the entries as invalidated. Those connections will get
closed at the end of the main txn in pgfdw_xact_callback anyway. Thoughts?

If okay, I can make a patch for this.

> BTW, even if we change pgfdw_inval_callback() so that it doesn't close
> the connection at all, ISTM that the results of postgres_fdw_get_connections()
> would not be stable because entry->invalidated would vary based on
> whether CLOBBER_CACHE_ALWAYS is used or not.

Yes, even after the above change (removing disconnect_pg_server in
pgfdw_inval_callback), our tests won't be stable because
postgres_fdw_get_connections shows the valid state of the connections.
I think we can change postgres_fdw_get_connections so that it only
shows the server names of the active connections, but not the valid state,
because the valid state depends on internal state changes and is not
consistent with user expectations, yet we are exposing it
to the user.  Thoughts?

If okay, I can work on the patch for this.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

On 2021/01/29 11:09, Tom Lane wrote:

Bharath Rupireddy writes:

On Fri, Jan 29, 2021 at 1:52 AM Tom Lane wrote:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=trilobite&dt=2021-01-26%2019%3A59%3A40
This is a CLOBBER_CACHE_ALWAYS build, so I suspect what it's
telling us is that the patch's behavior is unstable in the face
of unexpected cache flushes.

Thanks a lot! It looks like the syscache invalidation messages are
generated too frequently with -DCLOBBER_CACHE_ALWAYS build due to
which pgfdw_inval_callback gets called many times in which the cached
entries are marked as invalid and closed if they are not used in the
txn. The new function postgres_fdw_get_connections outputs the
information of the cached connections such as name if the connection
is still open and their validity. Hence the output of the
postgres_fdw_get_connections became unstable in the buildfarm member.
I will further analyze making tests stable, meanwhile any suggestions
are welcome.

I do not think you should regard this as "we need to hack the test
to make it stable". I think you should regard this as "this is a
bug". A cache flush should not cause user-visible state changes.
In particular, the above analysis implies that you think a cache
flush is equivalent to end-of-transaction, which it absolutely
is not.

Also, now that I've looked at pgfdw_inval_callback, it scares
the heck out of me. Actually disconnecting a connection during
a cache inval callback seems quite unsafe --- what if that happens
while we're using the connection?

If the connection is still used in the transaction, pgfdw_inval_callback()
marks it as invalidated and doesn't close it. So I was not thinking that
this is so unsafe.

The disconnection code in pgfdw_inval_callback() was added in commit
e3ebcca843 to fix connection leak issue, and it's back-patched. If this
change is really unsafe, we need to revert it immediately at least from back
branches because the next minor release is scheduled soon.

BTW, even if we change pgfdw_inval_callback() so that it doesn't close
the connection at all, ISTM that the results of postgres_fdw_get_connections()
would not be stable because entry->invalidated would vary based on
whether CLOBBER_CACHE_ALWAYS is used or not.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-01-28 Thread Tom Lane

Bharath Rupireddy  writes:
> On Fri, Jan 29, 2021 at 1:52 AM Tom Lane  wrote:
>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=trilobite&dt=2021-01-26%2019%3A59%3A40
>> This is a CLOBBER_CACHE_ALWAYS build, so I suspect what it's
>> telling us is that the patch's behavior is unstable in the face
>> of unexpected cache flushes.

> Thanks a lot! It looks like the syscache invalidation messages are
> generated too frequently with -DCLOBBER_CACHE_ALWAYS build due to
> which pgfdw_inval_callback gets called many times in which the cached
> entries are marked as invalid and closed if they are not used in the
> txn. The new function postgres_fdw_get_connections outputs the
> information of the cached connections such as name if the connection
> is still open and their validity. Hence the output of the
> postgres_fdw_get_connections became unstable in the buildfarm member.
> I will further analyze making tests stable, meanwhile any suggestions
> are welcome.

I do not think you should regard this as "we need to hack the test
to make it stable".  I think you should regard this as "this is a
bug".  A cache flush should not cause user-visible state changes.
In particular, the above analysis implies that you think a cache
flush is equivalent to end-of-transaction, which it absolutely
is not.

Also, now that I've looked at pgfdw_inval_callback, it scares
the heck out of me.  Actually disconnecting a connection during
a cache inval callback seems quite unsafe --- what if that happens
while we're using the connection?

I fear this patch needs to be reverted and redesigned.

regards, tom lane

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

On Fri, Jan 29, 2021 at 1:52 AM Tom Lane  wrote:
>
> Bharath Rupireddy  writes:
> > On Tue, Jan 26, 2021 at 1:55 PM Fujii Masao  
> > wrote:
> >> Thanks for the patch! I also created that patch, confirmed that the test
> >> successfully passed with -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS,
> >> and pushed the patch.
>
> > Thanks a lot!
>
> Seems you're not out of the woods yet:
>
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=trilobite&dt=2021-01-26%2019%3A59%3A40
>
> This is a CLOBBER_CACHE_ALWAYS build, so I suspect what it's
> telling us is that the patch's behavior is unstable in the face
> of unexpected cache flushes.

Thanks a lot! It looks like syscache invalidation messages are
generated very frequently in a -DCLOBBER_CACHE_ALWAYS build, so
pgfdw_inval_callback gets called many times, and each time the cached
entries are marked as invalid and closed if they are not used in the
current txn. The new function postgres_fdw_get_connections outputs
information about the cached connections, such as the server name if
the connection is still open, and their validity. Hence the output of
postgres_fdw_get_connections became unstable on the buildfarm member.

I will analyze further how to make the tests stable; meanwhile, any
suggestions are welcome.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-01-28 Thread Tom Lane

Bharath Rupireddy  writes:
> On Tue, Jan 26, 2021 at 1:55 PM Fujii Masao  
> wrote:
>> Thanks for the patch! I also created that patch, confirmed that the test
>> successfully passed with -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS,
>> and pushed the patch.

> Thanks a lot!

Seems you're not out of the woods yet:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=trilobite&dt=2021-01-26%2019%3A59%3A40

This is a CLOBBER_CACHE_ALWAYS build, so I suspect what it's
telling us is that the patch's behavior is unstable in the face
of unexpected cache flushes.

regards, tom lane

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-01-26 Thread Bharath Rupireddy

On Tue, Jan 26, 2021 at 8:38 AM Bharath Rupireddy
 wrote:
> I will post "keep_connections" GUC and "keep_connection" server level
> option patches later.

Attaching v19 patch set for "keep_connections" GUC and
"keep_connection" server level option. Please review them further.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

v19-0002-postgres_fdw-add-keep_connections-GUC-to-not-cac.patch

v19-0003-postgres_fdw-server-level-option-keep_connection.patch

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-01-26 Thread Bharath Rupireddy

On Tue, Jan 26, 2021 at 1:55 PM Fujii Masao  wrote:
> Thanks for the patch! I also created that patch, confirmed that the test
> successfully passed with -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS,
> and pushed the patch.

Thanks a lot!

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-01-26 Thread Fujii Masao





On 2021/01/26 17:07, Bharath Rupireddy wrote:

On Tue, Jan 26, 2021 at 1:27 PM Fujii Masao  wrote:

Yes, so I pushed that change to stabilize the regression test.
Let's keep checking how the results of buildfarm members are changed.


Sorry, I'm unfamiliar with checking the system status on the build
farm website - https://buildfarm.postgresql.org/cgi-bin/show_failures.pl.
I'm trying to figure that out.


+WARNING:  roles created by regression test cases should have names starting with "regress_"
   CREATE ROLE multi_conn_user2 SUPERUSER;
+WARNING:  roles created by regression test cases should have names starting with "regress_"

Hmm... another failure happened.


My bad. I should have caught that earlier. I will take care in future.

Attaching a patch to fix it.


Thanks for the patch! I also created that patch, confirmed that the test
successfully passed with -DENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS,
and pushed the patch.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-01-26 Thread Bharath Rupireddy

On Tue, Jan 26, 2021 at 1:27 PM Fujii Masao  wrote:
> > Yes, so I pushed that change to stabilize the regression test.
> > Let's keep checking how the results of buildfarm members are changed.

Sorry, I'm unfamiliar with checking the system status on the build
farm website - https://buildfarm.postgresql.org/cgi-bin/show_failures.pl.
I'm trying to figure that out.

> +WARNING:  roles created by regression test cases should have names starting with "regress_"
>   CREATE ROLE multi_conn_user2 SUPERUSER;
> +WARNING:  roles created by regression test cases should have names starting with "regress_"
>
> Hmm... another failure happened.

My bad. I should have caught that earlier. I will be more careful in the future.

Attaching a patch to fix it.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

v1-0001-Stabilize-test-case-for-postgres_fdw_disconnect_a.patch

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit





On 2021/01/26 16:39, Fujii Masao wrote:



On 2021/01/26 16:33, Bharath Rupireddy wrote:

On Tue, Jan 26, 2021 at 12:54 PM Fujii Masao
 wrote:

On 2021/01/26 16:05, Tom Lane wrote:

Fujii Masao  writes:

Thanks for the review! I fixed them and pushed the patch!


Buildfarm is very not happy ...


Yes I'm investigating that.

   -- Return false as connections are still in use, warnings are issued.
   SELECT postgres_fdw_disconnect_all();
-WARNING:  cannot close dropped server connection because it is still in use
-WARNING:  cannot close connection for server "loopback" because it is still in use
   WARNING:  cannot close connection for server "loopback2" because it is still in use
+WARNING:  cannot close connection for server "loopback" because it is still in use
+WARNING:  cannot close dropped server connection because it is still in use

The cause of the regression test failure is that the order of warning messages
is not stable. So I'm thinking to set client_min_messages to ERROR temporarily
when doing the above test.


Looks like we do suppress warnings/notices by setting
client_min_messages to ERROR/WARNING. For instance, "suppress warning
that depends on wal_level" and  "Suppress NOTICE messages when
users/groups don't exist".


Yes, so I pushed that change to stabilize the regression test.
Let's keep checking how the results of buildfarm members are changed.


+WARNING:  roles created by regression test cases should have names starting with "regress_"
 CREATE ROLE multi_conn_user2 SUPERUSER;
+WARNING:  roles created by regression test cases should have names starting with "regress_"

Hmm... another failure happened.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit





On 2021/01/26 16:33, Bharath Rupireddy wrote:

On Tue, Jan 26, 2021 at 12:54 PM Fujii Masao
 wrote:

On 2021/01/26 16:05, Tom Lane wrote:

Fujii Masao  writes:

Thanks for the review! I fixed them and pushed the patch!


Buildfarm is very not happy ...


Yes I'm investigating that.

   -- Return false as connections are still in use, warnings are issued.
   SELECT postgres_fdw_disconnect_all();
-WARNING:  cannot close dropped server connection because it is still in use
-WARNING:  cannot close connection for server "loopback" because it is still in use
   WARNING:  cannot close connection for server "loopback2" because it is still in use
+WARNING:  cannot close connection for server "loopback" because it is still in use
+WARNING:  cannot close dropped server connection because it is still in use

The cause of the regression test failure is that the order of warning messages
is not stable. So I'm thinking to set client_min_messages to ERROR temporarily
when doing the above test.


Looks like we do suppress warnings/notices by setting
client_min_messages to ERROR/WARNING. For instance, "suppress warning
that depends on wal_level" and  "Suppress NOTICE messages when
users/groups don't exist".


Yes, so I pushed that change to stabilize the regression test.
Let's keep checking how the results of buildfarm members are changed.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

On Tue, Jan 26, 2021 at 12:54 PM Fujii Masao
 wrote:
> On 2021/01/26 16:05, Tom Lane wrote:
> > Fujii Masao  writes:
> >> Thanks for the review! I fixed them and pushed the patch!
> >
> > Buildfarm is very not happy ...
>
> Yes I'm investigating that.
>
>   -- Return false as connections are still in use, warnings are issued.
>   SELECT postgres_fdw_disconnect_all();
> -WARNING:  cannot close dropped server connection because it is still in use
> -WARNING:  cannot close connection for server "loopback" because it is still in use
>   WARNING:  cannot close connection for server "loopback2" because it is still in use
> +WARNING:  cannot close connection for server "loopback" because it is still in use
> +WARNING:  cannot close dropped server connection because it is still in use
>
> The cause of the regression test failure is that the order of warning messages
> is not stable. So I'm thinking to set client_min_messages to ERROR temporarily
> when doing the above test.

Looks like we do suppress warnings/notices by setting
client_min_messages to ERROR/WARNING. For instance, "suppress warning
that depends on wal_level" and  "Suppress NOTICE messages when
users/groups don't exist".

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit





On 2021/01/26 16:05, Tom Lane wrote:

Fujii Masao  writes:

Thanks for the review! I fixed them and pushed the patch!


Buildfarm is very not happy ...


Yes I'm investigating that.

 -- Return false as connections are still in use, warnings are issued.
 SELECT postgres_fdw_disconnect_all();
-WARNING:  cannot close dropped server connection because it is still in use
-WARNING:  cannot close connection for server "loopback" because it is still in use
 WARNING:  cannot close connection for server "loopback2" because it is still in use
+WARNING:  cannot close connection for server "loopback" because it is still in use
+WARNING:  cannot close dropped server connection because it is still in use

The cause of the regression test failure is that the order of warning messages
is not stable. So I'm thinking of setting client_min_messages to ERROR temporarily
when doing the above test.
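Here is a minimal C sketch of why temporarily raising client_min_messages to ERROR should make the expected output independent of the warning order. Messages below the threshold never reach the client, so the test output is the same in whichever order they are emitted. The levels, message texts, and functions are a hypothetical simplification, not the server's elog implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical subset of severity levels, in the same relative order as the server's. */
typedef enum { TOY_NOTICE = 1, TOY_WARNING = 2, TOY_ERROR = 3 } ToyLevel;

typedef struct {
    ToyLevel level;
    const char *text;
} ToyMsg;

/* Build what the client sees: messages at or above min_level, in emission order. */
static void toy_client_output(const ToyMsg *msgs, size_t n, ToyLevel min_level,
                              char *out, size_t outsz)
{
    assert(outsz > 0);
    out[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        if (msgs[i].level < min_level)
            continue;  /* filtered out, like a client_min_messages threshold */
        strncat(out, msgs[i].text, outsz - strlen(out) - 1);
        strncat(out, "\n", outsz - strlen(out) - 1);
    }
}

/* Compare the client output for the same two warnings emitted in either order. */
static int toy_outputs_equal(ToyLevel min_level)
{
    static const ToyMsg order_a[] = {
        {TOY_WARNING, "cannot close loopback"},
        {TOY_WARNING, "cannot close loopback2"},
    };
    static const ToyMsg order_b[] = {
        {TOY_WARNING, "cannot close loopback2"},
        {TOY_WARNING, "cannot close loopback"},
    };
    char out_a[128];
    char out_b[128];

    toy_client_output(order_a, 2, min_level, out_a, sizeof out_a);
    toy_client_output(order_b, 2, min_level, out_b, sizeof out_b);
    return strcmp(out_a, out_b) == 0;
}
```

With the threshold at WARNING, the two emission orders produce different client output. At ERROR, both are empty and therefore identical.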

Regards,


--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-01-25 Thread Tom Lane

Fujii Masao  writes:
> Thanks for the review! I fixed them and pushed the patch!

Buildfarm is very not happy ...

regards, tom lane

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit





On 2021/01/26 12:08, Bharath Rupireddy wrote:

On Tue, Jan 26, 2021 at 12:38 AM Fujii Masao
 wrote:

Attaching v17 patch set, please review it further.


Thanks for updating the patch!

Attached is the tweaked version of the patch. I didn't change any logic,
but I updated some comments and docs. Also I added the regression test
to check that postgres_fdw_disconnect() closes multiple connections.
Barring any objection, I will commit this version.


Thanks. The patch LGTM, except for a few typos:
1) in the commit message "a warning messsage is emitted." it's
"message" not "messsage".
2) in the documentation "+   a user mapping, the correspoinding
connections are closed." it's "corresponding" not "correspoinding".


Thanks for the review! I fixed them and pushed the patch!

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

On Tue, Jan 26, 2021 at 12:38 AM Fujii Masao
 wrote:
> > Attaching v17 patch set, please review it further.
>
> Thanks for updating the patch!
>
> Attached is the tweaked version of the patch. I didn't change any logic,
> but I updated some comments and docs. Also I added the regression test
> to check that postgres_fdw_disconnect() closes multiple connections.
> Barring any objection, I will commit this version.

Thanks. The patch LGTM, except for a few typos:
1) in the commit message "a warning messsage is emitted." it's
"message" not "messsage".
2) in the documentation "+   a user mapping, the correspoinding
connections are closed." it's "corresponding" not "correspoinding".

I will post "keep_connections" GUC and "keep_connection" server level
option patches later.

With Regards,
Bharath Rupireddy.

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit




On 2021/01/26 0:12, Bharath Rupireddy wrote:

On Mon, Jan 25, 2021 at 7:28 PM Bharath Rupireddy
 wrote:

I will provide the updated patch set soon.


Attaching v17 patch set, please review it further.


Thanks for updating the patch!

Attached is the tweaked version of the patch. I didn't change any logic,
but I updated some comments and docs. Also I added the regression test
to check that postgres_fdw_disconnect() closes multiple connections.
Barring any objection, I will commit this version.

Regards,

--
Fujii Masao
From fa3e0f0a700588ab644c4c752e06c03845450712 Mon Sep 17 00:00:00 2001
From: Fujii Masao 
Date: Tue, 26 Jan 2021 03:54:46 +0900
Subject: [PATCH] postgres_fdw: Add functions to discard cached connections.

This commit introduces two new functions postgres_fdw_disconnect()
and postgres_fdw_disconnect_all(). The former function discards
the cached connection to the specified foreign server. The latter discards
all the cached connections. If the connection is used in the current
transaction, it's not closed and a warning messsage is emitted.

For example, these functions are useful when users want to explicitly
close the foreign server connections that are no longer necessary and
then to prevent them from eating up the foreign servers connections
capacity.

Author: Bharath Rupireddy, tweaked a bit by Fujii Masao
Reviewed-by: Alexey Kondratov, Zhijie Hou, Zhihong Yu, Fujii Masao
Discussion: https://postgr.es/m/CALj2ACVvrp5=avp2pupem+nac8s4buqr3fjmmacoc7ftt0a...@mail.gmail.com
---
 contrib/postgres_fdw/connection.c             | 135 +++-
 .../postgres_fdw/expected/postgres_fdw.out    | 208 +-
 .../postgres_fdw/postgres_fdw--1.0--1.1.sql   |  10 +
 contrib/postgres_fdw/sql/postgres_fdw.sql     |  98 -
 doc/src/sgml/postgres-fdw.sgml                |  67 +-
 5 files changed, 505 insertions(+), 13 deletions(-)

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index a1404cb6bb..ee0b4acf0b 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -80,6 +80,8 @@ static bool xact_got_connection = false;
  * SQL functions
  */
 PG_FUNCTION_INFO_V1(postgres_fdw_get_connections);
+PG_FUNCTION_INFO_V1(postgres_fdw_disconnect);
+PG_FUNCTION_INFO_V1(postgres_fdw_disconnect_all);
 
 /* prototypes of private functions */
 static void make_new_connection(ConnCacheEntry *entry, UserMapping *user);
@@ -102,6 +104,7 @@ static bool pgfdw_exec_cleanup_query(PGconn *conn, const char *query,
 static bool pgfdw_get_cleanup_result(PGconn *conn, TimestampTz endtime,
                                      PGresult **result);
 static bool UserMappingPasswordRequired(UserMapping *user);
+static bool disconnect_cached_connections(Oid serverid);
 
 /*
  * Get a PGconn which can be used to execute queries on the remote PostgreSQL
@@ -1428,8 +1431,8 @@ postgres_fdw_get_connections(PG_FUNCTION_ARGS)
 * Even though the server is dropped in the current transaction, the
 * cache can still have associated active connection entry, say we
 * call such connections dangling. Since we can not fetch the server
-* name from system catalogs for dangling connections, instead we
-* show NULL value for server name in output.
+* name from system catalogs for dangling connections, instead we show
+* NULL value for server name in output.
 *
 * We could have done better by storing the server name in the cache
 * entry instead of server oid so that it could be used in the output.
@@ -1447,7 +1450,7 @@ postgres_fdw_get_connections(PG_FUNCTION_ARGS)
/*
 * If the server has been dropped in the current explicit
 * transaction, then this entry would have been invalidated in
-* pgfdw_inval_callback at the end of drop sever command. Note
+* pgfdw_inval_callback at the end of drop server command. Note
 * that this connection would not have been closed in
 * pgfdw_inval_callback because it is still being used in the
 * current explicit transaction. So, assert that here.
@@ -1470,3 +1473,129 @@ postgres_fdw_get_connections(PG_FUNCTION_ARGS)
 
PG_RETURN_VOID();
 }
+
+/*
+ * Disconnect the specified cached connections.
+ *
+ * This function discards the open connections that are established by
+ * postgres_fdw from the local session to the foreign server with
+ * the given name. Note that there can be multiple connections to
+ * the given server using different user mappings. If the connections
+ * are used in the current local transaction, they are not disconne

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

On Mon, Jan 25, 2021 at 7:28 PM Bharath Rupireddy
 wrote:
> I will provide the updated patch set soon.

Attaching v17 patch set, please review it further.

With Regards,
Bharath Rupireddy.


v17-0001-postgres_fdw-function-to-discard-cached-connecti.patch


v17-0002-postgres_fdw-add-keep_connections-GUC-to-not-cac.patch


v17-0003-postgres_fdw-server-level-option-keep_connection.patch

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

On Mon, Jan 25, 2021 at 7:20 PM Fujii Masao  wrote:
> On 2021/01/25 19:28, Bharath Rupireddy wrote:
> > On Mon, Jan 25, 2021 at 3:17 PM Fujii Masao  
> > wrote:
> >>> Yes, if required, backends can establish the connection again. But my
> >>> worry is this: a non-super user disconnecting all or a given
> >>> connection created by a super user?
> >>
> >> Yes, I was also worried about that. But I found that there are other 
> >> similar cases, for example,
> >>
> >> - a cursor that superuser declared can be closed by non-superuser (set by 
> >> SET ROLE or SET SESSION AUTHORIZATION) in the same session.
> >> - a prepared statement that superuser created can be deallocated by 
> >> non-superuser in the same session.
> >>
> >> This makes me think that it's OK even for non-superuser to disconnect the 
> >> connections established by superuser in the same session. For now I've not 
> >> found any real security issue by doing that yet. Thoughts? Am I missing
> >> something?
> >
> > Oh, and added to that list is dblink_disconnect(). I don't know
> > whether there's any security risk if we allow non-superusers to
> > discard the superusers' connections.
>
> I guess that's ok because superuser and nonsuperuser are running in the same 
> session. That is, since this is the case where superuser switches to 
> nonsuperuser intentionally, interactions between them are also intentional.
>
> OTOH, if a nonsuperuser in one session could affect a superuser in another
> session that way, that would be problematic. For example, pg_stat_activity
> currently does not allow a nonsuperuser to see the query that a superuser in
> another session is running.

Hmm, that makes sense.

> > In this case, the superusers
> > will just have to remake the connection.
> >
> >>> For now I'm thinking that it might be better to add the restriction like
> >>> pg_terminate_backend() at first and relax that later if possible. But I'd
> >>> like to hear more opinions about this.
> >>
> >> I agree. If required, we can lift it later, once users start using
> >> these functions. Maybe we can have a comment near the superuser checks in
> >> disconnect_cached_connections saying that we can lift this in future?
> >
> > Maybe we can do the opposite of the above, that is, not do any
> > superuser checks in the disconnect functions for now, and add them
> > later if some users complain?
>
> +1

Thanks, will send the updated patch set soon.

> > We can leave a comment there that "As of
> > now we don't see any security risks if a non-super user disconnects
> > the connections made by super users. If required, non-supers can be
> > disallowed to disconnect the connections" ?
>
> Yes. Also we should note that that's ok because they are in the same session.

I will add this comment in disconnect_cached_connections so that we
don't lose track of it.

I will provide the updated patch set soon.

With Regards,
Bharath Rupireddy.

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit





On 2021/01/25 19:28, Bharath Rupireddy wrote:

On Mon, Jan 25, 2021 at 3:17 PM Fujii Masao  wrote:

Yes, if required, backends can establish the connection again. But my
worry is this: a non-super user disconnecting all or a given
connection created by a super user?


Yes, I was also worried about that. But I found that there are other similar 
cases, for example,

- a cursor that superuser declared can be closed by non-superuser (set by SET 
ROLE or SET SESSION AUTHORIZATION) in the same session.
- a prepared statement that superuser created can be deallocated by 
non-superuser in the same session.

This makes me think that it's OK even for non-superuser to disconnect the 
connections established by superuser in the same session. For now I've not 
found any real security issue by doing that yet. Thoughts? Am I missing
something?


Oh, and added to that list is dblink_disconnect(). I don't know
whether there's any security risk if we allow non-superusers to
discard the superusers' connections.


I guess that's ok because superuser and nonsuperuser are running in the same 
session. That is, since this is the case where superuser switches to 
nonsuperuser intentionally, interactions between them are also intentional.

OTOH, if a nonsuperuser in one session could affect a superuser in another
session that way, that would be problematic. For example, pg_stat_activity
currently does not allow a nonsuperuser to see the query that a superuser in
another session is running.



In this case, the superusers
will just have to remake the connection.


For now I'm thinking that it might be better to add the restriction like
pg_terminate_backend() at first and relax that later if possible. But I'd like
to hear more opinions about this.


I agree. If required, we can lift it later, once users start using
these functions. Maybe we can have a comment near the superuser checks in
disconnect_cached_connections saying that we can lift this in future?


Maybe we can do the opposite of the above, that is, not do any
superuser checks in the disconnect functions for now, and add them
later if some users complain?


+1


We can leave a comment there that "As of
now we don't see any security risks if a non-super user disconnects
the connections made by super users. If required, non-supers can be
disallowed to disconnect the connections" ?


Yes. Also we should note that that's ok because they are in the same session.

Regards,

--
Fujii Masao

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

On Mon, Jan 25, 2021 at 3:17 PM Fujii Masao  wrote:
> > Yes, if required, backends can establish the connection again. But my
> > worry is this: a non-super user disconnecting all or a given
> > connection created by a super user?
>
> Yes, I was also worried about that. But I found that there are other similar 
> cases, for example,
>
> - a cursor that superuser declared can be closed by non-superuser (set by SET 
> ROLE or SET SESSION AUTHORIZATION) in the same session.
> - a prepared statement that superuser created can be deallocated by 
> non-superuser in the same session.
>
> This makes me think that it's OK even for non-superuser to disconnect the 
> connections established by superuser in the same session. For now I've not 
> found any real security issue by doing that yet. Thoughts? Am I missing
> something?

Oh, and added to that list is dblink_disconnect(). I don't know
whether there's any security risk if we allow non-superusers to
discard the superusers' connections. In this case, the superusers
will just have to remake the connection.

> > For now I'm thinking that it might be better to add the restriction like
> > pg_terminate_backend() at first and relax that later if possible. But I'd
> > like to hear more opinions about this.
>
> I agree. If required, we can lift it later, once users start using
> these functions. Maybe we can have a comment near the superuser checks in
> disconnect_cached_connections saying that we can lift this in future?

Maybe we can do the opposite of the above, that is, not do any
superuser checks in the disconnect functions for now, and add them
later if some users complain? We can leave a comment there that "As of
now we don't see any security risks if a non-super user disconnects
the connections made by super users. If required, non-supers can be
disallowed to disconnect the connections" ?

With Regards,
Bharath Rupireddy.

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit





On 2021/01/25 18:13, Bharath Rupireddy wrote:

On Mon, Jan 25, 2021 at 1:20 PM Fujii Masao  wrote:

Yeah, connections can be discarded by non-super users using
postgres_fdw_disconnect_all and postgres_fdw_disconnect. Given the
fact that a non-super user requires a password to access foreign
tables [1], IMO a non-super user changing something related to a super
user makes no sense at all. If okay, we can have a check in
disconnect_cached_connections something like below:


Also, like pg_terminate_backend(), should we disallow a non-superuser from
disconnecting the connections established by another non-superuser if the
requesting user is not a member of the other's role? Or is that overkill
because the target to discard is just a connection, which can be established
again if necessary?


Yes, if required, backends can establish the connection again. But my
worry is this: a non-super user disconnecting all or a given
connection created by a super user?


Yes, I was also worried about that. But I found that there are other similar 
cases, for example,

- a cursor that superuser declared can be closed by non-superuser (set by SET 
ROLE or SET SESSION AUTHORIZATION) in the same session.
- a prepared statement that superuser created can be deallocated by 
non-superuser in the same session.

This makes me think that it's OK even for non-superuser to disconnect the 
connections established by superuser in the same session. For now I've not 
found any real security issue by doing that yet. Thoughts? Am I missing
something?

Regards,

--
Fujii Masao

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

On Mon, Jan 25, 2021 at 1:20 PM Fujii Masao  wrote:
> > Yeah, connections can be discarded by non-super users using
> > postgres_fdw_disconnect_all and postgres_fdw_disconnect. Given the
> > fact that a non-super user requires a password to access foreign
> > tables [1], IMO a non-super user changing something related to a super
> > user makes no sense at all. If okay, we can have a check in
> > disconnect_cached_connections something like below:
>
> Also, like pg_terminate_backend(), should we disallow a non-superuser from
> disconnecting the connections established by another non-superuser if the
> requesting user is not a member of the other's role? Or is that overkill
> because the target to discard is just a connection, which can be
> established again if necessary?

Yes, if required, backends can establish the connection again. But my
worry is this: a non-super user disconnecting all or a given
connection created by a super user?

> For now I'm thinking that it might be better to add the restriction like
> pg_terminate_backend() at first and relax that later if possible. But I'd
> like to hear more opinions about this.

I agree. If required, we can lift it later, once users start using
these functions. Maybe we can have a comment near the superuser checks in
disconnect_cached_connections saying that we can lift this in future?

Do you want me to add these checks like in pg_signal_backend?

    /* Only allow superusers to signal superuser-owned backends. */
    if (superuser_arg(proc->roleId) && !superuser())
        return SIGNAL_BACKEND_NOSUPERUSER;

    /* Users can signal backends they have role membership in. */
    if (!has_privs_of_role(GetUserId(), proc->roleId) &&
        !has_privs_of_role(GetUserId(), DEFAULT_ROLE_SIGNAL_BACKENDID))
        return SIGNAL_BACKEND_NOPERMISSION;

or is only the check below enough?

+    /* Non-super users are not allowed to disconnect cached connections. */
+    if (!superuser())
+        ereport(ERROR,
+                (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+                 errmsg("must be superuser to discard open connections")));
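For comparison, the pg_signal_backend()-style option could be expressed roughly as below. This is a standalone sketch of the decision order only, not postgres_fdw code: the role facts are passed in as booleans instead of being looked up via superuser(), superuser_arg() and has_privs_of_role(), the DEFAULT_ROLE_SIGNAL_BACKENDID exception is left out, and the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical result codes, modeled loosely on pg_signal_backend()'s. */
typedef enum SketchDisconnectCheck
{
    SKETCH_DISCONNECT_OK,
    SKETCH_DISCONNECT_NOSUPERUSER,
    SKETCH_DISCONNECT_NOPERMISSION
} SketchDisconnectCheck;

/*
 * Decide whether the requesting role may discard a cached connection that
 * was established under the owner role. Inputs are plain booleans here;
 * real code would compute them from the catalogs.
 */
static SketchDisconnectCheck
sketch_check_disconnect(bool requester_is_superuser,
                        bool owner_is_superuser,
                        bool requester_has_privs_of_owner)
{
    /* Only superusers may discard connections a superuser established. */
    if (owner_is_superuser && !requester_is_superuser)
        return SKETCH_DISCONNECT_NOSUPERUSER;

    /*
     * Otherwise require the privileges of the establishing role; a
     * superuser is treated as implicitly having them.
     */
    if (!requester_is_superuser && !requester_has_privs_of_owner)
        return SKETCH_DISCONNECT_NOPERMISSION;

    return SKETCH_DISCONNECT_OK;
}
```

As in pg_signal_backend(), the superuser-owner test comes first, so a non-superuser gets the more specific NOSUPERUSER result even if it happens to hold the owner role's privileges.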

> > +static bool
> > +disconnect_cached_connections(Oid serverid)
> > +{
> > +    HASH_SEQ_STATUS scan;
> > +    ConnCacheEntry *entry;
> > +    bool        all = !OidIsValid(serverid);
> > +    bool        result = false;
> > +
> > +    if (!superuser())
> > +        ereport(ERROR,
> > +                (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
> > +                 errmsg("must be superuser to discard open connections")));
> > +
> > +    if (!ConnectionHash)
> >
> > Having said that, it looks like dblink_disconnect doesn't perform
> > superuser checks.
>
> Also, a non-superuser (set by SET ROLE or SET SESSION AUTHORIZATION) seems
> to be able to run SQL using the dblink connection established by a
> superuser. If we don't think that this is a problem, we might not need to
> care about this issue for postgres_fdw either.

IMO, we can have superuser checks in the new postgres_fdw functions for now.

With Regards,
Bharath Rupireddy.

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-01-24 Thread Fujii Masao





On 2021/01/23 13:40, Bharath Rupireddy wrote:

On Fri, Jan 22, 2021 at 6:43 PM Fujii Masao  wrote:

Please review the v16 patch set further.


Thanks! Will review that later.


+   /*
+    * For the given server, if we closed connection or it is still in
+    * use, then no need of scanning the cache further. We do this
+    * because the cache can not have multiple cache entries for a
+    * single foreign server.
+    */

On second thought, ISTM that a single foreign server can have multiple cache
entries. For example,

CREATE ROLE foo1 SUPERUSER;
CREATE ROLE foo2 SUPERUSER;
CREATE EXTENSION postgres_fdw;
CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw OPTIONS (port '5432');
CREATE USER MAPPING FOR foo1 SERVER loopback OPTIONS (user 'postgres');
CREATE USER MAPPING FOR foo2 SERVER loopback OPTIONS (user 'postgres');
CREATE TABLE t (i int);
CREATE FOREIGN TABLE ft (i int) SERVER loopback OPTIONS (table_name 't');
SET SESSION AUTHORIZATION foo1;
SELECT * FROM ft;
SET SESSION AUTHORIZATION foo2;
SELECT * FROM ft;


Then you can see there are multiple open connections for the same server
as follows. So we need to scan all the entries even when the serverid is
specified.

SELECT * FROM postgres_fdw_get_connections();

 server_name | valid
-------------+-------
 loopback    | t
 loopback    | t
(2 rows)


This is a great finding. Thanks a lot. I will remove
hash_seq_term(&scan); in disconnect_cached_connections and add this as
a test case for postgres_fdw_get_connections function, just to show
there can be multiple connections with a single server name.


This means that a user (even a non-superuser) can disconnect a connection
established by another user (a superuser) by using postgres_fdw_disconnect_all().
Is this really OK?


Yeah, connections can be discarded by non-super users using
postgres_fdw_disconnect_all and postgres_fdw_disconnect. Given the
fact that a non-super user requires a password to access foreign
tables [1], IMO a non-super user changing something related to a super
user makes no sense at all. If okay, we can have a check in
disconnect_cached_connections something like below:


Also, like pg_terminate_backend(), should we disallow a non-superuser from
disconnecting the connections established by another non-superuser if the
requesting user is not a member of the other's role? Or is that overkill
because the target to discard is just a connection, which can be established
again if necessary?

For now I'm thinking that it might be better to add the restriction like
pg_terminate_backend() at first and relax that later if possible. But I'd like
to hear more opinions about this.




+static bool
+disconnect_cached_connections(Oid serverid)
+{
+    HASH_SEQ_STATUS scan;
+    ConnCacheEntry *entry;
+    bool        all = !OidIsValid(serverid);
+    bool        result = false;
+
+    if (!superuser())
+        ereport(ERROR,
+                (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+                 errmsg("must be superuser to discard open connections")));
+
+    if (!ConnectionHash)

Having said that, it looks like dblink_disconnect doesn't perform
superuser checks.


Also, a non-superuser (set by SET ROLE or SET SESSION AUTHORIZATION) seems to
be able to run SQL using the dblink connection established by a superuser. If
we don't think that this is a problem, we might not need to care about this
issue for postgres_fdw either.


Regards,

--
Fujii Masao

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-01-22 Thread Bharath Rupireddy

On Fri, Jan 22, 2021 at 6:43 PM Fujii Masao  wrote:
> >> Please review the v16 patch set further.
> >
> > Thanks! Will review that later.
>
> +   /*
> +    * For the given server, if we closed connection or it is still in
> +    * use, then no need of scanning the cache further. We do this
> +    * because the cache can not have multiple cache entries for a
> +    * single foreign server.
> +    */
>
> On second thought, ISTM that a single foreign server can have multiple cache
> entries. For example,
>
> CREATE ROLE foo1 SUPERUSER;
> CREATE ROLE foo2 SUPERUSER;
> CREATE EXTENSION postgres_fdw;
> CREATE SERVER loopback FOREIGN DATA WRAPPER postgres_fdw OPTIONS (port 
> '5432');
> CREATE USER MAPPING FOR foo1 SERVER loopback OPTIONS (user 'postgres');
> CREATE USER MAPPING FOR foo2 SERVER loopback OPTIONS (user 'postgres');
> CREATE TABLE t (i int);
> CREATE FOREIGN TABLE ft (i int) SERVER loopback OPTIONS (table_name 't');
> SET SESSION AUTHORIZATION foo1;
> SELECT * FROM ft;
> SET SESSION AUTHORIZATION foo2;
> SELECT * FROM ft;
>
>
> Then you can see there are multiple open connections for the same server
> as follows. So we need to scan all the entries even when the serverid is
> specified.
>
> SELECT * FROM postgres_fdw_get_connections();
>
>  server_name | valid
> -------------+-------
>  loopback    | t
>  loopback    | t
> (2 rows)

This is a great finding. Thanks a lot. I will remove
hash_seq_term(&scan); in disconnect_cached_connections and add this as
a test case for postgres_fdw_get_connections function, just to show
there can be multiple connections with a single server name.
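A minimal standalone model of why the scan has to visit every entry, assuming a plain array in place of ConnectionHash. Because the real cache is keyed by user mapping, several entries can share one serverid, so stopping after the first match (the hash_seq_term(&scan) path) would miss the rest. Everything here is hypothetical: SketchCacheEntry, has_conn and sketch_disconnect() stand in for ConnCacheEntry, entry->conn and disconnect_cached_connections(), and InvalidOid plays the "all servers" role, as in the !OidIsValid(serverid) change discussed in this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-ins; not the postgres_fdw definitions. */
typedef unsigned int Oid;
#define InvalidOid ((Oid) 0)

typedef struct SketchCacheEntry
{
    Oid     umid;        /* user mapping OID: the cache key */
    Oid     serverid;    /* foreign server; NOT unique across entries */
    bool    has_conn;    /* stands in for entry->conn != NULL */
    int     xact_depth;  /* > 0 means used in the current transaction */
} SketchCacheEntry;

/*
 * Close the cached connections to serverid, or to every server when
 * serverid is InvalidOid. Connections still in use are kept and a
 * warning is printed. Returns true if at least one was closed.
 */
static bool
sketch_disconnect(SketchCacheEntry *cache, size_t n, Oid serverid)
{
    bool    all = (serverid == InvalidOid);
    bool    result = false;

    assert(cache != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        SketchCacheEntry *entry = &cache[i];

        /* Ignore cache entry if no open connection right now */
        if (!entry->has_conn)
            continue;

        if (!all && entry->serverid != serverid)
            continue;

        if (entry->xact_depth > 0)
        {
            fprintf(stderr,
                    "WARNING: connection for server %u (user mapping %u) is still in use\n",
                    entry->serverid, entry->umid);
            continue;
        }

        entry->has_conn = false;    /* stands in for disconnect_pg_server() */
        result = true;

        /*
         * No early exit: another user mapping may hold a second
         * connection to the same server, so keep scanning.
         */
    }

    return result;
}
```

With the two loopback user mappings from the example above, both entries carry the same serverid, and both get closed only because the loop keeps going after the first match.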

> This means that a user (even a non-superuser) can disconnect a connection
> established by another user (a superuser) by using
> postgres_fdw_disconnect_all().
> Is this really OK?

Yeah, connections can be discarded by non-super users using
postgres_fdw_disconnect_all and postgres_fdw_disconnect. Given the
fact that a non-super user requires a password to access foreign
tables [1], IMO a non-super user changing something related to a super
user makes no sense at all. If okay, we can have a check in
disconnect_cached_connections something like below:

+static bool
+disconnect_cached_connections(Oid serverid)
+{
+    HASH_SEQ_STATUS scan;
+    ConnCacheEntry *entry;
+    bool        all = !OidIsValid(serverid);
+    bool        result = false;
+
+    if (!superuser())
+        ereport(ERROR,
+                (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE),
+                 errmsg("must be superuser to discard open connections")));
+
+    if (!ConnectionHash)

Having said that, it looks like dblink_disconnect doesn't perform
superuser checks.

Thoughts?

[1]
SELECT * FROM ft1_nopw LIMIT 1;
ERROR:  password is required
DETAIL:  Non-superusers must provide a password in the user mapping.

> +   if (all || (OidIsValid(serverid) && entry->serverid == serverid))
> +   {
>
> I don't think that "OidIsValid(serverid)" condition is necessary here.
> But you're just concerned about the case where the caller mistakenly
> specifies invalid oid and all=false? One idea to avoid that inconsistent
> combination of inputs is to change disconnect_cached_connections()
> as follows.
>
> -disconnect_cached_connections(Oid serverid, bool all)
> +disconnect_cached_connections(Oid serverid)
>   {
> HASH_SEQ_STATUS scan;
> ConnCacheEntry  *entry;
> +   boolall = !OidIsValid(serverid);

+1. Will change it.

> +* in pgfdw_inval_callback at the end of drop sever
>
> Typo: "sever" should be "server".

+1. Will change it.

> +-- ===
> +-- test postgres_fdw_disconnect function
> +-- ===
>
> This regression test is placed at the end of test file. But isn't it better
> to place that just after the regression test "test connection invalidation
>   cases" because they are related?

+1. Will change it.

> +
> +postgres=# SELECT * FROM postgres_fdw_disconnect('loopback1');
> + postgres_fdw_disconnect
>
> The tag  should start from the beginning.

+1. Will change it.

> As I commented upthread, what about replacing the example query with
> "SELECT postgres_fdw_disconnect('loopback1');" because it's more common?

Sorry, I forgot to check that in the documentation earlier. +1. Will change it.

With Regards,
Bharath Rupireddy.

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-01-22 Thread Fujii Masao





On 2021/01/22 3:29, Fujii Masao wrote:



On 2021/01/22 1:17, Bharath Rupireddy wrote:

On Thu, Jan 21, 2021 at 8:58 PM Fujii Masao  wrote:

My opinion is to check "!all", but if others prefer using such boolean flag,
I'd withdraw my opinion.


I'm really sorry, actually if (!all) is enough there, my earlier
understanding was wrong.


+   if ((all || entry->server_hashvalue == hashvalue) &&

What about making disconnect_cached_connections() accept serverid instead
of hashvalue, and perform the above comparison based on serverid? That is,
I'm thinking "if (all || entry->serverid == serverid)". If we do that, we can
simplify postgres_fdw_disconnect() a bit more by getting rid of the calculation
of hashvalue.


That's a good idea. I missed this point. Thanks.


+   if ((all || entry->server_hashvalue == hashvalue) &&
+    entry->conn)

I think that it's better to make the check of "entry->conn" independent
like other functions in postgres_fdw/connection.c. What about adding
the following check before the above?

 /* Ignore cache entry if no open connection right now */
 if (entry->conn == NULL)
 continue;


Done.


+   /*
+    * If the server has been dropped in the current explicit
+    * transaction, then this entry would have been invalidated
+    * in pgfdw_inval_callback at the end of drop sever
+    * command. Note that this connection would not have been
+    * closed in pgfdw_inval_callback because it is still being
+    * used in the current explicit transaction. So, assert
+    * that here.
+    */
+   Assert(entry->invalidated);

As this comment explains, even when the connection is used in the transaction,
its server can be dropped in the same transaction. The connection can remain
until the end of transaction even though its server has been already dropped.
I'm now wondering if this behavior itself is problematic and should be forbidden.
Of course, this is separate topic from this patch, though..

BTW, my rough idea for that is:
1. Change postgres_fdw_get_connections() to also return serverid and xact_depth.
2. Make postgres_fdw define an event trigger on the DROP SERVER command so that
  an error is thrown if the connection to the server is still in use.
  The event trigger function uses postgres_fdw_get_connections() to check
  whether the server connection is still in use or not.

I'm not sure if this idea is really feasible or not, though...


I'm not quite sure if we can create such a dependency i.e. blocking
"drop foreign server" when at least one session has an in use cached
connection on it?


Maybe my explanation was not clear... I was thinking to prevent a server
whose connection is used *within the current transaction* from being dropped.
IOW, I was thinking to forbid dropping a server if the xact_depth of its
connection is greater than zero. So one session can drop the server even when
its connection is open in another session, as long as that connection is not
used within a transaction (i.e., xact_depth == 0).

BTW, for now, if the connection is used within the transaction, other sessions
cannot drop the corresponding server because the transaction holds locks on
the relations that depend on the server. Only the session running that
transaction can drop the server. This is what causes the issue under discussion.

So, my idea is to disallow even the session running that transaction from
dropping the server. This means that no session can drop the server while its
connection is used within the transaction (xact_depth > 0).
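The condition this proposal boils down to can be sketched as a small predicate over the dropping session's own cache. This is a hypothetical illustration only: the struct and function names are made up, and the event trigger plus the extended postgres_fdw_get_connections() output (serverid, xact_depth) that the proposal relies on are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; not the postgres_fdw definitions. */
typedef unsigned int Oid;

typedef struct SketchConnEntry
{
    Oid     serverid;
    bool    has_conn;    /* stands in for entry->conn != NULL */
    int     xact_depth;  /* 0: open but idle; > 0: used in current xact */
} SketchConnEntry;

/*
 * Would DROP SERVER be rejected under the proposal? True if this
 * session's cache holds a connection to serverid that the current
 * transaction is using. Idle cached connections do not block the drop.
 */
static bool
sketch_drop_server_blocked(const SketchConnEntry *cache, size_t n, Oid serverid)
{
    assert(cache != NULL || n == 0);

    for (size_t i = 0; i < n; i++)
    {
        const SketchConnEntry *entry = &cache[i];

        /* One in-use connection is enough to block the drop. */
        if (entry->serverid == serverid && entry->has_conn &&
            entry->xact_depth > 0)
            return true;
    }
    return false;
}
```

Other sessions would not need to be consulted, because a session using the connection inside a transaction already holds locks that keep DROP SERVER from succeeding elsewhere.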



What if a user wants to drop a server from one session while all other
sessions, one after the other, keep having in-use connections to that
server (though this use case sounds impractical)? Will the DROP SERVER
ever succeed? Since we can have hundreds of sessions in a real-world
PostgreSQL environment, I don't know if it's a good idea to create such
a dependency.

As you suggested, this point can be discussed in a separate thread and
if any of the approaches proposed by you above is finalized we can
extend postgres_fdw_get_connections anytime.

Thoughts?


I will consider more before starting separate discussion!




Attaching the v16 patch set, which addresses the above review comments and
also adds a test case, suggested upthread, checking that
postgres_fdw_disconnect() with an existing server name returns false when
the cache doesn't have an active connection.

Please review the v16 patch set further.


Thanks! Will review that later.


+   /*
+* For the given server, if we closed connection or it

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-01-21 Thread Fujii Masao





On 2021/01/22 1:17, Bharath Rupireddy wrote:

On Thu, Jan 21, 2021 at 8:58 PM Fujii Masao  wrote:

My opinion is to check "!all", but if others prefer using such a boolean flag,
I'd withdraw my opinion.


I'm really sorry, actually if (!all) is enough there, my earlier
understanding was wrong.


+   if ((all || entry->server_hashvalue == hashvalue) &&

What about making disconnect_cached_connections() accept serverid instead
of hashvalue, and perform the above comparison based on serverid? That is,
I'm thinking "if (all || entry->serverid == serverid)". If we do that, we can
simplify postgres_fdw_disconnect() a bit more by getting rid of the calculation
of hashvalue.


That's a good idea. I missed this point. Thanks.


+   if ((all || entry->server_hashvalue == hashvalue) &&
+       entry->conn)

I think that it's better to make the check of "entry->conn" independent
like other functions in postgres_fdw/connection.c. What about adding
the following check before the above?

 /* Ignore cache entry if no open connection right now */
 if (entry->conn == NULL)
     continue;


Done.


+   /*
+    * If the server has been dropped in the current explicit
+    * transaction, then this entry would have been invalidated
+    * in pgfdw_inval_callback at the end of drop server
+    * command. Note that this connection would not have been
+    * closed in pgfdw_inval_callback because it is still being
+    * used in the current explicit transaction. So, assert
+    * that here.
+    */
+   Assert(entry->invalidated);

As this comment explains, even when the connection is used in the transaction,
its server can be dropped in the same transaction. The connection can remain
until the end of the transaction even though its server has already been dropped.
I'm now wondering if this behavior itself is problematic and should be forbidden.
Of course, this is a separate topic from this patch, though...

BTW, my rough idea for that is:
1. change postgres_fdw_get_connections() to also return serverid and xact_depth.
2. make postgres_fdw define an event trigger on the DROP SERVER command so that
  an error is thrown if the connection to the server is still in use.
  The event trigger function uses postgres_fdw_get_connections() to check
  whether the server connection is still in use.

I'm not sure whether this rough idea is really feasible, though...


I'm not quite sure we can create such a dependency, i.e., blocking
"drop foreign server" when at least one session has an in-use cached
connection to it?


Maybe my explanation was not clear... I was thinking of preventing the server
whose connection is used *within the current transaction* from being dropped.
IOW, I was thinking of forbidding the drop of a server if the xact_depth of its
connection is greater than zero. So a session can drop the server even when a
connection to it is open in another session, as long as that connection is not
used within the transaction (i.e., xact_depth == 0).

BTW, for now, if the connection is used within a transaction, other sessions
cannot drop the corresponding server, because the transaction holds locks on
the relations that depend on the server. Only the session running that
transaction can drop the server. This can cause the issue under discussion.

So, my rough idea is to disallow even the session running the transaction from
dropping the server. This means that no session can drop the server while its
connection is used within the transaction (xact_depth > 0).



What if a user wants to drop a server from one session while all the
other sessions, one after the other, keep having in-use connections to
that server (though this use case sounds impractical)? Will the drop
server ever succeed? Since we can have hundreds of sessions in a
real-world postgres environment, I don't know if it's a good idea to
create such a dependency.

As you suggested, this point can be discussed in a separate thread and
if any of the approaches proposed by you above is finalized we can
extend postgres_fdw_get_connections anytime.

Thoughts?


I will think about this more before starting a separate discussion!




Attaching the v16 patch set, which addresses the above review comments and
also adds the test case suggested upthread: postgres_fdw_disconnect() with
an existing server name returns false when the cache doesn't have an
active connection for it.

Please review the v16 patch set further.


Thanks! Will review that later.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-01-21 Thread Bharath Rupireddy

On Thu, Jan 21, 2021 at 8:58 PM Fujii Masao  wrote:
> My opinion is to check "!all", but if others prefer using such a boolean flag,
> I'd withdraw my opinion.

I'm really sorry, actually if (!all) is enough there, my earlier
understanding was wrong.

> +   if ((all || entry->server_hashvalue == hashvalue) &&
>
> What about making disconnect_cached_connections() accept serverid instead
> of hashvalue, and perform the above comparison based on serverid? That is,
> I'm thinking "if (all || entry->serverid == serverid)". If we do that, we can
> simplify postgres_fdw_disconnect() a bit more by getting rid of the calculation
> of hashvalue.

That's a good idea. I missed this point. Thanks.

> +   if ((all || entry->server_hashvalue == hashvalue) &&
> +       entry->conn)
>
> I think that it's better to make the check of "entry->conn" independent
> like other functions in postgres_fdw/connection.c. What about adding
> the following check before the above?
>
> /* Ignore cache entry if no open connection right now */
> if (entry->conn == NULL)
>     continue;

Done.

> +   /*
> +    * If the server has been dropped in the current explicit
> +    * transaction, then this entry would have been invalidated
> +    * in pgfdw_inval_callback at the end of drop server
> +    * command. Note that this connection would not have been
> +    * closed in pgfdw_inval_callback because it is still being
> +    * used in the current explicit transaction. So, assert
> +    * that here.
> +    */
> +   Assert(entry->invalidated);
>
> As this comment explains, even when the connection is used in the transaction,
> its server can be dropped in the same transaction. The connection can remain
> until the end of the transaction even though its server has already been dropped.
> I'm now wondering if this behavior itself is problematic and should be forbidden.
> Of course, this is a separate topic from this patch, though...
>
> BTW, my rough idea for that is:
> 1. change postgres_fdw_get_connections() to also return serverid and xact_depth.
> 2. make postgres_fdw define an event trigger on the DROP SERVER command so that
>  an error is thrown if the connection to the server is still in use.
>  The event trigger function uses postgres_fdw_get_connections() to check
>  whether the server connection is still in use.
>
> I'm not sure whether this rough idea is really feasible, though...

I'm not quite sure we can create such a dependency, i.e., blocking
"drop foreign server" when at least one session has an in-use cached
connection to it. What if a user wants to drop a server from one
session while all the other sessions, one after the other, keep having
in-use connections to that server (though this use case sounds
impractical)? Will the drop server ever succeed? Since we can have
hundreds of sessions in a real-world postgres environment, I don't
know if it's a good idea to create such a dependency.

As you suggested, this point can be discussed in a separate thread and
if any of the approaches proposed by you above is finalized we can
extend postgres_fdw_get_connections anytime.

Thoughts?

Attaching the v16 patch set, which addresses the above review comments and
also adds the test case suggested upthread: postgres_fdw_disconnect() with
an existing server name returns false when the cache doesn't have an
active connection for it.

Please review the v16 patch set further.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

v16-0001-postgres_fdw-function-to-discard-cached-connecti.patch
Description: Binary data

v16-0002-postgres_fdw-add-keep_connections-GUC-to-not-cac.patch
Description: Binary data

v16-0003-postgres_fdw-server-level-option-keep_connection.patch
Description: Binary data

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-01-21 Thread Fujii Masao





On 2021/01/21 16:16, Bharath Rupireddy wrote:

On Thu, Jan 21, 2021 at 12:17 PM Fujii Masao
 wrote:

On 2021/01/21 14:46, Bharath Rupireddy wrote:

On Thu, Jan 21, 2021 at 10:06 AM Fujii Masao
 wrote:
+   if (entry->server_hashvalue == hashvalue &&
+       (entry->xact_depth > 0 || result))
+   {
+       hash_seq_term(&scan);
+       break;

Can entry->server_hashvalue be 0? If so, since postgres_fdw_disconnect_all()
specifies 0 as hashvalue, ISTM that the above condition can be true
unexpectedly. Can we replace this condition with just "if (!all)"?


I don't think entry->server_hashvalue can be zero, because
GetSysCacheHashValue1/CatalogCacheComputeHashValue will not return 0
as a hash value. I have not seen anyone compare a stored hash value
with the expectation that it is 0; for instance, see if (hashvalue == 0
|| riinfo->oidHashValue == hashvalue).

Having just if (!all) there doesn't suffice, because we
might call hash_seq_term when some connection other than the given
foreign server's connection is in use.


No because we check the following condition before reaching that code. No?

+   if ((all || entry->server_hashvalue == hashvalue) &&


I was thinking that the "(entry->xact_depth > 0 || result))" condition is not
necessary, because "result" is set to true when xact_depth <= 0, so that
condition always evaluates to true.


I think that condition is too confusing. How about having a boolean
can_terminate_scan like below?


Thanks for thinking about this. But at least for me, "if (!all)" doesn't look
so confusing. And the comment seems to explain why we can end the scan.


May I know if it's okay to have the boolean can_terminate_scan as shown in [1]?


My opinion is to check "!all", but if others prefer using such a boolean flag,
I'd withdraw my opinion.

+   if ((all || entry->server_hashvalue == hashvalue) &&

What about making disconnect_cached_connections() accept serverid instead
of hashvalue, and perform the above comparison based on serverid? That is,
I'm thinking "if (all || entry->serverid == serverid)". If we do that, we can
simplify postgres_fdw_disconnect() a bit more by getting rid of the calculation
of hashvalue.

+   if ((all || entry->server_hashvalue == hashvalue) &&
+       entry->conn)

I think that it's better to make the check of "entry->conn" independent
like other functions in postgres_fdw/connection.c. What about adding
the following check before the above?

/* Ignore cache entry if no open connection right now */
if (entry->conn == NULL)
    continue;
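To make the two suggestions above concrete, here is a minimal standalone sketch of a serverid-based disconnect loop with the independent entry->conn check, plus the early "return false" when the cache does not exist yet (the ConnectionHash suggestion made elsewhere in this thread). It does not use PostgreSQL headers: CacheEntry, disconnect_cached_connections_sketch() and the array-based cache are hypothetical stand-ins for ConnCacheEntry, disconnect_cached_connections() and the dynahash table, and clearing conn merely stands in for disconnect_pg_server().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Hypothetical stand-in for postgres_fdw's ConnCacheEntry; only the
 * fields discussed in the review are modeled.
 */
typedef struct
{
    unsigned int serverid;   /* stands in for the foreign server's Oid */
    void        *conn;       /* NULL when no connection is open */
    int          xact_depth; /* > 0 while used in the current transaction */
} CacheEntry;

/*
 * Sketch of disconnect_cached_connections() comparing serverid instead of
 * hashvalue.  cache == NULL plays the role of ConnectionHash == NULL.
 * Returns true if at least one connection was closed.
 */
static bool
disconnect_cached_connections_sketch(CacheEntry *cache, size_t nentries,
                                     unsigned int serverid, bool all)
{
    bool        result = false;

    /* Cache not initialized yet in this session: nothing to close. */
    if (cache == NULL)
        return false;

    for (size_t i = 0; i < nentries; i++)
    {
        CacheEntry *entry = &cache[i];

        /* Ignore cache entry if no open connection right now */
        if (entry->conn == NULL)
            continue;

        if (all || entry->serverid == serverid)
        {
            if (entry->xact_depth > 0)
                fprintf(stderr,
                        "WARNING: connection for server %u is still in use\n",
                        entry->serverid);
            else
            {
                entry->conn = NULL; /* stands in for disconnect_pg_server() */
                result = true;
            }
        }
    }
    return result;
}
```

This sketch deliberately scans the whole table instead of stopping early; for a per-session cache the extra iterations are cheap, and the result does not depend on how many cache entries refer to the same server.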

+   /*
+    * If the server has been dropped in the current explicit
+    * transaction, then this entry would have been invalidated
+    * in pgfdw_inval_callback at the end of drop server
+    * command. Note that this connection would not have been
+    * closed in pgfdw_inval_callback because it is still being
+    * used in the current explicit transaction. So, assert
+    * that here.
+    */
+   Assert(entry->invalidated);

As this comment explains, even when the connection is used in the transaction,
its server can be dropped in the same transaction. The connection can remain
until the end of the transaction even though its server has already been dropped.
I'm now wondering if this behavior itself is problematic and should be forbidden.
Of course, this is a separate topic from this patch, though...

BTW, my rough idea for that is:
1. change postgres_fdw_get_connections() to also return serverid and xact_depth.
2. make postgres_fdw define an event trigger on the DROP SERVER command so that
   an error is thrown if the connection to the server is still in use.
   The event trigger function uses postgres_fdw_get_connections() to check
   whether the server connection is still in use.

I'm not sure whether this rough idea is really feasible, though...
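The check that such an event trigger would perform can be sketched as the standalone C below. Everything in it is hypothetical: ConnInfoRow models one row of a postgres_fdw_get_connections() that also reports serverid and xact_depth (which item 1 proposes adding), and drop_server_allowed() models the decision the trigger function would make before raising an error. Because postgres_fdw_get_connections() only lists the current session's cached connections, connections held by other sessions never block the drop in this model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical row of an extended postgres_fdw_get_connections() that
 * also reports serverid and xact_depth (item 1 of the proposal).
 */
typedef struct
{
    unsigned int serverid;
    int          xact_depth;
} ConnInfoRow;

/*
 * Hypothetical check for the DROP SERVER event trigger (item 2): refuse
 * the drop if this session's connection to the server is used within the
 * current transaction (xact_depth > 0).  Only the current session's
 * connections are listed, so other sessions never block the drop.
 */
static bool
drop_server_allowed(const ConnInfoRow *rows, size_t nrows,
                    unsigned int serverid)
{
    for (size_t i = 0; i < nrows; i++)
    {
        if (rows[i].serverid == serverid && rows[i].xact_depth > 0)
            return false;   /* the trigger would raise an ERROR here */
    }
    return true;
}
```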

Regards,


--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

On Thu, Jan 21, 2021 at 12:17 PM Fujii Masao
 wrote:
> On 2021/01/21 14:46, Bharath Rupireddy wrote:
> > On Thu, Jan 21, 2021 at 10:06 AM Fujii Masao
> >  wrote:
> >>>> +   if (entry->server_hashvalue == hashvalue &&
> >>>> +       (entry->xact_depth > 0 || result))
> >>>> +   {
> >>>> +       hash_seq_term(&scan);
> >>>> +       break;
> >>>>
> >>>> Can entry->server_hashvalue be 0? If so, since postgres_fdw_disconnect_all()
> >>>> specifies 0 as hashvalue, ISTM that the above condition can be true
> >>>> unexpectedly. Can we replace this condition with just "if (!all)"?
> >>>
> >>> I don't think entry->server_hashvalue can be zero, because
> >>> GetSysCacheHashValue1/CatalogCacheComputeHashValue will not return 0
> >>> as a hash value. I have not seen anyone compare a stored hash value
> >>> with the expectation that it is 0; for instance, see if (hashvalue == 0
> >>> || riinfo->oidHashValue == hashvalue).
> >>>
> >>> Having just if (!all) there doesn't suffice, because we
> >>> might call hash_seq_term when some connection other than the given
> >>> foreign server's connection is in use.
> >>
> >> No because we check the following condition before reaching that code. No?
> >>
> >> +   if ((all || entry->server_hashvalue == hashvalue) &&
> >>
> >>
> >> I was thinking that the "(entry->xact_depth > 0 || result))" condition is not
> >> necessary, because "result" is set to true when xact_depth <= 0, so that
> >> condition always evaluates to true.
> >
> > I think that condition is too confusing. How about having a boolean
> > can_terminate_scan like below?
>
> Thanks for thinking about this. But at least for me, "if (!all)" doesn't look
> so confusing. And the comment seems to explain why we can end the scan.

May I know if it's okay to have the boolean can_terminate_scan as shown in [1]?

[1] - 
https://www.postgresql.org/message-id/flat/CALj2ACVx0%2BiOsrAA-wXbo3RLAKqUoNvvEd7foJ0vLwOdu8XjXw%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit





On 2021/01/21 14:46, Bharath Rupireddy wrote:

On Thu, Jan 21, 2021 at 10:06 AM Fujii Masao
 wrote:
+   if (entry->server_hashvalue == hashvalue &&
+       (entry->xact_depth > 0 || result))
+   {
+       hash_seq_term(&scan);
+       break;

Can entry->server_hashvalue be 0? If so, since postgres_fdw_disconnect_all()
specifies 0 as hashvalue, ISTM that the above condition can be true
unexpectedly. Can we replace this condition with just "if (!all)"?


I don't think entry->server_hashvalue can be zero, because
GetSysCacheHashValue1/CatalogCacheComputeHashValue will not return 0
as a hash value. I have not seen anyone compare a stored hash value
with the expectation that it is 0; for instance, see if (hashvalue == 0
|| riinfo->oidHashValue == hashvalue).

Having just if (!all) there doesn't suffice, because we
might call hash_seq_term when some connection other than the given
foreign server's connection is in use.


No because we check the following condition before reaching that code. No?

+   if ((all || entry->server_hashvalue == hashvalue) &&


I was thinking that the "(entry->xact_depth > 0 || result))" condition is not
necessary, because "result" is set to true when xact_depth <= 0, so that
condition always evaluates to true.


I think that condition is too confusing. How about having a boolean
can_terminate_scan like below?


Thanks for thinking about this. But at least for me, "if (!all)" doesn't look
so confusing. And the comment seems to explain why we can end the scan.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

RE: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-01-20 Thread Hou, Zhijie

> > > Attaching v15 patch set. Please consider it for further review.
> >
> > Hi
> >
> > I have some comments for the 0001 patch
> >
> > In v15-0001-postgres_fdw-function-to-discard-cached-connecti
> >
> > 1.
> > +  If there is no open connection to the given foreign server, false
> > +  is returned. If no foreign server with the given name is found, an error
> >
> > Do you think it would be better to add some test cases that
> > call postgres_fdw_disconnect and postgres_fdw_disconnect_all
> > when there is no open connection to the given foreign server?
> 
> Do you mean a test case where the foreign server exists but
> postgres_fdw_disconnect() returns false because there's no connection for
> that server?


Yes, I read this from the doc, so I think it's better to test this.




> > 2.
> > +   /*
> > +    * For the given server, if we closed connection or it is still in
> > +    * use, then no need of scanning the cache further.
> > +    */
> > +   if (entry->server_hashvalue == hashvalue &&
> > +       (entry->xact_depth > 0 || result))
> > +   {
> > +       hash_seq_term(&scan);
> > +       break;
> > +   }
> >
> > If I'm not mistaken, isn't the following condition always true?
> > (entry->xact_depth > 0 || result)
> 
> It's not always true, but it does seem too confusing; please have
> a look at the upthread suggestion to replace it with a can_terminate_scan
> boolean.

Thanks for the reminder; I will look at that.



Best regards,
houzj

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

On Thu, Jan 21, 2021 at 11:15 AM Hou, Zhijie  wrote:
>
> > Attaching v15 patch set. Please consider it for further review.
>
> Hi
>
> I have some comments for the 0001 patch
>
> In v15-0001-postgres_fdw-function-to-discard-cached-connecti
>
> 1.
> +  If there is no open connection to the given foreign server, false
> +  is returned. If no foreign server with the given name is found, an error
>
> Do you think it would be better to add some test cases that
> call postgres_fdw_disconnect and postgres_fdw_disconnect_all when
> there is no open connection to the given foreign server?

Do you mean a test case where the foreign server exists but
postgres_fdw_disconnect() returns false because there's no connection
for that server?

> 2.
> +   /*
> +    * For the given server, if we closed connection or it is still in
> +    * use, then no need of scanning the cache further.
> +    */
> +   if (entry->server_hashvalue == hashvalue &&
> +       (entry->xact_depth > 0 || result))
> +   {
> +       hash_seq_term(&scan);
> +       break;
> +   }
>
> If I'm not mistaken, isn't the following condition always true?
> (entry->xact_depth > 0 || result)

It's not always true, but it does seem too confusing; please
have a look at the upthread suggestion to replace it with a
can_terminate_scan boolean.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

On Thu, Jan 21, 2021 at 10:06 AM Fujii Masao
 wrote:
> >> +   if (entry->server_hashvalue == hashvalue &&
> >> +       (entry->xact_depth > 0 || result))
> >> +   {
> >> +       hash_seq_term(&scan);
> >> +       break;
> >>
> >> Can entry->server_hashvalue be 0? If so, since postgres_fdw_disconnect_all()
> >> specifies 0 as hashvalue, ISTM that the above condition can be true
> >> unexpectedly. Can we replace this condition with just "if (!all)"?
> >
> > I don't think entry->server_hashvalue can be zero, because
> > GetSysCacheHashValue1/CatalogCacheComputeHashValue will not return 0
> > as a hash value. I have not seen anyone compare a stored hash value
> > with the expectation that it is 0; for instance, see if (hashvalue == 0
> > || riinfo->oidHashValue == hashvalue).
> >
> > Having just if (!all) there doesn't suffice, because we
> > might call hash_seq_term when some connection other than the given
> > foreign server's connection is in use.
>
> No because we check the following condition before reaching that code. No?
>
> +   if ((all || entry->server_hashvalue == hashvalue) &&
>
>
> I was thinking that the "(entry->xact_depth > 0 || result))" condition is not
> necessary, because "result" is set to true when xact_depth <= 0, so that
> condition always evaluates to true.

I think that condition is too confusing. How about having a boolean
can_terminate_scan like below?

while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
{
    bool        can_terminate_scan = false;

    /*
     * Either disconnect given or all the active and not in use cached
     * connections.
     */
    if ((all || entry->server_hashvalue == hashvalue) &&
        entry->conn)
    {
        /* We cannot close connection that's in use, so issue a warning. */
        if (entry->xact_depth > 0)
        {
            ForeignServer *server;

            if (!all)
                can_terminate_scan = true;

            server = GetForeignServerExtended(entry->serverid,
                                              FSV_MISSING_OK);

            if (!server)
            {
                /*
                 * If the server has been dropped in the current explicit
                 * transaction, then this entry would have been invalidated
                 * in pgfdw_inval_callback at the end of drop server
                 * command. Note that this connection would not have been
                 * closed in pgfdw_inval_callback because it is still being
                 * used in the current explicit transaction. So, assert
                 * that here.
                 */
                Assert(entry->invalidated);

                ereport(WARNING,
                        (errmsg("cannot close dropped server connection because it is still in use")));
            }
            else
                ereport(WARNING,
                        (errmsg("cannot close connection for server \"%s\" because it is still in use",
                                server->servername)));
        }
        else
        {
            elog(DEBUG3, "discarding connection %p", entry->conn);
            disconnect_pg_server(entry);
            result = true;

            if (!all)
                can_terminate_scan = true;
        }

        /*
         * For the given server, if we closed connection or it is still in
         * use, then no need of scanning the cache further.
         */
        if (can_terminate_scan)
        {
            hash_seq_term(&scan);
            break;
        }
    }
}
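In PostgreSQL's dynahash, a sequential scan started with hash_seq_init() and abandoned before hash_seq_search() returns NULL should be ended with hash_seq_term(); that is why the loop above pairs hash_seq_term(&scan) with break. The toy table below only models that bookkeeping, as a sketch: ToyTable, ToyScan and the toy_seq_* functions are hypothetical and are not the dynahash API. Each scan registers itself, running to the end unregisters it, and an early exit must unregister explicitly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model (not the dynahash API): a table that counts active scans. */
typedef struct
{
    const int *items;
    size_t     nitems;
    int        active_scans;
} ToyTable;

typedef struct
{
    ToyTable *table;
    size_t    pos;
    bool      registered;
} ToyScan;

static void
toy_seq_init(ToyScan *scan, ToyTable *table)
{
    scan->table = table;
    scan->pos = 0;
    scan->registered = true;
    table->active_scans++;      /* like registering a sequential scan */
}

static void
toy_seq_term(ToyScan *scan)
{
    if (scan->registered)
    {
        scan->table->active_scans--;
        scan->registered = false;
    }
}

/* Returns NULL, and unregisters the scan, once the table is exhausted. */
static const int *
toy_seq_search(ToyScan *scan)
{
    if (scan->pos < scan->table->nitems)
        return &scan->table->items[scan->pos++];
    toy_seq_term(scan);
    return NULL;
}

/* Stops at the first match, mirroring "hash_seq_term(&scan); break;". */
static bool
toy_find(ToyTable *table, int wanted)
{
    ToyScan     scan;
    const int  *item;
    bool        found = false;

    toy_seq_init(&scan, table);
    while ((item = toy_seq_search(&scan)) != NULL)
    {
        if (*item == wanted)
        {
            found = true;
            toy_seq_term(&scan);    /* early exit: end the scan explicitly */
            break;
        }
    }
    return found;
}
```

Without the toy_seq_term() call before break, active_scans would stay at 1 after a successful lookup in this model.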

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

RE: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-01-20 Thread Hou, Zhijie

> Attaching v15 patch set. Please consider it for further review.

Hi

I have some comments for the 0001 patch

In v15-0001-postgres_fdw-function-to-discard-cached-connecti

1.
+  If there is no open connection to the given foreign server, false
+  is returned. If no foreign server with the given name is found, an error

Do you think it would be better to add some test cases that
call postgres_fdw_disconnect and postgres_fdw_disconnect_all when there
is no open connection to the given foreign server?

2.
+   /*
+    * For the given server, if we closed connection or it is still in
+    * use, then no need of scanning the cache further.
+    */
+   if (entry->server_hashvalue == hashvalue &&
+       (entry->xact_depth > 0 || result))
+   {
+       hash_seq_term(&scan);
+       break;
+   }

If I'm not mistaken, isn't the following condition always true?
(entry->xact_depth > 0 || result)

Best regards,
houzj

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit





On 2021/01/21 12:00, Bharath Rupireddy wrote:

On Wed, Jan 20, 2021 at 6:58 PM Fujii Masao  wrote:

+ * It checks if the cache has a connection for the given foreign server that's
+ * not being used within current transaction, then returns true. If the
+ * connection is in use, then it emits a warning and returns false.

The comment also should mention the case where no open connection
for the given server is found? What about rewriting this to the following?

-
If the cached connection for the given foreign server is found and has not
been used within current transaction yet, close the connection and return
true. Even when it's found, if it's already used, keep the connection, emit
a warning and return false. If it's not found, return false.
-


Done.


+ * It returns true, if it closes at least one connection, otherwise false.
+ *
+ * It returns false, if the cache doesn't exist.

The above second comment looks redundant.


Yes. "otherwise false" means it.


+   if (ConnectionHash)
+   result = disconnect_cached_connections(0, true);

Isn't it smarter to make disconnect_cached_connections() check
ConnectionHash and return false if it's NULL? If we do that, we can
simplify the code of postgres_fdw_disconnect() and _all().


Done.


+ * current transaction are disconnected. Otherwise, the unused entries with the
+ * given hashvalue are disconnected.

Shouldn't the second comment above use the singular form instead?
There cannot be multiple entries with the given hashvalue.


Rephrased the function comment a bit. Mentioned _disconnect and
_disconnect_all in the comments, because we have already said enough
about what each of those two functions does.

+/*
+ * Workhorse to disconnect cached connections.
+ *
+ * This function disconnects either all unused connections when called from
+ * postgres_fdw_disconnect_all or a given foreign server unused connection when
+ * called from postgres_fdw_disconnect.
+ *
+ * This function returns true if at least one connection is disconnected,
+ * otherwise false.
+ */


+   server = GetForeignServer(entry->serverid);

This seems to cause an error "cache lookup failed"
if postgres_fdw_disconnect_all() is called when there is
a connection in use but its server is dropped. To avoid this error,
GetForeignServerExtended() with FSV_MISSING_OK should be used
instead, like postgres_fdw_get_connections() does?


+1. So, I changed it to GetForeignServerExtended and added an assertion
for invalidation, just like postgres_fdw_get_connections. I also added
a test case for this; we now emit a slightly different warning for
this case alone, that is
(errmsg("cannot close dropped server connection because it is still in use")));.
This warning looks okay, as we cannot show a server name in the output,
and we know that this rare case can occur when someone drops the server
in an explicit transaction.
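The warning selection described above can be sketched in standalone C as follows. ForeignServerStub, EntryStub, lookup_server_missing_ok() and in_use_warning() are hypothetical: the lookup returns NULL for a missing server, the way GetForeignServerExtended() with FSV_MISSING_OK does, instead of failing with "cache lookup failed" as GetForeignServer() would. The two message texts are copied from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for ForeignServer and the cache entry. */
typedef struct
{
    unsigned int serverid;
    const char  *servername;
} ForeignServerStub;

typedef struct
{
    unsigned int serverid;
    bool         invalidated;
} EntryStub;

/* Returns NULL for a dropped server rather than raising an error. */
static const ForeignServerStub *
lookup_server_missing_ok(const ForeignServerStub *catalog, int n,
                         unsigned int serverid)
{
    for (int i = 0; i < n; i++)
        if (catalog[i].serverid == serverid)
            return &catalog[i];
    return NULL;
}

/* Formats the warning for an in-use connection that cannot be closed. */
static void
in_use_warning(const EntryStub *entry, const ForeignServerStub *catalog,
               int n, char *buf, size_t buflen)
{
    const ForeignServerStub *server =
        lookup_server_missing_ok(catalog, n, entry->serverid);

    if (server == NULL)
    {
        /* Dropped in this transaction, so the entry must be invalidated. */
        assert(entry->invalidated);
        snprintf(buf, buflen,
                 "cannot close dropped server connection because it is still in use");
    }
    else
        snprintf(buf, buflen,
                 "cannot close connection for server \"%s\" because it is still in use",
                 server->servername);
}
```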


+   if (entry->server_hashvalue == hashvalue &&
+       (entry->xact_depth > 0 || result))
+   {
+       hash_seq_term(&scan);
+       break;

Can entry->server_hashvalue be 0? If so, since postgres_fdw_disconnect_all()
specifies 0 as hashvalue, ISTM that the above condition can be true
unexpectedly. Can we replace this condition with just "if (!all)"?


I don't think entry->server_hashvalue can be zero, because
GetSysCacheHashValue1/CatalogCacheComputeHashValue will not return 0
as a hash value. I have not seen anyone compare a stored hash value
with the expectation that it is 0; for instance, see if (hashvalue == 0
|| riinfo->oidHashValue == hashvalue).

Having just if (!all) there doesn't suffice, because we
might call hash_seq_term when some connection other than the given
foreign server's connection is in use.


No because we check the following condition before reaching that code. No?

+   if ((all || entry->server_hashvalue == hashvalue) &&


I was thinking that the "(entry->xact_depth > 0 || result))" condition is not
necessary, because "result" is set to true when xact_depth <= 0, so that
condition always evaluates to true.
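Under simplifying assumptions, the argument above can be checked with a small standalone model. old_stops() and new_stops() are hypothetical functions that each process one cache entry: the entry is selected by "(all || entry_hash == hashvalue)", an unused connection (xact_depth <= 0) is "closed" by setting *result, and the return value says whether the scan would end after this entry. In single-server mode the old tail "(xact_depth > 0 || result)" always holds for a selected entry, so the old test reduces to !all there; in all mode with hashvalue 0, the old test would end the scan only if some entry's hash were 0, which is the hypothetical case raised above.

```c
#include <assert.h>
#include <stdbool.h>

/* Old test from the patch: entry_hash == hashvalue && (xact_depth > 0 || result) */
static bool
old_stops(bool all, unsigned int hashvalue, unsigned int entry_hash,
          int xact_depth, bool *result)
{
    if (!(all || entry_hash == hashvalue))
        return false;               /* entry not selected */
    if (xact_depth <= 0)
        *result = true;             /* connection "closed" */
    return entry_hash == hashvalue && (xact_depth > 0 || *result);
}

/* Proposed test: just !all. */
static bool
new_stops(bool all, unsigned int hashvalue, unsigned int entry_hash,
          int xact_depth, bool *result)
{
    if (!(all || entry_hash == hashvalue))
        return false;               /* entry not selected */
    if (xact_depth <= 0)
        *result = true;             /* connection "closed" */
    return !all;
}
```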

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

On Wed, Jan 20, 2021 at 6:58 PM Fujii Masao  wrote:
> + * It checks if the cache has a connection for the given foreign server that's
> + * not being used within current transaction, then returns true. If the
> + * connection is in use, then it emits a warning and returns false.
>
> The comment also should mention the case where no open connection
> for the given server is found? What about rewriting this to the following?
>
> -
> If the cached connection for the given foreign server is found and has not
> been used within current transaction yet, close the connection and return
> true. Even when it's found, if it's already used, keep the connection, emit
> a warning and return false. If it's not found, return false.
> -

Done.

> + * It returns true, if it closes at least one connection, otherwise false.
> + *
> + * It returns false, if the cache doesn't exist.
>
> The above second comment looks redundant.

Yes. "otherwise false" means it.

> +   if (ConnectionHash)
> +   result = disconnect_cached_connections(0, true);
>
> Isn't it smarter to make disconnect_cached_connections() check
> ConnectionHash and return false if it's NULL? If we do that, we can
> simplify the code of postgres_fdw_disconnect() and _all().

Done.

> + * current transaction are disconnected. Otherwise, the unused entries with the
> + * given hashvalue are disconnected.
>
> Shouldn't the second comment above use the singular form instead?
> There cannot be multiple entries with the given hashvalue.

Rephrased the function comment a bit. Mentioned _disconnect and
_disconnect_all in the comments, because we have already said enough
about what each of those two functions does.

+/*
+ * Workhorse to disconnect cached connections.
+ *
+ * This function disconnects either all unused connections when called from
+ * postgres_fdw_disconnect_all or a given foreign server unused connection when
+ * called from postgres_fdw_disconnect.
+ *
+ * This function returns true if at least one connection is disconnected,
+ * otherwise false.
+ */

> +   server = GetForeignServer(entry->serverid);
>
> This seems to cause an error "cache lookup failed"
> if postgres_fdw_disconnect_all() is called when there is
> a connection in use but its server is dropped. To avoid this error,
> GetForeignServerExtended() with FSV_MISSING_OK should be used
> instead, like postgres_fdw_get_connections() does?

+1. So, I changed it to GetForeignServerExtended and added an assertion
for invalidation, just like postgres_fdw_get_connections. I also added
a test case for this; we now emit a slightly different warning for
this case alone, that is
(errmsg("cannot close dropped server connection because it is still in use")));.
This warning looks okay, as we cannot show a server name in the output,
and we know that this rare case can occur when someone drops the server
in an explicit transaction.

> +   if (entry->server_hashvalue == hashvalue &&
> +       (entry->xact_depth > 0 || result))
> +   {
> +       hash_seq_term(&scan);
> +       break;
>
> Can entry->server_hashvalue be 0? If so, since postgres_fdw_disconnect_all()
> specifies 0 as hashvalue, ISTM that the above condition can be true
> unexpectedly. Can we replace this condition with just "if (!all)"?

I don't think entry->server_hashvalue can be zero, because
GetSysCacheHashValue1/CatalogCacheComputeHashValue will not return 0
as a hash value. I have not seen anyone compare a stored hash value
with the expectation that it is 0; for instance, see if (hashvalue == 0
|| riinfo->oidHashValue == hashvalue).

Having only a check like the one below there doesn't suffice, because we
might call hash_seq_term when some connection other than the given
foreign server's connection is in use. We intend to call
hash_seq_term only when the given server is found and its connection
is either in use or closed.

if (!all && (entry->xact_depth > 0 || result))
{
    hash_seq_term(&scan);
    break;
}

Given the above points, the existing check looks good to me.

> +-- Closes loopback connection, returns true and issues a warning as loopback2
> +-- connection is still in use and can not be closed.
> +SELECT * FROM postgres_fdw_disconnect_all();
> +WARNING:  cannot close connection for server "loopback2" because it is still in use
> + postgres_fdw_disconnect_all
> +-
> + t
> +(1 row)
>
> After the above test, isn't it better to call postgres_fdw_get_connections()
> to check that loopback is not output?

+1.

> +WARNING:  cannot close connection for server "loopback" because it is still in use
> +WARNING:  cannot close connection for server "loopback2" because it is still in use
>
> Just in case, please let me confirm that the order of these warning
> messages is always stable?

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

On 2021/01/20 19:17, Bharath Rupireddy wrote:

On Wed, Jan 20, 2021 at 3:24 PM Fujii Masao  wrote:

Keeping above in mind, I feel we can do hash_seq_search(), as we do
currently, even when the server name is given as input. This way, we
don't need to bother much on the above points.

Thoughts?


Thanks for explaining this! You're right. I'd withdraw my suggestion.


Attaching v14 patch set with review comments addressed. Please review
it further.


Thanks for updating the patch!

+ * It checks if the cache has a connection for the given foreign server that's
+ * not being used within current transaction, then returns true. If the
+ * connection is in use, then it emits a warning and returns false.

The comment also should mention the case where no open connection
for the given server is found? What about rewriting this to the following?

-
If the cached connection for the given foreign server is found and has not
been used within current transaction yet, close the connection and return
true. Even when it's found, if it's already used, keep the connection, emit
a warning and return false. If it's not found, return false.
-

+ * It returns true, if it closes at least one connection, otherwise false.
+ *
+ * It returns false, if the cache doesn't exit.

The above second comment looks redundant.

+   if (ConnectionHash)
+   result = disconnect_cached_connections(0, true);

Isn't it smarter to make disconnect_cached_connections() check
ConnectionHash and return false if it's NULL? If we do that, we can
simplify the code of postgres_fdw_disconnect() and _all().

+ * current transaction are disconnected. Otherwise, the unused entries with the
+ * given hashvalue are disconnected.

Shouldn't the above second comment use the singular form instead?
There cannot be multiple entries with the given hashvalue.

+   server = GetForeignServer(entry->serverid);

This seems to cause an error "cache lookup failed"
if postgres_fdw_disconnect_all() is called when there is
a connection in use but its server is dropped. To avoid this error,
GetForeignServerExtended() with FSV_MISSING_OK should be used
instead, like postgres_fdw_get_connections() does?

+   if (entry->server_hashvalue == hashvalue &&
+   (entry->xact_depth > 0 || result))
+   {
+   hash_seq_term(&scan);
+   break;

entry->server_hashvalue can be 0? If yes, since postgres_fdw_disconnect_all()
specifies 0 as hashvalue, ISTM that the above condition can be true
unexpectedly. Can we replace this condition with just "if (!all)"?

+-- Closes loopback connection, returns true and issues a warning as loopback2
+-- connection is still in use and can not be closed.
+SELECT * FROM postgres_fdw_disconnect_all();
+WARNING:  cannot close connection for server "loopback2" because it is still in use
+ postgres_fdw_disconnect_all
+-
+ t
+(1 row)

After the above test, isn't it better to call postgres_fdw_get_connections()
to check that loopback is not output?

+WARNING:  cannot close connection for server "loopback" because it is still in use
+WARNING:  cannot close connection for server "loopback2" because it is still in use

Just in case, please let me confirm that the order of these warning
messages is always stable?

+   
+postgres_fdw_disconnect(IN servername text) returns boolean

I think that "IN" of "IN servername text" is not necessary.

I'd like to replace "servername" with "server_name" because
postgres_fdw_get_connections() uses "server_name" as the output
column name.

+
+ 
+  When called in local session with foreign server name as input, it
+  discards the unused open connection previously made to the foreign server
+  and returns true.

"unused open connection" sounds confusing to me. What about the following?

-
This function discards the open connection that postgres_fdw established
from the local session to the foreign server with the given name if it's not
used in the current local transaction yet, and then returns true. If it's
already used, the function doesn't discard the connection, emits
a warning and then returns false. If there is no open connection to
the given foreign server, false is returned. If no foreign server with
the given name is found, an error is emitted. Example usage of the function:
-
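The four cases in the proposed wording above can be modelled roughly as follows. This is only a sketch with hypothetical names: a string array stands in for the foreign-server catalog, an array stands in for the connection cache, a NULL cache models ConnectionHash not having been created yet (in line with the suggestion above that the helper return false in that case), and DISCONNECT_ERROR stands in for ereport(ERROR). Like the thread, it assumes at most one cache entry per server.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical model of one cached connection, not postgres_fdw's struct. */
typedef struct
{
    const char *servername;
    bool        has_conn;    /* models entry->conn != NULL */
    int         xact_depth;  /* > 0: used in the current local transaction */
} CachedConn;

typedef enum
{
    DISCONNECT_FALSE,           /* function returns false */
    DISCONNECT_TRUE,            /* function returns true */
    DISCONNECT_ERROR            /* models ereport(ERROR, ...) */
} DisconnectResult;

static DisconnectResult
model_disconnect(const char *const *servers, size_t nservers,
                 CachedConn *cache, size_t ncache, const char *name)
{
    bool server_exists = false;

    assert(name != NULL);

    /* Unknown server name: always an error, whatever the cache holds. */
    for (size_t i = 0; i < nservers; i++)
    {
        if (strcmp(servers[i], name) == 0)
            server_exists = true;
    }
    if (!server_exists)
        return DISCONNECT_ERROR;

    /* No cache yet means no open connection to close. */
    if (cache == NULL)
        return DISCONNECT_FALSE;

    for (size_t i = 0; i < ncache; i++)
    {
        CachedConn *c = &cache[i];

        if (!c->has_conn || strcmp(c->servername, name) != 0)
            continue;

        /* Already used in this transaction: keep it and warn. */
        if (c->xact_depth > 0)
        {
            fprintf(stderr, "WARNING:  cannot close connection for server \"%s\" "
                    "because it is still in use\n", name);
            return DISCONNECT_FALSE;
        }

        /* Not used yet: close it. */
        c->has_conn = false;
        return DISCONNECT_TRUE;
    }

    /* The server exists but has no open connection. */
    return DISCONNECT_FALSE;
}
```

Checking the server name before looking at the cache is what makes an unknown name an error regardless of whether any connections are open.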

+postgres=# SELECT * FROM postgres_fdw_disconnect('loopback1');

"SELECT postgres_fdw_disconnect('loopback1')" is more common?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

On Wed, Jan 20, 2021 at 3:24 PM Fujii Masao  wrote:
> > Keeping above in mind, I feel we can do hash_seq_search(), as we do
> > currently, even when the server name is given as input. This way, we
> > don't need to bother much on the above points.
> >
> > Thoughts?
>
> Thanks for explaining this! You're right. I'd withdraw my suggestion.

Attaching v14 patch set with review comments addressed. Please review
it further.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com


v14-0001-postgres_fdw-function-to-discard-cached-connecti.patch


v14-0002-postgres_fdw-add-keep_connections-GUC-to-not-cac.patch


v14-0003-postgres_fdw-server-level-option-keep_connection.patch

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

On 2021/01/20 17:41, Bharath Rupireddy wrote:

On Wed, Jan 20, 2021 at 11:53 AM Fujii Masao
 wrote:

So, furthermore, we can use hash_search() to find the target cached
connection, instead of using hash_seq_search(), when the server name
is given. This would simplify the code a bit more? Of course,
hash_seq_search() is necessary when closing all the connections, though.


Note that the cache entry key is the user mapping OID, and to use
hash_search() we need the user mapping OID. In postgres_fdw_disconnect
we can get the server OID, and we can also get the user mapping ID
using GetUserMapping, but it requires GetUserId()/CurrentUserId as an
input; I suspect we may have problems if CurrentUserId changes with a
change of the current user in the session. Also, the user mapping may be
dropped while the connection still exists because it is in use; in that
case GetUserMapping fails with a cache lookup error.

And yes, disconnecting all connections requires hash_seq_search().

Keeping above in mind, I feel we can do hash_seq_search(), as we do
currently, even when the server name is given as input. This way, we
don't need to bother much on the above points.

Thoughts?


Thanks for explaining this! You're right. I'd withdraw my suggestion.

Regards,


--
Fujii Masao

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

On Wed, Jan 20, 2021 at 11:53 AM Fujii Masao
 wrote:
> So, furthermore, we can use hash_search() to find the target cached
> connection, instead of using hash_seq_search(), when the server name
> is given. This would simplify the code a bit more? Of course,
> hash_seq_search() is necessary when closing all the connections, though.

Note that the cache entry key is the user mapping OID, and to use
hash_search() we need the user mapping OID. In postgres_fdw_disconnect
we can get the server OID, and we can also get the user mapping ID
using GetUserMapping, but it requires GetUserId()/CurrentUserId as an
input; I suspect we may have problems if CurrentUserId changes with a
change of the current user in the session. Also, the user mapping may be
dropped while the connection still exists because it is in use; in that
case GetUserMapping fails with a cache lookup error.

And yes, disconnecting all connections requires hash_seq_search().

Keeping above in mind, I feel we can do hash_seq_search(), as we do
currently, even when the server name is given as input. This way, we
don't need to bother much on the above points.

Thoughts?

> + * 2) If no input argument is provided, then it tries to disconnect all the
> + *connections.
>
> I'm concerned that users can easily forget to specify the argument and
> accidentally discard all the connections. So, IMO, to alleviate this situation,
> what about changing the function name (only when closing all the connections)
> to something like postgres_fdw_disconnect_all(), as we have
> pg_advisory_unlock_all() in addition to pg_advisory_unlock()?

+1. We will have two functions: postgres_fdw_disconnect(server name)
and postgres_fdw_disconnect_all().

> +   if (result)
> +   {
> +   /* We closed at least one connection, others are in use. */
> +   ereport(WARNING,
> +   (errmsg("cannot close all 
> connections because some of them are still in use")));
> +   }
>
> Sorry if this was already discussed upthread. Isn't it more helpful to
> emit a warning for every connection that fails to be closed? For example,
>
> WARNING:  cannot close connection for server "loopback1" because it is still in use
> WARNING:  cannot close connection for server "loopback2" because it is still in use
> WARNING:  cannot close connection for server "loopback3" because it is still in use
> ...
>
> This enables us to identify easily which server connections cannot be
> closed for now.

+1. Looks like pg_advisory_unlock does that. Given that still-in-use
connections are possible only in explicit transactions, we might not
have many of them in real-world use cases, so I'm okay with changing
it that way.
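A rough model of the per-connection warnings for postgres_fdw_disconnect_all() might look like this. Names are hypothetical and an array replaces the hash table, so the warning order here is simply array order and says nothing about hash_seq_search() ordering. It returns true if at least one connection was closed and warns once for each connection that is still in use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical model of one cached connection. */
typedef struct
{
    const char *servername;
    bool        has_conn;    /* models entry->conn != NULL */
    int         xact_depth;  /* > 0: used in the current local transaction */
} ConnEntry;

/* Number of warnings emitted by the last call. */
static int warnings_emitted;

static bool
model_disconnect_all(ConnEntry *cache, size_t n)
{
    bool result = false;

    assert(cache != NULL || n == 0);
    warnings_emitted = 0;

    /* A cache that was never created has nothing to close. */
    if (cache == NULL)
        return false;

    for (size_t i = 0; i < n; i++)
    {
        ConnEntry *c = &cache[i];

        if (!c->has_conn)
            continue;

        if (c->xact_depth > 0)
        {
            /* One warning per connection that cannot be closed yet. */
            fprintf(stderr, "WARNING:  cannot close connection for server \"%s\" "
                    "because it is still in use\n", c->servername);
            warnings_emitted++;
        }
        else
        {
            c->has_conn = false;    /* models disconnect_pg_server() */
            result = true;
        }
    }

    return result;
}
```

Compared with a single summary warning, this names every server whose connection was kept, at the cost of one line per in-use connection.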

I will address all these comments and post an updated patch set soon.

With Regards,
Bharath Rupireddy.

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-01-19 Thread Fujii Masao

On 2021/01/19 12:09, Bharath Rupireddy wrote:

On Mon, Jan 18, 2021 at 9:11 PM Fujii Masao  wrote:

Attaching v12 patch set. 0001 is for postgres_fdw_disconnect()
function, 0002 is for keep_connections GUC and 0003 is for
keep_connection server level option.


Thanks!



Please review it further.


+   server = GetForeignServerByName(servername, true);
+
+   if (!server)
+   ereport(ERROR,
+   (errcode(ERRCODE_CONNECTION_DOES_NOT_EXIST),
+errmsg("foreign server \"%s\" does not exist", servername)));

ISTM we can simplify this code as follows.

  server = GetForeignServerByName(servername, false);


Done.


+   hash_seq_init(&scan, ConnectionHash);
+   while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))

When the server name is specified, even if its connection is successfully
closed, postgres_fdw_disconnect() scans through all the entries to check
whether there are active connections. But if "result" is true and
active_conn_exists is true, we can get out of this loop to avoid unnecessary
scans.


My initial thought was that it's possible to have two entries with the
same foreign server name but with different user mappings, but it looks
like that's not possible. I tried associating a foreign server with two
different user mappings [1]. The cache entry is initially associated
with the user mapping that comes first in pg_user_mappings; if this
user mapping is dropped, the cache entry gets invalidated, so the next
time the second user mapping is used.

Since there's no way we can have two cache entries with the same
foreign server name, we can get out of the loop when we find the cache
entry matching the given server. I made the changes.


So, furthermore, we can use hash_search() to find the target cached
connection, instead of using hash_seq_search(), when the server name
is given. This would simplify the code a bit more? Of course,
hash_seq_search() is necessary when closing all the connections, though.




[1]
postgres=# select * from pg_user_mappings ;
  umid  | srvid |  srvname  | umuser | usename | umoptions
 -------+-------+-----------+--------+---------+-----------
  16395 | 16394 | loopback1 |     10 | bharath |            -> cache entry is initially made with this user mapping.
  16399 | 16394 | loopback1 |      0 | public  |            -> if the above user mapping is dropped, then the cache entry is made with this user mapping.


+   /*
+* Destroy the cache if we discarded all active connections i.e. if there
+* is no single active connection, which we can know while scanning the
+* cached entries in the above loop. Destroying the cache is better than to
+* keep it in the memory with all inactive entries in it to save some
+* memory. Cache can get initialized on the subsequent queries to foreign
+* server.

How much memory is assumed to be saved by destroying the cache in
many cases? I'm not sure if it's really worth destroying the cache to save
the memory.


I removed the cache-destroying code; if somebody complains in the
future (after the feature is committed), we can revisit it then.


+  a warning is issued and false is returned. false
+  is returned when there are no open connections. When there are some open
+  connections, but there is no connection for the given foreign server,
+  then false is returned. When no foreign server exists
+  with the given name, an error is emitted. Example usage of the function:

When a non-existent server name is specified, postgres_fdw_disconnect()
emits an error if there is at least one open connection, but just returns
false otherwise. At least for me, this behavior looks inconsistent and strange.
In that case, IMO the function always should emit an error.


Done.

Attaching v13 patch set, please review it further.


Thanks!

+ * 2) If no input argument is provided, then it tries to disconnect all the
+ *connections.

I'm concerned that users can easily forget to specify the argument and
accidentally discard all the connections. So, IMO, to alleviate this situation,
what about changing the function name (only when closing all the connections)
to something like postgres_fdw_disconnect_all(), as we have
pg_advisory_unlock_all() in addition to pg_advisory_unlock()?

+   if (result)
+   {
+   /* We closed at least one connection, others are in use. */
+   ereport(WARNING,
+   (errmsg("cannot close all connections because some of them are still in use")));
+   }

Sorry if this was already discussed upthread. Isn't it more helpful to
emit a warning for every connection that fails to be closed? For example,

WARNING:  cannot close connection for server "loopback1" because it is still in

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

On 2021/01/19 9:53, Hou, Zhijie wrote:

+1 to add it after "dropped (Note )", how about as follows
with slight changes?

dropped (Note that server name of an invalid connection can be NULL
if the server is dropped), and then such .


Yes, I like this one. One question: is "should" or "is" better
than "can" in this case, because the server name of an invalid connection
is always NULL when its server is dropped?


I think "dropped (Note that server name of an invalid connection will
be NULL if the server is dropped), and then such ."


Sounds good to me. So patch attached.


+1


Thanks! I pushed the patch.

Regards,

--
Fujii Masao

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-01-18 Thread Bharath Rupireddy

On Mon, Jan 18, 2021 at 9:11 PM Fujii Masao  wrote:
> > Attaching v12 patch set. 0001 is for postgres_fdw_disconnect()
> > function, 0002 is for keep_connections GUC and 0003 is for
> > keep_connection server level option.
>
> Thanks!
>
> >
> > Please review it further.
>
> +   server = GetForeignServerByName(servername, true);
> +
> +   if (!server)
> +   ereport(ERROR,
> +   (errcode(ERRCODE_CONNECTION_DOES_NOT_EXIST),
> +errmsg("foreign server \"%s\" does not exist", servername)));
>
> ISTM we can simplify this code as follows.
>
>  server = GetForeignServerByName(servername, false);

Done.

> +   hash_seq_init(&scan, ConnectionHash);
> +   while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))
>
> When the server name is specified, even if its connection is successfully
> closed, postgres_fdw_disconnect() scans through all the entries to check
> whether there are active connections. But if "result" is true and
> active_conn_exists is true, we can get out of this loop to avoid unnecessary
> scans.

My initial thought was that it's possible to have two entries with the
same foreign server name but with different user mappings, but it looks
like that's not possible. I tried associating a foreign server with two
different user mappings [1]. The cache entry is initially associated
with the user mapping that comes first in pg_user_mappings; if this
user mapping is dropped, the cache entry gets invalidated, so the next
time the second user mapping is used.

Since there's no way we can have two cache entries with the same
foreign server name, we can get out of the loop when we find the cache
entry matching the given server. I made the changes.

[1]
postgres=# select * from pg_user_mappings ;
 umid  | srvid |  srvname  | umuser | usename | umoptions
-------+-------+-----------+--------+---------+-----------
 16395 | 16394 | loopback1 |     10 | bharath |            -> cache entry is initially made with this user mapping.
 16399 | 16394 | loopback1 |      0 | public  |            -> if the above user mapping is dropped, then the cache entry is made with this user mapping.

> +   /*
> +* Destroy the cache if we discarded all active connections i.e. if there
> +* is no single active connection, which we can know while scanning the
> +* cached entries in the above loop. Destroying the cache is better than to
> +* keep it in the memory with all inactive entries in it to save some
> +* memory. Cache can get initialized on the subsequent queries to foreign
> +* server.
>
> How much memory is assumed to be saved by destroying the cache in
> many cases? I'm not sure if it's really worth destroying the cache to save
> the memory.

I removed the cache-destroying code; if somebody complains in the
future (after the feature is committed), we can revisit it then.

> +  a warning is issued and false is returned. false
> +  is returned when there are no open connections. When there are some open
> +  connections, but there is no connection for the given foreign server,
> +  then false is returned. When no foreign server exists
> +  with the given name, an error is emitted. Example usage of the function:
>
> When a non-existent server name is specified, postgres_fdw_disconnect()
> emits an error if there is at least one open connection, but just returns
> false otherwise. At least for me, this behavior looks inconsistent and strange.
> In that case, IMO the function always should emit an error.

Done.

Attaching v13 patch set, please review it further.

With Regards,
Bharath Rupireddy.
From 0731c6244ac228818916d62cc51ea1434178c5be Mon Sep 17 00:00:00 2001
From: Bharath Rupireddy 
Date: Tue, 19 Jan 2021 08:23:55 +0530
Subject: [PATCH v13] postgres_fdw function to discard cached connections

This patch introduces a new function postgres_fdw_disconnect().
When called with a foreign server name, it discards the associated
connection with the server. When called without any argument, it
discards all the existing cached connections.
---
 contrib/postgres_fdw/connection.c | 147 ++
 .../postgres_fdw/expected/postgres_fdw.out|  93 +++
 .../postgres_fdw/postgres_fdw--1.0--1.1.sql   |  10 ++
 contrib/postgres_fdw/sql/postgres_fdw.sql |  35 +
 doc/src/sgml/postgres-fdw.sgml|  59 +++
 5 files changed, 344 insertions(+)

diff --git a/contrib/postgres_fdw/connection.c b/contrib/postgres_fdw/connection.c
index a1404cb6bb..287a047c80 100644
--- a/contrib/postgres_fdw/connection.c
+++ b/contrib/postgres_fdw/connection.c
@@ -80,6 +80,7 @@ static bool xact_got_connection = false;
  * SQL functions
  */
 PG_FUNCTION_INFO_V1(postgres_fdw_get_connections);
+PG_FUNCTION_INFO_V1(postgres_fdw_disconnect);
 
 /* pr

RE: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-01-18 Thread Hou, Zhijie

> >>> +1 to add it after "dropped (Note )", how about as follows
> >>> with slight changes?
> >>>
> >>> dropped (Note that server name of an invalid connection can be NULL
> >>> if the server is dropped), and then such .
> >>
> >> Yes, I like this one. One question: is "should" or "is" better
> >> than "can" in this case, because the server name of an invalid connection
> >> is always NULL when its server is dropped?
> >
> > I think "dropped (Note that server name of an invalid connection will
> > be NULL if the server is dropped), and then such ."
> 
> Sounds good to me. So patch attached.

+1

Best regards,
houzj

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

On 2021/01/18 22:03, Bharath Rupireddy wrote:

On Mon, Jan 18, 2021 at 6:17 PM Fujii Masao  wrote:

+1 to add it after "dropped (Note )", how about as follows
with slight changes?

dropped (Note that server name of an invalid connection can be NULL if
the server is dropped), and then such .


Yes, I like this one. One question: is "should" or "is" better than
"can" in this case, because the server name of an invalid connection is
always NULL when its server is dropped?


I think "dropped (Note that server name of an invalid connection will
be NULL if the server is dropped), and then such ."


Sounds good to me. So patch attached.

Regards,

--
Fujii Masao
From cbe84856899f23a810fe4d42cf1cd5e46b3d6870 Mon Sep 17 00:00:00 2001
From: Fujii Masao 
Date: Tue, 19 Jan 2021 00:56:10 +0900
Subject: [PATCH] doc: Add note about the server name of
 postgres_fdw_get_connections() returns.

Previously the document didn't mention the case where
postgres_fdw_get_connections() returns NULL in server_name column.
Users might be confused about why NULL was returned.

This commit adds the note that, in postgres_fdw_get_connections(),
the server name of an invalid connection will be NULL if the server is dropped.

Suggested-by: Zhijie Hou
Author: Bharath Rupireddy
Reviewed-by: Fujii Masao
Discussion: 
https://postgr.es/m/e7ddd14e96444fce88e47a709c196537@G08CNEXMBPEKD05.g08.fujitsu.local
---
 doc/src/sgml/postgres-fdw.sgml | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index 6a91926da8..9adc8d12a9 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -493,7 +493,9 @@ OPTIONS (ADD password_required 'false');
   each connection is valid or not. false is returned
   if the foreign server connection is used in the current local
   transaction but its foreign server or user mapping is changed or
-  dropped, and then such invalid connection will be closed at
+  dropped (Note that server name of an invalid connection will be
+  NULL if the server is dropped),
+  and then such invalid connection will be closed at
   the end of that transaction. true is returned
   otherwise. If there are no open connections, no record is returned.
   Example usage of the function:
-- 
2.27.0

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

On 2021/01/18 23:14, Bharath Rupireddy wrote:

On Mon, Jan 18, 2021 at 11:44 AM Fujii Masao
 wrote:

I will post patches for the other function postgres_fdw_disconnect,
GUC and server level option later.


Thanks!


Attaching v12 patch set. 0001 is for postgres_fdw_disconnect()
function, 0002 is for keep_connections GUC and 0003 is for
keep_connection server level option.


Thanks!



Please review it further.


+   server = GetForeignServerByName(servername, true);
+
+   if (!server)
+   ereport(ERROR,
+   (errcode(ERRCODE_CONNECTION_DOES_NOT_EXIST),
+errmsg("foreign server \"%s\" does not exist", servername)));

ISTM we can simplify this code as follows.

server = GetForeignServerByName(servername, false);


+   hash_seq_init(&scan, ConnectionHash);
+   while ((entry = (ConnCacheEntry *) hash_seq_search(&scan)))

When the server name is specified, even if its connection is successfully
closed, postgres_fdw_disconnect() scans through all the entries to check
whether there are active connections. But if "result" is true and
active_conn_exists is true, we can get out of this loop to avoid unnecessary
scans.


+   /*
+* Destroy the cache if we discarded all active connections i.e. if there
+* is no single active connection, which we can know while scanning the
+* cached entries in the above loop. Destroying the cache is better than to
+* keep it in the memory with all inactive entries in it to save some
+* memory. Cache can get initialized on the subsequent queries to foreign
+* server.

How much memory is assumed to be saved by destroying the cache in
many cases? I'm not sure if it's really worth destroying the cache to save
the memory.


+  a warning is issued and false is returned. false
+  is returned when there are no open connections. When there are some open
+  connections, but there is no connection for the given foreign server,
+  then false is returned. When no foreign server exists
+  with the given name, an error is emitted. Example usage of the function:

When a non-existent server name is specified, postgres_fdw_disconnect()
emits an error if there is at least one open connection, but just returns
false otherwise. At least for me, this behavior looks inconsistent and strange.
In that case, IMO the function always should emit an error.

Regards,


--
Fujii Masao

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-01-18 Thread Bharath Rupireddy

On Mon, Jan 18, 2021 at 11:44 AM Fujii Masao
 wrote:
> > I will post patches for the other function postgres_fdw_disconnect,
> > GUC and server level option later.
>
> Thanks!

Attaching v12 patch set. 0001 is for postgres_fdw_disconnect()
function, 0002 is for keep_connections GUC and 0003 is for
keep_connection server level option.

Please review it further.

With Regards,
Bharath Rupireddy.

v12-0001-postgres_fdw-function-to-discard-cached-connecti.patch

v12-0002-postgres_fdw-add-keep_connections-GUC-to-not-cac.patch

v12-0003-postgres_fdw-server-level-option-keep_connection.patch

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

2021-01-18 Thread Bharath Rupireddy

On Mon, Jan 18, 2021 at 6:17 PM Fujii Masao  wrote:
> > +1 to add it after "dropped (Note )", how about as follows
> > with slight changes?
> >
> > dropped (Note that server name of an invalid connection can be NULL if
> > the server is dropped), and then such .
>
> Yes, I like this one. One question: is "should" or "is" better than
> "can" in this case, because the server name of an invalid connection is
> always NULL when its server is dropped?

I think "dropped (Note that server name of an invalid connection will
be NULL if the server is dropped), and then such ."

With Regards,
Bharath Rupireddy.

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit