Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

Bharath Rupireddy Fri, 11 Dec 2020 10:02:53 -0800

On Fri, Dec 11, 2020 at 11:01 PM Fujii Masao
<masao.fu...@oss.nttdata.com> wrote:
> On 2020/12/11 19:16, Bharath Rupireddy wrote:
> > On Thu, Dec 10, 2020 at 7:14 AM Bharath Rupireddy
> > <bharath.rupireddyforpostg...@gmail.com> wrote:
> >>> +                       /* We only look for active and open remote 
> >>> connections. */
> >>> +                       if (entry->invalidated || !entry->conn)
> >>> +                               continue;
> >>>
> >>> We should return even invalidated entry because it has still cached 
> >>> connection?
> >>>
> >>
> >> I checked this point earlier, for invalidated connections, the tuple
> >> returned from the cache is also invalid and the following error will
> >> be thrown. So, we can not get the server name for that user mapping.
> >> Cache entries too would have been invalidated after the connection is
> >> marked as invalid in pgfdw_inval_callback().
> >>
> >> umaptup = SearchSysCache1(USERMAPPINGOID, ObjectIdGetDatum(entry->key));
> >> if (!HeapTupleIsValid(umaptup))
> >>        elog(ERROR, "cache lookup failed for user mapping with OID %u",
> >> entry->key);
> >>
> >
> > I further checked on returning invalidated connections in the output
> > of the function. Actually, the reason I'm seeing a null tuple from sys
> > cache (and hence the error "cache lookup failed for user mapping with
> > OID xxxx") for an invalidated connection is that the user mapping
> > (with OID entry->key that exists in the cache) is getting dropped, so
> > the sys cache returns null tuple. The use case is as follows:
> >
> > 1) Create a server, role, and user mapping of the role with the server
> > 2) Run a foreign table query, so that the connection related to the
> > server gets cached
> > 3) Issue DROP OWNED BY for the role created, since the user mapping is
> > dependent on that role, it gets dropped from the catalogue table and
> > an invalidation message will be pending to clear the sys cache
> > associated with that user mapping.
> > 4) Now, if I do select * from postgres_fdw_get_connections() or for
> > that matter any query, at the beginning the txn
> > AtStart_Cache()-->AcceptInvalidationMessages()-->pgfdw_inval_callback()
> > gets called and marks the cached entry as invalidated. Remember the
> > reason for this invalidation message is that the user mapping with the
> > OID entry->key is dropped from 3). Later in
> > postgres_fdw_get_connections(), when we search the sys cache with
> > entry->key for that invalidated connection, since the user mapping is
> > dropped from the system, null tuple is returned.
>
> Thanks for the analysis! This means that the cached connection invalidated by 
> drop of server or user mapping will not be closed even by the subsequent 
> access to the foreign server and will remain until the backend exits. Right?


It will be first marked as invalidated via
AtStart_Cache()-->AcceptInvalidationMessages()-->pgfdw_inval_callback(),
and on the next use of that connection invalidated connections are
disconnected and reconnected.

    if (entry->conn != NULL && entry->invalidated && entry->xact_depth == 0)
    {
        elog(DEBUG3, "closing connection %p for option changes to take effect",
             entry->conn);
        disconnect_pg_server(entry);
    }

> If so, this seems like a connection-leak bug, at least for me.... Thought?
>

It's not a leak. The comment before pgfdw_inval_callback() [1]
explains why we can not immediately close/disconnect the connections
in pgfdw_inval_callback() after marking them as invalidated.

Here is the scenario how in the midst of a txn we get invalidation
messages(AtStart_Cache()-->AcceptInvalidationMessages()-->pgfdw_inval_callback()
happens):

1) select from foreign table with server1, usermapping1 in session1
2) begin a top txn in session1, run a few foreign queries that open up
sub txns internally. meanwhile alter/drop server1/usermapping1 in
session2, then at each start of sub txn also we get to process the
invalidation messages via
AtStart_Cache()-->AcceptInvalidationMessages()-->pgfdw_inval_callback().
So, if we disconnect right after marking invalidated in
pgfdw_inval_callback, that's a problem since we are in a sub txn under
a top txn.

I don't think we can do something here and disconnect the connections
right after the invalidation happens. Thoughts?


[1]
/*
 * Connection invalidation callback function
 *
 * After a change to a pg_foreign_server or pg_user_mapping catalog entry,
 * mark connections depending on that entry as needing to be remade.
 * We can't immediately destroy them, since they might be in the midst of
 * a transaction, but we'll remake them at the next opportunity.
 *
 * Although most cache invalidation callbacks blow away all the related stuff
 * regardless of the given hashvalue, connections are expensive enough that
 * it's worth trying to avoid that.
 *
 * NB: We could avoid unnecessary disconnection more strictly by examining
 * individual option values, but it seems too much effort for the gain.
 */
static void
pgfdw_inval_callback(Datum arg, int cacheid, uint32 hashvalue)

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

Re: [PATCH] postgres_fdw connection caching - cause remote sessions linger till the local session exit

Reply via email to