On Fri, Dec 11, 2020 at 11:01 PM Fujii Masao <masao.fu...@oss.nttdata.com> wrote: > On 2020/12/11 19:16, Bharath Rupireddy wrote: > > On Thu, Dec 10, 2020 at 7:14 AM Bharath Rupireddy > > <bharath.rupireddyforpostg...@gmail.com> wrote: > >>> + /* We only look for active and open remote > >>> connections. */ > >>> + if (entry->invalidated || !entry->conn) > >>> + continue; > >>> > >>> We should return even invalidated entry because it has still cached > >>> connection? > >>> > >> > >> I checked this point earlier, for invalidated connections, the tuple > >> returned from the cache is also invalid and the following error will > >> be thrown. So, we can not get the server name for that user mapping. > >> Cache entries too would have been invalidated after the connection is > >> marked as invalid in pgfdw_inval_callback(). > >> > >> umaptup = SearchSysCache1(USERMAPPINGOID, ObjectIdGetDatum(entry->key)); > >> if (!HeapTupleIsValid(umaptup)) > >> elog(ERROR, "cache lookup failed for user mapping with OID %u", > >> entry->key); > >> > > > > I further checked on returning invalidated connections in the output > > of the function. Actually, the reason I'm seeing a null tuple from sys > > cache (and hence the error "cache lookup failed for user mapping with > > OID xxxx") for an invalidated connection is that the user mapping > > (with OID entry->key that exists in the cache) is getting dropped, so > > the sys cache returns null tuple. The use case is as follows: > > > > 1) Create a server, role, and user mapping of the role with the server > > 2) Run a foreign table query, so that the connection related to the > > server gets cached > > 3) Issue DROP OWNED BY for the role created, since the user mapping is > > dependent on that role, it gets dropped from the catalogue table and > > an invalidation message will be pending to clear the sys cache > > associated with that user mapping. > > 4) Now, if I do select * from postgres_fdw_get_connections() or for > > that matter any query, at the beginning the txn > > AtStart_Cache()-->AcceptInvalidationMessages()-->pgfdw_inval_callback() > > gets called and marks the cached entry as invalidated. Remember the > > reason for this invalidation message is that the user mapping with the > > OID entry->key is dropped from 3). Later in > > postgres_fdw_get_connections(), when we search the sys cache with > > entry->key for that invalidated connection, since the user mapping is > > dropped from the system, null tuple is returned. > > Thanks for the analysis! This means that the cached connection invalidated by > drop of server or user mapping will not be closed even by the subsequent > access to the foreign server and will remain until the backend exits. Right?
It will be first marked as invalidated via AtStart_Cache()-->AcceptInvalidationMessages()-->pgfdw_inval_callback(), and on the next use of that connection invalidated connections are disconnected and reconnected. if (entry->conn != NULL && entry->invalidated && entry->xact_depth == 0) { elog(DEBUG3, "closing connection %p for option changes to take effect", entry->conn); disconnect_pg_server(entry); } > If so, this seems like a connection-leak bug, at least for me.... Thought? > It's not a leak. The comment before pgfdw_inval_callback() [1] explains why we can not immediately close/disconnect the connections in pgfdw_inval_callback() after marking them as invalidated. Here is the scenario how in the midst of a txn we get invalidation messages(AtStart_Cache()-->AcceptInvalidationMessages()-->pgfdw_inval_callback() happens): 1) select from foreign table with server1, usermapping1 in session1 2) begin a top txn in session1, run a few foreign queries that open up sub txns internally. meanwhile alter/drop server1/usermapping1 in session2, then at each start of sub txn also we get to process the invalidation messages via AtStart_Cache()-->AcceptInvalidationMessages()-->pgfdw_inval_callback(). So, if we disconnect right after marking invalidated in pgfdw_inval_callback, that's a problem since we are in a sub txn under a top txn. I don't think we can do something here and disconnect the connections right after the invalidation happens. Thoughts? [1] /* * Connection invalidation callback function * * After a change to a pg_foreign_server or pg_user_mapping catalog entry, * mark connections depending on that entry as needing to be remade. * We can't immediately destroy them, since they might be in the midst of * a transaction, but we'll remake them at the next opportunity. * * Although most cache invalidation callbacks blow away all the related stuff * regardless of the given hashvalue, connections are expensive enough that * it's worth trying to avoid that. * * NB: We could avoid unnecessary disconnection more strictly by examining * individual option values, but it seems too much effort for the gain. */ static void pgfdw_inval_callback(Datum arg, int cacheid, uint32 hashvalue) With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com