Hi Hayato Kuroda,
> If the checking function is called not periodically but GetConnection(), > it means that the health of foreign servers will be check only when remote > connections are used. > So following workload described in [1] cannot handle the issue. > > BEGIN --- remote operations--- local operations --- COMMIT > > As far as I can see this patch is mostly useful for detecting the failures on the initial remote command. This is especially common when the remote server does a failover/switchover and postgres_fdw uses a cached connection to access to the remote server. The remote server failures within a transaction block sounds like a much less common problem. Still, you can give a good error message before COMMIT is sent to the remote server. I agree that this doesn't solve the issue you described, but I'm not sure if it is worthwhile to fix such a problem. > > Can we have this function/logic on Postgres core, so that other > extensions > > can also use? > > I was not sure about any use-case, but I think it can because it does > quite general things. > Is there any good motivation to do that? > I think any extension that deals with multiple Postgres nodes can benefit from such logic. In fact, the reason I realized this patch is that on the Citus extension, we are trying to solve a similar problem [1], [2]. Thinking even more, I think any extension that uses libpq and WaitEventSets can benefit from such a function. > > Do you see any performance implications of creating/freeing waitEventSets > > per check? I wonder if we can somehow re-use the same waitEventSet by > > modifyWaitEvent? I guess no, but still, if this check causes a > performance > > implication, can we somehow cache 1 waitEventSet per connection? > > I have not tested yet, but I agreed this will be caused performance > decrease. > In next version first I will re-use the event set anyway, and it must be > considered later. > Actually I'm not sure your suggestion, > but you mean to say that we can add a hash table that associates PGconn > and WaitEventSet, right? > > I think it also depends on where you decide to put pgfdw_connection_check_internal(). If you prefer the postgres_fdw side, could we maybe use ConnCacheEntry in contrib/postgres_fdw/connection.c? But if you decide to put it into the Postgres side, the API for pgfdw_connection_check_internal() -- or equivalent function -- could be discussed. Do we pass a WaitEventSet and if it is NULL create a new one, else use what is passed to the function? Not sure, maybe you can come up with a better API. Thanks, Onder KALACI [1]: Check WL_SOCKET_CLOSED before using any connection by onderkalaci · Pull Request #6259 · citusdata/citus (github.com) <https://github.com/citusdata/citus/pull/6259> [2]: Add a single connection retry in MULTI_CONNECTION_LOST state by marcocitus · Pull Request #6283 · citusdata/citus (github.com) <https://github.com/citusdata/citus/pull/6283>