Re: [Proposal] Add foreign-server health checks infrastructure

Önder Kalacı Mon, 03 Oct 2022 07:19:35 -0700

Hi Hayato Kuroda,



> If the checking function is called not periodically but GetConnection(),
> it means that the health of foreign servers will be check only when remote
> connections are used.
> So following workload described in [1] cannot handle the issue.
>
> BEGIN --- remote operations--- local operations --- COMMIT
>
>
As far as I can see this patch is mostly useful for detecting the failures
on the initial remote command. This is especially common when the remote
server does a failover/switchover and postgres_fdw uses a cached connection
to access to the remote server.

The remote server failures within a transaction block sounds like a much
less common problem. Still, you can give a good error message before COMMIT
is sent to the remote server.

I agree that this doesn't solve the issue you described, but I'm not sure
if it is worthwhile to fix such a problem.


> > Can we have this function/logic on Postgres core, so that other
> extensions
> > can also use?
>
> I was not sure about any use-case, but I think it can because it does
> quite general things.
> Is there any good motivation to do that?
>

I think any extension that deals with multiple Postgres nodes can benefit
from such logic. In fact, the reason I realized this patch is that on the
Citus extension, we are trying to solve a similar problem [1], [2].

Thinking even more, I think any extension that uses libpq and WaitEventSets
can benefit from such a function.


> > Do you see any performance implications of creating/freeing waitEventSets
> > per check? I wonder if we can somehow re-use the same waitEventSet by
> > modifyWaitEvent? I guess no, but still, if this check causes a
> performance
> > implication, can we somehow cache 1 waitEventSet per connection?
>
> I have not tested yet, but I agreed this will be caused performance
> decrease.
> In next version first I will re-use the event set anyway, and it must be
> considered later.
> Actually I'm not sure your suggestion,
> but you mean to say that we can add a hash table that associates  PGconn
> and WaitEventSet,  right?
>
>
I think it also depends on where you decide to put
pgfdw_connection_check_internal(). If you prefer the postgres_fdw side,
could we maybe use ConnCacheEntry in contrib/postgres_fdw/connection.c?

But if you decide to put it into the Postgres side, the API
for pgfdw_connection_check_internal() -- or equivalent function -- could be
discussed. Do we pass a WaitEventSet and if it is NULL create a new one,
else use what is passed to the function? Not sure, maybe you can come up
with a better API.

Thanks,
Onder KALACI

[1]: Check WL_SOCKET_CLOSED before using any connection by onderkalaci ·
Pull Request #6259 · citusdata/citus (github.com)
<https://github.com/citusdata/citus/pull/6259>
[2]: Add a single connection retry in MULTI_CONNECTION_LOST state by
marcocitus · Pull Request #6283 · citusdata/citus (github.com)
<https://github.com/citusdata/citus/pull/6283>

Re: [Proposal] Add foreign-server health checks infrastructure

Reply via email to