RE: [Proposal] Add foreign-server health checks infrastructure
Dear Zhihong, Thank you for reviewing! And I have to apologize for the late. I'll post new patchset later. > + UnregisterAllCheckingRemoteServersCallback(); > > UnregisterAllCheckingRemoteServersCallback > -> UnregisterAllCheckingRemoteServersCallbacks Will fix. > + CheckingRemoteServersCallbackItem *item; > + item = fdw_callbacks; > > The above two lines can be combined. Will fix. > +UnregisterCheckingRemoteServersCallback(CheckingRemoteServersCallback > callback, > + void *arg) > > Is the above func called anywhere ? Currently not, I just kept the API because of any other FDW extensions. But I cannot find any use cases, so will remove. > + if (item->callback == callback && item->arg == arg) > > Is comparing argument pointers stable ? What if the same argument is cloned > ? This part is no longer needed. Will remove. > + CallCheckingRemoteServersCallbacks(); > + > + if (remote_servers_connection_check_interval > 0) > + enable_timeout_after(CHECKING_REMOTE_SERVERS_TIMEOUT, > + > remote_servers_connection_check_interval); > > Should the call to CallCheckingRemoteServersCallbacks() be placed under the > if block checking remote_servers_connection_check_interval ? > If remote_servers_connection_check_interval is 0, it seems the callbacks > shouldn't be called. Agreed. We force stopping timeout when the GUC sets to 0 in assign_hook, so your saying is consistent. Will move. Best Regards, Hayato Kuroda FUJITSU LIMITED
Re: [Proposal] Add foreign-server health checks infrastructure
On 2021/12/15 15:40, kuroda.hay...@fujitsu.com wrote: Yeah, remote-checking timeout will be enable even ifa local transaction is opened. In my understanding, postgres cannot distinguish whether opening transactions are using only local object or not. My first idea was that turning on the timeout when GetFdwRoutineXXX functions were called, What about starting the timeout in GetConnection(), instead? I attached which implements that. v05_0004_add_tests.patch failed to be applied to the master. Could you rebase it? -This option is currently available only on systems that support the -non-standard POLLRDHUP extension to the -poll system call, including Linux. +This option relies on kernel events exposed by Linux, macOS, illumos +and the BSD family of operating systems, and is not currently available +on other systems. The above change is included in both v5-0003-Use-WL_SOCKET_CLOSED-for-client_connection_check_.patch and v05_0002_add_doc.patch. If it should be in the former patch, it should be removed from your patch v05_0002_add_doc.patch. There seems no user of UnregisterCheckingRemoteServersCallback(). So how about removing it? Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: [Proposal] Add foreign-server health checks infrastructure
On Tue, Jan 4, 2022 at 10:21 PM Shinya Kato wrote: > Thank you for the new patch! > > On 2021-12-15 15:40, kuroda.hay...@fujitsu.com wrote: > > Dear Kato-san, > > > > Thank you for giving comments! And sorry for late reply. > > I rebased my patches. > > > >> Even for local-only transaction, I thought it useless to execute > >> CallCheckingRemoteServersCallbacks() and enable_timeout_after(). Can I > >> make it so that it determines outside whether it contains SQL to the > >> remote or not? > > > > Yeah, remote-checking timeout will be enable even ifa local > > transaction is opened. > > In my understanding, postgres cannot distinguish whether opening > > transactions > > are using only local object or not. > > My first idea was that turning on the timeout when GetFdwRoutineXXX > > functions > > were called, but in some cases FDWs may not use remote connection even > > if > > they call such a handler function. e.g. postgresExplainForeignScan(). > > Hence I tried another approach that unregister all checking callback > > the end of each transactions. Only FDWs can notice that transactions > > are remote or local, > > so they must register callback when they really connect to other > > database. > > This implementation will avoid calling remote checking in the case of > > local transaction, > > but multiple registering and unregistering may lead another overhead. > > I attached which implements that. > > > It certainly incurs another overhead, but I think it's better than the > previous one. > So far, I haven't encountered any problems, but I'd like other people's > opinions. > > >> The following points are minor. > >> 1. In foreign.c, some of the lines are too long and should be broken. > >> 2. In pqcomm.c, the newline have been removed, what is the intention > >> of > >> this? > >> 3. In postgres.c, > >> 3-1. how about inserting a comment between lines 2713 and 2714, > >> similar > >> to line 2707? > >> 3-2. the line breaks in line 3242 and line 3243 should be aligned. > >> 3-3. you should change > >> /* > >> * Skip checking foreign servers while reading > >> messages. > >> */ > >> to > >> /* > >> * Skip checking foreign servers while reading > >> messages. > >> */ > >> 4. In connection.c, There is a typo in line 1684, so "fucntion" should > >> be changed to "function". > > > > Maybe all of them were fixed. Thanks! > Thank you, and it looks good to me. > > > Hi, + UnregisterAllCheckingRemoteServersCallback(); UnregisterAllCheckingRemoteServersCallback -> UnregisterAllCheckingRemoteServersCallbacks + CheckingRemoteServersCallbackItem *item; + item = fdw_callbacks; The above two lines can be combined. +UnregisterCheckingRemoteServersCallback(CheckingRemoteServersCallback callback, + void *arg) Is the above func called anywhere ? + if (item->callback == callback && item->arg == arg) Is comparing argument pointers stable ? What if the same argument is cloned ? + CallCheckingRemoteServersCallbacks(); + + if (remote_servers_connection_check_interval > 0) + enable_timeout_after(CHECKING_REMOTE_SERVERS_TIMEOUT, + remote_servers_connection_check_interval); Should the call to CallCheckingRemoteServersCallbacks() be placed under the if block checking remote_servers_connection_check_interval ? If remote_servers_connection_check_interval is 0, it seems the callbacks shouldn't be called. Cheers
Re: [Proposal] Add foreign-server health checks infrastructure
Thank you for the new patch! On 2021-12-15 15:40, kuroda.hay...@fujitsu.com wrote: Dear Kato-san, Thank you for giving comments! And sorry for late reply. I rebased my patches. Even for local-only transaction, I thought it useless to execute CallCheckingRemoteServersCallbacks() and enable_timeout_after(). Can I make it so that it determines outside whether it contains SQL to the remote or not? Yeah, remote-checking timeout will be enable even ifa local transaction is opened. In my understanding, postgres cannot distinguish whether opening transactions are using only local object or not. My first idea was that turning on the timeout when GetFdwRoutineXXX functions were called, but in some cases FDWs may not use remote connection even if they call such a handler function. e.g. postgresExplainForeignScan(). Hence I tried another approach that unregister all checking callback the end of each transactions. Only FDWs can notice that transactions are remote or local, so they must register callback when they really connect to other database. This implementation will avoid calling remote checking in the case of local transaction, but multiple registering and unregistering may lead another overhead. I attached which implements that. It certainly incurs another overhead, but I think it's better than the previous one. So far, I haven't encountered any problems, but I'd like other people's opinions. The following points are minor. 1. In foreign.c, some of the lines are too long and should be broken. 2. In pqcomm.c, the newline have been removed, what is the intention of this? 3. In postgres.c, 3-1. how about inserting a comment between lines 2713 and 2714, similar to line 2707? 3-2. the line breaks in line 3242 and line 3243 should be aligned. 3-3. you should change /* * Skip checking foreign servers while reading messages. */ to /* * Skip checking foreign servers while reading messages. */ 4. In connection.c, There is a typo in line 1684, so "fucntion" should be changed to "function". Maybe all of them were fixed. Thanks! Thank you, and it looks good to me. -- Regards, -- Shinya Kato Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
RE: [Proposal] Add foreign-server health checks infrastructure
Dear Kato-san, Thank you for giving comments! And sorry for late reply. I rebased my patches. > Even for local-only transaction, I thought it useless to execute > CallCheckingRemoteServersCallbacks() and enable_timeout_after(). Can I > make it so that it determines outside whether it contains SQL to the > remote or not? Yeah, remote-checking timeout will be enable even ifa local transaction is opened. In my understanding, postgres cannot distinguish whether opening transactions are using only local object or not. My first idea was that turning on the timeout when GetFdwRoutineXXX functions were called, but in some cases FDWs may not use remote connection even if they call such a handler function. e.g. postgresExplainForeignScan(). Hence I tried another approach that unregister all checking callback the end of each transactions. Only FDWs can notice that transactions are remote or local, so they must register callback when they really connect to other database. This implementation will avoid calling remote checking in the case of local transaction, but multiple registering and unregistering may lead another overhead. I attached which implements that. > The following points are minor. > 1. In foreign.c, some of the lines are too long and should be broken. > 2. In pqcomm.c, the newline have been removed, what is the intention of > this? > 3. In postgres.c, > 3-1. how about inserting a comment between lines 2713 and 2714, similar > to line 2707? > 3-2. the line breaks in line 3242 and line 3243 should be aligned. > 3-3. you should change > /* > * Skip checking foreign servers while reading > messages. > */ > to > /* >* Skip checking foreign servers while reading > messages. >*/ > 4. In connection.c, There is a typo in line 1684, so "fucntion" should > be changed to "function". Maybe all of them were fixed. Thanks! How do you think? Best Regards, Hayato Kuroda FUJITSU LIMITED <> v05_0001_add_checking_infrastracture.patch Description: v05_0001_add_checking_infrastracture.patch v05_0002_add_doc.patch Description: v05_0002_add_doc.patch
Re: [Proposal] Add foreign-server health checks infrastructure
On 2021-11-29 21:36, Zhihong Yu wrote: On Mon, Nov 29, 2021 at 12:51 AM kuroda.hay...@fujitsu.com wrote: Dear Zhihong, Thank you for giving comments! I'll post new patches later. +#define HOLD_CHECKING_REMOTE_SERVERS_INTERRUPTS() (CheckingRemoteServersHoldoffCount++) The macro contains only one operation. Can the macro be removed (with `CheckingRemoteServersHoldoffCount++` inlined) ? Hmm, these HOLD/RESUME macros are followed HOLD_INTERRUPUST() and HOLD_CANCEL_INTERRUPTS(): ``` #define HOLD_INTERRUPTS() (InterruptHoldoffCount++) #define RESUME_INTERRUPTS() \ do { \ Assert(InterruptHoldoffCount > 0); \ InterruptHoldoffCount--; \ } while(0) #define HOLD_CANCEL_INTERRUPTS() (QueryCancelHoldoffCount++) #define RESUME_CANCEL_INTERRUPTS() \ do { \ Assert(QueryCancelHoldoffCount > 0); \ QueryCancelHoldoffCount--; \ } while(0) #define START_CRIT_SECTION() (CritSectionCount++) #define END_CRIT_SECTION() \ do { \ Assert(CritSectionCount > 0); \ CritSectionCount--; \ } while(0) ``` So I want to keep the current style. Could you tell me if you have any other reasons? + if (CheckingRemoteServersTimeoutPending && CheckingRemoteServersHoldoffCount != 0) + { + /* +* Skip checking foreign servers while reading messages. +*/ + InterruptPending = true; + } + else if (CheckingRemoteServersTimeoutPending) Would the code be more readable if the above two if blocks be moved inside one enclosing if block (factoring the common condition)? + if (CheckingRemoteServersTimeoutPending) +1. Will fix. Best Regards, Hayato Kuroda FUJITSU LIMITED Hi, It is Okay to keep the macros. Thanks Hi, Kuroda-san. Sorry for late reply. Even for local-only transaction, I thought it useless to execute CallCheckingRemoteServersCallbacks() and enable_timeout_after(). Can I make it so that it determines outside whether it contains SQL to the remote or not? The following points are minor. 1. In foreign.c, some of the lines are too long and should be broken. 2. In pqcomm.c, the newline have been removed, what is the intention of this? 3. In postgres.c, 3-1. how about inserting a comment between lines 2713 and 2714, similar to line 2707? 3-2. the line breaks in line 3242 and line 3243 should be aligned. 3-3. you should change /* * Skip checking foreign servers while reading messages. */ to /* * Skip checking foreign servers while reading messages. */ 4. In connection.c, There is a typo in line 1684, so "fucntion" should be changed to "function". -- Regards, -- Shinya Kato Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
Re: [Proposal] Add foreign-server health checks infrastructure
On Mon, Nov 29, 2021 at 12:51 AM kuroda.hay...@fujitsu.com < kuroda.hay...@fujitsu.com> wrote: > Dear Zhihong, > > Thank you for giving comments! I'll post new patches later. > > > +#define HOLD_CHECKING_REMOTE_SERVERS_INTERRUPTS() > (CheckingRemoteServersHoldoffCount++) > > > > The macro contains only one operation. Can the macro be removed (with > `CheckingRemoteServersHoldoffCount++` inlined) ? > > Hmm, these HOLD/RESUME macros are followed HOLD_INTERRUPUST() and > HOLD_CANCEL_INTERRUPTS(): > > ``` > #define HOLD_INTERRUPTS() (InterruptHoldoffCount++) > > #define RESUME_INTERRUPTS() \ > do { \ > Assert(InterruptHoldoffCount > 0); \ > InterruptHoldoffCount--; \ > } while(0) > > #define HOLD_CANCEL_INTERRUPTS() (QueryCancelHoldoffCount++) > > #define RESUME_CANCEL_INTERRUPTS() \ > do { \ > Assert(QueryCancelHoldoffCount > 0); \ > QueryCancelHoldoffCount--; \ > } while(0) > > #define START_CRIT_SECTION() (CritSectionCount++) > > #define END_CRIT_SECTION() \ > do { \ > Assert(CritSectionCount > 0); \ > CritSectionCount--; \ > } while(0) > ``` > > So I want to keep the current style. Could you tell me if you have any > other reasons? > > > + if (CheckingRemoteServersTimeoutPending && > CheckingRemoteServersHoldoffCount != 0) > > + { > > + /* > > +* Skip checking foreign servers while reading messages. > > +*/ > > + InterruptPending = true; > > + } > > + else if (CheckingRemoteServersTimeoutPending) > > > > Would the code be more readable if the above two if blocks be moved > inside one enclosing if block (factoring the common condition)? > > > > + if (CheckingRemoteServersTimeoutPending) > > +1. Will fix. > > Best Regards, > Hayato Kuroda > FUJITSU LIMITED > Hi, It is Okay to keep the macros. Thanks
RE: [Proposal] Add foreign-server health checks infrastructure
Dear Hackers, I attached new version that is based from Zhihong's comments. And I newly attached a doc patch. I think the infrastructure part is no longer WIP. Could you review them? Best Regards, Hayato Kuroda FUJITSU LIMITED <> v04_0001_add_checking_infrastracture.patch Description: v04_0001_add_checking_infrastracture.patch v04_0002_add_doc.patch Description: v04_0002_add_doc.patch
RE: [Proposal] Add foreign-server health checks infrastructure
Dear Zhihong, Thank you for giving comments! I'll post new patches later. > +#define HOLD_CHECKING_REMOTE_SERVERS_INTERRUPTS() > (CheckingRemoteServersHoldoffCount++) > > The macro contains only one operation. Can the macro be removed (with > `CheckingRemoteServersHoldoffCount++` inlined) ? Hmm, these HOLD/RESUME macros are followed HOLD_INTERRUPUST() and HOLD_CANCEL_INTERRUPTS(): ``` #define HOLD_INTERRUPTS() (InterruptHoldoffCount++) #define RESUME_INTERRUPTS() \ do { \ Assert(InterruptHoldoffCount > 0); \ InterruptHoldoffCount--; \ } while(0) #define HOLD_CANCEL_INTERRUPTS() (QueryCancelHoldoffCount++) #define RESUME_CANCEL_INTERRUPTS() \ do { \ Assert(QueryCancelHoldoffCount > 0); \ QueryCancelHoldoffCount--; \ } while(0) #define START_CRIT_SECTION() (CritSectionCount++) #define END_CRIT_SECTION() \ do { \ Assert(CritSectionCount > 0); \ CritSectionCount--; \ } while(0) ``` So I want to keep the current style. Could you tell me if you have any other reasons? > + if (CheckingRemoteServersTimeoutPending && > CheckingRemoteServersHoldoffCount != 0) > + { > + /* > +* Skip checking foreign servers while reading messages. > +*/ > + InterruptPending = true; > + } > + else if (CheckingRemoteServersTimeoutPending) > > Would the code be more readable if the above two if blocks be moved inside > one enclosing if block (factoring the common condition)? > > + if (CheckingRemoteServersTimeoutPending) +1. Will fix. Best Regards, Hayato Kuroda FUJITSU LIMITED
Re: [Proposal] Add foreign-server health checks infrastructure
On Tue, Nov 23, 2021 at 8:57 PM kuroda.hay...@fujitsu.com < kuroda.hay...@fujitsu.com> wrote: > Dear Kato-san, > > Thank you for reviewing! > > > Thank you for sending the patches! > > I confirmed that they can be compiled and tested successfully on CentOS > > 8. > > Thanks! > > > + { > > + {"remote_servers_connection_check_interval", PGC_USERSET, > > CONN_AUTH_SETTINGS, > > + gettext_noop("Sets the time interval between checks > > for > > disconnection of remote servers."), > > + NULL, > > + GUC_UNIT_MS > > + }, > > + _servers_connection_check_interval, > > + 0, 0, INT_MAX, > > + }, > > > > If you don't use check_hook, assign_hook and show_hook, you should > > explicitly write "NULL, NULL, NULL", as show below. > > Yeah I forgot the line. Fixed. > > > + ereport(ERROR, > > + > > errcode(ERRCODE_CONNECTION_FAILURE), > > + errmsg("Postgres foreign server %s > > might be down.", > > + > > server->servername)); > > > > According to [1], error messages should start with a lowercase letter > > and not use a period. > > Also, along with the rest of the code, it is a good idea to enclose the > > server name in double quotes. > > I confirmed the postgres error-reporting policy and fixed to follow that. > How do you think? > > Attached are the latest version. > > Best Regards, > Hayato Kuroda > FUJITSU LIMITED > > Hi, +#define HOLD_CHECKING_REMOTE_SERVERS_INTERRUPTS() (CheckingRemoteServersHoldoffCount++) The macro contains only one operation. Can the macro be removed (with `CheckingRemoteServersHoldoffCount++` inlined) ? + if (CheckingRemoteServersTimeoutPending && CheckingRemoteServersHoldoffCount != 0) + { + /* +* Skip checking foreign servers while reading messages. +*/ + InterruptPending = true; + } + else if (CheckingRemoteServersTimeoutPending) Would the code be more readable if the above two if blocks be moved inside one enclosing if block (factoring the common condition)? + if (CheckingRemoteServersTimeoutPending) Cheers
RE: [Proposal] Add foreign-server health checks infrastructure
Dear Kato-san, I found the missing space, and I added a test. I'm very happy if you review this. Best Regards, Hayato Kuroda FUJITSU LIMITED v03_0001_add_checking_infrastracture.patch Description: v03_0001_add_checking_infrastracture.patch <>
RE: [Proposal] Add foreign-server health checks infrastructure
Dear Kato-san, Thank you for reviewing! > Thank you for sending the patches! > I confirmed that they can be compiled and tested successfully on CentOS > 8. Thanks! > + { > + {"remote_servers_connection_check_interval", PGC_USERSET, > CONN_AUTH_SETTINGS, > + gettext_noop("Sets the time interval between checks > for > disconnection of remote servers."), > + NULL, > + GUC_UNIT_MS > + }, > + _servers_connection_check_interval, > + 0, 0, INT_MAX, > + }, > > If you don't use check_hook, assign_hook and show_hook, you should > explicitly write "NULL, NULL, NULL", as show below. Yeah I forgot the line. Fixed. > + ereport(ERROR, > + > errcode(ERRCODE_CONNECTION_FAILURE), > + errmsg("Postgres foreign server %s > might be down.", > + > server->servername)); > > According to [1], error messages should start with a lowercase letter > and not use a period. > Also, along with the rest of the code, it is a good idea to enclose the > server name in double quotes. I confirmed the postgres error-reporting policy and fixed to follow that. How do you think? Attached are the latest version. Best Regards, Hayato Kuroda FUJITSU LIMITED <> v02_add_checking_infrastracture.patch Description: v02_add_checking_infrastracture.patch v02_add_helth_check_for_postgres_fdw.patch Description: v02_add_helth_check_for_postgres_fdw.patch
Re: [Proposal] Add foreign-server health checks infrastructure
On 2021-11-18 21:43, kuroda.hay...@fujitsu.com wrote: Dear Kato-san, Thank you for your interest! > I also want you to review the postgres_fdw part, > but I think it should not be attached because cfbot cannot understand > such a dependency > and will throw build error. Do you know how to deal with them in this > case? I don't know how to deal with them, but I hope you will attach the PoC, as it may be easier to review. OK, I attached the PoC along with the dependent patches. Please see the zip file. add_helth_check_... patch is written by me, and other two patches are just copied from [1]. In the new callback function ConnectionHash is searched sequentially and WaitEventSetWait() is performed for WL_SOCKET_CLOSED socket event. This event is added by the dependent ones. Thank you for sending the patches! I confirmed that they can be compiled and tested successfully on CentOS 8. I haven't looked at the core of the code yet, but I took a quick look at it. + { + {"remote_servers_connection_check_interval", PGC_USERSET, CONN_AUTH_SETTINGS, + gettext_noop("Sets the time interval between checks for disconnection of remote servers."), + NULL, + GUC_UNIT_MS + }, + _servers_connection_check_interval, + 0, 0, INT_MAX, + }, If you don't use check_hook, assign_hook and show_hook, you should explicitly write "NULL, NULL, NULL", as show below. + { + {"remote_servers_connection_check_interval", PGC_USERSET, CONN_AUTH_SETTINGS, + gettext_noop("Sets the time interval between checks for disconnection of remote servers."), + NULL, + GUC_UNIT_MS + }, + _servers_connection_check_interval, + 0, 0, INT_MAX, + NULL, NULL, NULL + }, + ereport(ERROR, + errcode(ERRCODE_CONNECTION_FAILURE), + errmsg("Postgres foreign server %s might be down.", + server->servername)); According to [1], error messages should start with a lowercase letter and not use a period. Also, along with the rest of the code, it is a good idea to enclose the server name in double quotes. I'll get back to you once I've read all the code. [1] https://www.postgresql.org/docs/devel/error-style-guide.html -- Regards, -- Shinya Kato Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
RE: [Proposal] Add foreign-server health checks infrastructure
Dear Kato-san, Thank you for your interest! > > I also want you to review the postgres_fdw part, > > but I think it should not be attached because cfbot cannot understand > > such a dependency > > and will throw build error. Do you know how to deal with them in this > > case? > > I don't know how to deal with them, but I hope you will attach the PoC, > as it may be easier to review. OK, I attached the PoC along with the dependent patches. Please see the zip file. add_helth_check_... patch is written by me, and other two patches are just copied from [1]. In the new callback function ConnectionHash is searched sequentially and WaitEventSetWait() is performed for WL_SOCKET_CLOSED socket event. This event is added by the dependent ones. === How to use === I'll explain how to use it briefly. 1. boot two postmaster processes. One is coordinator, and another is worker 2. set remote_servers_connection_check_interval to non-zero value at the coordinator 3. create tables to worker DB-cluster. 4. create foreign server, user mapping, and foreign table to coordinator. 5. connect to coordinator via psql. 6. open a transaction and access to foreing tables. 7. do "pg_ctl stop" command to woker DB-cluser. 8. execute some commands that does not access an foreign table. 9. Finally the following output will be get: ERROR: Postgres foreign server XXX might be down. === Example in some steps === 3. at worker ``` postgres=# \d List of relations Schema | Name | Type | Owner ++---+ public | remote | table | hayato (1 row) ``` 4. at coordinator ``` postgres=# select * from pg_foreign_server ; oid | srvname | srvowner | srvfdw | srvtype | srvversion | srvacl | srvoptions ---+-+--++-+++- 16406 | remote | 10 | 16402 | ||| {port=5433,dbname=postgres} (1 row) postgres=# select * from pg_user_mapping ; oid | umuser | umserver | umoptions ---++--+--- 16407 | 10 |16406 | {user=hayato} (1 row) postgres=# \d List of relations Schema | Name | Type | Owner ++---+ public | local | table | hayato public | remote | foreign table | hayato (2 rows) ``` 6-9. at coordinator ``` postgres=# begin; BEGIN postgres=*# select * from remote ; id 1 (1 row) postgres=*# select * from local ; ERROR: Postgres foreign server remote might be down. postgres=!# ``` Note that some keepalive settings are needed if you want to detect cable breakdown events. In my understanding following parameters are needed as server options: * keepalives_idle * keepalives_count * keepalives_interval [1]: https://commitfest.postgresql.org/35/3098/ Best Regards, Hayato Kuroda FUJITSU LIMITED <> v01_add_checking_infrastracture.patch Description: v01_add_checking_infrastracture.patch
Re: [Proposal] Add foreign-server health checks infrastructure
Hi, Thank you for the patch! On 2021-10-30 12:50, kuroda.hay...@fujitsu.com wrote: ## Background Currently there is no way to check the status of an foreign server in PostgreSQL. If an foreign server's postmaster goes down, or if the network between servers is lost, the backend process will only detect these when it uses the connection corresponding to that foreign server. Consider a workload that updates data on an foreign server only at the beginning of a transaction, and runs a lot of local SQLs. Even if the network is disconnected immediately after accessing the foreign server, the backend process will continue to execute local SQLs without realizing it. The process will eventually finish to execute SQLs and try to commit. Only then will it realize that the foreign server cannot be connect and will abort the transaction. This situation should be detected as soon as possible because it is impossible to commit a transaction when the foreign server goes down. This can be more of a problem if you have system-wide downtime requirements. That's why I want to implement the health-check feature to postgres. It's a good idea. I also think such a situation should be avoided. ## Further work As the next step I have a plan to implement the callback function to postgres_fdw. I already made a PoC, but it deeply depends on the following thread: https://commitfest.postgresql.org/35/3098/ I also want you to review the postgres_fdw part, but I think it should not be attached because cfbot cannot understand such a dependency and will throw build error. Do you know how to deal with them in this case? I don't know how to deal with them, but I hope you will attach the PoC, as it may be easier to review. -- Regards, -- Shinya Kato Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION