RE: [Proposal] Add foreign-server health checks infrastructure

2022-01-19 Thread kuroda.hay...@fujitsu.com
Dear Zhihong,

Thank you for reviewing! And I have to apologize for the late.
I'll post new patchset later.

> +   UnregisterAllCheckingRemoteServersCallback();
> 
> UnregisterAllCheckingRemoteServersCallback
> -> UnregisterAllCheckingRemoteServersCallbacks

Will fix.

> +   CheckingRemoteServersCallbackItem *item;
> +   item = fdw_callbacks;
> 
> The above two lines can be combined.

Will fix.

> +UnregisterCheckingRemoteServersCallback(CheckingRemoteServersCallback
> callback,
> +   void *arg)
> 
> Is the above func called anywhere ?

Currently not, I just kept the API because of any other FDW extensions.
But I cannot find any use cases, so will remove.

> +   if (item->callback == callback && item->arg == arg)
> 
> Is comparing argument pointers stable ? What if the same argument is cloned
> ?

This part is no longer needed. Will remove.

> +   CallCheckingRemoteServersCallbacks();
> +
> +   if (remote_servers_connection_check_interval > 0)
> +   enable_timeout_after(CHECKING_REMOTE_SERVERS_TIMEOUT,
> +
>  remote_servers_connection_check_interval);
>
> Should the call to CallCheckingRemoteServersCallbacks() be placed under the
> if block checking remote_servers_connection_check_interval ?
> If remote_servers_connection_check_interval is 0, it seems the callbacks
> shouldn't be called.

Agreed. We force stopping timeout when the GUC sets to 0 in assign_hook,
so your saying is consistent. Will move.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: [Proposal] Add foreign-server health checks infrastructure

2022-01-18 Thread Fujii Masao




On 2021/12/15 15:40, kuroda.hay...@fujitsu.com wrote:

Yeah, remote-checking timeout will be enable even ifa local transaction is 
opened.
In my understanding, postgres cannot distinguish whether opening transactions
are using only local object or not.
My first idea was that turning on the timeout when GetFdwRoutineXXX functions
were called,


What about starting the timeout in GetConnection(), instead?


I attached which implements that.


v05_0004_add_tests.patch failed to be applied to the master. Could you rebase 
it?


-This option is currently available only on systems that support the
-non-standard POLLRDHUP extension to the
-poll system call, including Linux.
+This option relies on kernel events exposed by Linux, macOS, illumos
+and the BSD family of operating systems, and is not currently available
+on other systems.

The above change is included in both 
v5-0003-Use-WL_SOCKET_CLOSED-for-client_connection_check_.patch and 
v05_0002_add_doc.patch. If it should be in the former patch, it should be 
removed from your patch v05_0002_add_doc.patch.


There seems no user of UnregisterCheckingRemoteServersCallback(). So how about 
removing it?


Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: [Proposal] Add foreign-server health checks infrastructure

2022-01-05 Thread Zhihong Yu
On Tue, Jan 4, 2022 at 10:21 PM Shinya Kato 
wrote:

> Thank you for the new patch!
>
> On 2021-12-15 15:40, kuroda.hay...@fujitsu.com wrote:
> > Dear Kato-san,
> >
> > Thank you for giving comments! And sorry for late reply.
> > I rebased my patches.
> >
> >> Even for local-only transaction, I thought it useless to execute
> >> CallCheckingRemoteServersCallbacks() and enable_timeout_after(). Can I
> >> make it so that it determines outside whether it contains SQL to the
> >> remote or not?
> >
> > Yeah, remote-checking timeout will be enable even ifa local
> > transaction is opened.
> > In my understanding, postgres cannot distinguish whether opening
> > transactions
> > are using only local object or not.
> > My first idea was that turning on the timeout when GetFdwRoutineXXX
> > functions
> > were called, but in some cases FDWs may not use remote connection even
> > if
> > they call such a handler function. e.g. postgresExplainForeignScan().
> > Hence I tried another approach that unregister all checking callback
> > the end of each transactions. Only FDWs can notice that transactions
> > are remote or local,
> > so they must register callback when they really connect to other
> > database.
> > This implementation will avoid calling remote checking in the case of
> > local transaction,
> > but multiple registering and unregistering may lead another overhead.
> > I attached which implements that.
> >
> It certainly incurs another overhead, but I think it's better than the
> previous one.
> So far, I haven't encountered any problems, but I'd like other people's
> opinions.
>
> >> The following points are minor.
> >> 1. In foreign.c, some of the lines are too long and should be broken.
> >> 2. In pqcomm.c, the newline have been removed, what is the intention
> >> of
> >> this?
> >> 3. In postgres.c,
> >> 3-1. how about inserting a comment between lines 2713 and 2714,
> >> similar
> >> to line 2707?
> >> 3-2. the line breaks in line 3242 and line 3243 should be aligned.
> >> 3-3. you should change
> >>  /*
> >>  * Skip checking foreign servers while reading
> >> messages.
> >>  */
> >> to
> >>  /*
> >>   * Skip checking foreign servers while reading
> >> messages.
> >>   */
> >> 4. In connection.c, There is a typo in line 1684, so "fucntion" should
> >> be changed to "function".
> >
> > Maybe all of them were fixed. Thanks!
> Thank you, and it looks good to me.
>
>
> Hi,
+   UnregisterAllCheckingRemoteServersCallback();

UnregisterAllCheckingRemoteServersCallback
-> UnregisterAllCheckingRemoteServersCallbacks

+   CheckingRemoteServersCallbackItem *item;
+   item = fdw_callbacks;

The above two lines can be combined.

+UnregisterCheckingRemoteServersCallback(CheckingRemoteServersCallback
callback,
+   void *arg)

Is the above func called anywhere ?

+   if (item->callback == callback && item->arg == arg)

Is comparing argument pointers stable ? What if the same argument is cloned
?

+   CallCheckingRemoteServersCallbacks();
+
+   if (remote_servers_connection_check_interval > 0)
+   enable_timeout_after(CHECKING_REMOTE_SERVERS_TIMEOUT,
+
 remote_servers_connection_check_interval);

Should the call to CallCheckingRemoteServersCallbacks() be placed under the
if block checking remote_servers_connection_check_interval ?
If remote_servers_connection_check_interval is 0, it seems the callbacks
shouldn't be called.

Cheers


Re: [Proposal] Add foreign-server health checks infrastructure

2022-01-04 Thread Shinya Kato

Thank you for the new patch!

On 2021-12-15 15:40, kuroda.hay...@fujitsu.com wrote:

Dear Kato-san,

Thank you for giving comments! And sorry for late reply.
I rebased my patches.


Even for local-only transaction, I thought it useless to execute
CallCheckingRemoteServersCallbacks() and enable_timeout_after(). Can I
make it so that it determines outside whether it contains SQL to the
remote or not?


Yeah, remote-checking timeout will be enable even ifa local
transaction is opened.
In my understanding, postgres cannot distinguish whether opening 
transactions

are using only local object or not.
My first idea was that turning on the timeout when GetFdwRoutineXXX 
functions
were called, but in some cases FDWs may not use remote connection even 
if

they call such a handler function. e.g. postgresExplainForeignScan().
Hence I tried another approach that unregister all checking callback
the end of each transactions. Only FDWs can notice that transactions
are remote or local,
so they must register callback when they really connect to other 
database.

This implementation will avoid calling remote checking in the case of
local transaction,
but multiple registering and unregistering may lead another overhead.
I attached which implements that.

It certainly incurs another overhead, but I think it's better than the 
previous one.
So far, I haven't encountered any problems, but I'd like other people's 
opinions.



The following points are minor.
1. In foreign.c, some of the lines are too long and should be broken.
2. In pqcomm.c, the newline have been removed, what is the intention 
of

this?
3. In postgres.c,
3-1. how about inserting a comment between lines 2713 and 2714, 
similar

to line 2707?
3-2. the line breaks in line 3242 and line 3243 should be aligned.
3-3. you should change
/*
* Skip checking foreign servers while reading
messages.
*/
to
/*
 * Skip checking foreign servers while reading
messages.
 */
4. In connection.c, There is a typo in line 1684, so "fucntion" should
be changed to "function".


Maybe all of them were fixed. Thanks!

Thank you, and it looks good to me.


--
Regards,

--
Shinya Kato
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




RE: [Proposal] Add foreign-server health checks infrastructure

2021-12-14 Thread kuroda.hay...@fujitsu.com
Dear Kato-san,

Thank you for giving comments! And sorry for late reply.
I rebased my patches.

> Even for local-only transaction, I thought it useless to execute
> CallCheckingRemoteServersCallbacks() and enable_timeout_after(). Can I
> make it so that it determines outside whether it contains SQL to the
> remote or not?

Yeah, remote-checking timeout will be enable even ifa local transaction is 
opened. 
In my understanding, postgres cannot distinguish whether opening transactions
are using only local object or not. 
My first idea was that turning on the timeout when GetFdwRoutineXXX functions
were called, but in some cases FDWs may not use remote connection even if
they call such a handler function. e.g. postgresExplainForeignScan().
Hence I tried another approach that unregister all checking callback
the end of each transactions. Only FDWs can notice that transactions are remote 
or local,
so they must register callback when they really connect to other database.
This implementation will avoid calling remote checking in the case of local 
transaction,
but multiple registering and unregistering may lead another overhead.
I attached which implements that.

> The following points are minor.
> 1. In foreign.c, some of the lines are too long and should be broken.
> 2. In pqcomm.c, the newline have been removed, what is the intention of
> this?
> 3. In postgres.c,
> 3-1. how about inserting a comment between lines 2713 and 2714, similar
> to line 2707?
> 3-2. the line breaks in line 3242 and line 3243 should be aligned.
> 3-3. you should change
>   /*
>   * Skip checking foreign servers while reading
> messages.
>   */
> to
>   /*
>* Skip checking foreign servers while reading
> messages.
>*/
> 4. In connection.c, There is a typo in line 1684, so "fucntion" should
> be changed to "function".

Maybe all of them were fixed. Thanks!

How do you think?


Best Regards,
Hayato Kuroda
FUJITSU LIMITED

<>


v05_0001_add_checking_infrastracture.patch
Description: v05_0001_add_checking_infrastracture.patch


v05_0002_add_doc.patch
Description: v05_0002_add_doc.patch


Re: [Proposal] Add foreign-server health checks infrastructure

2021-12-06 Thread Shinya Kato

On 2021-11-29 21:36, Zhihong Yu wrote:

On Mon, Nov 29, 2021 at 12:51 AM kuroda.hay...@fujitsu.com
 wrote:


Dear Zhihong,

Thank you for giving comments! I'll post new patches later.


+#define HOLD_CHECKING_REMOTE_SERVERS_INTERRUPTS()

(CheckingRemoteServersHoldoffCount++)


The macro contains only one operation. Can the macro be removed

(with `CheckingRemoteServersHoldoffCount++` inlined) ?

Hmm, these HOLD/RESUME macros are followed HOLD_INTERRUPUST() and
HOLD_CANCEL_INTERRUPTS():

```
#define HOLD_INTERRUPTS()  (InterruptHoldoffCount++)

#define RESUME_INTERRUPTS() \
do { \
Assert(InterruptHoldoffCount > 0); \
InterruptHoldoffCount--; \
} while(0)

#define HOLD_CANCEL_INTERRUPTS()  (QueryCancelHoldoffCount++)

#define RESUME_CANCEL_INTERRUPTS() \
do { \
Assert(QueryCancelHoldoffCount > 0); \
QueryCancelHoldoffCount--; \
} while(0)

#define START_CRIT_SECTION()  (CritSectionCount++)

#define END_CRIT_SECTION() \
do { \
Assert(CritSectionCount > 0); \
CritSectionCount--; \
} while(0)
```

So I want to keep the current style. Could you tell me if you have
any other reasons?


+   if (CheckingRemoteServersTimeoutPending &&

CheckingRemoteServersHoldoffCount != 0)

+   {
+   /*
+* Skip checking foreign servers while reading messages.
+*/
+   InterruptPending = true;
+   }
+   else if (CheckingRemoteServersTimeoutPending)

Would the code be more readable if the above two if blocks be

moved inside one enclosing if block (factoring the common
condition)?


+   if (CheckingRemoteServersTimeoutPending)


+1. Will fix.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


Hi,

It is Okay to keep the macros.

Thanks


Hi, Kuroda-san. Sorry for late reply.

Even for local-only transaction, I thought it useless to execute 
CallCheckingRemoteServersCallbacks() and enable_timeout_after(). Can I 
make it so that it determines outside whether it contains SQL to the 
remote or not?


The following points are minor.
1. In foreign.c, some of the lines are too long and should be broken.
2. In pqcomm.c, the newline have been removed, what is the intention of 
this?

3. In postgres.c,
3-1. how about inserting a comment between lines 2713 and 2714, similar 
to line 2707?

3-2. the line breaks in line 3242 and line 3243 should be aligned.
3-3. you should change
/*
* Skip checking foreign servers while reading messages.
*/
to
/*
 * Skip checking foreign servers while reading messages.
 */
4. In connection.c, There is a typo in line 1684, so "fucntion" should 
be changed to "function".



--
Regards,

--
Shinya Kato
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: [Proposal] Add foreign-server health checks infrastructure

2021-11-29 Thread Zhihong Yu
On Mon, Nov 29, 2021 at 12:51 AM kuroda.hay...@fujitsu.com <
kuroda.hay...@fujitsu.com> wrote:

> Dear Zhihong,
>
> Thank you for giving comments! I'll post new patches later.
>
> > +#define HOLD_CHECKING_REMOTE_SERVERS_INTERRUPTS()
> (CheckingRemoteServersHoldoffCount++)
> >
> > The macro contains only one operation. Can the macro be removed (with
> `CheckingRemoteServersHoldoffCount++` inlined) ?
>
> Hmm, these HOLD/RESUME macros are followed HOLD_INTERRUPUST() and
> HOLD_CANCEL_INTERRUPTS():
>
> ```
> #define HOLD_INTERRUPTS()  (InterruptHoldoffCount++)
>
> #define RESUME_INTERRUPTS() \
> do { \
> Assert(InterruptHoldoffCount > 0); \
> InterruptHoldoffCount--; \
> } while(0)
>
> #define HOLD_CANCEL_INTERRUPTS()  (QueryCancelHoldoffCount++)
>
> #define RESUME_CANCEL_INTERRUPTS() \
> do { \
> Assert(QueryCancelHoldoffCount > 0); \
> QueryCancelHoldoffCount--; \
> } while(0)
>
> #define START_CRIT_SECTION()  (CritSectionCount++)
>
> #define END_CRIT_SECTION() \
> do { \
> Assert(CritSectionCount > 0); \
> CritSectionCount--; \
> } while(0)
> ```
>
> So I want to keep the current style. Could you tell me if you have any
> other reasons?
>
> > +   if (CheckingRemoteServersTimeoutPending &&
> CheckingRemoteServersHoldoffCount != 0)
> > +   {
> > +   /*
> > +* Skip checking foreign servers while reading messages.
> > +*/
> > +   InterruptPending = true;
> > +   }
> > +   else if (CheckingRemoteServersTimeoutPending)
> >
> > Would the code be more readable if the above two if blocks be moved
> inside one enclosing if block (factoring the common condition)?
> >
> > +   if (CheckingRemoteServersTimeoutPending)
>
> +1. Will fix.
>
> Best Regards,
> Hayato Kuroda
> FUJITSU LIMITED
>

Hi,

It is Okay to keep the macros.

Thanks


RE: [Proposal] Add foreign-server health checks infrastructure

2021-11-29 Thread kuroda.hay...@fujitsu.com
Dear Hackers,

I attached new version that is based from Zhihong's comments.
And I newly attached a doc patch. I think the infrastructure part is no longer 
WIP. 

Could you review them?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

<>


v04_0001_add_checking_infrastracture.patch
Description: v04_0001_add_checking_infrastracture.patch


v04_0002_add_doc.patch
Description: v04_0002_add_doc.patch


RE: [Proposal] Add foreign-server health checks infrastructure

2021-11-29 Thread kuroda.hay...@fujitsu.com
Dear Zhihong,

Thank you for giving comments! I'll post new patches later.

> +#define HOLD_CHECKING_REMOTE_SERVERS_INTERRUPTS()  
> (CheckingRemoteServersHoldoffCount++)
> 
> The macro contains only one operation. Can the macro be removed (with 
> `CheckingRemoteServersHoldoffCount++` inlined) ?

Hmm, these HOLD/RESUME macros are followed HOLD_INTERRUPUST() and 
HOLD_CANCEL_INTERRUPTS():

```
#define HOLD_INTERRUPTS()  (InterruptHoldoffCount++)

#define RESUME_INTERRUPTS() \
do { \
Assert(InterruptHoldoffCount > 0); \
InterruptHoldoffCount--; \
} while(0)

#define HOLD_CANCEL_INTERRUPTS()  (QueryCancelHoldoffCount++)

#define RESUME_CANCEL_INTERRUPTS() \
do { \
Assert(QueryCancelHoldoffCount > 0); \
QueryCancelHoldoffCount--; \
} while(0)

#define START_CRIT_SECTION()  (CritSectionCount++)

#define END_CRIT_SECTION() \
do { \
Assert(CritSectionCount > 0); \
CritSectionCount--; \
} while(0)
```

So I want to keep the current style. Could you tell me if you have any other 
reasons?

> +   if (CheckingRemoteServersTimeoutPending && 
> CheckingRemoteServersHoldoffCount != 0)
> +   {
> +   /*
> +* Skip checking foreign servers while reading messages.
> +*/
> +   InterruptPending = true;
> +   }
> +   else if (CheckingRemoteServersTimeoutPending)
> 
> Would the code be more readable if the above two if blocks be moved inside 
> one enclosing if block (factoring the common condition)?
> 
> +   if (CheckingRemoteServersTimeoutPending)

+1. Will fix.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED




Re: [Proposal] Add foreign-server health checks infrastructure

2021-11-28 Thread Zhihong Yu
On Tue, Nov 23, 2021 at 8:57 PM kuroda.hay...@fujitsu.com <
kuroda.hay...@fujitsu.com> wrote:

> Dear Kato-san,
>
> Thank you for reviewing!
>
> > Thank you for sending the patches!
> > I confirmed that they can be compiled and tested successfully on CentOS
> > 8.
>
> Thanks!
>
> > + {
> > + {"remote_servers_connection_check_interval", PGC_USERSET,
> > CONN_AUTH_SETTINGS,
> > + gettext_noop("Sets the time interval between checks
> > for
> > disconnection of remote servers."),
> > + NULL,
> > + GUC_UNIT_MS
> > + },
> > + _servers_connection_check_interval,
> > + 0, 0, INT_MAX,
> > + },
> >
> > If you don't use check_hook, assign_hook and show_hook, you should
> > explicitly write "NULL, NULL, NULL", as show below.
>
> Yeah I forgot the line. Fixed.
>
> > + ereport(ERROR,
> > +
> >   errcode(ERRCODE_CONNECTION_FAILURE),
> > + errmsg("Postgres foreign server %s
> > might be down.",
> > +
> >   server->servername));
> >
> > According to [1], error messages should start with a lowercase letter
> > and not use a period.
> > Also, along with the rest of the code, it is a good idea to enclose the
> > server name in double quotes.
>
> I confirmed the postgres error-reporting policy and fixed to follow that.
> How do you think?
>
> Attached are the latest version.
>
> Best Regards,
> Hayato Kuroda
> FUJITSU LIMITED
>
> Hi,

+#define HOLD_CHECKING_REMOTE_SERVERS_INTERRUPTS()
 (CheckingRemoteServersHoldoffCount++)

The macro contains only one operation. Can the macro be removed (with
`CheckingRemoteServersHoldoffCount++` inlined) ?

+   if (CheckingRemoteServersTimeoutPending &&
CheckingRemoteServersHoldoffCount != 0)
+   {
+   /*
+* Skip checking foreign servers while reading messages.
+*/
+   InterruptPending = true;
+   }
+   else if (CheckingRemoteServersTimeoutPending)

Would the code be more readable if the above two if blocks be moved inside
one enclosing if block (factoring the common condition)?

+   if (CheckingRemoteServersTimeoutPending)

Cheers


RE: [Proposal] Add foreign-server health checks infrastructure

2021-11-23 Thread kuroda.hay...@fujitsu.com
Dear Kato-san,

I found the missing space, and I added a test.
I'm very happy if you review this.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v03_0001_add_checking_infrastracture.patch
Description: v03_0001_add_checking_infrastracture.patch
<>


RE: [Proposal] Add foreign-server health checks infrastructure

2021-11-23 Thread kuroda.hay...@fujitsu.com
Dear Kato-san,

Thank you for reviewing!

> Thank you for sending the patches!
> I confirmed that they can be compiled and tested successfully on CentOS
> 8.

Thanks!

> + {
> + {"remote_servers_connection_check_interval", PGC_USERSET,
> CONN_AUTH_SETTINGS,
> + gettext_noop("Sets the time interval between checks
> for
> disconnection of remote servers."),
> + NULL,
> + GUC_UNIT_MS
> + },
> + _servers_connection_check_interval,
> + 0, 0, INT_MAX,
> + },
> 
> If you don't use check_hook, assign_hook and show_hook, you should
> explicitly write "NULL, NULL, NULL", as show below.

Yeah I forgot the line. Fixed.

> + ereport(ERROR,
> +
>   errcode(ERRCODE_CONNECTION_FAILURE),
> + errmsg("Postgres foreign server %s
> might be down.",
> +
>   server->servername));
> 
> According to [1], error messages should start with a lowercase letter
> and not use a period.
> Also, along with the rest of the code, it is a good idea to enclose the
> server name in double quotes.

I confirmed the postgres error-reporting policy and fixed to follow that.
How do you think?

Attached are the latest version.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

<>


v02_add_checking_infrastracture.patch
Description: v02_add_checking_infrastracture.patch


v02_add_helth_check_for_postgres_fdw.patch
Description: v02_add_helth_check_for_postgres_fdw.patch


Re: [Proposal] Add foreign-server health checks infrastructure

2021-11-19 Thread Shinya Kato

On 2021-11-18 21:43, kuroda.hay...@fujitsu.com wrote:

Dear Kato-san,

Thank you for your interest!


> I also want you to review the postgres_fdw part,
> but I think it should not be attached because cfbot cannot understand
> such a dependency
> and will throw build error. Do you know how to deal with them in this
> case?

I don't know how to deal with them, but I hope you will attach the 
PoC,

as it may be easier to review.


OK, I attached the PoC along with the dependent patches. Please see
the zip file.
add_helth_check_... patch is written by me, and other two patches are
just copied from [1].
In the new callback function ConnectionHash is searched sequentially 
and

WaitEventSetWait() is performed for WL_SOCKET_CLOSED socket event.
This event is added by the dependent ones.


Thank you for sending the patches!
I confirmed that they can be compiled and tested successfully on CentOS 
8.


I haven't looked at the core of the code yet, but I took a quick look at 
it.


+   {
+		{"remote_servers_connection_check_interval", PGC_USERSET, 
CONN_AUTH_SETTINGS,
+			gettext_noop("Sets the time interval between checks for 
disconnection of remote servers."),

+   NULL,
+   GUC_UNIT_MS
+   },
+   _servers_connection_check_interval,
+   0, 0, INT_MAX,
+   },

If you don't use check_hook, assign_hook and show_hook, you should 
explicitly write "NULL, NULL, NULL", as show below.

+   {
+		{"remote_servers_connection_check_interval", PGC_USERSET, 
CONN_AUTH_SETTINGS,
+			gettext_noop("Sets the time interval between checks for 
disconnection of remote servers."),

+   NULL,
+   GUC_UNIT_MS
+   },
+   _servers_connection_check_interval,
+   0, 0, INT_MAX,
+   NULL, NULL, NULL
+   },


+   ereport(ERROR,
+   errcode(ERRCODE_CONNECTION_FAILURE),
+   errmsg("Postgres foreign server %s might be 
down.",
+   server->servername));

According to [1], error messages should start with a lowercase letter 
and not use a period.
Also, along with the rest of the code, it is a good idea to enclose the 
server name in double quotes.


I'll get back to you once I've read all the code.

[1] https://www.postgresql.org/docs/devel/error-style-guide.html

--
Regards,

--
Shinya Kato
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




RE: [Proposal] Add foreign-server health checks infrastructure

2021-11-18 Thread kuroda.hay...@fujitsu.com
Dear Kato-san,

Thank you for your interest!

> > I also want you to review the postgres_fdw part,
> > but I think it should not be attached because cfbot cannot understand
> > such a dependency
> > and will throw build error. Do you know how to deal with them in this
> > case?
> 
> I don't know how to deal with them, but I hope you will attach the PoC,
> as it may be easier to review.

OK, I attached the PoC along with the dependent patches. Please see the zip 
file.
add_helth_check_... patch is written by me, and other two patches are
just copied from [1].
In the new callback function ConnectionHash is searched sequentially and
WaitEventSetWait() is performed for WL_SOCKET_CLOSED socket event.
This event is added by the dependent ones.

===
How to use
===

I'll explain how to use it briefly.

1. boot two postmaster processes. One is coordinator, and another is worker
2. set remote_servers_connection_check_interval to non-zero value at the 
coordinator
3. create tables to worker DB-cluster.
4. create foreign server, user mapping, and foreign table to coordinator.
5. connect to coordinator via psql.
6. open a transaction and access to foreing tables.
7. do "pg_ctl stop" command to woker DB-cluser.
8. execute some commands that does not access an foreign table.
9. Finally the following output will be get:

ERROR:  Postgres foreign server XXX might be down.

===
Example in some steps
===

3. at worker

```
postgres=# \d
List of relations
 Schema |  Name  | Type  | Owner  
++---+
 public | remote | table | hayato
(1 row)
```

4. at coordinator 

```
postgres=# select * from pg_foreign_server ;
  oid  | srvname | srvowner | srvfdw | srvtype | srvversion | srvacl | 
srvoptions  
---+-+--++-+++-
 16406 | remote  |   10 |  16402 | ||| 
{port=5433,dbname=postgres}
(1 row)

postgres=# select * from pg_user_mapping ;
  oid  | umuser | umserver |   umoptions   
---++--+---
 16407 | 10 |16406 | {user=hayato}
(1 row)

postgres=# \d
List of relations
 Schema |  Name  | Type  | Owner  
++---+
 public | local  | table | hayato
 public | remote | foreign table | hayato
(2 rows)
```

6-9. at coordinator

```
postgres=# begin;
BEGIN
postgres=*# select * from remote ;
 id 

  1
(1 row)

postgres=*# select * from local ;
ERROR:  Postgres foreign server remote might be down.
postgres=!#
```

Note that some keepalive settings are needed
if you want to detect cable breakdown events.
In my understanding following parameters are needed as server options:

* keepalives_idle
* keepalives_count
* keepalives_interval

[1]: https://commitfest.postgresql.org/35/3098/

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

<>


v01_add_checking_infrastracture.patch
Description: v01_add_checking_infrastracture.patch


Re: [Proposal] Add foreign-server health checks infrastructure

2021-11-15 Thread Shinya Kato

Hi,
Thank you for the patch!

On 2021-10-30 12:50, kuroda.hay...@fujitsu.com wrote:


## Background

Currently there is no way to check the status of an foreign server in
PostgreSQL.
If an foreign server's postmaster goes down, or if the network between
servers is lost,
the backend process will only detect these when it uses the connection
corresponding to that foreign server.

Consider a workload that updates data on an foreign server only at the
beginning of a transaction,
and runs a lot of local SQLs. Even if the network is disconnected
immediately after accessing the foreign server,
the backend process will continue to execute local SQLs without 
realizing it.


The process will eventually finish to execute SQLs and try to commit.
Only then will it realize that the foreign server cannot be connect
and will abort the transaction.
This situation should be detected as soon as possible
because it is impossible to commit a transaction when the foreign
server goes down.
This can be more of a problem if you have system-wide downtime 
requirements.

That's why I want to implement the health-check feature to postgres.


It's a good idea. I also think such a situation should be avoided.


## Further work

As the next step I have a plan to implement the callback function to
postgres_fdw.
I already made a PoC, but it deeply depends on the following thread:
https://commitfest.postgresql.org/35/3098/

I also want you to review the postgres_fdw part,
but I think it should not be attached because cfbot cannot understand
such a dependency
and will throw build error. Do you know how to deal with them in this 
case?


I don't know how to deal with them, but I hope you will attach the PoC, 
as it may be easier to review.


--
Regards,

--
Shinya Kato
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




<    1   2