RE: [Proposal] Add foreign-server health checks infrastructure

2023-12-11 Thread Hayato Kuroda (Fujitsu)
Dear Shubham,

> 
> I am failing to apply the latest
> Patch-"v39-0001-postgres_fdw-add-postgres_fdw_verify_connection-.patch"
>  for review. Please find the error I am facing:
> D:\Project\Postgres>git am
> D:\Project\Patch\v39-0001-postgres_fdw-add-postgres_fdw_verify_connection
> -.patch
> error: patch failed: contrib/postgres_fdw/connection.c:117
> error: contrib/postgres_fdw/connection.c: patch does not apply
> hint: Use 'git am --show-current-patch=diff' to see the failed patch
> Applying: postgres_fdw: add postgres_fdw_verify_connection variants
> Patch failed at 0001 postgres_fdw: add postgres_fdw_verify_connection variants
> When you have resolved this problem, run "git am --continue".
> If you prefer to skip this patch, run "git am --skip" instead.
> To restore the original branch and stop patching, run "git am --abort".

Oh, good catch. Here is a new patch set. There are no new changes from v39.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v40-0001-postgres_fdw-add-postgres_fdw_verify_connection-.patch
Description:  v40-0001-postgres_fdw-add-postgres_fdw_verify_connection-.patch


v40-0002-add-test.patch
Description: v40-0002-add-test.patch


v40-0003-Extend-postgres_fdw_get_connections-to-return-us.patch
Description:  v40-0003-Extend-postgres_fdw_get_connections-to-return-us.patch


Re: [Proposal] Add foreign-server health checks infrastructure

2023-12-11 Thread Shubham Khanna
On Mon, Dec 11, 2023 at 2:08 PM Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Fujii-san, Tom,
>
> Thank you for giving a suggestion! PSA new version.
>
> > Regarding 0001 patch, on second thought, to me, it seems odd to expose
> > a function that doesn't have anything to directly do with PostgreSQL
> > as a libpq function. The function simply calls poll() on the socket
> > with POLLRDHUP if it is supported. While it is certainly convenient to
> > have this function, I'm not sure that it fits within the scope of libpq.
> > Thought?
>
> Current style is motivated by Onder [1] and later discussions. I thought it 
> might
> be useful for other developers, but OK, I can remove changes on libpq modules.
> Horiguchi-san has suggested [2] that it might be overkill to use 
> WaitEventSet()
> mechanism, so I kept using poll().
> I reused the same naming as previous version because they actually do 
> something
> Like libpq, but better naming is very welcome.
>
> > Regarding 0002 patch, the behavior of 
> > postgres_fdw_verify_connection_states()
> > in regards to multiple connections using different user mappings seems
> > not well documented. The function seems to return false if either of
> > those connections has been closed.
>
> I did not considered the situation because I have not came up with the 
> situation
> that only one of connections to the same foreign server is broken.
>
> > This behavior means that it's difficult to identify which connection
> > has been closed when there are multiple ones to the given server.
> > To make it easier to identify that, it could be helpful to extend
> > the postgres_fdw_verify_connection_states() function so that it accepts
> > a unique connection as an input instead of just the server name.
> > One suggestion is to extend the function so that it accepts
> > both the server name and the user name used for the connection,
> > and checks the specified connection. If only the server name is specified,
> > the function should check all connections to the server and return false
> > if any of them are closed. This would be helpful since there is typically
> > only one connection to the server in most cases.
>
> Just to confirm, your point "user name" means a local user, right?
> I made a patch for addressing them.
>
> > Additionally, it would be helpful to extend the 
> > postgres_fdw_get_connections()
> > function to also return the user name used for each connection,
> > as currently there is no straightforward way to identify it.
>
> Added, See 0003. IIUC there is no good way to extract user mapping from its 
> OID, so I have
> added an function to do that and used it.
>
> > The function name "postgres_fdw_verify_connection_states" may seem
> > unnecessarily long to me. A simpler name like
> > "postgres_fdw_verify_connection" may be enough?
>
> Renamed.
>
> > The patch may not be ready for commit due to the review comments,
> > and with the feature freeze approaching in a few days,
> > it may not be possible to include this feature in v16.
>
> It is sad for me, but it is more important for PostgreSQL to add nicer codes.
> I changed status to "Needs review" again.

I am failing to apply the latest
Patch-"v39-0001-postgres_fdw-add-postgres_fdw_verify_connection-.patch"
 for review. Please find the error I am facing:
D:\Project\Postgres>git am
D:\Project\Patch\v39-0001-postgres_fdw-add-postgres_fdw_verify_connection-.patch
error: patch failed: contrib/postgres_fdw/connection.c:117
error: contrib/postgres_fdw/connection.c: patch does not apply
hint: Use 'git am --show-current-patch=diff' to see the failed patch
Applying: postgres_fdw: add postgres_fdw_verify_connection variants
Patch failed at 0001 postgres_fdw: add postgres_fdw_verify_connection variants
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

Please rebase and post an updated version of the Patch.

Thanks and Regards,
Shubham Khanna.




RE: [Proposal] Add foreign-server health checks infrastructure

2023-04-05 Thread Hayato Kuroda (Fujitsu)
Dear hackers,

> Dear Fujii-san, Tom,
> 
> Thank you for giving a suggestion! PSA new version.

I have reviewed and revised patches by myself.

* Fix handing of poll() based on the Horiguchi-san's point
* Fix WARNING message that shows user name which is used for connection
* PQconnCheck(), PQconnCheckable () are renamed

[1]: 
https://www.postgresql.org/message-id/flat/20230209.115009.2229702014236187289.horikyota.ntt%40gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v39-0001-postgres_fdw-add-postgres_fdw_verify_connection-.patch
Description:  v39-0001-postgres_fdw-add-postgres_fdw_verify_connection-.patch


v39-0002-add-test.patch
Description: v39-0002-add-test.patch


v39-0003-Extend-postgres_fdw_get_connections-to-return-us.patch
Description:  v39-0003-Extend-postgres_fdw_get_connections-to-return-us.patch


RE: [Proposal] Add foreign-server health checks infrastructure

2023-04-04 Thread Hayato Kuroda (Fujitsu)
Dear Fujii-san, Tom,

Thank you for giving a suggestion! PSA new version.

> Regarding 0001 patch, on second thought, to me, it seems odd to expose
> a function that doesn't have anything to directly do with PostgreSQL
> as a libpq function. The function simply calls poll() on the socket
> with POLLRDHUP if it is supported. While it is certainly convenient to
> have this function, I'm not sure that it fits within the scope of libpq.
> Thought?

Current style is motivated by Onder [1] and later discussions. I thought it 
might
be useful for other developers, but OK, I can remove changes on libpq modules.
Horiguchi-san has suggested [2] that it might be overkill to use WaitEventSet()
mechanism, so I kept using poll().
I reused the same naming as previous version because they actually do something
Like libpq, but better naming is very welcome.

> Regarding 0002 patch, the behavior of postgres_fdw_verify_connection_states()
> in regards to multiple connections using different user mappings seems
> not well documented. The function seems to return false if either of
> those connections has been closed.

I did not considered the situation because I have not came up with the situation
that only one of connections to the same foreign server is broken.

> This behavior means that it's difficult to identify which connection
> has been closed when there are multiple ones to the given server.
> To make it easier to identify that, it could be helpful to extend
> the postgres_fdw_verify_connection_states() function so that it accepts
> a unique connection as an input instead of just the server name.
> One suggestion is to extend the function so that it accepts
> both the server name and the user name used for the connection,
> and checks the specified connection. If only the server name is specified,
> the function should check all connections to the server and return false
> if any of them are closed. This would be helpful since there is typically
> only one connection to the server in most cases.

Just to confirm, your point "user name" means a local user, right?
I made a patch for addressing them.

> Additionally, it would be helpful to extend the postgres_fdw_get_connections()
> function to also return the user name used for each connection,
> as currently there is no straightforward way to identify it.

Added, See 0003. IIUC there is no good way to extract user mapping from its 
OID, so I have
added an function to do that and used it.

> The function name "postgres_fdw_verify_connection_states" may seem
> unnecessarily long to me. A simpler name like
> "postgres_fdw_verify_connection" may be enough?

Renamed.

> The patch may not be ready for commit due to the review comments,
> and with the feature freeze approaching in a few days,
> it may not be possible to include this feature in v16.

It is sad for me, but it is more important for PostgreSQL to add nicer codes.
I changed status to "Needs review" again.

[1]: 
https://www.postgresql.org/message-id/CACawEhW19nPfbFpvfke9eidFDxAy%2Bic36wmY0s936T%3DxzxgHog%40mail.gmail.com
[2]: 
https://www.postgresql.org/message-id/20221017.172642.45253962719866795.horikyota.ntt%40gmail.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v38-0001-postgres_fdw-add-postgres_fdw_verify_connection-.patch
Description:  v38-0001-postgres_fdw-add-postgres_fdw_verify_connection-.patch


v38-0002-add-test.patch
Description: v38-0002-add-test.patch


v38-0003-Extend-postgres_fdw_get_connections-to-return-us.patch
Description:  v38-0003-Extend-postgres_fdw_get_connections-to-return-us.patch


Re: [Proposal] Add foreign-server health checks infrastructure

2023-04-03 Thread Tom Lane
Fujii Masao  writes:
> Regarding 0001 patch, on second thought, to me, it seems odd to expose
> a function that doesn't have anything to directly do with PostgreSQL
> as a libpq function. The function simply calls poll() on the socket
> with POLLRDHUP if it is supported. While it is certainly convenient to
> have this function, I'm not sure that it fits within the scope of libpq.
> Thought?

Yeah, that does not seem great, partly because the semantics would be
platform-dependent.  I don't think we want that to become part of
libpq's API.

regards, tom lane




Re: [Proposal] Add foreign-server health checks infrastructure

2023-04-03 Thread Fujii Masao




On 2023/03/13 16:05, Hayato Kuroda (Fujitsu) wrote:

Thank you so much for your reviewing!
Now we can wait comments from senior members and committers.


Thanks for working on this patch!

Regarding 0001 patch, on second thought, to me, it seems odd to expose
a function that doesn't have anything to directly do with PostgreSQL
as a libpq function. The function simply calls poll() on the socket
with POLLRDHUP if it is supported. While it is certainly convenient to
have this function, I'm not sure that it fits within the scope of libpq.
Thought?

Regarding 0002 patch, the behavior of postgres_fdw_verify_connection_states()
in regards to multiple connections using different user mappings seems
not well documented. The function seems to return false if either of
those connections has been closed.

This behavior means that it's difficult to identify which connection
has been closed when there are multiple ones to the given server.
To make it easier to identify that, it could be helpful to extend
the postgres_fdw_verify_connection_states() function so that it accepts
a unique connection as an input instead of just the server name.
One suggestion is to extend the function so that it accepts
both the server name and the user name used for the connection,
and checks the specified connection. If only the server name is specified,
the function should check all connections to the server and return false
if any of them are closed. This would be helpful since there is typically
only one connection to the server in most cases.

Additionally, it would be helpful to extend the postgres_fdw_get_connections()
function to also return the user name used for each connection,
as currently there is no straightforward way to identify it.

The function name "postgres_fdw_verify_connection_states" may seem
unnecessarily long to me. A simpler name like
"postgres_fdw_verify_connection" may be enough?

The patch may not be ready for commit due to the review comments,
and with the feature freeze approaching in a few days,
it may not be possible to include this feature in v16.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




RE: [Proposal] Add foreign-server health checks infrastructure

2023-03-13 Thread Hayato Kuroda (Fujitsu)
Dear Katsuragi-san,

> Hi,
> 
> I updated the status of the patch to ready for committer.
> 
> regards,

Thank you so much for your reviewing!
Now we can wait comments from senior members and committers.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: [Proposal] Add foreign-server health checks infrastructure

2023-03-13 Thread Katsuragi Yuta

On 2023-03-10 18:07, Katsuragi Yuta wrote:

On 2023-03-08 13:40, Hayato Kuroda (Fujitsu) wrote:

Dear Vignesh,

Thank you for reviewing! PSA new version.


Hi,

Thank you for the comments, Vignesh.
Thank you for updating the patch, Kuroda-san. This fix looks fine to 
me.


And also, there seems no other comments from this post [1].
So, I'm planning to update the status to ready for committer
on next Monday.


[1]:
https://www.postgresql.org/message-id/TYAPR01MB5866F8EA3C06E1B43E42E4E4F5B79%40TYAPR01MB5866.jpnprd01.prod.outlook.com

regards,


Hi,

I updated the status of the patch to ready for committer.

regards,

--
Katsuragi Yuta
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: [Proposal] Add foreign-server health checks infrastructure

2023-03-10 Thread Katsuragi Yuta

On 2023-03-08 13:40, Hayato Kuroda (Fujitsu) wrote:

Dear Vignesh,

Thank you for reviewing! PSA new version.


Hi,

Thank you for the comments, Vignesh.
Thank you for updating the patch, Kuroda-san. This fix looks fine to me.

And also, there seems no other comments from this post [1].
So, I'm planning to update the status to ready for committer
on next Monday.


[1]: 
https://www.postgresql.org/message-id/TYAPR01MB5866F8EA3C06E1B43E42E4E4F5B79%40TYAPR01MB5866.jpnprd01.prod.outlook.com


regards,

--
Katsuragi Yuta
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




RE: [Proposal] Add foreign-server health checks infrastructure

2023-03-07 Thread Hayato Kuroda (Fujitsu)
Dear Vignesh,

Thank you for reviewing! PSA new version.

> 
> Few comments:
> 1) There is no handling of forConnCheck in #else HAVE_POLL, if this is
> intentional we could add some comments for the same:
>  static int
> -pqSocketPoll(int sock, int forRead, int forWrite, time_t end_time)
> +pqSocketPoll(int sock, int forRead,
> +int forWrite, int forConnCheck, time_t end_time)
>  {
> /* We use poll(2) if available, otherwise select(2) */
>  #ifdef HAVE_POLL
> @@ -1092,7 +1133,11 @@ pqSocketPoll(int sock, int forRead, int
> forWrite, time_t end_time)
> int timeout_ms;
> 
> if (!forRead && !forWrite)
> -   return 0;

Comments were added. This missing is intentional, because with the patch present
we do not implement checking feature for kqueue(). If you want to check the
detailed implementation of that, please see 0004 patch attached on [1].

> 2) Can this condition be added to the parent if condition:
> if (!forRead && !forWrite)
> -   return 0;
> +   {
> +   /* Connection check can be available on some limted platforms
> */
> +   if (!(forConnCheck && PQconnCheckable()))
> +   return 0;
> +   }

Changed, and added comments atop the if-statement.

> 3) Can we add a comment for PQconnCheckable:
> +/* Check whether the postgres server is still alive or not */
> +extern int PQconnCheck(PGconn *conn);
> +extern int PQconnCheckable(void);
> +

Added. And I found the tab should be used to divide data type and name, so 
fixed.

> 4) "Note that if 0 is returned and forConnCheck is requested" doesn't
> sound right, it can be changed to "Note that if 0 is returned when
> forConnCheck is requested"
> + * If neither forRead, forWrite nor forConnCheck are set, immediately return 
> a
> + * timeout condition (without waiting). Return >0 if condition is met, 0 if a
> + * timeout occurred, -1 if an error or interrupt occurred. Note that if 0 is
> + * returned and forConnCheck is requested, it means that the socket has not
> + * matched POLLRDHUP event and the connection is not closed by the socket
> peer.

Fixed.

[1]: 
https://www.postgresql.org/message-id/TYAPR01MB58669EAAC02493BFF9F39B06F5D99%40TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v37-0001-Add-PQconnCheck-and-PQconnCheckable-to-libpq.patch
Description:  v37-0001-Add-PQconnCheck-and-PQconnCheckable-to-libpq.patch


v37-0002-postgres_fdw-add-postgres_fdw_verify_connection_.patch
Description:  v37-0002-postgres_fdw-add-postgres_fdw_verify_connection_.patch


v37-0003-add-test.patch
Description: v37-0003-add-test.patch


Re: [Proposal] Add foreign-server health checks infrastructure

2023-03-07 Thread vignesh C
On Tue, 7 Mar 2023 at 09:53, Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Katsuragi-san,
>
> Thank you for reviewing! PSA new version patches.
>
> > I think we can update the status to ready for committer after
> > this fix, if there is no objection.
>
> That's a very good news for me! How about other people?
>
> > >> 7. the document of postgres_fdw_verify_connection_states_all
> > >> NULL
> > >> +  is returned if the local session does not have connection
> > >> caches,
> > >> or this
> > >> +  function is not available on this platform.
> > >>
> > >> I think there is a case where a connection cache exists but valid
> > >> connections do not exist and NULL is returned (disconnection case).
> > >> Almost the same document as the postgres_fdw_verify_connection_states
> > >> case (comment no.5) seems better.
> > >
> > > Yes, but completely same statement cannot be used because these is not
> > > specified foreign server. How about:
> > > NULL is returned if there are no established connections or this
> > > function ...
> >
> > Yes, to align with the postgres_fdw_verify_connection_states()
> > case, how about writing the connection is not valid. Like the
> > following?
> > 'NULL is returned if no valid connections are established or
> > this function...'
>
> Prefer yours, fixed.
>
> > This is my comment for v35. Please check.
> > 0002:
> > 1. the comment of verify_cached_connections (I missed one minor point.)
> > + * This function emits warnings if a disconnection is found. This
> > returns true
> > + * if disconnections cannot be found, otherwise returns false.
> >
> > I think false is returned only if disconnections are found and
> > true is returned in all other cases. So, modifying the description
> > like the following seems better.
> > 'This returns false if disconnections are found, otherwise
> > returns true.'
>
> Fixed.

Few comments:
1) There is no handling of forConnCheck in #else HAVE_POLL, if this is
intentional we could add some comments for the same:
 static int
-pqSocketPoll(int sock, int forRead, int forWrite, time_t end_time)
+pqSocketPoll(int sock, int forRead,
+int forWrite, int forConnCheck, time_t end_time)
 {
/* We use poll(2) if available, otherwise select(2) */
 #ifdef HAVE_POLL
@@ -1092,7 +1133,11 @@ pqSocketPoll(int sock, int forRead, int
forWrite, time_t end_time)
int timeout_ms;

if (!forRead && !forWrite)
-   return 0;

2) Can this condition be added to the parent if condition:
if (!forRead && !forWrite)
-   return 0;
+   {
+   /* Connection check can be available on some limted platforms */
+   if (!(forConnCheck && PQconnCheckable()))
+   return 0;
+   }

3) Can we add a comment for PQconnCheckable:
+/* Check whether the postgres server is still alive or not */
+extern int PQconnCheck(PGconn *conn);
+extern int PQconnCheckable(void);
+

4) "Note that if 0 is returned and forConnCheck is requested" doesn't
sound right, it can be changed to "Note that if 0 is returned when
forConnCheck is requested"
+ * If neither forRead, forWrite nor forConnCheck are set, immediately return a
+ * timeout condition (without waiting). Return >0 if condition is met, 0 if a
+ * timeout occurred, -1 if an error or interrupt occurred. Note that if 0 is
+ * returned and forConnCheck is requested, it means that the socket has not
+ * matched POLLRDHUP event and the connection is not closed by the socket peer.

Regards,
Vignesh




RE: [Proposal] Add foreign-server health checks infrastructure

2023-03-06 Thread Hayato Kuroda (Fujitsu)
Dear Katsuragi-san,

Thank you for reviewing! PSA new version patches.

> I think we can update the status to ready for committer after
> this fix, if there is no objection.

That's a very good news for me! How about other people?

> >> 7. the document of postgres_fdw_verify_connection_states_all
> >> NULL
> >> +  is returned if the local session does not have connection
> >> caches,
> >> or this
> >> +  function is not available on this platform.
> >>
> >> I think there is a case where a connection cache exists but valid
> >> connections do not exist and NULL is returned (disconnection case).
> >> Almost the same document as the postgres_fdw_verify_connection_states
> >> case (comment no.5) seems better.
> >
> > Yes, but completely same statement cannot be used because these is not
> > specified foreign server. How about:
> > NULL is returned if there are no established connections or this
> > function ...
> 
> Yes, to align with the postgres_fdw_verify_connection_states()
> case, how about writing the connection is not valid. Like the
> following?
> 'NULL is returned if no valid connections are established or
> this function...'

Prefer yours, fixed.

> This is my comment for v35. Please check.
> 0002:
> 1. the comment of verify_cached_connections (I missed one minor point.)
> + * This function emits warnings if a disconnection is found. This
> returns true
> + * if disconnections cannot be found, otherwise returns false.
> 
> I think false is returned only if disconnections are found and
> true is returned in all other cases. So, modifying the description
> like the following seems better.
> 'This returns false if disconnections are found, otherwise
> returns true.'

Fixed.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v36-0001-Add-PQconnCheck-and-PQconnCheckable-to-libpq.patch
Description:  v36-0001-Add-PQconnCheck-and-PQconnCheckable-to-libpq.patch


v36-0002-postgres_fdw-add-postgres_fdw_verify_connection_.patch
Description:  v36-0002-postgres_fdw-add-postgres_fdw_verify_connection_.patch


v36-0003-add-test.patch
Description: v36-0003-add-test.patch


Re: [Proposal] Add foreign-server health checks infrastructure

2023-03-06 Thread Katsuragi Yuta

Hi Kuroda-san,

Thank you for updating the patch!

I think we can update the status to ready for committer after
this fix, if there is no objection.


7. the document of postgres_fdw_verify_connection_states_all
NULL
+  is returned if the local session does not have connection 
caches,

or this
+  function is not available on this platform.

I think there is a case where a connection cache exists but valid
connections do not exist and NULL is returned (disconnection case).
Almost the same document as the postgres_fdw_verify_connection_states
case (comment no.5) seems better.


Yes, but completely same statement cannot be used because these is not
specified foreign server. How about:
NULL is returned if there are no established connections or this 
function ...


Yes, to align with the postgres_fdw_verify_connection_states()
case, how about writing the connection is not valid. Like the
following?
'NULL is returned if no valid connections are established or
this function...'


This is my comment for v35. Please check.
0002:
1. the comment of verify_cached_connections (I missed one minor point.)
+ * This function emits warnings if a disconnection is found. This 
returns true

+ * if disconnections cannot be found, otherwise returns false.

I think false is returned only if disconnections are found and
true is returned in all other cases. So, modifying the description
like the following seems better.
'This returns false if disconnections are found, otherwise
returns true.'


regards,

--
Katsuragi Yuta
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




RE: [Proposal] Add foreign-server health checks infrastructure

2023-03-06 Thread Hayato Kuroda (Fujitsu)
Dear Katsuragi-san,

Thank you for reviewing! PSA new version.

> >> 4. the code of pqSocketPoll
> >> +#if defined(POLLRDHUP)
> >> +  if (forConnCheck)
> >> +  input_fd.events |= POLLRDHUP;
> >> +#endif
> >>
> >> I think it is better to use PQconnCheckable() to remove the macro.
> >
> > IIUC the macro is needed. In FreeBSD, macOS and other platforms do not
> > have the
> > macro POLLRDHUP so they cannot compile. I checked by my CI and got
> > following error.
> >
> > ```
> > ...
> > FAILED: src/interfaces/libpq/libpq.a.p/fe-misc.c.o
> > ...
> > ../src/interfaces/libpq/fe-misc.c:1149:22: error: use of undeclared
> > identifier 'POLLRDHUP'
> > input_fd.events |= POLLRDHUP;
> > 
> >
> > It must be invisible from them.
> 
> Sorry, my mistake...

No issues :-).

> >> 9. the document of postgres_fdw
> >> The document of postgres_fdw_verify_connection_states_* is a little
> >> bit old. Could you update it?
> >
> > Updated. IIUC postgres_fdw_verify_connection_states returns
> >
> > * true, if the connection is verified.
> > * false, if the connection seems to be disconnected.
> > * NULL, if this is not the supported platform or connection has not
> > been established.
> >
> > And postgres_fdw_verify_connection_states_all returns
> >
> > * true if all the connections are verified.
> > * false, if one of connections seems to be disconnected.
> > * NULL, if this is not the supported platform or this backend has
> > never established connections
> 
> I think 'backend has never established connections' is a little bit
> strong.
> I think the following cases also return NULL. The case where a
> connection was established, however the connection is now closed
> by the postgres_fdw_disconnect() or something. NULL is also returned if
> the connection is invalidated. So, I think 'NULL, if no valid
> connection is established or the function is not supported on
> the platform.' is better. What do you think?

Disconnect functions have never been in my mind. Descriptions must be updated.

> Followings are my comments for v34. Please check.
> 
> 0001:
> 1. the document of pqConnCheck
> +   0 if the socket is valid, and returns
> -1
> +   if the connection has already been closed or an error has
> occurred.
> 
> 1.1 if the socket is valid -> returns 0 if the 'connection is not
> closed'?

Fixed.

> 1.2 returns -1 if the connection has already been closed  <- Let me ask
> a question.
> Isn't this a situation where we would like to check using this
> function? Is 'error has occurred' insufficient?

Seems right, fixed.

> 2. the comment of pqSocketCheck
> + * means that the socket has not matched POLLRDHUP event and the socket
> has
> + * still survived.
> 
> socket has still survived -> connection is not closed by the socket
> peer?

Fixed.

> 3. the comment of pqSocketPoll
> + * returned and forConnCheck is requested, it means that the socket has
> not
> + * matched POLLRDHUP event and the socket has still survived.
> 
> socket has still survived -> connection is not closed by the socket
> peer?

Fixed.

> 0002:
> 4. the comment of verify_cached_connections
> + * This function emits warnings if a disconnection is found. This
> return true
> + * if disconnections cannot be found, otherwise return false.
> 
> return ture -> return's' true
> return false -> return's' false

Fixed.

> 5. the comment of postgres_fdw_verify_connection_states
> + * if the local session seems to be disconnected from other servers.
> NULL is
> + * returned if a connection to the specified foreign server has not
> been
> + * established yet, or this function is not available on this platform.
> 
> Considering the above discussion, 'NULL is returned if a valid
> connection to the specified foreign server is not established or
> this function...' seems better. What do you think?

Right, fixed.

> 6. the document of postgres_fdw_verify_connection_states
>   NULL
> +  is returned if a connection to the specified foreign server has
> not been
> +  established yet, or this function is not available on this
> platform
> 
> The same as comment no.5.

Right, fixed.

> 7. the document of postgres_fdw_verify_connection_states_all
> NULL
> +  is returned if the local session does not have connection caches,
> or this
> +  function is not available on this platform.
> 
> I think there is a case where a connection cache exists but valid
> connections do not exist and NULL is returned (disconnection case).
> Almost the same document as the postgres_fdw_verify_connection_states
> case (comment no.5) seems better.

Yes, but completely same statement cannot be used because these is not
specified foreign server. How about: 
NULL is returned if there are no established connections or this function ...

> 8. the document of postgres_fdw_can_verify_connection_states
> +  This function checks whether
> postgres_fdw_verify_connection_states
> +  and postgres_fdw_verify_connection_states
> work well
> 
> Should the 

Re: [Proposal] Add foreign-server health checks infrastructure

2023-03-06 Thread Katsuragi Yuta

Hi Kuroda-san,

Thank you for updating the patch!


4. the code of pqSocketPoll
+#if defined(POLLRDHUP)
+   if (forConnCheck)
+   input_fd.events |= POLLRDHUP;
+#endif

I think it is better to use PQconnCheckable() to remove the macro.


IIUC the macro is needed. In FreeBSD, macOS and other platforms do not 
have the

macro POLLRDHUP so they cannot compile. I checked by my CI and got
following error.

```
...
FAILED: src/interfaces/libpq/libpq.a.p/fe-misc.c.o
...
../src/interfaces/libpq/fe-misc.c:1149:22: error: use of undeclared
identifier 'POLLRDHUP'
input_fd.events |= POLLRDHUP;


It must be invisible from them.


Sorry, my mistake...



9. the document of postgres_fdw
The document of postgres_fdw_verify_connection_states_* is a little
bit old. Could you update it?


Updated. IIUC postgres_fdw_verify_connection_states returns

* true, if the connection is verified.
* false, if the connection seems to be disconnected.
* NULL, if this is not the supported platform or connection has not
been established.

And postgres_fdw_verify_connection_states_all returns

* true if all the connections are verified.
* false, if one of connections seems to be disconnected.
* NULL, if this is not the supported platform or this backend has
never established connections


I think 'backend has never established connections' is a little bit 
strong.

I think the following cases also return NULL. The case where a
connection was established, however the connection is now closed
by the postgres_fdw_disconnect() or something. NULL is also returned if
the connection is invalidated. So, I think 'NULL, if no valid
connection is established or the function is not supported on
the platform.' is better. What do you think?


Followings are my comments for v34. Please check.

0001:
1. the document of pqConnCheck
+   0 if the socket is valid, and returns 
-1
+   if the connection has already been closed or an error has 
occurred.


1.1 if the socket is valid -> returns 0 if the 'connection is not 
closed'?


1.2 returns -1 if the connection has already been closed  <- Let me ask 
a question.

Isn't this a situation where we would like to check using this
function? Is 'error has occurred' insufficient?

2. the comment of pqSocketCheck
+ * means that the socket has not matched POLLRDHUP event and the socket 
has

+ * still survived.

socket has still survived -> connection is not closed by the socket 
peer?


3. the comment of pqSocketPoll
+ * returned and forConnCheck is requested, it means that the socket has 
not

+ * matched POLLRDHUP event and the socket has still survived.

socket has still survived -> connection is not closed by the socket 
peer?


0002:
4. the comment of verify_cached_connections
+ * This function emits warnings if a disconnection is found. This 
return true

+ * if disconnections cannot be found, otherwise return false.

return ture -> return's' true
return false -> return's' false

5. the comment of postgres_fdw_verify_connection_states
+ * if the local session seems to be disconnected from other servers. 
NULL is
+ * returned if a connection to the specified foreign server has not 
been

+ * established yet, or this function is not available on this platform.

Considering the above discussion, 'NULL is returned if a valid
connection to the specified foreign server is not established or
this function...' seems better. What do you think?

6. the document of postgres_fdw_verify_connection_states
 NULL
+  is returned if a connection to the specified foreign server has 
not been
+  established yet, or this function is not available on this 
platform


The same as comment no.5.

7. the document of postgres_fdw_verify_connection_states_all
NULL
+  is returned if the local session does not have connection caches, 
or this

+  function is not available on this platform.

I think there is a case where a connection cache exists but valid
connections do not exist and NULL is returned (disconnection case).
Almost the same document as the postgres_fdw_verify_connection_states
case (comment no.5) seems better.


8. the document of postgres_fdw_can_verify_connection_states
+  This function checks whether 
postgres_fdw_verify_connection_states
+  and postgres_fdw_verify_connection_states 
work well


Should the latter (or former) postgres_fdw_verify_connection_states be
postgres_fdw_verify_connection_states_all?


regards,

--
Katsuragi Yuta
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




RE: [Proposal] Add foreign-server health checks infrastructure

2023-03-03 Thread Hayato Kuroda (Fujitsu)
Hi Katsuragi-san,

Thank you for reviewing! PSA new version.

> >> I rethought the pqSocketPoll part. Current interpretation of
> >> arguments seems a little bit confusing because a specific pattern
> >> of arguments has a different meaning. What do you think about
> >> introducing a new argument like `int forConnCheck`? This seems
> >> straightforward and readable.
> >
> > I think it may be better, so fixed.
> > But now we must consider another thing - will we support combination
> > of conncheck
> > and {read|write} or timeout? Especially about timeout, if someone
> > calls pqSocketPoll()
> > with forConnCheck = 1 and end_time = -1, the process may be stuck
> > because it waits
> > till socket peer closed connection.
> > Currently the forConnCheck can be specified with other request, but
> > timeout must be zero.
> 
> Yes, we need to consider these.
> I'm wondering whether we need a special care for the combination
> of event and timeout. Surely, if forConnCheck is set and end_time = -1,
> pqSocketPoll blocks until the connection close. However, the
> behavior matches the meaning of the arguments and does not seem
> confusing (also not an error state). Do we need to restrict this
> kind of usage in the pqSocketPoll side? I think like the following
> might be fine.
> 
> ```
>   if (!forRead && !forWrite)
>   {
>   if (!(forConnCheck && PQconnCheckable()))
>   return 0;
>   }
> ```

Seems right, I was too pessimistic. Fixed.

> > While making the patch, I come up with idea that
> > postgres_fdw_verify_connection_states*
> > returns NULL if PQconnCheck() cannot work on this platform. This can be
> > done by
> > adding PQconnCheckable(). It may be reasonable because the checking is
> > not really
> > done on this environment. How do you think?
> 
> I agree with you.

Changed.

> Followings are comments for v33. Please check.
> 0001:
> 1. the comment of PQconnCheck
> +/*
> + * Check whether the socket peer closed connection or not.
> + *
> 
> Check whether the socket peer closed 'the' connection or not?

Changed.

> 2. the comment of pqSocketCheck
> - * or both.  Returns >0 if one or more conditions are met, 0 if it
> timed
> - * out, -1 if an error occurred.
> + * or both. Moreover, this function can check the health of socket on
> some
> + * limited platforms if end_time is 0.
> 
> the health of socket -> the health of the connection?
> if end_time is 0 -> if forConnCehck is specified?

Fixed.

> 3. the comment of pqSocketPoll
> - * If neither forRead nor forWrite are set, immediately return a
> timeout
> - * condition (without waiting).  Return >0 if condition is met, 0
> - * if a timeout occurred, -1 if an error or interrupt occurred.
> + * Moreover, this function can check the health of socket on some
> limited
> + * platforms if end_time is 0.
> 
> the health of socket -> the health of the connection?
> if end_time is 0 -> if forConnCehck is specified?

Fixed.

> 4. the code of pqSocketPoll
> +#if defined(POLLRDHUP)
> + if (forConnCheck)
> + input_fd.events |= POLLRDHUP;
> +#endif
> 
> I think it is better to use PQconnCheckable() to remove the macro.

IIUC the macro is needed. In FreeBSD, macOS and other platforms do not have the
macro POLLRDHUP so they cannot compile. I checked by my CI and got following 
error.

```
...
FAILED: src/interfaces/libpq/libpq.a.p/fe-misc.c.o 
...
../src/interfaces/libpq/fe-misc.c:1149:22: error: use of undeclared identifier 
'POLLRDHUP'
input_fd.events |= POLLRDHUP;


It must be invisible from them.

> 5. the code of pqSocketCheck and the definition of pqSocketPoll
> - result = pqSocketPoll(conn->sock, forRead, forWrite,
> end_time);
> + result = pqSocketPoll(conn->sock, forRead, forWrite,
> forConnCheck,
> end_time);
> 
> -pqSocketPoll(int sock, int forRead, int forWrite, time_t end_time)
> +pqSocketPoll(int sock, int forRead, int forWrite, int forConnCheck,
> time_t end_time)
> 
> Should these be divided into two lines?

pgindent did not say anything about them, but I divided. 

> 6. the comment of verify_cached_connections
> + * This function emits warnings if a disconnection is found, and this
> returns
> + * false only when the lastly verified server seems to be disconnected.

Updated. I think comments were not correct, so these were also fixed.

> It seems better to write the case where this function returns
> true.
> 7. the comment of postgres_fdw_verify_connection_states
> + * This function emits a warning if a disconnection is found. This
> returns
> + * false only when the verified server seems to be disconnected, and
> reutrns
> + * NULL if the connection check had not been done.
> 
> It seems better to write the case where this function returns
> true.


> 8. the code of
> + (errcode(ERRCODE_CONNECTION_FAILURE),
> +  errmsg("%s", str.data),
> +  errdetail("Socket close is 

Re: [Proposal] Add foreign-server health checks infrastructure

2023-02-28 Thread Katsuragi Yuta

Hi Kuroda-san,

Thank you for updating the patch!



I rethought the pqSocketPoll part. Current interpretation of
arguments seems a little bit confusing because a specific pattern
of arguments has a different meaning. What do you think about
introducing a new argument like `int forConnCheck`? This seems
straightforward and readable.


I think it may be better, so fixed.
But now we must consider another thing - will we support combination
of conncheck
and {read|write} or timeout? Especially about timeout, if someone
calls pqSocketPoll()
with forConnCheck = 1 and end_time = -1, the process may be stuck
because it waits
till socket peer closed connection.
Currently the forConnCheck can be specified with other request, but
timeout must be zero.


Yes, we need to consider these.
I'm wondering whether we need a special care for the combination
of event and timeout. Surely, if forConnCheck is set and end_time = -1,
pqSocketPoll blocks until the connection close. However, the
behavior matches the meaning of the arguments and does not seem
confusing (also not an error state). Do we need to restrict this
kind of usage in the pqSocketPoll side? I think like the following
might be fine.

```
if (!forRead && !forWrite)
{
if (!(forConnCheck && PQconnCheckable()))
return 0;
}
```




While making the patch, I come up with idea that
postgres_fdw_verify_connection_states*
returns NULL if PQconnCheck() cannot work on this platform. This can be 
done by

adding PQconnCheckable(). It may be reasonable because the checking is
not really
done on this environment. How do you think?


I agree with you.

Followings are comments for v33. Please check.
0001:
1. the comment of PQconnCheck
+/*
+ * Check whether the socket peer closed connection or not.
+ *

Check whether the socket peer closed 'the' connection or not?

2. the comment of pqSocketCheck
- * or both.  Returns >0 if one or more conditions are met, 0 if it 
timed

- * out, -1 if an error occurred.
+ * or both. Moreover, this function can check the health of socket on 
some

+ * limited platforms if end_time is 0.

the health of socket -> the health of the connection?
if end_time is 0 -> if forConnCehck is specified?


3. the comment of pqSocketPoll
- * If neither forRead nor forWrite are set, immediately return a 
timeout

- * condition (without waiting).  Return >0 if condition is met, 0
- * if a timeout occurred, -1 if an error or interrupt occurred.
+ * Moreover, this function can check the health of socket on some 
limited

+ * platforms if end_time is 0.

the health of socket -> the health of the connection?
if end_time is 0 -> if forConnCehck is specified?

4. the code of pqSocketPoll
+#if defined(POLLRDHUP)
+   if (forConnCheck)
+   input_fd.events |= POLLRDHUP;
+#endif

I think it is better to use PQconnCheckable() to remove the macro.

5. the code of pqSocketCheck and the definition of pqSocketPoll
-   result = pqSocketPoll(conn->sock, forRead, forWrite, end_time);
+		result = pqSocketPoll(conn->sock, forRead, forWrite, forConnCheck, 
end_time);


-pqSocketPoll(int sock, int forRead, int forWrite, time_t end_time)
+pqSocketPoll(int sock, int forRead, int forWrite, int forConnCheck, 
time_t end_time)


Should these be divided into two lines?


6. the comment of verify_cached_connections
+ * This function emits warnings if a disconnection is found, and this 
returns

+ * false only when the lastly verified server seems to be disconnected.

It seems better to write the case where this function returns
true.

7. the comment of postgres_fdw_verify_connection_states
+ * This function emits a warning if a disconnection is found. This 
returns
+ * false only when the verified server seems to be disconnected, and 
reutrns

+ * NULL if the connection check had not been done.

It seems better to write the case where this function returns
true.

8. the code of
+   (errcode(ERRCODE_CONNECTION_FAILURE),
+errmsg("%s", str.data),
+errdetail("Socket close is detected."),
+errhint("Plsease check the health of 
server.")));

Is it better to use "Connection close is detected" rather than
"Socket close is detected"?

9. the document of postgres_fdw
The document of postgres_fdw_verify_connection_states_* is a little
bit old. Could you update it?

regards,

--
Katsuragi Yuta
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




RE: [Proposal] Add foreign-server health checks infrastructure

2023-02-22 Thread Hayato Kuroda (Fujitsu)
Dear Katsuragi-san,

Thank you for reviewing! New patch set can be available on [1].

> I rethought the pqSocketPoll part. Current interpretation of
> arguments seems a little bit confusing because a specific pattern
> of arguments has a different meaning. What do you think about
> introducing a new argument like `int forConnCheck`? This seems
> straightforward and readable.

I think it may be better, so fixed.
But now we must consider another thing - will we support combination of 
conncheck
and {read|write} or timeout? Especially about timeout, if someone calls 
pqSocketPoll()
with forConnCheck = 1 and end_time = -1, the process may be stuck because it 
waits
till socket peer closed connection.
Currently the forConnCheck can be specified with other request, but timeout 
must be zero.

> I think there are cases where the given server name does exist,
> however the connection check is not performed (does not pass the
> first if statement above). 1) a case where the server name exist,
> however an open connection to the specified server is not found.
> 2) case where connection for specified server is invalidated.
> 3) case where there is not ConnectionHash entry for the specified
> server. Current implementation returns true in that case, however
> the check is not performed. Suppose the checking mechanism is
> supported on the platform, it does not seem reasonable to return
> true or false when the check is not performed. What do you think?

Thank you for detailed explanation. I agreed your opinion and modified like 
that.

While making the patch, I come up with idea that 
postgres_fdw_verify_connection_states*
returns NULL if PQconnCheck() cannot work on this platform. This can be done by
adding PQconnCheckable(). It may be reasonable because the checking is not 
really
done on this environment. How do you think?

[1]: 
https://www.postgresql.org/message-id/TYAPR01MB586664F5ED2128E572EA95B0F5AA9%40TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



RE: [Proposal] Add foreign-server health checks infrastructure

2023-02-22 Thread Hayato Kuroda (Fujitsu)
Dear Peter,

Thank you for reviewing! PSA new version.

> 1.
> PQconnCheck() function allows to check the status of the socket by polling
> the socket. This function is currently available only on systems that
> support the non-standard POLLRDHUP extension to the poll system call,
> including Linux.
> 
> ~
> 
> (missed fix from previous review)
> 
> "status of the socket" --> "status of the connection"

Sorry, fixed.

> 
> doc/src/sgml/libpq.sgml
> 
> 2. PQconnCheck
> +  
> +   This function check the health of the connection. Unlike  linkend="libpq-PQstatus"/>,
> +   this check is performed by polling the corresponding socket. This
> +   function is currently available only on systems that support the
> +   non-standard POLLRDHUP extension to the
> poll
> +   system call, including Linux. 
> +   returns greater than zero if the remote peer seems to be closed, 
> returns
> +   0 if the socket is valid, and returns
> -1
> +   if the connection has already been closed or an error has occurred.
> +  
> 
> "check the health" --> "checks the health"

Fixed.

> 3. PQcanConnCheck
> 
> +  
> +   Returns true (1) or false (0) to indicate if the PQconnCheck function
> +   is supported on this platform.
> 
> Should the reference to PQconnCheck be a link as it previously was?

Right, fixed.

> src/interfaces/libpq/fe-misc.c
> 
> 4. PQconnCheck
> 
> +/*
> + * Check whether the socket peer closed connection or not.
> + *
> + * Returns >0 if remote peer seems to be closed, 0 if it is valid,
> + * -1 if the input connection is bad or an error occurred.
> + */
> +int
> +PQconnCheck(PGconn *conn)
> +{
> + return pqSocketCheck(conn, 0, 0, (time_t) 0);
> +}
> 
> I'm confused. This comment says =0 means connection is valid. But the
> pqSocketCheck comment says =0 means it timed out.
> 
> So those two function comments don't seem compatible

Added further descriptions atop pqSocketCheck() and pqSocketPoll().
Is it helpful to understand?

> 5. PQconnCheckable
> 
> +/*
> + * Check whether PQconnCheck() works well on this platform.
> + *
> + * Returns true (1) if this can use PQconnCheck(), otherwise false (0).
> + */
> +int
> +PQconnCheckable(void)
> +{
> +#if (defined(HAVE_POLL) && defined(POLLRDHUP))
> + return true;
> +#else
> + return false;
> +#endif
> +}
> 
> Why say "works well"? IMO it either works or doesn't work – there is no 
> "well".
> 
> SUGGESTION1
> Check whether PQconnCheck() works on this platform.
> 
> SUGGESTION2
> Check whether PQconnCheck() can work on this platform.

I choose 2.

> 6. pqSocketCheck
> 
>  /*
>   * Checks a socket, using poll or select, for data to be read, written,
> - * or both.  Returns >0 if one or more conditions are met, 0 if it timed
> + * or both. Moreover, when neither forRead nor forWrite is requested and
> + * timeout is disabled, try to check the health of socket.
> + *
> + * Returns >0 if one or more conditions are met, 0 if it timed
>   * out, -1 if an error occurred.
>   *
>   * If SSL is in use, the SSL buffer is checked prior to checking the socket
> 
> ~
> 
> See review comment #4. (e.g. This says =0 if it timed out).

Descriptions were added.

> 7. pqSocketPoll
> 
> + * When neither forRead nor forWrite are set and timeout is disabled,
> + *
> + * - If the timeout is disabled, try to check the health of the socket
> + * - Otherwise this immediately returns 0
> + *
> + * Return >0 if condition is met, 0 if a timeout occurred, -1 if an error
> + * or interrupt occurred.
> 
> Don't say "and timeout is disabled," because it clashes with the 1st
> bullet which also says "- If the timeout is disabled,".

This comments were reworded.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v33-0001-Add-PQconnCheck-and-PQconnCheckable-to-libpq.patch
Description:  v33-0001-Add-PQconnCheck-and-PQconnCheckable-to-libpq.patch


v33-0002-postgres_fdw-add-postgres_fdw_verify_connection_.patch
Description:  v33-0002-postgres_fdw-add-postgres_fdw_verify_connection_.patch


v33-0003-add-test.patch
Description: v33-0003-add-test.patch


Re: [Proposal] Add foreign-server health checks infrastructure

2023-02-21 Thread Katsuragi Yuta

Hi Kuroda-san,

Thank you for updating the patch!

On 2023-02-20 15:42, Hayato Kuroda (Fujitsu) wrote:

Dear Katsuragi-san,

Thank you for reviewing! PSA new version.


I rethought the pqSocketPoll part. Current interpretation of
arguments seems a little bit confusing because a specific pattern
of arguments has a different meaning. What do you think about
introducing a new argument like `int forConnCheck`? This seems
straightforward and readable.



0002:
As for the return value of postgres_fdw_verify_connection_states,
what do you think about returning NULL when connection-checking
is not performed? I think there are two cases 1) ConnectionHash
is not initialized or 2) connection is not found for specified
server name, That is, no entry passes the first if statement below
(case 2)).

```
  if (all || entry->serverid == serverid)
  {
  if (PQconnCheck(entry->conn))
  {
```


I think in that case we can follow postgres_fdw_disconnect().
About postgres_fdw_disconnect(), if the given server_name does not 
exist,

an error is reported.


Yes, I think this error is fine.
I think there are cases where the given server name does exist,
however the connection check is not performed (does not pass the
first if statement above). 1) a case where the server name exist,
however an open connection to the specified server is not found.
2) case where connection for specified server is invalidated.
3) case where there is not ConnectionHash entry for the specified
server. Current implementation returns true in that case, however
the check is not performed. Suppose the checking mechanism is
supported on the platform, it does not seem reasonable to return
true or false when the check is not performed. What do you think?

regards,

--
Katsuragi Yuta
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: [Proposal] Add foreign-server health checks infrastructure

2023-02-20 Thread Peter Smith
Here are some review comments for v32-0001.

==
Commit message

1.
PQconnCheck() function allows to check the status of the socket by polling
the socket. This function is currently available only on systems that
support the non-standard POLLRDHUP extension to the poll system call,
including Linux.

~

(missed fix from previous review)

"status of the socket" --> "status of the connection"


doc/src/sgml/libpq.sgml

2. PQconnCheck
+  
+   This function check the health of the connection. Unlike ,
+   this check is performed by polling the corresponding socket. This
+   function is currently available only on systems that support the
+   non-standard POLLRDHUP extension to the
poll
+   system call, including Linux. 
+   returns greater than zero if the remote peer seems to be closed, returns
+   0 if the socket is valid, and returns
-1
+   if the connection has already been closed or an error has occurred.
+  

"check the health" --> "checks the health"

~~~

3. PQcanConnCheck

+  
+   Returns true (1) or false (0) to indicate if the PQconnCheck function
+   is supported on this platform.

Should the reference to PQconnCheck be a link as it previously was?

==
src/interfaces/libpq/fe-misc.c

4. PQconnCheck

+/*
+ * Check whether the socket peer closed connection or not.
+ *
+ * Returns >0 if remote peer seems to be closed, 0 if it is valid,
+ * -1 if the input connection is bad or an error occurred.
+ */
+int
+PQconnCheck(PGconn *conn)
+{
+ return pqSocketCheck(conn, 0, 0, (time_t) 0);
+}

I'm confused. This comment says =0 means connection is valid. But the
pqSocketCheck comment says =0 means it timed out.

So those two function comments don't seem compatible

~~~

5. PQconnCheckable

+/*
+ * Check whether PQconnCheck() works well on this platform.
+ *
+ * Returns true (1) if this can use PQconnCheck(), otherwise false (0).
+ */
+int
+PQconnCheckable(void)
+{
+#if (defined(HAVE_POLL) && defined(POLLRDHUP))
+ return true;
+#else
+ return false;
+#endif
+}

Why say "works well"? IMO it either works or doesn't work – there is no "well".

SUGGESTION1
Check whether PQconnCheck() works on this platform.

SUGGESTION2
Check whether PQconnCheck() can work on this platform.

~~~

6. pqSocketCheck

 /*
  * Checks a socket, using poll or select, for data to be read, written,
- * or both.  Returns >0 if one or more conditions are met, 0 if it timed
+ * or both. Moreover, when neither forRead nor forWrite is requested and
+ * timeout is disabled, try to check the health of socket.
+ *
+ * Returns >0 if one or more conditions are met, 0 if it timed
  * out, -1 if an error occurred.
  *
  * If SSL is in use, the SSL buffer is checked prior to checking the socket

~

See review comment #4. (e.g. This says =0 if it timed out).

~~~

7. pqSocketPoll

+ * When neither forRead nor forWrite are set and timeout is disabled,
+ *
+ * - If the timeout is disabled, try to check the health of the socket
+ * - Otherwise this immediately returns 0
+ *
+ * Return >0 if condition is met, 0 if a timeout occurred, -1 if an error
+ * or interrupt occurred.

Don't say "and timeout is disabled," because it clashes with the 1st
bullet which also says "- If the timeout is disabled,".

--
Kind Regards,
Peter Smith.
Fujitsu Australia




RE: [Proposal] Add foreign-server health checks infrastructure

2023-02-19 Thread Hayato Kuroda (Fujitsu)
Dear Peter,

Thank you for reviewing! Latest version can be seen in [1].

> 1.
> PQcanConnCheck seemed like a strange API name. Maybe it can have the
> same prefix as the other?
> 
> e.g.
> 
> - PQconnCheck()
> - PGconnCheckSupported()
> 
> or
> 
> - PQconnCheck()
> - PGconnCheckable()
>

I choose PQconnCheckable().

> 2.
> PqconnCheck() function allows to check the status of socket by polling
> the socket. This function is currently available only on systems that
> support the non-standard POLLRDHUP extension to the poll system call,
> including Linux.
> 
> PqcanConnCheck() checks whether above function is available or not.
> 
> ~
> 
> 2a.
> "status of socket" --> "status of the connection"
> 
> ~
> 
> 2b.
> "above function" --> "the above function"

Fixed.

> doc/src/sgml/libpq.sgml
> 
> 3. PQconnCheck
> 
> Returns the health of the socket.
> 
> int PQconnCheck(PGconn *conn);
> 
> Unlike PQstatus, this function checks socket health. This check is
> performed by polling the socket. This function is currently available
> only on systems that support the non-standard POLLRDHUP extension to
> the poll system call, including Linux. PQconnCheck returns greater
> than zero if the remote peer seems to be closed, returns 0 if the
> socket is valid, and returns -1 if the connection has been already
> invalid or an error is error occurred.
> 
> ~
> 
> 3a.
> Should these descriptions be referring to the health of the
> *connection* rather than the health of the socket?

Reworded.

> 3b.
> "has been already invalid" ?? wording

I checked codes and found that the socket becomes PGINVALID_SOCKET
after being closed. So I clarified that.

> 4. PQcanConnCheck
> 
> Returns whether PQconnCheck is available on this platform.
> PQcanConnCheck returns 1 if the function is supported, otherwise
> returns 0.
> 
> ~
> 
> I thought this should be worded using "true" and "false" same as other
> boolean functions on this page.
> 
> SUGGESTION
> Returns true (1) or false (0) to indicate if the PQconnCheck function
> is supported on this platform.

Fixed.

> ==
> src/interfaces/libpq/fe-misc.c
> 
> 5.
> -static int pqSocketCheck(PGconn *conn, int forRead, int forWrite,
> -   time_t end_time);
> +static int pqSocketIsReadableOrWritableOrValid(PGconn *conn, int forRead,
> + int forWrite, time_t end_time);
> 
> I was not 100% sure overloading this API is the right thing to do.
> Doesn't this introduce a subtle side-effect on some of the existing
> callers? e.g. Previously pqWaitTimed would ALWAYS return 0 if
> forRead/forWrite were both false. But now other return values like
> errors will be possible. Is that OK?

I checked pqWaitTimed() and seems not to affect other parts.

As the first place, it is an internal function and will be never called from 
clients.
There are 2 functions that call that - connectDBComplete() and pqWait(), and 
both of
them are not set finish_time to zero. In connectDBComplete(), basically 
finish_time 
is set to -1,  and set to time(NULL) + connect_timeout if the timeout is 
enabled.


> 6. pqSocketPoll
> 
> /*
>  * Check a file descriptor for read and/or write data, possibly waiting.
>  * If neither forRead nor forWrite are set, immediately return a timeout
>  * condition (without waiting).  Return >0 if condition is met, 0
>  * if a timeout occurred, -1 if an error or interrupt occurred.
>  *
>  * Timeout is infinite if end_time is -1.  Timeout is immediate (no blocking)
>  * if end_time is 0 (or indeed, any time before now).
>  *
>  * Moreover, when neither forRead nor forWrite is requested and timeout is
>  * disabled, try to check the health of socket.
>  */
> 
> 
> The new comment "Moreover..." is contrary to the earlier part of the
> same comment which already said, "If neither forRead nor forWrite are
> set, immediately return a timeout condition (without waiting)."
> 
> There might be side-effects to previous/existing callers of this
> function (e.g. pqWaitTimed via pqSocketCheck)

Comments were fixed. About the side-effect, please see previous discussion.

> 7.
>   if (!forRead && !forWrite)
> - return 0;
> + {
> + /* Try to check the health if requested */
> + if (end_time == 0)
> +#if defined(POLLRDHUP)
> + input_fd.events = POLLRDHUP | POLLHUP | POLLNVAL;
> +#else
> + return 0;
> +#endif /* defined(POLLRDHUP) */
> + else
> + return 0;
> + }
> 
> FYI - I think the new code can be simpler without needing #else by
> calling your other new function.
> 
> SUGGESTION
> if (!forRead && !forWrite)
> {
> if (!PQcanConnCheck() || end_time != 0)
> return 0;
> 
> /* Check the connection health when end_time is 0 */
> Assert(PQcanConnCheck() && end_time == 0);
> #if defined(POLLRDHUP)
> input_fd.events = POLLRDHUP | POLLHUP | POLLNVAL;
> #endif
> }

Fixed.

> 8. PQconnCheck
> +/*
> + * Check whether PQconnCheck() can work well on this platform.
> + *
> + * Returns 1 if this can use PQconnCheck(), otherwise 0.
> + */
> +int
> +PQcanConnCheck(void)
> +{
> +#if (defined(HAVE_POLL) && defined(POLLRDHUP))
> + 

RE: [Proposal] Add foreign-server health checks infrastructure

2023-02-19 Thread Hayato Kuroda (Fujitsu)
Dear Katsuragi-san,

Thank you for reviewing! PSA new version.

> 0001:
> Extending pqSocketPoll seems to be a better way because we can
> avoid having multiple similar functions. I also would like to hear
> horiguchi-san's opinion whether this matches his expectation.
> Improvements of pqSocketPoll/pqSocketCheck is discussed in this
> thread[1]. I'm concerned with the discussion.

I checked the thread and seems correct. I can post +1 to the thread.
And the modification will be automatically reflected to the feature
if we use the same function, I thought.

> As for the function's name, what do you think about keeping
> current name (pqSocketCheck)? pqSocketIsReadable... describes
> the functionality very well though.

No objection, I can keep the shorter name.

> pqConnCheck seems to be a family of pqReadReady or pqWriteRedy,
> so how about placing pqConnCheck below them?

Moved.

> + * Moreover, when neither forRead nor forWrite is requested and timeout
> is
> + * disabled, try to check the health of socket.
> Isn't it better to put the comment on how the arguments are
> interpreted before the description of return value?
> 
> +#if defined(POLLRDHUP)
> + input_fd.events = POLLRDHUP | POLLHUP |
> POLLNVAL;
> ...
> + input_fd.events |= POLLERR;
> To my understanding, POLLHUP, POLLNVAL and POLLERR are ignored
> in event. Are they necessary?

I read man poll(3) again, and I found that these event is ignored when
it sets to the events attribute. So removed.

> 0002:
> As for the return value of postgres_fdw_verify_connection_states,
> what do you think about returning NULL when connection-checking
> is not performed? I think there are two cases 1) ConnectionHash
> is not initialized or 2) connection is not found for specified
> server name, That is, no entry passes the first if statement below
> (case 2)).
> 
> ```
>   if (all || entry->serverid == serverid)
>   {
>   if (PQconnCheck(entry->conn))
>   {
> ```

I think in that case we can follow postgres_fdw_disconnect().
About postgres_fdw_disconnect(), if the given server_name does not exist,
an error is reported. This is a current behavior so I want to keep it.
Besides, I added the description to document.

> 0004:
> I'm wondering if we should add kqueue support in this version,
> because adding kqueue support introduces additional things to
> be considered. What do you think about focusing on the main
> functionality using poll in this patch and going for kqueue
> support after this patch?

I think it is better because it can keep patches smaller. So I stopped 
attaching 0004.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v32-0001-Add-PQconnCheck-and-PQconnCheckable-to-libpq.patch
Description:  v32-0001-Add-PQconnCheck-and-PQconnCheckable-to-libpq.patch


v32-0002-postgres_fdw-add-postgres_fdw_verify_connection_.patch
Description:  v32-0002-postgres_fdw-add-postgres_fdw_verify_connection_.patch


v32-0003-add-test.patch
Description: v32-0003-add-test.patch


Re: [Proposal] Add foreign-server health checks infrastructure

2023-02-19 Thread Peter Smith
Here is a code review only for patch v31-0001.

==
General Comment

1.
PQcanConnCheck seemed like a strange API name. Maybe it can have the
same prefix as the other?

e.g.

- PQconnCheck()
- PGconnCheckSupported()

or

- PQconnCheck()
- PGconnCheckable()

==
Commit Message

2.
PqconnCheck() function allows to check the status of socket by polling
the socket. This function is currently available only on systems that
support the non-standard POLLRDHUP extension to the poll system call,
including Linux.

PqcanConnCheck() checks whether above function is available or not.

~

2a.
"status of socket" --> "status of the connection"

~

2b.
"above function" --> "the above function"

==
doc/src/sgml/libpq.sgml

3. PQconnCheck

Returns the health of the socket.

int PQconnCheck(PGconn *conn);

Unlike PQstatus, this function checks socket health. This check is
performed by polling the socket. This function is currently available
only on systems that support the non-standard POLLRDHUP extension to
the poll system call, including Linux. PQconnCheck returns greater
than zero if the remote peer seems to be closed, returns 0 if the
socket is valid, and returns -1 if the connection has been already
invalid or an error is error occurred.

~

3a.
Should these descriptions be referring to the health of the
*connection* rather than the health of the socket?

~

3b.
"has been already invalid" ?? wording

~~~

4. PQcanConnCheck

Returns whether PQconnCheck is available on this platform.
PQcanConnCheck returns 1 if the function is supported, otherwise
returns 0.

~

I thought this should be worded using "true" and "false" same as other
boolean functions on this page.

SUGGESTION
Returns true (1) or false (0) to indicate if the PQconnCheck function
is supported on this platform.

==
src/interfaces/libpq/fe-misc.c

5.
-static int pqSocketCheck(PGconn *conn, int forRead, int forWrite,
-   time_t end_time);
+static int pqSocketIsReadableOrWritableOrValid(PGconn *conn, int forRead,
+ int forWrite, time_t end_time);

I was not 100% sure overloading this API is the right thing to do.
Doesn't this introduce a subtle side-effect on some of the existing
callers? e.g. Previously pqWaitTimed would ALWAYS return 0 if
forRead/forWrite were both false. But now other return values like
errors will be possible. Is that OK?

~~~

6. pqSocketPoll

/*
 * Check a file descriptor for read and/or write data, possibly waiting.
 * If neither forRead nor forWrite are set, immediately return a timeout
 * condition (without waiting).  Return >0 if condition is met, 0
 * if a timeout occurred, -1 if an error or interrupt occurred.
 *
 * Timeout is infinite if end_time is -1.  Timeout is immediate (no blocking)
 * if end_time is 0 (or indeed, any time before now).
 *
 * Moreover, when neither forRead nor forWrite is requested and timeout is
 * disabled, try to check the health of socket.
 */


The new comment "Moreover..." is contrary to the earlier part of the
same comment which already said, "If neither forRead nor forWrite are
set, immediately return a timeout condition (without waiting)."

There might be side-effects to previous/existing callers of this
function (e.g. pqWaitTimed via pqSocketCheck)

~~~

7.
  if (!forRead && !forWrite)
- return 0;
+ {
+ /* Try to check the health if requested */
+ if (end_time == 0)
+#if defined(POLLRDHUP)
+ input_fd.events = POLLRDHUP | POLLHUP | POLLNVAL;
+#else
+ return 0;
+#endif /* defined(POLLRDHUP) */
+ else
+ return 0;
+ }

FYI - I think the new code can be simpler without needing #else by
calling your other new function.

SUGGESTION
if (!forRead && !forWrite)
{
if (!PQcanConnCheck() || end_time != 0)
return 0;

/* Check the connection health when end_time is 0 */
Assert(PQcanConnCheck() && end_time == 0);
#if defined(POLLRDHUP)
input_fd.events = POLLRDHUP | POLLHUP | POLLNVAL;
#endif
}

~~~

8. PQconnCheck
+/*
+ * Check whether PQconnCheck() can work well on this platform.
+ *
+ * Returns 1 if this can use PQconnCheck(), otherwise 0.
+ */
+int
+PQcanConnCheck(void)
+{
+#if (defined(HAVE_POLL) && defined(POLLRDHUP))
+ return true;
+#else
+ return false;
+#endif
+}

~

8a.
"can work well" --> "works"

~

8b.
Maybe better to say "true (1)" and "otherwise false (0)"

--
Kind Regards,
Peter Smith.
Fujitsu Australia




Re: [Proposal] Add foreign-server health checks infrastructure

2023-02-17 Thread Katsuragi Yuta

On 2023-02-09 23:39, Hayato Kuroda (Fujitsu) wrote:

Dear Katsuragi-san,

Thank you for reviewing! PSA new version patches.


Thank you for updating the patch! These are my comments,
please check.

0001:
Extending pqSocketPoll seems to be a better way because we can
avoid having multiple similar functions. I also would like to hear
horiguchi-san's opinion whether this matches his expectation.
Improvements of pqSocketPoll/pqSocketCheck is discussed in this
thread[1]. I'm concerned with the discussion.

As for the function's name, what do you think about keeping
current name (pqSocketCheck)? pqSocketIsReadable... describes
the functionality very well though.

pqConnCheck seems to be a family of pqReadReady or pqWriteRedy,
so how about placing pqConnCheck below them?

+ * Moreover, when neither forRead nor forWrite is requested and timeout 
is

+ * disabled, try to check the health of socket.
Isn't it better to put the comment on how the arguments are
interpreted before the description of return value?

+#if defined(POLLRDHUP)
+   input_fd.events = POLLRDHUP | POLLHUP | POLLNVAL;
...
+   input_fd.events |= POLLERR;
To my understanding, POLLHUP, POLLNVAL and POLLERR are ignored
in event. Are they necessary?


0002:
As for the return value of postgres_fdw_verify_connection_states,
what do you think about returning NULL when connection-checking
is not performed? I think there are two cases 1) ConnectionHash
is not initialized or 2) connection is not found for specified
server name, That is, no entry passes the first if statement below
(case 2)).

```
 if (all || entry->serverid == serverid)
 {
 if (PQconnCheck(entry->conn))
 {
```

0004:
I'm wondering if we should add kqueue support in this version,
because adding kqueue support introduces additional things to
be considered. What do you think about focusing on the main
functionality using poll in this patch and going for kqueue
support after this patch?


[1]: 
https://www.postgresql.org/message-id/20230209.115009.2229702014236187289.horikyota.ntt%40gmail.com


regards,

--
Katsuragi Yuta
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




RE: [Proposal] Add foreign-server health checks infrastructure

2023-02-09 Thread Hayato Kuroda (Fujitsu)
Dear Katsuragi-san,

Thank you for reviewing! PSA new version patches.

> 0001:
> +   while (result < 0 && errno == EINTR);
> +
> +   if (result < 0)
> +   return -1;
> 
> this `return -1` is not indented properly.

This part is no longer needed. Please see another discussion[1].

> 0002:
> +postgres_fdw_verify_connection_states(server_name
> text) returns boolean
> ...
> +  extension to the poll system call, including
> Linux. This
> +  returns true if checked connections are still
> valid,
> +  or the checking is not supported on this platform.
> false
> +  is returned if the local session seems to be disconnected from
> other
> +  servers. Example usage of the function:
> 
> Here, 'still valid' seems a little bit confusing because this 'valid' is
> not
> the same as postgres_fdw_get_connections's 'valid' [1].
> 
> Should 'still valid' be 'existing connection is not closed by the remote
> peer'?
> But this description does not cover all the cases where this function
> returns true...
> I think one choice is to write all the cases like 'returns true if any
> of the
> following condition is satisfied. 1) existing connection is not closed
> by the
> remote peer 2) there is no connection for specified server yet 3) the
> checking
> is not supported...'. If my understanding is not correct, please point
> it out.

Modified like you pointed out.

> BTW, is it reasonable to return true if ConnectionHash is not
> initialized or
> there is no ConnCacheEntry for specified remote server? What do you
> think
> about returning NULL in that case?

I'm not sure which one is better, but modified accordingly.

> 0003:
> I think it is better that the test covers all the new functions.
> How about adding a test for postgres_fdw_verify_connection_states_all?
> 
> 
> +--
> ===
> 
> +-- test for postgres_fdw_verify_foreign_servers function
> +--
> ===
> 
> 
> Writing all the functions' name like 'test for
> postgres_fdw_verify_connection_states
> and postgres_fdw_can_verify_connection_states' looks straightforward.
> What do you think about this?

Added.

> 0004:
> Sorry, I have not read 0004. I will read.

No problem:-).

[1]: 
https://www.postgresql.org/message-id/TYAPR01MB58664E039F45959AB321FA1FF5D99%40TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v31-0001-Add-PQconnCheck-and-PQcanConnCheck-to-libpq.patch
Description:  v31-0001-Add-PQconnCheck-and-PQcanConnCheck-to-libpq.patch


v31-0002-postgres_fdw-add-postgres_fdw_verify_connection_.patch
Description:  v31-0002-postgres_fdw-add-postgres_fdw_verify_connection_.patch


v31-0003-add-test.patch
Description: v31-0003-add-test.patch


v31-0004-add-kqueue-support-for-PQconnCheck-and-PQcanConn.patch
Description:  v31-0004-add-kqueue-support-for-PQconnCheck-and-PQcanConn.patch


RE: [Proposal] Add foreign-server health checks infrastructure

2023-02-09 Thread Hayato Kuroda (Fujitsu)
Dear Horiguchi-san,

Thank you for checking! The patch will be attached to another mail.

> At Fri, 27 Jan 2023 06:57:01 +, "Hayato Kuroda (Fujitsu)"
>  wrote in
> > I found cfbot failure, PSA fixed version.
> 
> +   Unlike , this function checks socket
> +   health. This check is performed by polling the socket. This function 
> is
> +   currently available only on systems that support the non-standard
> +   POLLRDHUP extension to the
> poll system
> 
> I find it quite confusing that we have pqSocketCheck and PQconnCheck,
> that does almost the same thing.. Since pqSocketCheck is a static
> function, we can modify the function as we like.

Renamed to pqSocketIsReadableOrWritableOrValid(), but seemed very bad...

> I still don't understand why we need pqconnCheck_internal separate
> from pqSocketPoll(), and PQconnCheck from pqSocketCheck.

pqconnCheck_internal() was combined into pqSocketPoll(), but PQconnCheck() still
exists. libpq-fe.h, did not include standard header files except stdio.h. I'm 
not
sure whether we can add an inclusion of time.h, because it may break the 
compatibility
that some platform may not have the header. If there are not such a system, we 
may
able to export pqSocketCheck() and remove PQconnCheck().

The side effect of this changes is that codes become dirty when we add kqueue() 
support...

> https://www.postgresql.org/message-id/flat/TYAPR01MB58665BF23D38EDF1
> 0028DE2AF5299%40TYAPR01MB5866.jpnprd01.prod.outlook.com#47d21431bf9
> fa94f763c824f6e81fa54
> > IIUC, pqSocketCheck () calls pqSocketPoll(),
> > and in the pqSocketPoll() we poll()'d the POLLIN or POLLOUT event.
> > But according to [1], we must wait POLLRDHUP event,
> > so we cannot reuse it straightforward.
> 
> Yeah, I didn't suggest to use the function as-is. Couldn't we extend
> the fucntion by letting it accept end_time = 0 && !forRead &&
> !forWrite, not causing side effects?

Modified accordingly. Is it what you expected?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED





Re: [Proposal] Add foreign-server health checks infrastructure

2023-02-08 Thread Kyotaro Horiguchi
At Fri, 27 Jan 2023 06:57:01 +, "Hayato Kuroda (Fujitsu)" 
 wrote in 
> I found cfbot failure, PSA fixed version.

+   Unlike , this function checks socket
+   health. This check is performed by polling the socket. This function is
+   currently available only on systems that support the non-standard
+   POLLRDHUP extension to the poll system

I find it quite confusing that we have pqSocketCheck and PQconnCheck,
that does almost the same thing.. Since pqSocketCheck is a static
function, we can modify the function as we like.

I still don't understand why we need pqconnCheck_internal separate
from pqSocketPoll(), and PQconnCheck from pqSocketCheck.

https://www.postgresql.org/message-id/flat/TYAPR01MB58665BF23D38EDF10028DE2AF5299%40TYAPR01MB5866.jpnprd01.prod.outlook.com#47d21431bf9fa94f763c824f6e81fa54
> IIUC, pqSocketCheck () calls pqSocketPoll(),
> and in the pqSocketPoll() we poll()'d the POLLIN or POLLOUT event.
> But according to [1], we must wait POLLRDHUP event,
> so we cannot reuse it straightforward.

Yeah, I didn't suggest to use the function as-is. Couldn't we extend
the fucntion by letting it accept end_time = 0 && !forRead &&
!forWrite, not causing side effects?

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [Proposal] Add foreign-server health checks infrastructure

2023-02-07 Thread Katsuragi Yuta

On 2023-01-27 15:57, Hayato Kuroda (Fujitsu) wrote:

I found cfbot failure, PSA fixed version.
Sorry for noise.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


Hi Kuroda-san,

Thank you for updating the patch! Sorry for the late reply.

0001:
+   while (result < 0 && errno == EINTR);
+
+   if (result < 0)
+   return -1;

this `return -1` is not indented properly.


0002:
+postgres_fdw_verify_connection_states(server_name 
text) returns boolean

...
+  extension to the poll system call, including 
Linux. This
+  returns true if checked connections are still 
valid,
+  or the checking is not supported on this platform. 
false
+  is returned if the local session seems to be disconnected from 
other

+  servers. Example usage of the function:

Here, 'still valid' seems a little bit confusing because this 'valid' is 
not

the same as postgres_fdw_get_connections's 'valid' [1].

Should 'still valid' be 'existing connection is not closed by the remote 
peer'?
But this description does not cover all the cases where this function 
returns true...
I think one choice is to write all the cases like 'returns true if any 
of the
following condition is satisfied. 1) existing connection is not closed 
by the
remote peer 2) there is no connection for specified server yet 3) the 
checking
is not supported...'. If my understanding is not correct, please point 
it out.


BTW, is it reasonable to return true if ConnectionHash is not 
initialized or
there is no ConnCacheEntry for specified remote server? What do you 
think

about returning NULL in that case?


0003:
I think it is better that the test covers all the new functions.
How about adding a test for postgres_fdw_verify_connection_states_all?


+-- ===
+-- test for postgres_fdw_verify_foreign_servers function
+-- ===

Writing all the functions' name like 'test for 
postgres_fdw_verify_connection_states

and postgres_fdw_can_verify_connection_states' looks straightforward.
What do you think about this?


0004:
Sorry, I have not read 0004. I will read.


[1]: 
https://github.com/postgres/postgres/blob/master/doc/src/sgml/postgres-fdw.sgml#L764-L765


regards,

--
Katsuragi Yuta
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




RE: [Proposal] Add foreign-server health checks infrastructure

2023-01-26 Thread Hayato Kuroda (Fujitsu)
I found cfbot failure, PSA fixed version.
Sorry for noise.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v29-0003-add-test.patch
Description: v29-0003-add-test.patch


v29-0004-add-kqueue-support-for-PQconnCheck-and-PQcanConn.patch
Description:  v29-0004-add-kqueue-support-for-PQconnCheck-and-PQcanConn.patch


v29-0001-Add-PQconnCheck-and-PQcanConnCheck-to-libpq.patch
Description:  v29-0001-Add-PQconnCheck-and-PQcanConnCheck-to-libpq.patch


v29-0002-postgres_fdw-add-postgres_fdw_verify_connection_.patch
Description:  v29-0002-postgres_fdw-add-postgres_fdw_verify_connection_.patch


RE: [Proposal] Add foreign-server health checks infrastructure

2023-01-26 Thread Hayato Kuroda (Fujitsu)
Dear hackers,

I have updated my patch for error handling and kqueue() support.
Actually I do not have BSD-like machine, but I developed by using github CICD.
I think at first we should focus on 0001-0003, and then work for 0004.

Followings are change notes and my analysis.

0001

* Fix missed replacements from PQConnCheck() to PQconnCheck().
* Error handling was improved. Now we can detect the failure of poll() and 
return -1 at that time.
* I thought we don't have to add select(2) in PQconnCheck(). According to man 
page [1],
  select(2) can be only used for watch whether the status is readable, 
writable, or exceptional condition.
  It means that select() does not have an event corresponding to POLLRDHUP.

0002, 0003
Not changed

0004

* Add kqueue(2) support() for BSD family.
* I did not add epoll() support, because it can be used only on linux and such 
systems have POLLRDHUP instead. 
 checked other codes in libpq, and they do not use epoll(). We can see that 
such an event does not specified in POSIX [2]
 and it can be used for linux only. I decided to use poll() as much as possible 
to keep the policy.
* While coding, I found that there are no good timing to close the kernel event 
queue.
 It means that the lifetime of kqueue becomes same as the client program and 
occupy the small memory forever.
 I'm not sure it can be accepted.


[1]: https://man7.org/linux/man-pages/man2/select.2.html
[2]: https://pubs.opengroup.org/onlinepubs/9699919799/functions/poll.html

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v28-0001-Add-PQconnCheck-and-PQcanConnCheck-to-libpq.patch
Description:  v28-0001-Add-PQconnCheck-and-PQcanConnCheck-to-libpq.patch


v28-0002-postgres_fdw-add-postgres_fdw_verify_connection_.patch
Description:  v28-0002-postgres_fdw-add-postgres_fdw_verify_connection_.patch


v28-0003-add-test.patch
Description: v28-0003-add-test.patch


v28-0004-add-kqueue-support-for-PQconnCheck-and-PQcanConn.patch
Description:  v28-0004-add-kqueue-support-for-PQconnCheck-and-PQcanConn.patch


RE: [Proposal] Add foreign-server health checks infrastructure

2023-01-25 Thread Hayato Kuroda (Fujitsu)
Dear Katsuragi-san,

Thank you for reading the patch! PSA new version.

> Thank you for updating the patch!
> 
> +/* Check whether the postgres server is still alive or not */
> +extern int PQConnCheck(PGconn *conn);
> +extern int PQCanConnCheck(void);
> 
> Aren't these PQconnCheck and PQcanConnCheck? I think the first letter
> following 'PQ' should be lower case.

I cannot find any rules about it, but seems right. Fixed.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v27-0001-Add-PQconnCheck-and-PQcanConnCheck-to-libpq.patch
Description:  v27-0001-Add-PQconnCheck-and-PQcanConnCheck-to-libpq.patch


v27-0002-postgres_fdw-add-postgres_fdw_verify_connection_.patch
Description:  v27-0002-postgres_fdw-add-postgres_fdw_verify_connection_.patch


v27-0003-add-test.patch
Description: v27-0003-add-test.patch


Re: [Proposal] Add foreign-server health checks infrastructure

2023-01-25 Thread Katsuragi Yuta

On 2023-01-23 14:40, Hayato Kuroda (Fujitsu) wrote:

Dear Ted,

Thanks for reviewing! PSA new version.

For v25-0001-Add-PQConnCheck-and-PQCanConnCheck-to-libpq.patch , 
`pqConnCheck_internal` only has one caller which is quite short.

Can pqConnCheck_internal and PQConnCheck be merged into one func ?


I divided the function for feature expandability. Currently it works
on linux platform,
but the limitation should be removed in future and internal function
will be longer.
Therefore I want to keep this style.


+int
+PQCanConnCheck(void)

It seems the return value should be of bool type.


I slightly changed the returned value like true/false. But IIUC libpq 
functions

cannot define as "bool" datatype. E.g. PQconnectionNeedsPassword()
returns true/false,
but the function is defined as int.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


Thank you for updating the patch!

+/* Check whether the postgres server is still alive or not */
+extern int PQConnCheck(PGconn *conn);
+extern int PQCanConnCheck(void);

Aren't these PQconnCheck and PQcanConnCheck? I think the first letter
following 'PQ' should be lower case.

regards.

--
Katsuragi Yuta
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




RE: [Proposal] Add foreign-server health checks infrastructure

2023-01-22 Thread Hayato Kuroda (Fujitsu)
Dear Ted,

Thanks for reviewing! PSA new version.

> For v25-0001-Add-PQConnCheck-and-PQCanConnCheck-to-libpq.patch , 
> `pqConnCheck_internal` only has one caller which is quite short.
> Can pqConnCheck_internal and PQConnCheck be merged into one func ?

I divided the function for feature expandability. Currently it works on linux 
platform,
but the limitation should be removed in future and internal function will be 
longer.
Therefore I want to keep this style.

> +int
> +PQCanConnCheck(void)
>
> It seems the return value should be of bool type.

I slightly changed the returned value like true/false. But IIUC libpq functions
cannot define as "bool" datatype. E.g. PQconnectionNeedsPassword() returns 
true/false,
but the function is defined as int.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v26-0001-Add-PQConnCheck-and-PQCanConnCheck-to-libpq.patch
Description:  v26-0001-Add-PQConnCheck-and-PQCanConnCheck-to-libpq.patch


v26-0002-postgres_fdw-add-postgres_fdw_verify_connection_.patch
Description:  v26-0002-postgres_fdw-add-postgres_fdw_verify_connection_.patch


v26-0003-add-test.patch
Description: v26-0003-add-test.patch


RE: [Proposal] Add foreign-server health checks infrastructure

2023-01-22 Thread Hayato Kuroda (Fujitsu)
> Thank you for reviewing! PSA new patch set.

Sorry, I missed the updated file in the patch. New version will be posted soon.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: [Proposal] Add foreign-server health checks infrastructure

2023-01-21 Thread Ted Yu
On Sat, Jan 21, 2023 at 4:03 AM Hayato Kuroda (Fujitsu) <
kuroda.hay...@fujitsu.com> wrote:

> Dear Katsuragi-san,
>
> Thank you for reviewing! PSA new patch set.
>
> > ## v24-0001-Add-PQConncheck-and-PQCanConncheck-to-libpq.patch
> > +
> > +
> > PQCanConncheckPQCan
> > Conncheck
> > + 
> > +  
> > +   Returns the status of the socket.
> >
> > Is this description right? I think this description is for
> > PQConncheck. Something like "Checks whether PQConncheck is
> > available on this platform." seems better.
>
> It might be copy-and-paste error. Thanks for reporting.
> According to other parts, the sentence should be started like "Returns
> ...".
> So I followed the style and did cosmetic change.
>
> > +/* Check whether the postgres server is still alive or not */
> > +extern int PQConncheck(PGconn *conn);
> > +extern int PQCanConncheck(void);
> >
> > Should the names of these functions be in the form of PQconnCheck?
> > Not PQConncheck (c.f. The section of fe-misc.c in libpq-fe.h).
>
> Agreed, fixed.
>
> > ## v24-0002-postgres_fdw-add-postgres_fdw_verify_connection_.patch
> > +PG_FUNCTION_INFO_V1(postgres_fdw_verify_connection_states);
> > +PG_FUNCTION_INFO_V1(postgres_fdw_verify_connection_states_all);
> > +PG_FUNCTION_INFO_V1(postgres_fdw_can_verify_connection_states);
> >
> > This patch adds new functions to postgres_fdw for PostgreSQL 16.
> > So, I think it is necessary to update the version of postgres_fdw (v1.1
> > to v1.2).
>
> I checked postgres_fdw commit log, and it seemed that the version was
> updated when SQL functions are added. Fixed.
>
> > +postgres_fdw_verify_connection_states_all() returns
> > boolean
> > +
> > + 
> > +  This function checks the status of remote connections that are
> > established
> > +  by postgres_fdw from the local session to
> > the foreign
> > +  servers. This check is performed by polling the socket and allows
> >
> > It seems better to add a description that states this function
> > checks all the connections. Like the description of
> > postgres_fdw_disconnect_all(). For example, "This function
> > checks the status of 'all the' remote connections..."?
>
> I checked the docs and fixed. Moreover, some inconsistent statements were
> also fixed.
>
> Best Regards,
> Hayato Kuroda
> FUJITSU LIMITED
>
> Hi,
For v25-0001-Add-PQConnCheck-and-PQCanConnCheck-to-libpq.patch ,
`pqConnCheck_internal` only has one caller which is quite short.
Can pqConnCheck_internal and PQConnCheck be merged into one func ?

+int
+PQCanConnCheck(void)

It seems the return value should be of bool type.

Cheers


RE: [Proposal] Add foreign-server health checks infrastructure

2023-01-21 Thread Hayato Kuroda (Fujitsu)
Dear Katsuragi-san,

Thank you for reviewing! PSA new patch set.

> ## v24-0001-Add-PQConncheck-and-PQCanConncheck-to-libpq.patch
> +
> +
> PQCanConncheckPQCan
> Conncheck
> + 
> +  
> +   Returns the status of the socket.
> 
> Is this description right? I think this description is for
> PQConncheck. Something like "Checks whether PQConncheck is
> available on this platform." seems better.

It might be copy-and-paste error. Thanks for reporting.
According to other parts, the sentence should be started like "Returns ...".
So I followed the style and did cosmetic change.

> +/* Check whether the postgres server is still alive or not */
> +extern int PQConncheck(PGconn *conn);
> +extern int PQCanConncheck(void);
> 
> Should the names of these functions be in the form of PQconnCheck?
> Not PQConncheck (c.f. The section of fe-misc.c in libpq-fe.h).

Agreed, fixed.

> ## v24-0002-postgres_fdw-add-postgres_fdw_verify_connection_.patch
> +PG_FUNCTION_INFO_V1(postgres_fdw_verify_connection_states);
> +PG_FUNCTION_INFO_V1(postgres_fdw_verify_connection_states_all);
> +PG_FUNCTION_INFO_V1(postgres_fdw_can_verify_connection_states);
> 
> This patch adds new functions to postgres_fdw for PostgreSQL 16.
> So, I think it is necessary to update the version of postgres_fdw (v1.1
> to v1.2).

I checked postgres_fdw commit log, and it seemed that the version was
updated when SQL functions are added. Fixed.

> +postgres_fdw_verify_connection_states_all() returns
> boolean
> +
> + 
> +  This function checks the status of remote connections that are
> established
> +  by postgres_fdw from the local session to
> the foreign
> +  servers. This check is performed by polling the socket and allows
> 
> It seems better to add a description that states this function
> checks all the connections. Like the description of
> postgres_fdw_disconnect_all(). For example, "This function
> checks the status of 'all the' remote connections..."?

I checked the docs and fixed. Moreover, some inconsistent statements were
also fixed.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v25-0001-Add-PQConnCheck-and-PQCanConnCheck-to-libpq.patch
Description:  v25-0001-Add-PQConnCheck-and-PQCanConnCheck-to-libpq.patch


v25-0002-postgres_fdw-add-postgres_fdw_verify_connection_.patch
Description:  v25-0002-postgres_fdw-add-postgres_fdw_verify_connection_.patch


v25-0003-add-test.patch
Description: v25-0003-add-test.patch


Re: [Proposal] Add foreign-server health checks infrastructure

2023-01-20 Thread Katsuragi Yuta

On 2023-01-11 19:04, Hayato Kuroda (Fujitsu) wrote:

Dear hackers,

I was not sure, but the cfbot could not be accepted the previous 
version.

I made the patch again from HEAD(5f6401) without any changes,
so I did not count up the version number.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


Hi,

Thanks for the patch!
I read the patch v24. These are my comments. Please check.

## v24-0001-Add-PQConncheck-and-PQCanConncheck-to-libpq.patch
+
+ 
PQCanConncheckPQCanConncheck

+ 
+  
+   Returns the status of the socket.

Is this description right? I think this description is for
PQConncheck. Something like "Checks whether PQConncheck is
available on this platform." seems better.


+/* Check whether the postgres server is still alive or not */
+extern int PQConncheck(PGconn *conn);
+extern int PQCanConncheck(void);

Should the names of these functions be in the form of PQconnCheck?
Not PQConncheck (c.f. The section of fe-misc.c in libpq-fe.h).


## v24-0002-postgres_fdw-add-postgres_fdw_verify_connection_.patch
+PG_FUNCTION_INFO_V1(postgres_fdw_verify_connection_states);
+PG_FUNCTION_INFO_V1(postgres_fdw_verify_connection_states_all);
+PG_FUNCTION_INFO_V1(postgres_fdw_can_verify_connection_states);

This patch adds new functions to postgres_fdw for PostgreSQL 16.
So, I think it is necessary to update the version of postgres_fdw (v1.1 
to v1.2).



+postgres_fdw_verify_connection_states_all() returns 
boolean

+
+ 
+  This function checks the status of remote connections that are 
established
+  by postgres_fdw from the local session to 
the foreign

+  servers. This check is performed by polling the socket and allows

It seems better to add a description that states this function
checks all the connections. Like the description of
postgres_fdw_disconnect_all(). For example, "This function
checks the status of 'all the' remote connections..."?

regards,

--
Katsuragi Yuta
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




RE: [Proposal] Add foreign-server health checks infrastructure

2023-01-11 Thread Hayato Kuroda (Fujitsu)
Dear hackers,

I was not sure, but the cfbot could not be accepted the previous version.
I made the patch again from HEAD(5f6401) without any changes,
so I did not count up the version number.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v24-0001-Add-PQConncheck-and-PQCanConncheck-to-libpq.patch
Description:  v24-0001-Add-PQConncheck-and-PQCanConncheck-to-libpq.patch


v24-0002-postgres_fdw-add-postgres_fdw_verify_connection_.patch
Description:  v24-0002-postgres_fdw-add-postgres_fdw_verify_connection_.patch


v24-0003-add-test.patch
Description: v24-0003-add-test.patch


RE: [Proposal] Add foreign-server health checks infrastructure

2023-01-10 Thread Hayato Kuroda (Fujitsu)
Dear Ted,

Thank you for reviewing! PSA new version.

> +   /* quick exit if connection cache has been not initialized yet. */
>
> been not initialized -> not been initialized

Fixed.

> +   
> (errcode(ERRCODE_CONNECTION_FAILURE),
> +errmsg("could not connect to 
> server \"%s\"",
>
> Currently each server which is not connected would log a warning.
> Is it better to concatenate names for such servers and log one line ? This 
> would be cleaner when there are multiple such servers.

Sounds good, fixed as you said. The following shows the case that two 
disconnections
are detected by postgres_fdw_verify_connection_states_all().

```
postgres=*# select postgres_fdw_verify_connection_states_all ();
WARNING:  could not connect to server "my_external_server2", 
"my_external_server"
DETAIL:  Socket close is detected.
HINT:  Plsease check the health of server.
 postgres_fdw_verify_connection_states_all 
---
 f
(1 row)
```

Currently, the name of servers is concatenated without doing unique checks. IIUC
a backend process cannot connect to the same foreign server by using different
user mapping, so there is no possibility that the same name appears twice.
If the user mapping is altered in the transaction, the cache entry is 
invalidated
and will not be checked.


Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v24-0001-Add-PQConncheck-and-PQCanConncheck-to-libpq.patch
Description:  v24-0001-Add-PQConncheck-and-PQCanConncheck-to-libpq.patch


v24-0002-postgres_fdw-add-postgres_fdw_verify_connection_.patch
Description:  v24-0002-postgres_fdw-add-postgres_fdw_verify_connection_.patch


v24-0003-add-test.patch
Description: v24-0003-add-test.patch


Re: [Proposal] Add foreign-server health checks infrastructure

2023-01-10 Thread Ted Yu
On Tue, Jan 10, 2023 at 8:26 AM Hayato Kuroda (Fujitsu) <
kuroda.hay...@fujitsu.com> wrote:

> Dear tom,
>
> > I think that it's a really bad idea to require postgres_fdw.sql
> > to have two expected-files: that will be a maintenance nightmare.
> > Please put whatever it is that needs a variant expected-file
> > into its own, hopefully very small and seldom-changed, test script.
> > Or rethink whether you really need a test case that has
> > platform-dependent output.
>
> Thank you for giving the suggestion. I agreed your saying and modifed that.
>
> I added new functions on the libpq and postgres-fdw layer that check
> whether the
> checking works well or not. In the test, at first, the platform is checked
> and
> the checking function is called only when it is supported.
>
> An alternative approach is that PQCanConncheck() can be combined with
> PQConncheck().
> This can reduce the libpq function, but we must define another returned
> value to
> the function like -2. I was not sure which approach was better.
>
> Best Regards,
> Hayato Kuroda
> FUJITSU LIMITED
>

Hi,

+   /* quick exit if connection cache has been not initialized yet. */

been not initialized -> not been initialized

+
(errcode(ERRCODE_CONNECTION_FAILURE),
+errmsg("could not connect
to server \"%s\"",

Currently each server which is not connected would log a warning.
Is it better to concatenate names for such servers and log one line ? This
would be cleaner when there are multiple such servers.

Cheers


RE: [Proposal] Add foreign-server health checks infrastructure

2023-01-10 Thread Hayato Kuroda (Fujitsu)
Dear tom,

> I think that it's a really bad idea to require postgres_fdw.sql
> to have two expected-files: that will be a maintenance nightmare.
> Please put whatever it is that needs a variant expected-file
> into its own, hopefully very small and seldom-changed, test script.
> Or rethink whether you really need a test case that has
> platform-dependent output.

Thank you for giving the suggestion. I agreed your saying and modifed that.

I added new functions on the libpq and postgres-fdw layer that check whether the
checking works well or not. In the test, at first, the platform is checked and
the checking function is called only when it is supported.

An alternative approach is that PQCanConncheck() can be combined with 
PQConncheck().
This can reduce the libpq function, but we must define another returned value to
the function like -2. I was not sure which approach was better.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v23-0001-Add-PQConncheck-to-libpq.patch
Description: v23-0001-Add-PQConncheck-to-libpq.patch


v23-0002-postgres_fdw-add-postgres_fdw_verify_connection_.patch
Description:  v23-0002-postgres_fdw-add-postgres_fdw_verify_connection_.patch


v23-0003-add-test.patch
Description: v23-0003-add-test.patch


Re: [Proposal] Add foreign-server health checks infrastructure

2023-01-09 Thread Tom Lane
"Hayato Kuroda (Fujitsu)"  writes:
> Thanks for reporting. PSA rebased version.
> These can be applied work well on my HEAD(bd8d45).

I think that it's a really bad idea to require postgres_fdw.sql
to have two expected-files: that will be a maintenance nightmare.
Please put whatever it is that needs a variant expected-file
into its own, hopefully very small and seldom-changed, test script.
Or rethink whether you really need a test case that has
platform-dependent output.

regards, tom lane




Re: [Proposal] Add foreign-server health checks infrastructure

2023-01-03 Thread vignesh C
On Tue, 20 Dec 2022 at 07:22, Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear hackers,
>
> PSA rebased patches.

The patch does not apply on top of HEAD as in [1], please post a rebased patch:

=== Applying patches on top of PostgreSQL commit ID
92957ed98c5c565362ce665266132a7f08f6b0c0 ===
=== applying patch ./v21-0003-add-test.patch
patching file contrib/postgres_fdw/expected/postgres_fdw.out
Hunk #1 FAILED at 11701.
1 out of 1 hunk FAILED -- saving rejects to file
contrib/postgres_fdw/expected/postgres_fdw.out.rej
patching file contrib/postgres_fdw/expected/postgres_fdw_1.out
patching file contrib/postgres_fdw/sql/postgres_fdw.sql
Hunk #1 FAILED at 3876.
1 out of 1 hunk FAILED -- saving rejects to file
contrib/postgres_fdw/sql/postgres_fdw.sql.rej

[1] - http://cfbot.cputube.org/patch_41_3388.log

Regards,
Vignesh




Re: [Proposal] Add foreign-server health checks infrastructure

2022-12-06 Thread Andres Freund
Hi,

On 2022-11-25 11:41:45 +, Hayato Kuroda (Fujitsu) wrote:
> Sorry for my late reply. I understood that we got agreement the basic design 
> of first version. Thanks!
> I attached new version patches.

This is failing on cfbot / CI, as have prior versions.

https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest%2F40%2F3388

https://api.cirrus-ci.com/v1/artifact/task/6655254493134848/testrun/build/testrun/postgres_fdw/regress/regression.diffs

diff -U3 /tmp/cirrus-ci-build/contrib/postgres_fdw/expected/postgres_fdw.out 
/tmp/cirrus-ci-build/build/testrun/postgres_fdw/regress/results/postgres_fdw.out
--- /tmp/cirrus-ci-build/contrib/postgres_fdw/expected/postgres_fdw.out 
2022-12-06 05:21:33.906116000 +
+++ 
/tmp/cirrus-ci-build/build/testrun/postgres_fdw/regress/results/postgres_fdw.out
2022-12-06 05:27:24.929694000 +
@@ -11732,10 +11732,9 @@
 -- execute check function. This should return false on supported platforms,
 -- otherwise return true.
 SELECT postgres_fdw_verify_connection_states('loopback');
-WARNING:  08006
  postgres_fdw_verify_connection_states 
 ---
- f
+ t
 (1 row)
 
 ABORT;

The failure happens everywhere except on linux, so presumably there's some
portability issue in the patch.

I've set the patch as waiting on author for now.


Note that you can test CI in your own repository, as explained in 
src/tools/ci/README

Greetings,

Andres Freund




Re: [Proposal] Add foreign-server health checks infrastructure

2022-11-11 Thread Önder Kalacı
Hi Hayota Kuroda,


> I (and my company) worried about overnight batch processing that
> contains some accesses to foreign servers. If the transaction is opened
> overnight and
> one of foreign servers is crashed during it, the transaction must be
> rollbacked.
> But there is a possibility that DBAs do not recognize the crash and
> they waste a time until the morning. This problem may affect customer's
> business.
> (It may not be sufficient to check the status from another different
> server.
> DBAs must check the network between the databases, and they may be
> oversight.)
> This is a motivation we thought.
>

Thanks for the clarification, I agree that this might be a valid concern
for systems.


>
> So how about implementing a check function as an SQL function once and
> update incrementally?
>

I think a SQL function makes sense.


> This still satisfy our motivation and it can avoid overhead because we can
> reduce the number of calling it.
>

Yes, it makes sense. Also, it allows other extensions to utilize the new
libpq function, which is neat.


> If we decide that we establish a new connection in the checking function,
> we can refactor the it.
> And if we decide that we introduce health-check BGworker, then we can add
> a process that calls implemented function periodically.
>

Right, your approach doesn't conflict with a more sophisticated approach,
in fact is a useful building block.


>
> Note that poll() is used here, it means that currently this function can
> be used on some limited platforms.
>
>
I think more experienced hackers could guide us here. I don't see a problem
with that, but checking other functions like pqSocketPoll(), I see that
Postgres uses the select system call. I wonder if we can have something
similar with select? Seems no, but wanted to bring up in case you know more
about that?


> I have added a parameter check_all that controls the scope of
> to-be-checked servers,
> But this is not related with my motivation so we can remove if not needed.
>

I guess it kind of makes sense to have the flexibility to check all
connections vs only in tx connections.

Though, maybe we should follow a similar approach
to postgres_fdw_disconnect(servername) / postgres_fdw_disconnect_all()

postgres_fdw_verify_connection_states(servername) /
postgres_fdw_verify_connection_states_all()

That seems like a more natural API considering the other UDFs. You can also
use in combination with postgres_fdw_get_connections()


> (I have not implemented another version that uses epoll() or kqueue(),
> because they seem to be not called on the libpq layer. Do you know any
> reasons?)
>
>
Hmm, I don't know the reason. Is that maybe epoll is available on a smaller
number of platforms and libpq can be used on more platforms as being a
client library?

Now, some comments regarding the v18:

>
> +static int
> +pqConncheck_internal(int sock)
> +{
> +#if (defined(HAVE_POLL) && defined(POLLRDHUP))
> + struct pollfd input_fd;
> +
> + input_fd.fd = sock;
> + input_fd.events = POLLRDHUP;
> + input_fd.revents = 0;
> +
> + poll(_fd, 1, 0);
> +
> + return input_fd.revents & POLLRDHUP;


WaitEventSetWaitBlock's* defined(WAIT_USE_POLL)* branch uses the following
check to find WL_SOCKET_CLOSED

#ifdef POLLRDHUP
if ((cur_event->events & WL_SOCKET_CLOSED) &&
(cur_pollfd->revents & (POLLRDHUP | errflags)))
{
/* remote peer closed, or error */
occurred_events->events |= WL_SOCKET_CLOSED;
}
#endif

Where *errflags = POLLHUP | POLLERR | POLLNVAL;*

So, should we also be using all these flags like: input_fd.events =
POLLRDHUP | *errflags*;  ?


+ ereport(ERROR,
> + (errcode(ERRCODE_CONNECTION_FAILURE),
> + errmsg("could not connect to server \"%s\"",
> + server->servername),
> + errdetail("Socket close is detected."),
> + errhint("Please check the health of the server.")));


Is erroring out always necessary? Maybe we should just return true/false,
and let the caller decide what to do? For example, you might want to
rollback to savepoint and retry?  If the caller wants to rollback the whole
transaction, that is possible anyway.

Maybe similar to postgres_fdw_disconnect(), we can warn if the connection
is involved in a tx (e.g., depth > 0)?

new file mode 100644
> index 00..3ce169c837
> --- /dev/null
> +++ b/contrib/postgres_fdw/expected/postgres_fdw_1.out
> @@ -0,0 +1,11615 @@
> +-- ===
> +-- create FDW objects
> +-- ===
> +CREATE EXTENSION postgres_fdw;
> +CREATE SERVER testserver1 FOREIGN DATA WRAPPER postgres_fdw;
> +DO $d$
> +BEGIN


Do we really need this new file or just an oversight in your patch?

Thanks,
Onder KALACI

Software Engineer at Microsoft &
Developing the Citus database extension for PostgreSQL


Re: [Proposal] Add foreign-server health checks infrastructure

2022-10-19 Thread Önder Kalacı
Hi,

> As far as I can think of, it should probably be a single background task
> > checking whether the server is down. If so, sending an invalidation
> message
> > to all the backends such that related backends could act on the
> > invalidation and throw an error. This is to cover the use-case you
> > described on [1].
>
> Indeed your approach covers the use case I said, but I'm not sure whether
> it is really good.
> In your approach, once the background worker process will manage all
> foreign servers.
> It may be OK if there are a few servers, but if there are hundreds of
> servers,
> the time interval during checks will be longer.
>

I expect users typically will have a lot more backends than the servers. We
can have a threshold for spinning a new bg worker (e.g., every 10 servers
gets a new bg worker etc.). Still, I think that'd be an optimization that
is probably not necessary for the majority of the users?


> Currently, each FDW can decide whether we do health checks or not per the
> backend.
> For example, we can skip health checks if the foreign server is not used
> now.
> The background worker cannot control such a way.
> Based on the above, I do not agree that we introduce a new background
> worker and make it to do a health check.
>

Again, the definition of "health check" is probably different for me. I'd
expect the health check to happen continuously, ideally keeping track of
how many consecutive times it succeeded and/or last time it
failed/succeeded etc.

A transaction failing with a bad error message (or holding some resources
locally until the transaction is committed) doesn't sound essential to me.
Is there any specific workload are you referring for optimizing to rollback
a transaction earlier if a remote server dies?  What kind of workload would
benefit from that? Maybe there is, but not clear to me and haven't seen
discussed on the thread (sorry if I missed).

I'm trying to understand if we are trying to solve a problem that does not
really exists. I'm bringing this up, because I often deal with
architectures where there is a local node and remote transaction on
different Postgres servers. And, I have not encountered many (or any)
pattern that'd benefit from this change much. In fact, I think, on the
contrary, this might add significant overhead for OLTP type of high query
throughput systems.


> Moreover, methods to connect to foreign servers and check health are
> different per FDW.
> In terms of mysql_fdw [1], we must do mysql_init() and
> mysql_real_connect().
> About file_fdw, we do not have to connect, but developers may want to
> calculate checksum and compare.
> Therefore, we must provide callback functions anyway.
>
>
I think providing callback functions is useful for any case. Each fdw (or
in general extension) should be able to provide its own "health check"
function.

Thanks,
Onder KALACI


RE: [Proposal] Add foreign-server health checks infrastructure

2022-10-17 Thread osumi.takami...@fujitsu.com
On Monday, October 17, 2022 9:25 PM Kuroda, Hayato/黒田 隼人 
 wrote:
> > I mainly followed the steps there and
> > replaced the command "SELECT" for the remote table at 6-9 with "INSERT"
> > command.
> > Then, after waiting for few seconds, the "COMMIT" succeeded like below
> > output, even after the server stop of the worker side.
> I understood that we should not call disconnect_pg_server(entry->conn) even if
> we detect the disconnection.
> If should be controlled in Xact callback. This will be modified in next 
> version.
Hi,

FYI, I quickly rechecked the patch's behavior on that scenario with v17.
Now, the last "COMMIT" above failed expectedly, which looks reasonable behavior.
Thank you for having modified it.


Best Regards,
Takamichi Osumi



RE: [Proposal] Add foreign-server health checks infrastructure

2022-10-17 Thread kuroda.hay...@fujitsu.com
Dear Osumi-san,

> I mainly followed the steps there and
> replaced the command "SELECT" for the remote table at 6-9 with "INSERT"
> command.
> Then, after waiting for few seconds, the "COMMIT" succeeded like below output,
> even after the server stop of the worker side.

> Additionally, the last reference "SELECT" which failed above can succeed,
> if I restart the worker server before the "SELECT" command to the remote 
> table.
> This means the transaction looks successful but the data isn't there ?
> Could you please have a look at this issue ?

Good catch. This was occurred because we disconnected the crashed server.
 
Previously the failed server had been disconnected in the 
pgfdw_connection_check().
It was OK if the transaction(or statement) was surely cacneled.
But currently the statement might be not canceled because QueryCancelPending 
might be cleaned up[1].

If we failed to cancel statements and reached pgfdw_xact_callback(),
the connection would not be used because entry->conn is NULL.
That's why you sucseeded to do "COMMIT".
However, the backend did not send "COMMIT" command to the srever,
so inserted data was vanished on remote server.

I understood that we should not call disconnect_pg_server(entry->conn) even if 
we detect the disconnection.
If should be controlled in Xact callback. This will be modified in next version.

Moreover, I noticed that we should enable timer even if the QueryCancelMessage 
was not NULL.
If we do not turn on here, the checking will be never enabled again.


[1]: 
https://www.postgresql.org/message-id/TYAPR01MB5866CE34430424A588F60FC2F55D9%40TYAPR01MB5866.jpnprd01.prod.outlook.com


Best Regards,
Hayato Kuroda
FUJITSU LIMITED



RE: [Proposal] Add foreign-server health checks infrastructure

2022-10-17 Thread kuroda.hay...@fujitsu.com
Dear Horiguchi-san,

> Might be on slight different direction, but it looks to me a bit too
> much to use WaitEventSet to check only if a socket is live or not.
> 
> A quick search in the tree told me that we could use pqSocketCheck()
> instead, and I think it would be the something that "could potentially
> go into libpq-fe.h" as Önder mentioned, if I understand what he said
> correctly.

Based on your suggestion, I tried to add like following function to fe-misc.c:

```
int
PQconncheck(PGconn *conn)
{
/* Raise an ERROR if invalid socket has come */
if (conn == NULL ||
PQsocket(conn) == PGINVALID_SOCKET)
return -1;

return pqSocketCheck(conn, 1, 1, -1);
}
```

... and replace pgfdw_connection_check_internal() to PQconncheck(),
but it did not work well.
To be exact, pqSocketCheck() said the socket was "readable" and "writable"
even if the connection has been killed.

IIUC, pqSocketCheck () calls pqSocketPoll(),
and in the pqSocketPoll() we poll()'d the POLLIN or POLLOUT event.
But according to [1], we must wait POLLRDHUP event,
so we cannot reuse it straightforward.

If we really want to move the checking function anyway,
we must follow AddWaitEventToSet() and some WaitEventAdjust functions.
I'm not sure whether it is really good.

[1]: https://man7.org/linux/man-pages/man2/poll.2.html

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: [Proposal] Add foreign-server health checks infrastructure

2022-10-17 Thread Kyotaro Horiguchi
At Mon, 17 Oct 2022 07:27:21 +, "kuroda.hay...@fujitsu.com" 
 wrote in 
> > In other words, a variation of pgfdw_connection_check_internal()
> > could potentially go into interfaces/libpq/libpq-fe.h
> > (backend/libpq/pqcomm.c or src/interfaces/libpq/fe-connect.c).
> 
> Hmm, IIUC libpq related function and data structures cannot be accessed from 
> core source,
> so we cannot move to pqcomm.c.
> (This is a motivation for introducing libpqwalreceiver library. It is used to 
> avoid to link libpq directly)
> And functions in libpq/fe-connect.c will be included libpq.so,
> but latch related functions like WaitEventSetWait() should not be called from 
> client application.
> So it is also not appropriate.
> In short, there are no good position to place the function because this 
> requires both of libpq and core functions.

Might be on slight different direction, but it looks to me a bit too
much to use WaitEventSet to check only if a socket is live or not.

A quick search in the tree told me that we could use pqSocketCheck()
instead, and I think it would be the something that "could potentially
go into libpq-fe.h" as Önder mentioned, if I understand what he said
correctly.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




RE: [Proposal] Add foreign-server health checks infrastructure

2022-10-17 Thread kuroda.hay...@fujitsu.com
Dear Önder,

Thanks for giving suggestions!

> Still, the reestablish mechanism can be further simplified with
> WL_SOCKET_CLOSED event such as the following (where we should probably
> rename pgfdw_connection_check_internal):

Sounds reasonable.
I think it may be included in this patch. I will try to next (or later) version.

> In other words, a variation of pgfdw_connection_check_internal()
> could potentially go into interfaces/libpq/libpq-fe.h
> (backend/libpq/pqcomm.c or src/interfaces/libpq/fe-connect.c).

Hmm, IIUC libpq related function and data structures cannot be accessed from 
core source,
so we cannot move to pqcomm.c.
(This is a motivation for introducing libpqwalreceiver library. It is used to 
avoid to link libpq directly)
And functions in libpq/fe-connect.c will be included libpq.so,
but latch related functions like WaitEventSetWait() should not be called from 
client application.
So it is also not appropriate.
In short, there are no good position to place the function because this 
requires both of libpq and core functions.


> I still think that it is probably too much work/code to detect the
> mentioned use-case you described on [1]. Each backend constantly
> calling CallCheckingRemoteServersCallbacks() for this purpose doesn't sound
> the optimal way to approach the "check whether server down" problem. You
> typically try to decide whether a server is down by establishing a
> connection (or ping etc), not going over all the existing connections.

Yes, the approach that establishes a new connection is very simple, but I think 
it has some holes.
For example, if the DNS server or some routing software may is stopped,
we will fail to connect to foreign servers.
In your approach, we regard the case as "failed" and try to invalidate the 
server
even if the existing connection can be used.
Another case is that if a server goes down and failover has occurred, we will 
succeed to connect to foreign servers.
We may regard the case as a "success", but we cannot COMMIT the transaction.


> As far as I can think of, it should probably be a single background task
> checking whether the server is down. If so, sending an invalidation message
> to all the backends such that related backends could act on the
> invalidation and throw an error. This is to cover the use-case you
> described on [1].

Indeed your approach covers the use case I said, but I'm not sure whether it is 
really good. 
In your approach, once the background worker process will manage all foreign 
servers.
It may be OK if there are a few servers, but if there are hundreds of servers,
the time interval during checks will be longer.

Currently, each FDW can decide whether we do health checks or not per the 
backend.
For example, we can skip health checks if the foreign server is not used now.
The background worker cannot control such a way.

Moreover, methods to connect to foreign servers and check health are different 
per FDW.
In terms of mysql_fdw [1], we must do mysql_init() and mysql_real_connect().
About file_fdw, we do not have to connect, but developers may want to calculate 
checksum and compare.
Therefore, we must provide callback functions anyway.

Based on the above, I do not agree that we introduce a new background worker 
and make it to do a health check.

> This is of course how I would approach this problem. I think some other
> perspectives on this would be very useful to hear.

Yes, we can hear other opinions :-).

[1]: https://github.com/EnterpriseDB/mysql_fdw

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

 


RE: [Proposal] Add foreign-server health checks infrastructure

2022-10-16 Thread osumi.takami...@fujitsu.com
On Wednesday, October 5, 2022 6:27 PM kuroda.hay...@fujitsu.com 
 wrote:
> Thanks for giving many comments! Based on off and on discussions, I modified
> my patch.
Here are my other quick review comments on v16.


(1) v16-0001 : definition of a new structure

CheckingRemoteServersCallbackItem can be added to typedefs.list.


(2) v16-0001 : UnRegisterCheckingRemoteServersCallback's header comment

+void
+UnRegisterCheckingRemoteServersCallback(CheckingRemoteServersCallback callback,
+   
void *arg)
+{

Looks require a header comment for this function,
because in this file, all other functions have each one.


(3) v16-0002 : better place to declare new variables

New variables 'old' and 'server' in GetConnection() can be moved to
a scope of 'if' statement where those are used.

@@ -138,6 +149,8 @@ GetConnection(UserMapping *user, bool will_prep_stmt, 
PgFdwConnState **state)
ConnCacheEntry *entry;
ConnCacheKey key;
MemoryContext ccxt = CurrentMemoryContext;
+   MemoryContext old;
+   ForeignServer *server = GetForeignServer(user->serverid);

(4) v16-0002 : typo in pgfdw_connection_check_internal()


+   /* return false is if the socket seems to be closed. */

It should be "return false if the socket seems to be closed" ?


(5) v16-0002 : pgfdw_connection_check's message

When I tested the new feature, I got a below message.

"ERROR:  Foreign Server my_external_server might be down."

I think we can wrap the server name with double quotes
so that the message is more aligned with existing error message
like connect_pg_server.


(6) v16-003 : typo

+  Authors needs to register the callback function via following API.

Should be "Authors need to register the ...".


(7) v16-0004 : unnecessary file

I think we can remove a new file which looks like a log file.

.../postgres_fdw/expected/postgres_fdw_1.out



Best Regards,
Takamichi Osumi



RE: [Proposal] Add foreign-server health checks infrastructure

2022-10-16 Thread osumi.takami...@fujitsu.com
Hi,


On Wednesday, October 5, 2022 6:27 PM kuroda.hay...@fujitsu.com 
 wrote:
> Thanks for giving many comments! Based on off and on discussions, I modified
> my patch.
Thank you for your patch set !

While reviewing and testing the new v16, I've met a possible issue
by slightly adjusting the scenario written in [1].

I mainly followed the steps there and
replaced the command "SELECT" for the remote table at 6-9 with "INSERT" command.
Then, after waiting for few seconds, the "COMMIT" succeeded like below output,
even after the server stop of the worker side.

After the transaction itself, any reference to the remote table fails.
Note that the local table has some data initially in my test.

postgres=# begin;
BEGIN
postgres=*# insert into remote values (-1000);
INSERT 0 1
postgres=*# select * from local;
 number 

101
102
103
104
105
(5 rows)

postgres=*# commit;
COMMIT
postgres=# select * from remote;
ERROR:  could not connect to server "my_external_server"
DETAIL:  connection to server on socket "/tmp/.s.PGSQL." failed: No such 
file or directory
Is the server running locally and accepting connections on that socket?


Additionally, the last reference "SELECT" which failed above can succeed,
if I restart the worker server before the "SELECT" command to the remote table.
This means the transaction looks successful but the data isn't there ?
Could you please have a look at this issue ?


[1] - 
https://www.postgresql.org/message-id/TYAPR01MB58662CD4FD98AA475B3D10F9F59B9%40TYAPR01MB5866.jpnprd01.prod.outlook.com


Best Regards,
Takamichi Osumi



Re: [Proposal] Add foreign-server health checks infrastructure

2022-10-12 Thread Önder Kalacı
Hi,

Sounds reasonable. Do you mean that we can add additional GUC like
> "postgres_fdw.initial_check",
> wait WL_SOCKET_CLOSED if the conneciton is found in the hash table, and do
> reconnection if it might be closed, right?
>
>
Alright, it took me sometime to realize that postgres_fdw already has a
retry mechanism if the first command fails: postgres_fdw: reestablish new
connection if cached one is detected as… · postgres/postgres@32a9c0b
(github.com)



Still, the reestablish mechanism can be further simplified with
WL_SOCKET_CLOSED event such as the following (where we should probably
rename pgfdw_connection_check_internal):

 /*
> * If the connection needs to be remade due to invalidation, disconnect as
> * soon as we're out of all transactions.
> */
>
* | +bool remoteSocketIsClosed =  entry->conn != NULL : *
pgfdw_connection_check_internal*(entry->conn) : false;*

> if (entry->conn != NULL && (entry->invalidated || remoteSocketIsClosed) &&
> entry->xact_depth == 0)

{
> elog(DEBUG3, "closing connection %p for option changes to take effect",
> entry->conn);
> disconnect_pg_server(entry);
> }

* | +else if (remoteSocketIsClosed && && entry->xact_depth > 0)*
* | + error ("Remote Server is down ...")*


In other words, a variation of pgfdw_connection_check_internal()
could potentially go into interfaces/libpq/libpq-fe.h
(backend/libpq/pqcomm.c or src/interfaces/libpq/fe-connect.c).
Then, GetConnection() in postgres_fdw, it can force to reconnect as it is
already done for some cases or error properly:


>
>  Based on off and on discussions, I modified my patch.
>
>
I still think that it is probably too much work/code to detect the
mentioned use-case you described on [1]. Each backend constantly
calling CallCheckingRemoteServersCallbacks() for this purpose doesn't sound
the optimal way to approach the "check whether server down" problem. You
typically try to decide whether a server is down by establishing a
connection (or ping etc), not going over all the existing connections.

As far as I can think of, it should probably be a single background task
checking whether the server is down. If so, sending an invalidation message
to all the backends such that related backends could act on the
invalidation and throw an error. This is to cover the use-case you
described on [1].

Also, maybe we could have a new catalog table like pg_foreign_server_health
or such, where we can keep the last time the check succeeded (and/or
failed), and how many times the check  succeeded (and/or failed).

This is of course how I would approach this problem. I think some other
perspectives on this would be very useful to hear.

Thanks,
Onder KALACI


  [1]
https://www.postgresql.org/message-id/TYAPR01MB58662809E678253B90E82CE5F5889%40TYAPR01MB5866.jpnprd01.prod.outlook.com


RE: [Proposal] Add foreign-server health checks infrastructure

2022-10-03 Thread kuroda.hay...@fujitsu.com
Dear Önder,

> As far as I can see this patch is mostly useful for detecting the failures
> on the initial remote command. This is especially common when the remote
> server does a failover/switchover and postgres_fdw uses a cached connection
> to access to the remote server.

Sounds reasonable. Do you mean that we can add additional GUC like 
"postgres_fdw.initial_check",
wait WL_SOCKET_CLOSED if the conneciton is found in the hash table, and do 
reconnection if it might be closed, right?

> I think any extension that deals with multiple Postgres nodes can benefit
> from such logic. In fact, the reason I realized this patch is that on the
> Citus extension, we are trying to solve a similar problem [1], [2].

> Thinking even more, I think any extension that uses libpq and WaitEventSets
> can benefit from such a function.

OK, I agreed it may be useful, but where should this function be?
This cannot be used from application, so it should not in interface/libpq dir. 
So backend/libpq/pqcomm.c?

> I think it also depends on where you decide to put
> pgfdw_connection_check_internal(). If you prefer the postgres_fdw side,
> could we maybe use ConnCacheEntry in contrib/postgres_fdw/connection.c?
> But if you decide to put it into the Postgres side, the API
> for pgfdw_connection_check_internal() -- or equivalent function -- could be
> discussed. Do we pass a WaitEventSet and if it is NULL create a new one,
> else use what is passed to the function? Not sure, maybe you can come up
> with a better API.

Thank you for describing more detail. I can imagine you said.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: [Proposal] Add foreign-server health checks infrastructure

2022-10-03 Thread Önder Kalacı
Hi Hayato Kuroda,



> If the checking function is called not periodically but GetConnection(),
> it means that the health of foreign servers will be check only when remote
> connections are used.
> So following workload described in [1] cannot handle the issue.
>
> BEGIN --- remote operations--- local operations --- COMMIT
>
>
As far as I can see this patch is mostly useful for detecting the failures
on the initial remote command. This is especially common when the remote
server does a failover/switchover and postgres_fdw uses a cached connection
to access to the remote server.

The remote server failures within a transaction block sounds like a much
less common problem. Still, you can give a good error message before COMMIT
is sent to the remote server.

I agree that this doesn't solve the issue you described, but I'm not sure
if it is worthwhile to fix such a problem.


> > Can we have this function/logic on Postgres core, so that other
> extensions
> > can also use?
>
> I was not sure about any use-case, but I think it can because it does
> quite general things.
> Is there any good motivation to do that?
>

I think any extension that deals with multiple Postgres nodes can benefit
from such logic. In fact, the reason I realized this patch is that on the
Citus extension, we are trying to solve a similar problem [1], [2].

Thinking even more, I think any extension that uses libpq and WaitEventSets
can benefit from such a function.


> > Do you see any performance implications of creating/freeing waitEventSets
> > per check? I wonder if we can somehow re-use the same waitEventSet by
> > modifyWaitEvent? I guess no, but still, if this check causes a
> performance
> > implication, can we somehow cache 1 waitEventSet per connection?
>
> I have not tested yet, but I agreed this will be caused performance
> decrease.
> In next version first I will re-use the event set anyway, and it must be
> considered later.
> Actually I'm not sure your suggestion,
> but you mean to say that we can add a hash table that associates  PGconn
> and WaitEventSet,  right?
>
>
I think it also depends on where you decide to put
pgfdw_connection_check_internal(). If you prefer the postgres_fdw side,
could we maybe use ConnCacheEntry in contrib/postgres_fdw/connection.c?

But if you decide to put it into the Postgres side, the API
for pgfdw_connection_check_internal() -- or equivalent function -- could be
discussed. Do we pass a WaitEventSet and if it is NULL create a new one,
else use what is passed to the function? Not sure, maybe you can come up
with a better API.

Thanks,
Onder KALACI

[1]: Check WL_SOCKET_CLOSED before using any connection by onderkalaci ·
Pull Request #6259 · citusdata/citus (github.com)

[2]: Add a single connection retry in MULTI_CONNECTION_LOST state by
marcocitus · Pull Request #6283 · citusdata/citus (github.com)



RE: [Proposal] Add foreign-server health checks infrastructure

2022-10-03 Thread kuroda.hay...@fujitsu.com
Dear Önder,

Thank you for being interest to my patch! Your suggestions will be included to 
newer version.

> In other words, what is the trade-off for calling
> pgfdw_connection_check_internal() inside GetConnection() when we are about
> to use a "cached" connection? I think that might simplify the patch as well

If the checking function is called not periodically but GetConnection(),
it means that the health of foreign servers will be check only when remote 
connections are used.
So following workload described in [1] cannot handle the issue.

BEGIN --- remote operations--- local operations --- COMMIT

But, yes, I perfectly agreed that it could simplify the code
because we can reduce the timer part. This is second plan of this patch,
I may move on this approach if it is still useful.

> Can we have this function/logic on Postgres core, so that other extensions
> can also use?

I was not sure about any use-case, but I think it can because it does quite 
general things.
Is there any good motivation to do that?

> What if PQsocket(conn) returns -1? Maybe we move certain connection state
> checks into pgfdw_connection_check_internal() such that it is more generic?
> I can think of checks like: conn!=NULL, PQsocket(conn) != PGINVALID_SOCKET,
> PQstatus == CONNECTION_OK

ereport(ERROR) will be thrown if PQsocket(conn) returns -1.
All of you said should be handled here. I will modify it.

> Do you see any performance implications of creating/freeing waitEventSets
> per check? I wonder if we can somehow re-use the same waitEventSet by
> modifyWaitEvent? I guess no, but still, if this check causes a performance
> implication, can we somehow cache 1 waitEventSet per connection?

I have not tested yet, but I agreed this will be caused performance decrease.
In next version first I will re-use the event set anyway, and it must be 
considered later.
Actually I'm not sure your suggestion,
but you mean to say that we can add a hash table that associates  PGconn and 
WaitEventSet,  right?

[1]: 
https://www.postgresql.org/message-id/TYAPR01MB58662809E678253B90E82CE5F5889%40TYAPR01MB5866.jpnprd01.prod.outlook.com

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: [Proposal] Add foreign-server health checks infrastructure

2022-10-03 Thread Önder Kalacı
Hi Hayato Kuroda,

Thanks for the patch. I think there are some non-fdw extensions out there
which could benefit from this logic. That's why I want to first learn some
more about high-level design/goals of the patch a little more.

+/*
> + * Call callbacks for checking remote servers.
> + */
> +void
> +CallCheckingRemoteServersCallbacks(void)


Why do we run this periodically, but not when a specific connection is
going to be used? Wouldn't running it periodically prevent detecting some
already-closed sockets at the time of the connection used (e.g., we checked
10 seconds ago, the server died 5 seconds ago)?

In other words, what is the trade-off for calling
pgfdw_connection_check_internal() inside GetConnection() when we are about
to use a "cached" connection? I think that might simplify the patch as well.

+/*
> + * Helper function for pgfdw_connection_check
> + */
> +static bool
> +pgfdw_connection_check_internal(PGconn *conn)
> +{


Can we have this function/logic on Postgres core, so that other extensions
can also use?

 + AddWaitEventToSet(eventset, WL_SOCKET_CLOSED, PQsocket(conn), NULL,
> NULL);


What if PQsocket(conn) returns -1? Maybe we move certain connection state
checks into pgfdw_connection_check_internal() such that it is more generic?
I can think of checks like: conn!=NULL, PQsocket(conn) != PGINVALID_SOCKET,
PQstatus == CONNECTION_OK


+ eventset = CreateWaitEventSet(CurrentMemoryContext, 1);
> + AddWaitEventToSet(eventset, WL_SOCKET_CLOSED, PQsocket(conn), NULL,
> NULL);
> +
> + WaitEventSetWait(eventset, 0, , 1, 0);
> +
> + if (events.events & WL_SOCKET_CLOSED)
> + {
> + FreeWaitEventSet(eventset);
> + return false;
> + }
> + FreeWaitEventSet(eventset);


Do you see any performance implications of creating/freeing waitEventSets
per check? I wonder if we can somehow re-use the same waitEventSet by
modifyWaitEvent? I guess no, but still, if this check causes a performance
implication, can we somehow cache 1 waitEventSet per connection?


Thanks,
Onder KALACI
Developing the Citus extension @Microsoft


RE: [Proposal] Add foreign-server health checks infrastructure

2022-10-03 Thread kuroda.hay...@fujitsu.com
Dear Andres,

> This seems to reliably fail on windows. See

Thanks for reporting. Actually this feature cannot be used on Windows machine.
To check the status of each socket that connects to the foreign server,
the socket event WL_SOCKET_CLOSED is used.
The event is only enabled on some OSes, and Windows machine cannot.  

The part must be skipped if the system cannot be used the event, but I was not 
sure how to do that...

Best Regards,
Hayato Kuroda
FUJITSU LIMITED





Re: [Proposal] Add foreign-server health checks infrastructure

2022-10-02 Thread Andres Freund
Hi,

On 2022-09-21 11:56:56 +, kuroda.hay...@fujitsu.com wrote:
> PSA rebased patches. I reviewed my myself and they contain changes.
> E.g., move GUC-related code to option.c.

This seems to reliably fail on windows. See
https://cirrus-ci.com/task/6454408568373248
https://cirrus-ci.com/github/postgresql-cfbot/postgresql/commitfest/39/3388

https://api.cirrus-ci.com/v1/artifact/task/6454408568373248/testrun/build/testrun/postgres_fdw/regress/regression.diffs

diff -w -U3 C:/cirrus/contrib/postgres_fdw/expected/postgres_fdw.out 
C:/cirrus/build/testrun/postgres_fdw/regress/results/postgres_fdw.out
--- C:/cirrus/contrib/postgres_fdw/expected/postgres_fdw.out2022-10-02 
14:47:24.486355800 +
+++ C:/cirrus/build/testrun/postgres_fdw/regress/results/postgres_fdw.out   
2022-10-02 15:02:03.039752800 +
@@ -11478,6 +11478,8 @@
 ALTER SERVER loopback OPTIONS (SET application_name 'healthcheck');
 -- Set GUC for checking the health of remote servers
 SET postgres_fdw.health_check_interval TO '1s';
+ERROR:  invalid value for parameter "postgres_fdw.health_check_interval": 1000
+DETAIL:  postgres_fdw.health_check_interval must be set to 0 on this platform
 BEGIN;
 SELECT 1 FROM ft1 LIMIT 1;
  ?column?
@@ -11495,7 +11497,15 @@

 -- While sleeping the process down will be detected.
 SELECT pg_sleep(3);
-ERROR:  Foreign Server loopback might be down.
+ pg_sleep
+--
+
+(1 row)
+
 COMMIT;
+ERROR:  server closed the connection unexpectedly
+   This probably means the server terminated abnormally
+   before or while processing the request.
+CONTEXT:  remote SQL command: COMMIT TRANSACTION
 -- Clean up
 RESET debug_discard_caches;


Greetings,

Andres Freund




RE: [Proposal] Add foreign-server health checks infrastructure

2022-09-21 Thread kuroda.hay...@fujitsu.com
Dear Fujii-san,

Thanks for checking!

> These failed to be applied to the master branch cleanly. Could you update 
> them?

PSA rebased patches. I reviewed my myself and they contain changes.
E.g., move GUC-related code to option.c.


> +  this option relies on kernel events exposed by Linux, macOS,
> 
> s/this/This

Fixed.

> 
> + GUC_check_errdetail("pgfdw_health_check_interval must be set
> to 0 on this platform");
> 
> The actual parameter name "postgres_fdw.health_check_interval"
> should be used for the message instead of internal variable name.

Fixed.

> This registered signal handler does lots of things. But that's not acceptable
> and they should be performed outside signal handler. No?


I modified like v09 or earlier versions, which has a mechanism for registering 
CheckingRemoteServersCallback.
It had been removed because we want to keep core simpler, but IIUC it is needed
if the signal handler just sets some flags.
The core-side does not consider the current status of transaction and running 
query for simpleness.


Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v15-0001-Add-an-infrastracture-for-checking-remote-server.patch
Description:  v15-0001-Add-an-infrastracture-for-checking-remote-server.patch


v15-0002-postgres_fdw-Implement-health-check-feature.patch
Description:  v15-0002-postgres_fdw-Implement-health-check-feature.patch


v15-0003-add-doc.patch
Description: v15-0003-add-doc.patch


v15-0004-add-test.patch
Description: v15-0004-add-test.patch


Re: [Proposal] Add foreign-server health checks infrastructure

2022-09-17 Thread Fujii Masao




On 2022/03/04 15:17, kuroda.hay...@fujitsu.com wrote:

Hi Hackers,


It's not happy, but I'm not sure about a good solution. I made a timer 
reschedule
if connection lost had detected. But if queries in the transaction are quite 
short,
catching SIGINT may be fail.


Attached uses another way: sets pending flags again if DoingCommandRead is true.
If a remote server goes down while it is in idle_in_transaction,
next query will fail because of ereport(ERROR).

How do you think?


Sounds ok to me.

Thanks for updating the patches!

These failed to be applied to the master branch cleanly. Could you update them?

+  this option relies on kernel events exposed by Linux, macOS,

s/this/This

+   GUC_check_errdetail("pgfdw_health_check_interval must be set to 0 on 
this platform");

The actual parameter name "postgres_fdw.health_check_interval"
should be used for the message instead of internal variable name.

+   /* Register a timeout for checking remote servers */
+   pgfdw_health_check_timeout = RegisterTimeout(USER_TIMEOUT, 
pgfdw_connection_check);

This registered signal handler does lots of things. But that's not acceptable
and they should be performed outside signal handler. No?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




RE: [Proposal] Add foreign-server health checks infrastructure

2022-03-03 Thread kuroda.hay...@fujitsu.com
Hi Hackers,

> It's not happy, but I'm not sure about a good solution. I made a timer 
> reschedule
> if connection lost had detected. But if queries in the transaction are quite 
> short,
> catching SIGINT may be fail.

Attached uses another way: sets pending flags again if DoingCommandRead is true.
If a remote server goes down while it is in idle_in_transaction,
next query will fail because of ereport(ERROR).

How do you think?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v14_0001_expose_cancel_message.patch
Description: v14_0001_expose_cancel_message.patch


v14_0002_add_health_check.patch
Description: v14_0002_add_health_check.patch


v14_0003_add_doc.patch
Description: v14_0003_add_doc.patch
<>


RE: [Proposal] Add foreign-server health checks infrastructure

2022-02-23 Thread kuroda.hay...@fujitsu.com
Dear Fujii-san,

Thank you for your quick reviewing! I attached new version.
I found previous patches have wrong name. Sorry.

> The connection check timer is re-scheduled repeatedly even while the backend 
> is
> in idle state or is running a local transaction that doesn't access to any 
> foreign
> servers. I'm not sure if it's really worth checking the connections even in 
> those
> states. Even without the periodic connection checks, if the connections are 
> closed
> in those states, subsequent GetConnection() will detect that closed connection
> and re-establish the connection when starting remote transaction. Thought?

Indeed. We can now control the timer in fdw layer, so disable_timeout() was 
added
at the bottom of pgfdw_xact_callback(). 

> When a closed connection is detected in idle-in-transaction state and SIGINT 
> is
> raised, nothing happens because there is no query running to be canceled by
> SIGINT. Also in this case the connection check timer gets disabled. So we can 
> still
> execute queries that don't access to foreign servers, in the same 
> transaction, and
> then the transaction commit fails. Is this expected behavior?

It's not happy, but I'm not sure about a good solution. I made a timer 
reschedule
if connection lost had detected. But if queries in the transaction are quite 
short,
catching SIGINT may be fail.

> When I shutdowned the foreign server while the local backend is in
> idle-in-transaction state, the connection check timer was triggered and 
> detected
> the closed connection. Then when I executed COMMIT command, I got the
> following WARNING message. Is this a bug?
> 
>  WARNING:  leaked hash_seq_search scan for hash table 0x7fd2ca878f20

Fixed. It is caused because hash_seq_term() was not called when checker detects
a connection lost.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v13_0001_expose_cancel_message.patch
Description: v13_0001_expose_cancel_message.patch


v13_0002_add_health_check.patch
Description: v13_0002_add_health_check.patch


v13_0003_add_doc.patch
Description: v13_0003_add_doc.patch
<>


Re: [Proposal] Add foreign-server health checks infrastructure

2022-02-21 Thread Fujii Masao




On 2022/02/22 15:41, kuroda.hay...@fujitsu.com wrote:

Cfbot is still angry because of missing PGDLLIMPORT, so attached.


Thanks for updating the patches!

The connection check timer is re-scheduled repeatedly even while the backend is 
in idle state or is running a local transaction that doesn't access to any 
foreign servers. I'm not sure if it's really worth checking the connections 
even in those states. Even without the periodic connection checks, if the 
connections are closed in those states, subsequent GetConnection() will detect 
that closed connection and re-establish the connection when starting remote 
transaction. Thought?


When a closed connection is detected in idle-in-transaction state and SIGINT is 
raised, nothing happens because there is no query running to be canceled by 
SIGINT. Also in this case the connection check timer gets disabled. So we can 
still execute queries that don't access to foreign servers, in the same 
transaction, and then the transaction commit fails. Is this expected behavior?


When I shutdowned the foreign server while the local backend is in 
idle-in-transaction state, the connection check timer was triggered and 
detected the closed connection. Then when I executed COMMIT command, I got the 
following WARNING message. Is this a bug?

WARNING:  leaked hash_seq_search scan for hash table 0x7fd2ca878f20

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




RE: [Proposal] Add foreign-server health checks infrastructure

2022-02-21 Thread kuroda.hay...@fujitsu.com
Cfbot is still angry because of missing PGDLLIMPORT, so attached.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v12_0001_add_checking_infrastracture.patch
Description: v12_0001_add_checking_infrastracture.patch


v12_0002_add_health_check.patch
Description: v12_0002_add_health_check.patch


v12_0003_add_doc.patch
Description: v12_0003_add_doc.patch
<>


RE: [Proposal] Add foreign-server health checks infrastructure

2022-02-21 Thread kuroda.hay...@fujitsu.com
Dear Fujii-san,

> cfbot is reporting that the 0002 patch fails to be applied cleanly. Could you 
> update
> the patch?
> http://cfbot.cputube.org/patch_37_3388.log

Thanks for reporting and sorry for inconvenience.
I repo was not latest version. Attached can be applied to 52e4f0c

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v11_0001_add_checking_infrastracture.patch
Description: v11_0001_add_checking_infrastracture.patch


v11_0002_add_health_check.patch
Description: v11_0002_add_health_check.patch


v11_0003_add_doc.patch
Description: v11_0003_add_doc.patch
<>


Re: [Proposal] Add foreign-server health checks infrastructure

2022-02-21 Thread Fujii Masao




On 2022/02/22 11:53, kuroda.hay...@fujitsu.com wrote:

How do you think?


Thanks for updating the patches! I will read them.

cfbot is reporting that the 0002 patch fails to be applied cleanly. Could you 
update the patch?
http://cfbot.cputube.org/patch_37_3388.log

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




RE: [Proposal] Add foreign-server health checks infrastructure

2022-02-21 Thread kuroda.hay...@fujitsu.com
Dear Horiguchi-san, Fujii-san,

> > I understood here as removing following mechanism from core:
> >
> > * disable timeout at end of tx.
> > * skip if held off or read commands
> 
> I think we're on the same page. Anyway query cancel interrupt is
> ignored while rading input.
> 
> > > - If an existing connection is found to be dead, just try canceling
> > >   the query (or sending query cancel).
> > > One issue with it is how to show the decent message for the query
> > > cancel, but maybe we can have a global variable that suggests the
> > > reason for the cancel.
> >
> > Currently I have no good idea for that but I'll try.
> 
> However, I would like to hear others' opnions about the direction, of
> course.
>

Based on the idea, I re-implemented the feature. Almost all feature is
moved to postgres_fdw. The abstract of my patch is as follows:

# core

* Exposes QueryCancelMessage for error reporting
* Uses above if it was not NULL

# postgres_fdw

* Defines new GUC postgres_fdw.health_check_interval.
  It is renamed simpler.
* Registers a timeout when initializing connection hash.
* Raises SIGINT and sets QueryCancelMessage to message.
  if detects a connection lost.

I also attached a test as zipped file for keep cfbot quiet.
When connection lost is detected, the following message will show:

```
ERROR:  Foreign Server (servername) might be down.
```

How do you think?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v10_0001_expose_cancel_message.patch
Description: v10_0001_expose_cancel_message.patch


v10_0002_add_health_check.patch
Description: v10_0002_add_health_check.patch


v10_0003_add_doc.patch
Description: v10_0003_add_doc.patch
<>


RE: [Proposal] Add foreign-server health checks infrastructure

2022-02-21 Thread kuroda.hay...@fujitsu.com
Dear Fujii-san,

> Isn't it a very special case where many FDWs use their own user timeouts? 
> Could
> you tell me the assumption that you're thinking, especially how many FDWs are
> working?

I came up with the case like star schema, which postgres database connects data 
store.
If each dbms are different and FDWs have each timeout, many timeout will be 
registered.
But it may be a corner case and should not be confused with OLTP case. 
So I'll post new patch based on his post.


Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: [Proposal] Add foreign-server health checks infrastructure

2022-02-18 Thread Fujii Masao




On 2022/02/17 19:35, kuroda.hay...@fujitsu.com wrote:

Dear Horiguchi-san,


I think we just don't need to add the special timeout kind to the
core.  postgres_fdw can use USER_TIMEOUT and it would be suffiction to
keep running health checking regardless of transaction state then fire
query cancel if disconnection happens. As I said in the previous main,
possible extra query cancel woud be safe.


Sounds reasonable to me.



I finally figured out that you mentioned about user-defined timeout system.
Firstly - before posting to hackers - I designed like that,
but I was afraid of an overhead that many FDW registers timeout
and call setitimer() many times. Is it too overcautious?


Isn't it a very special case where many FDWs use their own user timeouts? Could 
you tell me the assumption that you're thinking, especially how many FDWs are 
working?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




RE: [Proposal] Add foreign-server health checks infrastructure

2022-02-17 Thread kuroda.hay...@fujitsu.com
Dear Horiguchi-san,

> I think we just don't need to add the special timeout kind to the
> core.  postgres_fdw can use USER_TIMEOUT and it would be suffiction to
> keep running health checking regardless of transaction state then fire
> query cancel if disconnection happens. As I said in the previous main,
> possible extra query cancel woud be safe.

I finally figured out that you mentioned about user-defined timeout system.
Firstly - before posting to hackers - I designed like that,
but I was afraid of an overhead that many FDW registers timeout
and call setitimer() many times. Is it too overcautious?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED
 





Re: [Proposal] Add foreign-server health checks infrastructure

2022-02-17 Thread Kyotaro Horiguchi
At Thu, 17 Feb 2022 04:32:26 +, "kuroda.hay...@fujitsu.com" 
 wrote in 
> > I understood here as removing following mechanism from core:
> > 
> > * disable timeout at end of tx.
> 
> While reading again and this part might be wrong.
> Sorry for inconvenience.
> But anyway some codes should be (re)moved from core, right?

I think we just don't need to add the special timeout kind to the
core.  postgres_fdw can use USER_TIMEOUT and it would be suffiction to
keep running health checking regardless of transaction state then fire
query cancel if disconnection happens. As I said in the previous main,
possible extra query cancel woud be safe.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: [Proposal] Add foreign-server health checks infrastructure

2022-02-17 Thread Kyotaro Horiguchi
Hi, Kuroda-san.

At Thu, 17 Feb 2022 04:11:09 +, "kuroda.hay...@fujitsu.com" 
 wrote in 
> Dear Horiguchi-san,
> 
> Thank you for giving your suggestions. I want to confirm your saying.
> 
> > FWIW, I'm not sure this feature necessarily requires core support
> > dedicated to FDWs.  The core have USER_TIMEOUT feature already and
> > FDWs are not necessarily connection based.  It seems better if FDWs
> > can implement health check feature without core support and it seems
> > possible.  Or at least the core feature should be more generic and
> > simpler. Why don't we just expose InTransactionHealthCheckCallbacks or
> > something and operating functions on it?
> 
> I understood that core is too complicated and FDW side is too stupid, right?

I don't think the FDW side is stupid but seem too complex for the
benefit.  And just think that maybe we don't need the core part.

> > Mmm. AFAICS the running command will stop with "canceling statement
> > due to user request", which is a hoax.  We need a more decent message
> > there.
> 
> +1 about better messages.
> 
> > I understand that the motive of this patch is "to avoid wasted long
> > local work when fdw-connection dies".
> 
> Yeah your understanding is right.
> 
> > In regard to the workload in
> > your first mail, it is easily avoided by ending the transaction as soon
> > as remote access ends. This feature doesn't work for the case "begin;
> > ; ". But the same measure also works in
> > that case.  So the only case where this feature is useful is "begin;
> > ; ; ; end;".  But in the first
> > place how frequently do you expecting remote-connection close happens?
> > If that happens so frequently, you might need to recheck the system
> > health before implementing this feature.  Since it is correctly
> > detected when something really went wrong, I feel that it is a bit too
> > complex for the usefulness especially for the core part.
> 
> Thanks for analyzing motivation.
> Indeed, some cases may be resolved by separating tx and this event rarely 
> happens.
> 
> > In conclusion, as my humble opinion I would like to propose to reduce
> > this feature to:
> > 
> > - Just periodically check health (in any aspect) of all live
> >   connections regardless of the session state.
> 
> I understood here as removing following mechanism from core:
> 
> * disable timeout at end of tx.
> * skip if held off or read commands

I think we're on the same page. Anyway query cancel interrupt is
ignored while rading input.

> > - If an existing connection is found to be dead, just try canceling
> >   the query (or sending query cancel).
> > One issue with it is how to show the decent message for the query
> > cancel, but maybe we can have a global variable that suggests the
> > reason for the cancel.
> 
> Currently I have no good idea for that but I'll try.

However, I would like to hear others' opnions about the direction, of
course.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




RE: [Proposal] Add foreign-server health checks infrastructure

2022-02-16 Thread kuroda.hay...@fujitsu.com
> I understood here as removing following mechanism from core:
> 
> * disable timeout at end of tx.

While reading again and this part might be wrong.
Sorry for inconvenience.
But anyway some codes should be (re)moved from core, right?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED




RE: [Proposal] Add foreign-server health checks infrastructure

2022-02-16 Thread kuroda.hay...@fujitsu.com
Dear Horiguchi-san,

Thank you for giving your suggestions. I want to confirm your saying.

> FWIW, I'm not sure this feature necessarily requires core support
> dedicated to FDWs.  The core have USER_TIMEOUT feature already and
> FDWs are not necessarily connection based.  It seems better if FDWs
> can implement health check feature without core support and it seems
> possible.  Or at least the core feature should be more generic and
> simpler. Why don't we just expose InTransactionHealthCheckCallbacks or
> something and operating functions on it?

I understood that core is too complicated and FDW side is too stupid, right?

> Mmm. AFAICS the running command will stop with "canceling statement
> due to user request", which is a hoax.  We need a more decent message
> there.

+1 about better messages.

> I understand that the motive of this patch is "to avoid wasted long
> local work when fdw-connection dies".

Yeah your understanding is right.

> In regard to the workload in
> your first mail, it is easily avoided by ending the transaction as soon
> as remote access ends. This feature doesn't work for the case "begin;
> ; ". But the same measure also works in
> that case.  So the only case where this feature is useful is "begin;
> ; ; ; end;".  But in the first
> place how frequently do you expecting remote-connection close happens?
> If that happens so frequently, you might need to recheck the system
> health before implementing this feature.  Since it is correctly
> detected when something really went wrong, I feel that it is a bit too
> complex for the usefulness especially for the core part.

Thanks for analyzing motivation.
Indeed, some cases may be resolved by separating tx and this event rarely 
happens.

> In conclusion, as my humble opinion I would like to propose to reduce
> this feature to:
> 
> - Just periodically check health (in any aspect) of all live
>   connections regardless of the session state.

I understood here as removing following mechanism from core:

* disable timeout at end of tx.
* skip if held off or read commands

> - If an existing connection is found to be dead, just try canceling
>   the query (or sending query cancel).
> One issue with it is how to show the decent message for the query
> cancel, but maybe we can have a global variable that suggests the
> reason for the cancel.

Currently I have no good idea for that but I'll try.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED





Re: [Proposal] Add foreign-server health checks infrastructure

2022-02-15 Thread Kyotaro Horiguchi
At Tue, 1 Feb 2022 23:51:54 +0900, Fujii Masao  
wrote in 
> This logic sounds complicated to me. I'm afraid that FDW developers
> may a bit easily misunderstand the logic and make the bug in their
> FDW.

> Isn't it simpler to just disable the timeout in core whenever the
> transaction ends whether committed or aborted, like statement_timeout
> is disabled after each command? For example, call something like
> DisableForeignCheckTimeout() in CommitTransaction() etc.

> This approach seems to assume that FDW must manage all the
> ForeignServer information so that the callback can return it. Is this
> assumption valid for all the FDW?

FWIW, I'm not sure this feature necessarily requires core support
dedicated to FDWs.  The core have USER_TIMEOUT feature already and
FDWs are not necessarily connection based.  It seems better if FDWs
can implement health check feature without core support and it seems
possible.  Or at least the core feature should be more generic and
simpler. Why don't we just expose InTransactionHealthCheckCallbacks or
something and operating functions on it?


> How about making FDW trigger a query cancel interrupt by signaling
> SIGINT to the backend, instead?

Mmm. AFAICS the running command will stop with "canceling statement
due to user request", which is a hoax.  We need a more decent message
there.

I understand that the motive of this patch is "to avoid wasted long
local work when fdw-connection dies".  In regard to the workload in
your first mail, it is easily avoided by ending the transaction as soon
as remote access ends. This feature doesn't work for the case "begin;
; ". But the same measure also works in
that case.  So the only case where this feature is useful is "begin;
; ; ; end;".  But in the first
place how frequently do you expecting remote-connection close happens?
If that happens so frequently, you might need to recheck the system
health before implementing this feature.  Since it is correctly
detected when something really went wrong, I feel that it is a bit too
complex for the usefulness especially for the core part.


In conclusion, as my humble opinion I would like to propose to reduce
this feature to:

- Just periodically check health (in any aspect) of all live
  connections regardless of the session state.

- If an existing connection is found to be dead, just try canceling
  the query (or sending query cancel).

One issue with it is how to show the decent message for the query
cancel, but maybe we can have a global variable that suggests the
reason for the cancel.


regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




RE: [Proposal] Add foreign-server health checks infrastructure

2022-02-15 Thread kuroda.hay...@fujitsu.com
Dear Hackers,

I found patches we depend have been committed, so rebased.
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=50e570a59e7f86bb41f029a66b781fc79b8d50f1

In this version there is a little bit change in part of postgres_fdw.
A system checking by WaitEventSetCanReportClosed() is added
because some OSes cannot wait WL_SOCKET_CLOSED event.
Note that test cannot be added in the regression test
because cfbot may be not happy.
In my environment a test that contained in previous patches works well.

0001, 0002 is not changed from previous version. 

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v09_0003_helth_check_for_postgres_fdw.patch
Description: v09_0003_helth_check_for_postgres_fdw.patch


v09_0002_add_doc_about_infrastructure.patch
Description: v09_0002_add_doc_about_infrastructure.patch


v09_0001_add_checking_infrastracture.patch
Description: v09_0001_add_checking_infrastracture.patch


RE: [Proposal] Add foreign-server health checks infrastructure

2022-02-02 Thread kuroda.hay...@fujitsu.com
Dear Fujii-san,

Thank you for good suggestions.

> This logic sounds complicated to me. I'm afraid that FDW developers may a bit
> easily misunderstand the logic and make the bug in their FDW.
> Isn't it simpler to just disable the timeout in core whenever the transaction 
> ends
> whether committed or aborted, like statement_timeout is disabled after each
> command? For example, call something like DisableForeignCheckTimeout() in
> CommitTransaction() etc.

Your idea is that stop the timer at the end of each transactions, right?
I had not adopted that because I thought some developers might want not to stop 
the timer
even if transactions ends. It caused complexed situation and not have good 
usecase, however,
so your logic was implemented.

> > You are right. I think this suggests that error-reporting should be done in 
> > the
> core layer.
> > For meaningful error reporting, I changed a callback specification
> > that it should return ForeignServer* which points to downed remote server.
> 
> This approach seems to assume that FDW must manage all the ForeignServer
> information so that the callback can return it. Is this assumption valid for 
> all the
> FDW?

Not sure, the assumption might be too optimistic. 
mysql_fdw can easily return ForeignServer* because it caches serverid,
but dblink and maybe oracle_fdw cannot.

> How about making FDW trigger a query cancel interrupt by signaling SIGINT to
> the backend, instead?

I understood that the error should be caught by QueryCancelPending.
Could you check 0003? Does it follow that?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

<>


v08_0001_add_checking_infrastracture.patch
Description: v08_0001_add_checking_infrastracture.patch


v08_0002_add_doc.patch
Description: v08_0002_add_doc.patch


Re: [Proposal] Add foreign-server health checks infrastructure

2022-02-01 Thread Fujii Masao




On 2022/02/01 13:37, kuroda.hay...@fujitsu.com wrote:

Dear Fujii-san,

Thank you for reviewing! I attached the latest version.


Thanks!


Indeed and it should be avoided. I added a counter to 
CheckingRemoteServersCallbackItem.
The register function returns the registered item, and it must be set as the 
argument for
enable and disable functions.
Callback functions will be called when item->counter is larger than zero.


This logic sounds complicated to me. I'm afraid that FDW developers may a bit 
easily misunderstand the logic and make the bug in their FDW.

Isn't it simpler to just disable the timeout in core whenever the transaction 
ends whether committed or aborted, like statement_timeout is disabled after 
each command? For example, call something like DisableForeignCheckTimeout() in 
CommitTransaction() etc.


You are right. I think this suggests that error-reporting should be done in the 
core layer.
For meaningful error reporting, I changed a callback specification
that it should return ForeignServer* which points to downed remote server.


This approach seems to assume that FDW must manage all the ForeignServer 
information so that the callback can return it. Is this assumption valid for 
all the FDW?

How about making FDW trigger a query cancel interrupt by signaling SIGINT to 
the backend, instead?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




RE: [Proposal] Add foreign-server health checks infrastructure

2022-01-31 Thread kuroda.hay...@fujitsu.com
Dear Fujii-san,

Thank you for reviewing! I attached the latest version.

> When more than one FDWs are used, even if one FDW calls this function to
> disable the timeout, its remote-server-check-callback can still be called. Is 
> this
> OK?
> Please imagine the case where two FDWs are used and they registered their
> callbacks. Even when one FDW calls TryDisableRemoteServerCheckingTimeout(),
> if another FDW has not called that yet, the timeout is still being enabled. 
> If the
> timeout is triggered during that period, even the callback registered by the 
> FDW
> that has already called TryDisableRemoteServerCheckingTimeout() would be
> called.

Indeed and it should be avoided. I added a counter to 
CheckingRemoteServersCallbackItem.
The register function returns the registered item, and it must be set as the 
argument for
enable and disable functions.
Callback functions will be called when item->counter is larger than zero.

> + if (remote_servers_connection_check_interval > 0)
> + {
> + CallCheckingRemoteServersCallbacks();
> +
>   enable_timeout_after(CHECKING_REMOTE_SERVERS_TIMEOUT,
> +
> remote_servers_connection_check_interval);
> 
> LockErrorCleanup() needs to be called before reporting the error, doesn't it?

You are right. I think this suggests that error-reporting should be done in the 
core layer.
For meaningful error reporting, I changed a callback specification
that it should return ForeignServer* which points to downed remote server.

> This can cause an error even while DoingCommandRead == true. Is that safe?

I read codes again and I think it is not safe. It is OK when whereToSendOutput 
is DestRemote,
but not good in InteractiveBackend().

I changed that if-statement for CheckingRemoteServersTimeoutPending is moved 
just after
ClientConnectionLost, because the that may throw a FATAL error and disconnect 
from client.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

<>


v07_0001_add_checking_infrastracture.patch
Description: v07_0001_add_checking_infrastracture.patch


v07_0002_add_doc.patch
Description: v07_0002_add_doc.patch


Re: [Proposal] Add foreign-server health checks infrastructure

2022-01-22 Thread Fujii Masao




On 2022/01/21 15:08, kuroda.hay...@fujitsu.com wrote:

Dear Fujii-san, Zhihong,

I attached the latest version.


Thanks for updating the patch!

+int
+TryDisableRemoteServerCheckingTimeout(void)

When more than one FDWs are used, even if one FDW calls this function to 
disable the timeout, its remote-server-check-callback can still be called. Is 
this OK?
Please imagine the case where two FDWs are used and they registered their 
callbacks. Even when one FDW calls TryDisableRemoteServerCheckingTimeout(), if 
another FDW has not called that yet, the timeout is still being enabled. If the 
timeout is triggered during that period, even the callback registered by the 
FDW that has already called TryDisableRemoteServerCheckingTimeout() would be 
called.


+   if (remote_servers_connection_check_interval > 0)
+   {
+   CallCheckingRemoteServersCallbacks();
+   
enable_timeout_after(CHECKING_REMOTE_SERVERS_TIMEOUT,
+
remote_servers_connection_check_interval);

LockErrorCleanup() needs to be called before reporting the error, doesn't it?


This can cause an error even while DoingCommandRead == true. Is that safe?



The biggest change is that callbacks are no longer un-registered at the end of 
transactions.
FDW developer must enable or disable timeout instead, via new APIs.

The timer will be turned on when:
* new GUC is >= 0, and


This can cause the timeout to be enabled even when no remote transaction is 
started?

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




RE: [Proposal] Add foreign-server health checks infrastructure

2022-01-20 Thread kuroda.hay...@fujitsu.com
Dear Fujii-san, Zhihong,

I attached the latest version.
The biggest change is that callbacks are no longer un-registered at the end of 
transactions.
FDW developer must enable or disable timeout instead, via new APIs.

The timer will be turned on when:
* new GUC is >= 0, and
* anyone calls TryEnableRemoteServerCheckingTimeout().

I think this version is reduced overhead, but it might not be developer 
friendly...

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

<>


v06_0002_add_doc.patch
Description: v06_0002_add_doc.patch


v06_0001_add_checking_infrastracture.patch
Description: v06_0001_add_checking_infrastracture.patch


RE: [Proposal] Add foreign-server health checks infrastructure

2022-01-20 Thread kuroda.hay...@fujitsu.com
Dear Fujii-san,

Thank you for your interest! I'll post new version within several days.

> > Yeah, remote-checking timeout will be enable even ifa local transaction is
> opened.
> > In my understanding, postgres cannot distinguish whether opening 
> > transactions
> > are using only local object or not.
> > My first idea was that turning on the timeout when GetFdwRoutineXXX
> functions
> > were called,
> 
> What about starting the timeout in GetConnection(), instead?

Did you said about a function in postgres_fdw/connection.c?
In my understanding that means that the timeout should be enabled or disabled
by each FDW extensions.
I did not find bad cases for that, so I'll change like that and make new APIs.

> v05_0004_add_tests.patch failed to be applied to the master. Could you rebase 
> it?

It's caused because a testcase was added in postgres_fdw. Will rebase.

> The above change is included in both
> v5-0003-Use-WL_SOCKET_CLOSED-for-client_connection_check_.patch and
> v05_0002_add_doc.patch. If it should be in the former patch, it should be 
> removed
> from your patch v05_0002_add_doc.patch.

I confused about doc-patch. Sorry for inconvenience.

> There seems no user of UnregisterCheckingRemoteServersCallback(). So how
> about removing it?

Previously I kept the API for any other extensions, but I cannot find use cases.
Will remove.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



RE: [Proposal] Add foreign-server health checks infrastructure

2022-01-19 Thread kuroda.hay...@fujitsu.com
Dear Zhihong,

Thank you for reviewing! And I have to apologize for the late.
I'll post new patchset later.

> +   UnregisterAllCheckingRemoteServersCallback();
> 
> UnregisterAllCheckingRemoteServersCallback
> -> UnregisterAllCheckingRemoteServersCallbacks

Will fix.

> +   CheckingRemoteServersCallbackItem *item;
> +   item = fdw_callbacks;
> 
> The above two lines can be combined.

Will fix.

> +UnregisterCheckingRemoteServersCallback(CheckingRemoteServersCallback
> callback,
> +   void *arg)
> 
> Is the above func called anywhere ?

Currently not, I just kept the API because of any other FDW extensions.
But I cannot find any use cases, so will remove.

> +   if (item->callback == callback && item->arg == arg)
> 
> Is comparing argument pointers stable ? What if the same argument is cloned
> ?

This part is no longer needed. Will remove.

> +   CallCheckingRemoteServersCallbacks();
> +
> +   if (remote_servers_connection_check_interval > 0)
> +   enable_timeout_after(CHECKING_REMOTE_SERVERS_TIMEOUT,
> +
>  remote_servers_connection_check_interval);
>
> Should the call to CallCheckingRemoteServersCallbacks() be placed under the
> if block checking remote_servers_connection_check_interval ?
> If remote_servers_connection_check_interval is 0, it seems the callbacks
> shouldn't be called.

Agreed. We force stopping timeout when the GUC sets to 0 in assign_hook,
so your saying is consistent. Will move.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



Re: [Proposal] Add foreign-server health checks infrastructure

2022-01-18 Thread Fujii Masao




On 2021/12/15 15:40, kuroda.hay...@fujitsu.com wrote:

Yeah, remote-checking timeout will be enable even ifa local transaction is 
opened.
In my understanding, postgres cannot distinguish whether opening transactions
are using only local object or not.
My first idea was that turning on the timeout when GetFdwRoutineXXX functions
were called,


What about starting the timeout in GetConnection(), instead?


I attached which implements that.


v05_0004_add_tests.patch failed to be applied to the master. Could you rebase 
it?


-This option is currently available only on systems that support the
-non-standard POLLRDHUP extension to the
-poll system call, including Linux.
+This option relies on kernel events exposed by Linux, macOS, illumos
+and the BSD family of operating systems, and is not currently available
+on other systems.

The above change is included in both 
v5-0003-Use-WL_SOCKET_CLOSED-for-client_connection_check_.patch and 
v05_0002_add_doc.patch. If it should be in the former patch, it should be 
removed from your patch v05_0002_add_doc.patch.


There seems no user of UnregisterCheckingRemoteServersCallback(). So how about 
removing it?


Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: [Proposal] Add foreign-server health checks infrastructure

2022-01-05 Thread Zhihong Yu
On Tue, Jan 4, 2022 at 10:21 PM Shinya Kato 
wrote:

> Thank you for the new patch!
>
> On 2021-12-15 15:40, kuroda.hay...@fujitsu.com wrote:
> > Dear Kato-san,
> >
> > Thank you for giving comments! And sorry for late reply.
> > I rebased my patches.
> >
> >> Even for local-only transaction, I thought it useless to execute
> >> CallCheckingRemoteServersCallbacks() and enable_timeout_after(). Can I
> >> make it so that it determines outside whether it contains SQL to the
> >> remote or not?
> >
> > Yeah, remote-checking timeout will be enable even ifa local
> > transaction is opened.
> > In my understanding, postgres cannot distinguish whether opening
> > transactions
> > are using only local object or not.
> > My first idea was that turning on the timeout when GetFdwRoutineXXX
> > functions
> > were called, but in some cases FDWs may not use remote connection even
> > if
> > they call such a handler function. e.g. postgresExplainForeignScan().
> > Hence I tried another approach that unregister all checking callback
> > the end of each transactions. Only FDWs can notice that transactions
> > are remote or local,
> > so they must register callback when they really connect to other
> > database.
> > This implementation will avoid calling remote checking in the case of
> > local transaction,
> > but multiple registering and unregistering may lead another overhead.
> > I attached which implements that.
> >
> It certainly incurs another overhead, but I think it's better than the
> previous one.
> So far, I haven't encountered any problems, but I'd like other people's
> opinions.
>
> >> The following points are minor.
> >> 1. In foreign.c, some of the lines are too long and should be broken.
> >> 2. In pqcomm.c, the newline have been removed, what is the intention
> >> of
> >> this?
> >> 3. In postgres.c,
> >> 3-1. how about inserting a comment between lines 2713 and 2714,
> >> similar
> >> to line 2707?
> >> 3-2. the line breaks in line 3242 and line 3243 should be aligned.
> >> 3-3. you should change
> >>  /*
> >>  * Skip checking foreign servers while reading
> >> messages.
> >>  */
> >> to
> >>  /*
> >>   * Skip checking foreign servers while reading
> >> messages.
> >>   */
> >> 4. In connection.c, There is a typo in line 1684, so "fucntion" should
> >> be changed to "function".
> >
> > Maybe all of them were fixed. Thanks!
> Thank you, and it looks good to me.
>
>
> Hi,
+   UnregisterAllCheckingRemoteServersCallback();

UnregisterAllCheckingRemoteServersCallback
-> UnregisterAllCheckingRemoteServersCallbacks

+   CheckingRemoteServersCallbackItem *item;
+   item = fdw_callbacks;

The above two lines can be combined.

+UnregisterCheckingRemoteServersCallback(CheckingRemoteServersCallback
callback,
+   void *arg)

Is the above func called anywhere ?

+   if (item->callback == callback && item->arg == arg)

Is comparing argument pointers stable ? What if the same argument is cloned
?

+   CallCheckingRemoteServersCallbacks();
+
+   if (remote_servers_connection_check_interval > 0)
+   enable_timeout_after(CHECKING_REMOTE_SERVERS_TIMEOUT,
+
 remote_servers_connection_check_interval);

Should the call to CallCheckingRemoteServersCallbacks() be placed under the
if block checking remote_servers_connection_check_interval ?
If remote_servers_connection_check_interval is 0, it seems the callbacks
shouldn't be called.

Cheers


Re: [Proposal] Add foreign-server health checks infrastructure

2022-01-04 Thread Shinya Kato

Thank you for the new patch!

On 2021-12-15 15:40, kuroda.hay...@fujitsu.com wrote:

Dear Kato-san,

Thank you for giving comments! And sorry for late reply.
I rebased my patches.


Even for local-only transaction, I thought it useless to execute
CallCheckingRemoteServersCallbacks() and enable_timeout_after(). Can I
make it so that it determines outside whether it contains SQL to the
remote or not?


Yeah, remote-checking timeout will be enable even ifa local
transaction is opened.
In my understanding, postgres cannot distinguish whether opening 
transactions

are using only local object or not.
My first idea was that turning on the timeout when GetFdwRoutineXXX 
functions
were called, but in some cases FDWs may not use remote connection even 
if

they call such a handler function. e.g. postgresExplainForeignScan().
Hence I tried another approach that unregister all checking callback
the end of each transactions. Only FDWs can notice that transactions
are remote or local,
so they must register callback when they really connect to other 
database.

This implementation will avoid calling remote checking in the case of
local transaction,
but multiple registering and unregistering may lead another overhead.
I attached which implements that.

It certainly incurs another overhead, but I think it's better than the 
previous one.
So far, I haven't encountered any problems, but I'd like other people's 
opinions.



The following points are minor.
1. In foreign.c, some of the lines are too long and should be broken.
2. In pqcomm.c, the newline have been removed, what is the intention 
of

this?
3. In postgres.c,
3-1. how about inserting a comment between lines 2713 and 2714, 
similar

to line 2707?
3-2. the line breaks in line 3242 and line 3243 should be aligned.
3-3. you should change
/*
* Skip checking foreign servers while reading
messages.
*/
to
/*
 * Skip checking foreign servers while reading
messages.
 */
4. In connection.c, There is a typo in line 1684, so "fucntion" should
be changed to "function".


Maybe all of them were fixed. Thanks!

Thank you, and it looks good to me.


--
Regards,

--
Shinya Kato
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




RE: [Proposal] Add foreign-server health checks infrastructure

2021-12-14 Thread kuroda.hay...@fujitsu.com
Dear Kato-san,

Thank you for giving comments! And sorry for late reply.
I rebased my patches.

> Even for local-only transaction, I thought it useless to execute
> CallCheckingRemoteServersCallbacks() and enable_timeout_after(). Can I
> make it so that it determines outside whether it contains SQL to the
> remote or not?

Yeah, remote-checking timeout will be enable even ifa local transaction is 
opened. 
In my understanding, postgres cannot distinguish whether opening transactions
are using only local object or not. 
My first idea was that turning on the timeout when GetFdwRoutineXXX functions
were called, but in some cases FDWs may not use remote connection even if
they call such a handler function. e.g. postgresExplainForeignScan().
Hence I tried another approach that unregister all checking callback
the end of each transactions. Only FDWs can notice that transactions are remote 
or local,
so they must register callback when they really connect to other database.
This implementation will avoid calling remote checking in the case of local 
transaction,
but multiple registering and unregistering may lead another overhead.
I attached which implements that.

> The following points are minor.
> 1. In foreign.c, some of the lines are too long and should be broken.
> 2. In pqcomm.c, the newline have been removed, what is the intention of
> this?
> 3. In postgres.c,
> 3-1. how about inserting a comment between lines 2713 and 2714, similar
> to line 2707?
> 3-2. the line breaks in line 3242 and line 3243 should be aligned.
> 3-3. you should change
>   /*
>   * Skip checking foreign servers while reading
> messages.
>   */
> to
>   /*
>* Skip checking foreign servers while reading
> messages.
>*/
> 4. In connection.c, There is a typo in line 1684, so "fucntion" should
> be changed to "function".

Maybe all of them were fixed. Thanks!

How do you think?


Best Regards,
Hayato Kuroda
FUJITSU LIMITED

<>


v05_0001_add_checking_infrastracture.patch
Description: v05_0001_add_checking_infrastracture.patch


v05_0002_add_doc.patch
Description: v05_0002_add_doc.patch


Re: [Proposal] Add foreign-server health checks infrastructure

2021-12-06 Thread Shinya Kato

On 2021-11-29 21:36, Zhihong Yu wrote:

On Mon, Nov 29, 2021 at 12:51 AM kuroda.hay...@fujitsu.com
 wrote:


Dear Zhihong,

Thank you for giving comments! I'll post new patches later.


+#define HOLD_CHECKING_REMOTE_SERVERS_INTERRUPTS()

(CheckingRemoteServersHoldoffCount++)


The macro contains only one operation. Can the macro be removed

(with `CheckingRemoteServersHoldoffCount++` inlined) ?

Hmm, these HOLD/RESUME macros are followed HOLD_INTERRUPUST() and
HOLD_CANCEL_INTERRUPTS():

```
#define HOLD_INTERRUPTS()  (InterruptHoldoffCount++)

#define RESUME_INTERRUPTS() \
do { \
Assert(InterruptHoldoffCount > 0); \
InterruptHoldoffCount--; \
} while(0)

#define HOLD_CANCEL_INTERRUPTS()  (QueryCancelHoldoffCount++)

#define RESUME_CANCEL_INTERRUPTS() \
do { \
Assert(QueryCancelHoldoffCount > 0); \
QueryCancelHoldoffCount--; \
} while(0)

#define START_CRIT_SECTION()  (CritSectionCount++)

#define END_CRIT_SECTION() \
do { \
Assert(CritSectionCount > 0); \
CritSectionCount--; \
} while(0)
```

So I want to keep the current style. Could you tell me if you have
any other reasons?


+   if (CheckingRemoteServersTimeoutPending &&

CheckingRemoteServersHoldoffCount != 0)

+   {
+   /*
+* Skip checking foreign servers while reading messages.
+*/
+   InterruptPending = true;
+   }
+   else if (CheckingRemoteServersTimeoutPending)

Would the code be more readable if the above two if blocks be

moved inside one enclosing if block (factoring the common
condition)?


+   if (CheckingRemoteServersTimeoutPending)


+1. Will fix.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


Hi,

It is Okay to keep the macros.

Thanks


Hi, Kuroda-san. Sorry for late reply.

Even for local-only transaction, I thought it useless to execute 
CallCheckingRemoteServersCallbacks() and enable_timeout_after(). Can I 
make it so that it determines outside whether it contains SQL to the 
remote or not?


The following points are minor.
1. In foreign.c, some of the lines are too long and should be broken.
2. In pqcomm.c, the newline have been removed, what is the intention of 
this?

3. In postgres.c,
3-1. how about inserting a comment between lines 2713 and 2714, similar 
to line 2707?

3-2. the line breaks in line 3242 and line 3243 should be aligned.
3-3. you should change
/*
* Skip checking foreign servers while reading messages.
*/
to
/*
 * Skip checking foreign servers while reading messages.
 */
4. In connection.c, There is a typo in line 1684, so "fucntion" should 
be changed to "function".



--
Regards,

--
Shinya Kato
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Re: [Proposal] Add foreign-server health checks infrastructure

2021-11-29 Thread Zhihong Yu
On Mon, Nov 29, 2021 at 12:51 AM kuroda.hay...@fujitsu.com <
kuroda.hay...@fujitsu.com> wrote:

> Dear Zhihong,
>
> Thank you for giving comments! I'll post new patches later.
>
> > +#define HOLD_CHECKING_REMOTE_SERVERS_INTERRUPTS()
> (CheckingRemoteServersHoldoffCount++)
> >
> > The macro contains only one operation. Can the macro be removed (with
> `CheckingRemoteServersHoldoffCount++` inlined) ?
>
> Hmm, these HOLD/RESUME macros are followed HOLD_INTERRUPUST() and
> HOLD_CANCEL_INTERRUPTS():
>
> ```
> #define HOLD_INTERRUPTS()  (InterruptHoldoffCount++)
>
> #define RESUME_INTERRUPTS() \
> do { \
> Assert(InterruptHoldoffCount > 0); \
> InterruptHoldoffCount--; \
> } while(0)
>
> #define HOLD_CANCEL_INTERRUPTS()  (QueryCancelHoldoffCount++)
>
> #define RESUME_CANCEL_INTERRUPTS() \
> do { \
> Assert(QueryCancelHoldoffCount > 0); \
> QueryCancelHoldoffCount--; \
> } while(0)
>
> #define START_CRIT_SECTION()  (CritSectionCount++)
>
> #define END_CRIT_SECTION() \
> do { \
> Assert(CritSectionCount > 0); \
> CritSectionCount--; \
> } while(0)
> ```
>
> So I want to keep the current style. Could you tell me if you have any
> other reasons?
>
> > +   if (CheckingRemoteServersTimeoutPending &&
> CheckingRemoteServersHoldoffCount != 0)
> > +   {
> > +   /*
> > +* Skip checking foreign servers while reading messages.
> > +*/
> > +   InterruptPending = true;
> > +   }
> > +   else if (CheckingRemoteServersTimeoutPending)
> >
> > Would the code be more readable if the above two if blocks be moved
> inside one enclosing if block (factoring the common condition)?
> >
> > +   if (CheckingRemoteServersTimeoutPending)
>
> +1. Will fix.
>
> Best Regards,
> Hayato Kuroda
> FUJITSU LIMITED
>

Hi,

It is Okay to keep the macros.

Thanks


RE: [Proposal] Add foreign-server health checks infrastructure

2021-11-29 Thread kuroda.hay...@fujitsu.com
Dear Hackers,

I attached new version that is based from Zhihong's comments.
And I newly attached a doc patch. I think the infrastructure part is no longer 
WIP. 

Could you review them?

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

<>


v04_0001_add_checking_infrastracture.patch
Description: v04_0001_add_checking_infrastracture.patch


v04_0002_add_doc.patch
Description: v04_0002_add_doc.patch


RE: [Proposal] Add foreign-server health checks infrastructure

2021-11-29 Thread kuroda.hay...@fujitsu.com
Dear Zhihong,

Thank you for giving comments! I'll post new patches later.

> +#define HOLD_CHECKING_REMOTE_SERVERS_INTERRUPTS()  
> (CheckingRemoteServersHoldoffCount++)
> 
> The macro contains only one operation. Can the macro be removed (with 
> `CheckingRemoteServersHoldoffCount++` inlined) ?

Hmm, these HOLD/RESUME macros are followed HOLD_INTERRUPUST() and 
HOLD_CANCEL_INTERRUPTS():

```
#define HOLD_INTERRUPTS()  (InterruptHoldoffCount++)

#define RESUME_INTERRUPTS() \
do { \
Assert(InterruptHoldoffCount > 0); \
InterruptHoldoffCount--; \
} while(0)

#define HOLD_CANCEL_INTERRUPTS()  (QueryCancelHoldoffCount++)

#define RESUME_CANCEL_INTERRUPTS() \
do { \
Assert(QueryCancelHoldoffCount > 0); \
QueryCancelHoldoffCount--; \
} while(0)

#define START_CRIT_SECTION()  (CritSectionCount++)

#define END_CRIT_SECTION() \
do { \
Assert(CritSectionCount > 0); \
CritSectionCount--; \
} while(0)
```

So I want to keep the current style. Could you tell me if you have any other 
reasons?

> +   if (CheckingRemoteServersTimeoutPending && 
> CheckingRemoteServersHoldoffCount != 0)
> +   {
> +   /*
> +* Skip checking foreign servers while reading messages.
> +*/
> +   InterruptPending = true;
> +   }
> +   else if (CheckingRemoteServersTimeoutPending)
> 
> Would the code be more readable if the above two if blocks be moved inside 
> one enclosing if block (factoring the common condition)?
> 
> +   if (CheckingRemoteServersTimeoutPending)

+1. Will fix.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED




Re: [Proposal] Add foreign-server health checks infrastructure

2021-11-28 Thread Zhihong Yu
On Tue, Nov 23, 2021 at 8:57 PM kuroda.hay...@fujitsu.com <
kuroda.hay...@fujitsu.com> wrote:

> Dear Kato-san,
>
> Thank you for reviewing!
>
> > Thank you for sending the patches!
> > I confirmed that they can be compiled and tested successfully on CentOS
> > 8.
>
> Thanks!
>
> > + {
> > + {"remote_servers_connection_check_interval", PGC_USERSET,
> > CONN_AUTH_SETTINGS,
> > + gettext_noop("Sets the time interval between checks
> > for
> > disconnection of remote servers."),
> > + NULL,
> > + GUC_UNIT_MS
> > + },
> > + _servers_connection_check_interval,
> > + 0, 0, INT_MAX,
> > + },
> >
> > If you don't use check_hook, assign_hook and show_hook, you should
> > explicitly write "NULL, NULL, NULL", as show below.
>
> Yeah I forgot the line. Fixed.
>
> > + ereport(ERROR,
> > +
> >   errcode(ERRCODE_CONNECTION_FAILURE),
> > + errmsg("Postgres foreign server %s
> > might be down.",
> > +
> >   server->servername));
> >
> > According to [1], error messages should start with a lowercase letter
> > and not use a period.
> > Also, along with the rest of the code, it is a good idea to enclose the
> > server name in double quotes.
>
> I confirmed the postgres error-reporting policy and fixed to follow that.
> How do you think?
>
> Attached are the latest version.
>
> Best Regards,
> Hayato Kuroda
> FUJITSU LIMITED
>
> Hi,

+#define HOLD_CHECKING_REMOTE_SERVERS_INTERRUPTS()
 (CheckingRemoteServersHoldoffCount++)

The macro contains only one operation. Can the macro be removed (with
`CheckingRemoteServersHoldoffCount++` inlined) ?

+   if (CheckingRemoteServersTimeoutPending &&
CheckingRemoteServersHoldoffCount != 0)
+   {
+   /*
+* Skip checking foreign servers while reading messages.
+*/
+   InterruptPending = true;
+   }
+   else if (CheckingRemoteServersTimeoutPending)

Would the code be more readable if the above two if blocks be moved inside
one enclosing if block (factoring the common condition)?

+   if (CheckingRemoteServersTimeoutPending)

Cheers


RE: [Proposal] Add foreign-server health checks infrastructure

2021-11-23 Thread kuroda.hay...@fujitsu.com
Dear Kato-san,

I found the missing space, and I added a test.
I'm very happy if you review this.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED



v03_0001_add_checking_infrastracture.patch
Description: v03_0001_add_checking_infrastracture.patch
<>


RE: [Proposal] Add foreign-server health checks infrastructure

2021-11-23 Thread kuroda.hay...@fujitsu.com
Dear Kato-san,

Thank you for reviewing!

> Thank you for sending the patches!
> I confirmed that they can be compiled and tested successfully on CentOS
> 8.

Thanks!

> + {
> + {"remote_servers_connection_check_interval", PGC_USERSET,
> CONN_AUTH_SETTINGS,
> + gettext_noop("Sets the time interval between checks
> for
> disconnection of remote servers."),
> + NULL,
> + GUC_UNIT_MS
> + },
> + _servers_connection_check_interval,
> + 0, 0, INT_MAX,
> + },
> 
> If you don't use check_hook, assign_hook and show_hook, you should
> explicitly write "NULL, NULL, NULL", as show below.

Yeah I forgot the line. Fixed.

> + ereport(ERROR,
> +
>   errcode(ERRCODE_CONNECTION_FAILURE),
> + errmsg("Postgres foreign server %s
> might be down.",
> +
>   server->servername));
> 
> According to [1], error messages should start with a lowercase letter
> and not use a period.
> Also, along with the rest of the code, it is a good idea to enclose the
> server name in double quotes.

I confirmed the postgres error-reporting policy and fixed to follow that.
How do you think?

Attached are the latest version.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

<>


v02_add_checking_infrastracture.patch
Description: v02_add_checking_infrastracture.patch


v02_add_helth_check_for_postgres_fdw.patch
Description: v02_add_helth_check_for_postgres_fdw.patch


Re: [Proposal] Add foreign-server health checks infrastructure

2021-11-19 Thread Shinya Kato

On 2021-11-18 21:43, kuroda.hay...@fujitsu.com wrote:

Dear Kato-san,

Thank you for your interest!


> I also want you to review the postgres_fdw part,
> but I think it should not be attached because cfbot cannot understand
> such a dependency
> and will throw build error. Do you know how to deal with them in this
> case?

I don't know how to deal with them, but I hope you will attach the 
PoC,

as it may be easier to review.


OK, I attached the PoC along with the dependent patches. Please see
the zip file.
add_helth_check_... patch is written by me, and other two patches are
just copied from [1].
In the new callback function ConnectionHash is searched sequentially 
and

WaitEventSetWait() is performed for WL_SOCKET_CLOSED socket event.
This event is added by the dependent ones.


Thank you for sending the patches!
I confirmed that they can be compiled and tested successfully on CentOS 
8.


I haven't looked at the core of the code yet, but I took a quick look at 
it.


+   {
+		{"remote_servers_connection_check_interval", PGC_USERSET, 
CONN_AUTH_SETTINGS,
+			gettext_noop("Sets the time interval between checks for 
disconnection of remote servers."),

+   NULL,
+   GUC_UNIT_MS
+   },
+   _servers_connection_check_interval,
+   0, 0, INT_MAX,
+   },

If you don't use check_hook, assign_hook and show_hook, you should 
explicitly write "NULL, NULL, NULL", as show below.

+   {
+		{"remote_servers_connection_check_interval", PGC_USERSET, 
CONN_AUTH_SETTINGS,
+			gettext_noop("Sets the time interval between checks for 
disconnection of remote servers."),

+   NULL,
+   GUC_UNIT_MS
+   },
+   _servers_connection_check_interval,
+   0, 0, INT_MAX,
+   NULL, NULL, NULL
+   },


+   ereport(ERROR,
+   errcode(ERRCODE_CONNECTION_FAILURE),
+   errmsg("Postgres foreign server %s might be 
down.",
+   server->servername));

According to [1], error messages should start with a lowercase letter 
and not use a period.
Also, along with the rest of the code, it is a good idea to enclose the 
server name in double quotes.


I'll get back to you once I've read all the code.

[1] https://www.postgresql.org/docs/devel/error-style-guide.html

--
Regards,

--
Shinya Kato
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




RE: [Proposal] Add foreign-server health checks infrastructure

2021-11-18 Thread kuroda.hay...@fujitsu.com
Dear Kato-san,

Thank you for your interest!

> > I also want you to review the postgres_fdw part,
> > but I think it should not be attached because cfbot cannot understand
> > such a dependency
> > and will throw build error. Do you know how to deal with them in this
> > case?
> 
> I don't know how to deal with them, but I hope you will attach the PoC,
> as it may be easier to review.

OK, I attached the PoC along with the dependent patches. Please see the zip 
file.
add_helth_check_... patch is written by me, and other two patches are
just copied from [1].
In the new callback function ConnectionHash is searched sequentially and
WaitEventSetWait() is performed for WL_SOCKET_CLOSED socket event.
This event is added by the dependent ones.

===
How to use
===

I'll explain how to use it briefly.

1. boot two postmaster processes. One is coordinator, and another is worker
2. set remote_servers_connection_check_interval to non-zero value at the 
coordinator
3. create tables to worker DB-cluster.
4. create foreign server, user mapping, and foreign table to coordinator.
5. connect to coordinator via psql.
6. open a transaction and access to foreing tables.
7. do "pg_ctl stop" command to woker DB-cluser.
8. execute some commands that does not access an foreign table.
9. Finally the following output will be get:

ERROR:  Postgres foreign server XXX might be down.

===
Example in some steps
===

3. at worker

```
postgres=# \d
List of relations
 Schema |  Name  | Type  | Owner  
++---+
 public | remote | table | hayato
(1 row)
```

4. at coordinator 

```
postgres=# select * from pg_foreign_server ;
  oid  | srvname | srvowner | srvfdw | srvtype | srvversion | srvacl | 
srvoptions  
---+-+--++-+++-
 16406 | remote  |   10 |  16402 | ||| 
{port=5433,dbname=postgres}
(1 row)

postgres=# select * from pg_user_mapping ;
  oid  | umuser | umserver |   umoptions   
---++--+---
 16407 | 10 |16406 | {user=hayato}
(1 row)

postgres=# \d
List of relations
 Schema |  Name  | Type  | Owner  
++---+
 public | local  | table | hayato
 public | remote | foreign table | hayato
(2 rows)
```

6-9. at coordinator

```
postgres=# begin;
BEGIN
postgres=*# select * from remote ;
 id 

  1
(1 row)

postgres=*# select * from local ;
ERROR:  Postgres foreign server remote might be down.
postgres=!#
```

Note that some keepalive settings are needed
if you want to detect cable breakdown events.
In my understanding following parameters are needed as server options:

* keepalives_idle
* keepalives_count
* keepalives_interval

[1]: https://commitfest.postgresql.org/35/3098/

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

<>


v01_add_checking_infrastracture.patch
Description: v01_add_checking_infrastracture.patch


Re: [Proposal] Add foreign-server health checks infrastructure

2021-11-15 Thread Shinya Kato

Hi,
Thank you for the patch!

On 2021-10-30 12:50, kuroda.hay...@fujitsu.com wrote:


## Background

Currently there is no way to check the status of an foreign server in
PostgreSQL.
If an foreign server's postmaster goes down, or if the network between
servers is lost,
the backend process will only detect these when it uses the connection
corresponding to that foreign server.

Consider a workload that updates data on an foreign server only at the
beginning of a transaction,
and runs a lot of local SQLs. Even if the network is disconnected
immediately after accessing the foreign server,
the backend process will continue to execute local SQLs without 
realizing it.


The process will eventually finish to execute SQLs and try to commit.
Only then will it realize that the foreign server cannot be connect
and will abort the transaction.
This situation should be detected as soon as possible
because it is impossible to commit a transaction when the foreign
server goes down.
This can be more of a problem if you have system-wide downtime 
requirements.

That's why I want to implement the health-check feature to postgres.


It's a good idea. I also think such a situation should be avoided.


## Further work

As the next step I have a plan to implement the callback function to
postgres_fdw.
I already made a PoC, but it deeply depends on the following thread:
https://commitfest.postgresql.org/35/3098/

I also want you to review the postgres_fdw part,
but I think it should not be attached because cfbot cannot understand
such a dependency
and will throw build error. Do you know how to deal with them in this 
case?


I don't know how to deal with them, but I hope you will attach the PoC, 
as it may be easier to review.


--
Regards,

--
Shinya Kato
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




  1   2   >