Re: [EXTERNAL] Re: Support load balancing in libpq

2023-03-15 Thread Jelte Fennema
Rebased

On Tue, 14 Mar 2023 at 19:05, Gregory Stark (as CFM)
 wrote:
>
> The pgindent run in b6dfee28f is causing this patch to need a rebase
> for the cfbot to apply it.


v12-0002-Refactor-libpq-to-store-addrinfo-in-a-libpq-owne.patch
Description: Binary data


v12-0003-Support-load-balancing-in-libpq.patch
Description: Binary data


v12-0001-libpq-Run-pgindent-after-a9e9a9f32b3.patch
Description: Binary data


v12-0004-Remove-unnecessary-check-from-Fisher-Yates-imple.patch
Description: Binary data


Re: [EXTERNAL] Re: Support load balancing in libpq

2023-03-14 Thread Gregory Stark (as CFM)
The pgindent run in b6dfee28f is causing this patch to need a rebase
for the cfbot to apply it.




Re: [EXTERNAL] Re: Support load balancing in libpq

2023-03-06 Thread Jelte Fennema
Small update. Improved some wording in the docs.

On Fri, 3 Mar 2023 at 15:37, Jelte Fennema  wrote:
>
> > I want to note that the Fisher-Yates algorithm is implemented in a
> > difficult to understand manner.
> > +if (j < i) /* avoid fetching undefined data if j=i */
> > This stuff does not make sense in case of shuffling arrays inplace. It
> > is important only for making a new copy of an array and only in
> > languages that cannot access uninitialized values. I'd suggest just
> > removing this line (in both cases).
>
> Done. Also added another patch to remove the same check from another
> place in the codebase where it is unnecessary.


v11-0004-Remove-unnecessary-check-from-Fisher-Yates-imple.patch
Description: Binary data


v11-0002-Refactor-libpq-to-store-addrinfo-in-a-libpq-owne.patch
Description: Binary data


v11-0003-Support-load-balancing-in-libpq.patch
Description: Binary data


v11-0001-libpq-Run-pgindent-after-a9e9a9f32b3.patch
Description: Binary data


Re: [EXTERNAL] Re: Support load balancing in libpq

2023-03-03 Thread Jelte Fennema
> I want to note that the Fisher-Yates algorithm is implemented in a
> difficult to understand manner.
> +if (j < i) /* avoid fetching undefined data if j=i */
> This stuff does not make sense in case of shuffling arrays inplace. It
> is important only for making a new copy of an array and only in
> languages that cannot access uninitialized values. I'd suggest just
> removing this line (in both cases).

Done. Also added another patch to remove the same check from another
place in the codebase where it is unnecessary.


v10-0003-Support-load-balancing-in-libpq.patch
Description: Binary data


v10-0002-Refactor-libpq-to-store-addrinfo-in-a-libpq-owne.patch
Description: Binary data


v10-0004-Remove-unnecessary-check-from-Fisher-Yates-imple.patch
Description: Binary data


v10-0001-libpq-Run-pgindent-after-a9e9a9f32b3.patch
Description: Binary data


Re: [EXTERNAL] Re: Support load balancing in libpq

2023-03-02 Thread Andrey Borodin
On Wed, Mar 1, 2023 at 12:03 PM Jelte Fennema  wrote:
>
> done and updated cf entry
>

Hi Jelte!

I've looked into the patch. Although so many improvements can be
suggested, It definitely makes sense as-is too.
These improvements might be, for example, sorting hosts according to
ping latency or something like that. Or, perhaps, some other balancing
policies. Anyway, randomizing is a good start too.

I want to note that the Fisher-Yates algorithm is implemented in a
difficult to understand manner.
+if (j < i) /* avoid fetching undefined data if j=i */
This stuff does not make sense in case of shuffling arrays inplace. It
is important only for making a new copy of an array and only in
languages that cannot access uninitialized values. I'd suggest just
removing this line (in both cases).


Best regards, Andrey Borodin.




Re: [EXTERNAL] Re: Support load balancing in libpq

2023-03-01 Thread Jelte Fennema
done and updated cf entry

On Wed, 1 Mar 2023 at 20:13, Greg S  wrote:
>
> This patch seems to need a rebase.
>
> I'll update the status to Waiting on Author for now. After rebasing
> please update it to either Needs Review or Ready for Committer
> depending on how simple the rebase was and whether there are open
> questions to finish it.


v9-0002-Refactor-libpq-to-store-addrinfo-in-a-libpq-owned.patch
Description: Binary data


v9-0003-Support-load-balancing-in-libpq.patch
Description: Binary data


v9-0001-libpq-Run-pgindent-after-a9e9a9f32b3.patch
Description: Binary data


Re: [EXTERNAL] Re: Support load balancing in libpq

2023-03-01 Thread Greg S
This patch seems to need a rebase.

I'll update the status to Waiting on Author for now. After rebasing
please update it to either Needs Review or Ready for Committer
depending on how simple the rebase was and whether there are open
questions to finish it.




Re: [EXTERNAL] Re: Support load balancing in libpq

2023-01-26 Thread Jelte Fennema
After discussing this patch privately with Andres here's a new version of this
patch. The major differences are:
1. Use the pointer value of the connection as a randomness source
2. Use more precise time as randomness source
3. Move addrinfo changes into a separate commit. This is both to make
the actual change cleaner, and because another patch of mine (non
blocking cancels) benefits from the same change.
4. Use the same type of Fisher-Yates shuffle as is done in two other
places in the PG source code.
5. Move tests depending on hosts file to a separate file. This makes
it clear in the output that tests are skipped, because skip_all shows
a nice message.
6. Only enable hosts file load balancing when loadbalance is included
in PG_TEST_EXTRA, since this test listens on TCP socket and is thus
dangerous on a multi-user system.

On Wed, 18 Jan 2023 at 11:24, Jelte Fennema  wrote:
>
> As far as I can tell this is ready for committer feedback now btw. I'd
> really like to get this into PG16.
>
> > It hadn't been my intention to block the patch on it, sorry. Just
> > registering a preference.
>
> No problem. I hadn't looked into the shared PRNG solution closely
> enough to determine if I thought it was better or not. Now that I
> implemented an initial version I personally don't think it brings
> enough advantages to warrant the added complexity. I definitely
> don't think it's required for this patch, if needed this change can
> always be done later without negative user impact afaict. And the
> connection local PRNG works well enough to bring advantages.
>
> > my
> > assumption was that you could use the existing pglock_thread() to
> > handle it, since it didn't seem like the additional contention would
> > hurt too much.
>
> That definitely would have been the easier approach and I considered
> it. But the purpose of pglock_thread seemed so different from this lock
> that it felt weird to combine the two. Another reason I refactored the lock
> code is that it would be probably be necessary for a future round-robin
> load balancing, which would require sharing state between different
> connections.
>
> > > And my thought was that the one-time
> > > initialization could be moved to a place that doesn't need to know the
> > > connection options at all, to make it easier to reason about the
> > > architecture. Say, next to the WSAStartup machinery.
>
> That's an interesting thought, but I don't think it would really simplify
> the initialization code. Mostly it would change its location.
>
> > (And after marinating on this over the weekend, it occurred to me that
> > keeping the per-connection option while making the PRNG global
> > introduces an additional hazard, because two concurrent connections
> > can now fight over the seed value.)
>
> I think since setting the initial seed value is really only meant for testing
> it's not a big deal if it doesn't work with concurrent connections.


v8-0003-Support-load-balancing-in-libpq.patch
Description: Binary data


v8-0002-Refactor-libpq-to-store-addrinfo-in-a-libpq-owned.patch
Description: Binary data


v8-0001-libpq-Run-pgindent-after-a9e9a9f32b3.patch
Description: Binary data


Re: [EXTERNAL] Re: Support load balancing in libpq

2023-01-18 Thread Jelte Fennema
As far as I can tell this is ready for committer feedback now btw. I'd
really like to get this into PG16.

> It hadn't been my intention to block the patch on it, sorry. Just
> registering a preference.

No problem. I hadn't looked into the shared PRNG solution closely
enough to determine if I thought it was better or not. Now that I
implemented an initial version I personally don't think it brings
enough advantages to warrant the added complexity. I definitely
don't think it's required for this patch, if needed this change can
always be done later without negative user impact afaict. And the
connection local PRNG works well enough to bring advantages.

> my
> assumption was that you could use the existing pglock_thread() to
> handle it, since it didn't seem like the additional contention would
> hurt too much.

That definitely would have been the easier approach and I considered
it. But the purpose of pglock_thread seemed so different from this lock
that it felt weird to combine the two. Another reason I refactored the lock
code is that it would be probably be necessary for a future round-robin
load balancing, which would require sharing state between different
connections.

> > And my thought was that the one-time
> > initialization could be moved to a place that doesn't need to know the
> > connection options at all, to make it easier to reason about the
> > architecture. Say, next to the WSAStartup machinery.

That's an interesting thought, but I don't think it would really simplify
the initialization code. Mostly it would change its location.

> (And after marinating on this over the weekend, it occurred to me that
> keeping the per-connection option while making the PRNG global
> introduces an additional hazard, because two concurrent connections
> can now fight over the seed value.)

I think since setting the initial seed value is really only meant for testing
it's not a big deal if it doesn't work with concurrent connections.




Re: [EXTERNAL] Re: Support load balancing in libpq

2023-01-17 Thread Jacob Champion
On Fri, Jan 13, 2023 at 10:44 AM Jacob Champion  wrote:
> And my thought was that the one-time
> initialization could be moved to a place that doesn't need to know the
> connection options at all, to make it easier to reason about the
> architecture. Say, next to the WSAStartup machinery.

(And after marinating on this over the weekend, it occurred to me that
keeping the per-connection option while making the PRNG global
introduces an additional hazard, because two concurrent connections
can now fight over the seed value.)

--Jacob




Re: [EXTERNAL] Re: Support load balancing in libpq

2023-01-13 Thread Jacob Champion
On Fri, Jan 13, 2023 at 9:10 AM Jelte Fennema  wrote:
>
> > Just a quick single-issue review, but I agree with Maxim that having
> > one PRNG, seeded once, would be simpler
>
> I don't agree that it's simpler. Because now there's a mutex you have
> to manage, and honestly cross-platform threading in C is not simple.
> However, I attached two additional patches that implement this
> approach on top of the previous patchset. Just to make sure that
> this patch is not blocked on this.

It hadn't been my intention to block the patch on it, sorry. Just
registering a preference.

I also didn't intend to make you refactor the locking code -- my
assumption was that you could use the existing pglock_thread() to
handle it, since it didn't seem like the additional contention would
hurt too much. Maybe that's not actually performant enough, in which
case my suggestion loses an advantage.

> > The test seed could then be handled globally as well (envvar?) so that you
> > don't have to introduce a debug-only option into the connection string.
>
> Why is a debug-only envvar any better than a debug-only connection option?
> For now I kept the connection option approach, since to me they seem pretty
> much equivalent.

I guess I worry less about envvar namespace pollution than pollution
of the connection options. And my thought was that the one-time
initialization could be moved to a place that doesn't need to know the
connection options at all, to make it easier to reason about the
architecture. Say, next to the WSAStartup machinery.

But as it is now, I agree that the implementation hasn't really lost
any complexity compared to the original, and I don't feel particularly
strongly about it. If it doesn't help to make the change, then it
doesn't help.

Thanks,
--Jacob




Re: [EXTERNAL] Re: Support load balancing in libpq

2023-01-13 Thread Jelte Fennema
> Just a quick single-issue review, but I agree with Maxim that having
> one PRNG, seeded once, would be simpler

I don't agree that it's simpler. Because now there's a mutex you have
to manage, and honestly cross-platform threading in C is not simple.
However, I attached two additional patches that implement this
approach on top of the previous patchset. Just to make sure that
this patch is not blocked on this.

> with the tangible benefit that it would eliminate weird behavior on
> simultaneous connections when the client isn't using OpenSSL.

This is true, but still I think in practice very few people have a libpq
that's compiled without OpenSSL support.

> I'm guessing a simple lock on a
> global PRNG would be less overhead than the per-connection
> strong_random machinery, too, but I have no data to back that up.

It might very well have less overhead, but neither of them should take
up any significant amount of time during connection establishment.

> The test seed could then be handled globally as well (envvar?) so that you
> don't have to introduce a debug-only option into the connection string.

Why is a debug-only envvar any better than a debug-only connection option?
For now I kept the connection option approach, since to me they seem pretty
much equivalent.
From 561dca3b9510464600b4da8d1397e7762f523568 Mon Sep 17 00:00:00 2001
From: Jelte Fennema 
Date: Fri, 13 Jan 2023 16:52:17 +0100
Subject: [PATCH v7 3/4] Make mutexes easier to use in libpq

For process global mutexes windows requires some different setup than
other OSes. This abstracts away that logic into the newly added pglock
and pgunlock functions. The main reason to do this refactoring is to
prepare for a new global mutex that will be added in a follow up commit.
---
 src/interfaces/libpq/fe-connect.c| 58 +++-
 src/interfaces/libpq/fe-secure-openssl.c | 33 +++---
 src/interfaces/libpq/libpq-int.h | 21 +
 3 files changed, 64 insertions(+), 48 deletions(-)

diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 18a07d810dc..69ed891703a 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -7424,36 +7424,17 @@ pqGetHomeDirectory(char *buf, int bufsize)
 static void
 default_threadlock(int acquire)
 {
-#ifdef ENABLE_THREAD_SAFETY
-#ifndef WIN32
-	static pthread_mutex_t singlethread_lock = PTHREAD_MUTEX_INITIALIZER;
-#else
-	static pthread_mutex_t singlethread_lock = NULL;
-	static long mutex_initlock = 0;
-
-	if (singlethread_lock == NULL)
-	{
-		while (InterlockedExchange(_initlock, 1) == 1)
-			 /* loop, another thread own the lock */ ;
-		if (singlethread_lock == NULL)
-		{
-			if (pthread_mutex_init(_lock, NULL))
-Assert(false);
-		}
-		InterlockedExchange(_initlock, 0);
-	}
-#endif
+	static pglock_t singlethread_lock = PGLOCK_INITIALIZER;
 	if (acquire)
 	{
-		if (pthread_mutex_lock(_lock))
+		if (!pglock(_lock))
 			Assert(false);
 	}
 	else
 	{
-		if (pthread_mutex_unlock(_lock))
+		if (!pgunlock(_lock))
 			Assert(false);
 	}
-#endif
 }
 
 pgthreadlock_t
@@ -7468,3 +7449,36 @@ PQregisterThreadLock(pgthreadlock_t newhandler)
 
 	return prev;
 }
+
+bool
+pglock(pglock_t * lock)
+{
+#ifdef WIN32
+	if (lock->mutex == NULL)
+	{
+		while (InterlockedExchange(>mutex_initlock, 1) == 1)
+			 /* loop, another thread own the lock */ ;
+		if (lock->mutex == NULL)
+		{
+			if (pthread_mutex_init(>mutex, NULL))
+return false;
+		}
+		InterlockedExchange(>mutex_initlock, 0);
+	}
+#endif
+	if (pthread_mutex_lock(>mutex))
+	{
+		return false;
+	}
+	return true;
+}
+
+bool
+pgunlock(pglock_t * lock)
+{
+	if (pthread_mutex_unlock(>mutex))
+	{
+		return false;
+	}
+	return true;
+}
diff --git a/src/interfaces/libpq/fe-secure-openssl.c b/src/interfaces/libpq/fe-secure-openssl.c
index ab2cbf045b8..c52c2ccf217 100644
--- a/src/interfaces/libpq/fe-secure-openssl.c
+++ b/src/interfaces/libpq/fe-secure-openssl.c
@@ -94,12 +94,7 @@ static bool ssl_lib_initialized = false;
 #ifdef ENABLE_THREAD_SAFETY
 static long crypto_open_connections = 0;
 
-#ifndef WIN32
-static pthread_mutex_t ssl_config_mutex = PTHREAD_MUTEX_INITIALIZER;
-#else
-static pthread_mutex_t ssl_config_mutex = NULL;
-static long win32_ssl_create_mutex = 0;
-#endif
+static pglock_t ssl_config_lock = PGLOCK_INITIALIZER;
 #endif			/* ENABLE_THREAD_SAFETY */
 
 static PQsslKeyPassHook_OpenSSL_type PQsslKeyPassHook = NULL;
@@ -744,21 +739,7 @@ int
 pgtls_init(PGconn *conn, bool do_ssl, bool do_crypto)
 {
 #ifdef ENABLE_THREAD_SAFETY
-#ifdef WIN32
-	/* Also see similar code in fe-connect.c, default_threadlock() */
-	if (ssl_config_mutex == NULL)
-	{
-		while (InterlockedExchange(_ssl_create_mutex, 1) == 1)
-			 /* loop, another thread own the lock */ ;
-		if (ssl_config_mutex == NULL)
-		{
-			if (pthread_mutex_init(_config_mutex, NULL))
-return -1;
-		}
-		InterlockedExchange(_ssl_create_mutex, 0);
-	}
-#endif
-	if (pthread_mutex_lock(_config_mutex))
+	if 

Re: [EXTERNAL] Re: Support load balancing in libpq

2023-01-12 Thread Jacob Champion
On Wed, Sep 14, 2022 at 7:54 AM Maxim Orlov  wrote:
> For the patch itself, I think it is better to use a more precise time 
> function in libpq_prng_init or call it only once.
> Thought it is a special corner case, imagine all the connection attempts at 
> first second will be seeded with the save
> value, i.e. will attempt to connect to the same host. I think, this is not we 
> want to achieve.

Just a quick single-issue review, but I agree with Maxim that having
one PRNG, seeded once, would be simpler -- with the tangible benefit
that it would eliminate weird behavior on simultaneous connections
when the client isn't using OpenSSL. (I'm guessing a simple lock on a
global PRNG would be less overhead than the per-connection
strong_random machinery, too, but I have no data to back that up.) The
test seed could then be handled globally as well (envvar?) so that you
don't have to introduce a debug-only option into the connection
string.

Overall, I like the concept.

--Jacob




Re: [EXTERNAL] Re: Support load balancing in libpq

2023-01-09 Thread Jelte Fennema
Attached an updated patch which should address your feedback and
I updated the commit message.

> I wonder whether making the parameter a boolean will paint us into a
> corner

I made it a string option, just like target_session_attrs. I'm pretty sure a 
round-robin load balancing policy could be implemented in the future
given certain constraints, like connections being made within the same
process. I adjusted the docs accordingly.

> > +typedef struct
> > +{
> > +   int family;
> > +   SockAddraddr;
> > +}  AddrInfo;
>
> That last line looks weirdly indented compared to SockAddr; in the
> struct above.

Yes I agree, but for some reason pgindent really badly wants it formatted
that way. I now undid the changes made by pgindent manually.

> I wonder whether this needs to be documented if it is mostly a
> development/testing parameter?

I also wasn't sure whether it should be documented or not. I'm fine with
either, I'll leave it in for now and let a committer decide if it's wanted or 
not.

> A bit unclear why you put the variables at this point in the list, but
> the list doesn't seem to be ordered strictly anyway; still, maybe they
> would fit best at the bottom below target_session_attrs?

Good point, I added them after target_session_attrs now and also moved
docs/parsing accordingly. This makes conceptually to me as well, since
target_session_attrs and load_balance_hosts have some interesting
sense contextually too.

P.S. I also attached the same pgindent run patch that I added in
https://www.postgresql.org/message-id/flat/am5pr83mb0178d3b31ca1b6ec4a8ecc42f7...@am5pr83mb0178.eurprd83.prod.outlook.com

v6-0001-libpq-Run-pgindent-after-a9e9a9f32b3.patch
Description: v6-0001-libpq-Run-pgindent-after-a9e9a9f32b3.patch


v6-0002-Support-load-balancing-in-libpq.patch
Description: v6-0002-Support-load-balancing-in-libpq.patch


Re: [EXTERNAL] Re: Support load balancing in libpq

2022-09-17 Thread Michael Banck
Hi,

On Mon, Sep 12, 2022 at 02:16:56PM +, Jelte Fennema wrote:
> Attached is an updated patch with the following changes:
> 1. rebased (including solved merge conflict)
> 2. fixed failing tests in CI
> 3. changed the commit message a little bit
> 4. addressed the two remarks from Micheal
> 5. changed the prng_state from a global to a connection level value for 
> thread-safety
> 6. use pg_prng_uint64_range

Thanks!
 
I tested this some more, and found it somewhat surprising that at least
when looking at it on a microscopic level, some hosts are chosen more
often than the others for a while.

I basically ran 

while true; do psql -At "host=pg1,pg2,pg3 load_balance_hosts=1" -c
"SELECT inet_server_addr()"; sleep 1; done

and the initial output was:

10.0.3.109
10.0.3.109
10.0.3.240
10.0.3.109
10.0.3.109
10.0.3.240
10.0.3.109
10.0.3.240
10.0.3.240
10.0.3.240
10.0.3.240
10.0.3.109
10.0.3.240
10.0.3.109
10.0.3.109
10.0.3.240
10.0.3.240
10.0.3.109
10.0.3.60

I.e. the second host (pg2/10.0.3.60) was only hit after 19 iterations.

Once significantly more than a hundred iterations are run, the hosts
somewhat even out, but it is maybe suprising to users:

   50   100   250   500   1000   1
10.0.3.60   92477   1653283317
10.0.3.109 254288   1783533372
10.0.3.240 163485   1573193311

Or maybe my test setup is skewed? When I choose a two seconds timeout
between psql calls, I get a more even distribution initially, but it
then diverges after 100 iterations:

   50   100   250500   1000
10.0.3.60  193698199374 
10.0.3.109 133380150285 
10.0.3.240 183172151341 

Could just be bad luck...

I also switch one host to have two IP addresses in /etc/hosts:

10.0.3.109 pg1
10.0.3.60  pg1
10.0.3.240 pg3

And this resulted in this (one second timeout again):

First run:

   50   100   250500   1000
10.0.3.60  101856120255
10.0.3.109 143067139278
10.0.3.240 2652   127241467

Second run:

   50   100   250500   1000
10.0.3.60  203177138265 
10.0.3.109  92052116245 
10.0.3.240 2149   121246490 

So it looks like it load-balances between pg1 and pg3, and not between
the three IPs -  is this expected?

If I switch from "host=pg1,pg3" to "host=pg1,pg1,pg3", each IP adress is
hit roughly equally.

So I guess this is how it should work, but in that case I think the
documentation should be more explicit about what is to be expected if a
host has multiple IP addresses or hosts are specified multiple times in
the connection string.

> > Maybe my imagination is not so great, but what else than hosts could we
> > possibly load-balance? I don't mind calling it load_balance, but I also
> > don't feel very strongly one way or the other and this is clearly
> > bikeshed territory.
> 
> I agree, which is why I called it load_balance in my original patch.
> But I also think it's useful to match the naming for the already
> existing implementations in the PG ecosystem around this. 
> But like you I don't really feel strongly either way. It's a tradeoff
> between short name and consistency in the ecosystem.

I don't think consistency is an extremely valid concern. As a
counterpoint, pgJDBC had targetServerType some time before Postgres, and
the libpq parameter was then named somewhat differently when it was
introduced, namely target_session_attrs.

> > If I understand correctly, you've added DNS-based load balancing on top
> > of just shuffling the provided hostnames.  This makes sense if a
> > hostname is backed by more than one IP address in the context of load
> > balancing, but it also complicates the patch. So I'm wondering how much
> > shorter the patch would be if you leave that out for now?
> 
> Yes, that's correct and indeed the patch would be simpler without, i.e. all 
> the
> addrinfo changes would become unnecessary. But IMHO the behaviour of 
> the added option would be very unexpected if it didn't load balance across
> multiple IPs in a DNS record. libpq currently makes no real distinction in 
> handling of provided hosts and handling of their resolved IPs. If load 
> balancing
> would only apply to the host list that would start making a distinction
> between the two.

Fair enough, I agree.
 
> Apart from that the load balancing across IPs is one of the main reasons
> for my interest in this patch. The reason is that it allows expanding or 
> reducing
> the number of nodes that are being load balanced across transparently to the
> application. Which means that there's no need to re-deploy applications with 
> new connection strings when changing the number hosts.

That's a good point as well.
 
> > On the other hand, I believe pgJDBC keeps track of which hosts are up or
> > down and only load balances among the ones 

Re: [EXTERNAL] Re: Support load balancing in libpq

2022-09-17 Thread Michael Banck
Hi,

On Wed, Sep 14, 2022 at 05:53:48PM +0300, Maxim Orlov wrote:
> > Also, IMO, the solution must have a fallback mechanism if the
> > standby/chosen host isn't reachable.
> 
> Yeah, I think it should. I'm not insisting on a particular name of options
> here, but in my view, the overall idea may be next:
> - we have two libpq's options: "load_balance_hosts" and "failover_timeout";
> - the "load_balance_hosts" should be "sequential" or "random";
> - the "failover_timeout" is a time period, within which, if connection to
> the server is not established, we switch to the next address or host.

Isn't this exactly what connect_timeout is providing? In my tests, it
worked exactly as I would expect it, i.e. after connect_timeout seconds,
libpq was re-shuffling and going for another host.

If you specify only one host (or all are down), you get an error.

In any case, I am not sure what failover has to do with it if we are
talking about initiating connections - usually failover is for already
established connections that suddendly go away for one reason or
another.

Or maybe I'm just not understanding where you're getting at?

> While writing this text, I start thinking that load balancing is a
> combination of two parameters above.

So I guess what you are saying is that if load_balance_hosts is set,
not setting connect_timeout would be a hazard, cause it would stall the
connection attempt even though other hosts would be available.

That's right, but I guess it's already a hazard if you put multiple
hosts in there, and the connection is not immediately failed (because
the host doesn't exist or it rejects the connection) but stalled by a
firewall for one host, while other hosts later on in the list would be
happy to accept connections.

So maybe this is something to think about, but just changing the
defaul of connect_timeout to something else when load balancing is on
would be very surprising. In any case, I don't think this absolutely
needs to be addressed by the initial feature, it could be expanded upon
later on if needed, the feature is useful on its own, along with
connect_timeout.


Michael




Re: [EXTERNAL] Re: Support load balancing in libpq

2022-09-14 Thread Maxim Orlov
+1 for overall idea of load balancing via random host selection.

For the patch itself, I think it is better to use a more precise time
function in libpq_prng_init or call it only once.
Thought it is a special corner case, imagine all the connection attempts at
first second will be seeded with the save
value, i.e. will attempt to connect to the same host. I think, this is not
we want to achieve.

And the "hostroder" option should be free'd in freePGconn.

> Also, IMO, the solution must have a fallback mechanism if the
> standby/chosen host isn't reachable.

Yeah, I think it should. I'm not insisting on a particular name of options
here, but in my view, the overall idea may be next:
- we have two libpq's options: "load_balance_hosts" and "failover_timeout";
- the "load_balance_hosts" should be "sequential" or "random";
- the "failover_timeout" is a time period, within which, if connection to
the server is not established, we switch to the next address or host.

While writing this text, I start thinking that load balancing is a
combination of two parameters above.

> 3) Isn't it good to provide a way to test the patch?

Good idea too. I think, we should add tap test here.


-- 
Best regards,
Maxim Orlov.


Re: [EXTERNAL] Re: Support load balancing in libpq

2022-09-12 Thread Jelte Fennema
Attached is an updated patch with the following changes:
1. rebased (including solved merge conflict)
2. fixed failing tests in CI
3. changed the commit message a little bit
4. addressed the two remarks from Micheal
5. changed the prng_state from a global to a connection level value for 
thread-safety
6. use pg_prng_uint64_range

> Maybe my imagination is not so great, but what else than hosts could we
> possibly load-balance? I don't mind calling it load_balance, but I also
> don't feel very strongly one way or the other and this is clearly
> bikeshed territory.

I agree, which is why I called it load_balance in my original patch. But I also
think it's useful to match the naming for the already existing implementations 
in the PG ecosystem around this. But like you I don't really feel strongly 
either
way. It's a tradeoff between short name and consistency in the ecosystem.

> If I understand correctly, you've added DNS-based load balancing on top
> of just shuffling the provided hostnames.  This makes sense if a
> hostname is backed by more than one IP address in the context of load
> balancing, but it also complicates the patch. So I'm wondering how much
> shorter the patch would be if you leave that out for now?

Yes, that's correct and indeed the patch would be simpler without, i.e. all the
addrinfo changes would become unnecessary. But IMHO the behaviour of 
the added option would be very unexpected if it didn't load balance across
multiple IPs in a DNS record. libpq currently makes no real distinction in 
handling of provided hosts and handling of their resolved IPs. If load balancing
would only apply to the host list that would start making a distinction
between the two.

Apart from that the load balancing across IPs is one of the main reasons
for my interest in this patch. The reason is that it allows expanding or 
reducing
the number of nodes that are being load balanced across transparently to the
application. Which means that there's no need to re-deploy applications with 
new connection strings when changing the number hosts.

> On the other hand, I believe pgJDBC keeps track of which hosts are up or
> down and only load balances among the ones which are up (maybe
> rechecking after a timeout? I don't remember), is this something you're
> doing, or did you consider it?

I don't think it's possible to do this in libpq without huge changes to its
architecture, since normally a connection will only a PGconn will only
create a single connection. The reason pgJDBC can do this is because
it's actually a connection pooler, so it will open more than one connection 
and can thus keep some global state about the different hosts.

Jelte

0001-Support-load-balancing-in-libpq.patch
Description: 0001-Support-load-balancing-in-libpq.patch


Re: [EXTERNAL] Re: Support load balancing in libpq

2022-07-05 Thread Jelte Fennema
> I'm quoting a previous attempt by Satyanarayana Narlapuram on this
> topic [1], it also has a patch set.

Thanks for sharing that. It's indeed a different approach to solve the
same problem. I think my approach is much simpler, since it only 
requires minimal changes to the libpq client and none to the postgres 
server or the postgres protocol.

However, that linked patch is more flexible due to allowing redirection
based on users and databases. With my patch something similar could
still be achieved by using different hostname lists for different databases
or users at the client side.

To be completely clear on the core difference between the patch IMO:
In this patch a DNS server or (a hardcoded hostname/IP list at the client
side) is used to determine what host to connect to. In the linked
patch instead the Postgres server starts being the source of truth of what
to connect to, thus essentially becoming something similar to a DNS server.

> We may not have to go the extra
> mile to determine all of these parameters dynamically during query
> authentication time, but we can let users provide a list of standby
> hosts based on "some" priority (Satya's thread [1] attempts to do
> this, in a way, with users specifying the hosts via pg_hba.conf file).
> If required, randomization in choosing the hosts can be optional.

I'm not sure if you read my response to Aleksander. I feel like I
addressed part of this at least. But maybe I was not clear enough, 
or added too much fluff. So, I'll re-iterate the important part:
By specifying the same host multiple times in the DNS response or
the hostname/IP list you can achieve weighted load balancing.

Few thoughts on the patch:
> 1) How are we determining if the submitted query is read-only or write?

This is not part of this patch. libpq and thus this patch works at the 
connection 
level, not at the query level, so determining a read-only query or write only 
query
is not possible without large changes.

However, libpq already has a target_session_attrs[1] connection option. This 
can be 
used to open connections specifically to read-only or writable servers. However,
once a read-only connection is opened it is then the responsibility of the 
client 
not to send write queries over this read-only connection, otherwise they will 
fail.

> 2) What happens for explicit transactions? The queries related to the
> same txn get executed on the same host right? How are we guaranteeing
> this?

We're load balancing connections, not queries. Once a connection is made
all queries on that connection will be executed on the same host. 

> 3) Isn't it good to provide a way to test the patch?

The way I tested it myself was by setting up a few databases on my local machine
listening on 127.0.0.1, 127.0.0.2, 127.0.0.3 and then putting all those in the 
connection 
string. Then looking at the connection attempts on the servers their logs 
showed that
the client was indeed connecting to a random one (by using log_connections=true 
in postgresql.conf).

I would definitely like to have some automated tests for this, but I couldn't 
find tests
for libpq that were connecting to multiple postgres servers. If they exist, any 
pointers
are appreciated. If they don't exist, pointers to similar tests are also 
appreciated.

[1]: 
https://www.postgresql.org/docs/current/libpq-connect.html#LIBPQ-CONNECT-TARGET-SESSION-ATTRS