Re: [EXTERNAL] Support load balancing in libpq

2023-03-30 Thread Daniel Gustafsson
> On 30 Mar 2023, at 10:21, Daniel Gustafsson  wrote:
> 
>> On 30 Mar 2023, at 10:00, Julien Rouhaud  wrote:
>> 
>> On Thu, Mar 30, 2023 at 3:03 PM Daniel Gustafsson  wrote:
>>> 
 On 30 Mar 2023, at 03:48, Hayato Kuroda (Fujitsu) 
  wrote:
>>> 
 While checking the buildfarm, I found a failure on NetBSD caused by the 
 added code[1]:
>>> 
>>> Thanks for reporting, I see that lapwing which runs Linux (Debian 7, gcc 
>>> 4.7.2)
>>> has the same error.  I'll look into it today to get a fix committed.
>> 
>> This is an i686 machine, so it probably has the same void *
>> difference.  Building with -m32 might be enough to reproduce the
>> problem.
> 
> Makes sense.  I think the best option is to simply remove conn from being part
> of the seed and rely on the other values.  Will apply that after a testrun.

After some offlist discussion I ended up pushing the proposed uintptr_t fix
instead, now waiting for these animals to build.

--
Daniel Gustafsson





Re: [EXTERNAL] Support load balancing in libpq

2023-03-30 Thread Daniel Gustafsson
> On 30 Mar 2023, at 10:00, Julien Rouhaud  wrote:
> 
> On Thu, Mar 30, 2023 at 3:03 PM Daniel Gustafsson  wrote:
>> 
>>> On 30 Mar 2023, at 03:48, Hayato Kuroda (Fujitsu) 
>>>  wrote:
>> 
>>> While checking the buildfarm, I found a failure on NetBSD caused by the 
>>> added code[1]:
>> 
>> Thanks for reporting, I see that lapwing which runs Linux (Debian 7, gcc 
>> 4.7.2)
>> has the same error.  I'll look into it today to get a fix committed.
> 
> This is an i686 machine, so it probably has the same void *
> difference.  Building with -m32 might be enough to reproduce the
> problem.

Makes sense.  I think the best option is to simply remove conn from being part
of the seed and rely on the other values.  Will apply that after a testrun.

--
Daniel Gustafsson





Re: [EXTERNAL] Support load balancing in libpq

2023-03-30 Thread Julien Rouhaud
On Thu, Mar 30, 2023 at 3:03 PM Daniel Gustafsson  wrote:
>
> > On 30 Mar 2023, at 03:48, Hayato Kuroda (Fujitsu) 
> >  wrote:
>
> > While checking the buildfarm, I found a failure on NetBSD caused by the 
> > added code[1]:
>
> Thanks for reporting, I see that lapwing which runs Linux (Debian 7, gcc 
> 4.7.2)
> has the same error.  I'll look into it today to get a fix committed.

This is an i686 machine, so it probably has the same void *
difference.  Building with -m32 might be enough to reproduce the
problem.




Re: [EXTERNAL] Support load balancing in libpq

2023-03-30 Thread Daniel Gustafsson
> On 30 Mar 2023, at 03:48, Hayato Kuroda (Fujitsu)  
> wrote:

> While checking the buildfarm, I found a failure on NetBSD caused by the added 
> code[1]:

Thanks for reporting, I see that lapwing which runs Linux (Debian 7, gcc 4.7.2)
has the same error.  I'll look into it today to get a fix committed.

--
Daniel Gustafsson





RE: [EXTERNAL] Support load balancing in libpq

2023-03-29 Thread Hayato Kuroda (Fujitsu)
Dear Daniel, Jelte

Thank you for creating a good feature!
While checking the buildfarm, I found a failure on NetBSD caused by the added 
code[1]:

```
fe-connect.c: In function 'libpq_prng_init':
fe-connect.c:1048:11: error: cast from pointer to integer of different size 
[-Werror=pointer-to-int-cast]
 1048 |  rseed = ((uint64) conn) ^
  |   ^
cc1: all warnings being treated as errors
```

This failure seemed to occurr when the pointer is casted to different size.
And while checking more, I found that this machine seemed that size of pointer 
is 4 byte [2],
whereas sizeof(uint64) is 8.

```
checking size of void *... (cached) 4
```

I could not test because I do not have NetBSD, but I have come up with
Following solution to avoid the failure. sizeof(uintptr_t) will be addressed
based on the environment. How do you think?

```
diff --git a/src/interfaces/libpq/fe-connect.c 
b/src/interfaces/libpq/fe-connect.c
index a13ec16b32..bb7347cb0c 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -1045,7 +1045,7 @@ libpq_prng_init(PGconn *conn)
 
gettimeofday(, NULL);
 
-   rseed = ((uint64) conn) ^
+   rseed = ((uintptr_t) conn) ^
((uint64) getpid()) ^
((uint64) tval.tv_usec) ^
((uint64) tval.tv_sec);
```

[1]: 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mamba=2023-03-29%2023%3A24%3A44
[2]: 
https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=mamba=2023-03-29%2023%3A24%3A44=configure

Best Regards,
Hayato Kuroda
FUJITSU LIMITED





Re: [EXTERNAL] Support load balancing in libpq

2023-03-29 Thread Daniel Gustafsson
I took another couple of looks at this and pushed it after a few small tweaks
to the docs.

--
Daniel Gustafsson





Re: [EXTERNAL] Support load balancing in libpq

2023-03-28 Thread Tatsuo Ishii
> I think it's fine to remove it. It originated from postmaster.c, where
> I copied the original implementation of libpq_prng_init from.

I agree to remove unlikely macro here.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp




Re: [EXTERNAL] Support load balancing in libpq

2023-03-28 Thread Tatsuo Ishii
>> "unlikely" macro is used in libpq_prng_init() in the patch. I wonder
>> if the place is really 'hot' to use "unlikely" macro.
> 
> I don't think it is, I was thinking to rewrite as the below sketch:
> 
> {
> if (pg_prng_strong_seed(>prng_state)))
> return;
> 
> /* fallback seeding */
> }

+1.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp




Re: [EXTERNAL] Support load balancing in libpq

2023-03-28 Thread Jelte Fennema
I think it's fine to remove it. It originated from postmaster.c, where
I copied the original implementation of libpq_prng_init from.

On Tue, 28 Mar 2023 at 09:22, Daniel Gustafsson  wrote:
>
> > On 28 Mar 2023, at 09:16, Tatsuo Ishii  wrote:
>
> > "unlikely" macro is used in libpq_prng_init() in the patch. I wonder
> > if the place is really 'hot' to use "unlikely" macro.
>
> I don't think it is, I was thinking to rewrite as the below sketch:
>
> {
> if (pg_prng_strong_seed(>prng_state)))
> return;
>
> /* fallback seeding */
> }
>
> --
> Daniel Gustafsson
>




Re: [EXTERNAL] Support load balancing in libpq

2023-03-28 Thread Daniel Gustafsson
> On 28 Mar 2023, at 09:16, Tatsuo Ishii  wrote:

> "unlikely" macro is used in libpq_prng_init() in the patch. I wonder
> if the place is really 'hot' to use "unlikely" macro.

I don't think it is, I was thinking to rewrite as the below sketch:

{
if (pg_prng_strong_seed(>prng_state)))
return;

/* fallback seeding */
}

--
Daniel Gustafsson





Re: [EXTERNAL] Support load balancing in libpq

2023-03-28 Thread Tatsuo Ishii
Hi,

"unlikely" macro is used in libpq_prng_init() in the patch. I wonder
if the place is really 'hot' to use "unlikely" macro.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp




Re: [EXTERNAL] Support load balancing in libpq

2023-03-27 Thread Aleksander Alekseev
Hi,

> I guess you didn't set up the hostnames in /etc/hosts as described in
> 004_load_balance_dns.pl. Then it's expected that the loop body isn't
> covered. As discussed upthread, running this test manually is much
> more cumbersome than is desirable, but it's still better than not
> having the test at all, because it is run in CI.

Got it, thanks.

I guess I'm completely out of nitpicks then!

-- 
Best regards,
Aleksander Alekseev




Re: [EXTERNAL] Support load balancing in libpq

2023-03-27 Thread Jelte Fennema
> > ```
> > if (conn->addr == NULL && conn->naddr != 0)
> > ```

Afaict this is not necessary, since getaddrinfo already returns an
error if the host could not be resolved to any addresses. A quick test
gives me this error:
error: could not translate host name "doesnotexist" to address: Name
or service not known

>
> ```
> +}
> +else
> +conn->load_balance_type = LOAD_BALANCE_DISABLE;
> ```
>
> The else branch is never executed.

I don't think that line is coverable then. There's definitely places
in the test suite where load_balance_hosts is not explicitly set. But
even in those cases I guess the argument parsing logic will use
DefaultLoadBalanceHosts instead of NULL as a value for
conn->load_balance_type.

> Strangely enough the body of the for loop is never executed either.
> Apparently only one address is used and there is nothing to shuffle?
>
> Here is the exact command I used to build the code coverage report:

I guess you didn't set up the hostnames in /etc/hosts as described in
004_load_balance_dns.pl. Then it's expected that the loop body isn't
covered. As discussed upthread, running this test manually is much
more cumbersome than is desirable, but it's still better than not
having the test at all, because it is run in CI.




Re: [EXTERNAL] Support load balancing in libpq

2023-03-27 Thread Aleksander Alekseev
Hi,

> ```
> +ret = store_conn_addrinfo(conn, addrlist);
> +pg_freeaddrinfo_all(hint.ai_family, addrlist);
> +if (ret)
> +goto error_return;/* message already logged */
> ```
> The goto path is not test-covered.

D'oh, this one is fine since store_conn_addrinfo() is going to fail
only when we are out of memory.

-- 
Best regards,
Aleksander Alekseev




Re: [EXTERNAL] Support load balancing in libpq

2023-03-27 Thread Aleksander Alekseev
Hi,

> So I think it should be:
>
> ```
> if (conn->addr == NULL && conn->naddr != 0)
> ```
>
> [...]
>
> I will take a look at v16 now.

The code coverage could be slightly better.

In v16-0001:

```
+ret = store_conn_addrinfo(conn, addrlist);
+pg_freeaddrinfo_all(hint.ai_family, addrlist);
+if (ret)
+goto error_return;/* message already logged */
```

The goto path is not test-covered.

In v16-0002:

```
+}
+else
+conn->load_balance_type = LOAD_BALANCE_DISABLE;
```

The else branch is never executed.

```
 if (ret)
 goto error_return;/* message already logged */

+/*
+ * If random load balancing is enabled we shuffle the addresses.
+ */
+if (conn->load_balance_type == LOAD_BALANCE_RANDOM)
+{
+/*
+ * This is the "inside-out" variant of the Fisher-Yates shuffle
[...]
+ */
+for (int i = 1; i < conn->naddr; i++)
+{
+intj =
pg_prng_uint64_range(>prng_state, 0, i);
+AddrInfotemp = conn->addr[j];
+
+conn->addr[j] = conn->addr[i];
+conn->addr[i] = temp;
+}
+}
```

Strangely enough the body of the for loop is never executed either.
Apparently only one address is used and there is nothing to shuffle?

Here is the exact command I used to build the code coverage report:

```
git clean -dfx && meson setup --buildtype debug -Db_coverage=true
-Dcassert=true -DPG_TEST_EXTRA="kerberos ldap ssl load_balance"
-Dldap=disabled -Dssl=openssl -Dtap_tests=enabled
-Dprefix=/home/eax/projects/pginstall build && ninja -C build &&
PG_TEST_EXTRA=1 meson test -C build && ninja -C build coverage-html
```

I'm sharing this for the sake of completeness. I don't have a strong
opinion on whether we should bother with covering every new line of
code with tests.

Except for the named nitpicks v16 looks good to me.

-- 
Best regards,
Aleksander Alekseev




Re: [EXTERNAL] Support load balancing in libpq

2023-03-27 Thread Aleksander Alekseev
Hi,

> > ▶ 6/6 - received at least one connection on node1   OK
> > ▶ 6/6 - received at least one connection on node2   OK
> > ▶ 6/6 - received at least one connection on node3   OK
> > ▶ 6/6 - received 50 connections across all nodesOK
>
> Good point.
>
> > Finally, I changed a few small typos in your updated commit message
> > (some of which originated from my earlier commit messages)
>
> +1

Hi,

> I would like to see this wrapped up in the current CF, what do you think about
> the attached?

In v15-0001:

```
+conn->addr = calloc(conn->naddr, sizeof(AddrInfo));
+if (conn->addr == NULL)
+{
+libpq_append_conn_error(conn, "out of memory");
+return 1;
+}
```

According to the man pages, in a corner case when naddr is 0 calloc
can return NULL which will not indicate an error.

So I think it should be:

```
if (conn->addr == NULL && conn->naddr != 0)
```

Other than that v15 looked very good. It was checked on Linux and
MacOS including running it under sanitizer.

I will take a look at v16 now.

-- 
Best regards,
Aleksander Alekseev




Re: [EXTERNAL] Support load balancing in libpq

2023-03-27 Thread Daniel Gustafsson
> On 27 Mar 2023, at 13:50, Jelte Fennema  wrote:
> 
> Looks good overall. I attached a new version with a few small changes:
> 
>>  * Changed store_conn_addrinfo to return int like how all the functions
>>dealing with addrinfo does.  Also moved the error reporting to inside 
>> there
>>where the error happened.
> 
> I don't feel strong about the int vs bool return type. The existing
> static libpq functions are a bit of a mixed bag around this, so either
> way seems fine to me. And moving the log inside the function seems
> fine too. But it seems you accidentally removed the "goto
> error_return" part as well, so now we're completely ignoring the
> allocation failure. The attached patch fixes that.

Ugh, thanks.  I had a conflict here when rebasing with the load balancing
commit in place and clearly fat-fingered that one.

>> +ok($node1_occurences > 1, "expected at least one execution on node1, found 
>> none");
>> +ok($node2_occurences > 1, "expected at least one execution on node2, found 
>> none");
>> +ok($node3_occurences > 1, "expected at least one execution on node3, found 
>> none");
> 
> I changed the message to be a description of the expected case,
> instead of the failure case. This is in line with the way these
> messages are used in other tests, and indeed seems like the correct
> way because you get output from "meson test -v postgresql:libpq /
> libpq/003_load_balance_host_list" like this:
> ▶ 6/6 - received at least one connection on node1   OK
> ▶ 6/6 - received at least one connection on node2   OK
> ▶ 6/6 - received at least one connection on node3   OK
> ▶ 6/6 - received 50 connections across all nodesOK

Good point.

> Finally, I changed a few small typos in your updated commit message
> (some of which originated from my earlier commit messages)

+1

--
Daniel Gustafsson





Re: [EXTERNAL] Support load balancing in libpq

2023-03-27 Thread Jelte Fennema
Looks good overall. I attached a new version with a few small changes:

>   * Changed store_conn_addrinfo to return int like how all the functions
> dealing with addrinfo does.  Also moved the error reporting to inside 
> there
> where the error happened.

I don't feel strong about the int vs bool return type. The existing
static libpq functions are a bit of a mixed bag around this, so either
way seems fine to me. And moving the log inside the function seems
fine too. But it seems you accidentally removed the "goto
error_return" part as well, so now we're completely ignoring the
allocation failure. The attached patch fixes that.

>+ok($node1_occurences > 1, "expected at least one execution on node1, found 
>none");
>+ok($node2_occurences > 1, "expected at least one execution on node2, found 
>none");
>+ok($node3_occurences > 1, "expected at least one execution on node3, found 
>none");

I changed the message to be a description of the expected case,
instead of the failure case. This is in line with the way these
messages are used in other tests, and indeed seems like the correct
way because you get output from "meson test -v postgresql:libpq /
libpq/003_load_balance_host_list" like this:
▶ 6/6 - received at least one connection on node1   OK
▶ 6/6 - received at least one connection on node2   OK
▶ 6/6 - received at least one connection on node3   OK
▶ 6/6 - received 50 connections across all nodesOK

Finally, I changed a few small typos in your updated commit message
(some of which originated from my earlier commit messages)


v16-0001-Copy-and-store-addrinfo-in-libpq-owned-private-m.patch
Description: Binary data


v16-0002-Support-connection-load-balancing-in-libpq.patch
Description: Binary data


Re: [EXTERNAL] Support load balancing in libpq

2023-03-27 Thread Daniel Gustafsson
> On 17 Mar 2023, at 09:50, Jelte Fennema  wrote:
> 
>> The documentation lists the modes disabled and random, but I wonder if it's
>> worth expanding the docs to mention that "disabled" is pretty much a round
>> robin load balancing scheme?  It reads a bit odd to present load balancing
>> without a mention of round robin balancing given how common it is.
> 
> I think you misunderstood what I meant in that section, so I rewrote
> it to hopefully be clearer. Because disabled really isn't the same as
> round-robin.

Thinking more about it I removed that section since it adds more confusion than
it resolves I think.  It would be interesting to make it a true round-robin
with some form of locally stored pointer to last connection but thats for
future hacking.

>> -#ifndef WIN32
>> +/* MinGW has sys/time.h, but MSVC doesn't */
>> +#ifndef _MSC_VER
>> #include 
>> This seems unrelated to the patch in question, and should be a separate 
>> commit IMO.
> 
> It's not really unrelated. This only started to be needed because
> libpq_prng_init calls gettimeofday . That did not work on MinGW
> systems. Before this patch libpq was never calling gettimeofday. So I
> think it makes sense to leave it in the commit.

Gotcha.

>> A test
>> which require root permission level manual system changes stand a very low
>> chance of ever being executed, and as such will equate to dead code that may
>> easily be broken or subtly broken.
> 
> While I definitely agree that it makes it hard to execute, I don't
> think that means it will be executed nearly as few times as you
> suggest. Maybe you missed it, but I modified the .cirrus.yml file to
> configure the hosts file for both Linux and Windows runs. So, while I
> agree it is unlikely to be executed manually by many people, it would
> still be run on every commit fest entry (which should capture most
> issues that I can imagine could occur).

I did see it was used in the CI since the jobs there are containerized, what
I'm less happy about is that we wont be able to test this in the BF.  That
being said, not having the test at all would mean even less testing so in the
end I agree that including it is the least bad option.  Longer term I would
like to rework into something less do-this-manually test, but I have no good
ideas right now.

I've played around some more with this and came up with the attached v15 which
I think is close to the final state.  The changes I've made are:

  * Added the DNS test back into the main commit
  * A few incorrect (referred to how the test worked previously) comments in
the tests fixed.
  * The check against PG_TEST_EXTRA performed before any processing done
  * Reworked the check for hosts content attempting to make it a bit more
robust
  * Changed store_conn_addrinfo to return int like how all the functions
dealing with addrinfo does.  Also moved the error reporting to inside there
where the error happened.
  * Made the prng init function void as it always returned true anyways.
  * Minor comment and docs tweaking.
  * I removed the change to geqo, while I don't think it's incorrect it also
hardly seems worth the churn.
  * Commit messages are reworded.

I would like to see this wrapped up in the current CF, what do you think about
the attached?

--
Daniel Gustafsson



v15-0002-Support-connection-load-balancing-in-libpq.patch
Description: Binary data


v15-0001-Copy-and-store-addrinfo-in-libpq-owned-private-m.patch
Description: Binary data


Re: [EXTERNAL] Support load balancing in libpq

2023-03-22 Thread Jelte Fennema
Rebased patch after conflicts with bfc9497ece01c7c45437bc36387cb1ebe346f4d2


v14-0001-libpq-Run-pgindent-after-a9e9a9f32b3.patch
Description: Binary data


v14-0002-Refactor-libpq-to-store-addrinfo-in-a-libpq-owne.patch
Description: Binary data


v14-0003-Support-load-balancing-in-libpq.patch
Description: Binary data


v14-0005-Remove-unnecessary-check-from-Fisher-Yates-imple.patch
Description: Binary data


v14-0004-Add-DNS-based-libpq-load-balancing-test.patch
Description: Binary data


Re: [EXTERNAL] Support load balancing in libpq

2023-03-17 Thread Jelte Fennema
> The documentation lists the modes disabled and random, but I wonder if it's
> worth expanding the docs to mention that "disabled" is pretty much a round
> robin load balancing scheme?  It reads a bit odd to present load balancing
> without a mention of round robin balancing given how common it is.

I think you misunderstood what I meant in that section, so I rewrote
it to hopefully be clearer. Because disabled really isn't the same as
round-robin.

> This removes all uses of conn->addrlist_family and that struct member can be
> removed.

done

> s/stanby/standby/
> s/Postgres/PostgreSQL/

done

> The documentation typically use a less personal form, I would suggest 
> something
> along the lines of:
>
> "If uniform load balancing is required then an external load balancing 
> tool
> must be used.  Non-uniform load balancing can also be used to skew the
> results, e.g.  by providing the.."

rewrote this to stop using "you" and expanded a bit on the topic

>   libpq_append_conn_error(conn, "unable to initiate random number generator");

done

> -#ifndef WIN32
> +/* MinGW has sys/time.h, but MSVC doesn't */
> +#ifndef _MSC_VER
>  #include 
> This seems unrelated to the patch in question, and should be a separate 
> commit IMO.

It's not really unrelated. This only started to be needed because
libpq_prng_init calls gettimeofday . That did not work on MinGW
systems. Before this patch libpq was never calling gettimeofday. So I
think it makes sense to leave it in the commit.

> +   LOAD_BALANCE_RANDOM,/* Read-write server */
> I assume this comment is a copy/paste and should have been reflecting random 
> order?

Yes, done

> +++ b/src/interfaces/libpq/t/003_loadbalance_host_list.pl
> Nitpick, but we should probably rename this load_balance to match the 
> parameter
> being tested.

Done

> A test
> which require root permission level manual system changes stand a very low
> chance of ever being executed, and as such will equate to dead code that may
> easily be broken or subtly broken.

While I definitely agree that it makes it hard to execute, I don't
think that means it will be executed nearly as few times as you
suggest. Maybe you missed it, but I modified the .cirrus.yml file to
configure the hosts file for both Linux and Windows runs. So, while I
agree it is unlikely to be executed manually by many people, it would
still be run on every commit fest entry (which should capture most
issues that I can imagine could occur).

> I am also not a fan of the random_seed parameter.

Fair enough. Removed

> A few ideas:
>
>   * Add basic tests for the load_balance_host connection param only accepting
> sane values
>
>   * Alter the connect_ok() tests in 003_loadbalance_host_list.pl to not 
> require
> random_seed but instead using randomization. Thinking out loud, how about
> something along these lines?
> - Passing a list of unreachable hostnames with a single good hostname
>   should result in a connection.
> - Passing a list of one good hostname should result in a connection
> - Passing a list on n good hostname (where all are equal) should result in
>   a connection

Implemented all these.

> - Passing a list of n unreachable hostnames should result in log entries
>   for n broken resolves in some order, and running that test n' times
>   shouldn't - statistically - result in the same order for a large enough 
> n'.

I didn't implement this one. Instead I went for another statistics
based approach with working hosts (see test for details).

>   * Remove random_seed and 004_loadbalance_dns.pl

I moved 004_load_balance_dns.pl to a separate commit (after making
similar random_seed removal related changes to it). As explained above
I think it's worth it to have it because it gets executed in CI. But
feel free to commit only the main patch, if you disagree.


v13-0005-Remove-unnecessary-check-from-Fisher-Yates-imple.patch
Description: Binary data


v13-0002-Refactor-libpq-to-store-addrinfo-in-a-libpq-owne.patch
Description: Binary data


v13-0003-Support-load-balancing-in-libpq.patch
Description: Binary data


v13-0001-libpq-Run-pgindent-after-a9e9a9f32b3.patch
Description: Binary data


v13-0004-Add-DNS-based-libpq-load-balancing-test.patch
Description: Binary data


Re: [EXTERNAL] Support load balancing in libpq

2023-03-16 Thread Daniel Gustafsson
In general I think this feature makes sense (which has been echoed many times
in the thread), and the implementation strikes a good balance of robustness and
simplicity.  Reading this I think it's very close to being committable, but I
have a few comments on the patch series:

+sent to the same server. There are currently two modes:
The documentation lists the modes disabled and random, but I wonder if it's
worth expanding the docs to mention that "disabled" is pretty much a round
robin load balancing scheme?  It reads a bit odd to present load balancing
without a mention of round robin balancing given how common it is.

-   conn->addrlist_family = hint.ai_family = AF_UNSPEC;
+   hint.ai_family = AF_UNSPEC;
This removes all uses of conn->addrlist_family and that struct member can be
removed.

+to, for example, load balance over stanby servers only. Once 
successfully
s/stanby/standby/

+Postgres servers.
s/Postgres/PostgreSQL/

+more addresses than others. So if you want uniform load balancing,
+this is something to keep in mind. However, non-uniform load
+balancing can also be used to your advantage, e.g. by providing the
The documentation typically use a less personal form, I would suggest something
along the lines of:

"If uniform load balancing is required then an external load balancing tool
must be used.  Non-uniform load balancing can also be used to skew the
results, e.g.  by providing the.."

+   if (!libpq_prng_init(conn))
+   return false;
This needs to set a returned error message with libpq_append_conn_error (feel
free to come up with a better wording of course):

  libpq_append_conn_error(conn, "unable to initiate random number generator");

-#ifndef WIN32
+/* MinGW has sys/time.h, but MSVC doesn't */
+#ifndef _MSC_VER
 #include 
This seems unrelated to the patch in question, and should be a separate commit 
IMO.

+   LOAD_BALANCE_RANDOM,/* Read-write server */
I assume this comment is a copy/paste and should have been reflecting random 
order?

+++ b/src/interfaces/libpq/t/003_loadbalance_host_list.pl
Nitpick, but we should probably rename this load_balance to match the parameter
being tested.

+++ b/src/interfaces/libpq/t/004_loadbalance_dns.pl
On the subject of tests, I'm a bit torn.  I appreciate that the patch includes
a thorough test, but I'm not convinced we should add that to the tree.  A test
which require root permission level manual system changes stand a very low
chance of ever being executed, and as such will equate to dead code that may
easily be broken or subtly broken.
 
I am also not a fan of the random_seed parameter.  A parameter only useful for
testing, and which enables testing by circumventing the mechanism to test
(making randomness predictable), seems like expensive bagage to carry around.
From experience we also know this risks ending up in production configs for all
the wrong reasons.

Given the implementation of this feature, the actual connection phase isn't any
different from just passing multiple hostnames and operating in the round-robin
fashion.  What we want to ensure is that the randomization isn't destroying the
connection array.  Let's see what we can do while still having tests that can
be executed in the buildfarm.

A few ideas:

  * Add basic tests for the load_balance_host connection param only accepting
sane values

  * Alter the connect_ok() tests in 003_loadbalance_host_list.pl to not require
random_seed but instead using randomization. Thinking out loud, how about
something along these lines?
- Passing a list of unreachable hostnames with a single good hostname
  should result in a connection.
- Passing a list of one good hostname should result in a connection
- Passing a list on n good hostname (where all are equal) should result in
  a connection
- Passing a list of n unreachable hostnames should result in log entries
  for n broken resolves in some order, and running that test n' times
  shouldn't - statistically - result in the same order for a large enough 
n'.

  * Remove random_seed and 004_loadbalance_dns.pl

Thoughts?

--
Daniel Gustafsson