Hello devs,

While reviewing various patches by Tom which are focussing on libpq multi-host behavior,

        https://commitfest.postgresql.org/19/1749/
        https://commitfest.postgresql.org/19/1752/

it occured to me that there are a few more problems with the documentation, the host/hostaddr feature, and the consistency of both. Namely:

* According to the documentation, either "host" or "hostaddr" can be specified. The former for names and socket directories, the later for ip addresses. If both are specified, "hostaddr" supersedes "host", and it may be used for various authentication purposes.

However, the actual capability is slightly different: specifying an ip address to "host" does work, without ensuing any name or reverse name look-ups, even if this is undocumented. This means that the host/hostaddr dichotomy is somehow moot as host can already be used for the same purpose.

* \conninfo does not follow the implemented logic, and, as there is no sanity check performed on the specification, it can display wrong informations, which are not going to be helpful to anyone with a problem to solve and trying to figure out the current state:

  sh> psql "host=/tmp hostaddr=127.0.0.1"
  psql> \conninfo
  You are connected to database "fabien" as user "fabien" via socket in "/tmp" at port 
"5432"
  # wrong, it is really connected to 127.0.0.1 by TCP/IP

  sh> psql "host=127.0.0.2 hostaddr=127.0.0.1"
  psql> \conninfo
  You are connected to database "fabien" as user "fabien" on host "127.0.0.2" at port 
"5432".
  # wrong again, it is really connected to 127.0.0.1

  sh> psql "hostaddr=127.0.0.1"
  psql> \conninfo
  You are connected to database "fabien" as user "fabien" via socket in 
"/var/run/postgresql" at port "5432".
  # wrong again

* Another issue with \conninfo is that if a host resolves to multiple ips, there is no way to know which was chosen and/or worked, although on errors some messages show the failing ip.

* The host/hostaddr dichotomy worsens when several targets are specified, because according to the documentation you should specify either names & dirs as host and ips as hostaddr, which leads to pretty strange spec each being a possible source of confusion and unhelpful messages as described above:

  sh> psql "host=localhost,127.0.0.2,, hostaddr=127.0.0.1,,127.0.0.3,"
  # attempt 1 is 127.0.0.1 identified as localhost
  # attempt 2 is 127.0.0.2
  # attempt 3 is 127.0.0.3 identified as the default, whatever it is
  # attempt 4 is really the default

* The documentation about host/hostaddr/port accepting lists is really added as an afterthought: the features are presented for one, and then the list is mentionned. Moreover there are quite a few repeats between the paragraph about defaults and so.


Given this state of affair ISTM that the situation would be clarified by:

(1) describing "host" full capability to accept names, ips and dirs.

(2) describing "hostaddr" as a look-up shortcut. Maybe the "hostaddr" could be renamed in passing, eg "resolve" to outline that it is just a lookup shortcut, and not a partial alternative to "host".

(3) checking that hostaddr non empty addresses are only accepted if the corresponding host is a name. The user must use the "host=ip" syntax
to connect to an ip.

(4) teaching \conninfo to show the real connection, which probably require extending libpq to access the underlying ip, eg PQaddr or PQhostaddr or whatever.

The attached patch does 1-3 (2 without renaming, though).

Thoughts?

--
Fabien.
diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 49de975e7f..6b3e0b0b87 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -955,23 +955,28 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
       <term><literal>host</literal></term>
       <listitem>
        <para>
-        Name of host to connect to.<indexterm><primary>host name</primary></indexterm>
-        If a host name begins with a slash, it specifies Unix-domain
-        communication rather than TCP/IP communication; the value is the
-        name of the directory in which the socket file is stored.  If
-        multiple host names are specified, each will be tried in turn in
-        the order given.  The default behavior when <literal>host</literal> is
-        not specified, or is empty, is to connect to a Unix-domain
+        Comma-separated list of hosts to connect to.<indexterm><primary>host name</primary></indexterm>
+        Each item may be a host name that will be resolved with a look-up,
+        a numeric IP address (IPv4 in the standard format, e.g.,
+        <literal>172.28.40.9</literal>, or IPv6 if supported by your machine)
+        that will be used directly, or
+        the name of a directory which contains the socket file for Unix-domain
+        communication rather than TCP/IP communication
+        (the specification must then begin with a slash);
+       </para>
+
+       <para>
+        The default behavior when <literal>host</literal> is
+        not specified, or an item is empty, is to connect to a Unix-domain
         socket<indexterm><primary>Unix domain socket</primary></indexterm> in
         <filename>/tmp</filename> (or whatever socket directory was specified
         when <productname>PostgreSQL</productname> was built). On machines without
         Unix-domain sockets, the default is to connect to <literal>localhost</literal>.
        </para>
+
        <para>
-        A comma-separated list of host names is also accepted, in which case
-        each host name in the list is tried in order; an empty item in the
-        list selects the default behavior as explained above. See
-        <xref linkend="libpq-multiple-hosts"/> for details.
+        If multiple host names are specified, each will be tried in turn in
+        the order given.  See <xref linkend="libpq-multiple-hosts"/> for details.
        </para>
       </listitem>
      </varlistentry>
@@ -980,65 +985,25 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
       <term><literal>hostaddr</literal></term>
       <listitem>
        <para>
-        Numeric IP address of host to connect to.  This should be in the
-        standard IPv4 address format, e.g., <literal>172.28.40.9</literal>.  If
-        your machine supports IPv6, you can also use those addresses.
-        TCP/IP communication is
-        always used when a nonempty string is specified for this parameter.
+        Comma-separated numeric IP addresses corresponding one-to-one to
+        <literal>host</literal>, to avoid a host name look-up when non empty.
+        This may be desirable for testing, to work around NAT settings, or for
+        better performance.
+        TCP/IP communication is always used when a nonempty string is specified
+        for this parameter.
        </para>
 
        <para>
-        Using <literal>hostaddr</literal> instead of <literal>host</literal> allows the
-        application to avoid a host name look-up, which might be important
-        in applications with time constraints. However, a host name is
-        required for GSSAPI or SSPI authentication
+        If a host name is required for GSSAPI or SSPI authentication
         methods, as well as for <literal>verify-full</literal> SSL
-        certificate verification.  The following rules are used:
-        <itemizedlist>
-         <listitem>
-          <para>
-           If <literal>host</literal> is specified without <literal>hostaddr</literal>,
-           a host name lookup occurs.
-          </para>
-         </listitem>
-         <listitem>
-          <para>
-           If <literal>hostaddr</literal> is specified without <literal>host</literal>,
-           the value for <literal>hostaddr</literal> gives the server network address.
-           The connection attempt will fail if the authentication
-           method requires a host name.
-          </para>
-         </listitem>
-         <listitem>
-          <para>
-           If both <literal>host</literal> and <literal>hostaddr</literal> are specified,
-           the value for <literal>hostaddr</literal> gives the server network address.
-           The value for <literal>host</literal> is ignored unless the
-           authentication method requires it, in which case it will be
-           used as the host name.
-          </para>
-         </listitem>
-        </itemizedlist>
+        certificate verification, the corresponding host name is used.
+        The host name is also used to identify the connection in
+        a password file (see <xref linkend="libpq-pgpass"/>).
+       </para>
+
+       <para>
         Note that authentication is likely to fail if <literal>host</literal>
         is not the name of the server at network address <literal>hostaddr</literal>.
-        Also, when both <literal>host</literal> and <literal>hostaddr</literal>
-        are specified, <literal>host</literal>
-        is used to identify the connection in a password file (see
-        <xref linkend="libpq-pgpass"/>).
-       </para>
-
-       <para>
-        A comma-separated list of <literal>hostaddr</literal> values is also
-        accepted, in which case each host in the list is tried in order.
-        An empty item in the list causes the corresponding host name to be
-        used, or the default host name if that is empty as well. See
-        <xref linkend="libpq-multiple-hosts"/> for details.
-       </para>
-       <para>
-        Without either a host name or host address,
-        <application>libpq</application> will connect using a
-        local Unix-domain socket; or on machines without Unix-domain
-        sockets, it will attempt to connect to <literal>localhost</literal>.
        </para>
        </listitem>
       </varlistentry>
diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index 9315a27561..4c34224b8b 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -908,14 +908,15 @@ count_comma_separated_elems(const char *input)
 /*
  * Parse a simple comma-separated list.
  *
- * On each call, returns a malloc'd copy of the next element, and sets *more
- * to indicate whether there are any more elements in the list after this,
+ * Note: this parsing must be consistent with count_comma_separated_elems.
+ *
+ * On each call, returns a malloc'd copy of the next element,
  * and updates *startptr to point to the next element, if any.
  *
  * On out of memory, returns NULL.
  */
 static char *
-parse_comma_separated_list(char **startptr, bool *more)
+parse_comma_separated_list(char **startptr)
 {
 	char	   *p;
 	char	   *s = *startptr;
@@ -929,7 +930,6 @@ parse_comma_separated_list(char **startptr, bool *more)
 	e = s;
 	while (*e != '\0' && *e != ',')
 		++e;
-	*more = (*e == ',');
 
 	len = e - s;
 	p = (char *) malloc(sizeof(char) * (len + 1));
@@ -943,6 +943,22 @@ parse_comma_separated_list(char **startptr, bool *more)
 	return p;
 }
 
+/* tell whether it is an ipv4 or possibly ipv6 address */
+static bool
+look_like_an_ip(const char *str)
+{
+	int b0, b1, b2, b3;
+	char garbage;
+
+	if (sscanf(str, "%d.%d.%d.%d%c", &b0, &b1, &b2, &b3, &garbage) == 4 &&
+		0 <= b0 && b0 < 256 && 0 <= b1 && b1 < 256 &&
+		0 <= b2 && b2 < 256 && 0 <= b3 && b3 < 256)
+		return true;
+	else
+		/* ":" cannot appear in a host name */
+		return strchr(str, ':') != NULL;
+}
+
 /*
  *		connectOptions2
  *
@@ -958,16 +974,27 @@ connectOptions2(PGconn *conn)
 
 	/*
 	 * Allocate memory for details about each host to which we might possibly
-	 * try to connect.  For that, count the number of elements in the hostaddr
-	 * or host options.  If neither is given, assume one host.
+	 * try to connect.  For that, count the number of elements in the host
+	 * options or assume one host if not set.
 	 */
 	conn->whichhost = 0;
-	if (conn->pghostaddr && conn->pghostaddr[0] != '\0')
-		conn->nconnhost = count_comma_separated_elems(conn->pghostaddr);
-	else if (conn->pghost && conn->pghost[0] != '\0')
+
+	if (conn->pghost && conn->pghost[0] != '\0')
 		conn->nconnhost = count_comma_separated_elems(conn->pghost);
 	else
 		conn->nconnhost = 1;
+
+	/* check that hostaddr length is compatible */
+	if (conn->pghostaddr && conn->pghostaddr[0] != '\0' &&
+		count_comma_separated_elems(conn->pghostaddr) != conn->nconnhost)
+	{
+		conn->status = CONNECTION_BAD;
+		appendPQExpBuffer(&conn->errorMessage,
+						  libpq_gettext("host and hostaddr list must be of equal length, got %d and %d\n"),
+						  conn->nconnhost, count_comma_separated_elems(conn->pghostaddr));
+		return false;
+	}
+
 	conn->connhost = (pg_conn_host *)
 		calloc(conn->nconnhost, sizeof(pg_conn_host));
 	if (conn->connhost == NULL)
@@ -977,82 +1004,86 @@ connectOptions2(PGconn *conn)
 	 * We now have one pg_conn_host structure per possible host.  Fill in the
 	 * host and hostaddr fields for each, by splitting the parameter strings.
 	 */
-	if (conn->pghostaddr != NULL && conn->pghostaddr[0] != '\0')
-	{
-		char	   *s = conn->pghostaddr;
-		bool		more = true;
-
-		for (i = 0; i < conn->nconnhost && more; i++)
-		{
-			conn->connhost[i].hostaddr = parse_comma_separated_list(&s, &more);
-			if (conn->connhost[i].hostaddr == NULL)
-				goto oom_error;
-		}
-
-		/*
-		 * If hostaddr was given, the array was allocated according to the
-		 * number of elements in the hostaddr list, so it really should be the
-		 * right size.
-		 */
-		Assert(!more);
-		Assert(i == conn->nconnhost);
-	}
-
 	if (conn->pghost != NULL && conn->pghost[0] != '\0')
 	{
 		char	   *s = conn->pghost;
-		bool		more = true;
 
-		for (i = 0; i < conn->nconnhost && more; i++)
+		for (i = 0; i < conn->nconnhost; i++)
 		{
-			conn->connhost[i].host = parse_comma_separated_list(&s, &more);
+			conn->connhost[i].host = parse_comma_separated_list(&s);
 			if (conn->connhost[i].host == NULL)
 				goto oom_error;
 		}
+	}
 
-		/* Check for wrong number of host items. */
-		if (more || i != conn->nconnhost)
+	/* we know that the number or host/hostaddr matches. */
+	if (conn->pghostaddr != NULL && conn->pghostaddr[0] != '\0')
+	{
+		char	   *s = conn->pghostaddr;
+
+		for (i = 0; i < conn->nconnhost; i++)
 		{
-			conn->status = CONNECTION_BAD;
-			printfPQExpBuffer(&conn->errorMessage,
-							  libpq_gettext("could not match %d host names to %d hostaddr values\n"),
-							  count_comma_separated_elems(conn->pghost), conn->nconnhost);
-			return false;
+			conn->connhost[i].hostaddr = parse_comma_separated_list(&s);
+			if (conn->connhost[i].hostaddr == NULL)
+				goto oom_error;
 		}
 	}
 
 	/*
-	 * Now, for each host slot, identify the type of address spec, and fill in
-	 * the default address if nothing was given.
+	 * Now, for each host slot, fill in the default address if necessary,
+	 * identify the type of address spec, and make some sanity checks.
 	 */
 	for (i = 0; i < conn->nconnhost; i++)
 	{
 		pg_conn_host *ch = &conn->connhost[i];
 
-		if (ch->hostaddr != NULL && ch->hostaddr[0] != '\0')
-			ch->type = CHT_HOST_ADDRESS;
-		else if (ch->host != NULL && ch->host[0] != '\0')
-		{
-			ch->type = CHT_HOST_NAME;
-#ifdef HAVE_UNIX_SOCKETS
-			if (is_absolute_path(ch->host))
-				ch->type = CHT_UNIX_SOCKET;
-#endif
-		}
-		else
+		if (ch->host == NULL || ch->host[0] == '\0')
 		{
 			if (ch->host)
 				free(ch->host);
 #ifdef HAVE_UNIX_SOCKETS
 			ch->host = strdup(DEFAULT_PGSOCKET_DIR);
-			ch->type = CHT_UNIX_SOCKET;
 #else
 			ch->host = strdup(DefaultHost);
-			ch->type = CHT_HOST_NAME;
 #endif
 			if (ch->host == NULL)
 				goto oom_error;
 		}
+
+#ifdef HAVE_UNIX_SOCKETS
+		if (is_absolute_path(ch->host))
+			ch->type = CHT_UNIX_SOCKET;
+		else
+#endif
+		if (look_like_an_ip(ch->host))
+			ch->type = CHT_HOST_ADDRESS;
+		else
+			ch->type = CHT_HOST_NAME;
+
+		if (ch->hostaddr != NULL && ch->hostaddr[0] != '\0')
+		{
+			/* hostaddr only allowed on host names */
+			if (ch->type != CHT_HOST_NAME)
+			{
+				conn->status = CONNECTION_BAD;
+				appendPQExpBuffer(&conn->errorMessage,
+								  libpq_gettext("host \"%s\" cannot have an hostaddr \"%s\"\n"),
+								  ch->host, ch->hostaddr);
+				return false;
+			}
+
+			if (!look_like_an_ip(ch->hostaddr))
+			{
+				conn->status = CONNECTION_BAD;
+				appendPQExpBuffer(&conn->errorMessage,
+								  libpq_gettext("hostaddr \"%s\" is not a numeric IP address\n"),
+								  ch->hostaddr);
+				return false;
+			}
+
+			/* hostaddr superseedes name for this connection */
+			ch->type = CHT_HOST_ADDRESS;
+		}
 	}
 
 	/*
@@ -1064,36 +1095,40 @@ connectOptions2(PGconn *conn)
 	 */
 	if (conn->pgport != NULL && conn->pgport[0] != '\0')
 	{
-		char	   *s = conn->pgport;
-		bool		more = true;
+		int			count = count_comma_separated_elems(conn->pgport);
 
-		for (i = 0; i < conn->nconnhost && more; i++)
+		if (count > 1)
 		{
-			conn->connhost[i].port = parse_comma_separated_list(&s, &more);
-			if (conn->connhost[i].port == NULL)
-				goto oom_error;
-		}
+			char	   *s = conn->pgport;
 
-		/*
-		 * If exactly one port was given, use it for every host.  Otherwise,
-		 * there must be exactly as many ports as there were hosts.
-		 */
-		if (i == 1 && !more)
-		{
-			for (i = 1; i < conn->nconnhost; i++)
+			/* there must be exactly as many ports as there were hosts. */
+			if (count != conn->nconnhost)
+			{
+				conn->status = CONNECTION_BAD;
+				printfPQExpBuffer(&conn->errorMessage,
+								  libpq_gettext("could not match %d port numbers to %d hosts\n"),
+								  count, conn->nconnhost);
+				return false;
+			}
+
+			for (i = 0; i < conn->nconnhost; i++)
 			{
-				conn->connhost[i].port = strdup(conn->connhost[0].port);
+				conn->connhost[i].port = parse_comma_separated_list(&s);
 				if (conn->connhost[i].port == NULL)
 					goto oom_error;
 			}
 		}
-		else if (more || i != conn->nconnhost)
+		else
 		{
-			conn->status = CONNECTION_BAD;
-			printfPQExpBuffer(&conn->errorMessage,
-							  libpq_gettext("could not match %d port numbers to %d hosts\n"),
-							  count_comma_separated_elems(conn->pgport), conn->nconnhost);
-			return false;
+			Assert(count == 1);
+
+			/* If exactly one port was given, use it for every host. */
+			for (i = 0; i < conn->nconnhost; i++)
+			{
+				conn->connhost[i].port = strdup(conn->pgport);
+				if (conn->connhost[i].port == NULL)
+					goto oom_error;
+			}
 		}
 	}
 
@@ -1773,6 +1808,7 @@ connectDBStart(PGconn *conn)
 		pg_conn_host *ch = &conn->connhost[i];
 		struct addrinfo hint;
 		int			thisport;
+		char *addr;
 
 		/* Initialize hint structure */
 		MemSet(&hint, 0, sizeof(hint));
@@ -1810,11 +1846,12 @@ connectDBStart(PGconn *conn)
 
 			case CHT_HOST_ADDRESS:
 				hint.ai_flags = AI_NUMERICHOST;
-				ret = pg_getaddrinfo_all(ch->hostaddr, portstr, &hint, &ch->addrlist);
+				addr = (ch->hostaddr != NULL && ch->hostaddr[0] != '\0') ? ch->hostaddr : ch->host;
+				ret = pg_getaddrinfo_all(addr, portstr, &hint, &ch->addrlist);
 				if (ret || !ch->addrlist)
 					appendPQExpBuffer(&conn->errorMessage,
 									  libpq_gettext("could not parse network address \"%s\": %s\n"),
-									  ch->hostaddr, gai_strerror(ret));
+									  addr, gai_strerror(ret));
 				break;
 
 			case CHT_UNIX_SOCKET:
@@ -6119,12 +6156,10 @@ PQhost(const PGconn *conn)
 
 	if (conn->connhost != NULL)
 	{
+		/* always true, defaults are filled in by Options2 */
 		if (conn->connhost[conn->whichhost].host != NULL &&
 			conn->connhost[conn->whichhost].host[0] != '\0')
 			return conn->connhost[conn->whichhost].host;
-		else if (conn->connhost[conn->whichhost].hostaddr != NULL &&
-				 conn->connhost[conn->whichhost].hostaddr[0] != '\0')
-			return conn->connhost[conn->whichhost].hostaddr;
 	}
 
 	return "";

Reply via email to