Re: Experiments with Postgres and SSL

2024-02-19 Thread Matthias van de Meent
I've been asked to take a look at this thread and review some patches,
and the subject looks interesting enough, so here I am.

On Thu, 19 Jan 2023 at 04:16, Greg Stark  wrote:
> I had a conversation a while back with Heikki where he expressed that
> it was annoying that we negotiate SSL/TLS the way we do since it
> introduces an extra round trip. Aside from the performance
> optimization I think accepting standard TLS connections would open the
> door to a number of other opportunities that would be worth it on
> their own.

I agree that this would be very nice.

> Other things it would open the door to in order from least
> controversial to most
>
> * Hiding Postgres behind a standard SSL proxy terminating SSL without
> implementing the Postgres protocol.

I think there is also the option "hiding Postgres behind a standard
SNI-based SSL router that does not terminate SSL", as that's arguably
a more secure way to deploy any SSL service than SSL-terminating
proxies.

> * "Service Mesh" type tools that hide multiple services behind a
> single host/port ("Service Mesh" is just a new buzzword for "proxy").

People proxying PostgreSQL seems fine, and enabling better proxying
seems reasonable.

> * Browser-based protocol implementations using websockets for things
> like pgadmin or other tools to connect directly to postgres using
> Postgres wire protocol but using native SSL implementations.
>
> * Postgres could even implement an HTTP based version of its protocol
> and enable things like queries or browser based tools using straight
> up HTTP requests so they don't need to use websockets.
>
> * Postgres could implement other protocols to serve up data like
> status queries or monitoring metrics, using HTTP based standard
> protocols instead of using our own protocol.

I don't think we should be trying to serve anything HTTP-like, even
with a ten-foot pole, on a port that we serve the PostgreSQL wire
protocol on.

If someone wants to multiplex the PostgreSQL wire protocol on the same
port that serves HTTPS traffic, they're welcome to do so with their
own proxy, but I'd rather we keep the PostgreSQL server's socket
handling fundamentaly incapable of servicng protocols primarily used
in web browsers on the same socket that handles normal psql data
connections.

PostgreSQL may have its own host-based authentication with HBA, but
I'd rather not have to depend on it to filter incoming connections
between valid psql connections and people trying to grab the latest
monitoring statistics at some http endpoint - I'd rather use my trusty
firewall that can already limit access to specific ports very
efficiently without causing undue load on the database server.

Matthias van de Meent
Neon (https://neon.tech)




Re: Experiments with Postgres and SSL

2024-02-21 Thread Matthias van de Meent
On Wed, 10 Jan 2024 at 09:31, Heikki Linnakangas  wrote:
>
> Some more comments on this:
>
> 1. It feels weird that the combination of "gssencmode=require
> sslnegotiation=direct" combination is forbidden. Sure, the ssl
> negotiation will never happen with gssencmode=require, so the
> sslnegotiation option has no effect. But by that token, should we also
> forbid the combination "sslmode=disable sslnegotiation=direct"? I think
> not. The sslnegotiation option should mean "if we are going to try SSL,
> should we try it in direct or negotiated mode?"

I'm not sure about this either. The 'gssencmode' option is already
quite weird in that it seems to override the "require"d priority of
"sslmode=require", which it IMO really shouldn't.

> 2. Should we allow direct SSL only at the very beginning of a TCP
> connection, or should we also allow it after we have requested GSS and
> the server said no? Like this:
>
> Client: GSSENCRequest
> Server: 'N' (gss not supported)
> Client: TLS client Hello
>
> On one hand, why not? It saves you a round-trip in this case too. If we
> don't allow it, the client will have to send SSLRequest and wait for
> response, or reconnect to try direct SSL. On the other hand, flexibility
> is not necessarily a good thing in security-critical code like this.

I think this should be "no".
Once we start accepting PostgreSQL protocol packets (such as the
GSSENCRequest packet) I don't think we should start treating data
stream corruption as attempted SSL connections.

> The patch set is confused on whether that's allowed or not. The server
> rejects it. But if you use "gssencmode=prefer
> sslnegotiation=requiredrect", libpq will attempt to do it, and fail.

That should then be detected as an incorrect combination of flags in
psql: you can't have direct-to-ssl and put something in front of it.

> 3. With "sslmode=verify-full sslnegotiation=direct", if the direct SSL
> connection fails because of a problem with the certificate, libpq will
> try again in negotiated SSL mode. That seems pointless. If the server
> responded to the direct TLS Client Hello message with a valid
> ServerHello, that indicates that the server supports direct SSL. If
> anything goes wrong after that, retrying in negotiated mode is not going
> to help.

This makes sense.

> 4. The number of combinations of sslmode, gssencmode and sslnegotiation
> settings is scary. And we have very few tests for them.

Yeah, it's not great. We could easily automate this better though. I
mean, can't we run the tests using a "cube" configuration, i.e. test
every combination of parameters? We would use a mapping function of
(psql connection parameter values -> expectations), which would be
along the lines of the attached pl testfile. I feel it's a bit more
approachable than the lists of manual option configurations, and makes
it a bit easier to program the logic of which connection security
option we should have used to connect.
The attached file would be a drop-in replacement; it's tested to work
with SSL only - without GSS - because I've been having issues getting
GSS working on my machine.

> I'm going to put this down for now. The attached patch set is even more
> raw than v6, but I'm including it here to "save the work".

v6 doesn't apply cleanly anymore after 774bcffe, but here are some notes:

Several patches are still very much WIP. Reviewing them on a
patch-by-patch basis is therefore nigh impossible; the specific
reviews below are thus on changes that could be traced back to a
specific patch. A round of cleanup would be appreciated.

> 0003: Direct SSL connections postmaster support
> [...]
> -extern bool pq_buffer_has_data(void);
> +extern size_t pq_buffer_has_data(void);

This should probably be renamed to pg_buffer_remaining_data or such,
if we change the signature like this.

> +/* Read from the "unread" buffered data first. c.f. libpq-be.h */
> +if (port->raw_buf_remaining > 0)
> +{
> +/* consume up to len bytes from the raw_buf */
> +if (len > port->raw_buf_remaining)
> +len = port->raw_buf_remaining;

Shouldn't we also try to read from the socket, instead of only
consuming bytes from the raw buffer if it contains bytes?

> 0008: Allow pipelining data after ssl request
> +/*
> + * At this point we should have no data already buffered.  If we 
> do,
> + * it was received before we performed the SSL handshake, so it 
> wasn't
> + * encrypted and indeed may have been injected by a 
> man-in-the-middle.
> + * We report this case to the client.
> + */
> +if (port->raw_buf_remaining > 0)
> +ereport(FATAL,
> +(errcode(ERRCODE_PROTOCOL_VIOLATION),
> + errmsg("received unencrypted data after SSL 
> request"),
> + errdetail("This could be either a client-software 
> bug or evidence of an attempted man-in-the-middle attack.")));

Re: Experiments with Postgres and SSL

2024-02-22 Thread Heikki Linnakangas

On 22/02/2024 01:43, Matthias van de Meent wrote:

On Wed, 10 Jan 2024 at 09:31, Heikki Linnakangas  wrote:

4. The number of combinations of sslmode, gssencmode and sslnegotiation
settings is scary. And we have very few tests for them.


Yeah, it's not great. We could easily automate this better though. I
mean, can't we run the tests using a "cube" configuration, i.e. test
every combination of parameters? We would use a mapping function of
(psql connection parameter values -> expectations), which would be
along the lines of the attached pl testfile. I feel it's a bit more
approachable than the lists of manual option configurations, and makes
it a bit easier to program the logic of which connection security
option we should have used to connect.
The attached file would be a drop-in replacement; it's tested to work
with SSL only - without GSS - because I've been having issues getting
GSS working on my machine.


+1 testing all combinations. I don't think the 'mapper' function 
approach in your version is much better than the original though. Maybe 
it would be better with just one 'mapper' function that contains all the 
rules, along the lines of: (This isn't valid perl, just pseudo-code)


sub expected_outcome
{
my ($user, $sslmode, $negotiation, $gssmode) = @_;

my @possible_outcomes = { 'plain', 'ssl', 'gss' }

delete $possible_outcomes{'plain'} if $sslmode eq 'require';
delete $possible_outcomes{'ssl'} if $sslmode eq 'disable';

delete $possible_outcomes{'plain'} if $user eq 'ssluser';
delete $possible_outcomes{'plain'} if $user eq 'ssluser';

if $sslmode eq 'allow' {
# move 'plain' before 'ssl' in the list
}
if $sslmode eq 'prefer' {
# move 'ssl' before 'plain' in the list
}

# more rules here


# If there are no outcomes left in $possible_outcomes, return 'fail'
# If there's exactly one outcome left, return that.
# If there's more, return the first one.
}


Or maybe a table that lists all the combinations and the expected 
outcome. Something lieke this:


nossluser   nogssuser   ssluser gssuser 
sslmode=require fail...
sslmode=prefer  plain
sslmode=disable plain


The problem is that there are more than two dimensions. So maybe an 
exhaustive list like this:


usersslmode gssmode outcome

nossluser   require disable fail
nossluser   prefer  disable plain
nossluser   disable disable plain
ssluser require disable ssl
...


I'm just throwing around ideas here, can you experiment with different 
approaches and see what looks best?



ALPN


Does the TLS ALPN spec allow protocol versions in the protocol tag? It
would be very useful to detect clients with new capabilities at the
first connection, rather than having to wait for one round trip, and
would allow one avenue for changing the protocol version.


Looking at the list of registered ALPN tags [0], I can see "http/0.9"; 
"http/1.0" and "http/1.1". I think we'd want to changing the major 
protocol version in a way that would introduce a new roundtrip, though.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Experiments with Postgres and SSL

2024-02-28 Thread Matthias van de Meent
On Thu, 22 Feb 2024 at 18:02, Heikki Linnakangas  wrote:
>
> On 22/02/2024 01:43, Matthias van de Meent wrote:
>> On Wed, 10 Jan 2024 at 09:31, Heikki Linnakangas  wrote:
>>> 4. The number of combinations of sslmode, gssencmode and sslnegotiation
>>> settings is scary. And we have very few tests for them.
>>
>> Yeah, it's not great. We could easily automate this better though. I
>> mean, can't we run the tests using a "cube" configuration, i.e. test
>> every combination of parameters? We would use a mapping function of
>> (psql connection parameter values -> expectations), which would be
>> along the lines of the attached pl testfile. I feel it's a bit more
>> approachable than the lists of manual option configurations, and makes
>> it a bit easier to program the logic of which connection security
>> option we should have used to connect.
>> The attached file would be a drop-in replacement; it's tested to work
>> with SSL only - without GSS - because I've been having issues getting
>> GSS working on my machine.
>
> +1 testing all combinations. I don't think the 'mapper' function
> approach in your version is much better than the original though. Maybe
> it would be better with just one 'mapper' function that contains all the
> rules, along the lines of: (This isn't valid perl, just pseudo-code)
>
> sub expected_outcome
> {
[...]
> }
>
> Or maybe a table that lists all the combinations and the expected
> outcome. Something lieke this:
[...]
>
> The problem is that there are more than two dimensions. So maybe an
> exhaustive list like this:
>
> usersslmode gssmode outcome
>
> nossluser   require disable fail
> ...

> I'm just throwing around ideas here, can you experiment with different
> approaches and see what looks best?

One issue with exhaustive tables is that they would require a product
of all options to be listed, and that'd require at least 216 rows to
manage: server_ssl 2 * server_gss 2 * users 3 * client_ssl 4 *
client_gss 3 * client_ssldirect 3 = 216 different states. I think the
expected_autcome version is easier in that regard.

Attached an updated version using a single unified connection type
validator using an approach similar to yours. Note that it does fail 8
tests, all of which are attributed to the current handling of
`sslmode=require gssencmode=prefer`: right now, we allow GSS in that
case, even though the user require-d sslmode.

An alternative check that does pass tests with the code of the patch
is commented out, at lines 209-216.

>>> ALPN
>>
>> Does the TLS ALPN spec allow protocol versions in the protocol tag? It
>> would be very useful to detect clients with new capabilities at the
>> first connection, rather than having to wait for one round trip, and
>> would allow one avenue for changing the protocol version.
>
> Looking at the list of registered ALPN tags [0], I can see "http/0.9";
> "http/1.0" and "http/1.1".

Ah, nice.

> I think we'd want to changing the major
> protocol version in a way that would introduce a new roundtrip, though.

I don't think I understand what you meant here, could you correct the
sentence or expand why we want to do that?
Note that with ALPN you could negotiate postgres/3.0 or postgres/4.0
during the handshake, which could save round-trips.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)


001_negotiate_encryption.pl
Description: Binary data


Re: Experiments with Postgres and SSL

2024-02-28 Thread Heikki Linnakangas

On 28/02/2024 14:00, Matthias van de Meent wrote:

I don't think I understand what you meant here, could you correct the
sentence or expand why we want to do that?
Note that with ALPN you could negotiate postgres/3.0 or postgres/4.0
during the handshake, which could save round-trips.


Sorry, I missed "avoid" there. I meant:

I think we'd want to *avoid* changing the major protocol version in a 
way that would introduce a new roundtrip, though.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Experiments with Postgres and SSL

2024-03-01 Thread Jacob Champion
On Wed, Feb 28, 2024 at 4:10 AM Heikki Linnakangas  wrote:
> I think we'd want to *avoid* changing the major protocol version in a
> way that would introduce a new roundtrip, though.

I'm starting to get up to speed with this patchset. So far I'm mostly
testing how it works; I have yet to take an in-depth look at the
implementation.

I'll squint more closely at the MITM-protection changes in 0008 later.
First impressions, though: it looks like that code has gotten much
less straightforward, which I think is dangerous given the attack it's
preventing. (Off-topic: I'm skeptical of future 0-RTT support. Our
protocol doesn't seem particularly replay-safe to me.)

If we're interested in ALPN negotiation in the future, we may also
want to look at GREASE [1] to keep those options open in the presence
of third-party implementations. Unfortunately OpenSSL doesn't do this
automatically yet.

If we don't have a reason not to, it'd be good to follow the strictest
recommendations from [2] to avoid cross-protocol attacks. (For anyone
currently running web servers and Postgres on the same host, they
really don't want browsers "talking" to their Postgres servers.) That
would mean checking the negotiated ALPN on both the server and client
side, and failing if it's not what we expect.

I'm not excited about the proliferation of connection options. I don't
have a lot of ideas on how to fix it, though, other than to note that
the current sslnegotiation option names are very unintuitive to me:
- "postgres": only legacy handshakes
- "direct": might be direct... or maybe legacy
- "requiredirect": only direct handshakes... unless other options are
enabled and then we fall back again to legacy? How many people willing
to break TLS compatibility with old servers via "requiredirect" are
going to be okay with lazy fallback to GSS or otherwise?

Heikki mentioned possibly hard-coding a TLS alert if direct SSL is
attempted without server TLS support. I think that's a cool idea, but
without an official "TLS not supported" alert code (which, honestly,
would be strange to standardize) I'm kinda -0.5 on it. If the client
tells me about a handshake_failure or similar, I'm going to start
investigating protocol versions and ciphersuites; I'm not going to
think to myself that maybe the server lacks TLS support altogether.
(Plus, we need to have a good error message when connecting to older
servers anyway. I think we should be able to key off of the EOF coming
back from OpenSSL; it'd be a good excuse to give that part of the code
some love.)

For the record, I'm adding some one-off tests for this feature to a
local copy of my OAuth pytest suite, which is designed to do the kinds
of testing you're running into trouble with. It's not in any way
viable for a PG17 commit, but if you're interested I can make the
patches available.

--Jacob

[1] https://www.rfc-editor.org/rfc/rfc8701.html
[2] https://alpaca-attack.com/libs.html




Re: Experiments with Postgres and SSL

2024-03-04 Thread Heikki Linnakangas

On 01/03/2024 23:49, Jacob Champion wrote:

On Wed, Feb 28, 2024 at 4:10 AM Heikki Linnakangas  wrote:

I think we'd want to *avoid* changing the major protocol version in a
way that would introduce a new roundtrip, though.


I'm starting to get up to speed with this patchset. So far I'm mostly
testing how it works; I have yet to take an in-depth look at the
implementation.


Thank you!


I'll squint more closely at the MITM-protection changes in 0008 later.
First impressions, though: it looks like that code has gotten much
less straightforward, which I think is dangerous given the attack it's
preventing. (Off-topic: I'm skeptical of future 0-RTT support. Our
protocol doesn't seem particularly replay-safe to me.)


Let's drop that patch. AFAICS it's not needed by the rest of the patches.


If we're interested in ALPN negotiation in the future, we may also
want to look at GREASE [1] to keep those options open in the presence
of third-party implementations. Unfortunately OpenSSL doesn't do this
automatically yet.


Can you elaborate? Do we need to do something extra in the server to be 
compatible with GREASE?



If we don't have a reason not to, it'd be good to follow the strictest
recommendations from [2] to avoid cross-protocol attacks. (For anyone
currently running web servers and Postgres on the same host, they
really don't want browsers "talking" to their Postgres servers.) That
would mean checking the negotiated ALPN on both the server and client
side, and failing if it's not what we expect.


Hmm, I thought that's what the patches does. But looking closer, libpq 
is not checking that ALPN was used. We should add that. Am I right?



I'm not excited about the proliferation of connection options. I don't
have a lot of ideas on how to fix it, though, other than to note that
the current sslnegotiation option names are very unintuitive to me:
- "postgres": only legacy handshakes
- "direct": might be direct... or maybe legacy
- "requiredirect": only direct handshakes... unless other options are
enabled and then we fall back again to legacy? How many people willing
to break TLS compatibility with old servers via "requiredirect" are
going to be okay with lazy fallback to GSS or otherwise?


Yeah, this is my biggest complaint about all this. Not so much the names 
of the options, but the number of combinations of different options, and 
how we're going to test them all. I don't have any great solutions, 
except adding a lot of tests to cover them, like Matthias did.



Heikki mentioned possibly hard-coding a TLS alert if direct SSL is
attempted without server TLS support. I think that's a cool idea, but
without an official "TLS not supported" alert code (which, honestly,
would be strange to standardize) I'm kinda -0.5 on it. If the client
tells me about a handshake_failure or similar, I'm going to start
investigating protocol versions and ciphersuites; I'm not going to
think to myself that maybe the server lacks TLS support altogether.


Agreed.


(Plus, we need to have a good error message when connecting to older
servers anyway.I think we should be able to key off of the EOF coming
back from OpenSSL; it'd be a good excuse to give that part of the code
some love.)


Hmm, if OpenSSL sends ClientHello and the server responds with a 
Postgres error packet, OpenSSL will presumably consume the error packet 
or at least part of it. But with our custom BIO, we can peek at the 
server response before handing it to OpenSSL.


If it helps, we could backport a nicer error message to old server 
versions, similar to what we did with SCRAM in commit 96d0f988b1.



For the record, I'm adding some one-off tests for this feature to a
local copy of my OAuth pytest suite, which is designed to do the kinds
of testing you're running into trouble with. It's not in any way
viable for a PG17 commit, but if you're interested I can make the
patches available.


Yes please, it would be nice to see what tests you've performed, and 
have it archived.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Experiments with Postgres and SSL

2024-03-05 Thread Heikki Linnakangas
I hope I didn't joggle your elbow reviewing this, Jacob, but I spent 
some time rebase and fix various little things:


- Incorporated Matthias's test changes

- Squashed the client, server and documentation patches. Not much point 
in keeping them separate, as one requires the other, and if you're only 
interested e.g. in the server parts, just look at src/backend.


- Squashed some of my refactorings with the main patches, because I'm 
certain enough that they're desirable. I kept the last libpq state 
machine refactoring separate though. I'm pretty sure we need a 
refactoring like that, but I'm not 100% sure about the details.


- Added some comments to the new state machine logic in fe-connect.c.

- Removed the XXX comments about TLS alerts.

- Removed the "Allow pipelining data after ssl request" patch

- Reordered the patches so that the first two patches add the tests 
different combinations of sslmode, gssencmode and server support. That 
could be committed separately, without the rest of the patches. A later 
patch expands the tests for the new sslnegotiation option.



The tests are still not distinguishing whether a connection was 
established in direct or negotiated mode. So if we e.g. had a bug that 
accidentally disabled direct SSL connection completely and always used 
negotiated mode, the tests would still pass. I'd like to see some tests 
that would catch that.


--
Heikki Linnakangas
Neon (https://neon.tech)
From c3b88ffb05a2a8b50e1af3220bf8f524e8bbae46 Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Tue, 5 Mar 2024 15:40:30 +0200
Subject: [PATCH v8 1/6] Move Kerberos module

So that we can reuse it in new tests.
---
 src/test/kerberos/t/001_auth.pl   | 174 ++--
 src/test/perl/PostgreSQL/Test/Kerberos.pm | 229 ++
 2 files changed, 240 insertions(+), 163 deletions(-)
 create mode 100644 src/test/perl/PostgreSQL/Test/Kerberos.pm

diff --git a/src/test/kerberos/t/001_auth.pl b/src/test/kerberos/t/001_auth.pl
index 2a81ce8834b..9d3fac83aaa 100644
--- a/src/test/kerberos/t/001_auth.pl
+++ b/src/test/kerberos/t/001_auth.pl
@@ -21,6 +21,7 @@ use strict;
 use warnings FATAL => 'all';
 use PostgreSQL::Test::Utils;
 use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Kerberos;
 use Test::More;
 use Time::HiRes qw(usleep);
 
@@ -34,177 +35,27 @@ elsif (!$ENV{PG_TEST_EXTRA} || $ENV{PG_TEST_EXTRA} !~ /\bkerberos\b/)
 	  'Potentially unsafe test GSSAPI/Kerberos not enabled in PG_TEST_EXTRA';
 }
 
-my ($krb5_bin_dir, $krb5_sbin_dir);
-
-if ($^O eq 'darwin' && -d "/opt/homebrew")
-{
-	# typical paths for Homebrew on ARM
-	$krb5_bin_dir = '/opt/homebrew/opt/krb5/bin';
-	$krb5_sbin_dir = '/opt/homebrew/opt/krb5/sbin';
-}
-elsif ($^O eq 'darwin')
-{
-	# typical paths for Homebrew on Intel
-	$krb5_bin_dir = '/usr/local/opt/krb5/bin';
-	$krb5_sbin_dir = '/usr/local/opt/krb5/sbin';
-}
-elsif ($^O eq 'freebsd')
-{
-	$krb5_bin_dir = '/usr/local/bin';
-	$krb5_sbin_dir = '/usr/local/sbin';
-}
-elsif ($^O eq 'linux')
-{
-	$krb5_sbin_dir = '/usr/sbin';
-}
-
-my $krb5_config = 'krb5-config';
-my $kinit = 'kinit';
-my $klist = 'klist';
-my $kdb5_util = 'kdb5_util';
-my $kadmin_local = 'kadmin.local';
-my $krb5kdc = 'krb5kdc';
-
-if ($krb5_bin_dir && -d $krb5_bin_dir)
-{
-	$krb5_config = $krb5_bin_dir . '/' . $krb5_config;
-	$kinit = $krb5_bin_dir . '/' . $kinit;
-	$klist = $krb5_bin_dir . '/' . $klist;
-}
-if ($krb5_sbin_dir && -d $krb5_sbin_dir)
-{
-	$kdb5_util = $krb5_sbin_dir . '/' . $kdb5_util;
-	$kadmin_local = $krb5_sbin_dir . '/' . $kadmin_local;
-	$krb5kdc = $krb5_sbin_dir . '/' . $krb5kdc;
-}
-
-my $host = 'auth-test-localhost.postgresql.example.com';
-my $hostaddr = '127.0.0.1';
-my $realm = 'EXAMPLE.COM';
-
-my $krb5_conf = "${PostgreSQL::Test::Utils::tmp_check}/krb5.conf";
-my $kdc_conf = "${PostgreSQL::Test::Utils::tmp_check}/kdc.conf";
-my $krb5_cache = "${PostgreSQL::Test::Utils::tmp_check}/krb5cc";
-my $krb5_log = "${PostgreSQL::Test::Utils::log_path}/krb5libs.log";
-my $kdc_log = "${PostgreSQL::Test::Utils::log_path}/krb5kdc.log";
-my $kdc_port = PostgreSQL::Test::Cluster::get_free_port();
-my $kdc_datadir = "${PostgreSQL::Test::Utils::tmp_check}/krb5kdc";
-my $kdc_pidfile = "${PostgreSQL::Test::Utils::tmp_check}/krb5kdc.pid";
-my $keytab = "${PostgreSQL::Test::Utils::tmp_check}/krb5.keytab";
-
 my $pgpass = "${PostgreSQL::Test::Utils::tmp_check}/.pgpass";
 
 my $dbname = 'postgres';
 my $username = 'test1';
 my $application = '001_auth.pl';
 
-note "setting up Kerberos";
-
-my ($stdout, $krb5_version);
-run_log [ $krb5_config, '--version' ], '>', \$stdout
-  or BAIL_OUT("could not execute krb5-config");
-BAIL_OUT("Heimdal is not supported") if $stdout =~ m/heimdal/;
-$stdout =~ m/Kerberos 5 release ([0-9]+\.[0-9]+)/
-  or BAIL_OUT("could not get Kerberos version");
-$krb5_version = $1;
-
 # Construct a pgpass file to make sure we don't use it
 append_to_file($pgpass, '*:*:*:*:abc123');
 
 chmod 0600, $pgpass;
 
-# Build the krb5.conf to use.
-#
-# Ex

Re: Experiments with Postgres and SSL

2024-03-05 Thread Jacob Champion
On Tue, Mar 5, 2024 at 6:09 AM Heikki Linnakangas  wrote:
>
> I hope I didn't joggle your elbow reviewing this

Nope, not at all!

> The tests are still not distinguishing whether a connection was
> established in direct or negotiated mode. So if we e.g. had a bug that
> accidentally disabled direct SSL connection completely and always used
> negotiated mode, the tests would still pass. I'd like to see some tests
> that would catch that.

+1

On Mon, Mar 4, 2024 at 7:29 AM Heikki Linnakangas  wrote:
> On 01/03/2024 23:49, Jacob Champion wrote:
> > I'll squint more closely at the MITM-protection changes in 0008 later.
> > First impressions, though: it looks like that code has gotten much
> > less straightforward, which I think is dangerous given the attack it's
> > preventing. (Off-topic: I'm skeptical of future 0-RTT support. Our
> > protocol doesn't seem particularly replay-safe to me.)
>
> Let's drop that patch. AFAICS it's not needed by the rest of the patches.

Okay, sounds good.

> > If we're interested in ALPN negotiation in the future, we may also
> > want to look at GREASE [1] to keep those options open in the presence
> > of third-party implementations. Unfortunately OpenSSL doesn't do this
> > automatically yet.
>
> Can you elaborate?

Sure: now that we're letting middleboxes and proxies inspect and react
to connections based on ALPN, it's possible that some intermediary
might incorrectly fixate on the "postgres" ID (or whatever we choose
in the end), and shut down connections that carry additional protocols
rather than ignoring them. That would prevent future graceful upgrades
where the client sends both "postgres/X" and "postgres/X+1". While
that wouldn't be our fault, it'd be cold comfort to whoever has that
middlebox.

GREASE is a set of reserved protocol IDs that you can add randomly to
your ALPN list, so any middleboxes that fail to follow the rules will
just break outright rather than silently proliferating. (Hence the
pun: GREASE keeps the joints in the pipe from rusting into place.) The
RFC goes into more detail about how to do it. And I don't know if it's
necessary for a v1, but it'd be something to keep in mind.

> Do we need to do something extra in the server to be
> compatible with GREASE?

No, I think that as long as we use OpenSSL's APIs correctly on the
server side, we'll be compatible by default. This would be a
client-side implementation, to push random GREASE strings into the
ALPN list. (There is a risk that if/when OpenSSL finally starts
supporting this transparently, we'd need to remove it from our code.)

> > If we don't have a reason not to, it'd be good to follow the strictest
> > recommendations from [2] to avoid cross-protocol attacks. (For anyone
> > currently running web servers and Postgres on the same host, they
> > really don't want browsers "talking" to their Postgres servers.) That
> > would mean checking the negotiated ALPN on both the server and client
> > side, and failing if it's not what we expect.
>
> Hmm, I thought that's what the patches does. But looking closer, libpq
> is not checking that ALPN was used. We should add that. Am I right?

Right. Also, it looks like the server isn't failing the TLS handshake
itself, but instead just dropping the connection after the handshake.
In a cross-protocol attack, there's a danger that the client (which is
not speaking our protocol) could still treat the server as
authoritative in that situation.

> > I'm not excited about the proliferation of connection options. I don't
> > have a lot of ideas on how to fix it, though, other than to note that
> > the current sslnegotiation option names are very unintuitive to me:
> > - "postgres": only legacy handshakes
> > - "direct": might be direct... or maybe legacy
> > - "requiredirect": only direct handshakes... unless other options are
> > enabled and then we fall back again to legacy? How many people willing
> > to break TLS compatibility with old servers via "requiredirect" are
> > going to be okay with lazy fallback to GSS or otherwise?
>
> Yeah, this is my biggest complaint about all this. Not so much the names
> of the options, but the number of combinations of different options, and
> how we're going to test them all. I don't have any great solutions,
> except adding a lot of tests to cover them, like Matthias did.

The default gssencmode=prefer is especially problematic if I'm trying
to use sslnegotiation=requiredirect for security. It'll appear to work
at first, but if somehow I get a credential cache into my environment,
libpq will suddenly fall back to plaintext negotiation :(

> > (Plus, we need to have a good error message when connecting to older
> > servers anyway.I think we should be able to key off of the EOF coming
> > back from OpenSSL; it'd be a good excuse to give that part of the code
> > some love.)
>
> Hmm, if OpenSSL sends ClientHello and the server responds with a
> Postgres error packet, OpenSSL will presumably consume the error packet
> or at least

Re: Experiments with Postgres and SSL

2024-03-05 Thread Jacob Champion
I keep forgetting -- attached is the diff I'm carrying to plug
libpq_encryption into Meson. (The current patchset has a meson.build
for it, but it's not connected.)

--Jacob
commit 64215f1e6b68208378b34cc0736d2f0eb1d45919
Author: Jacob Champion 
Date:   Wed Feb 28 11:28:17 2024 -0800

WIP: mesonify

diff --git a/src/test/libpq_encryption/meson.build 
b/src/test/libpq_encryption/meson.build
index 04f479e9fe..ac1db10d74 100644
--- a/src/test/libpq_encryption/meson.build
+++ b/src/test/libpq_encryption/meson.build
@@ -1,13 +1,12 @@
 # Copyright (c) 2022-2024, PostgreSQL Global Development Group
 
 tests += {
-  'name': 'ldap',
+  'name': 'libpq_encryption',
   'sd': meson.current_source_dir(),
   'bd': meson.current_build_dir(),
   'tap': {
 'tests': [
-  't/001_auth.pl',
-  't/002_bindpasswd.pl',
+  't/001_negotiate_encryption.pl',
 ],
 'env': {
   'with_ssl': ssl_library,
diff --git a/src/test/meson.build b/src/test/meson.build
index c3d0dfedf1..702213bc6f 100644
--- a/src/test/meson.build
+++ b/src/test/meson.build
@@ -4,6 +4,7 @@ subdir('regress')
 subdir('isolation')
 
 subdir('authentication')
+subdir('libpq_encryption')
 subdir('recovery')
 subdir('subscription')
 subdir('modules')


Re: Experiments with Postgres and SSL

2023-01-18 Thread Andrey Borodin
On Wed, Jan 18, 2023 at 7:16 PM Greg Stark  wrote:
>
> So I took a look into what it would take to do and I think it would
> actually be quite feasible. The first byte of a standard TLS
> connection can't look anything like the first byte of any flavour of
> Postgres startup packet because it would be the high order bits of the
> length so unless we start having multi-megabyte startup packets
>

This is a fascinating idea! I like it a lot.
But..do we have to treat any unknown start sequence of bytes as a TLS
connection? Or is there some definite subset of possible first bytes
that clearly indicates that this is a TLS connection or not?

Best regards, Andrey Borodin.




Re: Experiments with Postgres and SSL

2023-01-19 Thread Greg Stark
On Thu, 19 Jan 2023 at 00:45, Andrey Borodin  wrote:

> But..do we have to treat any unknown start sequence of bytes as a TLS
> connection? Or is there some definite subset of possible first bytes
> that clearly indicates that this is a TLS connection or not?

Absolutely not, there's only one MessageType that can initiate a
connection, ClientHello, so the initial byte has to be a specific
value. (0x16)

And probably to implement HTTP/Websocket it would probably only peek
at the first byte and check for things like G(ET) and H(EAD) and so
on, possibly only over SSL but in theory it could be over any
connection if the request comes before the startup packet.

Personally I'm motivated by wanting to implement status and monitoring
data for things like Prometheus and the like. For that it would just
be simple GET queries to recognize. But tunneling pg wire protocol
over websockets sounds cool but not really something I know a lot
about. I note that Neon is doing something similar with a proxy:
https://neon.tech/blog/serverless-driver-for-postgres


--
greg




Re: Experiments with Postgres and SSL

2023-01-19 Thread Vladimir Sitnikov
It would be great if PostgreSQL supported 'start with TLS', however, how
could clients activate the feature?

I would like to refrain users from configuring the handshake mode, and I
would like to refrain from degrading performance when a new client talks to
an old database.

What if the server that supports 'fast TLS' added an extra notification in
case client connects with a classic TLS?
Then a capable client could remember host:port and try with newer TLS
appoach the next time it connects.

It would be transparent to the clients, and the users won't need to
configure 'prefer classic or fast TLS'
The old clients could discard the notification.

Vladimir

-- 
Vladimir


Re: Experiments with Postgres and SSL

2023-01-19 Thread Greg Stark
On Thu, 19 Jan 2023 at 15:49, Vladimir Sitnikov
 wrote:
>
> What if the server that supports 'fast TLS' added an extra notification in 
> case client connects with a classic TLS?
> Then a capable client could remember host:port and try with newer TLS appoach 
> the next time it connects.
>
> It would be transparent to the clients, and the users won't need to configure 
> 'prefer classic or fast TLS'
> The old clients could discard the notification.

Hm. I hadn't really thought about the case of a new client connecting
to an old server. I don't think it's worth implementing a code path in
the server like this as it would then become cruft that would be hard
to ever get rid of.

I think you can do the same thing, more or less, in the client. Like
if the driver tries to connect via SSL and gets an error it remembers
that host/port and connects using negotiation in the future.

In practice though, by the time drivers support this it'll probably be
far enough in the future that they can just enable it and you can
disable it if you're connecting to an old server. The main benefit for
the near term is going to be clients that are specifically designed to
take advantage of it because it's necessary to enable the environment
they need -- like monitoring tools and proxies.

I've attached the POC. It's not near committable, mainly because of
the lack of any proper interface to the added fields in Port. I
actually had a whole API but ripped it out while debugging because it
wasn't working out.

But here's an example of psql connecting to the same server via
negotiated SSL or through stunnel where stunnel establishes the SSL
connection and psql is just doing plain text:

stark@hatter:~/src/postgresql$ ~/pgsql-sslhacked/bin/psql
'postgresql://localhost:9432/postgres'
psql (16devel)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384,
compression: off)
Type "help" for help.

postgres=# select * from pg_stat_ssl;
  pid  | ssl | version | cipher | bits | client_dn |
client_serial | issuer_dn
---+-+-++--+---+---+---
 48771 | t   | TLSv1.3 | TLS_AES_256_GCM_SHA384 |  256 |   |
|
(1 row)

postgres=# \q
stark@hatter:~/src/postgresql$ ~/pgsql-sslhacked/bin/psql
'postgresql://localhost:8999/postgres'
psql (16devel)
Type "help" for help.

postgres=# select * from pg_stat_ssl;
  pid  | ssl | version | cipher | bits | client_dn |
client_serial | issuer_dn
---+-+-++--+---+---+---
 48797 | t   | TLSv1.3 | TLS_AES_256_GCM_SHA384 |  256 |   |
|
(1 row)


-- 
greg
From 4508f872720a0977cf00041a865d76a4d5f77028 Mon Sep 17 00:00:00 2001
From: Greg Stark 
Date: Wed, 18 Jan 2023 15:34:34 -0500
Subject: [PATCH v1] Direct SSL Connections

---
 src/backend/libpq/be-secure.c   |  13 +++
 src/backend/libpq/pqcomm.c  |   9 +-
 src/backend/postmaster/postmaster.c | 140 ++--
 src/include/libpq/libpq-be.h|   3 +
 src/include/libpq/libpq.h   |   2 +-
 5 files changed, 134 insertions(+), 33 deletions(-)

diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index a0f7084018..39366d04dd 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -235,6 +235,19 @@ secure_raw_read(Port *port, void *ptr, size_t len)
 {
 	ssize_t		n;
 
+	/* XXX feed the raw_buf into SSL */
+	if (port->raw_buf_remaining > 0)
+	{
+		/* consume up to len bytes from the raw_buf */
+		if (len > port->raw_buf_remaining)
+			len = port->raw_buf_remaining;
+		Assert(port->raw_buf);
+		memcpy(ptr, port->raw_buf + port->raw_buf_consumed, len);
+		port->raw_buf_consumed += len;
+		port->raw_buf_remaining -= len;
+		return len;
+	}
+
 	/*
 	 * Try to read from the socket without blocking. If it succeeds we're
 	 * done, otherwise we'll wait for the socket using the latch mechanism.
diff --git a/src/backend/libpq/pqcomm.c b/src/backend/libpq/pqcomm.c
index 864c9debe8..60fab6a52b 100644
--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -1119,13 +1119,16 @@ pq_discardbytes(size_t len)
 /* 
  *		pq_buffer_has_data		- is any buffered data available to read?
  *
- * This will *not* attempt to read more data.
+ * Actually returns the number of bytes in the buffer...
+ *
+ * This will *not* attempt to read more data. And reading up to that number of
+ * bytes should not cause reading any more data either.
  * 
  */
-bool
+size_t
 pq_buffer_has_data(void)
 {
-	return (PqRecvPointer < PqRecvLength);
+	return (PqRecvLength - PqRecvPointer);
 }
 
 
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 9cedc1b9f0..b1631e0830 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -412,6 +412,7 @@ static void BackendRun(Por

Re: Experiments with Postgres and SSL

2023-01-19 Thread Jacob Champion
On Wed, Jan 18, 2023 at 7:16 PM Greg Stark  wrote:
> I had a conversation a while back with Heikki where he expressed that
> it was annoying that we negotiate SSL/TLS the way we do since it
> introduces an extra round trip. Aside from the performance
> optimization I think accepting standard TLS connections would open the
> door to a number of other opportunities that would be worth it on
> their own.

Nice! I want this too, but for security reasons [1] -- I want to be
able to turn off negotiated (explicit) TLS, to force (implicit)
TLS-only mode.

> Other things it would open the door to in order from least
> controversial to most
>
> * Hiding Postgres behind a standard SSL proxy terminating SSL without
> implementing the Postgres protocol.

+1

> * "Service Mesh" type tools that hide multiple services behind a
> single host/port ("Service Mesh" is just a new buzzword for "proxy").

If you want to multiplex protocols on a port, now is an excellent time
to require clients to use ALPN on implicit-TLS connections. (There are
no clients that can currently connect via implicit TLS, so you'll
never have another chance to force the issue without breaking
backwards compatibility.) That should hopefully make it harder to
ALPACA yourself or others [2].

ALPN doesn't prevent cross-port attacks though, and speaking of those...

> * Browser-based protocol implementations using websockets for things
> like pgadmin or other tools to connect directly to postgres using
> Postgres wire protocol but using native SSL implementations.
>
> * Postgres could even implement an HTTP based version of its protocol
> and enable things like queries or browser based tools using straight
> up HTTP requests so they don't need to use websockets.
>
> * Postgres could implement other protocols to serve up data like
> status queries or monitoring metrics, using HTTP based standard
> protocols instead of using our own protocol.

I see big red warning lights going off in my head -- in a previous
life, I got to fix vulnerabilities that resulted from bolting HTTP
onto existing protocol servers. Not only do you opt into the browser
security model forever, you also gain the ability to speak for any
other web server already running on the same host.

(I know you have PG committers who are also HTTP experts, and I think
you were hacking on mod_perl well before I knew web servers existed.
Just... please be careful. ;D )

> Incidentally I find the logic in ProcessStartupPacket incredibly
> confusing. It took me a while before I realized it's using tail
> recursion to implement the startup logic. I think it would be way more
> straightforward and extensible if it used a much more common iterative
> style. I think it would make it possible to keep more state than just
> ssl_done and gss_done without changing the function signature every
> time for example.

+1. The complexity of the startup logic, both client- and server-side,
is a big reason why I want implicit TLS in the first place. That way,
bugs in that code can't be exploited before the TLS handshake
completes.

Thanks!
--Jacob

[1] 
https://www.postgresql.org/message-id/flat/fcc3ebeb7f05775b63f3207ed52a54ea5d17fb42.camel%40vmware.com
[2] https://alpaca-attack.com/




Re: Experiments with Postgres and SSL

2023-01-19 Thread Vladimir Sitnikov
>I don't think it's worth implementing a code path in
> the server like this as it would then become cruft that would be hard
> to ever get rid of.

Do you think the server can de-support the old code path soon?

> I think you can do the same thing, more or less, in the client. Like
> if the driver tries to connect via SSL and gets an error it remembers
> that host/port and connects using negotiation in the future.

Well, I doubt everybody would instantaneously upgrade to the database that
supports fast TLS,
so there will be a timeframe when there will be a lot of old databases, and
the clients will be new.
In that case, going with "try fast, ignore exception" would degrade
performance for old databases.

I see you suggest caching, however, "degrading one of the cases" might be
more painful than
"not improving one of the cases".

I would like to refrain from implementing "parallel connect both ways
and check which is faster" in
PG clients (e.g. https://en.wikipedia.org/wiki/Happy_Eyeballs ).

Just wondering: do you consider back-porting the feature to all supported
DB versions?

> In practice though, by the time drivers support this it'll probably be
> far enough in the future

I think drivers release more often than the database, and we can get driver
support even before the database releases.
I'm from pgjdbc Java driver team, and I think it is unfair to suggest that
"driver support is only far enough in the future".

Vladimir


Re: Experiments with Postgres and SSL

2023-01-20 Thread Greg Stark
On Fri, 20 Jan 2023 at 01:41, Vladimir Sitnikov
 wrote:
>
> Do you think the server can de-support the old code path soon?

I don't have any intention to de-support anything. I really only
picture it being an option in environments where the client and server
are all part of a stack controlled by a single group. User tools and
general purpose tools are better served by our current more flexible
setup.

> Just wondering: do you consider back-porting the feature to all supported DB 
> versions?

I can't see that, no.

> > In practice though, by the time drivers support this it'll probably be
> > far enough in the future
>
> I think drivers release more often than the database, and we can get driver 
> support even before the database releases.
> I'm from pgjdbc Java driver team, and I think it is unfair to suggest that 
> "driver support is only far enough in the future".

Interesting. I didn't realize this would be so attractive to regular
driver authors. I did think of the Happy Eyeballs technique too but I
agree I wouldn't want to go that way either :)

I guess the server doesn't really have to do anything specific to do
what you want. You could just hard code that servers newer than a
specific version would have this support. Or it could be done with a
"protocol option" -- which wouldn't actually change any behaviour but
would be rejected if the server doesn't support "fast ssl" giving you
the feedback you expect without having much extra legacy complexity.

I guess a lot depends on the way the driver works and the way the
application is structured. Applications that make a single connection
or don't have shared state across connections wouldn't think this way.
And interfaces like libpq would normally just leave it up to the
application to make choices like this. But I guess JVM based
applications are more likely to have long-lived systems that make many
connections and also more likely to make it the driver's
responsibility to manage such things.



--
greg




Re: Experiments with Postgres and SSL

2023-01-20 Thread Vladimir Sitnikov
>You could just hard code that servers newer than a
> specific version would have this support

Suppose PostgreSQL 21 implements "fast TLS"
Suppose pgjdbc 43 supports "fast TLS"
Suppose PgBouncer 1.17.0 does not support "fast TLS" yet

If pgjdbc connects to the DB via balancer, then the server would
respond with "server_version=21".
The balancer would forward "server_version", so the driver would
assume "fast TLS is supported".

In practice, fast TLS can't be used in that configuration since the
connection will fail when the driver attempts to ask
"fast TLS" from the PgBouncer.

> Or it could be done with a "protocol option"

Would you please clarify what you mean by "protocol option"?

>I guess a lot depends on the way the driver works and the way the
> application is structured

There are cases when applications pre-create connections on startup,
so the faster connections are created the better.
The same case happens when the admin issues "reset connection pool",
so it discards old connections and creates new ones.
People rarely know all the knobs, so I would like to have a "fast by
default" design (e.g. server sending a notification "you may use fast
mode the next time")
rather than "keep old behaviour and require everybody to add fast=true
to their configuration" (e.g. users having to configure
"try_fast_tls_first=true")

Vladimir




Re: Experiments with Postgres and SSL

2023-12-30 Thread Heikki Linnakangas

On 05/07/2023 02:33, Michael Paquier wrote:

On Tue, Jul 04, 2023 at 05:15:49PM +0300, Heikki Linnakangas wrote:

I don't see the point of the libpq 'sslalpn' option either. Let's send ALPN
always.

Admittedly having the options make testing different of combinations of old
and new clients and servers a little easier. But I don't think we should add
options for the sake of backwards compatibility tests.


Hmm.  I would actually argue in favor of having these with tests in
core to stress the previous SSL hanshake protocol, as not having these
parameters would mean that we rely only on major version upgrades in
the buildfarm to test the backward-compatible code path, making issues
much harder to catch.  And we still need to maintain the
backward-compatible path for 10 years based on what pg_dump and
pg_upgrade need to support.


Ok, let's keep it.

I started to review this again. There's a lot of little things to fix 
before this is ready for commit, but overall this looks pretty good. A 
few notes / questions on the first two patches (in addition to the few 
comments I made earlier):


If the client sends TLS HelloClient directly, but the server does not 
support TLS, it just closes the connection. It would be nice to still 
send some kind of an error to the client. Maybe a TLS alert packet? I 
don't want to start implementing TLS, but I think a TLS alert packet 
with a suitable error code would be just a constant.


The new CONNECTION_DIRECT_SSL_STARTUP state needs to be moved to end of 
the enum. We cannot change the integer values of existing of enum 
values, or clients compiled with old libpq version would mix up the states.



/*
 * validate sslnegotiation option, default is "postgres" for the 
postgres
 * style negotiated connection with an extra round trip but more 
options.
 */


What "more options" does the negotiated connection provide?


if (conn->sslnegotiation)
{
if (strcmp(conn->sslnegotiation, "postgres") != 0
&& strcmp(conn->sslnegotiation, "direct") != 0
&& strcmp(conn->sslnegotiation, "requiredirect") != 0)
{
conn->status = CONNECTION_BAD;
libpq_append_conn_error(conn, "invalid %s value: 
\"%s\"",

"sslnegotiation", conn->sslnegotiation);
return false;
}

#ifndef USE_SSL
if (conn->sslnegotiation[0] != 'p') {
conn->status = CONNECTION_BAD;
libpq_append_conn_error(conn, "sslnegotiation value \"%s\" 
invalid when SSL support is not compiled in",

conn->sslnegotiation);
return false;
}
#endif
}


At the same time, the patch allows the combination of "sslmode=disable" 
and "sslnegotiation=requiredirect". Seems inconsistent to error out if 
compiled without SSL support.



else
{
libpq_append_conn_error(conn, "sslnegotiation missing?");
return false;
}


In the other similar settings, like 'channel_binding' and 'sslcertmode', 
we strdup() the compiled-in default if the option is NULL. I'm not sure 
if that's necessary, I think the compiled-in defaults should get filled 
in conninfo_add_defaults(). If so, then those other places could be 
turned into errors like this too. This seems to be a bit of a mess even 
before this patch.


In pg_conn struct:


+   boolallow_direct_ssl_try; /* Try to make a direct SSL 
connection
+  * without an 
"SSL negotiation packet" */
boolallow_ssl_try;  /* Allowed to try SSL negotiation */
boolwait_ssl_try;   /* Delay SSL negotiation until after
 * attempting 
normal connection */


It's getting hard to follow what combinations of these booleans are 
valid and what they're set to at different stages. I think it's time to 
turn all these into one enum, or something like that.


I intend to continue reviewing this after Jan 8th. I'd still like to get 
this into v17.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Experiments with Postgres and SSL

2023-02-22 Thread Heikki Linnakangas

On 20/01/2023 03:28, Jacob Champion wrote:

On Wed, Jan 18, 2023 at 7:16 PM Greg Stark  wrote:

* "Service Mesh" type tools that hide multiple services behind a
single host/port ("Service Mesh" is just a new buzzword for "proxy").


If you want to multiplex protocols on a port, now is an excellent time
to require clients to use ALPN on implicit-TLS connections. (There are
no clients that can currently connect via implicit TLS, so you'll
never have another chance to force the issue without breaking
backwards compatibility.) That should hopefully make it harder to
ALPACA yourself or others [2].


Good idea. Do we want to just require the protocol to be "postgres", or 
perhaps "postgres/3.0"? Need to register that with IANA, I guess.


We implemented a protocol version negotiation mechanism in the libpq 
protocol itself, how would this interact with it? If it's just 
"postgres", then I guess we'd still negotiate the protocol version and 
list of extensions after the TLS handshake.



Incidentally I find the logic in ProcessStartupPacket incredibly
confusing. It took me a while before I realized it's using tail
recursion to implement the startup logic. I think it would be way more
straightforward and extensible if it used a much more common iterative
style. I think it would make it possible to keep more state than just
ssl_done and gss_done without changing the function signature every
time for example.


+1. The complexity of the startup logic, both client- and server-side,
is a big reason why I want implicit TLS in the first place. That way,
bugs in that code can't be exploited before the TLS handshake
completes.


+1. We need to support explicit TLS for a long time, so we can't 
simplify by just removing it. But let's refactor the code somehow, to 
make it more clear.


Looking at the patch, I think it accepts an SSLRequest packet even if 
implicit TLS has already been established. That's surely wrong, and 
shows how confusing the code is. (Or I'm reading it incorrectly, which 
also shows how confusing it is :-) )


Regarding Vladimir's comments on how clients can migrate to this, I 
don't have any great suggestions. To summarize, there are several options:


- Add an "fast_tls" option that the user can enable if they know the 
server supports it


- First connect in old-fashioned way, and remember the server version. 
Later, if you reconnect to the same server, use implicit TLS if the 
server version was high enough. This would be most useful for connection 
pools.


- Connect both ways at the same time, and continue with the fastest, 
i.e. "happy eyeballs"


- Try implicit TLS first, and fall back to explicit TLS if it fails.

For libpq, we don't necessarily need to do anything right now. We can 
add the implicit TLS support in a later version. Not having libpq 
support makes it hard to test the server codepath, though. Maybe just 
test it with 'stunnel' or 'openssl s_client'.


- Heikki





Re: Experiments with Postgres and SSL

2023-02-22 Thread Jacob Champion
On Wed, Feb 22, 2023 at 4:26 AM Heikki Linnakangas  wrote:
> On 20/01/2023 03:28, Jacob Champion wrote:
> > If you want to multiplex protocols on a port, now is an excellent time
> > to require clients to use ALPN on implicit-TLS connections. (There are
> > no clients that can currently connect via implicit TLS, so you'll
> > never have another chance to force the issue without breaking
> > backwards compatibility.) That should hopefully make it harder to
> > ALPACA yourself or others [2].
>
> Good idea. Do we want to just require the protocol to be "postgres", or
> perhaps "postgres/3.0"? Need to register that with IANA, I guess.

Unless you plan to make the next minor protocol version fundamentally
incompatible, I don't think there's much reason to add '.0'. (And even
if that does happen, 'postgres/3.1' is still distinct from
'postgres/3'. Or 'postgres' for that matter.) The Expert Review
process might provide some additional guidance?

> We implemented a protocol version negotiation mechanism in the libpq
> protocol itself, how would this interact with it? If it's just
> "postgres", then I guess we'd still negotiate the protocol version and
> list of extensions after the TLS handshake.

Yeah. You could choose to replace major version negotiation completely
with ALPN, I suppose, but there might not be any maintenance benefit
if you still have to support plaintext negotiation. Maybe there are
performance implications to handling the negotiation earlier vs.
later?

Note that older versions of TLS will expose the ALPN in plaintext...
but that may not be a factor by the time a postgres/4 shows up, and if
the next protocol is incompatible then it may not be feasible to hide
the differences via transport encryption anyway.

> Regarding Vladimir's comments on how clients can migrate to this, I
> don't have any great suggestions. To summarize, there are several options:
>
> - Add an "fast_tls" option that the user can enable if they know the
> server supports it

I like that such an option could eventually be leveraged for a
postgresqls:// URI scheme (which should not fall back, ever). There
would be other things we'd have to change first to make that a reality
-- postgresqls://example.com?host=evil.local is problematic, for
example -- but it'd be really nice to have an HTTPS equivalent.

--Jacob




Re: Experiments with Postgres and SSL

2023-02-28 Thread Greg Stark
On Wed, 22 Feb 2023 at 07:27, Heikki Linnakangas  wrote:
>
> On 20/01/2023 03:28, Jacob Champion wrote:
> > On Wed, Jan 18, 2023 at 7:16 PM Greg Stark  wrote:
> >> * "Service Mesh" type tools that hide multiple services behind a
> >> single host/port ("Service Mesh" is just a new buzzword for "proxy").
> >
> > If you want to multiplex protocols on a port, now is an excellent time
> > to require clients to use ALPN on implicit-TLS connections. (There are
> > no clients that can currently connect via implicit TLS, so you'll
> > never have another chance to force the issue without breaking
> > backwards compatibility.) That should hopefully make it harder to
> > ALPACA yourself or others [2].
>
> Good idea. Do we want to just require the protocol to be "postgres", or
> perhaps "postgres/3.0"? Need to register that with IANA, I guess.

I had never heard of this before, it does seem useful. But if I
understand it right it's entirely independent of this patch. We can
add it to all our Client/Server exchanges whether they're the initial
direct SSL connection or the STARTTLS negotiation?


> We implemented a protocol version negotiation mechanism in the libpq
> protocol itself, how would this interact with it? If it's just
> "postgres", then I guess we'd still negotiate the protocol version and
> list of extensions after the TLS handshake.
>
> >> Incidentally I find the logic in ProcessStartupPacket incredibly
> >> confusing. It took me a while before I realized it's using tail
> >> recursion to implement the startup logic. I think it would be way more
> >> straightforward and extensible if it used a much more common iterative
> >> style. I think it would make it possible to keep more state than just
> >> ssl_done and gss_done without changing the function signature every
> >> time for example.
> >
> > +1. The complexity of the startup logic, both client- and server-side,
> > is a big reason why I want implicit TLS in the first place. That way,
> > bugs in that code can't be exploited before the TLS handshake
> > completes.
>
> +1. We need to support explicit TLS for a long time, so we can't
> simplify by just removing it. But let's refactor the code somehow, to
> make it more clear.
>
> Looking at the patch, I think it accepts an SSLRequest packet even if
> implicit TLS has already been established. That's surely wrong, and
> shows how confusing the code is. (Or I'm reading it incorrectly, which
> also shows how confusing it is :-) )

I'll double check it but I think I tested that that wasn't the case. I
think it accepts the SSL request packet and sends back an N which the
client libpq just interprets as the server not supporting SSL and does
an unencrypted connection (which is tunneled over stunnel unbeknownst
to libpq).

I agree I would want to flatten this logic to an iterative approach
but having wrapped my head around it now I'm not necessarily rushing
to do it now. The main advantage of flattening it would be to make it
easy to support other protocol types which I think could be really
interesting. It would be much clearer to document the state machine if
all the state is in one place and the code just loops through
processing startup packets and going to a new state until the
connection is established. That's true now but you have to understand
how the state is passed in the function parameters and notice that all
the recursion is tail recursive (I think). And extending that state
would require extending the function signature which would get awkward
quickly.

> Regarding Vladimir's comments on how clients can migrate to this, I
> don't have any great suggestions. To summarize, there are several options:
>
> - Add an "fast_tls" option that the user can enable if they know the
> server supports it
>
> - First connect in old-fashioned way, and remember the server version.
> Later, if you reconnect to the same server, use implicit TLS if the
> server version was high enough. This would be most useful for connection
> pools.

Vladimir pointed out that this doesn't necessarily work. The server
may be new enough to support it but it could be behind a proxy like
pgbouncer or something. The same would be true if the server reported
a "connection option" instead of depending on version.

> - Connect both ways at the same time, and continue with the fastest,
> i.e. "happy eyeballs"

That seems way too complex for us to bother with imho.

> - Try implicit TLS first, and fall back to explicit TLS if it fails.

> For libpq, we don't necessarily need to do anything right now. We can
> add the implicit TLS support in a later version. Not having libpq
> support makes it hard to test the server codepath, though. Maybe just
> test it with 'stunnel' or 'openssl s_client'.

I think we should have an option to explicitly enable it in psql, if
only for testing. And then wait five years and switch the default on
it then. In the meantime users can just set it based on their setup.
That's not the way to the quickest adoption but i

Re: Experiments with Postgres and SSL

2023-02-28 Thread Jacob Champion
On Tue, Feb 28, 2023 at 10:33 AM Greg Stark  wrote:
> On Wed, 22 Feb 2023 at 07:27, Heikki Linnakangas  wrote:
> > Good idea. Do we want to just require the protocol to be "postgres", or
> > perhaps "postgres/3.0"? Need to register that with IANA, I guess.
>
> I had never heard of this before, it does seem useful. But if I
> understand it right it's entirely independent of this patch.

It can be. If you want to use it in the strongest possible way,
though, you'd have to require its use by clients. Introducing that
requirement later would break existing ones, so I think it makes sense
to do it at the same time as the initial implementation, if there's
interest.

> We can
> add it to all our Client/Server exchanges whether they're the initial
> direct SSL connection or the STARTTLS negotiation?

I'm not sure it would buy you anything during the STARTTLS-style
opening. You already know what protocol you're speaking in that case.
(So with the ALPACA example, the damage is already done.)

Thanks,
--Jacob




Re: Experiments with Postgres and SSL

2024-04-24 Thread Peter Eisentraut

On 08.04.24 10:38, Heikki Linnakangas wrote:

On 08/04/2024 04:25, Heikki Linnakangas wrote:

One important open item now is that we need to register a proper ALPN
protocol ID with IANA.


I sent a request for that: 
https://mailarchive.ietf.org/arch/msg/tls-reg-review/9LWPzQfOpbc8dTT7vc9ahNeNaiw/


Why did you ask for "pgsql"?  The IANA protocol name for port 5432 is 
"postgres".  This seems confusing.






Re: Experiments with Postgres and SSL

2024-04-24 Thread Peter Eisentraut

On 01.03.24 22:49, Jacob Champion wrote:

If we're interested in ALPN negotiation in the future, we may also
want to look at GREASE [1] to keep those options open in the presence
of third-party implementations. Unfortunately OpenSSL doesn't do this
automatically yet.

If we don't have a reason not to, it'd be good to follow the strictest
recommendations from [2] to avoid cross-protocol attacks. (For anyone
currently running web servers and Postgres on the same host, they
really don't want browsers "talking" to their Postgres servers.) That
would mean checking the negotiated ALPN on both the server and client
side, and failing if it's not what we expect.


I've been reading up on ALPN.  There is another thread that is 
discussing PostgreSQL protocol version negotiation, and ALPN also has 
"protocol negotiation" in the name and there is some discussion in this 
thread about the granularity oft the protocol names.


I'm concerned that there appears to be some confusion over whether ALPN 
is a performance feature or a security feature.  RFC 7301 appears to be 
pretty clear that it's for performance, not for security.


Looking at the ALPACA attack, I'm not convinced that it's very relevant 
for PostgreSQL.  It's basically just a case of, you connected to the 
wrong server.  And web browsers routinely open additional connections 
based on what data they have previously received, and they liberally 
send along session cookies to those new connections, so I understand 
that this can be a problem.  But I don't see how ALPN is a good defense. 
 It can help only if all other possible services other than http 
implement it and say, you're a web browser, go away.  And what if the 
rogue server is in fact a web server, then it doesn't help at all.  I 
guess there could be some common configurations where there is a web 
server, and ftp server, and some mail servers running on the same TLS 
end point.  But in how many cases is there also a PostgreSQL server 
running on the same end point?  The page about ALPACA also suggests SNI 
as a mitigation, which seems more sensible, because the burden is then 
on the client to do the right thing, and not on all other servers to 
send away clients doing the wrong thing.  And of course libpq already 
supports SNI.


For the protocol negotiation aspect, how does this work if the wrapped 
protocol already has a version negotiation system?  For example, various 
HTTP versions are registered as separate protocols for ALPN.  What if 
ALPN says it's HTTP/1.0 but the actual HTTP requests specify 1.1, or 
vice versa?  What is the actual mechanism where the performance benefits 
(saving round-trips) are created?  I haven't caught up with HTTP 2 and 
so on, so maybe there are additional things at play there, but it is not 
fully explained in the RFCs.  I suppose PostgreSQL would keep its 
internal protocol version negotiation in any case, but then what do we 
need ALPN on top for?






Re: Experiments with Postgres and SSL

2024-04-24 Thread Jacob Champion
On Wed, Apr 24, 2024 at 1:57 PM Peter Eisentraut  wrote:
> I'm concerned that there appears to be some confusion over whether ALPN
> is a performance feature or a security feature.  RFC 7301 appears to be
> pretty clear that it's for performance, not for security.

It was also designed to give benefits for more complex topologies
(proxies, cert selection, etc.), but yeah, this is a mitigation
technique that just uses what is already widely implemented.

> Looking at the ALPACA attack, I'm not convinced that it's very relevant
> for PostgreSQL.  It's basically just a case of, you connected to the
> wrong server.

I think that's an oversimplification. This prevents active MITM, where
an adversary has connected you to the wrong server.

> But I don't see how ALPN is a good defense.
>   It can help only if all other possible services other than http
> implement it and say, you're a web browser, go away.

Why? An ALPACA-aware client will fail the connection if the server
doesn't advertise the correct protocol. An ALPACA-aware server will
fail the handshake if the client doesn't advertise the correct
protocol. They protect themselves, and their peers, without needing
their peers to understand.

> And what if the
> rogue server is in fact a web server, then it doesn't help at all.

It's not a rogue server; the attack is using other friendly services
against you. If you're able to set up an attacker-controlled server,
using the same certificate as the valid server, on a host covered by
the cert, I think it's game over for many other reasons.

If you mean that you can't prevent an attacker from redirecting one
web server's traffic to another (friendly) web server that's running
on the same host, that's correct. Web admins who care would need to
implement countermeasures, like Origin header filtering or something?
I don't think we have a similar concept to that -- it'd be nice! --
but we don't need to have one in order to provide protection for the
other network protocols we exist next to.

> I
> guess there could be some common configurations where there is a web
> server, and ftp server, and some mail servers running on the same TLS
> end point.  But in how many cases is there also a PostgreSQL server
> running on the same end point?

Not only have I seen those cohosted, I've deployed such setups myself.
Isn't that basically cPanel's MO, and a standard setup for ? (It's been a while and I don't have a setup
handy to double-check, sorry; feel free to push back if they don't do
that anymore.)

A quick search for "running web server and Postgres on the same host"
seems to yield plenty of conversations. Some of those conversations
say "don't do it", but of course others do not :) Some actively
encourage it for simplicity.

> The page about ALPACA also suggests SNI
> as a mitigation, which seems more sensible, because the burden is then
> on the client to do the right thing, and not on all other servers to
> send away clients doing the wrong thing.  And of course libpq already
> supports SNI.

That mitigates a different attack. From the ALPACA site [1]:

> Implementing these [ALPN] countermeasures is effective in preventing 
> cross-protocol attacks irregardless of hostnames and ports used for 
> application servers.
> ...
> Implementing these [SNI] countermeasures is effective in preventing 
> same-protocol attacks on servers with different hostnames, as well as 
> cross-protocol attacks on servers with different hostnames even if the ALPN 
> countermeasures can not be implemented.

SNI is super useful; it's just not always enough. And a strict SNI
check would also be good to do, but it doesn't seem imperative to tie
it to this feature, since same-protocol attacks were already possible
AFAICT. It's the cross-protocol attacks that are new, made possible by
the new handshake.

> For the protocol negotiation aspect, how does this work if the wrapped
> protocol already has a version negotiation system?  For example, various
> HTTP versions are registered as separate protocols for ALPN.  What if
> ALPN says it's HTTP/1.0 but the actual HTTP requests specify 1.1, or
> vice versa?

If a client or server incorrectly negotiates a protocol and then
starts speaking a different one, then it's just protocol-dependent
whether that works or not. HTTP/1.0 and HTTP/1.1 would still be
cross-compatible in some cases. The others, not so much.

> What is the actual mechanism where the performance benefits
> (saving round-trips) are created?

The negotiation gets done as part of the TLS handshake, which had to
be done anyway.

> I haven't caught up with HTTP 2 and
> so on, so maybe there are additional things at play there, but it is not
> fully explained in the RFCs.

Practically speaking, HTTP/2 is negotiated via ALPN in the real world,
at least last I checked. I don't think browsers ever supported the
plaintext h2c:// scheme. There's also an in-band `Upgrade: h2c` path
defined that does not use ALPN at all, but again I don't think any
b

Re: Experiments with Postgres and SSL

2024-04-25 Thread Heikki Linnakangas

On 24/04/2024 23:51, Peter Eisentraut wrote:

On 08.04.24 10:38, Heikki Linnakangas wrote:

On 08/04/2024 04:25, Heikki Linnakangas wrote:

One important open item now is that we need to register a proper ALPN
protocol ID with IANA.


I sent a request for that:
https://mailarchive.ietf.org/arch/msg/tls-reg-review/9LWPzQfOpbc8dTT7vc9ahNeNaiw/


Why did you ask for "pgsql"?  The IANA protocol name for port 5432 is
"postgres".  This seems confusing.


Oh, I was not aware of that. According to [1], it's actually 
"postgresql". The ALPN registration has not been approved yet, so I'll 
reply on the ietf thread to point that out.


[1] 
https://www.iana.org/assignments/service-names-port-numbers/service-names-port-numbers.txt


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Experiments with Postgres and SSL

2023-07-04 Thread Heikki Linnakangas

On 31/03/2023 10:59, Greg Stark wrote:

IIRC I put a variable labeled a "GUC" but forgot to actually make it a
GUC. But I'm thinking of maybe removing that variable since I don't
see much of a use case for controlling this manually. I *think* ALPN
is supported by all the versions of OpenSSL we support.


+1 on removing the variable. Let's make ALPN mandatory for direct SSL 
connections, like Jacob suggested. And for old-style handshakes, accept 
and check ALPN if it's given.


I don't see the point of the libpq 'sslalpn' option either. Let's send 
ALPN always.


Admittedly having the options make testing different of combinations of 
old and new clients and servers a little easier. But I don't think we 
should add options for the sake of backwards compatibility tests.



--- a/src/backend/libpq/pqcomm.c
+++ b/src/backend/libpq/pqcomm.c
@@ -1126,13 +1126,16 @@ pq_discardbytes(size_t len)
 /* 
  * pq_buffer_has_data  - is any buffered data 
available to read?
  *
- * This will *not* attempt to read more data.
+ * Actually returns the number of bytes in the buffer...
+ *
+ * This will *not* attempt to read more data. And reading up to that number of
+ * bytes should not cause reading any more data either.
  * 
  */
-bool
+size_t
 pq_buffer_has_data(void)
 {
-   return (PqRecvPointer < PqRecvLength);
+   return (PqRecvLength - PqRecvPointer);
 }


Let's rename the function.


/* push unencrypted buffered data back through SSL setup */
len = pq_buffer_has_data();
if (len > 0)
{
buf = palloc(len);
if (pq_getbytes(buf, len) == EOF)
return STATUS_ERROR; /* shouldn't be possible */
port->raw_buf = buf;
port->raw_buf_remaining = len;
port->raw_buf_consumed = 0;
}

Assert(pq_buffer_has_data() == 0);
if (secure_open_server(port) == -1)
{
ereport(COMMERROR,
(errcode(ERRCODE_PROTOCOL_VIOLATION),
 errmsg("SSL Protocol Error during direct 
SSL connection initiation")));
return STATUS_ERROR;
}

if (port->raw_buf_remaining > 0)
{
ereport(COMMERROR,
(errcode(ERRCODE_PROTOCOL_VIOLATION),
 errmsg("received unencrypted data after SSL 
request"),
 errdetail("This could be either a 
client-software bug or evidence of an attempted man-in-the-middle attack.")));
return STATUS_ERROR;
}
if (port->raw_buf)
pfree(port->raw_buf);


This pattern is repeated in both callers of secure_open_server(). Could 
we move this into secure_open_server() itself? That would feel pretty 
natural, be-secure.c already contains the secure_raw_read() function 
that reads the 'raw_buf' field.



const char *
PQsslAttribute(PGconn *conn, const char *attribute_name)
{
...

if (strcmp(attribute_name, "alpn") == 0)
{
const unsigned char *data;
unsigned int len;
static char alpn_str[256]; /* alpn doesn't support longer than 
255 bytes */
SSL_get0_alpn_selected(conn->ssl, &data, &len);
if (data == NULL || len==0 || len > sizeof(alpn_str)-1)
return NULL;
memcpy(alpn_str, data, len);
alpn_str[len] = 0;
return alpn_str;
}


Using a static buffer doesn't look right. If you call PQsslAttribute on 
two different connections from two different threads concurrently, they 
will write to the same buffer. I see that you copied it from the 
"key_bits" handlng, but it has the same issue.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Experiments with Postgres and SSL

2023-07-04 Thread Michael Paquier
On Tue, Jul 04, 2023 at 05:15:49PM +0300, Heikki Linnakangas wrote:
> I don't see the point of the libpq 'sslalpn' option either. Let's send ALPN
> always.
> 
> Admittedly having the options make testing different of combinations of old
> and new clients and servers a little easier. But I don't think we should add
> options for the sake of backwards compatibility tests.

Hmm.  I would actually argue in favor of having these with tests in
core to stress the previous SSL hanshake protocol, as not having these
parameters would mean that we rely only on major version upgrades in
the buildfarm to test the backward-compatible code path, making issues
much harder to catch.  And we still need to maintain the
backward-compatible path for 10 years based on what pg_dump and
pg_upgrade need to support.
--
Michael


signature.asc
Description: PGP signature


Re: Experiments with Postgres and SSL

2023-03-16 Thread Greg Stark
Here's an updated patch for direct SSL connections.

I've added libpq client support with a new connection parameter. This
allows testing it easily with psql. It's still a bit hard to see
what's going on though. I'm thinking it would be good to have libpq
keep a string which describes what negotiations were attempted and
failed and what was eventually accepted which psql could print with
the SSL message or expose some other way.

In the end I didn't see how adding an API for this really helped any
more than just saying the API is to stuff the unread data into the
Port structure. So I just documented that. If anyone has any better
idea...

I added documentation for the libpq connection setting.

One thing, I *think* it's ok to replace the send(2) call with
secure_write() in the negotiation. It does mean it's possible for the
connection to fail with FATAL at that point instead of COMMERROR but I
don't think that's a problem.

I haven't added tests. I'm not sure how to test this since to test it
properly means running the server with every permutation of ssl and
gssapi configurations.

Incidentally, some of the configuration combinations -- namely
sslnegotiation=direct and default gssencmode and sslmode results in a
counter-intuitive behaviour. But I don't see a better option that
doesn't mean making the defaults less useful.
From b07e19223bee52b7bb9b50afea39e4baaa0a46f3 Mon Sep 17 00:00:00 2001
From: Greg Stark 
Date: Thu, 16 Mar 2023 15:10:15 -0400
Subject: [PATCH v2 3/3] Direct SSL connections documentation

---
 doc/src/sgml/libpq.sgml | 102 
 1 file changed, 93 insertions(+), 9 deletions(-)

diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
index 3706d349ab..e2f0891ea5 100644
--- a/doc/src/sgml/libpq.sgml
+++ b/doc/src/sgml/libpq.sgml
@@ -1701,10 +1701,13 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
 Note that if GSSAPI encryption is possible,
 that will be used in preference to SSL
 encryption, regardless of the value of sslmode.
-To force use of SSL encryption in an
-environment that has working GSSAPI
-infrastructure (such as a Kerberos server), also
-set gssencmode to disable.
+To negotiate SSL encryption in an environment that
+has working GSSAPI infrastructure (such as a
+Kerberos server), also set gssencmode
+to disable.
+Use of non-default values of sslnegotiation can
+also cause SSL to be used instead of
+negotiating GSSAPI encryption.

   
  
@@ -1731,6 +1734,75 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
   
  
 
+ 
+  sslnegotiation
+  
+   
+This option controls whether PostgreSQL
+will perform its protocol negotiation to request encryption from the
+server or will just directly make a standard SSL
+connection.  Traditional PostgreSQL
+protocol negotiation is the default and the most flexible with
+different server configurations. If the server is known to support
+direct SSL connections then the latter requires one
+fewer round trip reducing connection latency and also allows the use
+of protocol agnostic SSL network tools.
+   
+
+
+ 
+  postgres
+  
+   
+ perform PostgreSQL protocol
+ negotiation. This is the default if the option is not provided.
+   
+  
+ 
+
+ 
+  direct
+  
+   
+ first attempt to establish a standard SSL connection and if that
+ fails reconnect and perform the negotiation. This fallback
+ process adds significant latency if the initial SSL connection
+ fails.
+   
+  
+ 
+
+ 
+  requiredirect
+  
+   
+ attempt to establish a standard SSL connection and if that fails
+ return a connection failure immediately.
+   
+  
+ 
+
+
+   
+If sslmode set to disable
+or allow then sslnegotiation is
+ignored. If gssencmode is set
+to require then sslnegotiation
+must be the default postgres value.
+   
+
+   
+Moreover, note if gssencmode is set
+to prefer and sslnegotiation
+to direct then the effective preference will be
+direct SSL connections, followed by
+negotiated GSS connections, followed by
+negotiated SSL connections, possibly followed
+lastly by unencrypted connections.
+   
+  
+ 
+
  
   sslcompression
   
@@ -1884,11 +1956,13 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
 

 The Server Name Indication can be used by SSL-aware proxies to route
-connections without having to decrypt the SSL stream.  (Note that this
-requires a proxy that is a

Re: Experiments with Postgres and SSL

2023-03-20 Thread Greg Stark
Here's a first cut at ALPN support.

Currently it's using a hard coded "Postgres/3.0" protocol (hard coded
both in the client and the server...). And it's hard coded to be
required for direct connections and supported but not required for
regular connections.

IIRC I put a variable labeled a "GUC" but forgot to actually make it a
GUC. But I'm thinking of maybe removing that variable since I don't
see much of a use case for controlling this manually. I *think* ALPN
is supported by all the versions of OpenSSL we support.

The other patches are unchanged (modulo a free() that I missed in the
client before). They still have the semi-open issues I mentioned in
the previous email.




--
greg
From 32185020927824c4b57af900100a37f92ae6a040 Mon Sep 17 00:00:00 2001
From: Greg Stark 
Date: Mon, 20 Mar 2023 14:09:30 -0400
Subject: [PATCH v3 4/4] alpn support

---
 src/backend/libpq/be-secure-openssl.c| 65 
 src/backend/libpq/be-secure.c|  3 ++
 src/backend/postmaster/postmaster.c  | 23 +
 src/bin/psql/command.c   |  7 ++-
 src/include/libpq/libpq-be.h |  1 +
 src/include/libpq/libpq.h|  1 +
 src/interfaces/libpq/fe-connect.c|  5 ++
 src/interfaces/libpq/fe-secure-openssl.c | 25 +
 src/interfaces/libpq/libpq-int.h |  1 +
 9 files changed, 129 insertions(+), 2 deletions(-)

diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 685aa2ed69..034e1cf2ec 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -67,6 +67,12 @@ static int	ssl_external_passwd_cb(char *buf, int size, int rwflag, void *userdat
 static int	dummy_ssl_passwd_cb(char *buf, int size, int rwflag, void *userdata);
 static int	verify_cb(int ok, X509_STORE_CTX *ctx);
 static void info_cb(const SSL *ssl, int type, int args);
+static int  alpn_cb(SSL *ssl,
+	const unsigned char **out,
+	unsigned char *outlen,
+	const unsigned char *in,
+	unsigned int inlen,
+	void *userdata);
 static bool initialize_dh(SSL_CTX *context, bool isServerStart);
 static bool initialize_ecdh(SSL_CTX *context, bool isServerStart);
 static const char *SSLerrmessage(unsigned long ecode);
@@ -432,6 +438,11 @@ be_tls_open_server(Port *port)
 	/* set up debugging/info callback */
 	SSL_CTX_set_info_callback(SSL_context, info_cb);
 
+	if (ssl_enable_alpn) {
+		elog(WARNING, "Enabling ALPN Callback");
+		SSL_CTX_set_alpn_select_cb(SSL_context, alpn_cb, port);
+	}
+
 	if (!(port->ssl = SSL_new(SSL_context)))
 	{
 		ereport(COMMERROR,
@@ -1250,6 +1261,60 @@ info_cb(const SSL *ssl, int type, int args)
 	}
 }
 
+static unsigned char alpn_protos[] = {
+	12, 'P','o','s','t','g','r','e','s','/','3','.','0'
+};
+static unsigned int alpn_protos_len = sizeof(alpn_protos);
+
+/*
+ * Server callback for ALPN negotiation. We use use the standard "helper"
+ * function even though currently we only accept one value. We store the
+ * negotiated protocol in Port->ssl_alpn_protocol and rely on higher level
+ * logic (in postmaster.c) to decide what to do with that info.
+ *
+ * XXX Not sure if it's kosher to use palloc and elog() inside OpenSSL
+ * callbacks. If we throw an error from here is there a risk of messing up
+ * OpenSSL data structures? It's possible it's ok becuase this is only called
+ * during connection negotiation which we don't try to recover from.
+ */
+static int  alpn_cb(SSL *ssl,
+	const unsigned char **out,
+	unsigned char *outlen,
+	const unsigned char *in,
+	unsigned int inlen,
+	void *userdata)
+{
+	/* why does OpenSSL provide a helper function that requires a nonconst
+	 * vector when the callback is declared to take a const vector? What are
+	 * we to do with that?*/
+	int retval;
+	Assert(userdata != NULL);
+	Assert(out != NULL);
+	Assert(outlen != NULL);
+	Assert(in != NULL);
+
+	retval = SSL_select_next_proto((unsigned char **)out, outlen,
+   alpn_protos, alpn_protos_len,
+   in, inlen);
+	if (*out == NULL || *outlen > alpn_protos_len || outlen <= 0)
+		elog(ERROR, "SSL_select_next_proto returned nonsensical results");
+
+	if (retval == OPENSSL_NPN_NEGOTIATED)
+	{
+		struct Port *port = (struct Port *)userdata;
+		
+		port->ssl_alpn_protocol = palloc0(*outlen+1);
+		/* SSL uses unsigned chars but Postgres uses host signedness of chars */
+		strncpy(port->ssl_alpn_protocol, (char*)*out, *outlen);
+		return SSL_TLSEXT_ERR_OK;
+	} else if (retval == OPENSSL_NPN_NO_OVERLAP) {
+		return SSL_TLSEXT_ERR_NOACK;
+	} else {
+		return SSL_TLSEXT_ERR_NOACK;
+	}
+}
+
+
 /*
  * Set DH parameters for generating ephemeral DH keys.  The
  * DH parameters can take a long time to compute, so they must be
diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index 1020b3adb0..79a61900ba 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -61,6 +61,9 @@ bool		SSLPreferServerCiphers;
 int		

Re: Experiments with Postgres and SSL

2023-03-20 Thread Greg Stark
Sorry, checking the cfbot apparently I had a typo in the #ifndef USE_SSL case.
From 4b6e01c7f569a919d660cd80ce64cb913bc9f220 Mon Sep 17 00:00:00 2001
From: Greg Stark 
Date: Mon, 20 Mar 2023 14:09:30 -0400
Subject: [PATCH v4 4/4] alpn support

---
 src/backend/libpq/be-secure-openssl.c| 65 
 src/backend/libpq/be-secure.c|  3 ++
 src/backend/postmaster/postmaster.c  | 23 +
 src/bin/psql/command.c   |  7 ++-
 src/include/libpq/libpq-be.h |  1 +
 src/include/libpq/libpq.h|  1 +
 src/interfaces/libpq/fe-connect.c|  5 ++
 src/interfaces/libpq/fe-secure-openssl.c | 25 +
 src/interfaces/libpq/libpq-int.h |  1 +
 9 files changed, 129 insertions(+), 2 deletions(-)

diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 685aa2ed69..034e1cf2ec 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -67,6 +67,12 @@ static int	ssl_external_passwd_cb(char *buf, int size, int rwflag, void *userdat
 static int	dummy_ssl_passwd_cb(char *buf, int size, int rwflag, void *userdata);
 static int	verify_cb(int ok, X509_STORE_CTX *ctx);
 static void info_cb(const SSL *ssl, int type, int args);
+static int  alpn_cb(SSL *ssl,
+	const unsigned char **out,
+	unsigned char *outlen,
+	const unsigned char *in,
+	unsigned int inlen,
+	void *userdata);
 static bool initialize_dh(SSL_CTX *context, bool isServerStart);
 static bool initialize_ecdh(SSL_CTX *context, bool isServerStart);
 static const char *SSLerrmessage(unsigned long ecode);
@@ -432,6 +438,11 @@ be_tls_open_server(Port *port)
 	/* set up debugging/info callback */
 	SSL_CTX_set_info_callback(SSL_context, info_cb);
 
+	if (ssl_enable_alpn) {
+		elog(WARNING, "Enabling ALPN Callback");
+		SSL_CTX_set_alpn_select_cb(SSL_context, alpn_cb, port);
+	}
+
 	if (!(port->ssl = SSL_new(SSL_context)))
 	{
 		ereport(COMMERROR,
@@ -1250,6 +1261,60 @@ info_cb(const SSL *ssl, int type, int args)
 	}
 }
 
+static unsigned char alpn_protos[] = {
+	12, 'P','o','s','t','g','r','e','s','/','3','.','0'
+};
+static unsigned int alpn_protos_len = sizeof(alpn_protos);
+
+/*
+ * Server callback for ALPN negotiation. We use use the standard "helper"
+ * function even though currently we only accept one value. We store the
+ * negotiated protocol in Port->ssl_alpn_protocol and rely on higher level
+ * logic (in postmaster.c) to decide what to do with that info.
+ *
+ * XXX Not sure if it's kosher to use palloc and elog() inside OpenSSL
+ * callbacks. If we throw an error from here is there a risk of messing up
+ * OpenSSL data structures? It's possible it's ok becuase this is only called
+ * during connection negotiation which we don't try to recover from.
+ */
+static int  alpn_cb(SSL *ssl,
+	const unsigned char **out,
+	unsigned char *outlen,
+	const unsigned char *in,
+	unsigned int inlen,
+	void *userdata)
+{
+	/* why does OpenSSL provide a helper function that requires a nonconst
+	 * vector when the callback is declared to take a const vector? What are
+	 * we to do with that?*/
+	int retval;
+	Assert(userdata != NULL);
+	Assert(out != NULL);
+	Assert(outlen != NULL);
+	Assert(in != NULL);
+
+	retval = SSL_select_next_proto((unsigned char **)out, outlen,
+   alpn_protos, alpn_protos_len,
+   in, inlen);
+	if (*out == NULL || *outlen > alpn_protos_len || outlen <= 0)
+		elog(ERROR, "SSL_select_next_proto returned nonsensical results");
+
+	if (retval == OPENSSL_NPN_NEGOTIATED)
+	{
+		struct Port *port = (struct Port *)userdata;
+		
+		port->ssl_alpn_protocol = palloc0(*outlen+1);
+		/* SSL uses unsigned chars but Postgres uses host signedness of chars */
+		strncpy(port->ssl_alpn_protocol, (char*)*out, *outlen);
+		return SSL_TLSEXT_ERR_OK;
+	} else if (retval == OPENSSL_NPN_NO_OVERLAP) {
+		return SSL_TLSEXT_ERR_NOACK;
+	} else {
+		return SSL_TLSEXT_ERR_NOACK;
+	}
+}
+
+
 /*
  * Set DH parameters for generating ephemeral DH keys.  The
  * DH parameters can take a long time to compute, so they must be
diff --git a/src/backend/libpq/be-secure.c b/src/backend/libpq/be-secure.c
index 1020b3adb0..79a61900ba 100644
--- a/src/backend/libpq/be-secure.c
+++ b/src/backend/libpq/be-secure.c
@@ -61,6 +61,9 @@ bool		SSLPreferServerCiphers;
 int			ssl_min_protocol_version = PG_TLS1_2_VERSION;
 int			ssl_max_protocol_version = PG_TLS_ANY;
 
+/* GUC variable: if false ignore ALPN negotiation */
+bool		ssl_enable_alpn = true;
+
 /*  */
 /*			 Procedures common to all secure sessions			*/
 /*  */
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index ec1d895a23..2640b69fed 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -1934,6 +1934,8 @@ ServerLoop(void)

Re: Experiments with Postgres and SSL

2023-03-31 Thread Greg Stark
On Mon, 20 Mar 2023 at 16:31, Greg Stark  wrote:
>
> Here's a first cut at ALPN support.
>
> Currently it's using a hard coded "Postgres/3.0" protocol

Apparently that is explicitly disrecommended by the IETF folk. They
want something like "TBD" so people don't start using a string until
it's been added to the registry. So I've changed this for now (to
"TBD-pgsql")

Ok, I think this has pretty much everything I was hoping to do.

The one thing I'm not sure of is it seems some codepaths in postmaster
have ereport(COMMERROR) followed by returning an error whereas other
codepaths just have ereport(FATAL). And I don't actually see much
logic in which do which. (I get the principle behind COMMERR it just
seems like it doesn't really match the code).

I realized I had exactly the infrastructure needed to allow pipelining
the SSL ClientHello like Neon wanted to do so I added that too. It's
kind of redundant with direct SSL connections but seems like there may
be reasons to use that instead.



-- 
greg
From 083df15eff52f025064e2879a404270e405f7dde Mon Sep 17 00:00:00 2001
From: Greg Stark 
Date: Thu, 16 Mar 2023 11:55:16 -0400
Subject: [PATCH v5 2/6] Direct SSL connections client support

---
 src/interfaces/libpq/fe-connect.c | 91 +--
 src/interfaces/libpq/libpq-fe.h   |  1 +
 src/interfaces/libpq/libpq-int.h  |  3 +
 3 files changed, 89 insertions(+), 6 deletions(-)

diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
index bb7347cb0c..7cd0eb261f 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -274,6 +274,10 @@ static const internalPQconninfoOption PQconninfoOptions[] = {
 		"SSL-Mode", "", 12,		/* sizeof("verify-full") == 12 */
 	offsetof(struct pg_conn, sslmode)},
 
+	{"sslnegotiation", "PGSSLNEGOTIATION", "postgres", NULL,
+		"SSL-Negotiation", "", 14,		/* strlen("requiredirect") == 14 */
+	offsetof(struct pg_conn, sslnegotiation)},
+
 	{"sslcompression", "PGSSLCOMPRESSION", "0", NULL,
 		"SSL-Compression", "", 1,
 	offsetof(struct pg_conn, sslcompression)},
@@ -1504,11 +1508,36 @@ connectOptions2(PGconn *conn)
 		}
 #endif
 	}
+
+	/*
+	 * validate sslnegotiation option, default is "postgres" for the postgres
+	 * style negotiated connection with an extra round trip but more options.
+	 */
+	if (conn->sslnegotiation)
+	{
+		if (strcmp(conn->sslnegotiation, "postgres") != 0
+			&& strcmp(conn->sslnegotiation, "direct") != 0
+			&& strcmp(conn->sslnegotiation, "requiredirect") != 0)
+		{
+			conn->status = CONNECTION_BAD;
+			libpq_append_conn_error(conn, "invalid %s value: \"%s\"",
+	"sslnegotiation", conn->sslnegotiation);
+			return false;
+		}
+
+#ifndef USE_SSL
+		if (conn->sslnegotiation[0] != 'p') {
+			conn->status = CONNECTION_BAD;
+			libpq_append_conn_error(conn, "sslnegotiation value \"%s\" invalid when SSL support is not compiled in",
+	conn->sslnegotiation);
+			return false;
+		}
+#endif
+	}
 	else
 	{
-		conn->sslmode = strdup(DefaultSSLMode);
-		if (!conn->sslmode)
-			goto oom_error;
+		libpq_append_conn_error(conn, "sslnegotiation missing?");
+		return false;
 	}
 
 	/*
@@ -1614,6 +1643,18 @@ connectOptions2(PGconn *conn)
 	conn->gssencmode);
 			return false;
 		}
+#endif
+#ifdef USE_SSL
+		/* GSS is incompatible with direct SSL connections so it requires the
+		 * default postgres style connection ssl negotiation  */
+		if (strcmp(conn->gssencmode, "require") == 0 &&
+			strcmp(conn->sslnegotiation, "postgres") != 0)
+		{
+			conn->status = CONNECTION_BAD;
+			libpq_append_conn_error(conn, "gssencmode value \"%s\" invalid when Direct SSL negotiation is enabled",
+	conn->gssencmode);
+			return false;
+		}
 #endif
 	}
 	else
@@ -2747,11 +2788,12 @@ keep_going:		/* We will come back to here until there is
 		/* initialize these values based on SSL mode */
 		conn->allow_ssl_try = (conn->sslmode[0] != 'd');	/* "disable" */
 		conn->wait_ssl_try = (conn->sslmode[0] == 'a'); /* "allow" */
+		/* direct ssl is incompatible with "allow" or "disabled" ssl */
+		conn->allow_direct_ssl_try = conn->allow_ssl_try && !conn->wait_ssl_try && (conn->sslnegotiation[0] != 'p');
 #endif
 #ifdef ENABLE_GSS
 		conn->try_gss = (conn->gssencmode[0] != 'd');	/* "disable" */
 #endif
-
 		reset_connection_state_machine = false;
 		need_new_connection = true;
 	}
@@ -3209,6 +3251,28 @@ keep_going:		/* We will come back to here until there is
 if (pqsecure_initialize(conn, false, true) < 0)
 	goto error_return;
 
+/* If SSL is enabled and direct SSL connections are enabled
+ * and we haven't already established an SSL connection (or
+ * already tried a direct connection and failed or succeeded)
+ * then try just enabling SSL directly.
+ *
+ * If we fail then we'll either fail the connection (if
+ * sslnegotiation is set to requiredirect or turn
+ * allow_direct_ssl_try to false
+ */
+if (conn->allow_ssl_try
+	&& !conn->wait_ssl_

Re: Experiments with Postgres and SSL

2023-03-31 Thread Greg Stark
And the cfbot wants a small tweak
From 3d0a502c25504da32b7a362831c700b4e891f79b Mon Sep 17 00:00:00 2001
From: Greg Stark 
Date: Fri, 31 Mar 2023 02:31:31 -0400
Subject: [PATCH v6 5/6] Allow pipelining data after ssl request

---
 src/backend/postmaster/postmaster.c | 54 ++---
 1 file changed, 42 insertions(+), 12 deletions(-)

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 9b4b37b997..33b317f98b 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -2001,13 +2001,14 @@ ProcessSSLStartup(Port *port)
 
 		if (port->raw_buf_remaining > 0)
 		{
-			/* This shouldn't be possible -- it would mean the client sent
-			 * encrypted data before we established a session key...
-			 */
-			elog(LOG, "Buffered unencrypted data remains after negotiating native SSL connection");
+			ereport(COMMERROR,
+	(errcode(ERRCODE_PROTOCOL_VIOLATION),
+	 errmsg("received unencrypted data after SSL request"),
+	 errdetail("This could be either a client-software bug or evidence of an attempted man-in-the-middle attack.")));
 			return STATUS_ERROR;
 		}
-		pfree(port->raw_buf);
+		if (port->raw_buf)
+			pfree(port->raw_buf);
 
 		if (port->ssl_alpn_protocol == NULL)
 		{
@@ -2158,15 +2159,44 @@ ProcessStartupPacket(Port *port, bool ssl_done, bool gss_done)
 		}
 
 #ifdef USE_SSL
-		if (SSLok == 'S' && secure_open_server(port) == -1)
-			return STATUS_ERROR;
+		if (SSLok == 'S')
+		{
+			/* push unencrypted buffered data back through SSL setup */
+			len = pq_buffer_has_data();
+			if (len > 0)
+			{
+buf = palloc(len);
+if (pq_getbytes(buf, len) == EOF)
+	return STATUS_ERROR; /* shouldn't be possible */
+port->raw_buf = buf;
+port->raw_buf_remaining = len;
+port->raw_buf_consumed = 0;
+			}
+			Assert(pq_buffer_has_data() == 0);
+
+			if (secure_open_server(port) == -1)
+return STATUS_ERROR;
+			
+			/*
+			 * At this point we should have no data already buffered.  If we do,
+			 * it was received before we performed the SSL handshake, so it wasn't
+			 * encrypted and indeed may have been injected by a man-in-the-middle.
+			 * We report this case to the client.
+			 */
+			if (port->raw_buf_remaining > 0)
+ereport(FATAL,
+		(errcode(ERRCODE_PROTOCOL_VIOLATION),
+		 errmsg("received unencrypted data after SSL request"),
+		 errdetail("This could be either a client-software bug or evidence of an attempted man-in-the-middle attack.")));
+			if (port->raw_buf)
+pfree(port->raw_buf);
+		}
 #endif
 
-		/*
-		 * At this point we should have no data already buffered.  If we do,
-		 * it was received before we performed the SSL handshake, so it wasn't
-		 * encrypted and indeed may have been injected by a man-in-the-middle.
-		 * We report this case to the client.
+		/* This can only really occur now if there was data pipelined after
+		 * the SSL Request but we have refused to do SSL. In that case we need
+		 * to give up because the client has presumably assumed the SSL
+		 * request would be accepted.
 		 */
 		if (pq_buffer_has_data())
 			ereport(FATAL,
-- 
2.40.0

From 5413f1a1ee897640bd3bb99bae226eec7e2e9f50 Mon Sep 17 00:00:00 2001
From: Greg Stark 
Date: Mon, 20 Mar 2023 14:09:30 -0400
Subject: [PATCH v6 4/6] Direct SSL connections ALPN support

---
 src/backend/libpq/be-secure-openssl.c | 66 +++
 src/backend/libpq/be-secure.c |  3 +
 src/backend/postmaster/postmaster.c   | 25 +++
 src/backend/utils/misc/guc_tables.c   |  9 +++
 src/backend/utils/misc/postgresql.conf.sample |  1 +
 src/bin/psql/command.c|  7 +-
 src/include/libpq/libpq-be.h  |  1 +
 src/include/libpq/libpq.h |  1 +
 src/include/libpq/pqcomm.h| 19 ++
 src/interfaces/libpq/fe-connect.c |  5 ++
 src/interfaces/libpq/fe-secure-openssl.c  | 31 +
 src/interfaces/libpq/libpq-int.h  |  1 +
 12 files changed, 167 insertions(+), 2 deletions(-)

diff --git a/src/backend/libpq/be-secure-openssl.c b/src/backend/libpq/be-secure-openssl.c
index 685aa2ed69..620ffafb0b 100644
--- a/src/backend/libpq/be-secure-openssl.c
+++ b/src/backend/libpq/be-secure-openssl.c
@@ -67,6 +67,12 @@ static int	ssl_external_passwd_cb(char *buf, int size, int rwflag, void *userdat
 static int	dummy_ssl_passwd_cb(char *buf, int size, int rwflag, void *userdata);
 static int	verify_cb(int ok, X509_STORE_CTX *ctx);
 static void info_cb(const SSL *ssl, int type, int args);
+static int  alpn_cb(SSL *ssl,
+	const unsigned char **out,
+	unsigned char *outlen,
+	const unsigned char *in,
+	unsigned int inlen,
+	void *userdata);
 static bool initialize_dh(SSL_CTX *context, bool isServerStart);
 static bool initialize_ecdh(SSL_CTX *context, bool isServerStart);
 static const char *SSLerrmessage(unsigned long ecode);
@@ -432,6 +438,13 @@ be_tls_open_

Re: Experiments with Postgres and SSL

2024-03-28 Thread Matthias van de Meent
On Tue, 5 Mar 2024 at 15:08, Heikki Linnakangas  wrote:
>
> I hope I didn't joggle your elbow reviewing this, Jacob, but I spent
> some time rebase and fix various little things:

With the recent changes to backend startup committed by you, this
patchset has gotten major apply failures.

Could you provide a new version of the patchset so that it can be
reviewed in the context of current HEAD?

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)




Re: Experiments with Postgres and SSL

2024-03-28 Thread Heikki Linnakangas

On 28/03/2024 13:15, Matthias van de Meent wrote:

On Tue, 5 Mar 2024 at 15:08, Heikki Linnakangas  wrote:


I hope I didn't joggle your elbow reviewing this, Jacob, but I spent
some time rebase and fix various little things:


With the recent changes to backend startup committed by you, this
patchset has gotten major apply failures.

Could you provide a new version of the patchset so that it can be
reviewed in the context of current HEAD?


Here you are.

--
Heikki Linnakangas
Neon (https://neon.tech)
From 83484696e470ab130bcd3038f0e28d494065071a Mon Sep 17 00:00:00 2001
From: Heikki Linnakangas 
Date: Tue, 5 Mar 2024 15:40:30 +0200
Subject: [PATCH v9 1/6] Move Kerberos module

So that we can reuse it in new tests.
---
 src/test/kerberos/t/001_auth.pl   | 174 ++--
 src/test/perl/PostgreSQL/Test/Kerberos.pm | 229 ++
 2 files changed, 240 insertions(+), 163 deletions(-)
 create mode 100644 src/test/perl/PostgreSQL/Test/Kerberos.pm

diff --git a/src/test/kerberos/t/001_auth.pl b/src/test/kerberos/t/001_auth.pl
index e51e87d0a2e..d4f1ec58092 100644
--- a/src/test/kerberos/t/001_auth.pl
+++ b/src/test/kerberos/t/001_auth.pl
@@ -21,6 +21,7 @@ use strict;
 use warnings FATAL => 'all';
 use PostgreSQL::Test::Utils;
 use PostgreSQL::Test::Cluster;
+use PostgreSQL::Test::Kerberos;
 use Test::More;
 use Time::HiRes qw(usleep);
 
@@ -34,177 +35,27 @@ elsif (!$ENV{PG_TEST_EXTRA} || $ENV{PG_TEST_EXTRA} !~ /\bkerberos\b/)
 	  'Potentially unsafe test GSSAPI/Kerberos not enabled in PG_TEST_EXTRA';
 }
 
-my ($krb5_bin_dir, $krb5_sbin_dir);
-
-if ($^O eq 'darwin' && -d "/opt/homebrew")
-{
-	# typical paths for Homebrew on ARM
-	$krb5_bin_dir = '/opt/homebrew/opt/krb5/bin';
-	$krb5_sbin_dir = '/opt/homebrew/opt/krb5/sbin';
-}
-elsif ($^O eq 'darwin')
-{
-	# typical paths for Homebrew on Intel
-	$krb5_bin_dir = '/usr/local/opt/krb5/bin';
-	$krb5_sbin_dir = '/usr/local/opt/krb5/sbin';
-}
-elsif ($^O eq 'freebsd')
-{
-	$krb5_bin_dir = '/usr/local/bin';
-	$krb5_sbin_dir = '/usr/local/sbin';
-}
-elsif ($^O eq 'linux')
-{
-	$krb5_sbin_dir = '/usr/sbin';
-}
-
-my $krb5_config = 'krb5-config';
-my $kinit = 'kinit';
-my $klist = 'klist';
-my $kdb5_util = 'kdb5_util';
-my $kadmin_local = 'kadmin.local';
-my $krb5kdc = 'krb5kdc';
-
-if ($krb5_bin_dir && -d $krb5_bin_dir)
-{
-	$krb5_config = $krb5_bin_dir . '/' . $krb5_config;
-	$kinit = $krb5_bin_dir . '/' . $kinit;
-	$klist = $krb5_bin_dir . '/' . $klist;
-}
-if ($krb5_sbin_dir && -d $krb5_sbin_dir)
-{
-	$kdb5_util = $krb5_sbin_dir . '/' . $kdb5_util;
-	$kadmin_local = $krb5_sbin_dir . '/' . $kadmin_local;
-	$krb5kdc = $krb5_sbin_dir . '/' . $krb5kdc;
-}
-
-my $host = 'auth-test-localhost.postgresql.example.com';
-my $hostaddr = '127.0.0.1';
-my $realm = 'EXAMPLE.COM';
-
-my $krb5_conf = "${PostgreSQL::Test::Utils::tmp_check}/krb5.conf";
-my $kdc_conf = "${PostgreSQL::Test::Utils::tmp_check}/kdc.conf";
-my $krb5_cache = "${PostgreSQL::Test::Utils::tmp_check}/krb5cc";
-my $krb5_log = "${PostgreSQL::Test::Utils::log_path}/krb5libs.log";
-my $kdc_log = "${PostgreSQL::Test::Utils::log_path}/krb5kdc.log";
-my $kdc_port = PostgreSQL::Test::Cluster::get_free_port();
-my $kdc_datadir = "${PostgreSQL::Test::Utils::tmp_check}/krb5kdc";
-my $kdc_pidfile = "${PostgreSQL::Test::Utils::tmp_check}/krb5kdc.pid";
-my $keytab = "${PostgreSQL::Test::Utils::tmp_check}/krb5.keytab";
-
 my $pgpass = "${PostgreSQL::Test::Utils::tmp_check}/.pgpass";
 
 my $dbname = 'postgres';
 my $username = 'test1';
 my $application = '001_auth.pl';
 
-note "setting up Kerberos";
-
-my ($stdout, $krb5_version);
-run_log [ $krb5_config, '--version' ], '>', \$stdout
-  or BAIL_OUT("could not execute krb5-config");
-BAIL_OUT("Heimdal is not supported") if $stdout =~ m/heimdal/;
-$stdout =~ m/Kerberos 5 release ([0-9]+\.[0-9]+)/
-  or BAIL_OUT("could not get Kerberos version");
-$krb5_version = $1;
-
 # Construct a pgpass file to make sure we don't use it
 append_to_file($pgpass, '*:*:*:*:abc123');
 
 chmod 0600, $pgpass or die $!;
 
-# Build the krb5.conf to use.
-#
-# Explicitly specify the default (test) realm and the KDC for
-# that realm to avoid the Kerberos library trying to look up
-# that information in DNS, and also because we're using a
-# non-standard KDC port.
-#
-# Also explicitly disable DNS lookups since this isn't really
-# our domain and we shouldn't be causing random DNS requests
-# to be sent out (not to mention that broken DNS environments
-# can cause the tests to take an extra long time and timeout).
-#
-# Reverse DNS is explicitly disabled to avoid any issue with a
-# captive portal or other cases where the reverse DNS succeeds
-# and the Kerberos library uses that as the canonical name of
-# the host and then tries to acquire a cross-realm ticket.
-append_to_file(
-	$krb5_conf,
-	qq![logging]
-default = FILE:$krb5_log
-kdc = FILE:$kdc_log
-
-[libdefaults]
-dns_lookup_realm = false
-dns_lookup_kdc = false
-default_realm = $realm
-forwardable = false
-rdns = fals

Re: Experiments with Postgres and SSL

2024-04-04 Thread Matthias van de Meent
On Thu, 28 Mar 2024, 13:37 Heikki Linnakangas,  wrote:
>
> On 28/03/2024 13:15, Matthias van de Meent wrote:
> > On Tue, 5 Mar 2024 at 15:08, Heikki Linnakangas  wrote:
> >>
> >> I hope I didn't joggle your elbow reviewing this, Jacob, but I spent
> >> some time rebase and fix various little things:
> >
> > With the recent changes to backend startup committed by you, this
> > patchset has gotten major apply failures.
> >
> > Could you provide a new version of the patchset so that it can be
> > reviewed in the context of current HEAD?
>
> Here you are.

Sorry for the delay. I've run some tests and didn't find any specific
issues in the patchset.

I did get sidetracked on trying to further improve the test suite,
where I was trying to find out how to use Test::More::subtests, but
have now decided it's not worth the lost time now vs adding this as a
feature in 17.

Some remaining comments:

patches 0001/0002: not reviewed in detail.

Patch 0003:

The read size in secure_raw_read is capped to port->raw_buf_remaining
if the raw buf has any data. While the user will probably call into
this function again, I think that's a waste of cycles.

pq_buffer_has_data now doesn't have any protections against
desynchronized state between PqRecvLength and PqRecvPointer. An
Assert(PqRecvLength >= PqRecvPointer) to that value would be
appreciated.

(in backend_startup.c)
> + elog(LOG, "Detected direct SSL handshake");

I think this should be gated at a lower log level, or a GUC, as this
wouls easily DOS a logfile by bulk sending of SSL handshake bytes.

0004:

backend_startup.c
> +if (!ssl_enable_alpn)
> +{
> +elog(WARNING, "Received direct SSL connection without 
> ssl_enable_alpn enabled");

This is too verbose, too.

> +   if (!port->alpn_used)
> +{
> +ereport(COMMERROR,
> +(errcode(ERRCODE_PROTOCOL_VIOLATION),
> + errmsg("Received direct SSL connection request without 
> required ALPN protocol negotiation extension")));

If ssl_enable_alpn is disabled, we shouln't report a COMMERROR when
the client does indeed not have alpn enabled.

0005:

As mentioned above, I'd have loved to use subtests here for the cube()
of tests, but I got in too much of a rabbit hole to get that done.

0006:

In CONNECTION_FAILED, we use connection_failed() to select whether we
need a new connection or stop trying altogether, but that function's
description states:

> + * Out-of-line portion of the CONNECTION_FAILED() macro
> + *
> + * Returns true, if we should retry the connection with different encryption 
> method.

Which to me reads like we should reuse the connection, and try a
different method on that same connection. Maybe we can improve the
wording to something like
+ * Returns true, if we should reconnect with a different encryption method.
to make the reconnect part more clear.

In select_next_encryption_method, there are several copies of this pattern:

if ((remaining_methods & ENC_METHOD) != 0)
{
conn->current_enc_method = ENC_METHOD;
return true;
}

I think a helper macro would reduce the verbosity of the scaffolding,
like in the attached SELECT_NEXT_METHOD.diff.txt.


Kind regards,

Matthias van de Meent
 src/interfaces/libpq/fe-connect.c | 51 +++
 1 file changed, 20 insertions(+), 31 deletions(-)

diff --git a/src/interfaces/libpq/fe-connect.c 
b/src/interfaces/libpq/fe-connect.c
index edc324dad0..ef95b07978 100644
--- a/src/interfaces/libpq/fe-connect.c
+++ b/src/interfaces/libpq/fe-connect.c
@@ -4344,6 +4344,15 @@ select_next_encryption_method(PGconn *conn, bool 
have_valid_connection)
 {
int remaining_methods;
 
+#define SELECT_NEXT_METHOD(method) \
+   do { \
+   if ((remaining_methods & method) != 0) \
+   { \
+   conn->current_enc_method = method; \
+   return true; \
+   } \
+   } while (false)
+
remaining_methods = conn->allowed_enc_methods & 
~conn->failed_enc_methods;
 
/*
@@ -4373,20 +4382,14 @@ select_next_encryption_method(PGconn *conn, bool 
have_valid_connection)
}
}
}
-   if ((remaining_methods & ENC_GSSAPI) != 0)
-   {
-   conn->current_enc_method = ENC_GSSAPI;
-   return true;
-   }
}
+
+   SELECT_NEXT_METHOD(ENC_GSSAPI);
 #endif
 
/* With sslmode=allow, try plaintext connection before SSL. */
-   if (conn->sslmode[0] == 'a' && (remaining_methods & ENC_PLAINTEXT) != 0)
-   {
-   conn->current_enc_method = ENC_PLAINTEXT;
-   return true;
-   }
+   if (conn->sslmode[0] == 'a')
+   SELECT_NEXT_METHOD(ENC_PLAINTEXT);
 
/*
 * Try SSL. If enabled, try direct SSL. Unless we have a valid TCP
@@ -4396,33 +4399,19 @@ select_n

Re: Experiments with Postgres and SSL

2024-04-07 Thread Heikki Linnakangas

Committed this. Thank you to everyone involved!

On 04/04/2024 14:08, Matthias van de Meent wrote:

Patch 0003:

The read size in secure_raw_read is capped to port->raw_buf_remaining
if the raw buf has any data. While the user will probably call into
this function again, I think that's a waste of cycles.


Hmm, yeah, I suppose we could read more data in the same call. It seems 
simpler not to. The case that "raw_buf_remaining > 0" is a very rare.



pq_buffer_has_data now doesn't have any protections against
desynchronized state between PqRecvLength and PqRecvPointer. An
Assert(PqRecvLength >= PqRecvPointer) to that value would be
appreciated.


Added.


0006:

In CONNECTION_FAILED, we use connection_failed() to select whether we
need a new connection or stop trying altogether, but that function's
description states:


+ * Out-of-line portion of the CONNECTION_FAILED() macro
+ *
+ * Returns true, if we should retry the connection with different encryption 
method.


Which to me reads like we should reuse the connection, and try a
different method on that same connection. Maybe we can improve the
wording to something like
+ * Returns true, if we should reconnect with a different encryption method.
to make the reconnect part more clear.


Changed to "Returns true, if we should reconnect and retry with a 
different encryption method".



In select_next_encryption_method, there are several copies of this pattern:

if ((remaining_methods & ENC_METHOD) != 0)
{
 conn->current_enc_method = ENC_METHOD;
 return true;
}

I think a helper macro would reduce the verbosity of the scaffolding,
like in the attached SELECT_NEXT_METHOD.diff.txt.


Applied.

In addition to the above, I made heavy changes to the tests. I wanted to 
test not just the outcome (SSL, GSSAPI, plaintext, or fail), but also 
the steps and reconnections needed to get there. To facilitate that, I 
rewrote how the expected outcome was represented in the test script. It 
now uses a table-driven approach, with a line for each test iteration, 
ie. for each different combination of options that are tested.


I then added some more logging, so that whenever the server receives an 
SSLRequest or GSSENCRequest packet, it logs a line. That's controlled by 
a new not-in-sample GUC ("trace_connection_negotiation"), intended only 
for the test and debugging. The test scrapes the log for the lines that 
it prints, and the expected output includes a compact trace of expected 
events. For example, the expected output for "user=testuser 
gssencmode=prefer sslmode=prefer sslnegotiation=direct", when GSS and 
SSL are both disabled in the server, looks like this:


# USER  GSSENCMODE  SSLMODE   SSLNEGOTIATION  EVENTS -> OUTCOME
testuserprefer  preferdirect  connect, 
directsslreject, reconnect, sslreject, authok  -> plain


That means, we expect libpq to first try direct SSL, which is rejected 
by the server. It should then reconnect and attempt traditional 
negotiated SSL, which is also rejected. Finally, it should try plaintext 
authentication, without reconnecting, which succeeds.


That actually revealed a couple of slightly bogus behaviors with the 
current code. Here's one example:


# XXX: libpq retries the connection unnecessarily in this case:
nogssuser   require allow connect, gssaccept, authfail, 
reconnect, gssaccept, authfail -> fail


That means, with "gssencmode=require sslmode=allow", if the server 
accepts the GSS encryption but refuses the connection at authentication, 
libpq will reconnect and go through the same motions again. The second 
attempt is pointless, we know it's going to fail. The refactoring to the 
libpq state machine fixed that issue as a side-effect.


I removed the server ssl_enable_alpn and libpq sslalpn options. The idea 
was that they could be useful for testing, but we didn't actually have 
any tests that would use them, and you get the same result by testing 
with an older server or client version. I'm open to adding them back if 
we also add tests that benefit from them, but they were pretty pointless 
as they were.


One important open item now is that we need to register a proper ALPN 
protocol ID with IANA.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Experiments with Postgres and SSL

2024-04-07 Thread Tom Lane
Heikki Linnakangas  writes:
> Committed this. Thank you to everyone involved!

Looks like perlcritic isn't too happy with the test code:
koel and crake say

./src/test/libpq_encryption/t/001_negotiate_encryption.pl: Return value of 
flagged function ignored - chmod at line 138, column 2.  See pages 208,278 of 
PBP.  ([InputOutput::RequireCheckedSyscalls] Severity: 5)
./src/test/libpq_encryption/t/001_negotiate_encryption.pl: Return value of 
flagged function ignored - open at line 184, column 1.  See pages 208,278 of 
PBP.  ([InputOutput::RequireCheckedSyscalls] Severity: 5)


regards, tom lane




Re: Experiments with Postgres and SSL

2024-04-07 Thread Heikki Linnakangas

On 08/04/2024 04:28, Tom Lane wrote:

Heikki Linnakangas  writes:

Committed this. Thank you to everyone involved!


Looks like perlcritic isn't too happy with the test code:
koel and crake say

./src/test/libpq_encryption/t/001_negotiate_encryption.pl: Return value of 
flagged function ignored - chmod at line 138, column 2.  See pages 208,278 of 
PBP.  ([InputOutput::RequireCheckedSyscalls] Severity: 5)
./src/test/libpq_encryption/t/001_negotiate_encryption.pl: Return value of 
flagged function ignored - open at line 184, column 1.  See pages 208,278 of 
PBP.  ([InputOutput::RequireCheckedSyscalls] Severity: 5)


Fixed, thanks.

I'll make a note in my personal TODO list to add perlcritic to cirrus CI 
if possible. I rely heavily on that nowadays to catch issues before the 
buildfarm.


--
Heikki Linnakangas
Neon (https://neon.tech)





Re: Experiments with Postgres and SSL

2024-04-08 Thread Heikki Linnakangas

On 08/04/2024 04:25, Heikki Linnakangas wrote:

One important open item now is that we need to register a proper ALPN
protocol ID with IANA.


I sent a request for that: 
https://mailarchive.ietf.org/arch/msg/tls-reg-review/9LWPzQfOpbc8dTT7vc9ahNeNaiw/


--
Heikki Linnakangas
Neon (https://neon.tech)