Re: resumption problem

2020-03-27 Thread Viktor Dukhovni
On Fri, Mar 27, 2020 at 10:10:16PM +, Jeremy Harris wrote:

> >> A simple code addition to avoid that call in the client case sounds
> >> in order. 
> 
> Testing, it appears to work - I get resumption and not that error.
> And the Exim testsuite shows no regressions, at least on my laptop
> (which is Fedora 31, with 1.1.1d).

On a Fedora 31 system I also don't see those directives in the system
openssl.cnf or includes.  Mind you, closer inspection of the code
suggests that in the config file also "RequestCAPath" and "ClientCAPath"
would result in setting the bidirectional CA list.  But I don't find
those either.


> >>> Another possibility is that your system-wide openssl.cnf file has a
> >>> "RequestCAFile" or "ClientCAFile" setting.
> >>
> >> Neither appears to be present in /etc/pki/tls/openssl.cnf
> > 
> > And neither has any ".include" directives?

So my best guess is that you were testing with approximately a stock
1.1.1 that predates 1.1.1a, modulo security fixes.  Otherwise, it
is unclear how the client CA list (server -> client) ended up being
sent from client -> server.

-- 
Viktor.


Re: resumption problem

2020-03-27 Thread Jeremy Harris
On 27/03/2020 21:52, Viktor Dukhovni wrote:
> On Fri, Mar 27, 2020 at 09:25:28PM +, Jeremy Harris wrote:
> 
>>> If the distro started with 1.1.1 and only backported security fixes, you
>>> could be running an OpenSSL version with the unintentional bidirectional
>>> setting.
>>
>> .. either this, or even an unpatched basic 1.1.1 .
>>
>> A simple code addition to avoid that call in the client case sounds
>> in order. 

Testing, it appears to work - I get resumption and not that error.
And the Exim testsuite shows no regressions, at least on my laptop
(which is Fedora 31, with 1.1.1d).

>>> Another possibility is that your system-wide openssl.cnf file has a
>>> "RequestCAFile" or "ClientCAFile" setting.
>>
>> Neither appears to be present in /etc/pki/tls/openssl.cnf
> 
> And neither has any ".include" directives?

One, but that file doesn't have either string, either.
-- 
Cheers,
  Jeremy


Re: resumption problem

2020-03-27 Thread Viktor Dukhovni
On Fri, Mar 27, 2020 at 09:25:28PM +, Jeremy Harris wrote:

> > If the distro started with 1.1.1 and only backported security fixes, you
> > could be running an OpenSSL version with the unintentional bidirectional
> > setting.
> 
> .. either this, or even an unpatched basic 1.1.1 .
> 
> A simple code addition to avoid that call in the client case sounds
> in order.  Would the above likely explain the error I'm getting?

You could explicitly set the client CA list to an empty stack,
as a final step in initializing the SSL_CTX:

SSL_CTX_set0_CA_list(ctx, NULL);

> > Another possibility is that your system-wide openssl.cnf file has a
> > "RequestCAFile" or "ClientCAFile" setting.
> 
> Neither appears to be present in /etc/pki/tls/openssl.cnf

And neither has any ".include" directives?

-- 
Viktor.


Re: resumption problem

2020-03-27 Thread Jeremy Harris
On 27/03/2020 21:07, Viktor Dukhovni wrote:
> That function should only affect the server -> client direction.
> Briefly, in OpenSSL 1.1.1 it affected both the client and server
> directions, but this was fixed in OpenSSL 1.1.1a.

If Centos is following the same pattern in 8 as they did in 7,
they do list the letter when there is one; I have a 7 system
claiming "1.0.2k-fips".  So:

> If the distro started with 1.1.1 and only backported security fixes, you
> could be running an OpenSSL version with the unintentional bidirectional
> setting.

.. either this, or even an unpatched basic 1.1.1 .

A simple code addition to avoid that call in the client case sounds
in order.  Would the above likely explain the error I'm getting?


> Another possibility is that your system-wide openssl.cnf file has a
> "RequestCAFile" or "ClientCAFile" setting.

Neither appears to be present in /etc/pki/tls/openssl.cnf
-- 
Cheers,
  Jeremy


Re: resumption problem

2020-03-27 Thread Viktor Dukhovni
On Fri, Mar 27, 2020 at 08:20:55PM +, Jeremy Harris wrote:

> > Right, you're running out of space by trying to send too many
> > CA names.  It is better to have this fail, so you can figure
> > what is trying to dump your entire trusted CA list (of names)
> > to the server, than to actually have that silently "work".
> 
> It's /etc/pki/tls/certs/ca-bundle.crt (on Centos 8) and
> it's being loaded using SSL_CTX_set_client_CA_list()
> into the client, supposedly to be the acceptable CAs
> for the cert sent by the server.

That function should only affect the server -> client direction.
Briefly, in OpenSSL 1.1.1 it affected both the client and server
directions, but this was fixed in OpenSSL 1.1.1a.

If the distro started with 1.1.1 and only backported security fixes, you
could be running an OpenSSL version with the unintentional bidirectional
setting.

Another possibility is that your system-wide openssl.cnf file has a
"RequestCAFile" or "ClientCAFile" setting.  Both unfortunately set not
only the server->client direction, but also the client->server
direction.  This may be a bug (oversight), the second of these
should not touch the bidirectional list.

  "ssl/ssl_conf.c" line 500:

static int cmd_RequestCAFile(SSL_CONF_CTX *cctx, const char *value)
{
if (cctx->canames == NULL)
cctx->canames = sk_X509_NAME_new_null();
if (cctx->canames == NULL)
return 0;
return SSL_add_file_cert_subjects_to_stack(cctx->canames, value);
}

static int cmd_ClientCAFile(SSL_CONF_CTX *cctx, const char *value)
{
return cmd_RequestCAFile(cctx, value);
}

  "ssl/ssl_conf.c" line 883:

int SSL_CONF_CTX_finish(SSL_CONF_CTX *cctx)
{
...

if (cctx->canames) {
if (cctx->ssl)
SSL_set0_CA_list(cctx->ssl, cctx->canames);
else if (cctx->ctx)
SSL_CTX_set0_CA_list(cctx->ctx, cctx->canames);
else
sk_X509_NAME_pop_free(cctx->canames, X509_NAME_free);
cctx->canames = NULL;
}
return 1;
}

Can you confirm whether any such directive is present in your
system-wide openssl.cnf (possibly via .include)?

> >  Perhaps your "openssl.cnf" (Linux distro mistake?) causes the
> >  damage for all applications even without explicit code to that end
> >  in Exim?  Or you're calling something to make it happen.
> 
> The Exim config is saying to use that file, so I do have a workaround
> point there.

It is actually not easy to avoid using the system-wide config.
It is loaded by default during library initialization, and
you need custom initialization calls to avoid loading it.

> However, the question is: what is [1] the proper fix?  We want to
> accept "the usual, pretty big, set of well-known CAs".  Is that an
> incorrect way of telling OpenSSL that list?

It makes some sense to send a CA list to clients, but ideally one that
is not too large.  In Postfix I recommend a small or no CAfile (from
which the CAlist is initialized), and the bulk of the trust store
configured via a CApath (which is not used to send CA names to clients).

Most clients (real MTAs, not random Java apps) handle an empty CA list
just fine, and use the (at most) one client cert they have.

> And how come it only fails under resumption and when a client-cert is
> also used?

Here, I am not sure, but perhaps this is because the extension is only
valid with TLS 1.3, and on resumption we already know the server
protocol version.  On initial handshakes the protocol might be
TLS 1.2, and perhaps as a result the list is not sent???

> 1]  We also need to code to be compatible across a wide range
> of OpenSSL versions.  Minimising #ifdeffery is a factor.

Understood, I'm in the same boat with Postfix.

-- 
Viktor.


Re: resumption problem

2020-03-27 Thread Jeremy Harris
On 26/03/2020 00:58, Viktor Dukhovni wrote:
> On Thu, Mar 26, 2020 at 12:40:08AM +, Jeremy Harris wrote:
> 
>> Looks like I'm wrong, from the behaviour.
>>
>> It's the second of the possible places, and "i" is 129.
>> It appears to be failing the   WPACKET_sub_allocate_bytes_u16()
>> call.  %rsi before the call, which I think should be
>> the "namelen" arg, has value 172.
> 
> Right, you're running out of space by trying to send too many
> CA names.  It is better to have this fail, so you can figure
> what is trying to dump your entire trusted CA list (of names)
> to the server, than to actually have that silently "work".

It's /etc/pki/tls/certs/ca-bundle.crt (on Centos 8) and
it's being loaded using SSL_CTX_set_client_CA_list()
into the client, supposedly to be the acceptable CAs
for the cert sent by the server.

The load call doesn't have a return value to check. The
list presented had 133 entries.

>  Perhaps your
> "openssl.cnf" (Linux distro mistake?) causes the damage
> for all applications even without explicit code to that
> end in Exim?  Or you're calling something to make it happen.

The Exim config is saying to use that file, so I do have a
workaround point there.  However, the question is: what is [1] the
proper fix?  We want to accept "the usual, pretty big, set of
well-known CAs".  Is that an incorrect way of telling OpenSSL
that list?  And how come it only fails under resumption and
when a client-cert is also used?

1]  We also need to code to be compatible across a wide range
of OpenSSL versions.  Minimising #ifdeffery is a factor.
-- 
Cheers,
  Jeremy


Re: Certificate subject match validation

2020-03-27 Thread Viktor Dukhovni
On Fri, Mar 27, 2020 at 07:38:35PM +0200, George-Theodor Serbana wrote:

> I am writing a SSL/TLS client (using Boost.Beast but underlying it's using
> OpenSSL) and although I have set on the SSL context the 'verify_peer' flag,
> there is no verification to prove the server presents an X509 which
> contains in the Subject Alternative Names the hostname of that server.
> 
> As this is probably the dumbest type of attack someone could do (using a
> valid certificate with another domain name), I am thinking I'm doing
> something wrong. But from the documentation, I saw that using "verify_peer"
> should perform all the verifications...

It verifies the trust chain.  To also verify the peer name, you need to
specify the peer name via:

SSL_set1_host() 

> Now if not even this simple check is being done, how about expiration of
> the certificate, revocation status and other checks? Should they be
> performed manually as well?

No, that's what VERIFY_PEER is for.

> For now I am using X509_VERIFY_PARAM_set1_host with SSL_CTX_set1_param to
> do this specific check.

That's the slightly less convenient legacy API from OpenSSL 1.0.2.
In 1.1.0 and later, you can use SSL_set1_host() (and in some
cases also SSL_add1_host()).

See the SSL_set1_host(3) manpage for details.

-- 
Viktor.


Certificate subject match validation

2020-03-27 Thread George-Theodor Serbana
I am writing a SSL/TLS client (using Boost.Beast but underlying it's using
OpenSSL) and although I have set on the SSL context the 'verify_peer' flag,
there is no verification to prove the server presents an X509 which
contains in the Subject Alternative Names the hostname of that server.

As this is probably the dumbest type of attack someone could do (using a
valid certificate with another domain name), I am thinking I'm doing
something wrong. But from the documentation, I saw that using "verify_peer"
should perform all the verifications...

Now if not even this simple check is being done, how about expiration of
the certificate, revocation status and other checks? Should they be
performed manually as well?

For now I am using X509_VERIFY_PARAM_set1_host with SSL_CTX_set1_param to
do this specific check.

Best regards,
Theodor


Re: Explicit thread cleanup in OpenSSL 1.1.1 possible?

2020-03-27 Thread Stephan Mühlstrasser

Hello Michael,

Am 27.03.20 um 15:46 schrieb Michael Wojcik:
As a workaround, what about first making a JNI call to a trivial shared 
object that does an explicit dlopen of the OpenSSL shared object? The 
JVM wouldn't know about that load, and its subsequent unload of the 
shared object wouldn't remove it from the process address space, because 
there would still be a reference to it.


thanks for the suggestion. This sounds similar to the trick that is 
already built into OpenSSL.


Through other channels I found now two other possible solutions for the 
problem:


1) Use JNI_OnUnload() to call OPENSSL_cleanup()

JNI_OnUnload() is called when the JVM unloads the JNI library. 
OPENSSL_cleanup() cleans up everything and no further per-thread cleanup 
happens after unloading the JNI shared library.


2) Use a C++ static object in the JNI library where the destructor calls 
OPENSSL_cleanup()


When debugging the problem I came across this neat trick on stackoverflow:

https://stackoverflow.com/a/11394263

class ResourceHolder {
public:
ResourceHolder() {
// at start
}

~ResourceHolder() {
// at exit
}
};

ResourceHolder theHolder;

I think that putting a call to OPENSSL_cleanup() into the destructor of 
the ResourceHolder class would also fix the problem, and it would be 
Java-agnostic.


Approach 1) does fix the crash for me. If approach 2) works as well, I 
will use that one.


--
Stephan



Re: Explicit thread cleanup in OpenSSL 1.1.1 possible?

2020-03-27 Thread Michael Wojcik
As a workaround, what about first making a JNI call to a trivial shared object 
that does an explicit dlopen of the OpenSSL shared object? The JVM wouldn't 
know about that load, and its subsequent unload of the shared object wouldn't 
remove it from the process address space, because there would still be a 
reference to it.


Explicit thread cleanup in OpenSSL 1.1.1 possible?

2020-03-27 Thread Stephan Mühlstrasser

Hi,

with OpenSSL 1.1.1 it is possible to turn off the automatic cleanup with 
an atexit() handler by passing the flag OPENSSL_INIT_NO_ATEXIT to 
OPENSSL_init_crypto().


Is it possible to configure this also at the thread level, so that no 
automatic thread cleanup occurs, with the option to do an explicit 
per-thread cleanup? I looked at documentation and source code of OpenSSL 
1.1.1 and this seems not to be possible, but I wanted to ask nevertheless.


Background: We are embedding OpenSSL in a Java JNI library, and one 
particular JVM (IBM J9 on z/OS) loads our Java class and the JNI library 
in a dedicated thread. At the end of the program the JVM then unloads 
the the Java class and the JNI library before the thread terminates, and 
when the OpenSSL thread cleanup functions are called the JVM crashes 
because the shared library is gone from the address space. This happens 
because the pinning of the OpenSSL shared library does not work on z/OS.


--
Stephan