Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

Tomas Vondra Mon, 08 Jul 2019 12:30:42 -0700

On Mon, Jul 08, 2019 at 02:39:44PM -0400, Stephen Frost wrote:

Greetings,


* Bruce Momjian (br...@momjian.us) wrote:

On Mon, Jul  8, 2019 at 11:47:33AM -0400, Stephen Frost wrote:
> * Bruce Momjian (br...@momjian.us) wrote:
> > On Mon, Jul  8, 2019 at 11:18:01AM -0400, Joe Conway wrote:
> > > On 7/8/19 10:19 AM, Bruce Momjian wrote:
> > > > When people are asking for multiple keys (not just for key rotation),
> > > > they are asking to have multiple keys that can be supplied by users only
> > > > when they need to access the data.  Yes, the keys are always in the
> > > database, but the feature request is that they are only unlocked when the
> > > > user needs to access the data.  Obviously, that will not work for
> > > > autovacuum when the encryption is at the block level.
> > >
> > > > If the key is always unlocked, there is questionable security value of
> > > > having multiple keys, beyond key rotation.
> > >
> > > That is not true. Having multiple keys also allows you to reduce the
> > > amount of data encrypted with a single key, which is desirable because:
> > >
> > > 1. It makes cryptanalysis more difficult
> > > 2. Puts less data at risk if someone gets "lucky" in doing brute force
> >
> > What systems use multiple keys like that?  I know of no website that
> > does that.  Your arguments seem hypothetical.  What is your goal here?
>
> Not sure what the reference to 'website' is here, but one doesn't get
> certificates for TLS/SSL usage that aren't time-bounded, and when it
> comes to the actual on-the-wire encryption that's used, that's a
> symmetric key that's generated on-the-fly for every connection.
>
> Wouldn't the fact that they generate a different key for every
> connection be a pretty clear indication that it's a good idea to use
> multiple keys and not use the same key over and over..?
>
> Of course, we can discuss if what websites do with over-the-wire
> encryption is sensible to compare to what we want to do in PG for
> data-at-rest, but then we shouldn't be talking about what websites do,
> it'd make more sense to look at other data-at-rest encryption systems
> and consider what they're doing.

(I talked to Joe on chat for clarity.)  In modern TLS, the certificate is
used only for authentication, and Diffie–Hellman is used for key
exchange:

        https://en.wikipedia.org/wiki/Diffie%E2%80%93Hellman_key_exchange
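For readers unfamiliar with the exchange, here is a toy sketch of finite-field Diffie–Hellman. It assumes the tiny textbook parameters p = 23 and g = 5, which are trivially breakable; real TLS 1.3 handshakes use standardized groups or elliptic curves. This shows only the underlying arithmetic, not how any TLS library implements the handshake. Each side keeps a fresh private exponent per connection and sends only g^x mod p, yet both arrive at the same shared secret.

```python
import secrets

# Toy public parameters from the classic textbook example. They are far
# too small to be secure and only illustrate the arithmetic.
P = 23  # public prime modulus
G = 5   # public base (generator)


def public_value(private: int) -> int:
    """Compute G^private mod P, the value each side sends in the clear."""
    return pow(G, private, P)


def shared_secret(their_public: int, my_private: int) -> int:
    """Raise the peer's public value to our own private exponent, mod P."""
    return pow(their_public, my_private, P)


# Each side picks a fresh random private exponent for this connection.
a = secrets.randbelow(P - 2) + 1  # client's private exponent, 1..P-2
b = secrets.randbelow(P - 2) + 1  # server's private exponent, 1..P-2
A, B = public_value(a), public_value(b)  # exchanged over the network

# Both sides compute G^(a*b) mod P without ever sending a or b.
assert shared_secret(B, a) == shared_secret(A, b)
```

Because the private exponents are chosen fresh for every connection, each session derives a different secret.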


Right, and the key that's figured out for each connection is at least
specific to the server AND client keys/certificates, thus meaning that
they're changed at least as frequently as those change (and clients end
up creating ones on the fly randomly if they don't have one, iirc).

So, the question is whether you can pass so much data in TLS that using
the same key for the entire session is a security issue.  TLS originally
had key renegotiation, but that was removed in TLS 1.3:

        https://www.cloudinsidr.com/content/known-attack-vectors-against-tls-implementation-vulnerabilities/
        To mitigate these types of attacks, TLS 1.3 disallows renegotiation.


It was removed due to attacks targeting the renegotiation, not because
doing re-keying by itself was a bad idea, or because using multiple keys
was seen as a bad idea.
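In TLS 1.3, traffic keys can still be updated through the KeyUpdate message even though renegotiation is gone. The sketch below shows the general idea of ratcheting a key forward once a data budget is used up. It assumes HMAC-SHA256 as the one-way step and an illustrative 1 GiB budget; both are hypothetical choices, and TLS 1.3 itself uses its own HKDF-Expand-Label derivation rather than this exact construction.

```python
import hashlib
import hmac

# Hypothetical per-key data budget; a real limit would depend on the cipher
# and mode in use.
REKEY_AFTER_BYTES = 1 << 30  # 1 GiB, illustrative only


def next_key(current_key: bytes) -> bytes:
    """One-way step: derive the next key from the current one."""
    return hmac.new(current_key, b"key update", hashlib.sha256).digest()


class KeyRatchet:
    """Tracks how much data the current key has covered and ratchets the key
    forward once the budget would be exceeded. Two peers that account for
    the same byte counts stay in lockstep without exchanging new keys."""

    def __init__(self, initial_key: bytes, budget: int = REKEY_AFTER_BYTES):
        self.key = initial_key
        self.budget = budget
        self.used = 0        # bytes processed under the current key
        self.generation = 0  # how many times the key has been updated

    def account(self, nbytes: int) -> bytes:
        """Return the key to use for the next nbytes, rotating first if needed."""
        if self.used + nbytes > self.budget:
            self.key = next_key(self.key)
            self.used = 0
            self.generation += 1
        self.used += nbytes
        return self.key
```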

Of course, a database is going to process even more data so if the
amount of data encrypted is a problem, we might have a problem too in
using a single key.  This is not related to whether we use one key for
the entire cluster or multiple keys per tablespace --- the problem is
the same.  I guess we could create 1024 keys and use the bottom bits of
the block number to decide what key to use.  However, that still only
pushes the goalposts farther.
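The 1024-key idea is easy to sketch. Assuming the number of keys is a power of two, the key index is just the block number masked to its low-order bits. The second helper shows why this only moves the goalposts: assuming PostgreSQL's default 8 kB block size, each key still covers a fixed 1/1024 share of the data, so the volume per key keeps growing with the database. The names are hypothetical.

```python
BLOCK_SIZE = 8192  # PostgreSQL's default BLCKSZ
NUM_KEYS = 1024    # must be a power of two for the mask below
KEY_MASK = NUM_KEYS - 1


def key_index_for_block(block_number: int) -> int:
    """Choose one of NUM_KEYS keys from the low-order bits of the block number."""
    return block_number & KEY_MASK


def bytes_per_key(total_bytes: int) -> int:
    """Approximate data encrypted under each key if blocks spread evenly."""
    total_blocks = -(-total_bytes // BLOCK_SIZE)  # ceiling division
    blocks_per_key = -(-total_blocks // NUM_KEYS)
    return blocks_per_key * BLOCK_SIZE


# A 1 TiB cluster: each of the 1024 keys still covers about 1 GiB.
print(bytes_per_key(2 ** 40))  # 1073741824
```

Block numbers restart in every relation, so block 0 of every table would share key 0 under this scheme.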


All of this is about pushing the goalposts farther away, as I see it.
There's going to be trade-offs here and there isn't going to be any "one
right answer" when it comes to this space.  That's why I'm inclined to
argue that we should try to come up with a relatively *good* solution
that doesn't create a huge amount of work for us, and then build on
that.  To that end, leveraging metadata that we already have outside of
the catalogs (databases, tablespaces, potentially other information that
we store, essentially, in the filesystem metadata already) to decide on
what key to use, and how many we can support, strikes me as a good
initial target.
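A minimal sketch of choosing a key from metadata that lives outside the catalogs. It assumes keys are indexed by the tablespace and database OIDs, which already appear in a relation's on-disk location (for example base/<dboid>/<relfilenode> for the default tablespace), so a process such as WAL replay could find the right key without reading pg_class. The class names and the fallback order are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

DEFAULT_TABLESPACE_OID = 1663  # OID of pg_default


@dataclass(frozen=True)
class RelLocation:
    """Hypothetical stand-in for the (tablespace, database, relfilenode)
    triple that identifies a relation's files on disk."""
    spc_oid: int
    db_oid: int
    relfilenode: int


class KeyRing:
    """Hypothetical key ring: per-(tablespace, database) keys with fallbacks."""

    def __init__(self, cluster_key: bytes):
        self.cluster_key = cluster_key
        # A db_oid of None means "any database in this tablespace".
        self.keys: Dict[Tuple[int, Optional[int]], bytes] = {}

    def add(self, spc_oid: int, db_oid: Optional[int], key: bytes) -> None:
        self.keys[(spc_oid, db_oid)] = key

    def key_for(self, loc: RelLocation) -> bytes:
        # Most specific match first, then per-tablespace, then cluster-wide.
        for lookup in ((loc.spc_oid, loc.db_oid), (loc.spc_oid, None)):
            if lookup in self.keys:
                return self.keys[lookup]
        return self.cluster_key
```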

Anyway, I will research the reasonable data size that can be secured
with a single key via AES.  I will look at how PGP encrypts large files
too.


This seems unlikely to lead to a definitive result, but it would be
interesting to hear if there have been studies around that and what
their conclusions were.
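As background to that question, one standard rule of thumb (not a result of the research mentioned above) is the birthday bound: for an n-bit block cipher in modes such as CBC or CTR, an attacker's advantage after q blocks under one key grows roughly as q^2 / 2^(n+1). The sketch below evaluates that approximation for AES's 128-bit blocks. The target advantage is an assumption the user would pick, and other modes (for example AES-GCM) carry their own, different limits.

```python
import math

AES_BLOCK_BITS = 128


def birthday_advantage(num_bytes: int, block_bits: int = AES_BLOCK_BITS) -> float:
    """Approximate q^2 / 2^(n+1), where q is the number of cipher blocks
    encrypted under one key and n is the block size in bits."""
    block_bytes = block_bits // 8
    q = -(-num_bytes // block_bytes)  # ceiling division
    return q * q / 2 ** (block_bits + 1)


def max_bytes_for_advantage(target: float, block_bits: int = AES_BLOCK_BITS) -> int:
    """Largest data volume per key that keeps the approximation <= target."""
    q = math.isqrt(int(target * 2 ** (block_bits + 1)))
    return q * (block_bits // 8)


# With a target advantage of 2^-33, one AES key covers 2^52 bytes (4 PiB).
print(max_bytes_for_advantage(2 ** -33))  # 4503599627370496
```

Spreading the data over more keys, or re-keying, lowers q for each key, which is exactly the goalpost-moving trade-off.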

When it comes to concerns about autovacuum or other system processes,
those don't have any direct user connections or interactions, so having
them be more privileged and having access to more is reasonable.


I think Bruce's proposal was to minimize the time the key is "unlocked"
in memory by only unlocking them when the user connects and supplies
some sort of secret (passphrase), and remove them from memory when the
user disconnects. So there's no way for the auxiliary processes to gain
access to those keys, because only the user knows the secret.
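The unlock-on-connect scheme might look roughly like the following sketch. It assumes the user's passphrase is stretched with PBKDF2-HMAC-SHA256 (Python's hashlib.pbkdf2_hmac) into the key, checked against a stored verifier, kept only while the session is open, and dropped on disconnect. All class and function names are hypothetical, and the iteration count is illustrative.

```python
import hashlib
import hmac
from typing import Dict

PBKDF2_ITERATIONS = 100_000  # illustrative work factor, not a recommendation


def derive_key(passphrase: str, salt: bytes) -> bytes:
    """Stretch the user's passphrase into a 32-byte key."""
    return hashlib.pbkdf2_hmac("sha256", passphrase.encode(), salt, PBKDF2_ITERATIONS)


def make_verifier(key: bytes) -> bytes:
    """A value stored on disk that confirms a key is correct without revealing it."""
    return hmac.new(key, b"key-check", hashlib.sha256).digest()


class SessionKeyCache:
    """Hypothetical: keys exist in memory only while their session is connected."""

    def __init__(self, salt: bytes, verifier: bytes):
        self.salt = salt
        self.verifier = verifier
        self.unlocked: Dict[int, bytes] = {}  # session id -> derived key

    def connect(self, session_id: int, passphrase: str) -> bool:
        key = derive_key(passphrase, self.salt)
        if not hmac.compare_digest(make_verifier(key), self.verifier):
            return False  # wrong passphrase: nothing is unlocked
        self.unlocked[session_id] = key
        return True

    def key_for(self, session_id: int) -> bytes:
        # Auxiliary processes such as autovacuum have no session entry, so
        # this raises KeyError for them: they cannot reach the key.
        return self.unlocked[session_id]

    def disconnect(self, session_id: int) -> None:
        # Dropping the reference is all Python allows; it cannot reliably
        # zero the bytes in memory, which a C implementation would do.
        self.unlocked.pop(session_id, None)
```

A busy application that keeps sessions open would keep these entries populated nearly all the time.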

FWIW I have doubts this scheme actually measurably improves privacy in
practice, because most busy applications will end up having the keys in
memory all the time anyway.

It also assumes memory is unsafe, i.e. bad actors can read it, and
that's probably a valid concern (root access, vulnerabilities etc.). But
in that case we already have plenty of issues with data in flight
anyway, and I doubt TDE is an answer to that.

Ideally, all of this would leverage a vaulting system or other mechanism
which manages access to the keys and allows their usage to be limited.
That's been generally accepted as a good way to bridge the gap between
having to ask users every time for a key and having keys stored
long-term in memory.
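One way to picture the vaulting idea is a broker that hands out keys under time-limited leases, so a backend's hold on a key lapses unless it is renewed. This is a hypothetical sketch, not the API of any particular vault product; the TTL and names are assumptions, and the clock is injectable so the behavior can be tested.

```python
import time
from typing import Callable, Dict, Tuple


class LeasingVault:
    """Hypothetical key broker that grants keys under expiring leases."""

    def __init__(self, keys: Dict[str, bytes], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._keys = keys
        self._ttl = ttl_seconds
        self._clock = clock
        # (holder, key id) -> lease expiry time
        self._leases: Dict[Tuple[str, str], float] = {}

    def lease(self, holder: str, key_id: str) -> bytes:
        """Grant (or renew) a lease and return the key; KeyError if unknown."""
        key = self._keys[key_id]
        self._leases[(holder, key_id)] = self._clock() + self._ttl
        return key

    def is_valid(self, holder: str, key_id: str) -> bool:
        """True while the holder's lease on key_id has not expired."""
        expiry = self._leases.get((holder, key_id))
        return expiry is not None and self._clock() < expiry

    def revoke(self, holder: str) -> None:
        """Drop every lease held by holder, e.g. when a backend exits."""
        for lease_key in [k for k in self._leases if k[0] == holder]:
            del self._leases[lease_key]
```

A returned key cannot be recalled from the caller's memory, so this only works if callers honor their leases; some vault designs instead perform the encryption themselves so the key never leaves the vault.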


Right. I agree with this.

Having *only* the keys for the data which the
currently connected user is allowed to access would certainly be a great
initial capability, even if system processes (including potentially WAL
replay) have to have access to all of the keys.  And yes, shared buffers
being unencrypted and accessible by every backend continues to be an
issue; it'd be great to improve on that situation too.  I don't think
having everything encrypted in shared buffers is likely the solution;
rather, segregating it might make more sense, again, along similar
lines to keys and using metadata that's outside of the catalogs, which
has been discussed previously, though I don't think anyone's actively
working on it.
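Limiting a user backend to the keys for data its role can access, while system processes hold all of them, could be sketched as below. It assumes per-tablespace keys and a precomputed set of tablespaces the role may use; the function and its parameters are hypothetical.

```python
from typing import Dict, Iterable, Set


def keys_for_backend(all_keys: Dict[int, bytes],
                     accessible_tablespaces: Iterable[int],
                     is_system_process: bool) -> Dict[int, bytes]:
    """Return the subset of per-tablespace keys a process should hold.

    System processes (autovacuum, WAL replay, ...) get every key; a user
    backend gets only keys for tablespaces its role may access.
    """
    if is_system_process:
        return dict(all_keys)
    allowed: Set[int] = set(accessible_tablespaces)
    return {spc: key for spc, key in all_keys.items() if spc in allowed}
```

This narrows which keys a user backend holds, but decrypted pages in shared buffers remain visible to every backend.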


I very much doubt TDE is a solution to this. Essentially, TDE is a good
data-at-rest solution, but this seems more like protecting data during
execution. And in that case I think we may need an entirely different
encryption scheme.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com

PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
