Re: Moving forward with TDE [PATCH v3]

2024-02-01 Thread vignesh C
On Mon, 22 Jan 2024 at 11:47, Peter Smith  wrote:
>
> 2024-01 Commitfest.
>
> Hi, This patch has a CF status of "Needs Review" [1], but it seems
> there were CFbot test failures last time it was run [2]. Please have a
> look and post an updated version if necessary.

The patch which you submitted has been awaiting your attention for
quite some time now.  As such, we have moved it to "Returned with
Feedback" and removed it from the reviewing queue. Depending on
timing, this may be reversible.  Kindly address the feedback you have
received, and resubmit the patch to the next CommitFest.

Regards,
Vignesh




Re: Moving forward with TDE [PATCH v3]

2024-01-21 Thread Peter Smith
2024-01 Commitfest.

Hi, This patch has a CF status of "Needs Review" [1], but it seems
there were CFbot test failures last time it was run [2]. Please have a
look and post an updated version if necessary.

==
[1] https://commitfest.postgresql.org/46/3985/
[2] https://cirrus-ci.com/task/5498215743619072

Kind Regards,
Peter Smith.




Re: Moving forward with TDE [PATCH v3]

2023-11-08 Thread David Christensen
On Tue, Nov 7, 2023 at 5:49 PM Andres Freund  wrote:

> Hi,
>
> On 2023-11-06 09:56:37 -0500, Stephen Frost wrote:
> > * Andres Freund (and...@anarazel.de) wrote:
> > > I still am quite quite unconvinced that using the LSN as a nonce is a
> good
> > > design decision.
> >
> > This is a really important part of the overall path to moving this
> > forward, so I wanted to jump to it and have a specific discussion
> > around this.  I agree that there are downsides to using the LSN, some of
> > which we could possibly address (eg: include the timeline ID in the IV),
> > but others that would be harder to deal with.
>
> > The question then is- what's the alternative?
> >
> > One approach would be to change the page format to include space for an
> > explicit nonce.  I don't see the community accepting such a new page
> > format as the only format we support though as that would mean no
> > pg_upgrade support along with wasted space if TDE isn't being used.
>
> Right.
>

Hmm, if we /were/ to introduce some sort of page format change, Couldn't
that be a use case for modifying the pd_version field?  Could v4 pages be
read in and written out as v5 pages with different interpretations?


> > Ideally, we'd instead be able to support multiple page formats where
> > users could decide when they create their cluster what features they
> > want- and luckily we even have such an effort underway with patches
> > posted for review [1].
>
> I think there are some details wrong with that patch - IMO the existing
> macros
> should just continue to work as-is and instead the places that want the
> more
> narrow definition should be moved to the new macros and it changes places
> that
> should continue to use compile time constants - but it doesn't seem like a
> fundamentally bad idea to me.  I certainly like it much better than making
> the
> page size runtime configurable.
>

There had been some discussion about this WRT renaming macros and the like
(constants, etc)—I think a new pass eliminating the variable blocksize
pieces and seeing if we can minimize churn here is worthwhile, will take a
look and see what the minimally-viable set of changes is here.


> (I'll try to reply with the above points to [1])
>
>
> > Certainly, with the base page-special-feature patch, we could have an
> option
> > for users to choose that they want a better nonce than the LSN, or we
> could
> > bundle that assumption in with, say, the authenticated-encryption feature
> > (if you want authenticated encryption, then you want more from the
> > encryption system than the basics, and therefore we presume you also
> want a
> > better nonce than the LSN).
>
> I don't think we should support using the LSN as a nonce if we have an
> alternative. The cost and complexity overhead is just not worth it.  Yes,
> it'll be harder for users to migrate to encryption, but adding complexity
> elsewhere in the system to get an inferior result isn't worth it.
>

>From my read, XTS (which I'd see as inferior to authenticated encryption,
but better than some other options) could use LSN as an IV without leakage
concerns, perhaps mixing in the BlockNumber as well.  If we are going to
allow multiple encryption types, I think we may need to consider that needs
for IVs may be different, so this may need to be something that is
selectable per encryption type.

I am unclear how much of a requirement this is, but seems like having a
design supporting this to be pluggable—even if a static lookup table
internally for encryption type, block length, IV source, etc—seems the most
future proof if we had to retire an encryption method or prevent creation
of specific methods, say.


> > Another approach would be a separate fork, but that then has a number of
> > downsides too- every write has to touch that too, and a single page of
> > nonces would cover a pretty large number of pages also.
>
> Yea, the costs of doing so is nontrivial. If you were trying to implement
> encryption on the smgr level - which I doubt you should but am not certain
> about - my suggestion would be to interleave pages with metadata like
> nonces
> and AEAD with the data pages. I.e. one metadata page would be followed by
>   (BLCKSZ - SizeOfPageHeaderData) / (sizeof(nonce) + sizeof(AEAD))
> pages containing actual relation data.  That way you still get decent
> locality
> during scans and writes.
>

Hmm, this is actually an interesting idea, I will think about this a bit.


> Relation forks were a mistake, we shouldn't use them in more places.
>
>
> I think it'd be much better if we also encrypted forks, rather than just
> the
> main fork...
>

I believe the existing code should just work by modifying
the PageNeedsToBeEncrypted macro; I will test that and see if anything
blows up.

David


Re: Moving forward with TDE [PATCH v3]

2023-11-08 Thread David Christensen
On Tue, Nov 7, 2023 at 6:47 PM Andres Freund  wrote:

> Hi,
>
> On 2023-11-06 11:26:44 +0100, Matthias van de Meent wrote:
> > On Sat, 4 Nov 2023 at 03:38, Andres Freund  wrote:
> > > On 2023-11-02 22:09:40 +0100, Matthias van de Meent wrote:
> > > > I'm quite surprised at the significant number of changes being made
> > > > outside the core storage manager files. I thought that changing out
> > > > mdsmgr with an encrypted smgr (that could wrap mdsmgr if so desired)
> > > > would be the most obvious change to implement cluster-wide encryption
> > > > with the least code touched, as relations don't need to know whether
> > > > the files they're writing are encrypted, right? Is there a reason to
> > > > not implement this at the smgr level that I overlooked in the
> > > > documentation of these patches?
> > >
> > > You can't really implement encryption transparently inside an smgr
> without
> > > significant downsides. You need a way to store an initialization vector
> > > associated with the page (or you can store that elsewhere, but then
> you've
> > > doubled the worst cse amount of random reads/writes). The patch uses
> the LSN
> > > as the IV (which I doubt is a good idea). For authenticated encryption
> further
> > > additional storage space is required.
> >
> > I am unaware of any user of the smgr API that doesn't also use the
> > buffer cache, and thus implicitly the Page layout with PageHeader
> > [^1]
>
> Everything indeed uses a PageHeader - but there are a number of places
> that do
> *not* utilize pd_lower/upper/special. E.g. visibilitymap.c just assumes
> that
> those fields are zero - and changing that wouldn't be trivial / free,
> because
> we do a lot of bitmasking/shifting with constants derived from
>
> #define MAPSIZE (BLCKSZ - MAXALIGN(SizeOfPageHeaderData))
>
> which obviously wouldn't be constant anymore if you could reserve space on
> the
> page.
>

While not constants, I was able to get this working with variable values
here in a way that did not have the overhead of the original patch for the
vismap specifically using Montgomery Multiplication for division/mod. This
was actually the heaviest of the changes from moving to runtime-calculated,
so we might be able to use this approach in this specific case even if only
this change is required for this specific fork.


> > The API of smgr is also tailored to page-sized quanta of data
> > with mostly relation-level information. I don't see why there would be
> > a veil covering the layout of Page for smgr when all other information
> > already points to the use of PageHeader and Page layouts. In my view,
> > it would even make sense to allow the smgr to get exclusive access to
> > some part of the page in the current Page layout.
> >
> > Yes, I agree that there will be an impact on usable page size if you
> > want authenticated encryption, and that AMs will indeed need to
> > account for storage space now being used by the smgr - inconvenient,
> > but it serves a purpose. That would happen regardless of whether smgr
> > or some higher system decides where to store the data for encryption -
> > as long as it is on the page, the AM effectively can't use those
> > bytes.
> > But I'd say that's best solved by making the Page documentation and
> > PageInit API explicit about the potential use of that space by the
> > chosen storage method (encrypted, plain, ...) instead of requiring the
> > various AMs to manually consider encryption when using Postgres' APIs
> > for writing data to disk without hitting shared buffers; page space
> > management is already a task of AMs, but handling the actual
> > encryption is not.
>
> I don't particularly disagree with any detail here - but to me reserving
> space
> for nonces etc at PageInit() time pretty much is the opposite of handling
> encryption inside smgr.
>

Originally, I was anticipating that we might want different space amounts
reserved on different classes of pages (apart from encryption), so while
we'd be storing the default page reserved size in pg_control we'd not be
limited to this in the structure of the page calls.  We could presumably
just move the logic into PageInit() itself if every reserved allocation is
the same and individual call sites wouldn't need to know about it.  The
call sites do have more context as to the requirements of the page or the
"type" of page in play, which if we made it dependent on page type would
need to get passed in somehow, which was where the reserved_page_size
parameter came in to the current patch.

>
> > Should the AM really care whether the data on disk is encrypted or
> > not? I don't think so. When the disk contains encrypted bytes, but
> > smgrread() and smgrwrite() both produce and accept plaintext data,
> > who's going to complain? Requiring AMs to be mindful about encryption
> > on all common paths only adds pitfalls where encryption would be
> > forgotten by the developer of AMs in one path or another.
>
> I agree with that - I think the way the 

Re: Moving forward with TDE [PATCH v3]

2023-11-07 Thread Andres Freund
Hi,

On 2023-11-06 11:26:44 +0100, Matthias van de Meent wrote:
> On Sat, 4 Nov 2023 at 03:38, Andres Freund  wrote:
> > On 2023-11-02 22:09:40 +0100, Matthias van de Meent wrote:
> > > I'm quite surprised at the significant number of changes being made
> > > outside the core storage manager files. I thought that changing out
> > > mdsmgr with an encrypted smgr (that could wrap mdsmgr if so desired)
> > > would be the most obvious change to implement cluster-wide encryption
> > > with the least code touched, as relations don't need to know whether
> > > the files they're writing are encrypted, right? Is there a reason to
> > > not implement this at the smgr level that I overlooked in the
> > > documentation of these patches?
> >
> > You can't really implement encryption transparently inside an smgr without
> > significant downsides. You need a way to store an initialization vector
> > associated with the page (or you can store that elsewhere, but then you've
> > doubled the worst cse amount of random reads/writes). The patch uses the LSN
> > as the IV (which I doubt is a good idea). For authenticated encryption 
> > further
> > additional storage space is required.
> 
> I am unaware of any user of the smgr API that doesn't also use the
> buffer cache, and thus implicitly the Page layout with PageHeader
> [^1]

Everything indeed uses a PageHeader - but there are a number of places that do
*not* utilize pd_lower/upper/special. E.g. visibilitymap.c just assumes that
those fields are zero - and changing that wouldn't be trivial / free, because
we do a lot of bitmasking/shifting with constants derived from

#define MAPSIZE (BLCKSZ - MAXALIGN(SizeOfPageHeaderData))

which obviously wouldn't be constant anymore if you could reserve space on the
page.


> The API of smgr is also tailored to page-sized quanta of data
> with mostly relation-level information. I don't see why there would be
> a veil covering the layout of Page for smgr when all other information
> already points to the use of PageHeader and Page layouts. In my view,
> it would even make sense to allow the smgr to get exclusive access to
> some part of the page in the current Page layout.
> 
> Yes, I agree that there will be an impact on usable page size if you
> want authenticated encryption, and that AMs will indeed need to
> account for storage space now being used by the smgr - inconvenient,
> but it serves a purpose. That would happen regardless of whether smgr
> or some higher system decides where to store the data for encryption -
> as long as it is on the page, the AM effectively can't use those
> bytes.
> But I'd say that's best solved by making the Page documentation and
> PageInit API explicit about the potential use of that space by the
> chosen storage method (encrypted, plain, ...) instead of requiring the
> various AMs to manually consider encryption when using Postgres' APIs
> for writing data to disk without hitting shared buffers; page space
> management is already a task of AMs, but handling the actual
> encryption is not.

I don't particularly disagree with any detail here - but to me reserving space
for nonces etc at PageInit() time pretty much is the opposite of handling
encryption inside smgr.


> Should the AM really care whether the data on disk is encrypted or
> not? I don't think so. When the disk contains encrypted bytes, but
> smgrread() and smgrwrite() both produce and accept plaintext data,
> who's going to complain? Requiring AMs to be mindful about encryption
> on all common paths only adds pitfalls where encryption would be
> forgotten by the developer of AMs in one path or another.

I agree with that - I think the way the patch currently is designed is not
right.


There's other stuff you can't trivially do at the smgr level. E.g. if
checksums or encryption is enabled, you need to copy the buffer to compute
checksums / do IO if in shared buffers, because somebody could set a hint bit
even with just a shared content lock. But you don't need that when coming from
private buffers during index builds.



> I think that getting PageInit to allocate the smgr-specific area would
> take some effort, too (which would potentially require adding some
> relational context to PageInit, so that it knows which page of which
> relation it is going to initialize), but IMHO that would be more
> natural than requiring all index and table AMs to be aware the actual
> encryption of its pages and require manual handling of that encryption
> when the page needs to be written to disk, when it otherwise already
> conforms to the various buffer management and file extension APIs
> currently in use in PostgreSQL. I would expect "transparent" data
> encryption to be handled at the file write layer (i.e. smgr), not
> inside the AMs.

As mentioned above - I agree that the relevant code shouldn't be in index
AMs. But I somewhat doubt that smgr is the right level either. For one, the
place computing checksums needs awareness of locking / sharing 

Re: Moving forward with TDE [PATCH v3]

2023-11-07 Thread Andres Freund
Hi,

On 2023-11-06 09:56:37 -0500, Stephen Frost wrote:
> * Andres Freund (and...@anarazel.de) wrote:
> > I still am quite quite unconvinced that using the LSN as a nonce is a good
> > design decision.
> 
> This is a really important part of the overall path to moving this
> forward, so I wanted to jump to it and have a specific discussion
> around this.  I agree that there are downsides to using the LSN, some of
> which we could possibly address (eg: include the timeline ID in the IV),
> but others that would be harder to deal with.

> The question then is- what's the alternative?
> 
> One approach would be to change the page format to include space for an
> explicit nonce.  I don't see the community accepting such a new page
> format as the only format we support though as that would mean no
> pg_upgrade support along with wasted space if TDE isn't being used.

Right.


> Ideally, we'd instead be able to support multiple page formats where
> users could decide when they create their cluster what features they
> want- and luckily we even have such an effort underway with patches
> posted for review [1].

I think there are some details wrong with that patch - IMO the existing macros
should just continue to work as-is and instead the places that want the more
narrow definition should be moved to the new macros and it changes places that
should continue to use compile time constants - but it doesn't seem like a
fundamentally bad idea to me.  I certainly like it much better than making the
page size runtime configurable.

(I'll try to reply with the above points to [1])


> Certainly, with the base page-special-feature patch, we could have an option
> for users to choose that they want a better nonce than the LSN, or we could
> bundle that assumption in with, say, the authenticated-encryption feature
> (if you want authenticated encryption, then you want more from the
> encryption system than the basics, and therefore we presume you also want a
> better nonce than the LSN).

I don't think we should support using the LSN as a nonce if we have an
alternative. The cost and complexity overhead is just not worth it.  Yes,
it'll be harder for users to migrate to encryption, but adding complexity
elsewhere in the system to get an inferior result isn't worth it.


> Another approach would be a separate fork, but that then has a number of
> downsides too- every write has to touch that too, and a single page of
> nonces would cover a pretty large number of pages also.

Yea, the costs of doing so is nontrivial. If you were trying to implement
encryption on the smgr level - which I doubt you should but am not certain
about - my suggestion would be to interleave pages with metadata like nonces
and AEAD with the data pages. I.e. one metadata page would be followed by
  (BLCKSZ - SizeOfPageHeaderData) / (sizeof(nonce) + sizeof(AEAD))
pages containing actual relation data.  That way you still get decent locality
during scans and writes.

Relation forks were a mistake, we shouldn't use them in more places.


I think it'd be much better if we also encrypted forks, rather than just the
main fork...

Greetings,

Andres Freund




Re: Moving forward with TDE [PATCH v3]

2023-11-07 Thread Stephen Frost
Greetings,

* Bruce Momjian (br...@momjian.us) wrote:
> On Mon, Nov  6, 2023 at 09:56:37AM -0500, Stephen Frost wrote:
> > The gist is, without a suggestion of things to try, we're left
> > to our own devices to try and figure out things which might be
> > successful, only to have those turned down too when we come back with
> > them, see [1] for what feels like an example of this.  Given your
> > feedback overall, which I'm very thankful for, I'm hopeful that you see
> > that this is, indeed, a useful feature that people are asking for and
> > therefore are willing to spend some time on it, but if the feedback is
> > that nothing on the page is acceptable to use for the nonce, we can't
> > put the nonce somewhere else, and we can't change the page format, then
> > everything else is just moving deck chairs around on the titanic that
> > has been this effort.
> 
> Yeah, I know the feeling, though I thought XTS was immune enough to
> nonce/LSN reuse that it was acceptable.

Ultimately it depends on the attack vector you're trying to address, but
generally, I think you're right about the XTS tweak reuse not being that
big of a deal.  XTS isn't CTR or GCM.

With FDE (full disk encryption) you're expecting the attacker to steal
the physical laptop, hard drive, etc, generally, and so the downside of
using the same tweak with XTS over and over again isn't that bad (and is
what most FDE solutions do, aiui, by simply using the sector number; we
could do something similar to that by using the relfilenode + block
number) because that re-use is a problem if the attacker is able to see
multiple copies of the same block over time where the block has been
encrypted with different data but the same key and tweak.

Using the LSN actually is better than what the FDE solutions do because
the LSN varies much more often than the sector number.  Sure, it doesn't
change with every write and maybe an attacker could glean something from
that, but that's quite narrow.  The downside from the LSN based approach
with XTS is probably more that using the LSN means that we can't encrypt
the LSN itself and that is a leak too- but then again, we leak that
through the simple WAL filenames too, to some extent, so it doesn't
strike me as a huge issue.

XTS as a block cipher doesn't suffer from the IV-reuse issue that you
have with streaming ciphers where the same key+IV and different data
leads to being able to trivally retrive the plaintext though and I worry
that maybe that's what people were thinking.

The README and comments I don't think were terribly clear on this and I
think may have even been from back when CTR was being considered, where
IV reuse would have resulted in plaintext being trivially available.

> What got me sunk on the feature was the complexity of adding temporary
> file encryption support and that tipped the scales in the negative for
> me in community value of the feature vs. added complexity. (Yeah, I used
> a Titanic reference in the last sentence. ;-) )  However, I am open to
> the community value and complexity values changing over time.  My blog
> post on the topic:

We do need to address the temporary file situation too and we do have a
bit of an issue that how we deal with temporary files today in PG isn't
very consistent and there's too many ways to do that.  There's a patch
that works on that, though it has some bitrot that we're working on
addressing currently.

There is value in simply fixing that situation wrt temporary file
management independent of encryption, though of course then encryption
of those temporary files becomes much simpler.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Moving forward with TDE [PATCH v3]

2023-11-06 Thread David Christensen
On Fri, Nov 3, 2023 at 9:53 PM Andres Freund  wrote:

> On 2023-11-02 19:32:28 -0700, Andres Freund wrote:
> > > From 327e86d52be1df8de9c3a324cb06b85ba5db9604 Mon Sep 17 00:00:00 2001
> > > From: David Christensen 
> > > Date: Fri, 29 Sep 2023 15:16:00 -0400
> > > Subject: [PATCH v3 5/5] Add encrypted/authenticated WAL
> > >
> > > When using an encrypted cluster, we need to ensure that the WAL is also
> > > encrypted. While we could go with an page-based approach, we use
> instead a
> > > per-record approach, using GCM for the encryption method and storing
> the AuthTag
> > > in the xl_crc field.
>
> What was the reason for this decision?
>

This was mainly to prevent IV reuse by using a per-record encryption rather
than per-page, since partial writes out on the WAL buffer would result in
reuse there.  This was somewhat of an experiment since authenticated data
per record was basically equivalent in function to the CRC.

There was a switch here so normal clusters use the crc field with the
existing CRC implementation, only encrypted clusters use this alternate
approach.


Re: Moving forward with TDE [PATCH v3]

2023-11-06 Thread David Christensen
Hi, thanks for the detailed feedback here.

I do think it's worth addressing the question Stephen raised as far as what
we use for the IV[1]; whether LSN or something else entirely, and if so
what.  The choice of LSN here is fairly fundamental to the existing
implementation, so if we decide to do something different some of this
might be moot.

Best,

David

[1]
https://www.mail-archive.com/pgsql-hackers@lists.postgresql.org/msg154613.html


Re: Moving forward with TDE [PATCH v3]

2023-11-06 Thread Bruce Momjian
On Mon, Nov  6, 2023 at 09:56:37AM -0500, Stephen Frost wrote:
> The gist is, without a suggestion of things to try, we're left
> to our own devices to try and figure out things which might be
> successful, only to have those turned down too when we come back with
> them, see [1] for what feels like an example of this.  Given your
> feedback overall, which I'm very thankful for, I'm hopeful that you see
> that this is, indeed, a useful feature that people are asking for and
> therefore are willing to spend some time on it, but if the feedback is
> that nothing on the page is acceptable to use for the nonce, we can't
> put the nonce somewhere else, and we can't change the page format, then
> everything else is just moving deck chairs around on the titanic that
> has been this effort.

Yeah, I know the feeling, though I thought XTS was immune enough to
nonce/LSN reuse that it was acceptable.

What got me sunk on the feature was the complexity of adding temporary
file encryption support and that tipped the scales in the negative for
me in community value of the feature vs. added complexity. (Yeah, I used
a Titanic reference in the last sentence. ;-) )  However, I am open to
the community value and complexity values changing over time.  My blog
post on the topic:

https://momjian.us/main/blogs/pgblog/2023.html#October_19_2023

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Moving forward with TDE [PATCH v3]

2023-11-06 Thread Bruce Momjian
On Thu, Nov  2, 2023 at 07:32:28PM -0700, Andres Freund wrote:
> On 2023-10-31 16:23:17 -0500, David Christensen wrote:
> > +Implementation
> > +--
> > +
> > +To enable cluster file encryption, the initdb option
> > +--cluster-key-command must be used, which specifies a command to
> > +retrieve the KEK.
> 
> FWIW, I think "cluster file encryption" is somewhat ambiguous. That could also
> mean encrypting on the file system level or such.

We could call it:

*  cluster data file encryption
*  cluster data encryption
*  database cluster encryption

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Moving forward with TDE [PATCH v3]

2023-11-06 Thread Stephen Frost
Greetings,

Thanks for your feedback on this.

* Andres Freund (and...@anarazel.de) wrote:
> I still am quite quite unconvinced that using the LSN as a nonce is a good
> design decision.

This is a really important part of the overall path to moving this
forward, so I wanted to jump to it and have a specific discussion
around this.  I agree that there are downsides to using the LSN, some of
which we could possibly address (eg: include the timeline ID in the IV),
but others that would be harder to deal with.

The question then is- what's the alternative?

One approach would be to change the page format to include space for an
explicit nonce.  I don't see the community accepting such a new page
format as the only format we support though as that would mean no
pg_upgrade support along with wasted space if TDE isn't being used.
Ideally, we'd instead be able to support multiple page formats where
users could decide when they create their cluster what features they
want- and luckily we even have such an effort underway with patches
posted for review [1].  Certainly, with the base page-special-feature
patch, we could have an option for users to choose that they want a
better nonce than the LSN, or we could bundle that assumption in with,
say, the authenticated-encryption feature (if you want authenticated
encryption, then you want more from the encryption system than the
basics, and therefore we presume you also want a better nonce than the
LSN).

Another approach would be a separate fork, but that then has a number of
downsides too- every write has to touch that too, and a single page of
nonces would cover a pretty large number of pages also.

Ultimately, I agree with you that the LSN isn't perfect and we really
shouldn't be calling it 'ideal' as it isn't, and we can work to fix that
language in the patch, but the lack of any alternative being proposed
that might be acceptable makes this feel a bit like rock management [2].

My apologies if there's something good that's already been specifically
pushed and I just missed it; if so, a link to that suggestion and
discussion would be greatly appreciated.

Thanks again!

Stephen

[1]: https://commitfest.postgresql.org/45/3986/
[2]: https://en.wikipedia.org/wiki/Wikipedia:Bring_me_a_rock ; though
that isn't great for a quick summary (which I tried to find on an
isolated page somewhere and didn't).

The gist is, without a suggestion of things to try, we're left
to our own devices to try and figure out things which might be
successful, only to have those turned down too when we come back with
them, see [1] for what feels like an example of this.  Given your
feedback overall, which I'm very thankful for, I'm hopeful that you see
that this is, indeed, a useful feature that people are asking for and
therefore are willing to spend some time on it, but if the feedback is
that nothing on the page is acceptable to use for the nonce, we can't
put the nonce somewhere else, and we can't change the page format, then
everything else is just moving deck chairs around on the titanic that
has been this effort.


signature.asc
Description: PGP signature


Re: Moving forward with TDE [PATCH v3]

2023-11-06 Thread Matthias van de Meent
On Sat, 4 Nov 2023 at 03:38, Andres Freund  wrote:
>
> Hi,
>
> On 2023-11-02 22:09:40 +0100, Matthias van de Meent wrote:
> > I'm quite surprised at the significant number of changes being made
> > outside the core storage manager files. I thought that changing out
> > mdsmgr with an encrypted smgr (that could wrap mdsmgr if so desired)
> > would be the most obvious change to implement cluster-wide encryption
> > with the least code touched, as relations don't need to know whether
> > the files they're writing are encrypted, right? Is there a reason to
> > not implement this at the smgr level that I overlooked in the
> > documentation of these patches?
>
> You can't really implement encryption transparently inside an smgr without
> significant downsides. You need a way to store an initialization vector
> associated with the page (or you can store that elsewhere, but then you've
> doubled the worst cse amount of random reads/writes). The patch uses the LSN
> as the IV (which I doubt is a good idea). For authenticated encryption further
> additional storage space is required.

I am unaware of any user of the smgr API that doesn't also use the
buffer cache, and thus implicitly the Page layout with PageHeader
[^1]. The API of smgr is also tailored to page-sized quanta of data
with mostly relation-level information. I don't see why there would be
a veil covering the layout of Page for smgr when all other information
already points to the use of PageHeader and Page layouts. In my view,
it would even make sense to allow the smgr to get exclusive access to
some part of the page in the current Page layout.

Yes, I agree that there will be an impact on usable page size if you
want authenticated encryption, and that AMs will indeed need to
account for storage space now being used by the smgr - inconvenient,
but it serves a purpose. That would happen regardless of whether smgr
or some higher system decides where to store the data for encryption -
as long as it is on the page, the AM effectively can't use those
bytes.
But I'd say that's best solved by making the Page documentation and
PageInit API explicit about the potential use of that space by the
chosen storage method (encrypted, plain, ...) instead of requiring the
various AMs to manually consider encryption when using Postgres' APIs
for writing data to disk without hitting shared buffers; page space
management is already a task of AMs, but handling the actual
encryption is not.

Should the AM really care whether the data on disk is encrypted or
not? I don't think so. When the disk contains encrypted bytes, but
smgrread() and smgrwrite() both produce and accept plaintext data,
who's going to complain? Requiring AMs to be mindful about encryption
on all common paths only adds pitfalls where encryption would be
forgotten by the developer of AMs in one path or another.

> To be able to to use the LSN as the IV, the patch needs to ensure that the LSN
> increases in additional situations (a more aggressive version of
> wal_log_hint_bits) - which can't be done below smgr, where we don't know about
> what WAL logging was done. Nor can you easily just add space on the page below
> md.c, for the purpose of storing an LSN independent IV and the authentication
> data.

I think that getting PageInit to allocate the smgr-specific area would
take some effort, too (which would potentially require adding some
relational context to PageInit, so that it knows which page of which
relation it is going to initialize), but IMHO that would be more
natural than requiring all index and table AMs to be aware the actual
encryption of its pages and require manual handling of that encryption
when the page needs to be written to disk, when it otherwise already
conforms to the various buffer management and file extension APIs
currently in use in PostgreSQL. I would expect "transparent" data
encryption to be handled at the file write layer (i.e. smgr), not
inside the AMs.

Kind regards,

Matthias van de Meent

[^1] ReadBuffer_common uses PageIsVerifiedExtended which verifies that
a page conforms with Postgres' Page layout if checksums are enabled.
Furthermore, all builtin index AMs utilize pd_special, further
implying the use of a PageInit/PageHeader-based page layout.
Additionally, the heap tableAM also complies, and both FSM and VM also
use postgres' Page layout.
As for other AMs that I could check: bloom, rum, and pgvector's
ivfflat and hnsw all use page layouts.




Re: Moving forward with TDE [PATCH v3]

2023-11-03 Thread Andres Freund
On 2023-11-02 19:32:28 -0700, Andres Freund wrote:
> > From 327e86d52be1df8de9c3a324cb06b85ba5db9604 Mon Sep 17 00:00:00 2001
> > From: David Christensen 
> > Date: Fri, 29 Sep 2023 15:16:00 -0400
> > Subject: [PATCH v3 5/5] Add encrypted/authenticated WAL
> >
> > When using an encrypted cluster, we need to ensure that the WAL is also
> > encrypted. While we could go with an page-based approach, we use instead a
> > per-record approach, using GCM for the encryption method and storing the 
> > AuthTag
> > in the xl_crc field.

What was the reason for this decision?

?




Re: Moving forward with TDE [PATCH v3]

2023-11-03 Thread Andres Freund
Hi,

On 2023-11-02 22:09:40 +0100, Matthias van de Meent wrote:
> I'm quite surprised at the significant number of changes being made
> outside the core storage manager files. I thought that changing out
> mdsmgr with an encrypted smgr (that could wrap mdsmgr if so desired)
> would be the most obvious change to implement cluster-wide encryption
> with the least code touched, as relations don't need to know whether
> the files they're writing are encrypted, right? Is there a reason to
> not implement this at the smgr level that I overlooked in the
> documentation of these patches?

You can't really implement encryption transparently inside an smgr without
significant downsides. You need a way to store an initialization vector
associated with the page (or you can store that elsewhere, but then you've
doubled the worst cse amount of random reads/writes). The patch uses the LSN
as the IV (which I doubt is a good idea). For authenticated encryption further
additional storage space is required.

To be able to to use the LSN as the IV, the patch needs to ensure that the LSN
increases in additional situations (a more aggressive version of
wal_log_hint_bits) - which can't be done below smgr, where we don't know about
what WAL logging was done. Nor can you easily just add space on the page below
md.c, for the purpose of storing an LSN independent IV and the authentication
data.

You could decide that the security that XTS provides is sufficient (XTS could
be used without a real IV, with some downsides), but we'd pretty much prevent
ourselves from ever implementing authenticated encryption.

Greetings,

Andres Freund




Re: Moving forward with TDE [PATCH v3]

2023-11-02 Thread Andres Freund
Hi,

On 2023-10-31 16:23:17 -0500, David Christensen wrote:
> The patches are as follows:
>
> 0001 - doc updates
> 0002 - Basic key management and cipher support
> 0003 - Backend-related changes to support heap encryption
> 0004 - modifications to bin tools and programs to manage key rotation and
> add other knowledge
> 0005 - Encrypted/authenticated WAL
>
> These are very broad strokes at this point and should be split up a bit
> more to make things more granular and easier to review, but I wanted to get
> this update out.

Yes, particularly 0003 really needs to be split - as is it's not easily
reviewable.



> From 327e86d52be1df8de9c3a324cb06b85ba5db9604 Mon Sep 17 00:00:00 2001
> From: David Christensen 
> Date: Fri, 29 Sep 2023 15:16:00 -0400
> Subject: [PATCH v3 5/5] Add encrypted/authenticated WAL
>
> When using an encrypted cluster, we need to ensure that the WAL is also
> encrypted. While we could go with an page-based approach, we use instead a
> per-record approach, using GCM for the encryption method and storing the 
> AuthTag
> in the xl_crc field.
>
> We change the xl_crc field to instead be a union struct, with a compile-time
> adjustable size to allow us to customize the number of bytes allocated to the
> GCM authtag.  This allows us to easily adjust the size of bytes needed to
> support our authentication.  (Testing has included up to 12, but leaving at 
> this
> point to 4 due to keeping the size of the WAL records the same.)

Ugh, that'll be quite a bit of overhead in some workloads... You can't really
use such a build for non-encrypted workloads, making this a not very
deployable path...


> @@ -905,20 +905,28 @@ XLogInsertRecord(XLogRecData *rdata,
>   {
>   /*
>* Now that xl_prev has been filled in, calculate CRC of the 
> record
> -  * header.
> +  * header.  If we are using encrypted WAL, this CRC is 
> overwritten by
> +  * the authentication tag, so just zero
>*/
> - rdata_crc = rechdr->xl_crc;
> - COMP_CRC32C(rdata_crc, rechdr, offsetof(XLogRecord, xl_crc));
> - FIN_CRC32C(rdata_crc);
> - rechdr->xl_crc = rdata_crc;
> + if (!encrypt_wal)
> + {
> + rdata_crc = rechdr->xl_integrity.crc;
> + COMP_CRC32C(rdata_crc, rechdr, offsetof(XLogRecord, 
> xl_integrity.crc));
> + FIN_CRC32C(rdata_crc);
> + rechdr->xl_integrity.crc = rdata_crc;
> + }
> + else
> + memset(>xl_integrity, 0, 
> sizeof(rechdr->xl_integrity));

Why aren't you encrypting most of the data here?  Just as for CRC computation,
encrypting a large record in XLOG_BLCKSZ


>   * XLogRecordDataHeaderLong structs all begin with a single 'id' byte. It's
>   * used to distinguish between block references, and the main data structs.
>   */
> +
> +#define XL_AUTHTAG_SIZE 4
> +#define XL_HEADER_PAD 2
> +
>  typedef struct XLogRecord
>  {
>   uint32  xl_tot_len; /* total len of entire record */
> @@ -45,14 +49,16 @@ typedef struct XLogRecord
>   XLogRecPtr  xl_prev;/* ptr to previous record in 
> log */
>   uint8   xl_info;/* flag bits, see below */
>   RmgrId  xl_rmid;/* resource manager for this 
> record */
> - /* 2 bytes of padding here, initialize to zero */
> - pg_crc32c   xl_crc; /* CRC for this record */
> -
> + uint8   xl_pad[XL_HEADER_PAD];  /* required alignment 
> padding */

What does "required" mean here? And why is this defined in a separate define?
And why do we need the explicit field here at all? The union will still
provide sufficient alignment for a uint32.

It also doesn't seem right to remove the comment about needing to zero out the
space.


> From 7557acf60f52da4a86fd9f902bab4804c028dd4b Mon Sep 17 00:00:00 2001
> From: David Christensen 
> Date: Tue, 31 Oct 2023 15:24:02 -0400
> Subject: [PATCH v3 3/5] Backend-related changes




> --- a/src/backend/access/heap/rewriteheap.c
> +++ b/src/backend/access/heap/rewriteheap.c
> @@ -323,6 +323,11 @@ end_heap_rewrite(RewriteState state)
>   state->rs_buffer,
>   true);
>
> + PageEncryptInplace(state->rs_buffer, MAIN_FORKNUM,
> +
> RelationIsPermanent(state->rs_new_rel),
> +state->rs_blockno,
> +
> RelationGetSmgr(state->rs_new_rel)->smgr_rlocator.locator.relNumber
> + );
>   PageSetChecksumInplace(state->rs_buffer, state->rs_blockno);
>
>   smgrextend(RelationGetSmgr(state->rs_new_rel), MAIN_FORKNUM,

I don't think it's ok to have to make such changes in a 

Re: Moving forward with TDE [PATCH v3]

2023-11-02 Thread Matthias van de Meent
On Tue, 31 Oct 2023 at 22:23, David Christensen
 wrote:
>
> Greetings,
>
> I am including an updated version of this patch series; it has been rebased 
> onto 6ec62b7799 and reworked somewhat.
>
> The patches are as follows:
>
> 0001 - doc updates
> 0002 - Basic key management and cipher support
> 0003 - Backend-related changes to support heap encryption

I'm quite surprised at the significant number of changes being made
outside the core storage manager files. I thought that changing out
mdsmgr with an encrypted smgr (that could wrap mdsmgr if so desired)
would be the most obvious change to implement cluster-wide encryption
with the least code touched, as relations don't need to know whether
the files they're writing are encrypted, right? Is there a reason to
not implement this at the smgr level that I overlooked in the
documentation of these patches?

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)




Re: Moving forward with TDE [PATCH v3]

2023-10-31 Thread Bruce Momjian
On Tue, Oct 31, 2023 at 04:32:38PM -0500, David Christensen wrote:
> On Tue, Oct 31, 2023 at 4:30 PM Bruce Momjian  wrote:
> Temporary /files/ are handled in a different patch set and are not included
> here (not sure of the status of integrating at this point). I  believe that
> this patch should handle temporary heap files just fine since they go through
> the Page API.

Yes, that's what I thought, thanks.

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.




Re: Moving forward with TDE [PATCH v3]

2023-10-31 Thread David Christensen
On Tue, Oct 31, 2023 at 4:30 PM Bruce Momjian  wrote:

> On Tue, Oct 31, 2023 at 04:23:17PM -0500, David Christensen wrote:
> > Greetings,
> >
> > I am including an updated version of this patch series; it has been
> rebased
> > onto 6ec62b7799 and reworked somewhat.
> >
> > The patches are as follows:
> >
> > 0001 - doc updates
> > 0002 - Basic key management and cipher support
> > 0003 - Backend-related changes to support heap encryption
> > 0004 - modifications to bin tools and programs to manage key rotation
> and add
> > other knowledge
> > 0005 - Encrypted/authenticated WAL
> >
> > These are very broad strokes at this point and should be split up a bit
> more to
> > make things more granular and easier to review, but I wanted to get this
> update
> > out.
>
> This lacks temp table file encryption, right?


Temporary /files/ are handled in a different patch set and are not included
here (not sure of the status of integrating at this point). I  believe that
this patch should handle temporary heap files just fine since they go
through the Page API.


Re: Moving forward with TDE [PATCH v3]

2023-10-31 Thread Bruce Momjian
On Tue, Oct 31, 2023 at 04:23:17PM -0500, David Christensen wrote:
> Greetings,
> 
> I am including an updated version of this patch series; it has been rebased
> onto 6ec62b7799 and reworked somewhat.
> 
> The patches are as follows:
> 
> 0001 - doc updates
> 0002 - Basic key management and cipher support
> 0003 - Backend-related changes to support heap encryption
> 0004 - modifications to bin tools and programs to manage key rotation and add
> other knowledge
> 0005 - Encrypted/authenticated WAL
> 
> These are very broad strokes at this point and should be split up a bit more 
> to
> make things more granular and easier to review, but I wanted to get this 
> update
> out.

This lacks temp table file encryption, right?

-- 
  Bruce Momjian  https://momjian.us
  EDB  https://enterprisedb.com

  Only you can decide what is important to you.