Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

Maksim Stepachev Mon, 28 Sep 2020 01:04:43 -0700

Hi, Pavel.

As I see, the ticket [https://issues.apache.org/jira/browse/IGNITE-12843]
is "PATCH AVAILABLE". Is this ticket finished?


чт, 13 авг. 2020 г. в 13:49, Pavel Pereslegin <[email protected]>:

> Hello all.
>
> I'm working on TDE cache group key rotation [1] and I have a couple of
> questions about partition re-encryption.
>
> As described in the wiki [2], the process of re-encryption at the
> moment consists of sequentially marking memory pages as dirty, this
> process looks not resource-intensive.
> Do you think it is necessary to do this in a multithreaded mode or
> single thread is enough?
> (We started testing re-encryption on dedicated servers (Xeon E5-2680
> 2.4Ghz, SSD Huawei ES3600P 3.2TB, ThrottlingPolicy =
> CHECKPOINT_BUFFER_ONLY) with no speed limit and no load, as a result,
> single-threaded encryption loaded disk within 30%. At the same time,
> the total re-encryption speed was around 60 MB/s, which allows one
> node to re-encrypt 1 TB of data in about 5 hours, and it seems that
> this performance is enough.)
>
> The second question is about the approach to storing the re-encryption
> status.
> At the moment, the re-encryption status includes two parameters - the
> total number of pages in the partition at the time of the start of
> re-encryption (int) and the index of the last re-encrypted page (int).
> These 8 bytes are stored in the metapage on the checkpoint (which
> ensures that if the checkpoint does not happen, we will continue the
> process from the last page written to disk).
> However, if multithread partition scanning does not make sense, then
> it seems that it is possible to change the implementation and don't
> change the metapage structure. Store only the "pointer" of the
> partition (and the cache group) in the metastore and scan in strict
> order.
> The approach with storing the status in the metapage of the partition
> seems to me more flexible, stable and has a number of advantages over
> the "pointer" approach:
> 1. Since we saving the total number of pages at the re-encryption
> startup - we will not scan extra pages that may be added to the
> partition later.
> 2. We can move partitions between nodes and re-encryption should
> continue from a certain point on the new node.
> 3. If a partition is (re)created during cache group re-encryption, it
> will not be re-encrypted (since its re-encryption status will be reset
> and all data is encrypted with the latest encryption key after
> (re)creation.
>
> Do you think single-threaded mode is enough?
> Is it better to keep the re-encryption status in the metapage or store
> the "pointer" in the metastore?
>
> [1] https://issues.apache.org/jira/browse/IGNITE-12843
> [2]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase3.Cachekeyrotation.-Backgroundre-encryption
>
> пт, 31 июл. 2020 г. в 11:09, Pavel Pereslegin <[email protected]>:
> >
> > Hello,
> >
> > I'll expand the answer a bit about calculating CRC, the problem is not
> > that it is calculated twice, but that now for encrypted pages,
> > EncryptedFileIO checks physical integrity, and FilePageStore checks
> > the correctness of the encryption key, but from my point of view, it
> > should be vice versa - the lower (delegated) FileIO should check the
> > physical integrity and EncryptedFileIO should check the correctness of
> > the encryption key.
> >
> > пт, 31 июл. 2020 г. в 10:40, Pavel Pereslegin <[email protected]>:
> > >
> > > Hello,
> > >
> > > > 10. Question - CRC is read in two places encryptionFileIO and
> > > > filePageStore - what should we do with this?
> > >
> > > We need to calculate the CRC of encrypted data, because we may be
> > > using the wrong encryption key to decrypt data, in which case we will
> > > not understand if the physical integrity is violated or the wrong
> > > encryption key is used.
> > >
> > > > 9. Question - How do we optimize when we can check that this page is
> > > > already encrypted by parallel loading? Maybe we should do this in
> Phase 4?
> > >
> > > To do this, we need to store the encryption key ID in memory (at
> > > least), but this is not easy to do right now without breaking binary
> > > compatibility.
> > >
> > > > 7. Question -the current implementation does not use the throttling
> that
> > > > is implemented in PDS. Users should set the throughput such as 5 MB
> per
> > > > second, but not the timeout, packet size, or stream size.
> > >
> > > I've added a simple rate limiter for this.
> > >
> > > > 8. Question - why we add a lot of system properties?
> > > >> Can you, please, list system properties that should be moved to the
> configuration?
> > >
> > > It's about the background re-encryption properties, for now, it is:
> > > - re-encryption speed limit (in megabytes per second)
> > > - threads count used for re-encryption
> > > - count of pages in batch, processed under checkpoint lock
> > > - flag to completely disable background re-encryption
> > >
> > > > 11. We should remember about complicated test scenarios with failover
> > >
> > > PR contains tests for re-encryption (and key rotation) on unstable
> > > topology (with baseline change and without it). I'll expand them if I
> > > missed some cases.
> > >
> > > > 13. Will re-encryption continue after the cluster is completely
> stopped?
> > >
> > > Yes, as I mentioned earlier, we save the re-encryption status in the
> > > meta page of each re-encrypted partition and trigger re-encryption on
> > > node startup if needed (more detailed description on the wiki).
> > >
> > > Thanks a lot for your comments, I am still working on PR and expanding
> > > wiki documentation. I'll let you know when it will be ready for the
> > > review.
> > >
> > > вт, 28 июл. 2020 г. в 19:14, Alexey Goncharuk <
> [email protected]>:
> > > >
> > > > Hello Nikolay,
> > > >
> > > >
> > > > > > 10. Question - CRC is read in two places encryptionFileIO and
> > > > > filePageStore - what should we do with this?
> > > > >
> > > > > filePageStore checks CRC of the encrypted page. This required to
> confirm
> > > > > the page not corrupted on the disk.
> > > > > encryptionFileIO checks CRC of the decrypted page(CRC itself
> stored in the
> > > > > encrypted data).
> > > > > This required to be sure the decrypted page contains correct data
> and not
> > > > > replaced with some malicious content.
> > > > >
> > > >
> > > > I still do not see why we need CRC twice, can you please elaborate
> on this
> > > > statement? If an attacker is able to replace the contents of an
> encrypted
> > > > page, it means that they have access to the encryption key. What will
> > > > prevent them from calculating the CRC of malicious content and then
> > > > encrypting it?
>

Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

Reply via email to