Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

Alexei Scherbakov Mon, 25 May 2020 02:23:31 -0700

On Mon, May 25, 2020 at 12:00, Nikolay Izhikov <nizhi...@apache.org> wrote:


> > This will take us to re-encryption using full rebalancing
>
> Rebalancing will require twice the effort for re-encryption:
>
> 1. Read and send data from supplier node.
> 2. Reencrypt and write data on demander node.
>
> Instead of
>
> 1. Read, reencrypt and write data on «demander» node.
>

Usually, reading and sending are not the bottleneck. And don't forget that we
can run out of WAL history and fall back to full rebalancing with partition
eviction, which wastes all the effort spent on offline re-encryption.
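
To illustrate the fallback mentioned above, here is a minimal Python sketch. It
assumes a simplified model in which each partition has a single update counter
and the supplier's WAL can replay every update after some oldest counter. The
names `Supplier` and `choose_rebalance_mode` are hypothetical and are not
Ignite APIs.

```python
from dataclasses import dataclass


@dataclass
class Supplier:
    """Hypothetical view of a supplier node's state for one partition."""
    oldest_replayable_counter: int  # WAL can replay every update after this counter
    latest_counter: int             # current update counter of the partition


def choose_rebalance_mode(demander_counter: int, supplier: Supplier) -> str:
    """Return 'none', 'historical' or 'full' for a node returning to the grid."""
    if demander_counter >= supplier.latest_counter:
        return "none"        # nothing was missed while the node was offline
    if demander_counter >= supplier.oldest_replayable_counter:
        return "historical"  # missed updates can still be replayed from WAL
    return "full"            # WAL history exhausted: evict and reload the partition
```

In this model, a node that stays offline long enough for the supplier to drop
the needed WAL segments ends up in the "full" branch, so its locally
re-encrypted partition data is evicted anyway.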

On the other hand, for a grid with many nodes, one-by-one re-encryption
can take a long time.
It should also be possible to re-encrypt all data as fast as possible when,
for example, the load can be switched to another grid; that is where offline
re-encryption will come in handy.

So, I suggest implementing offline re-encryption and online re-encryption
using rebalancing as a first step.

The next step can be online in-place re-encryption. It's important to measure
its business impact on an online grid.


>
>
> > On May 25, 2020, at 11:46, Alexei Scherbakov <alexey.scherbak...@gmail.com>
> > wrote:
> >
> > For me, the one big disadvantage of offline re-encryption is the
> > possibility of running out of WAL history.
> > If re-encryption takes a long time, we will get full rebalancing with
> > partition eviction.
> > This will take us to re-encryption using full rebalancing, which I
> > proposed earlier.
> >
> >
> >
> > On Mon, May 25, 2020 at 11:27, Nikolay Izhikov <nizhi...@apache.org> wrote:
> >
> >>> And definitely this approach is much simpler to implement
> >>
> >> I agree.
> >>
> >> If we allow nodes to be taken offline for re-encryption, then we can
> >> implement a fully offline procedure:
> >>
> >> 1. Stop node.
> >> 2. Execute some control.sh command that will re-encrypt all data without
> >> starting the node.
> >> 3. Start node.
> >>
> >> Pavel, can you please write one more time what the disadvantages of the
> >> offline procedure are?
> >>
> >>> On May 25, 2020, at 11:20, Alexei Scherbakov <alexey.scherbak...@gmail.com>
> >>> wrote:
> >>>
> >>> And definitely this approach is much simpler to implement because all
> >>> corner cases are handled by rebalancing code.
> >>>
> >>> On Mon, May 25, 2020 at 11:16, Alexei Scherbakov
> >>> <alexey.scherbak...@gmail.com> wrote:
> >>>
> >>>> I mean: serving supply requests.
> >>>>
> >>>> On Mon, May 25, 2020 at 11:15, Alexei Scherbakov
> >>>> <alexey.scherbak...@gmail.com> wrote:
> >>>>
> >>>>> Nikolay,
> >>>>>
> >>>>> Can you explain why such a restriction is necessary?
> >>>>> Most likely, having a node that is currently re-encrypting serve only
> >>>>> demand requests will have the least performance impact on the grid.
> >>>>>
> >>>>> On Mon, May 25, 2020 at 11:08, Nikolay Izhikov <nizhi...@apache.org> wrote:
> >>>>>
> >>>>>> Hello, Alexei.
> >>>>>>
> >>>>>> I think we want to implement this feature without node restarts.
> >>>>>> In the ideal scenario, all nodes will stay alive and respond to
> >>>>>> user requests.
> >>>>>>
> >>>>>>> On May 24, 2020, at 15:24, Alexei Scherbakov
> >>>>>>> <alexey.scherbak...@gmail.com> wrote:
> >>>>>>>
> >>>>>>> Pavel Pereslegin,
> >>>>>>>
> >>>>>>> I see another opportunity.
> >>>>>>> We can use rebalancing to re-encrypt node data with a new key.
> >>>>>>> To me it's a trivial procedure: stop a node, clear its database, change
> >>>>>>> the key, start the node, and wait for rebalancing to complete.
> >>>>>>> Data will be re-encrypted during rebalancing.
> >>>>>>>
> >>>>>>> Did I miss something ?
> >>>>>>>
> >>>>>>> On Fri, May 22, 2020 at 16:14, Ivan Rakov <ivan.glu...@gmail.com> wrote:
> >>>>>>>
> >>>>>>>> Folks,
> >>>>>>>>
> >>>>>>>> Just keeping you informed: my colleagues and I are highly interested
> >>>>>>>> in TDE in general and key rotation specifically, but we haven't had
> >>>>>>>> enough time so far.
> >>>>>>>> We'll dive into this feature and participate in reviews next month.
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> Best Regards,
> >>>>>>>> Ivan Rakov
> >>>>>>>>
> >>>>>>>> On Sun, May 17, 2020 at 10:51 PM Pavel Pereslegin <xxt...@gmail.com>
> >>>>>>>> wrote:
> >>>>>>>>
> >>>>>>>>> Hello, Alexey.
> >>>>>>>>>
> >>>>>>>>>> is the encryption key for the data the same on all nodes in the
> >>>>>>>>>> cluster?
> >>>>>>>>> Yes, each encrypted cache group has its own encryption key, and the
> >>>>>>>>> key is the same on all nodes.
> >>>>>>>>>
> >>>>>>>>>> Clearly, during the re-encryption there will exist pages
> >>>>>>>>>> encrypted with both new and old keys at the same time.
> >>>>>>>>> Yes, there will be pages encrypted with different keys at the same
> >>>>>>>>> time. Currently, we only store one key for one cache group. To rotate
> >>>>>>>>> a key, at a certain point in time it is necessary to support several
> >>>>>>>>> keys (at least for reading the WAL).
> >>>>>>>>> For the "in place" strategy, we'll store the encryption key identifier
> >>>>>>>>> on each encrypted page (we currently have some unused space on the
> >>>>>>>>> encrypted page, so I don't expect any memory overhead here). Thus, we
> >>>>>>>>> will have several keys for reading and one key for writing. I assume
> >>>>>>>>> that the old key will be automatically deleted when a specific WAL
> >>>>>>>>> segment is deleted (and re-encryption is finished).
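
The "several keys for reading, one key for writing" idea above can be sketched
in Python as follows. This is only an illustration under stated assumptions:
`Page`, `GroupKeys`, `reencrypt_in_place` and `xor_cipher` are hypothetical
names, not Ignite classes, and the XOR function is a toy stand-in for a real
cipher.

```python
from dataclasses import dataclass, field


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy stand-in for a real cipher (XOR is NOT secure); it is its own inverse."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


@dataclass
class Page:
    key_id: int     # identifier of the key this page was encrypted with
    payload: bytes  # encrypted page content


@dataclass
class GroupKeys:
    """Hypothetical key ring of one cache group: many read keys, one write key."""
    keys: dict = field(default_factory=dict)  # key_id -> key bytes
    write_key_id: int = 0

    def add_key(self, key_id: int, key: bytes) -> None:
        self.keys[key_id] = key
        self.write_key_id = key_id  # new writes always use the newest key

    def read(self, page: Page) -> bytes:
        # The key identifier stored on the page selects the decryption key.
        return xor_cipher(page.payload, self.keys[page.key_id])

    def write(self, plain: bytes) -> Page:
        key = self.keys[self.write_key_id]
        return Page(self.write_key_id, xor_cipher(plain, key))

    def drop_unused(self, pages) -> None:
        # Assumption: in the proposal, WAL segments must also stop needing the key.
        in_use = {p.key_id for p in pages} | {self.write_key_id}
        self.keys = {k: v for k, v in self.keys.items() if k in in_use}


def reencrypt_in_place(pages, ring: GroupKeys) -> None:
    """Background pass: rewrite every page still encrypted with an old key."""
    for i, page in enumerate(pages):
        if page.key_id != ring.write_key_id:
            pages[i] = ring.write(ring.read(page))
```

After `add_key` introduces a new key, old pages stay readable through their
stored `key_id`, and the old key can be dropped only once
`reencrypt_in_place` has rewritten every page.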
> >>>>>>>>>
> >>>>>>>>>> Will a node continue to re-encrypt the data after it restarts?
> >>>>>>>>> Yes.
> >>>>>>>>>
> >>>>>>>>>> If a node goes down during the re-encryption, but the rest of the
> >>>>>>>>>> cluster finishes re-encryption, will we consider the procedure
> >>>>>>>>>> complete?
> >>>>>>>>> I'm not sure, but it looks like the key rotation is complete when we
> >>>>>>>>> set the new key on all nodes so that the updates will be encrypted
> >>>>>>>>> with the new key (as required by PCI DSS).
> >>>>>>>>> The status of re-encryption can be obtained separately (locally or
> >>>>>>>>> cluster-wide).
> >>>>>>>>>
> >>>>>>>>> I forgot to mention that with “in place” re-encryption it will be
> >>>>>>>>> impossible to quickly cancel re-encryption, because canceling
> >>>>>>>>> means re-encrypting with the old key.
> >>>>>>>>>
> >>>>>>>>>> How do you see the whole key rotation procedure will work?
> >>>>>>>>> The initial design for re-encryption with "partition copying" is
> >>>>>>>>> described here [1]. I'll prepare a detailed design for "in place"
> >>>>>>>>> re-encryption if we go this way. In short: send the new encryption
> >>>>>>>>> key cluster-wide; each node adds the new key and starts background
> >>>>>>>>> re-encryption.
> >>>>>>>>>
> >>>>>>>>> [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Copywithre-encryptiondesign
> >>>>>>>>>
> >>>>>>>>> On Sun, May 17, 2020 at 18:35, Alexey Goncharuk
> >>>>>>>>> <alexey.goncha...@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>> Pavel, Anton,
> >>>>>>>>>>
> >>>>>>>>>> How do you see the whole key rotation procedure will work? Clearly,
> >>>>>>>>>> during the re-encryption there will exist pages encrypted with both
> >>>>>>>>>> new and old keys at the same time. Will a node continue to re-encrypt
> >>>>>>>>>> the data after it restarts? If a node goes down during the
> >>>>>>>>>> re-encryption, but the rest of the cluster finishes re-encryption,
> >>>>>>>>>> will we consider the procedure complete? By the way, is the
> >>>>>>>>>> encryption key for the data the same on all nodes in the cluster?
> >>>>>>>>>>
> >>>>>>>>>> On Thu, May 14, 2020 at 11:30, Anton Vinogradov <a...@apache.org> wrote:
> >>>>>>>>>>
> >>>>>>>>>>> +1 to "In place re-encryption".
> >>>>>>>>>>>
> >>>>>>>>>>> - It has a simple design.
> >>>>>>>>>>> - Clusters under load may require just load to re-encrypt the data
> >>>>>>>>>>> (friendly to load).
> >>>>>>>>>>> - Easy to throttle.
> >>>>>>>>>>> - Easy to continue.
> >>>>>>>>>>> - Design compatible with the multi-key architecture.
> >>>>>>>>>>> - It can be optimized to use its own WAL buffer and to re-encrypt
> >>>>>>>>>>> pages without restoring them to on-heap.
> >>>>>>>>>>>
> >>>>>>>>>>> On Thu, May 14, 2020 at 1:54 AM Pavel Pereslegin <xxt...@gmail.com>
> >>>>>>>>>>> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> Hello Igniters.
> >>>>>>>>>>>>
> >>>>>>>>>>>> Recently, master key rotation for Apache Ignite Transparent Data
> >>>>>>>>>>>> Encryption was implemented [1], but some security standards (PCI DSS
> >>>>>>>>>>>> at least) require rotation of all encryption keys [2]. Currently,
> >>>>>>>>>>>> encryption occurs when reading/writing pages to disk, and cache
> >>>>>>>>>>>> encryption keys are stored in the metastore.
> >>>>>>>>>>>>
> >>>>>>>>>>>> I'm going to contribute cache encryption key rotation and want to
> >>>>>>>>>>>> consult on the best way to re-encrypt existing data. I see two
> >>>>>>>>>>>> different strategies.
> >>>>>>>>>>>>
> >>>>>>>>>>>> 1. In place re-encryption:
> >>>>>>>>>>>> Using the old key, sequentially read all the pages from the
> >>>>>>>>>>>> datastore, mark them as dirty and log them into the WAL. After a
> >>>>>>>>>>>> checkpoint, pages will be stored to disk encrypted with the new key
> >>>>>>>>>>>> (as usual, along with updates). This strategy requires storing the
> >>>>>>>>>>>> identifier (number) of the encryption key in the encrypted page.
> >>>>>>>>>>>> pros:
> >>>>>>>>>>>> - can work in the background with minimal performance impact (this
> >>>>>>>>>>>> impact can be managed).
> >>>>>>>>>>>> cons:
> >>>>>>>>>>>> - page duplication in the WAL may affect performance and historical
> >>>>>>>>>>>> rebalance.
> >>>>>>>>>>>>
> >>>>>>>>>>>> 2. Copy partition with re-encryption.
> >>>>>>>>>>>> This strategy is similar to partition snapshotting [3]: create a
> >>>>>>>>>>>> partition copy encrypted with the new key and then replace the
> >>>>>>>>>>>> original partition file with the new one (see details [4]).
> >>>>>>>>>>>> pros:
> >>>>>>>>>>>> - should work faster than "in place" re-encryption.
> >>>>>>>>>>>> cons:
> >>>>>>>>>>>> - re-encryption in an active cluster (and on unstable topology) can
> >>>>>>>>>>>> be difficult to implement.
> >>>>>>>>>>>>
> >>>>>>>>>>>> (See more detailed comparison [5])
> >>>>>>>>>>>>
> >>>>>>>>>>>> Re-encryption of existing data is a long and rare procedure (it is
> >>>>>>>>>>>> recommended to change the key every 6 months, but at least once
> >>>>>>>>>>>> every 2 years). Thus, re-encryption can be implemented for
> >>>>>>>>>>>> maintenance mode (for example, on a stable topology in a read-only
> >>>>>>>>>>>> cluster), and in such a case the approach with partition copying
> >>>>>>>>>>>> seems simpler and faster.
> >>>>>>>>>>>>
> >>>>>>>>>>>> So, what do you think: do we need "online" re-encryption, and which
> >>>>>>>>>>>> of the proposed options is best suited for this?
> >>>>>>>>>>>>
> >>>>>>>>>>>> [1] https://issues.apache.org/jira/browse/IGNITE-12186
> >>>>>>>>>>>> [2] https://www.pcisecuritystandards.org/documents/PCI_DSS_v3-2-1.pdf
> >>>>>>>>>>>> [3] https://cwiki.apache.org/confluence/display/IGNITE/IEP-43%3A+Cluster+snapshots#IEP-43:Clustersnapshots-Partitionscopystrategy
> >>>>>>>>>>>> [4] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Copywithre-encryptiondesign
> >>>>>>>>>>>> [5] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Comparison
> >>>>>>>
> >>>>>>> --
> >>>>>>>
> >>>>>>> Best regards,
> >>>>>>> Alexei Scherbakov
> >>>>>>
> >>>>>>
> >>>>>
> >>>>> --
> >>>>>
> >>>>> Best regards,
> >>>>> Alexei Scherbakov
> >>>>>
> >>>>
> >>>>
> >>>> --
> >>>>
> >>>> Best regards,
> >>>> Alexei Scherbakov
> >>>>
> >>>
> >>>
> >>> --
> >>>
> >>> Best regards,
> >>> Alexei Scherbakov
> >>
> >>
> >
> > --
> >
> > Best regards,
> > Alexei Scherbakov
>
>

-- 

Best regards,
Alexei Scherbakov
