Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

Maksim Stepachev Tue, 07 Jul 2020 03:49:38 -0700

Hi!

Do you have any updates on this issue? Which implementation approach have
you chosen (in-place, offline, or background)? I know that we want to add a
partition defragmentation function, so we could add a hook there to
integrate the re-encryption scheme. Could you update your IEP with your
plans?


Mon, 25 May 2020 at 12:50, Pavel Pereslegin <[email protected]>:

> Nikolay, Alexei,
>
> thanks for your suggestions.
>
> Offline re-encryption does not seem so simple: we need to read/replace
> the existing encryption keys on all nodes (therefore, we should be
> able to read/write the metastore/WAL and exchange data between the
> baseline nodes). Re-encryption in maintenance mode (for example, in a
> stable read-only cluster) will be simple, but it still looks very
> inconvenient, at least because users will need to interrupt all
> operations.
>
> The main advantage of online "in place" re-encryption is that we'll
> support multiple keys for reading, and this procedure does not
> directly depend on background re-encryption.
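
A minimal Java sketch of the "several keys for reading, one key for
writing" idea. The class name GroupKeyRing, the page layout and the use of
AES-GCM are assumptions made for illustration only; none of them is taken
from the Ignite code base.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;

// Hypothetical key ring for one cache group: several keys for reading, one for writing.
class GroupKeyRing {
    private static final int IV_LEN = 12;    // bytes, usual nonce size for GCM
    private static final int TAG_BITS = 128; // GCM authentication tag length

    private final Map<Integer, SecretKey> keys = new HashMap<>();
    private final SecureRandom rnd = new SecureRandom();
    private int activeId = -1;

    // Adds a key and makes it the only key used for new writes; older keys stay readable.
    void rotate(int keyId, SecretKey key) {
        keys.put(keyId, key);
        activeId = keyId;
    }

    // Encrypts with the active key. Assumed layout: [keyId:4][iv:12][ciphertext+tag].
    byte[] encryptPage(byte[] plain) {
        try {
            byte[] iv = new byte[IV_LEN];
            rnd.nextBytes(iv);
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.ENCRYPT_MODE, keys.get(activeId), new GCMParameterSpec(TAG_BITS, iv));
            byte[] body = c.doFinal(plain);
            return ByteBuffer.allocate(4 + IV_LEN + body.length)
                .putInt(activeId).put(iv).put(body).array();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reads the key id stored on the page and decrypts with that key, old or new.
    byte[] decryptPage(byte[] page) {
        ByteBuffer buf = ByteBuffer.wrap(page);
        SecretKey key = keys.get(buf.getInt());
        if (key == null)
            throw new IllegalStateException("Unknown key id on page");
        byte[] iv = new byte[IV_LEN];
        buf.get(iv);
        byte[] body = new byte[buf.remaining()];
        buf.get(body);
        try {
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            return c.doFinal(body);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Key id recorded in the page header.
    static int keyIdOf(byte[] page) {
        return ByteBuffer.wrap(page).getInt();
    }

    static SecretKey newKey() {
        try {
            KeyGenerator g = KeyGenerator.getInstance("AES");
            g.init(128);
            return g.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        GroupKeyRing ring = new GroupKeyRing();
        ring.rotate(0, newKey());
        byte[] oldPage = ring.encryptPage(new byte[] {1, 2, 3});
        ring.rotate(1, newKey()); // rotation: new writes now use key 1
        byte[] newPage = ring.encryptPage(new byte[] {4, 5, 6});
        System.out.println("old page key id = " + keyIdOf(oldPage));
        System.out.println("new page key id = " + keyIdOf(newPage));
        System.out.println("old page still readable: " + (ring.decryptPage(oldPage).length == 3));
    }
}
```

In this sketch, rotation is finished as soon as rotate() has been called:
pages written earlier stay readable through their stored key id, and
re-encrypting them can happen later and independently.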
>
> So, the first step is similar to rotating the master key: once the new
> key is set for writing on all nodes, the cache group key rotation is
> complete (this is what PCI DSS requires: encrypt new updates with new
> keys).
> The second step is to re-encrypt the existing data. As I said
> previously, I thought about scanning all partition pages in some
> background mode (storing progress on the metapage to continue after
> restart), but the rebalance approach should also work here if I figure
> out how to automate this process.
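
The background scan with persisted progress could be sketched roughly as
follows. This is only an illustration: PartitionSketch, the
reencryptedUpTo counter (standing in for a field on the partition
metapage) and the batch size used for throttling are all hypothetical
names, and each page is reduced to the id of the key it is encrypted with.

```java
import java.util.Arrays;

// Hypothetical stand-in for a partition: each slot holds the id of the key
// that the corresponding page is currently encrypted with.
class PartitionSketch {
    final int[] pageKeyIds;

    // Stand-in for the progress counter that would live on the partition metapage.
    int reencryptedUpTo;

    PartitionSketch(int pages, int oldKeyId) {
        pageKeyIds = new int[pages];
        Arrays.fill(pageKeyIds, oldKeyId);
    }
}

// Scans a partition in batches and records how far it got, so that a new
// instance created after a restart continues instead of starting over.
class BackgroundReencryption {
    private final PartitionSketch part;
    private final int newKeyId;
    private final int batchSize; // smaller batches mean gentler throttling

    BackgroundReencryption(PartitionSketch part, int newKeyId, int batchSize) {
        this.part = part;
        this.newKeyId = newKeyId;
        this.batchSize = batchSize;
    }

    // Processes at most one batch; returns true while pages remain.
    boolean runBatch() {
        int from = part.reencryptedUpTo;
        int to = Math.min(from + batchSize, part.pageKeyIds.length);

        for (int i = from; i < to; i++) {
            // A real implementation would read the page with its old key and
            // mark it dirty so that the next checkpoint writes it with the new key.
            part.pageKeyIds[i] = newKeyId;
        }

        // Progress is recorded only after the whole batch is done. Redoing a
        // batch after a crash is harmless because the operation is idempotent.
        part.reencryptedUpTo = to;
        return to < part.pageKeyIds.length;
    }

    public static void main(String[] args) {
        PartitionSketch p = new PartitionSketch(10, 0);
        new BackgroundReencryption(p, 1, 4).runBatch(); // "node stops" after one batch

        BackgroundReencryption resumed = new BackgroundReencryption(p, 1, 4);
        while (resumed.runBatch()) {
            // A real scanner would pause here to limit the load on the node.
        }
        System.out.println("pages re-encrypted: " + p.reencryptedUpTo);
    }
}
```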
>
> Mon, 25 May 2020 at 12:22, Alexei Scherbakov <[email protected]>:
> >
> >
> >
> > Mon, 25 May 2020 at 12:00, Nikolay Izhikov <[email protected]>:
> >>
> >> > This will take us to the re-encryption using full rebalancing
> >>
> >> Rebalancing will require twice the effort for re-encryption:
> >>
> >> 1. Read and send data from supplier node.
> >> 2. Reencrypt and write data on demander node.
> >>
> >> Instead of
> >>
> >> 1. Read, reencrypt and write data on «demander» node.
> >
> >
> > Usually, reading and sending is not a bottleneck. And don't forget we
> > can run out of WAL history and fall back to full rebalancing with
> > partition eviction, eliminating all efforts from offline re-encryption.
> >
> > On the other hand, for a grid with many nodes, one-by-one re-encryption
> > can take a long time.
> > It should also be possible to re-encrypt all data as fast as possible
> > if, for example, the load can be switched to another grid; this is where
> > offline encryption will come in handy.
> >
> > So, I suggest implementing offline re-encryption and online
> > re-encryption using rebalancing as a first step.
> >
> > The next step can be online in-place re-encryption. It's important to
> > measure its business impact on an online grid.
> >
> >>
> >>
> >>
> >> > On 25 May 2020, at 11:46, Alexei Scherbakov <[email protected]> wrote:
> >> >
> >> > For me, the one big disadvantage of offline re-encryption is the
> >> > possibility of running out of WAL history.
> >> > If re-encryption takes a long time, we will get full rebalancing with
> >> > partition eviction.
> >> > This will take us to the re-encryption using full rebalancing, proposed
> >> > by me earlier.
> >> >
> >> >
> >> >
> >> > Mon, 25 May 2020 at 11:27, Nikolay Izhikov <[email protected]>:
> >> >
> >> >>> And definitely this approach is much simpler to implement
> >> >>
> >> >> I agree.
> >> >>
> >> >> If we allow taking nodes offline for re-encryption, then we can
> >> >> implement a fully offline procedure:
> >> >>
> >> >> 1. Stop the node.
> >> >> 2. Execute some control.sh command that will re-encrypt all data
> >> >> without starting the node.
> >> >> 3. Start the node.
> >> >>
> >> >> Pavel, can you please write it one more time: what are the
> >> >> disadvantages of the offline procedure?
> >> >>
> >> >>> On 25 May 2020, at 11:20, Alexei Scherbakov <[email protected]> wrote:
> >> >>>
> >> >>> And definitely this approach is much simpler to implement because
> >> >>> all corner cases are handled by the rebalancing code.
> >> >>>
> >> >>> Mon, 25 May 2020 at 11:16, Alexei Scherbakov <[email protected]>:
> >> >>>
> >> >>>> I mean: serving supply requests.
> >> >>>>
> >> >>>> Mon, 25 May 2020 at 11:15, Alexei Scherbakov <[email protected]>:
> >> >>>>
> >> >>>>> Nikolay,
> >> >>>>>
> >> >>>>> Can you explain why such a restriction is necessary?
> >> >>>>> Most likely, having a node that is currently re-encrypting serve only
> >> >>>>> demand requests will have the least performance impact on the grid.
> >> >>>>>
> >> >>>>> Mon, 25 May 2020 at 11:08, Nikolay Izhikov <[email protected]>:
> >> >>>>>
> >> >>>>>> Hello, Alexei.
> >> >>>>>>
> >> >>>>>> I think we want to implement this feature without node restarts.
> >> >>>>>> In the ideal scenario, all nodes will stay alive and respond to
> >> >>>>>> user requests.
> >> >>>>>>
> >> >>>>>>> On 24 May 2020, at 15:24, Alexei Scherbakov <[email protected]> wrote:
> >> >>>>>>>
> >> >>>>>>> Pavel Pereslegin,
> >> >>>>>>>
> >> >>>>>>> I see another opportunity.
> >> >>>>>>> We can use rebalancing to re-encrypt node data with a new key.
> >> >>>>>>> It's a trivial procedure for me: stop a node, clear the database,
> >> >>>>>>> change the key, start the node and wait for rebalancing to complete.
> >> >>>>>>> Data will be re-encrypted during rebalancing.
> >> >>>>>>>
> >> >>>>>>> Did I miss something?
> >> >>>>>>>
> >> >>>>>>> Fri, 22 May 2020 at 16:14, Ivan Rakov <[email protected]>:
> >> >>>>>>>
> >> >>>>>>>> Folks,
> >> >>>>>>>>
> >> >>>>>>>> Just keeping you informed: my colleagues and I are highly interested
> >> >>>>>>>> in TDE in general and key rotation specifically, but we haven't had
> >> >>>>>>>> enough time so far.
> >> >>>>>>>> We'll dive into this feature and participate in reviews next month.
> >> >>>>>>>>
> >> >>>>>>>> --
> >> >>>>>>>> Best Regards,
> >> >>>>>>>> Ivan Rakov
> >> >>>>>>>>
> >> >>>>>>>> On Sun, May 17, 2020 at 10:51 PM Pavel Pereslegin <[email protected]> wrote:
> >> >>>>>>>>
> >> >>>>>>>>> Hello, Alexey.
> >> >>>>>>>>>
> >> >>>>>>>>>> is the encryption key for the data the same on all nodes in the
> >> >>>>>>>>>> cluster?
> >> >>>>>>>>> Yes, each encrypted cache group has its own encryption key, and the
> >> >>>>>>>>> key is the same on all nodes.
> >> >>>>>>>>>
> >> >>>>>>>>>> Clearly, during the re-encryption there will exist pages
> >> >>>>>>>>>> encrypted with both new and old keys at the same time.
> >> >>>>>>>>> Yes, there will be pages encrypted with different keys at the same
> >> >>>>>>>>> time. Currently, we only store one key per cache group. To rotate a
> >> >>>>>>>>> key, at a certain point in time it is necessary to support several
> >> >>>>>>>>> keys (at least for reading the WAL).
> >> >>>>>>>>> For the "in place" strategy, we'll store the encryption key
> >> >>>>>>>>> identifier on each encrypted page (we currently have some unused
> >> >>>>>>>>> space on an encrypted page, so I don't expect any memory overhead
> >> >>>>>>>>> here). Thus, we will have several keys for reading and one key for
> >> >>>>>>>>> writing. I assume that the old key will be automatically deleted
> >> >>>>>>>>> when a specific WAL segment is deleted (and re-encryption is
> >> >>>>>>>>> finished).
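
The rule for dropping an old key (its last WAL segment is gone and
re-encryption has finished) might be sketched like this. RetiredKeyTracker
and its methods are hypothetical names chosen for illustration, and the
sketch assumes that WAL segments are identified by increasing indexes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical tracker that decides when a retired cache group key may be dropped.
class RetiredKeyTracker {
    // Retired key id -> index of the last WAL segment that may hold records written with it.
    private final Map<Integer, Long> lastSegmentByKey = new HashMap<>();

    private long lastRemovedSegment = -1;
    private boolean reencryptionFinished;

    // Called at rotation time: the previous key is now used for reading only.
    void retire(int keyId, long currentWalSegment) {
        lastSegmentByKey.put(keyId, currentWalSegment);
        reencryptionFinished = false; // a new re-encryption pass starts with each rotation
    }

    // Called when the background re-encryption of existing pages has completed.
    void markReencryptionFinished() {
        reencryptionFinished = true;
        prune();
    }

    // Called after WAL segments up to and including upTo have been deleted.
    void onWalSegmentsRemoved(long upTo) {
        lastRemovedSegment = Math.max(lastRemovedSegment, upTo);
        prune();
    }

    boolean isRetained(int keyId) {
        return lastSegmentByKey.containsKey(keyId);
    }

    // A key is dropped only when both conditions hold: no page on disk and
    // no remaining WAL segment can still need it.
    private void prune() {
        if (reencryptionFinished)
            lastSegmentByKey.values().removeIf(seg -> seg <= lastRemovedSegment);
    }

    public static void main(String[] args) {
        RetiredKeyTracker t = new RetiredKeyTracker();
        t.retire(0, 5L);
        t.onWalSegmentsRemoved(5L);
        System.out.println("retained before re-encryption ends: " + t.isRetained(0));
        t.markReencryptionFinished();
        System.out.println("retained after re-encryption ends: " + t.isRetained(0));
    }
}
```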
> >> >>>>>>>>>
> >> >>>>>>>>>> Will a node continue to re-encrypt the data after it restarts?
> >> >>>>>>>>> Yes.
> >> >>>>>>>>>
> >> >>>>>>>>>> If a node goes down during the re-encryption, but the rest of the
> >> >>>>>>>>>> cluster finishes re-encryption, will we consider the procedure
> >> >>>>>>>>>> complete?
> >> >>>>>>>>> I'm not sure, but it looks like the key rotation is complete when
> >> >>>>>>>>> we set the new key on all nodes so that the updates will be
> >> >>>>>>>>> encrypted with the new key (as required by PCI DSS).
> >> >>>>>>>>> The status of re-encryption can be obtained separately (locally or
> >> >>>>>>>>> cluster-wide).
> >> >>>>>>>>>
> >> >>>>>>>>> I forgot to mention that with "in place" re-encryption it will be
> >> >>>>>>>>> impossible to quickly cancel re-encryption, because by canceling we
> >> >>>>>>>>> mean re-encryption with the old key.
> >> >>>>>>>>>
> >> >>>>>>>>>> How do you see the whole key rotation procedure will work?
> >> >>>>>>>>> The initial design for re-encryption with "partition copying" is
> >> >>>>>>>>> described here [1]. I'll prepare a detailed design for "in place"
> >> >>>>>>>>> re-encryption if we go this way. In short: send the new encryption
> >> >>>>>>>>> key cluster-wide, each node adds the new key and starts background
> >> >>>>>>>>> re-encryption.
> >> >>>>>>>>>
> >> >>>>>>>>> [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Copywithre-encryptiondesign
> >> >>>>>>>>>
> >> >>>>>>>>> Sun, 17 May 2020 at 18:35, Alexey Goncharuk <[email protected]>:
> >> >>>>>>>>>>
> >> >>>>>>>>>> Pavel, Anton,
> >> >>>>>>>>>>
> >> >>>>>>>>>> How do you see the whole key rotation procedure will work? Clearly,
> >> >>>>>>>>>> during the re-encryption there will exist pages encrypted with both
> >> >>>>>>>>>> new and old keys at the same time. Will a node continue to re-encrypt
> >> >>>>>>>>>> the data after it restarts? If a node goes down during the
> >> >>>>>>>>>> re-encryption, but the rest of the cluster finishes re-encryption,
> >> >>>>>>>>>> will we consider the procedure complete? By the way, is the
> >> >>>>>>>>>> encryption key for the data the same on all nodes in the cluster?
> >> >>>>>>>>>>
> >> >>>>>>>>>> Thu, 14 May 2020 at 11:30, Anton Vinogradov <[email protected]>:
> >> >>>>>>>>>>
> >> >>>>>>>>>>> +1 to "In place re-encryption".
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> - It has a simple design.
> >> >>>>>>>>>>> - Clusters under load may require just load to re-encrypt the
> >> >>>>>>>>>>>   data (friendly to load).
> >> >>>>>>>>>>> - Easy to throttle.
> >> >>>>>>>>>>> - Easy to continue.
> >> >>>>>>>>>>> - Design compatible with the multi-key architecture.
> >> >>>>>>>>>>> - It can be optimized to use its own WAL buffer and to re-encrypt
> >> >>>>>>>>>>>   pages without restoring them to on-heap.
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> On Thu, May 14, 2020 at 1:54 AM Pavel Pereslegin <[email protected]> wrote:
> >> >>>>>>>>>>>
> >> >>>>>>>>>>>> Hello Igniters.
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> Recently, master key rotation for Apache Ignite Transparent Data
> >> >>>>>>>>>>>> Encryption was implemented [1], but some security standards (PCI
> >> >>>>>>>>>>>> DSS at least) require rotation of all encryption keys [2].
> >> >>>>>>>>>>>> Currently, encryption occurs when reading/writing pages to disk,
> >> >>>>>>>>>>>> and cache encryption keys are stored in the metastore.
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> I'm going to contribute cache encryption key rotation and want to
> >> >>>>>>>>>>>> consult on the best way to re-encrypt existing data. I see two
> >> >>>>>>>>>>>> different strategies.
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> 1. In place re-encryption:
> >> >>>>>>>>>>>> Using the old key, sequentially read all the pages from the
> >> >>>>>>>>>>>> datastore, mark them as dirty and log them into the WAL. After a
> >> >>>>>>>>>>>> checkpoint, pages will be stored to disk encrypted with the new
> >> >>>>>>>>>>>> key (as usual, along with updates). This strategy requires
> >> >>>>>>>>>>>> storing the identifier (number) of the encryption key in the
> >> >>>>>>>>>>>> encrypted page.
> >> >>>>>>>>>>>> pros:
> >> >>>>>>>>>>>> - can work in the background with minimal performance impact
> >> >>>>>>>>>>>> (this impact can be managed).
> >> >>>>>>>>>>>> cons:
> >> >>>>>>>>>>>> - page duplication in the WAL may affect performance and
> >> >>>>>>>>>>>> historical rebalance.
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> 2. Copy partition with re-encryption.
> >> >>>>>>>>>>>> This strategy is similar to partition snapshotting [3]: create a
> >> >>>>>>>>>>>> partition copy encrypted with the new key and then replace the
> >> >>>>>>>>>>>> original partition file with the new one (see details [4]).
> >> >>>>>>>>>>>> pros:
> >> >>>>>>>>>>>> - should work faster than "in place" re-encryption.
> >> >>>>>>>>>>>> cons:
> >> >>>>>>>>>>>> - re-encryption in an active cluster (and on unstable topology)
> >> >>>>>>>>>>>> can be difficult to implement.
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> (See more detailed comparison [5])
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> Re-encryption of existing data is a long and rare procedure (it is
> >> >>>>>>>>>>>> recommended to change the key every 6 months, but at least once
> >> >>>>>>>>>>>> every 2 years). Thus, re-encryption can be implemented for
> >> >>>>>>>>>>>> maintenance mode (for example, on a stable topology in a read-only
> >> >>>>>>>>>>>> cluster), and in such a case the approach with partition copying
> >> >>>>>>>>>>>> seems simpler and faster.
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> So, what do you think: do we need "online" re-encryption, and
> >> >>>>>>>>>>>> which of the proposed options is best suited for this?
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> [1] https://issues.apache.org/jira/browse/IGNITE-12186
> >> >>>>>>>>>>>> [2] https://www.pcisecuritystandards.org/documents/PCI_DSS_v3-2-1.pdf
> >> >>>>>>>>>>>> [3] https://cwiki.apache.org/confluence/display/IGNITE/IEP-43%3A+Cluster+snapshots#IEP-43:Clustersnapshots-Partitionscopystrategy
> >> >>>>>>>>>>>> [4] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Copywithre-encryptiondesign
> >> >>>>>>>>>>>> [5] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Comparison
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>
> >> >>>>>>>>>
> >> >>>>>>>>
> >> >>>>>>>
> >> >>>>>>>
> >> >>>>>>> --
> >> >>>>>>>
> >> >>>>>>> Best regards,
> >> >>>>>>> Alexei Scherbakov
> >> >>>>>>
> >> >>>>>>
> >> >>>>>
> >> >>>>> --
> >> >>>>>
> >> >>>>> Best regards,
> >> >>>>> Alexei Scherbakov
> >> >>>>>
> >> >>>>
> >> >>>>
> >> >>>> --
> >> >>>>
> >> >>>> Best regards,
> >> >>>> Alexei Scherbakov
> >> >>>>
> >> >>>
> >> >>>
> >> >>> --
> >> >>>
> >> >>> Best regards,
> >> >>> Alexei Scherbakov
> >> >>
> >> >>
> >> >
> >> > --
> >> >
> >> > Best regards,
> >> > Alexei Scherbakov
> >>
> >
> >
> > --
> >
> > Best regards,
> > Alexei Scherbakov
>
