Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

Pavel Pereslegin Tue, 07 Jul 2020 07:40:52 -0700

Hello, Maksim.

For the implementation, I chose the so-called "in place background
re-encryption" design.


The first step is to rotate the key used for writing data; at the
moment this only works on an active cluster.
The second step is re-encryption (to remove the previous encryption key).
If a node is restarted, re-encryption resumes after the metastorage
becomes ready for read/write. Each partition being re-encrypted
(including the index) has an attribute on its meta page that indicates
whether background re-encryption should be continued.
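
To make the two steps easier to picture, here is a minimal Java sketch.
It is an illustration only and not the code from the draft PR: the names
GroupKeys, Partition, scheduleReencryption and reencrypt are hypothetical,
and modelling the meta page attribute as a "next page" counter is my
assumption for the sake of the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Illustration only: every name here is hypothetical, not an Ignite API. */
final class ReencryptionSketch {
    /** Keys of one cache group: several keys for reading, one for writing. */
    static final class GroupKeys {
        private final Map<Integer, byte[]> keysById = new HashMap<>();
        private int activeId;

        GroupKeys(int id, byte[] key) {
            keysById.put(id, key);
            activeId = id;
        }

        /** Step 1 (rotation): new writes use the new key, old keys stay readable. */
        void rotate(int newId, byte[] newKey) {
            keysById.put(newId, newKey);
            activeId = newId;
        }

        int activeId() { return activeId; }

        boolean canRead(int keyId) { return keysById.containsKey(keyId); }
    }

    /** A partition reduced to the key id stored on each page plus a meta-page attribute. */
    static final class Partition {
        final int[] pageKeyIds;
        /** Assumed meta-page attribute: next page to re-encrypt; equals page count when done. */
        int nextPage;

        Partition(int pageCnt, int keyId) {
            pageKeyIds = new int[pageCnt];
            Arrays.fill(pageKeyIds, keyId);
            nextPage = pageCnt; // Nothing to re-encrypt yet.
        }

        void scheduleReencryption() { nextPage = 0; }

        boolean reencryptionPending() { return nextPage < pageKeyIds.length; }
    }

    /** Step 2: re-encrypts at most maxPages pages; returns true when the partition is done. */
    static boolean reencrypt(Partition part, GroupKeys keys, int maxPages) {
        for (int done = 0; part.reencryptionPending() && done < maxPages; done++) {
            int keyId = part.pageKeyIds[part.nextPage];
            if (!keys.canRead(keyId))
                throw new IllegalStateException("No key with id " + keyId);
            // Stands in for: read the page with its old key, mark it dirty,
            // and let the checkpoint write it back with the active key.
            part.pageKeyIds[part.nextPage] = keys.activeId();
            part.nextPage++; // Progress is kept, so a restart resumes from here.
        }
        return !part.reencryptionPending();
    }
}
```

In this sketch, calling rotate() alone completes the key rotation, while
reencrypt() can be called repeatedly with a small maxPages value, which
stands in for throttling. Because the progress lives in the Partition
object, a call made after a simulated restart continues from the last
processed page instead of starting over.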

I updated the description in wiki [1].
Some more details in jira [2].
Draft PR [3].

[1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384
[2] https://issues.apache.org/jira/browse/IGNITE-12843
[3] https://github.com/apache/ignite/pull/7941

Tue, Jul 7, 2020 at 13:49, Maksim Stepachev <maksim.stepac...@gmail.com>:
>
> Hi!
>
> Do you have any updates about this issue? What types of implementations
> have you chosen (in-place, offline, or in the background)? I know that we
> want to add a partition defragmentation function, we can add a hole to
> integrate the re-encryption scheme. Could you update your IEP with your
> plans?
>
> Mon, May 25, 2020 at 12:50, Pavel Pereslegin <xxt...@gmail.com>:
>
> > Nikolay, Alexei,
> >
> > thanks for your suggestions.
> >
> > Offline re-encryption does not seem so simple: we need to read/replace
> > the existing encryption keys on all nodes (therefore, we should be
> > able to read/write metastore/WAL and exchange data between the
> > baseline nodes). Re-encryption in maintenance mode (for example, in a
> > stable read-only cluster) will be simple, but it still looks very
> > inconvenient, at least because users will need to interrupt all
> > operations.
> >
> > The main advantage of online "in place" re-encryption is that we'll
> > support multiple keys for reading, and this procedure does not
> > directly depend on background re-encryption.
> >
> > So, the first step is similar to rotating the master key: once the new
> > key is set for writing on all nodes, the cache group key rotation is
> > complete (this is what PCI DSS requires - encrypt new updates with new
> > keys).
> > The second step is to re-encrypt the existing data. As I said
> > previously, I thought about scanning all partition pages in some
> > background mode (storing progress on the meta page to continue after
> > restart), but the rebalance approach should also work here if I figure
> > out how to automate this process.
> >
> > Mon, May 25, 2020 at 12:22, Alexei Scherbakov <
> > alexey.scherbak...@gmail.com>:
> > >
> > >
> > >
> > > Mon, May 25, 2020 at 12:00, Nikolay Izhikov <nizhi...@apache.org>:
> > >>
> > >> > This will take us to the re-encryption using full rebalancing
> > >>
> > >> Rebalancing will require twice the effort for re-encryption:
> > >>
> > >> 1. Read and send data from supplier node.
> > >> 2. Reencrypt and write data on demander node.
> > >>
> > >> Instead of
> > >>
> > >> 1. Read, reencrypt and write data on «demander» node.
> > >
> > >
> > > Usually, reading and sending is not a bottleneck. And don't forget we
> > can run out of WAL history and fall back to full rebalancing with partition
> > eviction eliminating all efforts from offline re-encryption.
> > >
> > > On the other side, for a grid having many nodes one-by-one re-encryption
> > can take a long time.
> > > It should also be possible to re-encrypt all data as fast as possible
> > if, for example, the load can be switched to another grid, where offline
> > encryption will come in handy.
> > >
> > > So, I suggest implementing offline re-encryption and online
> > re-encryption using rebalancing as a first step.
> > >
> > > The next step can be online in-place re-encryption. It's important to
> > measure its business impact on an online grid.
> > >
> > >>
> > >>
> > >>
> > >> > On May 25, 2020, at 11:46, Alexei Scherbakov <
> > alexey.scherbak...@gmail.com> wrote:
> > >> >
> > >> > For me, the one big disadvantage of offline re-encryption is the
> > >> > possibility of running out of WAL history.
> > >> > If re-encryption takes a long time, we will get full rebalancing
> > with
> > >> > partition eviction.
> > >> > This will take us to the re-encryption using full rebalancing,
> > proposed
> > >> > by me earlier.
> > >> >
> > >> >
> > >> >
> > >> > Mon, May 25, 2020 at 11:27, Nikolay Izhikov <nizhi...@apache.org>:
> > >> >
> > >> >>> And definitely this approach is much simpler to implement
> > >> >>
> > >> >> I agree.
> > >> >>
> > >> >> If we allow taking nodes offline for re-encryption, then we can
> > implement a
> > >> >> fully offline procedure:
> > >> >>
> > >> >> 1. Stop the node.
> > >> >> 2. Execute some control.sh command that will re-encrypt all data
> > without
> > >> >> starting the node.
> > >> >> 3. Start the node.
> > >> >>
> > >> >> Pavel, could you please write it one more time - what are the
> > disadvantages of
> > >> >> the offline procedure?
> > >> >>
> > >> >>> On May 25, 2020, at 11:20, Alexei Scherbakov <
> > alexey.scherbak...@gmail.com>
> > >> >> wrote:
> > >> >>>
> > >> >>> And definitely this approach is much simpler to implement because
> > all
> > >> >>> corner cases are handled by rebalancing code.
> > >> >>>
> > >> >>> Mon, May 25, 2020 at 11:16, Alexei Scherbakov <
> > >> >> alexey.scherbak...@gmail.com
> > >> >>>> :
> > >> >>>
> > >> >>>> I mean: serving supply requests.
> > >> >>>>
> > >> >>>> Mon, May 25, 2020 at 11:15, Alexei Scherbakov <
> > >> >>>> alexey.scherbak...@gmail.com>:
> > >> >>>>
> > >> >>>>> Nikolay,
> > >> >>>>>
> > >> >>>>> Can you explain why such a restriction is necessary?
> > >> >>>>> Most likely, having a currently re-encrypting node serve only
> > demand
> > >> >>>>> requests will have the least performance impact on a grid.
> > >> >>>>>
> > >> >>>>> Mon, May 25, 2020 at 11:08, Nikolay Izhikov <nizhi...@apache.org
> > >:
> > >> >>>>>
> > >> >>>>>> Hello, Alexei.
> > >> >>>>>>
> > >> >>>>>> I think we want to implement this feature without node restarts.
> > >> >>>>>> In the ideal scenario all nodes will stay alive and respond to
> > the
> > >> >> user
> > >> >>>>>> requests.
> > >> >>>>>>
> > >> >>>>>>> On May 24, 2020, at 15:24, Alexei Scherbakov <
> > >> >>>>>> alexey.scherbak...@gmail.com> wrote:
> > >> >>>>>>>
> > >> >>>>>>> Pavel Pereslegin,
> > >> >>>>>>>
> > >> >>>>>>> I see another opportunity.
> > >> >>>>>>> We can use rebalancing to re-encrypt node data with a new key.
> > >> >>>>>>> It's a trivial procedure for me: stop a node, clear database,
> > change
> > >> >> a
> > >> >>>>>> key,
> > >> >>>>>>> start node and wait for rebalancing to complete.
> > >> >>>>>>> Data will be re-encrypted during rebalancing.
> > >> >>>>>>>
> > >> >>>>>>> Did I miss something ?
> > >> >>>>>>>
> > >> >>>>>>> Fri, May 22, 2020 at 16:14, Ivan Rakov <ivan.glu...@gmail.com>:
> > >> >>>>>>>
> > >> >>>>>>>> Folks,
> > >> >>>>>>>>
> > >> >>>>>>>> Just keeping you informed: my colleagues and I are highly
> > interested
> > >> >>>>>> in TDE
> > >> >>>>>>>> in general and key rotation specifically, but we haven't had
> > enough
> > >> >>>>>> time
> > >> >>>>>>>> so far.
> > >> >>>>>>>> We'll dive into this feature and participate in reviews next
> > month.
> > >> >>>>>>>>
> > >> >>>>>>>> --
> > >> >>>>>>>> Best Regards,
> > >> >>>>>>>> Ivan Rakov
> > >> >>>>>>>>
> > >> >>>>>>>> On Sun, May 17, 2020 at 10:51 PM Pavel Pereslegin <
> > xxt...@gmail.com
> > >> >>>
> > >> >>>>>>>> wrote:
> > >> >>>>>>>>
> > >> >>>>>>>>> Hello, Alexey.
> > >> >>>>>>>>>
> > >> >>>>>>>>>> is the encryption key for the data the same on all nodes in
> > the
> > >> >>>>>>>> cluster?
> > >> >>>>>>>>> Yes, each encrypted cache group has its own encryption key,
> > the key
> > >> >>>>>> is
> > >> >>>>>>>>> the same on all nodes.
> > >> >>>>>>>>>
> > >> >>>>>>>>>> Clearly, during the re-encryption there will exist pages
> > >> >>>>>>>>>> encrypted with both new and old keys at the same time.
> > >> >>>>>>>>> Yes, there will be pages encrypted with different keys at the
> > same
> > >> >>>>>> time.
> > >> >>>>>>>>> Currently, we only store one key for one cache group. To
> > rotate a
> > >> >>>>>> key,
> > >> >>>>>>>>> at a certain point in time it is necessary to support several
> > keys
> > >> >>>>>> (at
> > >> >>>>>>>>> least for reading the WAL).
> > >> >>>>>>>>> For the "in place" strategy, we'll store the encryption key
> > >> >>>>>> identifier
> > >> >>>>>>>>> on each encrypted page (we currently have some unused space on
> > >> >>>>>>>>> encrypted page, so I don't expect any memory overhead here).
> > Thus,
> > >> >> we
> > >> >>>>>>>>> will have several keys for reading and one key for writing. I
> > >> >> assume
> > >> >>>>>>>>> that the old key will be automatically deleted when a
> > specific WAL
> > >> >>>>>>>>> segment is deleted (and re-encryption is finished).
> > >> >>>>>>>>>
> > >> >>>>>>>>>> Will a node continue to re-encrypt the data after it
> > restarts?
> > >> >>>>>>>>> Yes.
> > >> >>>>>>>>>
> > >> >>>>>>>>>> If a node goes down during the re-encryption, but the rest
> > of the
> > >> >>>>>>>>>> cluster finishes re-encryption, will we consider the
> > procedure
> > >> >>>>>>>> complete?
> > >> >>>>>>>>> I'm not sure, but it looks like the key rotation is complete
> > when
> > >> >> we
> > >> >>>>>>>>> set the new key on all nodes so that the updates will be
> > encrypted
> > >> >>>>>>>>> with the new key (as required by PCI DSS).
> > >> >>>>>>>>> Status of re-encryption can be obtained separately (locally or
> > >> >>>>>> cluster
> > >> >>>>>>>>> wide).
> > >> >>>>>>>>>
> > >> >>>>>>>>> I forgot to mention that with “in place” re-encryption it
> > will be
> > >> >>>>>>>>> impossible to quickly cancel re-encryption, because by
> > canceling we
> > >> >>>>>>>>> mean re-encryption with the old key.
> > >> >>>>>>>>>
> > >> >>>>>>>>>> How do you see the whole key rotation procedure will work?
> > >> >>>>>>>>> Initial design for re-encryption with "partition copying" is
> > >> >>>>>> described
> > >> >>>>>>>>> here [1]. I'll prepare detailed design for "in place"
> > re-encryption
> > >> >>>>>> if
> > >> >>>>>>>>> we'll go this way. In short, send the new encryption key
> > >> >>>>>> cluster-wide,
> > >> >>>>>>>>> each node adds a new key and starts background re-encryption.
> > >> >>>>>>>>>
> > >> >>>>>>>>> [1]
> > >> >>>>>>>>>
> > >> >>>>>>>>
> > >> >>>>>>
> > >> >>
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Copywithre-encryptiondesign
> > >> >>>>>>>>> .
> > >> >>>>>>>>>
> > >> >>>>>>>>> Sun, May 17, 2020 at 18:35, Alexey Goncharuk <
> > >> >>>>>> alexey.goncha...@gmail.com
> > >> >>>>>>>>> :
> > >> >>>>>>>>>>
> > >> >>>>>>>>>> Pavel, Anton,
> > >> >>>>>>>>>>
> > >> >>>>>>>>>> How do you see the whole key rotation procedure will work?
> > >> >> Clearly,
> > >> >>>>>>>>> during
> > >> >>>>>>>>>> the re-encryption there will exist pages encrypted with both
> > new
> > >> >> and
> > >> >>>>>>>> old
> > >> >>>>>>>>>> keys at the same time. Will a node continue to re-encrypt
> > the data
> > >> >>>>>>>> after
> > >> >>>>>>>>> it
> > >> >>>>>>>>>> restarts? If a node goes down during the re-encryption, but
> > the
> > >> >>>>>> rest of
> > >> >>>>>>>>> the
> > >> >>>>>>>>>> cluster finishes re-encryption, will we consider the
> > procedure
> > >> >>>>>>>> complete?
> > >> >>>>>>>>> By
> > >> >>>>>>>>>> the way, is the encryption key for the data the same on all
> > nodes
> > >> >> in
> > >> >>>>>>>> the
> > >> >>>>>>>>>> cluster?
> > >> >>>>>>>>>>
> > >> >>>>>>>>>> Thu, May 14, 2020 at 11:30, Anton Vinogradov <a...@apache.org
> > >:
> > >> >>>>>>>>>>
> > >> >>>>>>>>>>> +1 to "In place re-encryption".
> > >> >>>>>>>>>>>
> > >> >>>>>>>>>>> - It has a simple design.
> > >> >>>>>>>>>>> - Clusters under load may require just load to re-encrypt
> > the
> > >> >> data.
> > >> >>>>>>>>>>> (Friendly to load).
> > >> >>>>>>>>>>> - Easy to throttle.
> > >> >>>>>>>>>>> - Easy to continue.
> > >> >>>>>>>>>>> - Design compatible with the multi-key architecture.
> > >> >>>>>>>>>>> - It can be optimized to use own WAL buffer and to
> > re-encrypt
> > >> >> pages
> > >> >>>>>>>>> without
> > >> >>>>>>>>>>> restoring them to on-heap.
> > >> >>>>>>>>>>>
> > >> >>>>>>>>>>> On Thu, May 14, 2020 at 1:54 AM Pavel Pereslegin <
> > >> >> xxt...@gmail.com
> > >> >>>>>>>
> > >> >>>>>>>>> wrote:
> > >> >>>>>>>>>>>
> > >> >>>>>>>>>>>> Hello Igniters.
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>> Recently, master key rotation for Apache Ignite
> > Transparent Data
> > >> >>>>>>>>>>>> Encryption was implemented [1], but some security
> > standards (PCI
> > >> >>>>>>>> DSS
> > >> >>>>>>>>>>>> at least) require rotation of all encryption keys [2].
> > >> >> Currently,
> > >> >>>>>>>>>>>> encryption occurs when reading/writing pages to disk, cache
> > >> >>>>>>>>> encryption
> > >> >>>>>>>>>>>> keys are stored in metastore.
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>> I'm going to contribute cache encryption key rotation and
> > want
> > >> >> to
> > >> >>>>>>>>>>>> ask what the best way to re-encrypt existing
> > data is. I
> > >> >> see
> > >> >>>>>>>>> two
> > >> >>>>>>>>>>>> different strategies.
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>> 1. In place re-encryption:
> > >> >>>>>>>>>>>> Using the old key, sequentially read all the pages from the
> > >> >>>>>>>>> datastore,
> > >> >>>>>>>>>>>> mark as dirty and log them into the WAL. After checkpoint
> > pages
> > >> >>>>>>>> will
> > >> >>>>>>>>>>>> be stored to disk encrypted with the new key (as usual,
> > along
> > >> >> with
> > >> >>>>>>>>>>>> updates). This strategy requires storing the identifier
> > (number)
> > >> >> of
> > >> >>>>>>>> the
> > >> >>>>>>>>>>>> encryption key into the encrypted page.
> > >> >>>>>>>>>>>> pros:
> > >> >>>>>>>>>>>> - can work in the background with minimal performance
> > impact
> > >> >>>>>>>> (this
> > >> >>>>>>>>>>>> impact can be managed).
> > >> >>>>>>>>>>>> cons:
> > >> >>>>>>>>>>>> - page duplication in the WAL may affect performance and
> > >> >>>>>>>> historical
> > >> >>>>>>>>>>>> rebalance.
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>> 2. Copy partition with re-encryption.
> > >> >>>>>>>>>>>> This strategy is similar to partition snapshotting [3] -
> > create
> > >> >>>>>>>>>>>> partition copy encrypted with the new key and then replace
> > the
> > >> >>>>>>>>>>>> original partition file with the new one (see details [4]).
> > >> >>>>>>>>>>>> pros:
> > >> >>>>>>>>>>>> - should work faster than "in place" re-encryption.
> > >> >>>>>>>>>>>> cons:
> > >> >>>>>>>>>>>> - re-encryption in active cluster (and on unstable
> > topology) can
> > >> >>>>>>>> be
> > >> >>>>>>>>>>>> difficult to implement.
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>> (See more detailed comparison [5])
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>> Re-encryption of existing data is a long and rare
> > procedure (It
> > >> >> is
> > >> >>>>>>>>>>>> recommended to change the key every 6 months, but at least
> > once
> > >> >>>>>>>> every
> > >> >>>>>>>>>>>> 2 years). Thus, re-encryption can be implemented for
> > maintenance
> > >> >>>>>>>> mode
> > >> >>>>>>>>>>>> (for example, on a stable topology in a read-only cluster)
> > and
> > >> >> in
> > >> >>>>>>>>> such
> > >> >>>>>>>>>>>> case the approach with partition copying seems simpler and
> > >> >> faster.
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>> So, what do you think - do we need "online" re-encryption
> > and
> > >> >>>>>> which
> > >> >>>>>>>>> of
> > >> >>>>>>>>>>>> the proposed options is best suited for this?
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>> [1] https://issues.apache.org/jira/browse/IGNITE-12186
> > >> >>>>>>>>>>>> [2]
> > >> >>>>>>>>>
> > https://www.pcisecuritystandards.org/documents/PCI_DSS_v3-2-1.pdf
> > >> >>>>>>>>>>>> [3]
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>
> > >> >>>>>>>>>
> > >> >>>>>>>>
> > >> >>>>>>
> > >> >>
> > https://cwiki.apache.org/confluence/display/IGNITE/IEP-43%3A+Cluster+snapshots#IEP-43:Clustersnapshots-Partitionscopystrategy
> > >> >>>>>>>>>>>> [4]
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>
> > >> >>>>>>>>>
> > >> >>>>>>>>
> > >> >>>>>>
> > >> >>
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Copywithre-encryptiondesign
> > >> >>>>>>>>>>>> .
> > >> >>>>>>>>>>>> [5]
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>
> > >> >>>>>>>>>
> > >> >>>>>>>>
> > >> >>>>>>
> > >> >>
> > https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Comparison
> > >> >>>>>>>>>>>>
> > >> >>>>>>>>>>>
> > >> >>>>>>>>>
> > >> >>>>>>>>
> > >> >>>>>>>
> > >> >>>>>>>
> > >> >>>>>>> --
> > >> >>>>>>>
> > >> >>>>>>> Best regards,
> > >> >>>>>>> Alexei Scherbakov
> > >> >>>>>>
> > >> >>>>>>
> > >> >>>>>
> > >> >>>>> --
> > >> >>>>>
> > >> >>>>> Best regards,
> > >> >>>>> Alexei Scherbakov
> > >> >>>>>
> > >> >>>>
> > >> >>>>
> > >> >>>> --
> > >> >>>>
> > >> >>>> Best regards,
> > >> >>>> Alexei Scherbakov
> > >> >>>>
> > >> >>>
> > >> >>>
> > >> >>> --
> > >> >>>
> > >> >>> Best regards,
> > >> >>> Alexei Scherbakov
> > >> >>
> > >> >>
> > >> >
> > >> > --
> > >> >
> > >> > Best regards,
> > >> > Alexei Scherbakov
> > >>
> > >
> > >
> > > --
> > >
> > > Best regards,
> > > Alexei Scherbakov
> >
