Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

Pavel Pereslegin Mon, 25 May 2020 02:50:48 -0700

Nikolay, Alexei,

thanks for your suggestions.


Offline re-encryption does not seem that simple: we need to read/replace
the existing encryption keys on all nodes (therefore, we should be
able to read/write the metastore/WAL and exchange data between the
baseline nodes). Re-encryption in maintenance mode (for example, in a
stable read-only cluster) would be simple, but it still looks very
inconvenient, at least because users would need to interrupt all
operations.

The main advantage of online "in place" re-encryption is that we'll
support multiple keys for reading, so the key rotation itself does not
directly depend on background re-encryption.
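
To make the "multiple keys for reading, one key for writing" idea concrete,
here is a minimal sketch in Python. It is not Ignite code: GroupKeys, the
one-byte key-id prefix on each page and toy_cipher are all hypothetical,
and toy_cipher is only a stand-in for a real cipher.

```python
import hashlib
from dataclasses import dataclass, field


def toy_cipher(key: bytes, data: bytes) -> bytes:
    """XOR with a SHA-256-derived keystream. Stand-in only, NOT real encryption."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


@dataclass
class GroupKeys:
    """Hypothetical key set of one cache group: many read keys, one write key."""
    keys: dict = field(default_factory=dict)  # key id (0..255) -> key bytes
    active_id: int = -1                       # id of the key used for writes

    def rotate(self, key_id: int, key: bytes) -> None:
        # Rotation step: from now on every write uses the new key.
        self.keys[key_id] = key
        self.active_id = key_id

    def encrypt_page(self, payload: bytes) -> bytes:
        # Assumed page layout: first byte is the key id, the rest is ciphertext.
        return bytes([self.active_id]) + toy_cipher(self.keys[self.active_id], payload)

    def decrypt_page(self, page: bytes) -> bytes:
        # The key is chosen by the id stored on the page, so old pages stay readable.
        return toy_cipher(self.keys[page[0]], page[1:])


group = GroupKeys()
group.rotate(0, b"old key")
old_page = group.encrypt_page(b"written before rotation")
group.rotate(1, b"new key")
new_page = group.encrypt_page(b"written after rotation")
```

After rotate(1, ...) both pages remain readable, while every new write
carries key id 1. The old key could be dropped from the set once no page or
WAL record refers to it.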

So, the first step is similar to rotating the master key: once the new
key is set for writing on all nodes, the cache group key rotation is
complete (this is what PCI DSS requires: encrypt new updates with new
keys).
The second step is to re-encrypt the existing data. As I said
previously, I thought about scanning all partition pages in some
background mode (storing progress on the metapage to continue after
restart), but the rebalance approach should also work here if I figure out
how to automate this process.
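
The background scan with saved progress could look roughly like the sketch
below (Python, not Ignite code). The dict meta stands in for the partition
metapage, and the field name "reencrypt_pos", the batch size and the
to_key_1 callback are assumptions made only for this illustration.

```python
from typing import Callable, List


def reencrypt_batch(pages: List[bytes],
                    meta: dict,
                    reencrypt: Callable[[bytes], bytes],
                    batch_size: int) -> bool:
    """Re-encrypt up to batch_size pages, starting at the saved position.

    meta["reencrypt_pos"] (hypothetical field) holds the index of the next
    page to process. Returns True once the whole partition is processed.
    """
    start = meta.get("reencrypt_pos", 0)
    end = min(start + batch_size, len(pages))
    for idx in range(start, end):
        pages[idx] = reencrypt(pages[idx])
    # Progress is saved after the batch, so a restart repeats at most one
    # batch. The reencrypt callback must therefore be idempotent.
    meta["reencrypt_pos"] = end
    return end == len(pages)


def to_key_1(page: bytes) -> bytes:
    # Idempotent stand-in: swap the leading key-id byte only if it is still 0.
    return b"\x01" + page[1:] if page[0] == 0 else page


pages = [b"\x00a", b"\x00b", b"\x00c"]
meta = {}
first_done = reencrypt_batch(pages, meta, to_key_1, 2)   # pages 0 and 1
# A restart at this point would resume from meta["reencrypt_pos"].
second_done = reencrypt_batch(pages, meta, to_key_1, 2)  # page 2
```

Throttling would then be a matter of choosing the batch size and the pause
between batches.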

Mon, 25 May 2020 at 12:22, Alexei Scherbakov <alexey.scherbak...@gmail.com>:
>
>
>
> Mon, 25 May 2020 at 12:00, Nikolay Izhikov <nizhi...@apache.org>:
>>
>> > This will take us to the re-encryption using full rebalancing
>>
>> Rebalance will require 2x the effort for re-encryption:
>>
>> 1. Read and send data from supplier node.
>> 2. Re-encrypt and write data on demander node.
>>
>> Instead of
>>
>> 1. Read, re-encrypt and write data on «demander» node.
>
>
> Usually, reading and sending is not a bottleneck. And don't forget we can run
> out of WAL history and fall back to full rebalancing with partition eviction,
> eliminating all efforts from offline re-encryption.
>
> On the other hand, for a grid having many nodes, one-by-one re-encryption can
> take a long time.
> It should also be possible to re-encrypt all data as fast as possible, for
> example, if the load can be switched to another grid, where offline
> re-encryption will come in handy.
>
> So, I suggest implementing offline re-encryption and online re-encryption
> using rebalancing as a first step.
>
> The next step can be online in-place re-encryption. It's important to measure
> its business impact on an online grid.
>
>>
>>
>>
>> > 25 May 2020, at 11:46, Alexei Scherbakov <alexey.scherbak...@gmail.com>
>> > wrote:
>> >
>> > For me, the one big disadvantage of offline re-encryption is the
>> > possibility of running out of WAL history.
>> > If re-encryption takes a long time, we will get full rebalancing with
>> > partition eviction.
>> > This will take us to the re-encryption using full rebalancing, proposed
>> > by me earlier.
>> >
>> >
>> >
>> > Mon, 25 May 2020 at 11:27, Nikolay Izhikov <nizhi...@apache.org>:
>> >
>> >>> And definitely this approach is much simpler to implement
>> >>
>> >> I agree.
>> >>
>> >> If we allow taking nodes offline for re-encryption, then we can implement a
>> >> fully offline procedure:
>> >>
>> >> 1. Stop node.
>> >> 2. Execute some control.sh command that will re-encrypt all data without
>> >> starting node.
>> >> 3. Start node.
>> >>
>> >> Pavel, can you please write one more time what the disadvantages of the
>> >> offline procedure are?
>> >>
>> >>> 25 May 2020, at 11:20, Alexei Scherbakov <alexey.scherbak...@gmail.com>
>> >> wrote:
>> >>>
>> >>> And definitely this approach is much simpler to implement because all
>> >>> corner cases are handled by rebalancing code.
>> >>>
>> >>> Mon, 25 May 2020 at 11:16, Alexei Scherbakov <
>> >> alexey.scherbak...@gmail.com
>> >>>> :
>> >>>
>> >>>> I mean: serving supply requests.
>> >>>>
>> >>>> Mon, 25 May 2020 at 11:15, Alexei Scherbakov <
>> >>>> alexey.scherbak...@gmail.com>:
>> >>>>
>> >>>>> Nikolay,
>> >>>>>
>> >>>>> Can you explain why such a restriction is necessary?
>> >>>>> Most likely, having a currently re-encrypting node serve only demand
>> >>>>> requests will have the least performance impact on a grid.
>> >>>>>
>> >>>>> Mon, 25 May 2020 at 11:08, Nikolay Izhikov <nizhi...@apache.org>:
>> >>>>>
>> >>>>>> Hello, Alexei.
>> >>>>>>
>> >>>>>> I think we want to implement this feature without nodes restart.
>> >>>>>> In the ideal scenario all nodes will stay alive and respond to the
>> >> user
>> >>>>>> requests.
>> >>>>>>
>> >>>>>>> 24 May 2020, at 15:24, Alexei Scherbakov <
>> >>>>>> alexey.scherbak...@gmail.com> wrote:
>> >>>>>>>
>> >>>>>>> Pavel Pereslegin,
>> >>>>>>>
>> >>>>>>> I see another opportunity.
>> >>>>>>> We can use rebalancing to re-encrypt node data with a new key.
>> >>>>>>> It's a trivial procedure for me: stop a node, clear database, change
>> >> a
>> >>>>>> key,
>> >>>>>>> start node and wait for rebalancing to complete.
>> >>>>>>> Data will be re-encrypted during rebalancing.
>> >>>>>>>
>> >>>>>>> Did I miss something ?
>> >>>>>>>
>> >>>>>>> Fri, 22 May 2020 at 16:14, Ivan Rakov <ivan.glu...@gmail.com>:
>> >>>>>>>
>> >>>>>>>> Folks,
>> >>>>>>>>
>> >>>>>>>> Just keeping you informed: my colleagues and I are highly interested in
>> >>>>>>>> TDE in general and key rotation specifically, but we haven't had enough
>> >>>>>>>> time so far.
>> >>>>>>>> We'll dive into this feature and participate in reviews next month.
>> >>>>>>>>
>> >>>>>>>> --
>> >>>>>>>> Best Regards,
>> >>>>>>>> Ivan Rakov
>> >>>>>>>>
>> >>>>>>>> On Sun, May 17, 2020 at 10:51 PM Pavel Pereslegin <xxt...@gmail.com
>> >>>
>> >>>>>>>> wrote:
>> >>>>>>>>
>> >>>>>>>>> Hello, Alexey.
>> >>>>>>>>>
>> >>>>>>>>>> is the encryption key for the data the same on all nodes in the
>> >>>>>>>> cluster?
>> >>>>>>>>> Yes, each encrypted cache group has its own encryption key, the key
>> >>>>>> is
>> >>>>>>>>> the same on all nodes.
>> >>>>>>>>>
>> >>>>>>>>>> Clearly, during the re-encryption there will exist pages
>> >>>>>>>>>> encrypted with both new and old keys at the same time.
>> >>>>>>>>> Yes, there will be pages encrypted with different keys at the same
>> >>>>>> time.
>> >>>>>>>>> Currently, we only store one key for one cache group. To rotate a
>> >>>>>> key,
>> >>>>>>>>> at a certain point in time it is necessary to support several keys
>> >>>>>> (at
>> >>>>>>>>> least for reading the WAL).
>> >>>>>>>>> For the "in place" strategy, we'll store the encryption key
>> >>>>>> identifier
>> >>>>>>>>> on each encrypted page (we currently have some unused space on
>> >>>>>>>>> encrypted page, so I don't expect any memory overhead here). Thus,
>> >> we
>> >>>>>>>>> will have several keys for reading and one key for writing. I
>> >> assume
>> >>>>>>>>> that the old key will be automatically deleted when a specific WAL
>> >>>>>>>>> segment is deleted (and re-encryption is finished).
>> >>>>>>>>>
>> >>>>>>>>>> Will a node continue to re-encrypt the data after it restarts?
>> >>>>>>>>> Yes.
>> >>>>>>>>>
>> >>>>>>>>>> If a node goes down during the re-encryption, but the rest of the
>> >>>>>>>>>> cluster finishes re-encryption, will we consider the procedure
>> >>>>>>>> complete?
>> >>>>>>>>> I'm not sure, but it looks like the key rotation is complete when
>> >> we
>> >>>>>>>>> set the new key on all nodes so that the updates will be encrypted
>> >>>>>>>>> with the new key (as required by PCI DSS).
>> >>>>>>>>> Status of re-encryption can be obtained separately (locally or
>> >>>>>> cluster
>> >>>>>>>>> wide).
>> >>>>>>>>>
>> >>>>>>>>> I forgot to mention that with “in place” re-encryption it will be
>> >>>>>>>>> impossible to quickly cancel re-encryption, because by canceling we
>> >>>>>>>>> mean re-encryption with the old key.
>> >>>>>>>>>
>> >>>>>>>>>> How do you see the whole key rotation procedure will work?
>> >>>>>>>>> Initial design for re-encryption with "partition copying" is described
>> >>>>>>>>> here [1]. I'll prepare a detailed design for "in place" re-encryption if
>> >>>>>>>>> we go this way. In short: send the new encryption key cluster-wide,
>> >>>>>>>>> each node adds the new key and starts background re-encryption.
>> >>>>>>>>>
>> >>>>>>>>> [1]
>> >>>>>>>>>
>> >>>>>>>>
>> >>>>>>
>> >> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Copywithre-encryptiondesign
>> >>>>>>>>> .
>> >>>>>>>>>
>> >>>>>>>>> Sun, 17 May 2020 at 18:35, Alexey Goncharuk <
>> >>>>>> alexey.goncha...@gmail.com
>> >>>>>>>>> :
>> >>>>>>>>>>
>> >>>>>>>>>> Pavel, Anton,
>> >>>>>>>>>>
>> >>>>>>>>>> How do you see the whole key rotation procedure will work?
>> >> Clearly,
>> >>>>>>>>> during
>> >>>>>>>>>> the re-encryption there will exist pages encrypted with both new
>> >> and
>> >>>>>>>> old
>> >>>>>>>>>> keys at the same time. Will a node continue to re-encrypt the data
>> >>>>>>>> after
>> >>>>>>>>> it
>> >>>>>>>>>> restarts? If a node goes down during the re-encryption, but the
>> >>>>>> rest of
>> >>>>>>>>> the
>> >>>>>>>>>> cluster finishes re-encryption, will we consider the procedure
>> >>>>>>>> complete?
>> >>>>>>>>> By
>> >>>>>>>>>> the way, is the encryption key for the data the same on all nodes
>> >> in
>> >>>>>>>> the
>> >>>>>>>>>> cluster?
>> >>>>>>>>>>
>> >>>>>>>>>> Thu, 14 May 2020 at 11:30, Anton Vinogradov <a...@apache.org>:
>> >>>>>>>>>>
>> >>>>>>>>>>> +1 to "In place re-encryption".
>> >>>>>>>>>>>
>> >>>>>>>>>>> - It has a simple design.
>> >>>>>>>>>>> - Clusters under load may require just load to re-encrypt the
>> >> data.
>> >>>>>>>>>>> (Friendly to load).
>> >>>>>>>>>>> - Easy to throttle.
>> >>>>>>>>>>> - Easy to continue.
>> >>>>>>>>>>> - Design compatible with the multi-key architecture.
>> >>>>>>>>>>> - It can be optimized to use own WAL buffer and to re-encrypt
>> >> pages
>> >>>>>>>>> without
>> >>>>>>>>>>> restoring them to on-heap.
>> >>>>>>>>>>>
>> >>>>>>>>>>> On Thu, May 14, 2020 at 1:54 AM Pavel Pereslegin <
>> >> xxt...@gmail.com
>> >>>>>>>
>> >>>>>>>>> wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>>> Hello Igniters.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Recently, master key rotation for Apache Ignite Transparent Data
>> >>>>>>>>>>>> Encryption was implemented [1], but some security standards (PCI
>> >>>>>>>> DSS
>> >>>>>>>>>>>> at least) require rotation of all encryption keys [2].
>> >> Currently,
>> >>>>>>>>>>>> encryption occurs when reading/writing pages to disk, cache
>> >>>>>>>>> encryption
>> >>>>>>>>>>>> keys are stored in metastore.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> I'm going to contribute cache encryption key rotation and want to
>> >>>>>>>>>>>> ask what the best way to re-encrypt existing data is. I see two
>> >>>>>>>>>>>> different strategies.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> 1. In place re-encryption:
>> >>>>>>>>>>>> Using the old key, sequentially read all the pages from the datastore,
>> >>>>>>>>>>>> mark them as dirty and log them into the WAL. After a checkpoint, pages
>> >>>>>>>>>>>> will be stored to disk encrypted with the new key (as usual, along with
>> >>>>>>>>>>>> updates). This strategy requires storing the identifier (number) of the
>> >>>>>>>>>>>> encryption key in the encrypted page.
>> >>>>>>>>>>>> pros:
>> >>>>>>>>>>>> - can work in the background with minimal performance impact
>> >>>>>>>> (this
>> >>>>>>>>>>>> impact can be managed).
>> >>>>>>>>>>>> cons:
>> >>>>>>>>>>>> - page duplication in the WAL may affect performance and
>> >>>>>>>> historical
>> >>>>>>>>>>>> rebalance.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> 2. Copy partition with re-encryption.
>> >>>>>>>>>>>> This strategy is similar to partition snapshotting [3] - create
>> >>>>>>>>>>>> partition copy encrypted with the new key and then replace the
>> >>>>>>>>>>>> original partition file with the new one (see details [4]).
>> >>>>>>>>>>>> pros:
>> >>>>>>>>>>>> - should work faster than "in place" re-encryption.
>> >>>>>>>>>>>> cons:
>> >>>>>>>>>>>> - re-encryption in active cluster (and on unstable topology) can
>> >>>>>>>> be
>> >>>>>>>>>>>> difficult to implement.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> (See more detailed comparison [5])
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Re-encryption of existing data is a long and rare procedure (It
>> >> is
>> >>>>>>>>>>>> recommended to change the key every 6 months, but at least once
>> >>>>>>>> every
>> >>>>>>>>>>>> 2 years). Thus, re-encryption can be implemented for maintenance
>> >>>>>>>> mode
>> >>>>>>>>>>>> (for example, on a stable topology in a read-only cluster) and
>> >> in
>> >>>>>>>>> such
>> >>>>>>>>>>>> case the approach with partition copying seems simpler and
>> >> faster.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> So, what do you think - do we need "online" re-encryption and
>> >>>>>> which
>> >>>>>>>>> of
>> >>>>>>>>>>>> the proposed options is best suited for this?
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> [1] https://issues.apache.org/jira/browse/IGNITE-12186
>> >>>>>>>>>>>> [2]
>> >>>>>>>>> https://www.pcisecuritystandards.org/documents/PCI_DSS_v3-2-1.pdf
>> >>>>>>>>>>>> [3]
>> >>>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>
>> >>>>>>
>> >> https://cwiki.apache.org/confluence/display/IGNITE/IEP-43%3A+Cluster+snapshots#IEP-43:Clustersnapshots-Partitionscopystrategy
>> >>>>>>>>>>>> [4]
>> >>>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>
>> >>>>>>
>> >> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Copywithre-encryptiondesign
>> >>>>>>>>>>>> .
>> >>>>>>>>>>>> [5]
>> >>>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>
>> >>>>>>
>> >> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Comparison
>> >>>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>> --
>> >>>>>>>
>> >>>>>>> Best regards,
>> >>>>>>> Alexei Scherbakov
>> >>>>>>
>> >>>>>>
>> >>>>>
>> >>>>> --
>> >>>>>
>> >>>>> Best regards,
>> >>>>> Alexei Scherbakov
>> >>>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>>
>> >>>> Best regards,
>> >>>> Alexei Scherbakov
>> >>>>
>> >>>
>> >>>
>> >>> --
>> >>>
>> >>> Best regards,
>> >>> Alexei Scherbakov
>> >>
>> >>
>> >
>> > --
>> >
>> > Best regards,
>> > Alexei Scherbakov
>>
>
>
> --
>
> Best regards,
> Alexei Scherbakov
