Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).

Nikolay Izhikov Mon, 25 May 2020 01:27:31 -0700

> And definitely this approach is much simpler to implement

I agree.


If we allow nodes to be taken offline for re-encryption, then we can implement a
fully offline procedure:

1. Stop the node.
2. Run some control.sh command that re-encrypts all data without
starting the node.
3. Start the node.
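
Step 2 above could look roughly like the sketch below. It is only an illustration under stated assumptions: the class `OfflineReencrypt`, the fixed 4096-byte page, the `IV + ciphertext + tag` layout and the use of AES-GCM are all hypothetical and are not Ignite's real page format, encryption SPI, or an existing control.sh command.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical offline tool; not Ignite's page format and not a real control.sh command. */
final class OfflineReencrypt {
    static final int PLAIN_PAGE = 4096;   // assumed plaintext page size
    static final int IV_LEN = 12;         // AES-GCM nonce length in bytes
    static final int TAG_LEN = 16;        // AES-GCM tag length in bytes
    static final int STORED_PAGE = IV_LEN + PLAIN_PAGE + TAG_LEN;
    private static final SecureRandom RNG = new SecureRandom();

    /** Encrypts one page; the result is the IV followed by ciphertext and tag. */
    static byte[] encryptPage(byte[] plain, byte[] key) {
        byte[] iv = new byte[IV_LEN];
        RNG.nextBytes(iv);
        byte[] ct = crypt(Cipher.ENCRYPT_MODE, key, iv, plain, 0, plain.length);
        byte[] out = new byte[IV_LEN + ct.length];
        System.arraycopy(iv, 0, out, 0, IV_LEN);
        System.arraycopy(ct, 0, out, IV_LEN, ct.length);
        return out;
    }

    /** Decrypts one stored page; fails if the key does not match. */
    static byte[] decryptPage(byte[] stored, byte[] key) {
        byte[] iv = Arrays.copyOf(stored, IV_LEN);
        return crypt(Cipher.DECRYPT_MODE, key, iv, stored, IV_LEN, stored.length - IV_LEN);
    }

    private static byte[] crypt(int mode, byte[] key, byte[] iv, byte[] in, int off, int len) {
        try {
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(mode, new SecretKeySpec(key, "AES"), new GCMParameterSpec(TAG_LEN * 8, iv));
            return c.doFinal(in, off, len);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Streams every stored page from in to out, switching it from oldKey to newKey. */
    static void reencrypt(InputStream in, OutputStream out, byte[] oldKey, byte[] newKey) {
        byte[] stored = new byte[STORED_PAGE];
        try {
            int n;
            while ((n = in.readNBytes(stored, 0, STORED_PAGE)) > 0) {
                if (n != STORED_PAGE) throw new IllegalStateException("truncated page");
                out.write(encryptPage(decryptPage(stored, oldKey), newKey));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Rewrites a stopped node's file into a temporary copy, then swaps it in. */
    static void reencryptFile(Path file, byte[] oldKey, byte[] newKey) {
        Path tmp = file.resolveSibling(file.getFileName() + ".reenc");
        try {
            try (InputStream in = Files.newInputStream(file);
                 OutputStream out = Files.newOutputStream(tmp)) {
                reencrypt(in, out, oldKey, newKey);
            }
            // A real tool would also fsync and recover from a crash between write and swap.
            Files.move(tmp, file, StandardCopyOption.REPLACE_EXISTING);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the node is stopped, nothing else touches the file while it is rewritten, which is what makes the offline variant simple; the price is that the node serves no requests for the duration.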

Pavel, could you please describe once more what the disadvantages of the offline
procedure are?

> On 25 May 2020, at 11:20, Alexei Scherbakov <[email protected]> wrote:
> 
> And definitely this approach is much simpler to implement because all
> corner cases are handled by the rebalancing code.
> 
> Mon, 25 May 2020 at 11:16, Alexei Scherbakov <[email protected]>:
> 
>> I mean: serving supply requests.
>> 
>> Mon, 25 May 2020 at 11:15, Alexei Scherbakov <[email protected]>:
>> 
>>> Nikolay,
>>> 
>>> Can you explain why such a restriction is necessary?
>>> Most likely, having a node that is currently re-encrypting serve only demand
>>> requests will have the least performance impact on the grid.
>>> 
>>> Mon, 25 May 2020 at 11:08, Nikolay Izhikov <[email protected]>:
>>> 
>>>> Hello, Alexei.
>>>> 
>>>> I think we want to implement this feature without restarting nodes.
>>>> In the ideal scenario, all nodes stay alive and keep responding to user
>>>> requests.
>>>> 
>>>>> On 24 May 2020, at 15:24, Alexei Scherbakov <[email protected]> wrote:
>>>>> 
>>>>> Pavel Pereslegin,
>>>>> 
>>>>> I see another opportunity.
>>>>> We can use rebalancing to re-encrypt node data with a new key.
>>>>> It's a trivial procedure for me: stop a node, clear the database, change
>>>>> the key, start the node and wait for rebalancing to complete.
>>>>> Data will be re-encrypted during rebalancing.
>>>>> 
>>>>> Did I miss something?
>>>>> 
>>>>> Fri, 22 May 2020 at 16:14, Ivan Rakov <[email protected]>:
>>>>> 
>>>>>> Folks,
>>>>>> 
>>>>>> Just keeping you informed: my colleagues and I are highly interested in
>>>>>> TDE in general and key rotation specifically, but we haven't had enough
>>>>>> time so far.
>>>>>> We'll dive into this feature and participate in reviews next month.
>>>>>> 
>>>>>> --
>>>>>> Best Regards,
>>>>>> Ivan Rakov
>>>>>> 
>>>>>> On Sun, May 17, 2020 at 10:51 PM Pavel Pereslegin <[email protected]>
>>>>>> wrote:
>>>>>> 
>>>>>>> Hello, Alexey.
>>>>>>> 
>>>>>>>> is the encryption key for the data the same on all nodes in the cluster?
>>>>>>> Yes. Each encrypted cache group has its own encryption key, and that key
>>>>>>> is the same on all nodes.
>>>>>>> 
>>>>>>>> Clearly, during the re-encryption there will exist pages
>>>>>>>> encrypted with both new and old keys at the same time.
>>>>>>> Yes, there will be pages encrypted with different keys at the same time.
>>>>>>> Currently, we store only one key per cache group. To rotate a key, we
>>>>>>> need to support several keys for some period of time (at least for
>>>>>>> reading the WAL).
>>>>>>> For the "in place" strategy, we'll store the encryption key identifier
>>>>>>> on each encrypted page (encrypted pages currently have some unused
>>>>>>> space, so I don't expect any memory overhead here). Thus, we will have
>>>>>>> several keys for reading and one key for writing. I assume that the old
>>>>>>> key will be deleted automatically when a specific WAL segment is deleted
>>>>>>> (and re-encryption has finished).
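
The "several keys for reading, one key for writing" idea described above can be sketched as a small key ring. This is a minimal illustration, not Ignite code: the class `GroupKeyRing`, the one-byte key identifier, the `[key id][IV][ciphertext + tag]` layout and the choice of AES-GCM are assumptions made for the example.

```java
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical per-cache-group key ring: many keys for reading, one for writing. */
final class GroupKeyRing {
    private static final SecureRandom RNG = new SecureRandom();
    private final Map<Integer, byte[]> keys = new ConcurrentHashMap<>();
    private volatile int activeId; // the only key id used for writes (assumed to fit in one byte)

    GroupKeyRing(int id, byte[] key) {
        keys.put(id, key.clone());
        activeId = id;
    }

    /** Adds a new key and makes it the write key; old keys stay readable. */
    void rotate(int newId, byte[] newKey) {
        keys.put(newId, newKey.clone());
        activeId = newId;
    }

    /** Drops an old key, for example once re-encryption has finished. */
    void remove(int id) {
        if (id == activeId) throw new IllegalArgumentException("cannot remove the write key");
        keys.remove(id);
    }

    /** Reads the key id from the first byte of a stored page. */
    static int keyIdOf(byte[] stored) {
        return stored[0] & 0xFF;
    }

    /** Always encrypts with the active key and records its id in the page. */
    byte[] encrypt(byte[] plain) {
        int id = activeId;
        byte[] iv = new byte[12];
        RNG.nextBytes(iv);
        byte[] ct = crypt(Cipher.ENCRYPT_MODE, keys.get(id), iv, plain, 0, plain.length);
        byte[] out = new byte[13 + ct.length];
        out[0] = (byte) id;
        System.arraycopy(iv, 0, out, 1, 12);
        System.arraycopy(ct, 0, out, 13, ct.length);
        return out;
    }

    /** Picks the decryption key by the id stored in the page. */
    byte[] decrypt(byte[] stored) {
        int id = keyIdOf(stored);
        byte[] key = keys.get(id);
        if (key == null) throw new IllegalStateException("unknown key id " + id);
        byte[] iv = Arrays.copyOfRange(stored, 1, 13);
        return crypt(Cipher.DECRYPT_MODE, key, iv, stored, 13, stored.length - 13);
    }

    private static byte[] crypt(int mode, byte[] key, byte[] iv, byte[] in, int off, int len) {
        try {
            Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
            c.init(mode, new SecretKeySpec(key, "AES"), new GCMParameterSpec(128, iv));
            return c.doFinal(in, off, len);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

After `rotate`, pages written earlier still decrypt because their stored id selects the old key, while every new write carries the new id. Calling `remove` for the old id models the automatic deletion mentioned above.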
>>>>>>> 
>>>>>>>> Will a node continue to re-encrypt the data after it restarts?
>>>>>>> Yes.
>>>>>>> 
>>>>>>>> If a node goes down during the re-encryption, but the rest of the
>>>>>>>> cluster finishes re-encryption, will we consider the procedure complete?
>>>>>>> I'm not sure, but it looks like the key rotation is complete once the
>>>>>>> new key is set on all nodes, so that updates are encrypted with the new
>>>>>>> key (as required by PCI DSS).
>>>>>>> The status of re-encryption can be obtained separately (locally or
>>>>>>> cluster-wide).
>>>>>>> 
>>>>>>> I forgot to mention that with “in place” re-encryption it will be
>>>>>>> impossible to cancel re-encryption quickly, because canceling means
>>>>>>> re-encrypting the data with the old key again.
>>>>>>> 
>>>>>>>> How do you see the whole key rotation procedure will work?
>>>>>>> The initial design for re-encryption with "partition copying" is described
>>>>>>> here [1]. I'll prepare a detailed design for "in place" re-encryption if
>>>>>>> we go this way. In short: send the new encryption key cluster-wide;
>>>>>>> each node adds the new key and starts background re-encryption.
>>>>>>> 
>>>>>>> [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Copywithre-encryptiondesign
>>>>>>> 
>>>>>>> Sun, 17 May 2020 at 18:35, Alexey Goncharuk <[email protected]>:
>>>>>>>> 
>>>>>>>> Pavel, Anton,
>>>>>>>> 
>>>>>>>> How do you see the whole key rotation procedure will work? Clearly, during
>>>>>>>> the re-encryption there will exist pages encrypted with both new and old
>>>>>>>> keys at the same time. Will a node continue to re-encrypt the data after it
>>>>>>>> restarts? If a node goes down during the re-encryption, but the rest of the
>>>>>>>> cluster finishes re-encryption, will we consider the procedure complete? By
>>>>>>>> the way, is the encryption key for the data the same on all nodes in the
>>>>>>>> cluster?
>>>>>>>> 
>>>>>>>> Thu, 14 May 2020 at 11:30, Anton Vinogradov <[email protected]>:
>>>>>>>> 
>>>>>>>>> +1 to "In place re-encryption".
>>>>>>>>> 
>>>>>>>>> - It has a simple design.
>>>>>>>>> - In clusters under load, the load itself may be enough to re-encrypt
>>>>>>>>> the data (load-friendly).
>>>>>>>>> - Easy to throttle.
>>>>>>>>> - Easy to continue.
>>>>>>>>> - Design compatible with the multi-key architecture.
>>>>>>>>> - It can be optimized to use its own WAL buffer and to re-encrypt pages
>>>>>>>>> without restoring them on-heap.
>>>>>>>>> 
>>>>>>>>> On Thu, May 14, 2020 at 1:54 AM Pavel Pereslegin <[email protected]> wrote:
>>>>>>>>> 
>>>>>>>>>> Hello Igniters.
>>>>>>>>>> 
>>>>>>>>>> Recently, master key rotation for Apache Ignite Transparent Data
>>>>>>>>>> Encryption was implemented [1], but some security standards (PCI DSS
>>>>>>>>>> at least) require rotation of all encryption keys [2]. Currently,
>>>>>>>>>> encryption occurs when pages are read from or written to disk, and cache
>>>>>>>>>> encryption keys are stored in the metastore.
>>>>>>>>>> 
>>>>>>>>>> I'm going to contribute cache encryption key rotation and would like
>>>>>>>>>> advice on the best way to re-encrypt existing data. I see two
>>>>>>>>>> different strategies.
>>>>>>>>>> 
>>>>>>>>>> 1. In place re-encryption:
>>>>>>>>>> Using the old key, sequentially read all the pages from the datastore,
>>>>>>>>>> mark them as dirty and log them into the WAL. After a checkpoint, the
>>>>>>>>>> pages will be stored to disk encrypted with the new key (as usual, along
>>>>>>>>>> with updates). This strategy requires storing the identifier (number) of
>>>>>>>>>> the encryption key in the encrypted page.
>>>>>>>>>> pros:
>>>>>>>>>> - can work in the background with minimal performance impact (this
>>>>>>>>>> impact can be managed).
>>>>>>>>>> cons:
>>>>>>>>>> - page duplication in the WAL may affect performance and historical
>>>>>>>>>> rebalance.
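
The control flow of strategy 1 can be modelled with a toy scan that only tracks which key id each page is stored with. This is a sketch under loud assumptions: `InPlaceScan` and all of its methods are hypothetical, there is no real encryption, WAL or checkpointer here, and the batch size stands in for whatever throttling a real implementation would use.

```java
/** Toy model of the in-place scan; pages carry only a key id, with no real crypto or WAL. */
final class InPlaceScan {
    private final int[] pageKeyId;  // key id each page is currently stored with on disk
    private final boolean[] dirty;  // pages waiting to be written by the next checkpoint
    private final int newKeyId;     // the single key used for writing
    private int cursor;             // next page to visit; persisting it allows resuming

    InPlaceScan(int[] pageKeyId, int newKeyId, int resumeFrom) {
        this.pageKeyId = pageKeyId.clone();
        this.dirty = new boolean[pageKeyId.length];
        this.newKeyId = newKeyId;
        this.cursor = resumeFrom;
    }

    /** Visits at most maxPages pages (the throttle) and marks old-key pages dirty. */
    int scanBatch(int maxPages) {
        int marked = 0;
        for (int seen = 0; seen < maxPages && cursor < pageKeyId.length; seen++, cursor++) {
            if (pageKeyId[cursor] != newKeyId && !dirty[cursor]) {
                dirty[cursor] = true; // a real node would also log the page to the WAL here
                marked++;
            }
        }
        return marked;
    }

    /** A regular update dirties the page too, so load alone moves pages to the new key. */
    void update(int page) {
        dirty[page] = true;
    }

    /** Checkpoint: every dirty page is written to disk with the new key. */
    int checkpoint() {
        int written = 0;
        for (int i = 0; i < dirty.length; i++) {
            if (dirty[i]) {
                pageKeyId[i] = newKeyId;
                dirty[i] = false;
                written++;
            }
        }
        return written;
    }

    int cursor() {
        return cursor;
    }

    /** True once the scan reached the end and no dirty pages are pending. */
    boolean finished() {
        if (cursor < pageKeyId.length) return false;
        for (boolean d : dirty) {
            if (d) return false;
        }
        return true;
    }
}
```

Calling `scanBatch` with a small `maxPages` between checkpoints keeps the background impact bounded, and restarting from a saved `cursor` is what lets a node continue after a restart.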
>>>>>>>>>> 
>>>>>>>>>> 2. Copy partition with re-encryption.
>>>>>>>>>> This strategy is similar to partition snapshotting [3]: create a
>>>>>>>>>> partition copy encrypted with the new key and then replace the
>>>>>>>>>> original partition file with the new one (see details in [4]).
>>>>>>>>>> pros:
>>>>>>>>>> - should work faster than "in place" re-encryption.
>>>>>>>>>> cons:
>>>>>>>>>> - re-encryption in an active cluster (and on an unstable topology) can be
>>>>>>>>>> difficult to implement.
>>>>>>>>>> 
>>>>>>>>>> (See more detailed comparison [5])
>>>>>>>>>> 
>>>>>>>>>> Re-encryption of existing data is a long and rare procedure (it is
>>>>>>>>>> recommended to change the key every 6 months, but at least once every
>>>>>>>>>> 2 years). Thus, re-encryption can be implemented for maintenance mode
>>>>>>>>>> (for example, on a stable topology in a read-only cluster), and in
>>>>>>>>>> that case the approach with partition copying seems simpler and faster.
>>>>>>>>>> 
>>>>>>>>>> So, what do you think: do we need "online" re-encryption, and which of
>>>>>>>>>> the proposed options is best suited for this?
>>>>>>>>>> 
>>>>>>>>>> [1] https://issues.apache.org/jira/browse/IGNITE-12186
>>>>>>>>>> [2] https://www.pcisecuritystandards.org/documents/PCI_DSS_v3-2-1.pdf
>>>>>>>>>> [3] https://cwiki.apache.org/confluence/display/IGNITE/IEP-43%3A+Cluster+snapshots#IEP-43:Clustersnapshots-Partitionscopystrategy
>>>>>>>>>> [4] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Copywithre-encryptiondesign
>>>>>>>>>> [5] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Comparison
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>>> 
>>>>> --
>>>>> 
>>>>> Best regards,
>>>>> Alexei Scherbakov
>>>> 
>>>> 
>>> 
>>> --
>>> 
>>> Best regards,
>>> Alexei Scherbakov
>>> 
>> 
>> 
>> --
>> 
>> Best regards,
>> Alexei Scherbakov
>> 
> 
> 
> -- 
> 
> Best regards,
> Alexei Scherbakov
