Nikolay, Alexei, thanks for your suggestions.
Offline re-encryption does not seem that simple: we need to read and replace the existing encryption keys on all nodes, so we must be able to read/write the metastore and the WAL and exchange data between the baseline nodes. Re-encryption in maintenance mode (for example, in a stable read-only cluster) would be simple, but it still looks very inconvenient, at least because users would need to interrupt all operations.

The main advantage of online "in place" re-encryption is that we'll support multiple keys for reading, and the rotation procedure does not directly depend on background re-encryption. So the first step is similar to rotating the master key: once the new key is set for writing on all nodes, that's it, the cache group key rotation is complete (this is what PCI DSS requires: new updates are encrypted with the new key).

The second step is to re-encrypt the existing data. As I said previously, I thought about scanning all partition pages in a background mode (storing progress on the metapage to continue after a restart), but the rebalance approach should also work here if I figure out how to automate the process.

On Mon, May 25, 2020 at 12:22, Alexei Scherbakov <alexey.scherbak...@gmail.com> wrote:
>
> On Mon, May 25, 2020 at 12:00, Nikolay Izhikov <nizhi...@apache.org> wrote:
>>
>> > This will take us to the re-encryption using full rebalancing
>>
>> Rebalance will require 2x efforts for reencryption
>>
>> 1. Read and send data from supplier node.
>> 2. Reencrypt and write data on demander node.
>>
>> Instead of
>>
>> 1. Read, reencrypt and write data on «demander» node.
>
> Usually, reading and sending is not a bottleneck. And don't forget we can run
> out of WAL history and fall back to full rebalancing with partition eviction
> eliminating all efforts from offline re-encryption.
>
> On the other side, for a grid having many nodes one-by-one re-encryption can
> take a long time.
> It should also be possible to re-encrypt all data as fast as possible if, for
> example, a load can be switched to another grid, where offline encryption
> will come in handy.
>
> So, I suggest implementing offline re-encryption and online re-encryption
> using rebalancing as a first step.
>
> Next step can be online in-place re-encryption. It's important to measure
> business impact from it on online grid.
>
>> > On May 25, 2020, at 11:46, Alexei Scherbakov <alexey.scherbak...@gmail.com> wrote:
>> >
>> > For me, the one big disadvantage for offline re-encryption is the
>> > possibility to run out of WAL history.
>> > If a re-encryption takes a long time we will get full rebalancing with
>> > partition eviction.
>> > This will take us to the re-encryption using full rebalancing, proposed
>> > by me earlier.
>> >
>> > On Mon, May 25, 2020 at 11:27, Nikolay Izhikov <nizhi...@apache.org> wrote:
>> >
>> >>> And definitely this approach is much simpler to implement
>> >>
>> >> I agree.
>> >>
>> >> If we allow nodes to be taken offline for reencryption then we can implement a
>> >> fully offline procedure:
>> >>
>> >> 1. Stop node.
>> >> 2. Execute some control.sh command that will reencrypt all data without
>> >> starting node
>> >> 3. Start node.
>> >>
>> >> Pavel, can you, please, write it one more time - what are the disadvantages of
>> >> the offline procedure?
>> >>
>> >>> On May 25, 2020, at 11:20, Alexei Scherbakov <alexey.scherbak...@gmail.com> wrote:
>> >>>
>> >>> And definitely this approach is much simpler to implement because all
>> >>> corner cases are handled by rebalancing code.
>> >>>
>> >>> On Mon, May 25, 2020 at 11:16, Alexei Scherbakov <alexey.scherbak...@gmail.com> wrote:
>> >>>
>> >>>> I mean: serving supply requests.
>> >>>>
>> >>>> On Mon, May 25, 2020 at 11:15, Alexei Scherbakov <alexey.scherbak...@gmail.com> wrote:
>> >>>>
>> >>>>> Nikolay,
>> >>>>>
>> >>>>> Can you explain why such restriction is necessary ?
>> >>>>> Most likely having a currently re-encrypting node serving only demand
>> >>>>> requests will have least performance impact on a grid.
>> >>>>>
>> >>>>> On Mon, May 25, 2020 at 11:08, Nikolay Izhikov <nizhi...@apache.org> wrote:
>> >>>>>
>> >>>>>> Hello, Alexei.
>> >>>>>>
>> >>>>>> I think we want to implement this feature without nodes restart.
>> >>>>>> In the ideal scenario all nodes will stay alive and respond to the user
>> >>>>>> requests.
>> >>>>>>
>> >>>>>>> On May 24, 2020, at 15:24, Alexei Scherbakov <alexey.scherbak...@gmail.com> wrote:
>> >>>>>>>
>> >>>>>>> Pavel Pereslegin,
>> >>>>>>>
>> >>>>>>> I see another opportunity.
>> >>>>>>> We can use rebalancing to re-encrypt node data with a new key.
>> >>>>>>> It's a trivial procedure for me: stop a node, clear database, change a key,
>> >>>>>>> start node and wait for rebalancing to complete.
>> >>>>>>> Data will be re-encrypted during rebalancing.
>> >>>>>>>
>> >>>>>>> Did I miss something ?
>> >>>>>>>
>> >>>>>>> On Fri, May 22, 2020 at 16:14, Ivan Rakov <ivan.glu...@gmail.com> wrote:
>> >>>>>>>
>> >>>>>>>> Folks,
>> >>>>>>>>
>> >>>>>>>> Just keeping you informed: I and my colleagues are highly interested in TDE
>> >>>>>>>> in general and keys rotations specifically, but we don't have enough time
>> >>>>>>>> so far.
>> >>>>>>>> We'll dive into this feature and participate in reviews next month.
>> >>>>>>>>
>> >>>>>>>> --
>> >>>>>>>> Best Regards,
>> >>>>>>>> Ivan Rakov
>> >>>>>>>>
>> >>>>>>>> On Sun, May 17, 2020 at 10:51 PM Pavel Pereslegin <xxt...@gmail.com> wrote:
>> >>>>>>>>
>> >>>>>>>>> Hello, Alexey.
>> >>>>>>>>>
>> >>>>>>>>>> is the encryption key for the data the same on all nodes in the cluster?
>> >>>>>>>>> Yes, each encrypted cache group has its own encryption key, the key is
>> >>>>>>>>> the same on all nodes.
>> >>>>>>>>>
>> >>>>>>>>>> Clearly, during the re-encryption there will exist pages
>> >>>>>>>>>> encrypted with both new and old keys at the same time.
>> >>>>>>>>> Yes, there will be pages encrypted with different keys at the same time.
>> >>>>>>>>> Currently, we only store one key for one cache group. To rotate a key,
>> >>>>>>>>> at a certain point in time it is necessary to support several keys (at
>> >>>>>>>>> least for reading the WAL).
>> >>>>>>>>> For the "in place" strategy, we'll store the encryption key identifier
>> >>>>>>>>> on each encrypted page (we currently have some unused space on
>> >>>>>>>>> encrypted page, so I don't expect any memory overhead here). Thus, we
>> >>>>>>>>> will have several keys for reading and one key for writing. I assume
>> >>>>>>>>> that the old key will be automatically deleted when a specific WAL
>> >>>>>>>>> segment is deleted (and re-encryption is finished).
>> >>>>>>>>>
>> >>>>>>>>>> Will a node continue to re-encrypt the data after it restarts?
>> >>>>>>>>> Yes.
>> >>>>>>>>>
>> >>>>>>>>>> If a node goes down during the re-encryption, but the rest of the
>> >>>>>>>>>> cluster finishes re-encryption, will we consider the procedure complete?
>> >>>>>>>>> I'm not sure, but it looks like the key rotation is complete when we
>> >>>>>>>>> set the new key on all nodes so that the updates will be encrypted
>> >>>>>>>>> with the new key (as required by PCI DSS).
>> >>>>>>>>> Status of re-encryption can be obtained separately (locally or cluster
>> >>>>>>>>> wide).
>> >>>>>>>>>
>> >>>>>>>>> I forgot to mention that with "in place" re-encryption it will be
>> >>>>>>>>> impossible to quickly cancel re-encryption, because by canceling we
>> >>>>>>>>> mean re-encryption with the old key.
>> >>>>>>>>>
>> >>>>>>>>>> How do you see the whole key rotation procedure will work?
>> >>>>>>>>> Initial design for re-encryption with "partition copying" is described
>> >>>>>>>>> here [1]. I'll prepare detailed design for "in place" re-encryption if
>> >>>>>>>>> we'll go this way. In short, send the new encryption key cluster-wide,
>> >>>>>>>>> each node adds a new key and starts background re-encryption.
>> >>>>>>>>>
>> >>>>>>>>> [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Copywithre-encryptiondesign
>> >>>>>>>>>
>> >>>>>>>>> On Sun, May 17, 2020 at 18:35, Alexey Goncharuk <alexey.goncha...@gmail.com> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>> Pavel, Anton,
>> >>>>>>>>>>
>> >>>>>>>>>> How do you see the whole key rotation procedure will work? Clearly, during
>> >>>>>>>>>> the re-encryption there will exist pages encrypted with both new and old
>> >>>>>>>>>> keys at the same time. Will a node continue to re-encrypt the data after it
>> >>>>>>>>>> restarts? If a node goes down during the re-encryption, but the rest of the
>> >>>>>>>>>> cluster finishes re-encryption, will we consider the procedure complete? By
>> >>>>>>>>>> the way, is the encryption key for the data the same on all nodes in the
>> >>>>>>>>>> cluster?
>> >>>>>>>>>>
>> >>>>>>>>>> On Thu, May 14, 2020 at 11:30, Anton Vinogradov <a...@apache.org> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>>> +1 to "In place re-encryption".
>> >>>>>>>>>>>
>> >>>>>>>>>>> - It has a simple design.
>> >>>>>>>>>>> - Clusters under load may require just load to re-encrypt the data.
>> >>>>>>>>>>> (Friendly to load).
>> >>>>>>>>>>> - Easy to throttle.
>> >>>>>>>>>>> - Easy to continue.
>> >>>>>>>>>>> - Design compatible with the multi-key architecture.
>> >>>>>>>>>>> - It can be optimized to use own WAL buffer and to re-encrypt pages without
>> >>>>>>>>>>> restoring them to on-heap.
>> >>>>>>>>>>>
>> >>>>>>>>>>> On Thu, May 14, 2020 at 1:54 AM Pavel Pereslegin <xxt...@gmail.com> wrote:
>> >>>>>>>>>>>
>> >>>>>>>>>>>> Hello Igniters.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Recently, master key rotation for Apache Ignite Transparent Data
>> >>>>>>>>>>>> Encryption was implemented [1], but some security standards (PCI DSS
>> >>>>>>>>>>>> at least) require rotation of all encryption keys [2]. Currently,
>> >>>>>>>>>>>> encryption occurs when reading/writing pages to disk, cache encryption
>> >>>>>>>>>>>> keys are stored in metastore.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> I'm going to contribute cache encryption key rotation and want to
>> >>>>>>>>>>>> consult on the best way to re-encrypt existing data. I see two
>> >>>>>>>>>>>> different strategies.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> 1. In place re-encryption:
>> >>>>>>>>>>>> Using the old key, sequentially read all the pages from the datastore,
>> >>>>>>>>>>>> mark as dirty and log them into the WAL. After checkpoint pages will
>> >>>>>>>>>>>> be stored to disk encrypted with the new key (as usual, along with
>> >>>>>>>>>>>> updates). This strategy requires storing the identifier (number) of the
>> >>>>>>>>>>>> encryption key in the encrypted page.
>> >>>>>>>>>>>> pros:
>> >>>>>>>>>>>> - can work in the background with minimal performance impact (this
>> >>>>>>>>>>>> impact can be managed).
>> >>>>>>>>>>>> cons:
>> >>>>>>>>>>>> - page duplication in the WAL may affect performance and historical
>> >>>>>>>>>>>> rebalance.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> 2. Copy partition with re-encryption.
>> >>>>>>>>>>>> This strategy is similar to partition snapshotting [3] - create
>> >>>>>>>>>>>> partition copy encrypted with the new key and then replace the
>> >>>>>>>>>>>> original partition file with the new one (see details [4]).
>> >>>>>>>>>>>> pros:
>> >>>>>>>>>>>> - should work faster than "in place" re-encryption.
>> >>>>>>>>>>>> cons:
>> >>>>>>>>>>>> - re-encryption in active cluster (and on unstable topology) can be
>> >>>>>>>>>>>> difficult to implement.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> (See more detailed comparison [5])
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Re-encryption of existing data is a long and rare procedure (It is
>> >>>>>>>>>>>> recommended to change the key every 6 months, but at least once every
>> >>>>>>>>>>>> 2 years). Thus, re-encryption can be implemented for maintenance mode
>> >>>>>>>>>>>> (for example, on a stable topology in a read-only cluster) and in such
>> >>>>>>>>>>>> case the approach with partition copying seems simpler and faster.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> So, what do you think - do we need "online" re-encryption and which of
>> >>>>>>>>>>>> the proposed options is best suited for this?
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> [1] https://issues.apache.org/jira/browse/IGNITE-12186
>> >>>>>>>>>>>> [2] https://www.pcisecuritystandards.org/documents/PCI_DSS_v3-2-1.pdf
>> >>>>>>>>>>>> [3] https://cwiki.apache.org/confluence/display/IGNITE/IEP-43%3A+Cluster+snapshots#IEP-43:Clustersnapshots-Partitionscopystrategy
>> >>>>>>>>>>>> [4] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Copywithre-encryptiondesign
>> >>>>>>>>>>>> [5] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Comparison
>> >>>>>>>
>> >>>>>>> --
>> >>>>>>> Best regards,
>> >>>>>>> Alexei Scherbakov
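
P.S. To make the two steps from the top of this message more concrete, here is a minimal Python sketch. It is not Ignite code: `CacheGroupKeys`, `Page`, `reencrypt`, the `progress` dict (standing in for the metapage) and the XOR `toy_cipher` (standing in for a real cipher) are all hypothetical names invented for illustration, and the WAL is ignored entirely. It only shows the idea of "several keys for reading, one key for writing" plus a resumable background scan.

```python
from dataclasses import dataclass, field


def toy_cipher(data: bytes, key: bytes) -> bytes:
    """XOR stand-in for a real cipher (symmetric, NOT secure)."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


@dataclass
class Page:
    key_id: int     # identifier of the key this page was encrypted with
    payload: bytes  # encrypted bytes


@dataclass
class CacheGroupKeys:
    keys: dict = field(default_factory=dict)  # key_id -> key bytes (readable keys)
    write_id: int = 0                         # the single key used for writing

    def rotate(self, new_id: int, new_key: bytes) -> None:
        """Step 1: add the new key and switch writes to it; old keys stay readable."""
        self.keys[new_id] = new_key
        self.write_id = new_id

    def encrypt(self, plain: bytes) -> Page:
        return Page(self.write_id, toy_cipher(plain, self.keys[self.write_id]))

    def decrypt(self, page: Page) -> bytes:
        # The key id stored on the page selects the key for reading.
        return toy_cipher(page.payload, self.keys[page.key_id])


def reencrypt(pages: list, keys: CacheGroupKeys, progress: dict, budget: int) -> bool:
    """Step 2: re-encrypt at most `budget` pages, resuming from progress['next'].

    Returns True once every page has been visited.
    """
    start = progress["next"]
    end = min(start + budget, len(pages))
    for i in range(start, end):
        if pages[i].key_id != keys.write_id:
            pages[i] = keys.encrypt(keys.decrypt(pages[i]))
        progress["next"] = i + 1  # assumption: persisted so a restart can continue
    return progress["next"] == len(pages)


# Hypothetical walk-through.
keys = CacheGroupKeys()
keys.rotate(1, b"old-key")
pages = [keys.encrypt(f"row-{i}".encode()) for i in range(5)]

keys.rotate(2, b"new-key")            # rotation is complete at this point
pages.append(keys.encrypt(b"row-5"))  # a new update already uses key 2

progress = {"next": 0}
done = reencrypt(pages, keys, progress, budget=3)   # partial pass, then "restart"
done = reencrypt(pages, keys, progress, budget=10)  # resumes where it stopped
if done:
    del keys.keys[1]                  # old key dropped once nothing needs it
```

The `budget` parameter is only a crude stand-in for throttling. In the real design the old key would also have to stay around until the WAL segments that reference it are gone, which this sketch does not model.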