Hi! Are there any updates on this issue? Which implementation approach have you chosen (in-place, offline, or background)? Since we also plan to add a partition defragmentation function, we could add a hook there to integrate the re-encryption scheme. Could you update your IEP with your plans?

Mon, May 25, 2020 at 12:50, Pavel Pereslegin <xxt...@gmail.com>:
> Nikolay, Alexei,
>
> thanks for your suggestions.
>
> Offline re-encryption does not seem so simple: we need to read/replace
> the existing encryption keys on all nodes (therefore, we should be able
> to read/write the metastore/WAL and exchange data between the baseline
> nodes). Re-encryption in maintenance mode (for example, in a stable
> read-only cluster) will be simple, but it still looks very inconvenient,
> at least because users will need to interrupt all operations.
>
> The main advantage of online "in place" re-encryption is that we'll
> support multiple keys for reading, and this procedure does not directly
> depend on background re-encryption.
>
> So, the first step is similar to rotating the master key: once the new
> key is set for writing on all nodes, that's it, the cache group key
> rotation is complete (this is what PCI DSS requires: encrypt new updates
> with new keys).
> The second step is to re-encrypt the existing data. As I said previously,
> I thought about scanning all partition pages in some background mode
> (storing progress on the metapage to continue after restart), but the
> rebalance approach should also work here if I figure out how to automate
> this process.
>
> Mon, May 25, 2020 at 12:22, Alexei Scherbakov <alexey.scherbak...@gmail.com>:
> >
> > Mon, May 25, 2020 at 12:00, Nikolay Izhikov <nizhi...@apache.org>:
> >>
> >> > This will take us to the re-encryption using full rebalancing
> >>
> >> Rebalance will require 2x efforts for re-encryption
> >>
> >> 1. Read and send data from supplier node.
> >> 2. Reencrypt and write data on demander node.
> >>
> >> Instead of
> >>
> >> 1. Read, reencrypt and write data on «demander» node.
> >
> > Usually, reading and sending is not a bottleneck. And don't forget we
> > can run out of WAL history and fall back to full rebalancing with
> > partition eviction, eliminating all efforts from offline re-encryption.
> >
> > On the other side, for a grid having many nodes one-by-one
> > re-encryption can take a long time.
> > It should also be possible to re-encrypt all data as fast as possible
> > if, for example, the load can be switched to another grid, where
> > offline encryption will come in handy.
> >
> > So, I suggest implementing offline re-encryption and online
> > re-encryption using rebalancing as a first step.
> >
> > The next step can be online in-place re-encryption. It's important to
> > measure its business impact on an online grid.
> >
> >>
> >> > On May 25, 2020, at 11:46, Alexei Scherbakov <alexey.scherbak...@gmail.com> wrote:
> >> >
> >> > For me, the one big disadvantage of offline re-encryption is the
> >> > possibility of running out of WAL history.
> >> > If a re-encryption takes a long time, we will get full rebalancing
> >> > with partition eviction.
> >> > This will take us to the re-encryption using full rebalancing,
> >> > proposed by me earlier.
> >> >
> >> > Mon, May 25, 2020 at 11:27, Nikolay Izhikov <nizhi...@apache.org>:
> >> >
> >> >>> And definitely this approach is much simpler to implement
> >> >>
> >> >> I agree.
> >> >>
> >> >> If we allow taking nodes offline for re-encryption then we can
> >> >> implement a fully offline procedure:
> >> >>
> >> >> 1. Stop node.
> >> >> 2. Execute some control.sh command that will reencrypt all data
> >> >>    without starting node
> >> >> 3. Start node.
> >> >>
> >> >> Pavel, can you please write one more time: what are the
> >> >> disadvantages of the offline procedure?
> >> >>
> >> >>> On May 25, 2020, at 11:20, Alexei Scherbakov <alexey.scherbak...@gmail.com> wrote:
> >> >>>
> >> >>> And definitely this approach is much simpler to implement because
> >> >>> all corner cases are handled by rebalancing code.
> >> >>>
> >> >>> Mon, May 25, 2020 at 11:16, Alexei Scherbakov <alexey.scherbak...@gmail.com>:
> >> >>>
> >> >>>> I mean: serving supply requests.
> >> >>>>
> >> >>>> Mon, May 25, 2020 at 11:15, Alexei Scherbakov <alexey.scherbak...@gmail.com>:
> >> >>>>
> >> >>>>> Nikolay,
> >> >>>>>
> >> >>>>> Can you explain why such a restriction is necessary?
> >> >>>>> Most likely having a currently re-encrypting node serving only
> >> >>>>> demand requests will have the least performance impact on a grid.
> >> >>>>>
> >> >>>>> Mon, May 25, 2020 at 11:08, Nikolay Izhikov <nizhi...@apache.org>:
> >> >>>>>
> >> >>>>>> Hello, Alexei.
> >> >>>>>>
> >> >>>>>> I think we want to implement this feature without node restarts.
> >> >>>>>> In the ideal scenario all nodes will stay alive and respond to
> >> >>>>>> user requests.
> >> >>>>>>
> >> >>>>>>> On May 24, 2020, at 15:24, Alexei Scherbakov <alexey.scherbak...@gmail.com> wrote:
> >> >>>>>>>
> >> >>>>>>> Pavel Pereslegin,
> >> >>>>>>>
> >> >>>>>>> I see another opportunity.
> >> >>>>>>> We can use rebalancing to re-encrypt node data with a new key.
> >> >>>>>>> It's a trivial procedure for me: stop a node, clear the
> >> >>>>>>> database, change the key, start the node and wait for
> >> >>>>>>> rebalancing to complete.
> >> >>>>>>> Data will be re-encrypted during rebalancing.
> >> >>>>>>>
> >> >>>>>>> Did I miss something?
> >> >>>>>>>
> >> >>>>>>> Fri, May 22, 2020 at 16:14, Ivan Rakov <ivan.glu...@gmail.com>:
> >> >>>>>>>
> >> >>>>>>>> Folks,
> >> >>>>>>>>
> >> >>>>>>>> Just keeping you informed: my colleagues and I are highly
> >> >>>>>>>> interested in TDE in general and key rotation specifically,
> >> >>>>>>>> but we haven't had enough time so far.
> >> >>>>>>>> We'll dive into this feature and participate in reviews next
> >> >>>>>>>> month.
> >> >>>>>>>>
> >> >>>>>>>> --
> >> >>>>>>>> Best Regards,
> >> >>>>>>>> Ivan Rakov
> >> >>>>>>>>
> >> >>>>>>>> On Sun, May 17, 2020 at 10:51 PM Pavel Pereslegin <xxt...@gmail.com> wrote:
> >> >>>>>>>>
> >> >>>>>>>>> Hello, Alexey.
> >> >>>>>>>>>
> >> >>>>>>>>>> is the encryption key for the data the same on all nodes in
> >> >>>>>>>>>> the cluster?
> >> >>>>>>>>> Yes, each encrypted cache group has its own encryption key,
> >> >>>>>>>>> and the key is the same on all nodes.
> >> >>>>>>>>>
> >> >>>>>>>>>> Clearly, during the re-encryption there will exist pages
> >> >>>>>>>>>> encrypted with both new and old keys at the same time.
> >> >>>>>>>>> Yes, there will be pages encrypted with different keys at the
> >> >>>>>>>>> same time. Currently, we only store one key for one cache
> >> >>>>>>>>> group. To rotate a key, at a certain point in time it is
> >> >>>>>>>>> necessary to support several keys (at least for reading the
> >> >>>>>>>>> WAL).
> >> >>>>>>>>> For the "in place" strategy, we'll store the encryption key
> >> >>>>>>>>> identifier on each encrypted page (we currently have some
> >> >>>>>>>>> unused space on an encrypted page, so I don't expect any
> >> >>>>>>>>> memory overhead here). Thus, we will have several keys for
> >> >>>>>>>>> reading and one key for writing. I assume that the old key
> >> >>>>>>>>> will be automatically deleted when a specific WAL segment is
> >> >>>>>>>>> deleted (and re-encryption is finished).
> >> >>>>>>>>>
> >> >>>>>>>>>> Will a node continue to re-encrypt the data after it
> >> >>>>>>>>>> restarts?
> >> >>>>>>>>> Yes.
> >> >>>>>>>>>
> >> >>>>>>>>>> If a node goes down during the re-encryption, but the rest
> >> >>>>>>>>>> of the cluster finishes re-encryption, will we consider the
> >> >>>>>>>>>> procedure complete?
> >> >>>>>>>>> I'm not sure, but it looks like the key rotation is complete
> >> >>>>>>>>> when we set the new key on all nodes so that the updates will
> >> >>>>>>>>> be encrypted with the new key (as required by PCI DSS).
> >> >>>>>>>>> The status of re-encryption can be obtained separately
> >> >>>>>>>>> (locally or cluster-wide).
> >> >>>>>>>>>
> >> >>>>>>>>> I forgot to mention that with "in place" re-encryption it will
> >> >>>>>>>>> be impossible to quickly cancel re-encryption, because by
> >> >>>>>>>>> canceling we mean re-encryption with the old key.
> >> >>>>>>>>>
> >> >>>>>>>>>> How do you see the whole key rotation procedure will work?
> >> >>>>>>>>> The initial design for re-encryption with "partition copying"
> >> >>>>>>>>> is described here [1]. I'll prepare a detailed design for
> >> >>>>>>>>> "in place" re-encryption if we go this way. In short: send
> >> >>>>>>>>> the new encryption key cluster-wide, each node adds the new
> >> >>>>>>>>> key and starts background re-encryption.
> >> >>>>>>>>>
> >> >>>>>>>>> [1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Copywithre-encryptiondesign
> >> >>>>>>>>>
> >> >>>>>>>>> Sun, May 17, 2020 at 18:35, Alexey Goncharuk <alexey.goncha...@gmail.com>:
> >> >>>>>>>>>>
> >> >>>>>>>>>> Pavel, Anton,
> >> >>>>>>>>>>
> >> >>>>>>>>>> How do you see the whole key rotation procedure will work?
> >> >>>>>>>>>> Clearly, during the re-encryption there will exist pages
> >> >>>>>>>>>> encrypted with both new and old keys at the same time. Will
> >> >>>>>>>>>> a node continue to re-encrypt the data after it restarts?
> >> >>>>>>>>>> If a node goes down during the re-encryption, but the rest
> >> >>>>>>>>>> of the cluster finishes re-encryption, will we consider the
> >> >>>>>>>>>> procedure complete? By the way, is the encryption key for
> >> >>>>>>>>>> the data the same on all nodes in the cluster?
> >> >>>>>>>>>>
> >> >>>>>>>>>> Thu, May 14, 2020 at 11:30, Anton Vinogradov <a...@apache.org>:
> >> >>>>>>>>>>
> >> >>>>>>>>>>> +1 to "In place re-encryption".
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> - It has a simple design.
> >> >>>>>>>>>>> - Clusters under load may require just load to re-encrypt
> >> >>>>>>>>>>>   the data. (Friendly to load).
> >> >>>>>>>>>>> - Easy to throttle.
> >> >>>>>>>>>>> - Easy to continue.
> >> >>>>>>>>>>> - Design compatible with the multi-key architecture.
> >> >>>>>>>>>>> - It can be optimized to use its own WAL buffer and to
> >> >>>>>>>>>>>   re-encrypt pages without restoring them to on-heap.
> >> >>>>>>>>>>>
> >> >>>>>>>>>>> On Thu, May 14, 2020 at 1:54 AM Pavel Pereslegin <xxt...@gmail.com> wrote:
> >> >>>>>>>>>>>
> >> >>>>>>>>>>>> Hello Igniters.
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> Recently, master key rotation for Apache Ignite
> >> >>>>>>>>>>>> Transparent Data Encryption was implemented [1], but some
> >> >>>>>>>>>>>> security standards (PCI DSS at least) require rotation of
> >> >>>>>>>>>>>> all encryption keys [2]. Currently, encryption occurs
> >> >>>>>>>>>>>> when reading/writing pages to disk, and cache encryption
> >> >>>>>>>>>>>> keys are stored in the metastore.
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> I'm going to contribute cache encryption key rotation and
> >> >>>>>>>>>>>> want to ask what the best way to re-encrypt existing data
> >> >>>>>>>>>>>> is. I see two different strategies.
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> 1. In place re-encryption:
> >> >>>>>>>>>>>> Using the old key, sequentially read all the pages from
> >> >>>>>>>>>>>> the datastore, mark them as dirty and log them into the
> >> >>>>>>>>>>>> WAL. After a checkpoint, pages will be stored to disk
> >> >>>>>>>>>>>> encrypted with the new key (as usual, along with updates).
> >> >>>>>>>>>>>> This strategy requires storing the identifier (number) of
> >> >>>>>>>>>>>> the encryption key in the encrypted page.
> >> >>>>>>>>>>>> pros:
> >> >>>>>>>>>>>> - can work in the background with minimal performance
> >> >>>>>>>>>>>>   impact (this impact can be managed).
> >> >>>>>>>>>>>> cons:
> >> >>>>>>>>>>>> - page duplication in the WAL may affect performance and
> >> >>>>>>>>>>>>   historical rebalance.
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> 2. Copy partition with re-encryption.
> >> >>>>>>>>>>>> This strategy is similar to partition snapshotting [3]:
> >> >>>>>>>>>>>> create a partition copy encrypted with the new key and
> >> >>>>>>>>>>>> then replace the original partition file with the new one
> >> >>>>>>>>>>>> (see details [4]).
> >> >>>>>>>>>>>> pros:
> >> >>>>>>>>>>>> - should work faster than "in place" re-encryption.
> >> >>>>>>>>>>>> cons:
> >> >>>>>>>>>>>> - re-encryption in an active cluster (and on unstable
> >> >>>>>>>>>>>>   topology) can be difficult to implement.
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> (See more detailed comparison [5])
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> Re-encryption of existing data is a long and rare
> >> >>>>>>>>>>>> procedure (it is recommended to change the key every 6
> >> >>>>>>>>>>>> months, but at least once every 2 years). Thus,
> >> >>>>>>>>>>>> re-encryption can be implemented for maintenance mode
> >> >>>>>>>>>>>> (for example, on a stable topology in a read-only
> >> >>>>>>>>>>>> cluster), and in such a case the approach with partition
> >> >>>>>>>>>>>> copying seems simpler and faster.
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> So, what do you think - do we need "online" re-encryption
> >> >>>>>>>>>>>> and which of the proposed options is best suited for this?
> >> >>>>>>>>>>>>
> >> >>>>>>>>>>>> [1] https://issues.apache.org/jira/browse/IGNITE-12186
> >> >>>>>>>>>>>> [2] https://www.pcisecuritystandards.org/documents/PCI_DSS_v3-2-1.pdf
> >> >>>>>>>>>>>> [3] https://cwiki.apache.org/confluence/display/IGNITE/IEP-43%3A+Cluster+snapshots#IEP-43:Clustersnapshots-Partitionscopystrategy
> >> >>>>>>>>>>>> [4] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Copywithre-encryptiondesign
> >> >>>>>>>>>>>> [5] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Comparison
> >> >>>>>>>
> >> >>>>>>> --
> >> >>>>>>> Best regards,
> >> >>>>>>> Alexei Scherbakov
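
For readers following the thread, the two-step "in place" scheme discussed above (several keys for reading, one key for writing, plus a resumable background scan) could be sketched roughly as follows. This is only an illustration, not Ignite code: `toy_cipher`, `Page`, `CacheGroupKeys` and `reencrypt_batch` are hypothetical names, the XOR "cipher" is a stand-in for real encryption, and the returned index plays the role of the progress that the thread proposes storing on the metapage.

```python
from dataclasses import dataclass, field


def toy_cipher(data: bytes, key: bytes) -> bytes:
    """XOR stand-in for a real cipher (NOT secure); applying it twice decrypts."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


@dataclass
class Page:
    key_id: int     # identifier of the key this page was encrypted with
    payload: bytes  # encrypted page body


@dataclass
class CacheGroupKeys:
    keys: dict = field(default_factory=dict)  # key_id -> key bytes, all readable
    write_key_id: int = 0                     # the single key used for writing

    def rotate(self, new_id: int, new_key: bytes) -> None:
        # Step 1: register the new key and switch writes to it.
        # Older keys stay available for reading.
        self.keys[new_id] = new_key
        self.write_key_id = new_id

    def encrypt(self, plain: bytes) -> Page:
        key = self.keys[self.write_key_id]
        return Page(self.write_key_id, toy_cipher(plain, key))

    def decrypt(self, page: Page) -> bytes:
        # The key id stored on the page selects the key for reading.
        return toy_cipher(page.payload, self.keys[page.key_id])


def reencrypt_batch(pages, keys, start, batch_size):
    """Step 2: re-encrypt one batch of pages; returns the index to resume from."""
    end = min(start + batch_size, len(pages))
    for i in range(start, end):
        if pages[i].key_id != keys.write_key_id:
            pages[i] = keys.encrypt(keys.decrypt(pages[i]))
    return end


# Usage: write pages with key 1, rotate to key 2, re-encrypt in small batches.
keys = CacheGroupKeys()
keys.rotate(1, b"old-key")
pages = [keys.encrypt(f"row-{i}".encode()) for i in range(5)]

keys.rotate(2, b"new-key")      # new writes now use key 2
progress = 0                    # would be persisted to survive a restart
while progress < len(pages):
    progress = reencrypt_batch(pages, keys, progress, batch_size=2)

del keys.keys[1]                # old key is dropped once nothing references it
```

Processing in small batches is what would make the scan easy to throttle and to continue after a restart, which are the properties the thread attributes to the in-place approach; the real procedure would additionally have to account for WAL segments that still need the old key.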