Hello everyone, yesterday we discussed the implementation of TDE on a
conference call. Here is a summary of that call:

   1. The wiki documentation should be expanded. It should describe, step
   by step, how the feature works under the hood. What are the domain
   objects in the implementation?
   2. We should try to run the existing test suites in encryption mode.
   Encryption should not affect any PDS or other tests.
   3. The SPI needs an additional method such as getKeyDigest, because the
   current implementation of GridEncryptionManager#masterKeyDigest() looks
   strange: we reset the master key to calculate the digest. This will not
   work well if we want to use VOLT as a key provider implementation.
   4. Recommendation - the encryption processor should be split into
   separate external classes using ordinary OOP decomposition. Right now,
   this class has more than 2000 lines and does not follow the SOLID
   principles. It reads like unrelated logic inlined into a single class.
   5. Recommendation - we should not use tuples and triples, because they
   are a sign of a design problem.
   6. Strict recommendation - please don't pass the context everywhere. It
   should only be used in the parent class. You can pass the necessary
   dependencies through the constructor, as in the DI pattern.
   7. Question - the current implementation does not use the throttling
   that is implemented in PDS. Users should set a throughput, such as 5 MB
   per second, rather than a timeout, packet size, or stream size.
   8. Question - why do we add so many system properties? Why didn't we add
   a configuration for them?
   9. Question - how can we optimize for the case where we can detect that
   a page has already been encrypted by parallel load? Maybe we should do
   this in Phase 4?
   10. Question - the CRC is read in two places, encryptionFileIO and
   filePageStore. What should we do about this?
   11. We should remember complicated failover test scenarios: a node
   left when encryption started and rejoined after it finished; the
   baseline changed during the process; a node left before, after, or in
   the middle of the process; and so on.
   12. How can we use a sandbox to protect the cluster against theft of
   the master and user keys via compute?
   13. Will re-encryption continue after the cluster is completely stopped?
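
As a rough illustration of points 3 and 6 only (this is not code from the
PR): the sketch below assumes a hypothetical KeyDigestProvider interface
whose getKeyDigest method computes the digest inside the key provider, so
the caller never has to reset the active master key, and a hypothetical
MasterKeyVerifier that receives the provider through its constructor
instead of reaching into a shared context. All class names and the choice
of SHA-256 are assumptions made for the example, not part of the Ignite API.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical SPI slice: the provider computes the digest itself. */
interface KeyDigestProvider {
    /** Returns a digest of the named master key without changing the active key. */
    byte[] getKeyDigest(String masterKeyName);
}

/** Toy in-memory provider; a real one might delegate to an external key store. */
final class InMemoryKeyProvider implements KeyDigestProvider {
    private final Map<String, byte[]> keys = new HashMap<>();

    /** Registers key material under a name (copy taken to avoid aliasing). */
    void addKey(String name, byte[] material) {
        keys.put(name, material.clone());
    }

    @Override
    public byte[] getKeyDigest(String masterKeyName) {
        byte[] material = keys.get(masterKeyName);
        if (material == null) {
            throw new IllegalArgumentException("Unknown master key: " + masterKeyName);
        }
        try {
            // SHA-256 is an assumption of this sketch, not the algorithm used in the PR.
            return MessageDigest.getInstance("SHA-256").digest(material);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 is unavailable", e);
        }
    }
}

/** Receives its dependency via the constructor instead of a shared context. */
final class MasterKeyVerifier {
    private final KeyDigestProvider provider;

    MasterKeyVerifier(KeyDigestProvider provider) {
        this.provider = provider;
    }

    /** Compares the provider's digest for the named key with the expected digest. */
    boolean matches(String masterKeyName, byte[] expectedDigest) {
        return MessageDigest.isEqual(provider.getKeyDigest(masterKeyName), expectedDigest);
    }
}
```

With this shape, a node could compare digests for any key name by calling
new MasterKeyVerifier(provider).matches(name, digest), and a test can pass
in a stub provider without constructing a kernal context.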

If I missed any points, please add them in a reply.


Tue, Jul 7, 2020 at 17:40, Pavel Pereslegin <xxt...@gmail.com>:

> Hello, Maksim.
>
> For the implementation, I chose the so-called "in place background
> re-encryption" design.
>
> The first step is to rotate the key used for writing data; at the moment
> this only works on an active cluster.
> The second step is re-encryption (to remove the previous encryption key).
> If a node is restarted, re-encryption resumes after the metastorage
> becomes ready for read/write. Each "re-encrypted" partition (including
> the index) has an attribute on the meta page that indicates whether
> background re-encryption should be continued.
>
> I updated the description in wiki [1].
> Some more details in jira [2].
> Draft PR [3].
>
> [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384
> [2] https://issues.apache.org/jira/browse/IGNITE-12843
> [3] https://github.com/apache/ignite/pull/7941
>
> Tue, Jul 7, 2020 at 13:49, Maksim Stepachev <maksim.stepac...@gmail.com>:
> >
> > Hi!
> >
> > Do you have any updates on this issue? Which type of implementation
> > have you chosen (in-place, offline, or in the background)? I know that we
> > want to add a partition defragmentation function; we could leave a place
> > in it to integrate the re-encryption scheme. Could you update your IEP
> > with your plans?
> >
> > Mon, May 25, 2020 at 12:50, Pavel Pereslegin <xxt...@gmail.com>:
> >
> > > Nikolay, Alexei,
> > >
> > > thanks for your suggestions.
> > >
> > > Offline re-encryption does not seem so simple, we need to read/replace
> > > the existing encryption keys on all nodes (therefore, we should be
> > > able to read/write metastore/WAL and exchange data between the
> > > baseline nodes). Re-encryption in maintenance mode (for example, in a
> > > stable read-only cluster) will be simple, but it still looks very
> > > inconvenient, at least because users will need to interrupt all
> > > operations.
> > >
> > > The main advantage of online "in place" re-encryption is that we'll
> > > support multiple keys for reading, and this procedure does not
> > > directly depend on background re-encryption.
> > >
> > > So, the first step is similar to rotating the master key: once the new
> > > key has been set for writing on all nodes, the cache group key
> > > rotation is complete (this is what PCI DSS requires - encrypt new
> > > updates with new keys).
> > > The second step is to re-encrypt the existing data. As I said
> > > previously, I thought about scanning all partition pages in some
> > > background mode (storing progress on the meta page to continue after
> > > a restart), but the rebalance approach should also work here if I
> > > figure out how to automate this process.
> > >
> > > Mon, May 25, 2020 at 12:22, Alexei Scherbakov <alexey.scherbak...@gmail.com>:
> > > >
> > > >
> > > >
> > > > Mon, May 25, 2020 at 12:00, Nikolay Izhikov <nizhi...@apache.org>:
> > > >>
> > > >> > This will take us to the re-encryption using full rebalancing
> > > >>
> > > >> Rebalancing will require twice the effort for re-encryption:
> > > >>
> > > >> 1. Read and send data from the supplier node.
> > > >> 2. Re-encrypt and write data on the demander node.
> > > >>
> > > >> Instead of
> > > >>
> > > >> 1. Read, re-encrypt and write data on the "demander" node.
> > > >
> > > >
> > > > > Usually, reading and sending is not a bottleneck. And don't forget
> > > > > that we can run out of WAL history and fall back to full rebalancing
> > > > > with partition eviction, negating all the effort spent on offline
> > > > > re-encryption.
> > > > >
> > > > > On the other hand, for a grid with many nodes, one-by-one
> > > > > re-encryption can take a long time.
> > > > > It should also be possible to re-encrypt all data as fast as possible
> > > > > if, for example, the load can be switched to another grid; this is
> > > > > where offline encryption will come in handy.
> > > > >
> > > > > So, I suggest implementing offline re-encryption and online
> > > > > re-encryption using rebalancing as a first step.
> > > > >
> > > > > The next step can be online in-place re-encryption. It's important to
> > > > > measure its business impact on an online grid.
> > > >
> > > >>
> > > >>
> > > >>
> > > >> > On May 25, 2020, at 11:46, Alexei Scherbakov <alexey.scherbak...@gmail.com> wrote:
> > > >> >
> > > >> > For me, the one big disadvantage of offline re-encryption is the
> > > >> > possibility of running out of WAL history.
> > > >> > If re-encryption takes a long time, we will get full rebalancing
> > > >> > with partition eviction.
> > > >> > This will take us to the re-encryption using full rebalancing,
> > > >> > proposed by me earlier.
> > > >> >
> > > >> >
> > > >> >
> > > >> > Mon, May 25, 2020 at 11:27, Nikolay Izhikov <nizhi...@apache.org>:
> > > >> >
> > > >> >>> And definitely this approach is much simpler to implement
> > > >> >>
> > > >> >> I agree.
> > > >> >>
> > > >> >> If we allow taking nodes offline for re-encryption, then we can
> > > >> >> implement a fully offline procedure:
> > > >> >>
> > > >> >> 1. Stop the node.
> > > >> >> 2. Execute some control.sh command that will re-encrypt all data
> > > >> >> without starting the node.
> > > >> >> 3. Start the node.
> > > >> >>
> > > >> >> Pavel, can you please write one more time what the disadvantages
> > > >> >> of the offline procedure are?
> > > >> >>
> > > >> >>> On May 25, 2020, at 11:20, Alexei Scherbakov <alexey.scherbak...@gmail.com> wrote:
> > > >> >>>
> > > >> >>> And definitely this approach is much simpler to implement
> > > >> >>> because all corner cases are handled by the rebalancing code.
> > > >> >>>
> > > >> >>> Mon, May 25, 2020 at 11:16, Alexei Scherbakov <alexey.scherbak...@gmail.com>:
> > > >> >>>
> > > >> >>>> I mean: serving supply requests.
> > > >> >>>>
> > > >> >>>> Mon, May 25, 2020 at 11:15, Alexei Scherbakov <alexey.scherbak...@gmail.com>:
> > > >> >>>>
> > > >> >>>>> Nikolay,
> > > >> >>>>>
> > > >> >>>>> Can you explain why such a restriction is necessary?
> > > >> >>>>> Most likely, having a currently re-encrypting node serve only
> > > >> >>>>> demand requests will have the least performance impact on a grid.
> > > >> >>>>>
> > > >> >>>>> Mon, May 25, 2020 at 11:08, Nikolay Izhikov <nizhi...@apache.org>:
> > > >> >>>>>
> > > >> >>>>>> Hello, Alexei.
> > > >> >>>>>>
> > > >> >>>>>> I think we want to implement this feature without node restarts.
> > > >> >>>>>> In the ideal scenario, all nodes will stay alive and respond to
> > > >> >>>>>> user requests.
> > > >> >>>>>>
> > > >> >>>>>>> On May 24, 2020, at 15:24, Alexei Scherbakov <alexey.scherbak...@gmail.com> wrote:
> > > >> >>>>>>>
> > > >> >>>>>>> Pavel Pereslegin,
> > > >> >>>>>>>
> > > >> >>>>>>> I see another opportunity.
> > > >> >>>>>>> We can use rebalancing to re-encrypt node data with a new key.
> > > >> >>>>>>> It's a trivial procedure for me: stop a node, clear the database,
> > > >> >>>>>>> change the key, start the node and wait for rebalancing to complete.
> > > >> >>>>>>> Data will be re-encrypted during rebalancing.
> > > >> >>>>>>>
> > > >> >>>>>>> Did I miss something?
> > > >> >>>>>>>
> > > >> >>>>>>> Fri, May 22, 2020 at 16:14, Ivan Rakov <ivan.glu...@gmail.com>:
> > > >> >>>>>>>
> > > >> >>>>>>>> Folks,
> > > >> >>>>>>>>
> > > >> >>>>>>>> Just keeping you informed: my colleagues and I are highly
> > > >> >>>>>>>> interested in TDE in general and key rotation specifically, but
> > > >> >>>>>>>> we haven't had enough time so far.
> > > >> >>>>>>>> We'll dive into this feature and participate in reviews next month.
> > > >> >>>>>>>>
> > > >> >>>>>>>> --
> > > >> >>>>>>>> Best Regards,
> > > >> >>>>>>>> Ivan Rakov
> > > >> >>>>>>>>
> > > >> >>>>>>>> On Sun, May 17, 2020 at 10:51 PM Pavel Pereslegin <xxt...@gmail.com> wrote:
> > > >> >>>>>>>>
> > > >> >>>>>>>>> Hello, Alexey.
> > > >> >>>>>>>>>
> > > >> >>>>>>>>>> is the encryption key for the data the same on all nodes in
> > > >> >>>>>>>>>> the cluster?
> > > >> >>>>>>>>> Yes, each encrypted cache group has its own encryption key; the
> > > >> >>>>>>>>> key is the same on all nodes.
> > > >> >>>>>>>>>
> > > >> >>>>>>>>>> Clearly, during the re-encryption there will exist pages
> > > >> >>>>>>>>>> encrypted with both new and old keys at the same time.
> > > >> >>>>>>>>> Yes, there will be pages encrypted with different keys at the
> > > >> >>>>>>>>> same time.
> > > >> >>>>>>>>> Currently, we only store one key for one cache group. To rotate a
> > > >> >>>>>>>>> key, at a certain point in time it is necessary to support several
> > > >> >>>>>>>>> keys (at least for reading the WAL).
> > > >> >>>>>>>>> For the "in place" strategy, we'll store the encryption key
> > > >> >>>>>>>>> identifier on each encrypted page (we currently have some unused
> > > >> >>>>>>>>> space on an encrypted page, so I don't expect any memory overhead
> > > >> >>>>>>>>> here). Thus, we will have several keys for reading and one key for
> > > >> >>>>>>>>> writing. I assume that the old key will be automatically deleted
> > > >> >>>>>>>>> when a specific WAL segment is deleted (and re-encryption is
> > > >> >>>>>>>>> finished).
> > > >> >>>>>>>>>
> > > >> >>>>>>>>>> Will a node continue to re-encrypt the data after it restarts?
> > > >> >>>>>>>>> Yes.
> > > >> >>>>>>>>>
> > > >> >>>>>>>>>> If a node goes down during the re-encryption, but the rest of
> > > >> >>>>>>>>>> the cluster finishes re-encryption, will we consider the
> > > >> >>>>>>>>>> procedure complete?
> > > >> >>>>>>>>> I'm not sure, but it looks like the key rotation is complete when
> > > >> >>>>>>>>> we set the new key on all nodes so that the updates will be
> > > >> >>>>>>>>> encrypted with the new key (as required by PCI DSS).
> > > >> >>>>>>>>> The status of re-encryption can be obtained separately (locally
> > > >> >>>>>>>>> or cluster-wide).
> > > >> >>>>>>>>>
> > > >> >>>>>>>>> I forgot to mention that with "in place" re-encryption it will be
> > > >> >>>>>>>>> impossible to quickly cancel re-encryption, because canceling
> > > >> >>>>>>>>> means re-encrypting with the old key.
> > > >> >>>>>>>>>
> > > >> >>>>>>>>>> How do you see the whole key rotation procedure will work?
> > > >> >>>>>>>>> The initial design for re-encryption with "partition copying" is
> > > >> >>>>>>>>> described here [1]. I'll prepare a detailed design for "in place"
> > > >> >>>>>>>>> re-encryption if we go this way. In short: send the new
> > > >> >>>>>>>>> encryption key cluster-wide; each node adds the new key and
> > > >> >>>>>>>>> starts background re-encryption.
> > > >> >>>>>>>>>
> > > >> >>>>>>>>> [1]
> > > >> >>>>>>>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Copywithre-encryptiondesign
> > > >> >>>>>>>>>
> > > >> >>>>>>>>> Sun, May 17, 2020 at 18:35, Alexey Goncharuk <alexey.goncha...@gmail.com>:
> > > >> >>>>>>>>>>
> > > >> >>>>>>>>>> Pavel, Anton,
> > > >> >>>>>>>>>>
> > > >> >>>>>>>>>> How do you see the whole key rotation procedure will work?
> > > >> >>>>>>>>>> Clearly, during the re-encryption there will exist pages
> > > >> >>>>>>>>>> encrypted with both new and old keys at the same time. Will a
> > > >> >>>>>>>>>> node continue to re-encrypt the data after it restarts? If a
> > > >> >>>>>>>>>> node goes down during the re-encryption, but the rest of the
> > > >> >>>>>>>>>> cluster finishes re-encryption, will we consider the procedure
> > > >> >>>>>>>>>> complete? By the way, is the encryption key for the data the
> > > >> >>>>>>>>>> same on all nodes in the cluster?
> > > >> >>>>>>>>>>
> > > >> >>>>>>>>>> Thu, May 14, 2020 at 11:30, Anton Vinogradov <a...@apache.org>:
> > > >> >>>>>>>>>>
> > > >> >>>>>>>>>>> +1 to "In place re-encryption".
> > > >> >>>>>>>>>>>
> > > >> >>>>>>>>>>> - It has a simple design.
> > > >> >>>>>>>>>>> - Clusters under load may need nothing more than that load to
> > > >> >>>>>>>>>>> re-encrypt the data (friendly to load).
> > > >> >>>>>>>>>>> - Easy to throttle.
> > > >> >>>>>>>>>>> - Easy to continue.
> > > >> >>>>>>>>>>> - The design is compatible with the multi-key architecture.
> > > >> >>>>>>>>>>> - It can be optimized to use its own WAL buffer and to re-encrypt
> > > >> >>>>>>>>>>> pages without restoring them to on-heap.
> > > >> >>>>>>>>>>>
> > > >> >>>>>>>>>>> On Thu, May 14, 2020 at 1:54 AM Pavel Pereslegin <xxt...@gmail.com> wrote:
> > > >> >>>>>>>>>>>
> > > >> >>>>>>>>>>>> Hello Igniters.
> > > >> >>>>>>>>>>>>
> > > >> >>>>>>>>>>>> Recently, master key rotation for Apache Ignite Transparent Data
> > > >> >>>>>>>>>>>> Encryption was implemented [1], but some security standards (PCI
> > > >> >>>>>>>>>>>> DSS at least) require rotation of all encryption keys [2].
> > > >> >>>>>>>>>>>> Currently, encryption occurs when reading/writing pages to disk;
> > > >> >>>>>>>>>>>> cache encryption keys are stored in the metastore.
> > > >> >>>>>>>>>>>>
> > > >> >>>>>>>>>>>> I'm going to contribute cache encryption key rotation and want to
> > > >> >>>>>>>>>>>> ask what the best way to re-encrypt existing data is. I see two
> > > >> >>>>>>>>>>>> different strategies.
> > > >> >>>>>>>>>>>>
> > > >> >>>>>>>>>>>> 1. In place re-encryption:
> > > >> >>>>>>>>>>>> Using the old key, sequentially read all the pages from the
> > > >> >>>>>>>>>>>> datastore, mark them as dirty and log them into the WAL. After a
> > > >> >>>>>>>>>>>> checkpoint, pages will be stored to disk encrypted with the new
> > > >> >>>>>>>>>>>> key (as usual, along with updates). This strategy requires
> > > >> >>>>>>>>>>>> storing the identifier (number) of the encryption key in the
> > > >> >>>>>>>>>>>> encrypted page.
> > > >> >>>>>>>>>>>> pros:
> > > >> >>>>>>>>>>>> - can work in the background with minimal performance impact
> > > >> >>>>>>>>>>>>   (this impact can be managed).
> > > >> >>>>>>>>>>>> cons:
> > > >> >>>>>>>>>>>> - page duplication in the WAL may affect performance and
> > > >> >>>>>>>>>>>>   historical rebalance.
> > > >> >>>>>>>>>>>>
> > > >> >>>>>>>>>>>> 2. Copy partition with re-encryption.
> > > >> >>>>>>>>>>>> This strategy is similar to partition snapshotting [3] - create a
> > > >> >>>>>>>>>>>> partition copy encrypted with the new key and then replace the
> > > >> >>>>>>>>>>>> original partition file with the new one (see details [4]).
> > > >> >>>>>>>>>>>> pros:
> > > >> >>>>>>>>>>>> - should work faster than "in place" re-encryption.
> > > >> >>>>>>>>>>>> cons:
> > > >> >>>>>>>>>>>> - re-encryption in an active cluster (and on an unstable topology)
> > > >> >>>>>>>>>>>>   can be difficult to implement.
> > > >> >>>>>>>>>>>>
> > > >> >>>>>>>>>>>> (See more detailed comparison [5])
> > > >> >>>>>>>>>>>>
> > > >> >>>>>>>>>>>> Re-encryption of existing data is a long and rare procedure (it is
> > > >> >>>>>>>>>>>> recommended to change the key every 6 months, but at least once
> > > >> >>>>>>>>>>>> every 2 years). Thus, re-encryption can be implemented for
> > > >> >>>>>>>>>>>> maintenance mode (for example, on a stable topology in a read-only
> > > >> >>>>>>>>>>>> cluster), and in such a case the approach with partition copying
> > > >> >>>>>>>>>>>> seems simpler and faster.
> > > >> >>>>>>>>>>>>
> > > >> >>>>>>>>>>>> So, what do you think - do we need "online" re-encryption and
> > > >> >>>>>>>>>>>> which of the proposed options is best suited for this?
> > > >> >>>>>>>>>>>>
> > > >> >>>>>>>>>>>> [1] https://issues.apache.org/jira/browse/IGNITE-12186
> > > >> >>>>>>>>>>>> [2] https://www.pcisecuritystandards.org/documents/PCI_DSS_v3-2-1.pdf
> > > >> >>>>>>>>>>>> [3] https://cwiki.apache.org/confluence/display/IGNITE/IEP-43%3A+Cluster+snapshots#IEP-43:Clustersnapshots-Partitionscopystrategy
> > > >> >>>>>>>>>>>> [4] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Copywithre-encryptiondesign
> > > >> >>>>>>>>>>>> [5] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Comparison
> > > >> >>>>>>>
> > > >> >>>>>>>
> > > >> >>>>>>> --
> > > >> >>>>>>>
> > > >> >>>>>>> Best regards,
> > > >> >>>>>>> Alexei Scherbakov
>
