OK, but this does not need to be something which is _explicitly_ sent
to it, as I believe a receiving node can derive this on its own if we
say that GEN is, for example, a hash of keyspace + table + table id
(which is the same for every node across the cluster).
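Below is a minimal Java sketch of the derivation suggested above. It assumes GEN is a SHA-256 digest over the keyspace name, the table name and the table id. The class name and the length-prefixed encoding are illustrative assumptions, not existing Cassandra code. The only point it makes is that every node hashing the same inputs gets the same GEN, so nothing has to be sent.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.UUID;

/**
 * Hypothetical helper: derives a cluster-wide GEN from the table identity,
 * so every node computes the same value without it being streamed.
 * SHA-256 and the field encoding are assumptions, not Cassandra APIs.
 */
public final class DeterministicGen {

    public static byte[] gen(String keyspace, String table, UUID tableId) {
        try {
            MessageDigest sha = MessageDigest.getInstance("SHA-256");
            // Length-prefix each name so ("ab", "c") and ("a", "bc") hash differently.
            for (String part : new String[] {keyspace, table}) {
                byte[] bytes = part.getBytes(StandardCharsets.UTF_8);
                sha.update(ByteBuffer.allocate(4).putInt(bytes.length).array());
                sha.update(bytes);
            }
            // The table id is a fixed 16 bytes, so it needs no length prefix.
            sha.update(ByteBuffer.allocate(16)
                    .putLong(tableId.getMostSignificantBits())
                    .putLong(tableId.getLeastSignificantBits())
                    .array());
            return sha.digest();
        } catch (NoSuchAlgorithmException e) {
            // Every Java SE implementation is required to provide SHA-256.
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }
}
```

One consequence of this choice: because GEN then depends only on the table, all SSTables of a table would share it, and under a shared Km they would also share the same KEK.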

On Tue, 16 Nov 2021 at 13:55, Bowen Song <bo...@bso.ng.invalid> wrote:
>
> If the same user-chosen key Km is used across all nodes in the same
> cluster, the sender will only need to share its SSTable generation GEN
> with the receiving side. This is because the receiving side will need to
> use the GEN to reproduce the KEK used on the source node. The receiving
> side will then need to unwrap Kr with the KEK and re-wrap it with a new
> KEK' derived from its own GEN. GEN is not considered a secret.
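The receiving side's unwrap and re-wrap step could look roughly like this. It is a sketch under stated assumptions: Km is shared, KDF(UUID+GEN, Km) is HMAC-SHA256 keyed with Km over the table UUID and a numeric GEN, and KW is the JDK's `AESWrap` cipher (RFC 3394 AES Key Wrap). `ReceiverRewrap` and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.util.UUID;
import javax.crypto.Cipher;
import javax.crypto.Mac;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical receiver-side helper; the KDF and key-wrap choices are assumptions. */
public final class ReceiverRewrap {

    /** KEK = KDF(UUID + GEN, Km), using HMAC-SHA256 keyed by Km as the KDF. */
    public static SecretKey kek(SecretKey km, UUID tableId, long gen) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(km);
            byte[] info = ByteBuffer.allocate(24)
                    .putLong(tableId.getMostSignificantBits())
                    .putLong(tableId.getLeastSignificantBits())
                    .putLong(gen)
                    .array();
            return new SecretKeySpec(mac.doFinal(info), "AES"); // 32 bytes -> AES-256 KEK
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("KEK derivation failed", e);
        }
    }

    /** WKr = KW(Kr, KEK) using AES Key Wrap. */
    public static byte[] wrap(SecretKey kr, SecretKey kek) {
        try {
            Cipher kw = Cipher.getInstance("AESWrap");
            kw.init(Cipher.WRAP_MODE, kek);
            return kw.wrap(kr);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("key wrap failed", e);
        }
    }

    /** Kr = unwrap(WKr, KEK); AES Key Wrap's integrity check rejects a wrong KEK. */
    public static SecretKey unwrap(byte[] wrappedKr, SecretKey kek) {
        try {
            Cipher kw = Cipher.getInstance("AESWrap");
            kw.init(Cipher.UNWRAP_MODE, kek);
            return (SecretKey) kw.unwrap(wrappedKr, "AES", Cipher.SECRET_KEY);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("key unwrap failed", e);
        }
    }

    /** Receiver: unwrap with the sender's GEN, then re-wrap under the local GEN. */
    public static byte[] rewrap(byte[] senderWrappedKr, SecretKey km, UUID tableId,
                                long senderGen, long localGen) {
        SecretKey kr = unwrap(senderWrappedKr, kek(km, tableId, senderGen));
        return wrap(kr, kek(km, tableId, localGen));
    }
}
```

Only WKr, the table UUID and the sender's GEN need to reach the receiver. Kr itself is recovered and re-wrapped in memory on the receiving node.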
>
>
> On 16/11/2021 12:13, Stefan Miklosovic wrote:
> > Thanks for everybody's insights.
> >
> > I would like to return to Km. If we require that all Km's are the same
> > before streaming, is it not true that we do not need to move any
> > secrets around at all? TLS would not be required either, as only
> > encrypted tables would ever be streamed. That way Kr would never ever
> > leave the node, and a new Km would be rolled out to all nodes first. To
> > use the correct Km, the recipient would look it up by the hash of Km
> > that comes with the received table. This would also avoid the fairly
> > complex algorithm in Bowen's last reply, if I understood it correctly.
> >
> > On Tue, 16 Nov 2021 at 13:02, bened...@apache.org <bened...@apache.org> wrote:
> >> We already have the facility to authenticate peers; I am suggesting we
> >> should, e.g., refuse to enable encryption if there is no such facility
> >> configured for a replica, or fail to start if there is encrypted data
> >> present and no authentication facility configured.
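The safeguard proposed here can be expressed as a small check run at startup or when encryption is switched on. This is a hypothetical sketch: the three booleans stand in for real configuration and on-disk inspection, and `EncryptionStartupCheck` is not an existing Cassandra class.

```java
/**
 * Hypothetical startup guard: refuse to enable SSTable encryption, or refuse
 * to start with encrypted data on disk, unless peer authentication is configured.
 * The boolean inputs are placeholders for real configuration lookups.
 */
public final class EncryptionStartupCheck {

    public static void validate(boolean encryptionEnabled,
                                boolean encryptedDataPresent,
                                boolean peerAuthConfigured) {
        if (peerAuthConfigured) {
            return; // peers can be authenticated, so streaming keys is acceptable
        }
        if (encryptionEnabled) {
            throw new IllegalStateException(
                    "SSTable encryption requires internode peer authentication");
        }
        if (encryptedDataPresent) {
            throw new IllegalStateException(
                    "Encrypted SSTables found but no peer authentication is configured");
        }
    }
}
```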
> >>
> >> It is in my opinion much more problematic to remove encryption from data 
> >> and ship it to another node in the network than it is to ship data that is 
> >> already unencrypted to another node on the network. Either is bad, but it 
> >> is probably fine to leave the unencrypted case to the cognizance of the 
> >> operator who may be happy relying on their general expectation that there 
> >> are no nefarious actors on the network. Encrypting data suggests this is 
> >> not an acceptable assumption, so I think we should make it harder for 
> >> users that require encryption to accidentally misconfigure in this way, 
> >> since they probably have higher security expectations (and compliance 
> >> requirements) than users that do not encrypt their data at rest.
> >>
> >>
> >> From: Bowen Song <bo...@bso.ng.INVALID>
> >> Date: Tuesday, 16 November 2021 at 11:56
> >> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> >> Subject: Re: Resurrection of CASSANDRA-9633 - SSTable encryption
> >> I think authenticating a receiving node is important, but it is perhaps
> >> not in the scope of this ticket (or CEP if it becomes one). This applies
> >> not only to encrypted SSTables, but also to unencrypted SSTables. A
> >> malicious node joining the cluster and sending bogus requests to other
> >> nodes is a general problem, not one specific to on-disk encryption.
> >>
> >> On 16/11/2021 10:50, bened...@apache.org wrote:
> >>> I assume the key would be decrypted before being streamed, or perhaps 
> >>> encrypted using a public key provided to you by the receiving node. This 
> >>> would permit efficient “zero copy” streaming for the data portion, but 
> >>> not require any knowledge of the recipient node’s master key(s).
> >>>
> >>> Either way, we would still want to ensure we had some authentication of 
> >>> the recipient node before streaming the file as it would effectively be 
> >>> decrypted to any node that could request this streaming action.
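The second option, wrapping the key with a public key supplied by the receiving node, might be sketched as below. RSA-OAEP (`RSA/ECB/OAEPWithSHA-256AndMGF1Padding`) with a 2048-bit key is an assumed choice, and the class is hypothetical. As the paragraph says, this alone does not prove the public key belongs to a legitimate peer; that still needs authentication.

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;

/**
 * Sketch: the receiver supplies a public key, the sender wraps Kr with it,
 * and the still-encrypted SSTable data can be streamed unchanged.
 * RSA-OAEP is an assumed algorithm choice.
 */
public final class PublicKeyKrTransfer {

    private static final String RSA_OAEP = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";

    /** Receiver: create a key pair and send the public half to the sender. */
    public static KeyPair receiverKeyPair() {
        try {
            KeyPairGenerator gen = KeyPairGenerator.getInstance("RSA");
            gen.initialize(2048);
            return gen.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Sender: wrap Kr under the receiver's public key. */
    public static byte[] wrapForReceiver(SecretKey kr, PublicKey receiverPublicKey) {
        try {
            Cipher rsa = Cipher.getInstance(RSA_OAEP);
            rsa.init(Cipher.WRAP_MODE, receiverPublicKey);
            return rsa.wrap(kr);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Receiver: recover Kr with its private key. */
    public static SecretKey unwrapOnReceiver(byte[] wrappedKr, PrivateKey receiverPrivateKey) {
        try {
            Cipher rsa = Cipher.getInstance(RSA_OAEP);
            rsa.init(Cipher.UNWRAP_MODE, receiverPrivateKey);
            return (SecretKey) rsa.unwrap(wrappedKr, "AES", Cipher.SECRET_KEY);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```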
> >>>
> >>>
> >>> From: Stefan Miklosovic <stefan.mikloso...@instaclustr.com>
> >>> Date: Tuesday, 16 November 2021 at 10:45
> >>> To: dev@cassandra.apache.org <dev@cassandra.apache.org>
> >>> Subject: Re: Resurrection of CASSANDRA-9633 - SSTable encryption
> >>> OK, but this also means that Km would need to be the same for all
> >>> nodes, right?
> >>>
> >>> If we are rolling node by node, Km is changed on node 1, we change
> >>> the wrapped key which is stored on disk, and we stream this table to
> >>> another node which is still on the old Km. Would this work? I think
> >>> we would need to rotate first, before anything is streamed. Or not?
> >>>
> >>> On Tue, 16 Nov 2021 at 11:17, Bowen Song <bo...@bso.ng.invalid> wrote:
> >>>> Yes, that's correct. The actual key used to encrypt the SSTable will
> >>>> stay the same once the SSTable is created. This is a widely used
> >>>> practice in many encrypt-at-rest applications. One good example is the
> >>>> LUKS full disk encryption, which also supports multiple keys to unlock
> >>>> (decrypt) the same data. Multiple unlocking keys are only possible
> >>>> because the actual key used to encrypt the data is randomly generated
> >>>> and then stored encrypted by (a key derived from) a user-chosen key.
> >>>>
> >>>> If this approach is adopted, the streaming process can share the Kr
> >>>> without disclosing the Km, therefore enabling zero-copy streaming.
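The LUKS-style property described here, several keys able to unlock the same data because only the random data key is stored wrapped, can be illustrated with a small key-slot structure. In this sketch each unlocking key is used directly as an AES Key Wrap KEK (LUKS itself first derives that KEK from a passphrase), and the `KeySlots` class is hypothetical.

```java
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;

/**
 * LUKS-style key slots: one random data key Kr, stored wrapped under several
 * unlocking keys. For brevity each unlocking key is used directly as the KEK.
 */
public final class KeySlots {

    // Slot ID (Base64 SHA-256 of the unlocking key) -> Kr wrapped under that key.
    private final Map<String, byte[]> slots = new LinkedHashMap<>();

    private static String slotId(SecretKey unlockingKey) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256").digest(unlockingKey.getEncoded());
            return Base64.getEncoder().encodeToString(digest);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Adds (or replaces) a slot so that unlockingKey can recover kr. */
    public void addSlot(SecretKey kr, SecretKey unlockingKey) {
        try {
            Cipher kw = Cipher.getInstance("AESWrap");
            kw.init(Cipher.WRAP_MODE, unlockingKey);
            slots.put(slotId(unlockingKey), kw.wrap(kr));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("wrap failed", e);
        }
    }

    /** Removes a slot; the encrypted data itself is untouched. */
    public void removeSlot(SecretKey unlockingKey) {
        slots.remove(slotId(unlockingKey));
    }

    /** Recovers Kr using any unlocking key that still has a slot. */
    public SecretKey unlock(SecretKey unlockingKey) {
        byte[] wrapped = slots.get(slotId(unlockingKey));
        if (wrapped == null) {
            throw new IllegalArgumentException("no slot for this key");
        }
        try {
            Cipher kw = Cipher.getInstance("AESWrap");
            kw.init(Cipher.UNWRAP_MODE, unlockingKey);
            return (SecretKey) kw.unwrap(wrapped, "AES", Cipher.SECRET_KEY);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("unwrap failed", e);
        }
    }
}
```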
> >>>>
> >>>> On 16/11/2021 08:56, Stefan Miklosovic wrote:
> >>>>> Hi Bowen, very interesting idea indeed. So if I got it right, the
> >>>>> key used for the actual SSTable encryption would always stay the
> >>>>> same; only its wrapped form would differ. So if we rotate, we only
> >>>>> change Km, hence the KEK, hence the result of wrapping, but the
> >>>>> original Kr would still be used.
> >>>>>
> >>>>> Jeremiah - I will prepare that branch very soon.
> >>>>>
> >>>>> On Tue, 16 Nov 2021 at 01:09, Bowen Song <bo...@bso.ng.invalid> wrote:
> >>>>>>>        The second question is about key rotation. If an operator
> >>>>>>>        needs to roll the key because it was compromised, or because
> >>>>>>>        some policy requires it, we should provide a way to rotate it.
> >>>>>>>        Our idea is to write a tool (either a nodetool subcommand,
> >>>>>>>        rewritesstables, or a completely standalone one in tools)
> >>>>>>>        which would take the original key, the new key and a directory
> >>>>>>>        of SSTables as input, read the data, and rewrite it as a
> >>>>>>>        second set of SSTables encrypted with the new key. What do you
> >>>>>>>        think about this?
> >>>>>>        I would rather suggest that “what key encrypted this” be part
> >>>>>>        of the sstable metadata, and allow there to be multiple keys in
> >>>>>>        the system. This way you can just add a new “current key” so new
> >>>>>>        sstables use the new key, but existing sstables would use the
> >>>>>>        old key. An operator could then trigger a
> >>>>>>        “nodetool upgradesstables -a” to rewrite the existing sstables
> >>>>>>        with the new “current key”.
> >>>>>>
> >>>>>> There's a much better approach to solve this issue. You can store a
> >>>>>> wrapped key in an encryption info file alongside the SSTable file.
> >>>>>> Here's how it works:
> >>>>>> 1. randomly generate a key Kr
> >>>>>> 2. encrypt the SSTable file with the key Kr, and store the encrypted
> >>>>>> SSTable file on disk
> >>>>>> 3. derive a key encryption key KEK from the SSTable file's information
> >>>>>> (e.g. table UUID + generation) and the user-chosen master key Km, so
> >>>>>> you have KEK = KDF(UUID+GEN, Km)
> >>>>>> 4. wrap (encrypt) the key Kr with the KEK, so you have WKr = KW(Kr, KEK)
> >>>>>> 5. hash the Km; the hash will be used as a key ID to identify which
> >>>>>> master key was used to encrypt the key Kr if the server has multiple
> >>>>>> master keys in use
> >>>>>> 6. store the WKr and the hash of Km in a separate file alongside the
> >>>>>> SSTable file
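The six steps above could be sketched in Java as follows. Every primitive here is an assumption made for illustration: AES-256-GCM for the data, HMAC-SHA256 keyed with Km as the KDF over table UUID + GEN, the JDK's `AESWrap` (RFC 3394) as KW, and SHA-256 of Km as the key ID. A real SSTable would be encrypted chunk by chunk rather than as one byte array, and `SSTableKeyWrapping` is a hypothetical name.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.UUID;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.Mac;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Sketch of the write path; all algorithm choices are assumptions. */
public final class SSTableKeyWrapping {

    /** Contents of the hypothetical encryption info file (step 6). */
    public static final class EncryptionInfo {
        public final byte[] wrappedKr;   // WKr = KW(Kr, KEK)
        public final byte[] masterKeyId; // hash of Km

        EncryptionInfo(byte[] wrappedKr, byte[] masterKeyId) {
            this.wrappedKr = wrappedKr;
            this.masterKeyId = masterKeyId;
        }
    }

    private static final SecureRandom RANDOM = new SecureRandom();

    /** Step 1: randomly generate Kr. */
    public static SecretKey generateKr() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(256, RANDOM);
            return gen.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Step 2: encrypt data with Kr; output is IV || ciphertext || GCM tag. */
    public static byte[] encrypt(byte[] data, SecretKey kr) {
        try {
            byte[] iv = new byte[12];
            RANDOM.nextBytes(iv);
            Cipher gcm = Cipher.getInstance("AES/GCM/NoPadding");
            gcm.init(Cipher.ENCRYPT_MODE, kr, new GCMParameterSpec(128, iv));
            byte[] ct = gcm.doFinal(data);
            return ByteBuffer.allocate(iv.length + ct.length).put(iv).put(ct).array();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Step 3: KEK = KDF(UUID + GEN, Km). */
    static SecretKey deriveKek(UUID tableId, long gen, SecretKey km) throws GeneralSecurityException {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(km);
        byte[] info = ByteBuffer.allocate(24).putLong(tableId.getMostSignificantBits())
                .putLong(tableId.getLeastSignificantBits()).putLong(gen).array();
        return new SecretKeySpec(mac.doFinal(info), "AES");
    }

    /** Steps 3 to 5: derive the KEK, wrap Kr, and hash Km into a key ID. */
    public static EncryptionInfo seal(SecretKey kr, UUID tableId, long gen, SecretKey km) {
        try {
            Cipher kw = Cipher.getInstance("AESWrap");
            kw.init(Cipher.WRAP_MODE, deriveKek(tableId, gen, km));
            byte[] keyId = MessageDigest.getInstance("SHA-256").digest(km.getEncoded());
            return new EncryptionInfo(kw.wrap(kr), keyId);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Read path: recover Kr once, then keep it in memory. */
    public static SecretKey open(EncryptionInfo info, UUID tableId, long gen, SecretKey km) {
        try {
            Cipher kw = Cipher.getInstance("AESWrap");
            kw.init(Cipher.UNWRAP_MODE, deriveKek(tableId, gen, km));
            return (SecretKey) kw.unwrap(info.wrappedKr, "AES", Cipher.SECRET_KEY);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```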
> >>>>>>
> >>>>>> In the read path, Kr should be kept in memory to improve
> >>>>>> performance; this will also allow zero-downtime master key rotation.
> >>>>>>
> >>>>>> During a key rotation:
> >>>>>> 1. derive the KEK in the same way: KEK = KDF(UUID+GEN, Km)
> >>>>>> 2. read the WKr from the encryption information file, and unwrap
> >>>>>> (decrypt) it using the KEK to get the Kr
> >>>>>> 3. derive a new KEK' from the new master key Km' in the same way as above
> >>>>>> 4. wrap (encrypt) the key Kr with KEK' to get WKr' = KW(Kr, KEK')
> >>>>>> 5. hash the new master key Km', and store it together with the WKr' in
> >>>>>> the encryption info file
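A sketch of the rotation steps, assuming HMAC-SHA256 keyed with Km as the KDF, the JDK's `AESWrap` as KW and SHA-256 of Km as the key ID (all hypothetical choices; `MasterKeyRotation` is not an existing class). Only the small encryption info record changes; Kr and the SSTable data are untouched, which is why rotation can be cheap.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Arrays;
import java.util.UUID;
import javax.crypto.Cipher;
import javax.crypto.Mac;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;

/** Sketch of master key rotation; rewrites only the wrapped Kr and key ID. */
public final class MasterKeyRotation {

    /** Contents of the hypothetical encryption info file. */
    public static final class EncryptionInfo {
        public final byte[] wrappedKr;
        public final byte[] masterKeyId;

        public EncryptionInfo(byte[] wrappedKr, byte[] masterKeyId) {
            this.wrappedKr = wrappedKr;
            this.masterKeyId = masterKeyId;
        }
    }

    /** KEK = KDF(UUID + GEN, Km). */
    private static SecretKey kek(UUID tableId, long gen, SecretKey km) throws GeneralSecurityException {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(km);
        byte[] info = ByteBuffer.allocate(24).putLong(tableId.getMostSignificantBits())
                .putLong(tableId.getLeastSignificantBits()).putLong(gen).array();
        return new SecretKeySpec(mac.doFinal(info), "AES");
    }

    private static byte[] keyId(SecretKey km) throws GeneralSecurityException {
        return MessageDigest.getInstance("SHA-256").digest(km.getEncoded());
    }

    public static EncryptionInfo wrap(SecretKey kr, UUID tableId, long gen, SecretKey km) {
        try {
            Cipher kw = Cipher.getInstance("AESWrap");
            kw.init(Cipher.WRAP_MODE, kek(tableId, gen, km));
            return new EncryptionInfo(kw.wrap(kr), keyId(km));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static SecretKey unwrap(EncryptionInfo info, UUID tableId, long gen, SecretKey km) {
        try {
            // The stored key ID tells us whether this Km is the right one.
            if (!Arrays.equals(info.masterKeyId, keyId(km))) {
                throw new IllegalArgumentException("wrapped under a different master key");
            }
            Cipher kw = Cipher.getInstance("AESWrap");
            kw.init(Cipher.UNWRAP_MODE, kek(tableId, gen, km));
            return (SecretKey) kw.unwrap(info.wrappedKr, "AES", Cipher.SECRET_KEY);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Steps 1-2 unwrap with the old Km; steps 3-5 re-wrap and re-hash with Km'. */
    public static EncryptionInfo rotate(EncryptionInfo old, UUID tableId, long gen,
                                        SecretKey oldKm, SecretKey newKm) {
        SecretKey kr = unwrap(old, tableId, gen, oldKm);
        return wrap(kr, tableId, gen, newKm);
    }
}
```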
> >>>>>>
> >>>>>> Since the key rotation only involves rewriting the encryption info
> >>>>>> file, the operation should take only a few milliseconds per SSTable
> >>>>>> file, which is much faster than decrypting and then re-encrypting the
> >>>>>> SSTable data.
> >>>>>>
> >>>>>>
> >>>>>>
> >>>>>> On 15/11/2021 18:42, Jeremiah D Jordan wrote:
> >>>>>>>> On Nov 14, 2021, at 3:53 PM, Stefan Miklosovic <stefan.mikloso...@instaclustr.com> wrote:
> >>>>>>>>
> >>>>>>>> Hey,
> >>>>>>>>
> >>>>>>>> there are two points we are not completely sure about.
> >>>>>>>>
> >>>>>>>> The first one is streaming. If there is a cluster of 5 nodes, each
> >>>>>>>> node has its own unique encryption key. Hence, if an SSTable is stored
> >>>>>>>> on disk encrypted with the key of node 1 and is streamed to node 2,
> >>>>>>>> which has a different key, node 2 would not be able to decrypt it. Our
> >>>>>>>> idea is to actually send the data over the wire _decrypted_; it would
> >>>>>>>> still be secure if internode communication is done via TLS. Is this
> >>>>>>>> approach OK with you?
> >>>>>>>>
> >>>>>>> So would you fail startup if someone enabled sstable encryption but 
> >>>>>>> did not have TLS for internode communication?  Another concern here 
> >>>>>>> is making sure zero copy streaming does not get triggered for this 
> >>>>>>> case.
> >>>>>>> Have you considered having some way to distribute the keys to all 
> >>>>>>> nodes such that you don’t need to decrypt on the sending side?  
> >>>>>>> Having to do this will mean a lot more overhead for the sending side 
> >>>>>>> of a streaming operation.
> >>>>>>>
> >>>>>>>> The second question is about key rotation. If an operator needs to
> >>>>>>>> roll the key because it was compromised, or because some policy
> >>>>>>>> requires it, we should provide a way to rotate it. Our idea is to
> >>>>>>>> write a tool (either a nodetool subcommand, rewritesstables, or a
> >>>>>>>> completely standalone one in tools) which would take the original
> >>>>>>>> key, the new key and a directory of SSTables as input, read the data,
> >>>>>>>> and rewrite it as a second set of SSTables encrypted with the new
> >>>>>>>> key. What do you think about this?
> >>>>>>> I would rather suggest that “what key encrypted this” be part of the
> >>>>>>> sstable metadata, and allow there to be multiple keys in the system.
> >>>>>>> This way you can just add a new “current key” so new sstables use the
> >>>>>>> new key, but existing sstables would use the old key. An operator
> >>>>>>> could then trigger a “nodetool upgradesstables -a” to rewrite the
> >>>>>>> existing sstables with the new “current key”.
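The metadata-based approach could be backed by a small keyring: new SSTables record the current key ID in their metadata, existing SSTables are read with whichever key their metadata names, and an old key is only dropped once nothing references it. The `KeyRing` class and its methods are hypothetical, and how key IDs would actually be stored in SSTable metadata is left open.

```java
import java.util.HashMap;
import java.util.Map;
import javax.crypto.SecretKey;

/**
 * Hypothetical keyring for the "key ID in SSTable metadata" approach.
 * A key can be retired once no live SSTable references it, for example
 * after upgradesstables has rewritten everything with the current key.
 */
public final class KeyRing {

    private final Map<String, SecretKey> keysById = new HashMap<>();
    private String currentKeyId;

    /** Registers a key; if makeCurrent is true, new SSTables will use it. */
    public void addKey(String keyId, SecretKey key, boolean makeCurrent) {
        keysById.put(keyId, key);
        if (makeCurrent) {
            currentKeyId = keyId;
        }
    }

    /** Key ID to record in the metadata of a newly written SSTable. */
    public String currentKeyId() {
        if (currentKeyId == null) {
            throw new IllegalStateException("no current key configured");
        }
        return currentKeyId;
    }

    /** Resolves the key ID read from an existing SSTable's metadata. */
    public SecretKey keyFor(String keyIdFromMetadata) {
        SecretKey key = keysById.get(keyIdFromMetadata);
        if (key == null) {
            throw new IllegalStateException("unknown key ID: " + keyIdFromMetadata);
        }
        return key;
    }

    /** Removes a non-current key only if no live SSTable still references it. */
    public boolean retire(String keyId, Iterable<String> keyIdsInUse) {
        if (keyId.equals(currentKeyId)) {
            return false;
        }
        for (String inUse : keyIdsInUse) {
            if (inUse.equals(keyId)) {
                return false;
            }
        }
        return keysById.remove(keyId) != null;
    }
}
```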
> >>>>>>>
> >>>>>>>> Regards
> >>>>>>>>
> >>>>>>>> On Sat, 13 Nov 2021 at 19:35, <sc...@paradoxica.net> wrote:
> >>>>>>>>> Same reaction here - great to have traction on this ticket. 
> >>>>>>>>> Shylaja, thanks for your work on this and to Stefan as well! It 
> >>>>>>>>> would be wonderful to have the feature complete.
> >>>>>>>>>
> >>>>>>>>> One thing I’d mention is that a lot’s changed about the project’s 
> >>>>>>>>> testing strategy since the original patch was written. I see that 
> >>>>>>>>> the 2016 version adds a couple of round-trip unit tests with a small
> >>>>>>>>> amount of static data. It would be good to see randomized tests
> >>>>>>>>> fleshed out that exercise more of the read/write path, or that add
> >>>>>>>>> variants of existing read/write path tests with encryption enabled.
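As an illustration of the kind of randomized test suggested here, the sketch below runs seeded round trips over random-length payloads and also checks that a single flipped bit is detected. AES-GCM and raw byte arrays are stand-ins chosen for this example; the sketch does not touch the patch's actual cipher or the SSTable read/write path, which a real test would drive instead.

```java
import java.security.GeneralSecurityException;
import java.util.Arrays;
import java.util.Random;
import javax.crypto.AEADBadTagException;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/**
 * Seeded, randomized round-trip check. java.util.Random is used only so a
 * failing seed can be replayed; it is not suitable for real keys or IVs.
 */
public final class RandomizedRoundTrip {

    private static byte[] crypt(int mode, byte[] input, SecretKey key, byte[] iv) {
        try {
            Cipher gcm = Cipher.getInstance("AES/GCM/NoPadding");
            gcm.init(mode, key, new GCMParameterSpec(128, iv));
            return gcm.doFinal(input);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns how many iterations round-tripped correctly and detected tampering. */
    public static int run(long seed, int iterations) {
        Random random = new Random(seed);
        int passed = 0;
        for (int i = 0; i < iterations; i++) {
            byte[] keyBytes = new byte[32];
            random.nextBytes(keyBytes);
            SecretKey key = new SecretKeySpec(keyBytes, "AES");
            byte[] iv = new byte[12];
            random.nextBytes(iv);
            byte[] plaintext = new byte[random.nextInt(64 * 1024)]; // 0 to 64 KiB - 1
            random.nextBytes(plaintext);

            byte[] ciphertext = crypt(Cipher.ENCRYPT_MODE, plaintext, key, iv);
            boolean roundTrips = Arrays.equals(plaintext, crypt(Cipher.DECRYPT_MODE, ciphertext, key, iv));

            ciphertext[random.nextInt(ciphertext.length)] ^= 1; // flip one bit
            boolean tamperDetected = false;
            try {
                crypt(Cipher.DECRYPT_MODE, ciphertext, key, iv);
            } catch (IllegalStateException e) {
                tamperDetected = e.getCause() instanceof AEADBadTagException;
            }
            if (roundTrips && tamperDetected) {
                passed++;
            }
        }
        return passed;
    }
}
```

Printing the seed when `run` returns fewer passes than iterations would let a failure be reproduced exactly.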
> >>>>>>>>>
> >>>>>>>>> – Scott
> >>>>>>>>>
> >>>>>>>>>> On Nov 13, 2021, at 7:53 AM, Brandon Williams <dri...@gmail.com> wrote:
> >>>>>>>>>>
> >>>>>>>>>> We already have a ticket, and it predates CEPs. Since this is an
> >>>>>>>>>> obviously good improvement that many have been asking for for some
> >>>>>>>>>> time now, I don't see the need for a CEP here.
> >>>>>>>>>>
> >>>>>>>>>> On Sat, Nov 13, 2021 at 5:01 AM Stefan Miklosovic
> >>>>>>>>>> <stefan.mikloso...@instaclustr.com>  wrote:
> >>>>>>>>>>> Hi list,
> >>>>>>>>>>>
> >>>>>>>>>>> an engineer from Intel, Shylaja Kokoori (who is watching this list
> >>>>>>>>>>> closely), has ported the original code from the CASSANDRA-9633
> >>>>>>>>>>> work, written in the days of 3.4, to the current trunk, with my help
> >>>>>>>>>>> here and there, mostly cosmetic.
> >>>>>>>>>>>
> >>>>>>>>>>> I would like to know whether there is general consensus that I
> >>>>>>>>>>> should create a CEP for this feature, or what your perception of
> >>>>>>>>>>> this is. I know we have it a little bit backwards here, as we should
> >>>>>>>>>>> first discuss and then code, but I am super glad that we have a POC
> >>>>>>>>>>> we can elaborate on further, and a CEP would just cement and
> >>>>>>>>>>> summarise the approach and other implementation aspects of this
> >>>>>>>>>>> feature.
> >>>>>>>>>>>
> >>>>>>>>>>> I think that having 9633 merged will fill quite a big operational
> >>>>>>>>>>> gap when it comes to security. There are a lot of enterprises that
> >>>>>>>>>>> want this feature badly. I cannot remember when I last saw a ticket
> >>>>>>>>>>> with 50 watchers that had been inactive for such a long time.
> >>>>>>>>>>>
> >>>>>>>>>>> Regards
> >>>>>>>>>>>
> >>>>>>>>>>> ---------------------------------------------------------------------
> >>>>>>>>>>> To unsubscribe, e-mail:dev-unsubscr...@cassandra.apache.org
> >>>>>>>>>>> For additional commands, e-mail:dev-h...@cassandra.apache.org
> >>>>>>>>>>>