Ok, but this does not need to be something which is _explicitly_ sent to it as I believe a receiving node can derive this on its own - if we way that gen is a hash of keyspace + table + table id, for example (which is same across the cluster for each node).
On Tue, 16 Nov 2021 at 13:55, Bowen Song <bo...@bso.ng.invalid> wrote: > > If the same user chosen key Km is used across all nodes in the same > cluster, the sender will only need to share their SSTable generation GEN > with the receiving side. This is because the receiving side will need to > use the GEN to reproduce the KEK used in the source node. The receiving > side will then need to unwrap Kr with the KEK and re-wrap it with a new > KEK' derived from their own GEN. GEN is not considered as a secret. > > > On 16/11/2021 12:13, Stefan Miklosovic wrote: > > Thanks for the insights of everybody. > > > > I would like to return to Km. If we require that all Km's are the same > > before streaming, is it not true that we do not need to move any > > secrets around at all? So TLS would not be required either as only > > encrypted tables would ever be streamed. That way Kr would never ever > > leave the node and new Km would be rolled over first. To use correct > > Km, we would have hash of that upon received table from the > > recipient's perspective. This would also avoid the fairly complex > > algorithm in the last Bowen's reply when I got that right. > > > > On Tue, 16 Nov 2021 at 13:02, bened...@apache.org <bened...@apache.org> > > wrote: > >> We already have the facility to authenticate peers, I am suggesting we > >> should e.g. refuse to enable encryption if there is no such facility > >> configured for a replica, or fail to start if there is encrypted data > >> present and no authentication facility configured. > >> > >> It is in my opinion much more problematic to remove encryption from data > >> and ship it to another node in the network than it is to ship data that is > >> already unencrypted to another node on the network. Either is bad, but it > >> is probably fine to leave the unencrypted case to the cognizance of the > >> operator who may be happy relying on their general expectation that there > >> are no nefarious actors on the network. Encrypting data suggests this is > >> not an acceptable assumption, so I think we should make it harder for > >> users that require encryption to accidentally misconfigure in this way, > >> since they probably have higher security expectations (and compliance > >> requirements) than users that do not encrypt their data at rest. > >> > >> > >> From: Bowen Song <bo...@bso.ng.INVALID> > >> Date: Tuesday, 16 November 2021 at 11:56 > >> To: dev@cassandra.apache.org <dev@cassandra.apache.org> > >> Subject: Re: Resurrection of CASSANDRA-9633 - SSTable encryption > >> I think authenticating a receiving node is important, but it is perhaps > >> not in the scope of this ticket (or CEP if it becomes one). This applies > >> to not only encrypted SSTables, but also unencrypted SSTables. A > >> malicious node can join the cluster and send bogus requests to other > >> nodes is a general problem not specific to the on-disk encryption. > >> > >> On 16/11/2021 10:50, bened...@apache.org wrote: > >>> I assume the key would be decrypted before being streamed, or perhaps > >>> encrypted using a public key provided to you by the receiving node. This > >>> would permit efficient “zero copy” streaming for the data portion, but > >>> not require any knowledge of the recipient node’s master key(s). > >>> > >>> Either way, we would still want to ensure we had some authentication of > >>> the recipient node before streaming the file as it would effectively be > >>> decrypted to any node that could request this streaming action. > >>> > >>> > >>> From: Stefan Miklosovic <stefan.mikloso...@instaclustr.com> > >>> Date: Tuesday, 16 November 2021 at 10:45 > >>> To: dev@cassandra.apache.org <dev@cassandra.apache.org> > >>> Subject: Re: Resurrection of CASSANDRA-9633 - SSTable encryption > >>> Ok but this also means that Km would need to be the same for all nodes > >>> right? > >>> > >>> If we are rolling in node by node fashion, Km is changed at node 1, we > >>> change the wrapped key which is stored on disk and we stream this > >>> table to the other node which is still on the old Km. Would this work? > >>> I think we would need to rotate first before anything is streamed. Or > >>> no? > >>> > >>> On Tue, 16 Nov 2021 at 11:17, Bowen Song <bo...@bso.ng.invalid> wrote: > >>>> Yes, that's correct. The actual key used to encrypt the SSTable will > >>>> stay the same once the SSTable is created. This is a widely used > >>>> practice in many encrypt-at-rest applications. One good example is the > >>>> LUKS full disk encryption, which also supports multiple keys to unlock > >>>> (decrypt) the same data. Multiple unlocking keys is only possible > >>>> because the actual key used to encrypt the data is randomly generated > >>>> and then stored encrypted by (a key derived from) a user chosen key. > >>>> > >>>> If this approach is adopted, the streaming process can share the Kr > >>>> without disclosing the Km, therefore enableling zero-copy streaming. > >>>> > >>>> On 16/11/2021 08:56, Stefan Miklosovic wrote: > >>>>> Hi Bowen, Very interesting idea indeed. So if I got it right, the very > >>>>> key for the actual sstable encryption would be always the same, it is > >>>>> just what is wrapped would differ. So if we rotate, we basically only > >>>>> change Km hence KEK hence the result of wrapping but there would still > >>>>> be the original Kr key used. > >>>>> > >>>>> Jeremiah - I will prepare that branch very soon. > >>>>> > >>>>> On Tue, 16 Nov 2021 at 01:09, Bowen Song <bo...@bso.ng.invalid> wrote: > >>>>>>> The second question is about key rotation. If an operator > >>>>>>> needs to > >>>>>>> roll the key because it was compromised or there is some > >>>>>>> policy around > >>>>>>> that, we should be able to provide some way to rotate it. Our > >>>>>>> idea is > >>>>>>> to write a tool (either a subcommand of nodetool > >>>>>>> (rewritesstables) > >>>>>>> command or a completely standalone one in tools) which would > >>>>>>> take the > >>>>>>> first, original key, the second, new key and dir with sstables > >>>>>>> as > >>>>>>> input and it would literally took the data and it would > >>>>>>> rewrite it to > >>>>>>> the second set of sstables which would be encrypted with the > >>>>>>> second > >>>>>>> key. What do you think about this? > >>>>>> I would rather suggest that “what key encrypted this” be part > >>>>>> of the sstable metadata, and allow there to be multiple keys in the > >>>>>> system. This way you can just add a new “current key” so new sstables > >>>>>> use the new key, but existing sstables would use the old key. An > >>>>>> operator could then trigger a “nodetool upgradesstables —all” to > >>>>>> rewrite the existing sstables with the new “current key”. > >>>>>> > >>>>>> There's a much better approach to solve this issue. You can stored a > >>>>>> wrapped key in an encryption info file alone side the SSTable file. > >>>>>> Here's how it works: > >>>>>> 1. randomly generate a key Kr > >>>>>> 2. encrypt the SSTable file with the key Kr, store the encrypted > >>>>>> SSTable > >>>>>> file on disk > >>>>>> 3. derive a key encryption key KEK from the SSTable file's information > >>>>>> (e.g.: table UUID + generation) and the user chosen master key Km, so > >>>>>> you have KEK = KDF(UUID+GEN, Km) > >>>>>> 4. wrap (encrypt) the key Kr with the KEK, so you have WKr = KW(Kr, > >>>>>> KEK) > >>>>>> 5. hash the Km, the hash will used as a key ID to identify which master > >>>>>> key was used to encrypt the key Kr if the server has multiple master > >>>>>> keys in use > >>>>>> 6. store the the WKr and the hash of Km in a separate file alone side > >>>>>> the SSTable file > >>>>>> > >>>>>> In the read path, the Kr should be kept in memory to help improve > >>>>>> performance and this will also allow zero-downtime master key rotation. > >>>>>> > >>>>>> During a key rotation: > >>>>>> 1. derive the KEK in the same way: KEK = KDF(UUID+GEN, Km) > >>>>>> 2. read the WKr from the encryption information file, and unwrap > >>>>>> (decrypt) it using the KEK to get the Kr > >>>>>> 3. derive a new KEK' from the new master key Km' in the same way as > >>>>>> above > >>>>>> 4. wrap (encrypt) the key Kr with KEK' to get WKr' = KW(Kr, KEK') > >>>>>> 5. hash the new master key Km', and store it together with the WKr' in > >>>>>> the encryption info file > >>>>>> > >>>>>> Since the key rotation only involves rewriting the encryption info > >>>>>> file, > >>>>>> the operation should take only a few milliseconds per SSTable file, it > >>>>>> will be much faster than decrypting and then re-encrypting the SSTable > >>>>>> data. > >>>>>> > >>>>>> > >>>>>> > >>>>>> On 15/11/2021 18:42, Jeremiah D Jordan wrote: > >>>>>>>> On Nov 14, 2021, at 3:53 PM, Stefan > >>>>>>>> Miklosovic<stefan.mikloso...@instaclustr.com> wrote: > >>>>>>>> > >>>>>>>> Hey, > >>>>>>>> > >>>>>>>> there are two points we are not completely sure about. > >>>>>>>> > >>>>>>>> The first one is streaming. If there is a cluster of 5 nodes, each > >>>>>>>> node has its own unique encryption key. Hence, if a SSTable is stored > >>>>>>>> on a disk with the key for node 1 and this is streamed to node 2 - > >>>>>>>> which has a different key - it would not be able to decrypt that. Our > >>>>>>>> idea is to actually send data over the wire _decrypted_ however it > >>>>>>>> would be still secure if internode communication is done via TLS. Is > >>>>>>>> this approach good with you? > >>>>>>>> > >>>>>>> So would you fail startup if someone enabled sstable encryption but > >>>>>>> did not have TLS for internode communication? Another concern here > >>>>>>> is making sure zero copy streaming does not get triggered for this > >>>>>>> case. > >>>>>>> Have you considered having some way to distribute the keys to all > >>>>>>> nodes such that you don’t need to decrypt on the sending side? > >>>>>>> Having to do this will mean a lot more overhead for the sending side > >>>>>>> of a streaming operation. > >>>>>>> > >>>>>>>> The second question is about key rotation. If an operator needs to > >>>>>>>> roll the key because it was compromised or there is some policy > >>>>>>>> around > >>>>>>>> that, we should be able to provide some way to rotate it. Our idea is > >>>>>>>> to write a tool (either a subcommand of nodetool (rewritesstables) > >>>>>>>> command or a completely standalone one in tools) which would take the > >>>>>>>> first, original key, the second, new key and dir with sstables as > >>>>>>>> input and it would literally took the data and it would rewrite it to > >>>>>>>> the second set of sstables which would be encrypted with the second > >>>>>>>> key. What do you think about this? > >>>>>>> I would rather suggest that “what key encrypted this” be part of the > >>>>>>> sstable metadata, and allow there to be multiple keys in the system. > >>>>>>> This way you can just add a new “current key” so new sstables use the > >>>>>>> new key, but existing sstables would use the old key. An operator > >>>>>>> could then trigger a “nodetool upgradesstables —all” to rewrite the > >>>>>>> existing sstables with the new “current key”. > >>>>>>> > >>>>>>>> Regards > >>>>>>>> > >>>>>>>> On Sat, 13 Nov 2021 at 19:35,<sc...@paradoxica.net> wrote: > >>>>>>>>> Same reaction here - great to have traction on this ticket. > >>>>>>>>> Shylaja, thanks for your work on this and to Stefan as well! It > >>>>>>>>> would be wonderful to have the feature complete. > >>>>>>>>> > >>>>>>>>> One thing I’d mention is that a lot’s changed about the project’s > >>>>>>>>> testing strategy since the original patch was written. I see that > >>>>>>>>> the 2016 version adds a couple round-trip unit tests with a small > >>>>>>>>> amount of static data. It would be good to see randomized tests > >>>>>>>>> fleshed out that exercise more of the read/write path; or which add > >>>>>>>>> variants of existing read/write path tests that enable encryption. > >>>>>>>>> > >>>>>>>>> – Scott > >>>>>>>>> > >>>>>>>>>> On Nov 13, 2021, at 7:53 AM, Brandon Williams<dri...@gmail.com> > >>>>>>>>>> wrote: > >>>>>>>>>> > >>>>>>>>>> We already have a ticket and this predated CEPs, and being an > >>>>>>>>>> obviously good improvement to have that many have been asking for > >>>>>>>>>> for > >>>>>>>>>> some time now, I don't see the need for a CEP here. > >>>>>>>>>> > >>>>>>>>>> On Sat, Nov 13, 2021 at 5:01 AM Stefan Miklosovic > >>>>>>>>>> <stefan.mikloso...@instaclustr.com> wrote: > >>>>>>>>>>> Hi list, > >>>>>>>>>>> > >>>>>>>>>>> an engineer from Intel - Shylaja Kokoori (who is watching this > >>>>>>>>>>> list > >>>>>>>>>>> closely) has retrofitted the original code from CASSANDRA-9633 > >>>>>>>>>>> work in > >>>>>>>>>>> times of 3.4 to the current trunk with my help here and there, > >>>>>>>>>>> mostly > >>>>>>>>>>> cosmetic. > >>>>>>>>>>> > >>>>>>>>>>> I would like to know if there is a general consensus about me > >>>>>>>>>>> going to > >>>>>>>>>>> create a CEP for this feature or what is your perception on this. > >>>>>>>>>>> I > >>>>>>>>>>> know we have it a little bit backwards here as we should first > >>>>>>>>>>> discuss > >>>>>>>>>>> and then code but I am super glad that we have some POC we can > >>>>>>>>>>> elaborate further on and CEP would just cement and summarise the > >>>>>>>>>>> approach / other implementation aspects of this feature. > >>>>>>>>>>> > >>>>>>>>>>> I think that having 9633 merged will fill quite a big operational > >>>>>>>>>>> gap > >>>>>>>>>>> when it comes to security. There are a lot of enterprises who > >>>>>>>>>>> desire > >>>>>>>>>>> this feature so much. I can not remember when I last saw a ticket > >>>>>>>>>>> with > >>>>>>>>>>> 50 watchers which was inactive for such a long time. > >>>>>>>>>>> > >>>>>>>>>>> Regards > >>>>>>>>>>> > >>>>>>>>>>> --------------------------------------------------------------------- > >>>>>>>>>>> To unsubscribe, e-mail:dev-unsubscr...@cassandra.apache.org > >>>>>>>>>>> For additional commands, e-mail:dev-h...@cassandra.apache.org > >>>>>>>>>>> > >>>>>>>>>> --------------------------------------------------------------------- > >>>>>>>>>> To unsubscribe, e-mail:dev-unsubscr...@cassandra.apache.org > >>>>>>>>>> For additional commands, e-mail:dev-h...@cassandra.apache.org > >>>>>>>>>> > >>>>>>>>> --------------------------------------------------------------------- > >>>>>>>>> To unsubscribe, e-mail:dev-unsubscr...@cassandra.apache.org > >>>>>>>>> For additional commands, e-mail:dev-h...@cassandra.apache.org > >>>>>>>>> > >>>>>>>> --------------------------------------------------------------------- > >>>>>>>> To unsubscribe, e-mail:dev-unsubscr...@cassandra.apache.org > >>>>>>>> For additional commands, e-mail:dev-h...@cassandra.apache.org > >>>>>>>> > >>>>>>> --------------------------------------------------------------------- > >>>>>>> To unsubscribe, e-mail:dev-unsubscr...@cassandra.apache.org > >>>>>>> For additional commands, e-mail:dev-h...@cassandra.apache.org > >>>>>>> > >>>>> --------------------------------------------------------------------- > >>>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org > >>>>> For additional commands, e-mail: dev-h...@cassandra.apache.org > >>>>> > >>>> --------------------------------------------------------------------- > >>>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org > >>>> For additional commands, e-mail: dev-h...@cassandra.apache.org > >>>> > >>> --------------------------------------------------------------------- > >>> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org > >>> For additional commands, e-mail: dev-h...@cassandra.apache.org > >>> > >> --------------------------------------------------------------------- > >> To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org > >> For additional commands, e-mail: dev-h...@cassandra.apache.org > > --------------------------------------------------------------------- > > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org > > For additional commands, e-mail: dev-h...@cassandra.apache.org > > > > --------------------------------------------------------------------- > To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org > For additional commands, e-mail: dev-h...@cassandra.apache.org > --------------------------------------------------------------------- To unsubscribe, e-mail: dev-unsubscr...@cassandra.apache.org For additional commands, e-mail: dev-h...@cassandra.apache.org