Hi all,

I typed up a number of replies, which I have below, but I have one major overriding question: is there a reason we aren't implementing encryption-at-rest almost exactly the same way that most relational databases do? I.e.: https://wiki.postgresql.org/wiki/Transparent_Data_Encryption
I ask this because it seems like we're going to end up with something similar to what they did in terms of requirements, plus...

"For the *past 16 months*, there has been discussion about whether and how to implement Transparent Data Encryption (tde) in Postgres. Many other relational databases support tde, and *some security standards require* it. However, it is also debatable how much security value tde provides. The tde *400-email thread* became difficult for people to follow..."

What still isn't clear to me is the scope that we're trying to cover here. Encryption at rest suggests that we need to have the data encrypted on the brokers, and *only* on the brokers, since they're the durable units of storage. Any encryption over the wire should be covered by TLS. I think that our goals for this should be (from https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#Threat_models):

> TDE protects data from theft when file system access controls are compromised:
>
> - Malicious user steals storage devices and reads database files directly.
> - Malicious backup operator takes backup.
> - Protecting data at rest (persistent data)
>
> This does not protect from users who can read system memory, e.g., shared buffers, which root users can do.

I am not a security expert, nor am I an expert on relational databases. However, I can't identify any reason why the approach outlined by PostgreSQL, which is very similar to that of MySQL/InnoDB and IBM (from my understanding), wouldn't work for data-at-rest encryption. In addition, we'd get the added benefit of being consistent with other solutions, which is an easier sell when discussing security with management ("Kafka? Oh yeah, their encryption solution is just like the one we already have in place for our Postgres solutions"), and may let us avoid reinventing a good part of the wheel.
------------------

@Ryanne
One more complicating factor, regarding joins: the foreign key joiner (FKJ) requires access to the value to extract the foreign key. If the value is encrypted, the FKJ would need to decrypt it to apply the value extractor.

@Sönke
re: (1)
> When people hear that this is not part of Apache Kafka itself, but that would need to develop something themselves that more often than not is the end of that discussion. Using something that is not "stock" is quite often simply not an option.
> I strongly feel that this is a needed feature in Kafka and that there is a large number of people out there that would want to use it - but I may very well be mistaken, responses to this thread have not exactly been plentiful this last year and a half..

I agree with you on the default vs. non-default points made. We must all note that this mailing list is *not* representative of the typical users of Kafka, and that many organizations are predominantly looking to use out-of-the-box solutions. This will only become more common as hosted Kafka solutions (think AWS hosted Kafka) gain more traction. I think the goal of this KIP to provide that out-of-the-box experience is extremely important, especially for all the reasons noted so far (GDPR, privacy, financials, interest by many parties but no default solution).

re: (4)
>> Regarding plaintext data in RocksDB instances, I am a bit torn to be honest. On the one hand, I feel like this scenario is not something that we can fully control.

I agree with this in principle. I think that our responsibility to encrypt data at rest ends the moment that data leaves the broker. That being said, the concern isn't unreasonable. I am going to think more about this and see if I can come up with something.

On Fri, May 8, 2020 at 5:05 AM Sönke Liebau <soenke.lie...@opencore.com.invalid> wrote:

> Hey everybody,
>
> thanks a lot for reading and giving feedback!!
> I'll try and answer all points that I found going through the thread in this mail, but if I miss something please feel free to let me know! I've added a running number to the discussed topics for ease of reference down the road.
>
> I'll go through the KIP and update it with everything that I have written below after sending this mail.
>
> @Tom:
> (1) If I understand your concerns correctly you feel that this functionality would have a hard time getting approved into Apache Kafka because it can be achieved with custom Serializers in the same way and that we should maybe develop this outside of Apache Kafka at first.
> I feel like it is precisely the fact that this is not part of core Apache Kafka that makes people think twice about doing end-to-end encryption. I may be working in a market (Germany) that is a bit special when compared to the rest of the world where encryption and things like that are concerned, but I've personally sat in multiple meetings where this feature was discussed. It is not necessarily the end-to-end encryption itself, but the at-rest encryption that you get with it.
> When people hear that this is not part of Apache Kafka itself, but that would need to develop something themselves that more often than not is the end of that discussion. Using something that is not "stock" is quite often simply not an option.
> Even if they decide to go forward with it, they'll find Hendrik's blog post from 4 years ago on this, probably the Whitepapers from Confluent and Lenses and maybe a few implementations on github - all of which just serve to further muddy the waters. Not because any of these resources are bad or wrong, but just because information and implementations are spread out over a lot of different places. Developing this outside of Apache Kafka would simply serve to add one more item to this list that would not really matter I'm afraid.
>
> I strongly feel that this is a needed feature in Kafka and that there is a large number of people out there that would want to use it - but I may very well be mistaken, responses to this thread have not exactly been plentiful this last year and a half..
>
> @Mike:
> (2) Regarding the encryption of headers, my current idea is to keep this configurable. I have seen customers use headers for stuff like account numbers which under the GDPR are considered to be personal data that should be encrypted wherever possible. So in some instances it might be useful to encrypt header fields as well.
> My current PoC implementation allows specifying a Regex for headers that should be encrypted, which would allow having encrypted and unencrypted headers in the same record to hopefully suit most use cases.
>
> (3) Also, my plan is to not change the message format, but to "encrypt-in-place" and add a header field with the necessary information for decryption, which would then be removed by the decrypting consumer.
> There may be some out-of-date intentions still in the KIP, I'll go through it and update.
>
> @Ryanne:
> First off, I fully agree that we should avoid painting ourselves into a corner with an early client-only implementation. I scaled down this Kip from earlier attempts that included things like key rollover and broker-side implementations because I could not get any feedback from the community on those for a long time and felt that maybe there was no appetite for the full-blown solution. So I decided to try with a more limited scope. I am very happy to discuss/go for the fully featured version again :)
>
> (4) Regarding plaintext data in RocksDB instances, I am a bit torn to be honest. On the one hand, I feel like this scenario is not something that we can fully control. Kafka Streams in this case is a client that takes data from Kafka, decrypts it and then puts it somewhere in plaintext.
> To me this scenario differs only slightly from for example someone writing a backup job that reads a topic and writes it to a textfile - not much we can do about it.
> That being said, Kafka Streams is part of Apache Kafka, so does merit special consideration. I'll have to dig into how StateStores are used a bit (I am not the worlds largest expert - or any kind of expert on that) to try and come up with an idea.
>
>
> (5) On key encryption and hashing, this is definitely an issue that we need a solution for. I currently have key encryption configurable in my implementation. When encryption is enabled, an option would of course be to hash the original key and store the key data together with the value in an encrypted form. Any salt added to the key before hashing could be encrypted along with the data. This would allow all key-based functionality like compaction, joins etc. to keep working without having to know the cleartext key.
>
> I've also considered deterministic encryption which would keep the encrypted key the same, but I am fairly certain that we will want to allow regular key rotation (more on this in next paragraph) without re-encrypting older data and that would then change the encrypted key and break all these things.
> Regarding re-encrypting existing keys when a crypto key is compromised, I think we need to be very careful with this if we do it in-place on the broker. If we add functionality along the lines of compaction, which reads re-encrypts and rewrites segment files we have to make sure that producers chose partitions on the cleartext value, otherwise all records starting from the key change may go to a different partition of the topic..
>
> (6) Key rollover would be a cool feature to have.
> I was up until now only thinking about supporting regular key rollover functionality that would change keys for all records going forward tbh - mostly for complexity reasons - I think there was actually a sentence in the original KIP to this regard. But if you and others feel this is needed then I am happy to discuss this.
> If we implement this on the broker we could use topic compaction for inspiration, read all segment files and check records one by one, if the key used for that record has been "retired/compromised/..." re-encrypt with new key and write a new segment file. Lots of things to consider around this regarding performance, how to trigger etc. but in principle this could work I think.
> One issue I can see with this is if we use envelope encryption for the keys to address the rogue admin issue, so the broker doesn't have access to the actual key encrypting the data, this would make that operation impossible.
>
>
>
> I hope I got to all items that were raised, but may very well have overlooked something, please let me know if I did - and of course your thoughts on what I wrote!
>
> I'll update the KIP today as well.
>
> Best regards,
> Sönke
>
>
> On Thu, 7 May 2020 at 19:54, Ryanne Dolan <ryannedo...@gmail.com> wrote:
>
> > Tom, good point, I've done exactly that -- hashing record keys -- but it's unclear to me what should happen when the hash key must be rotated. In my case the (external) solution involved rainbow tables, versioned keys, and custom materializers that were aware of older keys for each record.
> >
> > In particular I had a pipeline that would re-key records and re-ingest them, while opportunistically overwriting records materialized with the old key.
> >
> > For a native solution I think maybe we'd need to carry around any old versions of each record key, perhaps as metadata. Then brokers and materializers can compact records based on _any_ overlapping key, maybe?
> > Not sure.
> >
> > Ryanne
> >
> > On Thu, May 7, 2020, 12:05 PM Tom Bentley <tbent...@redhat.com> wrote:
> >
> > > Hi Rayanne,
> > >
> > > You raise some good points there.
> > >
> > > > Similarly, if the whole record is encrypted, it becomes impossible to do joins, group bys etc, which just need the record key and maybe don't have access to the encryption key. Maybe only record _values_ should be encrypted, and maybe Kafka Streams could defer decryption until the actual value is inspected. That way joins etc are possible without the encryption key, and RocksDB would not need to decrypt values before materializing to disk.
> > >
> > > It's getting a bit late here, so maybe I overlooked something, but wouldn't the natural thing to do be to make the "encrypted" key a hash of the original key, and let the value of the encrypted value be the cipher text of the (original key, original value) pair. A scheme like this would preserve equality of the key (strictly speaking there's a chance of collision of course). I guess this could also be a solution for the compacted topic issue Sönke mentioned.
> > >
> > > Cheers,
> > >
> > > Tom
> > >
> > >
> > > On Thu, May 7, 2020 at 5:17 PM Ryanne Dolan <ryannedo...@gmail.com> wrote:
> > >
> > > > Thanks Sönke, this is an area in which Kafka is really, really far behind.
> > > >
> > > > I've built secure systems around Kafka as laid out in the KIP. One issue that is not addressed in the KIP is re-encryption of records after a key rotation. When a key is compromised, it's important that any data encrypted using that key is immediately destroyed or re-encrypted with a new key. Ideally first-class support for end-to-end encryption in Kafka would make this possible natively, or else I'm not sure what the point would be.
> > > > It seems to me that the brokers would need to be involved in this process, so perhaps a client-first approach will be painting ourselves into a corner. Not sure.
> > > >
> > > > Another issue is whether materialized tables, e.g. in Kafka Streams, would see unencrypted or encrypted records. If we implemented the KIP as written, it would still result in a bunch of plain text data in RocksDB everywhere. Again, I'm not sure what the point would be. Perhaps using custom serdes would actually be a more holistic approach, since Kafka Streams etc could leverage these as well.
> > > >
> > > > Similarly, if the whole record is encrypted, it becomes impossible to do joins, group bys etc, which just need the record key and maybe don't have access to the encryption key. Maybe only record _values_ should be encrypted, and maybe Kafka Streams could defer decryption until the actual value is inspected. That way joins etc are possible without the encryption key, and RocksDB would not need to decrypt values before materializing to disk.
> > > >
> > > > This is why I've implemented encryption on a per-field basis, not at the record level, when addressing kafka security in the past. And I've had to build external pipelines that purge, re-encrypt, and re-ingest records when keys are compromised.
> > > >
> > > > This KIP might be a step in the right direction, not sure. But I'm hesitant to support the idea of end-to-end encryption without a plan to address the myriad other problems.
> > > >
> > > > That said, we need this badly and I hope something shakes out.
> > > >
> > > > Ryanne
> > > >
> > > > On Tue, Apr 28, 2020, 6:26 PM Sönke Liebau <soenke.lie...@opencore.com.invalid> wrote:
> > > >
> > > > > All,
> > > > >
> > > > > I've asked for comments on this KIP in the past, but since I didn't really get any feedback I've decided to reduce the initial scope of the KIP a bit and try again.
> > > > >
> > > > > I have reworked to KIP to provide a limited, but useful set of features for this initial KIP and laid out a very rough roadmap of what I'd envision this looking like in a final version.
> > > > >
> > > > > I am aware that the KIP is currently light on implementation details, but would like to get some feedback on the general approach before fully speccing everything.
> > > > >
> > > > > The KIP can be found at
> > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-317%3A+Add+end-to-end+data+encryption+functionality+to+Apache+Kafka
> > > > >
> > > > > I would very much appreciate any feedback!
> > > > >
> > > > > Best regards,
> > > > > Sönke
>
> --
> Sönke Liebau
> Partner
> Tel. +49 179 7940878
> OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany
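
------------------

To make the hashed-key idea from (5) and from Tom's mail a bit more concrete, here is a minimal Python sketch. It is only an illustration under several assumptions that do not come from the thread: the helper names `hashed_key` and `compact` are hypothetical, HMAC-SHA256 with one shared secret stands in for "a hash of the original key", and the values are placeholder byte strings standing in for the ciphertext of the (original key, original value) pair. The encryption step itself is left out. The sketch shows that equal cleartext keys map to equal hashed keys, so a compaction-style pass still works, and that changing the hashing secret produces a different hashed key, which is the rotation problem Ryanne describes.

```python
import hashlib
import hmac


def hashed_key(secret: bytes, record_key: bytes) -> bytes:
    """Keyed hash (HMAC-SHA256) of the cleartext record key (hypothetical helper)."""
    return hmac.new(secret, record_key, hashlib.sha256).digest()


def compact(records):
    """Keep only the latest value per key, similar in spirit to log compaction."""
    latest = {}
    for key, value in records:
        latest[key] = value  # later records overwrite earlier ones
    return latest


# Hypothetical secrets; a real deployment would fetch these from a key store.
secret_v1 = b"hypothetical-hash-secret-v1"
secret_v2 = b"hypothetical-hash-secret-v2"

# Values are placeholders for the ciphertext of (original key, original value).
log = [
    (hashed_key(secret_v1, b"account-42"), b"ciphertext-1"),
    (hashed_key(secret_v1, b"account-7"), b"ciphertext-2"),
    (hashed_key(secret_v1, b"account-42"), b"ciphertext-3"),
]

compacted = compact(log)

# Same secret: the lookup key matches, so compaction and joins on the key work.
same_secret_matches = hashed_key(secret_v1, b"account-42") in compacted

# Rotated secret: the same cleartext key no longer matches older records.
rotated_secret_matches = hashed_key(secret_v2, b"account-42") in compacted
```

Note that a per-record random salt, unlike the shared secret assumed here, would make equal cleartext keys hash to different values, so key equality would be lost.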