@Ryanne
> Seems that could still get us per-topic keys (vs encrypting the entire volume), which would be my main requirement.
Agreed, I think that per-topic separation of keys would be very valuable for multi-tenancy. My 2 cents is that if encryption at rest is sufficient to satisfy GDPR + other similar data protection measures, then we should aim to do that first. The demand is real and privacy laws won't likely be loosening any time soon. That being said, I am not sufficiently familiar with the myriad of data laws. I will look into it some more though, as I am now curious.

On Sat, May 9, 2020 at 6:12 PM Maulin Vasavada <maulin.vasav...@gmail.com> wrote:

> Hi Sönke
>
> Thanks for bringing this up for discussion. There are a lot of considerations even if we assume we have end-to-end encryption done. For example, depending upon a company's setup there could be restrictions on how/which encryption keys are shared. An environment could have multiple security and network boundaries beyond which keys are not allowed to be shared. That will mean that consumers may not be able to decrypt the messages at all if the data is moved from one zone to another. If we have mirroring done, are mirror-makers supposed to decrypt and encrypt again, OR would they keep the pretty much bytes-in/bytes-out paradigm that they have today? Also, having a polyglot Kafka client base will force you to support encryption/decryption libraries that work for all the languages, and that may not work depending upon the scope of the team owning the Kafka infrastructure.
>
> Combining disk encryption with TLS+ACLs could be enough instead of having end-to-end message level encryption. What is your opinion on that?
>
> We have experimented with end-to-end encryption with custom serializers/deserializers, and I felt that was good enough because the other challenges I mentioned before may not be easy to address with a generic solution.
>
> Thanks
> Maulin
>
> On Sat, May 9, 2020 at 2:05 PM Ryanne Dolan <ryannedo...@gmail.com> wrote:
>
> > Adam, I agree, seems reasonable to limit the broker's responsibility to encrypting only data at rest. I guess whole segment files could be encrypted with the same key, and rotating keys would just involve re-encrypting entire segments. Maybe a key rotation would involve closing all affected segments and kicking off a background task to re-encrypt them. Certainly that would not impede ingestion of new records, and it seems consumers could use the old segments until they are replaced with the newly encrypted ones.
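> >
> > To make that concrete, a rotation pass over one closed segment might look roughly like the sketch below (plain JCE, all names made up, not actual broker code; assumes a 12-byte IV stored at the head of each encrypted segment file):
> >
> > import javax.crypto.Cipher;
> > import javax.crypto.spec.GCMParameterSpec;
> > import javax.crypto.spec.SecretKeySpec;
> > import java.nio.file.Files;
> > import java.nio.file.Path;
> > import java.nio.file.StandardCopyOption;
> > import java.security.SecureRandom;
> >
> > class SegmentReEncryptor {
> >     private static final int IV_LEN = 12; // assumed layout: 12-byte IV, then ciphertext
> >
> >     void reEncrypt(Path segment, SecretKeySpec oldKey, SecretKeySpec newKey) throws Exception {
> >         byte[] file = Files.readAllBytes(segment);
> >
> >         // Decrypt the closed segment with the old key.
> >         Cipher dec = Cipher.getInstance("AES/GCM/NoPadding");
> >         dec.init(Cipher.DECRYPT_MODE, oldKey, new GCMParameterSpec(128, file, 0, IV_LEN));
> >         byte[] plain = dec.doFinal(file, IV_LEN, file.length - IV_LEN);
> >
> >         // Re-encrypt with the new key under a fresh IV.
> >         byte[] iv = new byte[IV_LEN];
> >         new SecureRandom().nextBytes(iv);
> >         Cipher enc = Cipher.getInstance("AES/GCM/NoPadding");
> >         enc.init(Cipher.ENCRYPT_MODE, newKey, new GCMParameterSpec(128, iv));
> >         byte[] reEncrypted = enc.doFinal(plain);
> >
> >         // Write to a sibling file and swap atomically, so consumers keep
> >         // reading the old segment until the new one is complete.
> >         Path tmp = segment.resolveSibling(segment.getFileName() + ".reencrypted");
> >         byte[] out = new byte[IV_LEN + reEncrypted.length];
> >         System.arraycopy(iv, 0, out, 0, IV_LEN);
> >         System.arraycopy(reEncrypted, 0, out, IV_LEN, reEncrypted.length);
> >         Files.write(tmp, out);
> >         Files.move(tmp, segment, StandardCopyOption.ATOMIC_MOVE);
> >     }
> > }
> >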
> > Seems that could still get us per-topic keys (vs encrypting the entire volume), which would be my main requirement.
> >
> > Not really "end-to-end", but combined with TLS or something, seems reasonable.
> >
> > Ryanne
> >
> > On Sat, May 9, 2020, 11:00 AM Adam Bellemare <adam.bellem...@gmail.com> wrote:
> >
> > > Hi All
> > >
> > > I typed up a number of replies which I have below, but I have one major overriding question: Is there a reason we aren't implementing encryption-at-rest almost exactly the same way that most relational databases do? ie: https://wiki.postgresql.org/wiki/Transparent_Data_Encryption
> > >
> > > I ask this because it seems like we're going to end up with something similar to what they did in terms of requirements, plus...
> > >
> > > "For the *past 16 months*, there has been discussion about whether and how to implement Transparent Data Encryption (tde) in Postgres. Many other relational databases support tde, and *some security standards require* it. However, it is also debatable how much security value tde provides. The tde *400-email thread* became difficult for people to follow..."
> > >
> > > What still isn't clear to me is the scope that we're trying to cover here. Encryption at rest suggests that we need to have the data encrypted on the brokers, and *only* on the brokers, since they're the durable units of storage. Any encryption over the wire should be covered by TLS. I think that our goals for this should be (from https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#Threat_models):
> > >
> > > > TDE protects data from theft when file system access controls are compromised:
> > > >
> > > > - Malicious user steals storage devices and reads database files directly.
> > > > - Malicious backup operator takes backup.
> > > > - Protecting data at rest (persistent data)
> > > >
> > > > This does not protect from users who can read system memory, e.g., shared buffers, which root users can do.
> > >
> > > I am not a security expert, nor am I an expert on relational databases. However, I can't identify any reason why the approach outlined by PostgreSQL, which is very similar to MySQL/InnoDB and IBM (from my understanding), wouldn't work for data-at-rest encryption. In addition, we'd get the added benefit of being consistent with other solutions, which is an easier sell when discussing security with management (Kafka? Oh yeah, their encryption solution is just like the one we already have in place for our Postgres solutions), and may let us avoid reinventing a good part of the wheel.
> > >
> > > ------------------
> > >
> > > @Ryanne
> > > One more complicating factor, regarding joins - the foreign key joiner requires access to the value to extract the foreign key - if it's encrypted, the FKJ would need to decrypt it to apply the value extractor.
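> > >
> > > Just to illustrate the point with the Streams FK-join API (illustration only - the topic names and helper methods are made up):
> > >
> > > import org.apache.kafka.streams.StreamsBuilder;
> > > import org.apache.kafka.streams.kstream.KTable;
> > >
> > > class FkJoinIllustration {
> > >     KTable<String, String> build(StreamsBuilder builder) {
> > >         KTable<String, byte[]> orders = builder.table("orders-encrypted");
> > >         KTable<String, byte[]> customers = builder.table("customers-encrypted");
> > >         return orders.join(
> > >             customers,
> > >             order -> customerId(decrypt(order)),              // the FK extractor must decrypt first
> > >             (order, customer) -> new String(decrypt(customer)));
> > >     }
> > >
> > >     private byte[] decrypt(byte[] ciphertext) { return ciphertext; }   // stand-in for the real cipher call
> > >     private String customerId(byte[] order) { return new String(order); } // stand-in for real FK extraction
> > > }
> > >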
> > > @Soenke re (1)
> > > > When people hear that this is not part of Apache Kafka itself, but that they would need to develop something themselves, that more often than not is the end of that discussion. Using something that is not "stock" is quite often simply not an option.
> > > >
> > > > I strongly feel that this is a needed feature in Kafka and that there is a large number of people out there that would want to use it - but I may very well be mistaken, responses to this thread have not exactly been plentiful this last year and a half..
> > >
> > > I agree with you on the default vs. non-default points made. We must all note that this mailing list is *not* representative of the typical users of Kafka, and that many organizations are predominantly looking to use out-of-the-box solutions. This will only become more common as hosted Kafka solutions (think AWS hosted Kafka) gain more traction. I think this KIP's goal of providing that out-of-the-box experience is extremely important, especially for all the reasons noted so far (GDPR, privacy, financials, interest by many parties but no default solution).
> > >
> > > re: (4)
> > > > Regarding plaintext data in RocksDB instances, I am a bit torn to be honest. On the one hand, I feel like this scenario is not something that we can fully control.
> > >
> > > I agree with this in principle. I think that our responsibility to encrypt data at rest ends the moment that data leaves the broker. That being said, it isn't unreasonable. I am going to think more about this and see if I can come up with something.
> > >
> > > On Fri, May 8, 2020 at 5:05 AM Sönke Liebau <soenke.lie...@opencore.com.invalid> wrote:
> > >
> > > > Hey everybody,
> > > >
> > > > thanks a lot for reading and giving feedback!! I'll try and answer all points that I found going through the thread in this mail, but if I miss something please feel free to let me know! I've added a running number to the discussed topics for ease of reference down the road.
> > > >
> > > > I'll go through the KIP and update it with everything that I have written below after sending this mail.
> > > >
> > > > @Tom:
> > > > (1) If I understand your concerns correctly, you feel that this functionality would have a hard time getting approved into Apache Kafka because it can be achieved with custom Serializers in the same way, and that we should maybe develop this outside of Apache Kafka at first.
> > > > I feel like it is precisely the fact that this is not part of core Apache Kafka that makes people think twice about doing end-to-end encryption. I may be working in a market (Germany) that is a bit special when compared to the rest of the world where encryption and things like that are concerned, but I've personally sat in multiple meetings where this feature was discussed. It is not necessarily the end-to-end encryption itself, but the at-rest encryption that you get with it.
> > > > When people hear that this is not part of Apache Kafka itself, but that they would need to develop something themselves, that more often than not is the end of that discussion. Using something that is not "stock" is quite often simply not an option.
> > > > Even if they decide to go forward with it, they'll find Hendrik's blog post from 4 years ago on this, probably the whitepapers from Confluent and Lenses, and maybe a few implementations on GitHub - all of which just serve to further muddy the waters. Not because any of these resources are bad or wrong, but just because information and implementations are spread out over a lot of different places. Developing this outside of Apache Kafka would simply add one more item to this list that would not really matter, I'm afraid.
> > > >
> > > > I strongly feel that this is a needed feature in Kafka and that there is a large number of people out there that would want to use it - but I may very well be mistaken, responses to this thread have not exactly been plentiful this last year and a half..
> > > >
> > > > @Mike:
> > > > (2) Regarding the encryption of headers, my current idea is to keep this configurable. I have seen customers use headers for stuff like account numbers, which under the GDPR are considered to be personal data that should be encrypted wherever possible. So in some instances it might be useful to encrypt header fields as well.
> > > > My current PoC implementation allows specifying a regex for headers that should be encrypted, which would allow having encrypted and unencrypted headers in the same record, to hopefully suit most use cases.
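> > > >
> > > > Roughly along these lines (sketch only, the class name is made up and the actual cipher is just a callback here):
> > > >
> > > > import org.apache.kafka.common.header.Header;
> > > > import org.apache.kafka.common.header.Headers;
> > > > import java.util.ArrayList;
> > > > import java.util.List;
> > > > import java.util.function.UnaryOperator;
> > > > import java.util.regex.Pattern;
> > > >
> > > > class HeaderEncryptor {
> > > >     private final Pattern encryptedHeaders; // e.g. compiled from a config like "encrypted.header.pattern"
> > > >
> > > >     HeaderEncryptor(String pattern) {
> > > >         this.encryptedHeaders = Pattern.compile(pattern);
> > > >     }
> > > >
> > > >     // Encrypt only the header values whose name matches the pattern,
> > > >     // leaving all other headers untouched.
> > > >     void encryptMatching(Headers headers, UnaryOperator<byte[]> cipher) {
> > > >         List<Header> matching = new ArrayList<>();
> > > >         for (Header h : headers) {
> > > >             if (encryptedHeaders.matcher(h.key()).matches()) {
> > > >                 matching.add(h);
> > > >             }
> > > >         }
> > > >         for (Header h : matching) {
> > > >             headers.remove(h.key());                       // drop the plaintext value
> > > >             headers.add(h.key(), cipher.apply(h.value())); // re-add it encrypted
> > > >         }
> > > >     }
> > > > }
> > > >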
> > > > (3) Also, my plan is to not change the message format, but to "encrypt in place" and add a header field with the necessary information for decryption, which would then be removed by the decrypting consumer. There may be some out-of-date intentions still in the KIP, I'll go through it and update.
> > > >
> > > > @Ryanne:
> > > > First off, I fully agree that we should avoid painting ourselves into a corner with an early client-only implementation. I scaled down this KIP from earlier attempts that included things like key rollover and broker-side implementations, because I could not get any feedback from the community on those for a long time and felt that maybe there was no appetite for the full-blown solution. So I decided to try with a more limited scope. I am very happy to discuss/go for the fully featured version again :)
> > > >
> > > > (4) Regarding plaintext data in RocksDB instances, I am a bit torn to be honest. On the one hand, I feel like this scenario is not something that we can fully control. Kafka Streams in this case is a client that takes data from Kafka, decrypts it and then puts it somewhere in plaintext. To me this scenario differs only slightly from, for example, someone writing a backup job that reads a topic and writes it to a textfile - not much we can do about it.
> > > > That being said, Kafka Streams is part of Apache Kafka, so it does merit special consideration. I'll have to dig into how StateStores are used a bit (I am not the world's largest expert - or any kind of expert on that) to try and come up with an idea.
> > > >
> > > > (5) On key encryption and hashing, this is definitely an issue that we need a solution for. I currently have key encryption configurable in my implementation. When encryption is enabled, an option would of course be to hash the original key and store the key data together with the value in an encrypted form. Any salt added to the key before hashing could be encrypted along with the data. This would allow all key-based functionality like compaction, joins etc. to keep working without having to know the cleartext key.
> > > > I've also considered deterministic encryption, which would keep the encrypted key the same, but I am fairly certain that we will want to allow regular key rotation (more on this in the next paragraph) without re-encrypting older data, and that would then change the encrypted key and break all these things.
> > > > Regarding re-encrypting existing keys when a crypto key is compromised, I think we need to be very careful with this if we do it in-place on the broker. If we add functionality along the lines of compaction, which reads, re-encrypts and rewrites segment files, we have to make sure that producers choose partitions based on the cleartext key, otherwise all records starting from the key change may go to a different partition of the topic..
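> > > >
> > > > On the producer side that could look roughly like this (again just a sketch, the two helpers are placeholders): pick the partition from the cleartext key with the same murmur2 formula the DefaultPartitioner uses, and only then swap in the hashed key:
> > > >
> > > > import org.apache.kafka.clients.producer.ProducerRecord;
> > > > import org.apache.kafka.common.utils.Utils;
> > > >
> > > > class CleartextKeyPartitioning {
> > > >     // Fix the partition based on the cleartext key, so a later key
> > > >     // rotation (which changes the stored key bytes) cannot move new
> > > >     // records for the same cleartext key to another partition.
> > > >     ProducerRecord<byte[], byte[]> encryptedRecord(String topic, byte[] key, byte[] value, int numPartitions) {
> > > >         int partition = Utils.toPositive(Utils.murmur2(key)) % numPartitions;
> > > >         return new ProducerRecord<>(topic, partition, hashKey(key), encryptValue(key, value));
> > > >     }
> > > >
> > > >     private byte[] hashKey(byte[] key) { return key; }                     // placeholder for the salted hash
> > > >     private byte[] encryptValue(byte[] key, byte[] value) { return value; } // placeholder for the cipher call
> > > > }
> > > >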
> > > > (6) Key rollover would be a cool feature to have. Up until now I was only thinking about supporting regular key rollover functionality that would change keys for all records going forward, tbh - mostly for complexity reasons - and I think there was actually a sentence in the original KIP to that effect. But if you and others feel this is needed, then I am happy to discuss it.
> > > > If we implement this on the broker, we could use topic compaction for inspiration: read all segment files and check records one by one; if the key used for a record has been "retired/compromised/...", re-encrypt with the new key and write a new segment file. Lots of things to consider around this regarding performance, how to trigger it etc., but in principle this could work, I think.
> > > > One issue I can see with this: if we use envelope encryption for the keys to address the rogue admin issue, so that the broker doesn't have access to the actual key encrypting the data, this would make that operation impossible.
> > > >
> > > > I hope I got to all items that were raised, but I may very well have overlooked something, please let me know if I did - and of course your thoughts on what I wrote!
> > > >
> > > > I'll update the KIP today as well.
> > > >
> > > > Best regards,
> > > > Sönke
> > > >
> > > > On Thu, 7 May 2020 at 19:54, Ryanne Dolan <ryannedo...@gmail.com> wrote:
> > > >
> > > > > Tom, good point, I've done exactly that -- hashing record keys -- but it's unclear to me what should happen when the hash key must be rotated. In my case the (external) solution involved rainbow tables, versioned keys, and custom materializers that were aware of older keys for each record.
> > > > >
> > > > > In particular I had a pipeline that would re-key records and re-ingest them, while opportunistically overwriting records materialized with the old key.
> > > > >
> > > > > For a native solution I think maybe we'd need to carry around any old versions of each record key, perhaps as metadata. Then brokers and materializers can compact records based on _any_ overlapping key, maybe? Not sure.
> > > > >
> > > > > Ryanne
> > > > >
> > > > > On Thu, May 7, 2020, 12:05 PM Tom Bentley <tbent...@redhat.com> wrote:
> > > > >
> > > > > > Hi Ryanne,
> > > > > >
> > > > > > You raise some good points there.
> > > > > >
> > > > > > > Similarly, if the whole record is encrypted, it becomes impossible to do joins, group bys etc, which just need the record key and maybe don't have access to the encryption key. Maybe only record _values_ should be encrypted, and maybe Kafka Streams could defer decryption until the actual value is inspected. That way joins etc are possible without the encryption key, and RocksDB would not need to decrypt values before materializing to disk.
> > > > > >
> > > > > > It's getting a bit late here, so maybe I overlooked something, but wouldn't the natural thing to do be to make the "encrypted" key a hash of the original key, and let the encrypted value be the cipher text of the (original key, original value) pair? A scheme like this would preserve equality of the key (strictly speaking there's a chance of collision of course). I guess this could also be a solution for the compacted topic issue Sönke mentioned.
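> > > > > >
> > > > > > In other words, something roughly like this (quick sketch, names made up):
> > > > > >
> > > > > > import javax.crypto.Cipher;
> > > > > > import java.nio.ByteBuffer;
> > > > > > import java.security.MessageDigest;
> > > > > >
> > > > > > class KeyPreservingEncryptor {
> > > > > >     // Encrypted key = SHA-256 of the original key, so key equality
> > > > > >     // (and with it compaction, joins, partitioning) is preserved.
> > > > > >     byte[] encryptedKey(byte[] originalKey) throws Exception {
> > > > > >         return MessageDigest.getInstance("SHA-256").digest(originalKey);
> > > > > >     }
> > > > > >
> > > > > >     // Encrypted value = ciphertext of the (original key, original value)
> > > > > >     // pair, so a consumer holding the key material can recover both.
> > > > > >     byte[] encryptedValue(byte[] key, byte[] value, Cipher cipher) throws Exception {
> > > > > >         ByteBuffer pair = ByteBuffer.allocate(4 + key.length + value.length);
> > > > > >         pair.putInt(key.length).put(key).put(value);
> > > > > >         return cipher.doFinal(pair.array());
> > > > > >     }
> > > > > > }
> > > > > >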
> > > > > > Cheers,
> > > > > >
> > > > > > Tom
> > > > > >
> > > > > > On Thu, May 7, 2020 at 5:17 PM Ryanne Dolan <ryannedo...@gmail.com> wrote:
> > > > > >
> > > > > > > Thanks Sönke, this is an area in which Kafka is really, really far behind.
> > > > > > >
> > > > > > > I've built secure systems around Kafka as laid out in the KIP. One issue that is not addressed in the KIP is re-encryption of records after a key rotation. When a key is compromised, it's important that any data encrypted using that key is immediately destroyed or re-encrypted with a new key. Ideally, first-class support for end-to-end encryption in Kafka would make this possible natively, or else I'm not sure what the point would be. It seems to me that the brokers would need to be involved in this process, so perhaps a client-first approach will be painting ourselves into a corner. Not sure.
> > > > > > >
> > > > > > > Another issue is whether materialized tables, e.g. in Kafka Streams, would see unencrypted or encrypted records. If we implemented the KIP as written, it would still result in a bunch of plaintext data in RocksDB everywhere. Again, I'm not sure what the point would be. Perhaps using custom serdes would actually be a more holistic approach, since Kafka Streams etc could leverage these as well.
> > > > > > >
> > > > > > > Similarly, if the whole record is encrypted, it becomes impossible to do joins, group bys etc, which just need the record key and maybe don't have access to the encryption key. Maybe only record _values_ should be encrypted, and maybe Kafka Streams could defer decryption until the actual value is inspected. That way joins etc are possible without the encryption key, and RocksDB would not need to decrypt values before materializing to disk.
> > > > > > >
> > > > > > > This is why I've implemented encryption on a per-field basis, not at the record level, when addressing Kafka security in the past. And I've had to build external pipelines that purge, re-encrypt, and re-ingest records when keys are compromised.
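> > > > > > >
> > > > > > > Per-field meaning roughly the following (sketch only, names made up) - encrypt just the sensitive fields of a value and leave the rest in the clear, so joins and group-bys on the other fields keep working:
> > > > > > >
> > > > > > > import java.util.Map;
> > > > > > > import java.util.Set;
> > > > > > > import java.util.function.UnaryOperator;
> > > > > > >
> > > > > > > class FieldEncryptor {
> > > > > > >     private final Set<String> sensitiveFields; // e.g. {"ssn", "accountNumber"}
> > > > > > >     private final UnaryOperator<byte[]> cipher;
> > > > > > >
> > > > > > >     FieldEncryptor(Set<String> sensitiveFields, UnaryOperator<byte[]> cipher) {
> > > > > > >         this.sensitiveFields = sensitiveFields;
> > > > > > >         this.cipher = cipher;
> > > > > > >     }
> > > > > > >
> > > > > > >     // Assumes the value is already deserialized into field name -> bytes.
> > > > > > >     void encryptInPlace(Map<String, byte[]> fields) {
> > > > > > >         fields.replaceAll((name, bytes) -> sensitiveFields.contains(name) ? cipher.apply(bytes) : bytes);
> > > > > > >     }
> > > > > > > }
> > > > > > >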
> > > > > > > This KIP might be a step in the right direction, not sure. But I'm hesitant to support the idea of end-to-end encryption without a plan to address the myriad other problems.
> > > > > > >
> > > > > > > That said, we need this badly and I hope something shakes out.
> > > > > > >
> > > > > > > Ryanne
> > > > > > >
> > > > > > > On Tue, Apr 28, 2020, 6:26 PM Sönke Liebau <soenke.lie...@opencore.com.invalid> wrote:
> > > > > > >
> > > > > > > > All,
> > > > > > > >
> > > > > > > > I've asked for comments on this KIP in the past, but since I didn't really get any feedback, I've decided to reduce the initial scope of the KIP a bit and try again.
> > > > > > > >
> > > > > > > > I have reworked the KIP to provide a limited but useful set of features for this initial KIP, and laid out a very rough roadmap of what I'd envision this looking like in a final version.
> > > > > > > >
> > > > > > > > I am aware that the KIP is currently light on implementation details, but I would like to get some feedback on the general approach before fully speccing everything.
> > > > > > > >
> > > > > > > > The KIP can be found at
> > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-317%3A+Add+end-to-end+data+encryption+functionality+to+Apache+Kafka
> > > > > > > >
> > > > > > > > I would very much appreciate any feedback!
> > > > > > > >
> > > > > > > > Best regards,
> > > > > > > > Sönke
> > > >
> > > > --
> > > > Sönke Liebau
> > > > Partner
> > > > Tel. +49 179 7940878
> > > > OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany