Re: [DISCUSS] KIP-317 - Add end-to-end data encryption functionality to Apache Kafka
Maybe worth taking a look at TDE in HDFS:
https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/TransparentEncryption.html

A complete solution requires several Hadoop services. I suspect that would scare the Kafka community a bit, but maybe it's unreasonable to expect Kafka brokers to do all we've mentioned. Of particular note, seems TDE uses multiple layers of keys to avoid re-encrypting data when keys are rotated, iiuc.

Ryanne

On Sat, May 16, 2020, 9:04 AM Adam Bellemare wrote:
Re: [DISCUSS] KIP-317 - Add end-to-end data encryption functionality to Apache Kafka
Hi Sönke

I've been giving it more thought over the last few days, and looking into other systems as well, and I think that I've derailed your proposal a bit by suggesting that at-rest encryption may be sufficient. I believe that many of us are lacking the context of the sorts of discussions you have had with stakeholders concerned about encryption. Anyways, a very brief summary of my thoughts:

1) We should look to do encryption at-rest, but it should be outside the scope of this KIP. (Is disk encryption as provided by the OS or cloud provider sufficient?)

2) For end-to-end encryption, the part that concerns me is the various roles that the broker may play in this plan. For instance, in Phase 2:

> This phase will concentrate on server-side configuration of encryption. Topic settings will be added that allow the specification of encryption settings that consumers and producers should use. Producers and Consumers will be enabled to fetch these settings and use them for encryption without the end-user having to configure anything in addition.

> Brokers will be extended with pluggable Key Managers that will allow for automatic key rotation later on. A basic, keystore based implementation will be created.

Again, I am not a security expert, but it seems to me that if we want end-to-end encryption on par with the sort of encryption we see in our RelationalDB cousins, it would require that the broker (which could be hosted remotely, with a potentially malicious admin) have no knowledge of any of the keys, nor be responsible for any sort of key rotation. I believe that all of this would have to be handled by the clients themselves (though please correct me if I am misinterpreting this), and that to reduce the attack surface, we should handle the encryption + decryption keys in a manner similar to how we handle TLS keys (client must supply their own).
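To make "the clients handle the keys themselves" a bit more concrete, here is a minimal sketch, not anything proposed in the KIP. It assumes AES-GCM for record values and a client-held master key that wraps a separate data key, so a broker would only ever see ciphertext plus a wrapped key. The class name `ClientSideEnvelope` and all of its method names are hypothetical; only the `javax.crypto` calls are real JDK APIs.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;

// Hypothetical illustration; none of these names come from KIP-317 or Kafka.
class ClientSideEnvelope {
    private static final int IV_LEN = 12;    // 96-bit nonce, the usual GCM choice
    private static final int TAG_BITS = 128; // GCM authentication tag length

    static SecretKey newAesKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Client-side encryption of one record value; returns IV || ciphertext || tag.
    static byte[] encrypt(SecretKey dataKey, byte[] plaintext) {
        try {
            byte[] iv = new byte[IV_LEN];
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, dataKey, new GCMParameterSpec(TAG_BITS, iv));
            byte[] sealed = cipher.doFinal(plaintext);
            byte[] out = new byte[IV_LEN + sealed.length];
            System.arraycopy(iv, 0, out, 0, IV_LEN);
            System.arraycopy(sealed, 0, out, IV_LEN, sealed.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Reverses encrypt(): reads the IV from the first 12 bytes, then verifies and decrypts.
    static byte[] decrypt(SecretKey dataKey, byte[] payload) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, dataKey,
                    new GCMParameterSpec(TAG_BITS, payload, 0, IV_LEN));
            return cipher.doFinal(payload, IV_LEN, payload.length - IV_LEN);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Wraps the data key under the client's master key (AES Key Wrap).
    static byte[] wrap(SecretKey masterKey, SecretKey dataKey) {
        try {
            Cipher cipher = Cipher.getInstance("AESWrap");
            cipher.init(Cipher.WRAP_MODE, masterKey);
            return cipher.wrap(dataKey);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static SecretKey unwrap(SecretKey masterKey, byte[] wrapped) {
        try {
            Cipher cipher = Cipher.getInstance("AESWrap");
            cipher.init(Cipher.UNWRAP_MODE, masterKey);
            return (SecretKey) cipher.unwrap(wrapped, "AES", Cipher.SECRET_KEY);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        SecretKey master = newAesKey();
        SecretKey dataKey = newAesKey();
        byte[] payload = encrypt(dataKey, "account=1234".getBytes(StandardCharsets.UTF_8));
        byte[] wrapped = wrap(master, dataKey);

        // Rotation done by the client: only the small wrapped key is rewritten,
        // the stored payload stays untouched.
        SecretKey newMaster = newAesKey();
        byte[] rewrapped = wrap(newMaster, unwrap(master, wrapped));

        byte[] plain = decrypt(unwrap(newMaster, rewrapped), payload);
        System.out.println(new String(plain, StandardCharsets.UTF_8));
    }
}
```

In this sketch the party that stores `payload` and `rewrapped` never holds `master` or `dataKey`, which is the property being argued for here. How the master key would actually be distributed to producers and consumers is left open, as it is in the discussion.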
Ryanne does point out that automatic key-rotation of end-to-end encrypted data would be an incredibly useful feature to have. However, I am not sure how to square this against what is done with relational databases, as it seems that they require that the client perform any updates or changes to the encryption keys and data, and wash their hands completely of that duty (which makes sense - keep the database out of it, reduce the attack surface). End-to-end, by definition, requires that the broker be unable to decrypt any of the data, and having it responsible for rolling keys, while seemingly useful, does deftly throw end-to-end out the window.

Final Q:
Would it be reasonable to create a new optional service in the Kafka project that is strictly responsible for these sorts of encryption matters? Something like Confluent's schema registry, but as a mechanism for coordinating key rotations with clients, encryption key registrations per topic, etc.? A KeyManager would plug in here, and could either use Kafka as the storage layer for the keys (as we do with schemas, but encrypted themselves of course) or use the whole thing as just a thin layer over a full blown remote KeyManager that simply coordinates the producers, consumers, and keys required for the data per topic. This independent service would give organizations the ability to host it locally for security purposes, while farming out the brokers to perhaps less trustworthy sources?

Adam

On Sun, May 10, 2020 at 7:52 PM Adam Bellemare wrote:
Re: [DISCUSS] KIP-317 - Add end-to-end data encryption functionality to Apache Kafka
@Ryanne

> Seems that could still get us per-topic keys (vs encrypting the entire
> volume), which would be my main requirement.

Agreed, I think that per-topic separation of keys would be very valuable for multi-tenancy.

My 2 cents is that if encryption at rest is sufficient to satisfy GDPR + other similar data protection measures, then we should aim to do that first. The demand is real and privacy laws won't likely be loosening any time soon. That being said, I am not sufficiently familiar with the myriad of data laws. I will look into it some more though, as I am now curious.

On Sat, May 9, 2020 at 6:12 PM Maulin Vasavada wrote:
Re: [DISCUSS] KIP-317 - Add end-to-end data encryption functionality to Apache Kafka
Hi Sonke

Thanks for bringing this up for discussion. There are a lot of considerations even if we assume we have end-to-end encryption done. For example, depending upon a company's setup there could be restrictions on how/which encryption keys are shared. An environment could have multiple security and network boundaries beyond which keys are not allowed to be shared. That would mean that consumers may not be able to decrypt the messages at all if the data is moved from one zone to another. If we have mirroring done, are mirror-makers supposed to decrypt and encrypt again, OR would they stay pretty much in the bytes-in bytes-out paradigm that they follow today? Also, having a polyglot Kafka client base will force you to support encryption/decryption libraries that work for all the languages, and that may not work depending upon the scope of the team owning the Kafka infrastructure.

Combining disk encryption with TLS+ACLs could be enough instead of having end-to-end message level encryption. What is your opinion on that?

We have experimented with end-to-end encryption with custom serializers/deserializers and I felt that was good enough, because the other challenges I mentioned before may not be easy to address with a generic solution.

Thanks
Maulin

On Sat, May 9, 2020 at 2:05 PM Ryanne Dolan wrote:
Re: [DISCUSS] KIP-317 - Add end-to-end data encryption functionality to Apache Kafka
Adam, I agree, seems reasonable to limit the broker's responsibility to encrypting only data at rest. I guess whole segment files could be encrypted with the same key, and rotating keys would just involve re-encrypting entire segments. Maybe a key rotation would involve closing all affected segments and kicking off a background task to re-encrypt them. Certainly that would not impede ingestion of new records, and seems consumers could use the old segments until they are replaced with the newly encrypted ones.

Seems that could still get us per-topic keys (vs encrypting the entire volume), which would be my main requirement.

Not really "end-to-end", but combined with TLS or something, seems reasonable.

Ryanne

On Sat, May 9, 2020, 11:00 AM Adam Bellemare wrote:
Re: [DISCUSS] KIP-317 - Add end-to-end data encryption functionality to Apache Kafka
Hi All

I typed up a number of replies which I have below, but I have one major overriding question: Is there a reason we aren't implementing encryption-at-rest almost exactly the same way that most relational databases do? ie:
https://wiki.postgresql.org/wiki/Transparent_Data_Encryption

I ask this because it seems like we're going to end up with something similar to what they did in terms of requirements, plus...

"For the *past 16 months*, there has been discussion about whether and how to implement Transparent Data Encryption (tde) in Postgres. Many other relational databases support tde, and *some security standards require* it. However, it is also debatable how much security value tde provides. The tde *400-email thread* became difficult for people to follow..."

What still isn't clear to me is the scope that we're trying to cover here. Encryption at rest suggests that we need to have the data encrypted on the brokers, and *only* on the brokers, since they're the durable units of storage. Any encryption over the wire should be covered by TLS. I think that our goals for this should be (from https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#Threat_models)

> TDE protects data from theft when file system access controls are
> compromised:
>
>    - Malicious user steals storage devices and reads database files
>      directly.
>    - Malicious backup operator takes backup.
>    - Protecting data at rest (persistent data)
>
> This does not protect from users who can read system memory, e.g., shared
> buffers, which root users can do.

I am not a security expert nor am I an expert on relational databases. However, I can't identify any reason why the approach outlined by PostgresDB, which is very similar to MySQL/InnoDB and IBM (from my understanding) wouldn't work for data-at-rest encryption. In addition, we'd get the added benefit of being consistent with other solutions, which is an easier sell when discussing security with management (Kafka? Oh yeah, their encryption solution is just like the one we already have in place for our Postgres solutions), and may let us avoid reinventing a good part of the wheel.

--

@Ryanne
One more complicating factor, regarding joins - the foreign key joiner requires access to the value to extract the foreign key - if it's encrypted, the FKJ would need to decrypt it to apply the value extractor.

@Soenk re (1)
> When people hear that this is not part of Apache Kafka itself, but that
> would need to develop something themselves that more often than not is the
> end of that discussion. Using something that is not "stock" is quite often
> simply not an option.

> I strongly feel that this is a needed feature in Kafka and that there is a
> large number of people out there that would want to use it - but I may very
> well be mistaken, responses to this thread have not exactly been plentiful
> this last year and a half..

I agree with you on the default vs. non-default points made. We must all note that this mailing list is *not* representative of the typical users of Kafka, and that many organizations are predominantly looking to use out-of-the-box solutions. This will only become more common as hosted Kafka solutions (think AWS hosted Kafka) gain more traction. I think the goal of this KIP to provide that out-of-the-box experience is extremely important, especially for all the reasons noted so far (GDPR, privacy, financials, interest by many parties but no default solution).

re: (4)
>> Regarding plaintext data in RocksDB instances, I am a bit torn to be
>> honest. On the one hand, I feel like this scenario is not something that we
>> can fully control.

I agree with this in principle. I think that our responsibility to encrypt data at rest ends the moment that data leaves the broker. That being said, it isn't unreasonable. I am going to think more about this and see if I can come up with something.
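On the foreign key joiner point above, a rough sketch of why the joiner would need access to the key, assuming AES-GCM encrypted values and a made-up comma-separated value layout. `DecryptingExtractor`, the value layout and the key handling are all hypothetical and only for illustration; this is not how Kafka Streams implements foreign-key joins.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.function.Function;

// Hypothetical illustration; not part of Kafka Streams or KIP-317.
class DecryptingExtractor {
    private static final int IV_LEN = 12;
    private static final int TAG_BITS = 128;

    static SecretKey newAesKey() {
        try {
            KeyGenerator generator = KeyGenerator.getInstance("AES");
            generator.init(256);
            return generator.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Encrypts a record value; returns IV || ciphertext || tag.
    static byte[] encrypt(SecretKey key, String value) {
        try {
            byte[] iv = new byte[IV_LEN];
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] sealed = cipher.doFinal(value.getBytes(StandardCharsets.UTF_8));
            byte[] out = new byte[IV_LEN + sealed.length];
            System.arraycopy(iv, 0, out, 0, IV_LEN);
            System.arraycopy(sealed, 0, out, IV_LEN, sealed.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    static String decrypt(SecretKey key, byte[] payload) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key,
                    new GCMParameterSpec(TAG_BITS, payload, 0, IV_LEN));
            byte[] plain = cipher.doFinal(payload, IV_LEN, payload.length - IV_LEN);
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Wraps a plaintext foreign-key extractor so it can run on encrypted values.
    // Whoever evaluates this function must hold the decryption key.
    static Function<byte[], String> decryptThenExtract(
            SecretKey key, Function<String, String> extractor) {
        return payload -> extractor.apply(decrypt(key, payload));
    }

    public static void main(String[] args) {
        SecretKey key = newAesKey();
        // Assumed value layout: orderId,customerId,amount
        Function<String, String> customerIdOf = value -> value.split(",")[1];
        byte[] encryptedOrder = encrypt(key, "order-7,customer-42,19.99");
        System.out.println(decryptThenExtract(key, customerIdOf).apply(encryptedOrder));
    }
}
```

The extractor itself only understands the plaintext layout, so the join can only run wherever the key is available, which is what makes joins a complicating factor.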
On Fri, May 8, 2020 at 5:05 AM Sönke Liebau wrote:

> Hey everybody,
>
> thanks a lot for reading and giving feedback!! I'll try and answer all points that I found going through the thread in this mail, but if I miss something please feel free to let me know! I've added a running number to the discussed topics for ease of reference down the road.
>
> I'll go through the KIP and update it with everything that I have written below after sending this mail.
>
> @Tom:
> (1) If I understand your concerns correctly you feel that this functionality would have a hard time getting approved into Apache Kafka because it can be achieved with custom Serializers in the same way and that we should maybe develop this outside of Apache Kafka at first.
> I feel like it is precisely the fact that this is not part of core Apache Kafka that makes people think twice about doing end-to-end encryption. I may be working in a market (Germany) that is a bit special when compared to the rest of the world where encryption and things like that are concerned,
Re: [DISCUSS] KIP-317 - Add end-to-end data encryption functionality to Apache Kafka
Tbh tom is right it is entirely possible to support end 2 end encryption today without broker or client changes with serializers. Infact i know many companies doing this.As such maybe a good approach would be to provide a default encryption and decryption serde thats able to be used rather than any client or broker changes at all. This way those who already have a working solution does not change and basically youre providing a default solution to those who have not already made one so that its easier to adopt.Sent from my Samsung Galaxy smartphone. Original message From: Sönke Liebau Date: 08/05/2020 10:05 (GMT+00:00) To: dev Subject: Re: [DISCUSS] KIP-317 - Add end-to-end data encryption functionality to Apache Kafka Hey everybody,thanks a lot for reading and giving feedback!! I'll try and answer allpoints that I found going through the thread in this mail, but if I misssomething please feel free to let me know! I've added a running number tothe discussed topics for ease of reference down the road.I'll go through the KIP and update it with everything that I have writtenbelow after sending this mail.@Tom:(1) If I understand your concerns correctly you feel that thisfunctionality would have a hard time getting approved into Apache Kafkabecause it can be achieved with custom Serializers in the same way and thatwe should maybe develop this outside of Apache Kafka at first.I feel like it is precisely the fact that this is not part of core ApacheKafka that makes people think twice about doing end-to-end encryption. Imay be working in a market (Germany) that is a bit special when compared tothe rest of the world where encryption and things like that are concerned,but I've personally sat in multiple meetings where this feature wasdiscussed. 
It is not necessarily the end-to-end encryption itself, but the at-rest encryption that you get with it. When people hear that this is not part of Apache Kafka itself, but that would need to develop something themselves that more often than not is the end of that discussion. Using something that is not "stock" is quite often simply not an option. Even if they decide to go forward with it, they'll find Hendrik's blog post from 4 years ago on this, probably the Whitepapers from Confluent and Lenses and maybe a few implementations on github - all of which just serve to further muddy the waters. Not because any of these resources are bad or wrong, but just because information and implementations are spread out over a lot of different places. Developing this outside of Apache Kafka would simply serve to add one more item to this list that would not really matter I'm afraid. I strongly feel that this is a needed feature in Kafka and that there is a large number of people out there that would want to use it - but I may very well be mistaken, responses to this thread have not exactly been plentiful this last year and a half.. @Mike: (2) Regarding the encryption of headers, my current idea is to keep this configurable. I have seen customers use headers for stuff like account numbers which under the GDPR are considered to be personal data that should be encrypted wherever possible.
So in some instances it might be useful to encrypt header fields as well. My current PoC implementation allows specifying a Regex for headers that should be encrypted, which would allow having encrypted and unencrypted headers in the same record to hopefully suit most use cases. (3) Also, my plan is to not change the message format, but to "encrypt-in-place" and add a header field with the necessary information for decryption, which would then be removed by the decrypting consumer. There may be some out-of-date intentions still in the KIP, I'll go through it and update. @Ryanne: First off, I fully agree that we should avoid painting ourselves into a corner with an early client-only implementation. I scaled down this Kip from earlier attempts that included things like key rollover and broker-side implementations because I could not get any feedback from the community on those for a long time and felt that maybe there was no appetite for the full-blown solution. So I decided to try with a more limited scope. I am very happy to discuss/go for the fully featured version again :) (4) Regarding plaintext data in RocksDB instances, I am a bit torn to be honest. On the one hand, I feel like this scenario is not something that we can fully control. Kafka Streams in this case is a client that takes data from Kafka, decrypts it and then puts it somewhere in plaintext. To me this scenario differs only slightly from for example someone writing a backup job that reads a topic and writes it to a textfile - not much we can do about it. That being said, Kafka Streams is part of Apache Kafka, so does merit special consideration. I'll have to dig into how StateStores are used a bit (I am not the worlds largest expert - or any
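The "default encryption and decryption serde" suggested at the top of this message can be pictured as a thin wrapper around whatever serializer an application already uses. The following is only a sketch in Python rather than against the real Kafka client API: the function names are hypothetical, and `toy_cipher` is a placeholder that is not real encryption.

```python
import json
from typing import Any, Callable


def encrypting_serializer(inner: Callable[[Any], bytes],
                          encrypt: Callable[[bytes], bytes]) -> Callable[[Any], bytes]:
    """Wrap an existing serializer so its output is encrypted before sending."""
    def serialize(value: Any) -> bytes:
        return encrypt(inner(value))
    return serialize


def decrypting_deserializer(inner: Callable[[bytes], Any],
                            decrypt: Callable[[bytes], bytes]) -> Callable[[bytes], Any]:
    """Wrap an existing deserializer so input is decrypted before parsing."""
    def deserialize(data: bytes) -> Any:
        return inner(decrypt(data))
    return deserialize


def toy_cipher(data: bytes) -> bytes:
    # PLACEHOLDER ONLY: XOR with a fixed byte is NOT encryption. It stands in
    # for a real cipher supplied by a vetted library, so that the sketch stays
    # self-contained. XOR is its own inverse, so it also serves as "decrypt".
    return bytes(b ^ 0x5A for b in data)


def json_serialize(value: Any) -> bytes:
    return json.dumps(value, sort_keys=True).encode("utf-8")


def json_deserialize(data: bytes) -> Any:
    return json.loads(data.decode("utf-8"))


serialize = encrypting_serializer(json_serialize, toy_cipher)
deserialize = decrypting_deserializer(json_deserialize, toy_cipher)

wire_bytes = serialize({"account": "DE0012345"})
round_trip = deserialize(wire_bytes)
```

Because the wrapper only composes functions, producers and consumers that already have their own solution would not be affected, which is the point made in the message.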
Re: [DISCUSS] KIP-317 - Add end-to-end data encryption functionality to Apache Kafka
Hey everybody,

thanks a lot for reading and giving feedback!! I'll try and answer all points that I found going through the thread in this mail, but if I miss something please feel free to let me know! I've added a running number to the discussed topics for ease of reference down the road.

I'll go through the KIP and update it with everything that I have written below after sending this mail.

@Tom:
(1) If I understand your concerns correctly, you feel that this functionality would have a hard time getting approved into Apache Kafka because it can be achieved with custom Serializers in the same way and that we should maybe develop this outside of Apache Kafka at first.
I feel like it is precisely the fact that this is not part of core Apache Kafka that makes people think twice about doing end-to-end encryption. I may be working in a market (Germany) that is a bit special when compared to the rest of the world where encryption and things like that are concerned, but I've personally sat in multiple meetings where this feature was discussed. It is not necessarily the end-to-end encryption itself, but the at-rest encryption that you get with it.
When people hear that this is not part of Apache Kafka itself, but that they would need to develop something themselves, that more often than not is the end of that discussion. Using something that is not "stock" is quite often simply not an option.
Even if they decide to go forward with it, they'll find Hendrik's blog post from 4 years ago on this, probably the whitepapers from Confluent and Lenses and maybe a few implementations on GitHub - all of which just serve to further muddy the waters. Not because any of these resources are bad or wrong, but just because information and implementations are spread out over a lot of different places. Developing this outside of Apache Kafka would simply serve to add one more item to this list that would not really matter, I'm afraid.

I strongly feel that this is a needed feature in Kafka and that there is a large number of people out there that would want to use it - but I may very well be mistaken, responses to this thread have not exactly been plentiful this last year and a half.

@Mike:
(2) Regarding the encryption of headers, my current idea is to keep this configurable. I have seen customers use headers for stuff like account numbers which under the GDPR are considered to be personal data that should be encrypted wherever possible. So in some instances it might be useful to encrypt header fields as well.
My current PoC implementation allows specifying a regex for headers that should be encrypted, which would allow having encrypted and unencrypted headers in the same record to hopefully suit most use cases.

(3) Also, my plan is to not change the message format, but to "encrypt-in-place" and add a header field with the necessary information for decryption, which would then be removed by the decrypting consumer. There may be some out-of-date intentions still in the KIP, I'll go through it and update.

@Ryanne:
First off, I fully agree that we should avoid painting ourselves into a corner with an early client-only implementation. I scaled down this KIP from earlier attempts that included things like key rollover and broker-side implementations because I could not get any feedback from the community on those for a long time and felt that maybe there was no appetite for the full-blown solution. So I decided to try with a more limited scope. I am very happy to discuss/go for the fully featured version again :)

(4) Regarding plaintext data in RocksDB instances, I am a bit torn to be honest. On the one hand, I feel like this scenario is not something that we can fully control. Kafka Streams in this case is a client that takes data from Kafka, decrypts it and then puts it somewhere in plaintext. To me this scenario differs only slightly from, for example, someone writing a backup job that reads a topic and writes it to a text file - not much we can do about it.
That being said, Kafka Streams is part of Apache Kafka, so does merit special consideration. I'll have to dig into how StateStores are used a bit (I am not the world's largest expert - or any kind of expert on that) to try and come up with an idea.

(5) On key encryption and hashing, this is definitely an issue that we need a solution for. I currently have key encryption configurable in my implementation. When encryption is enabled, an option would of course be to hash the original key and store the key data together with the value in an encrypted form. Any salt added to the key before hashing could be encrypted along with the data. This would allow all key-based functionality like compaction, joins etc. to keep working without having to know the cleartext key.
I've also considered deterministic encryption which would keep the encrypted key the same, but I am fairly certain that we will want to allow regular key rotation (more on this in next paragraph) without re-encrypting older
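Point (2) in this message describes selecting the headers to encrypt with a regex, so that encrypted and unencrypted headers can coexist in one record. A minimal sketch of that selection step might look as follows. The function name, the header names and the pattern are made up for illustration, and the PoC mentioned in the mail may work differently.

```python
import re


def split_headers(headers, pattern):
    """Partition (name, value) headers into those whose name fully matches
    the pattern (to be encrypted) and the rest (left in plaintext)."""
    regex = re.compile(pattern)
    to_encrypt, plaintext = [], []
    for name, value in headers:
        if regex.fullmatch(name):
            to_encrypt.append((name, value))
        else:
            plaintext.append((name, value))
    return to_encrypt, plaintext


# Hypothetical record headers: two carry personal data, one is routing metadata.
headers = [
    ("account-number", b"DE0012345"),
    ("trace-id", b"abc123"),
    ("account-owner", b"J. Doe"),
]

sensitive, routable = split_headers(headers, r"account-.*")
```

Headers are modelled as a list of pairs rather than a dictionary because a record may carry the same header name more than once.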
Re: [DISCUSS] KIP-317 - Add end-to-end data encryption functionality to Apache Kafka
Tom, good point, I've done exactly that -- hashing record keys -- but it's unclear to me what should happen when the hash key must be rotated. In my case the (external) solution involved rainbow tables, versioned keys, and custom materializers that were aware of older keys for each record. In particular I had a pipeline that would re-key records and re-ingest them, while opportunistically overwriting records materialized with the old key. For a native solution I think maybe we'd need to carry around any old versions of each record key, perhaps as metadata. Then brokers and materializers can compact records based on _any_ overlapping key, maybe? Not sure. Ryanne On Thu, May 7, 2020, 12:05 PM Tom Bentley wrote: > Hi Rayanne, > > You raise some good points there. > > Similarly, if the whole record is encrypted, it becomes impossible to do > > joins, group bys etc, which just need the record key and maybe don't have > > access to the encryption key. Maybe only record _values_ should be > > encrypted, and maybe Kafka Streams could defer decryption until the > actual > > value is inspected. That way joins etc are possible without the > encryption > > key, and RocksDB would not need to decrypt values before materializing to > > disk. > > > > It's getting a bit late here, so maybe I overlooked something, but wouldn't > the natural thing to do be to make the "encrypted" key a hash of the > original key, and let the value of the encrypted value be the cipher text > of the (original key, original value) pair. A scheme like this would > preserve equality of the key (strictly speaking there's a chance of > collision of course). I guess this could also be a solution for the > compacted topic issue Sönke mentioned. > > Cheers, > > Tom > > > > On Thu, May 7, 2020 at 5:17 PM Ryanne Dolan wrote: > > > Thanks Sönke, this is an area in which Kafka is really, really far > behind. > > > > I've built secure systems around Kafka as laid out in the KIP. 
One issue > > that is not addressed in the KIP is re-encryption of records after a key > > rotation. When a key is compromised, it's important that any data > encrypted > > using that key is immediately destroyed or re-encrypted with a new key. > > Ideally first-class support for end-to-end encryption in Kafka would make > > this possible natively, or else I'm not sure what the point would be. It > > seems to me that the brokers would need to be involved in this process, > so > > perhaps a client-first approach will be painting ourselves into a corner. > > Not sure. > > > > Another issue is whether materialized tables, e.g. in Kafka Streams, > would > > see unencrypted or encrypted records. If we implemented the KIP as > written, > > it would still result in a bunch of plain text data in RocksDB > everywhere. > > Again, I'm not sure what the point would be. Perhaps using custom serdes > > would actually be a more holistic approach, since Kafka Streams etc could > > leverage these as well. > > > > Similarly, if the whole record is encrypted, it becomes impossible to do > > joins, group bys etc, which just need the record key and maybe don't have > > access to the encryption key. Maybe only record _values_ should be > > encrypted, and maybe Kafka Streams could defer decryption until the > actual > > value is inspected. That way joins etc are possible without the > encryption > > key, and RocksDB would not need to decrypt values before materializing to > > disk. > > > > This is why I've implemented encryption on a per-field basis, not at the > > record level, when addressing kafka security in the past. And I've had to > > build external pipelines that purge, re-encrypt, and re-ingest records > when > > keys are compromised. > > > > This KIP might be a step in the right direction, not sure. But I'm > hesitant > > to support the idea of end-to-end encryption without a plan to address > the > > myriad other problems. 
> > > > That said, we need this badly and I hope something shakes out. > > > > Ryanne > > > > On Tue, Apr 28, 2020, 6:26 PM Sönke Liebau > > wrote: > > > > > All, > > > > > > I've asked for comments on this KIP in the past, but since I didn't > > really > > > get any feedback I've decided to reduce the initial scope of the KIP a > > bit > > > and try again. > > > > > > I have reworked to KIP to provide a limited, but useful set of features > > for > > > this initial KIP and laid out a very rough roadmap of what I'd envision > > > this looking like in a final version. > > > > > > I am aware that the KIP is currently light on implementation details, > but > > > would like to get some feedback on the general approach before fully > > > speccing everything. > > > > > > The KIP can be found at > > > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-317%3A+Add+end-to-end+data+encryption+functionality+to+Apache+Kafka > > > > > > > > > I would very much appreciate any feedback! > > > > > > Best regards, > > > Sönke > > > > > >
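The idea at the top of this message, carrying older versions of each hashed record key as metadata so that records can be matched on any overlapping key, could be sketched roughly like this. Everything here is an assumption for illustration: the function names, the use of HMAC-SHA256 as the keyed hash, and the idea that a writer knows all hash-key versions still in use.

```python
import hashlib
import hmac


def key_hash(secret: bytes, record_key: bytes) -> str:
    """Keyed hash of a record key under one hash-key version."""
    return hmac.new(secret, record_key, hashlib.sha256).hexdigest()


def key_aliases(secrets_by_version: dict, record_key: bytes) -> set:
    """Hashes of the record key under every hash-key version the writer knows."""
    return {key_hash(secret, record_key) for secret in secrets_by_version.values()}


def same_logical_key(aliases_a: set, aliases_b: set) -> bool:
    """Two records refer to the same key if any of their aliases overlap."""
    return not aliases_a.isdisjoint(aliases_b)


# A record written before rotation knows only version 1;
# a record written after rotation carries hashes for versions 1 and 2.
before_rotation = {1: b"hash-secret-v1"}
after_rotation = {1: b"hash-secret-v1", 2: b"hash-secret-v2"}

old_record = key_aliases(before_rotation, b"customer-42")
new_record = key_aliases(after_rotation, b"customer-42")
other_record = key_aliases(after_rotation, b"customer-43")
```

With this shape, a compactor that only sees the alias sets could treat `old_record` and `new_record` as the same key without knowing the cleartext key. Whether brokers should do this is exactly the open question in the message.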
Re: [DISCUSS] KIP-317 - Add end-to-end data encryption functionality to Apache Kafka
Hi again, Of course I was overlooking at least one thing. Anyone who could guess the record keys could hash them and compare. To make it work the producer and consumer would need a shared secret to include in the hash computation. But the key management service could furnish them with this in addition to the encryption/decryption keys, so I think it should still work. Unless I've overlooked something else. Cheers, Tom On Thu, May 7, 2020 at 6:04 PM Tom Bentley wrote: > Hi Rayanne, > > You raise some good points there. > > Similarly, if the whole record is encrypted, it becomes impossible to do >> joins, group bys etc, which just need the record key and maybe don't have >> access to the encryption key. Maybe only record _values_ should be >> encrypted, and maybe Kafka Streams could defer decryption until the actual >> value is inspected. That way joins etc are possible without the encryption >> key, and RocksDB would not need to decrypt values before materializing to >> disk. >> > > It's getting a bit late here, so maybe I overlooked something, but > wouldn't the natural thing to do be to make the "encrypted" key a hash of > the original key, and let the value of the encrypted value be the cipher > text of the (original key, original value) pair. A scheme like this would > preserve equality of the key (strictly speaking there's a chance of > collision of course). I guess this could also be a solution for the > compacted topic issue Sönke mentioned. > > Cheers, > > Tom > > > > On Thu, May 7, 2020 at 5:17 PM Ryanne Dolan wrote: > >> Thanks Sönke, this is an area in which Kafka is really, really far behind. >> >> I've built secure systems around Kafka as laid out in the KIP. One issue >> that is not addressed in the KIP is re-encryption of records after a key >> rotation. When a key is compromised, it's important that any data >> encrypted >> using that key is immediately destroyed or re-encrypted with a new key. 
>> Ideally first-class support for end-to-end encryption in Kafka would make >> this possible natively, or else I'm not sure what the point would be. It >> seems to me that the brokers would need to be involved in this process, so >> perhaps a client-first approach will be painting ourselves into a corner. >> Not sure. >> >> Another issue is whether materialized tables, e.g. in Kafka Streams, would >> see unencrypted or encrypted records. If we implemented the KIP as >> written, >> it would still result in a bunch of plain text data in RocksDB everywhere. >> Again, I'm not sure what the point would be. Perhaps using custom serdes >> would actually be a more holistic approach, since Kafka Streams etc could >> leverage these as well. >> >> Similarly, if the whole record is encrypted, it becomes impossible to do >> joins, group bys etc, which just need the record key and maybe don't have >> access to the encryption key. Maybe only record _values_ should be >> encrypted, and maybe Kafka Streams could defer decryption until the actual >> value is inspected. That way joins etc are possible without the encryption >> key, and RocksDB would not need to decrypt values before materializing to >> disk. >> >> This is why I've implemented encryption on a per-field basis, not at the >> record level, when addressing kafka security in the past. And I've had to >> build external pipelines that purge, re-encrypt, and re-ingest records >> when >> keys are compromised. >> >> This KIP might be a step in the right direction, not sure. But I'm >> hesitant >> to support the idea of end-to-end encryption without a plan to address the >> myriad other problems. >> >> That said, we need this badly and I hope something shakes out. 
>> >> Ryanne >> >> On Tue, Apr 28, 2020, 6:26 PM Sönke Liebau >> wrote: >> >> > All, >> > >> > I've asked for comments on this KIP in the past, but since I didn't >> really >> > get any feedback I've decided to reduce the initial scope of the KIP a >> bit >> > and try again. >> > >> > I have reworked to KIP to provide a limited, but useful set of features >> for >> > this initial KIP and laid out a very rough roadmap of what I'd envision >> > this looking like in a final version. >> > >> > I am aware that the KIP is currently light on implementation details, >> but >> > would like to get some feedback on the general approach before fully >> > speccing everything. >> > >> > The KIP can be found at >> > >> > >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-317%3A+Add+end-to-end+data+encryption+functionality+to+Apache+Kafka >> > >> > >> > I would very much appreciate any feedback! >> > >> > Best regards, >> > Sönke >> > >> >
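The problem described at the top of this message, and the shared-secret fix, can be illustrated with a keyed hash. This sketch assumes HMAC-SHA256 as the keyed hash and uses a made-up secret. The thread does not prescribe a particular construction.

```python
import hashlib
import hmac


def unkeyed_hash(record_key: bytes) -> str:
    """Plain hash: anyone who can guess the key can reproduce it."""
    return hashlib.sha256(record_key).hexdigest()


def keyed_hash(shared_secret: bytes, record_key: bytes) -> str:
    """Hash that also depends on a secret shared by producer and consumer."""
    return hmac.new(shared_secret, record_key, hashlib.sha256).hexdigest()


# Hypothetical secret, assumed to come from the key management service.
shared_secret = b"example-secret-from-key-management"

on_broker_unkeyed = unkeyed_hash(b"customer-42")
on_broker_keyed = keyed_hash(shared_secret, b"customer-42")

# An attacker who guesses the record key but lacks the secret:
attacker_matches_unkeyed = unkeyed_hash(b"customer-42") == on_broker_unkeyed
attacker_matches_keyed = unkeyed_hash(b"customer-42") == on_broker_keyed

# A legitimate consumer holding the secret still gets a stable, comparable key.
consumer_matches_keyed = hmac.compare_digest(
    keyed_hash(shared_secret, b"customer-42"), on_broker_keyed
)
```

Equality of keys is preserved for anyone holding the secret, while a guessed key can no longer be confirmed by hashing it.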
Re: [DISCUSS] KIP-317 - Add end-to-end data encryption functionality to Apache Kafka
Hi Ryanne, You raise some good points there. > Similarly, if the whole record is encrypted, it becomes impossible to do > joins, group bys etc, which just need the record key and maybe don't have > access to the encryption key. Maybe only record _values_ should be > encrypted, and maybe Kafka Streams could defer decryption until the actual > value is inspected. That way joins etc are possible without the encryption > key, and RocksDB would not need to decrypt values before materializing to > disk. > It's getting a bit late here, so maybe I overlooked something, but wouldn't the natural thing to do be to make the "encrypted" key a hash of the original key, and let the encrypted value be the cipher text of the (original key, original value) pair? A scheme like this would preserve equality of the key (strictly speaking there's a chance of collision of course). I guess this could also be a solution for the compacted topic issue Sönke mentioned. Cheers, Tom On Thu, May 7, 2020 at 5:17 PM Ryanne Dolan wrote: > Thanks Sönke, this is an area in which Kafka is really, really far behind. > > I've built secure systems around Kafka as laid out in the KIP. One issue > that is not addressed in the KIP is re-encryption of records after a key > rotation. When a key is compromised, it's important that any data encrypted > using that key is immediately destroyed or re-encrypted with a new key. > Ideally first-class support for end-to-end encryption in Kafka would make > this possible natively, or else I'm not sure what the point would be. It > seems to me that the brokers would need to be involved in this process, so > perhaps a client-first approach will be painting ourselves into a corner. > Not sure. > > Another issue is whether materialized tables, e.g. in Kafka Streams, would > see unencrypted or encrypted records. If we implemented the KIP as written, > it would still result in a bunch of plain text data in RocksDB everywhere. 
> Again, I'm not sure what the point would be. Perhaps using custom serdes > would actually be a more holistic approach, since Kafka Streams etc could > leverage these as well. > > Similarly, if the whole record is encrypted, it becomes impossible to do > joins, group bys etc, which just need the record key and maybe don't have > access to the encryption key. Maybe only record _values_ should be > encrypted, and maybe Kafka Streams could defer decryption until the actual > value is inspected. That way joins etc are possible without the encryption > key, and RocksDB would not need to decrypt values before materializing to > disk. > > This is why I've implemented encryption on a per-field basis, not at the > record level, when addressing kafka security in the past. And I've had to > build external pipelines that purge, re-encrypt, and re-ingest records when > keys are compromised. > > This KIP might be a step in the right direction, not sure. But I'm hesitant > to support the idea of end-to-end encryption without a plan to address the > myriad other problems. > > That said, we need this badly and I hope something shakes out. > > Ryanne > > On Tue, Apr 28, 2020, 6:26 PM Sönke Liebau > wrote: > > > All, > > > > I've asked for comments on this KIP in the past, but since I didn't > really > > get any feedback I've decided to reduce the initial scope of the KIP a > bit > > and try again. > > > > I have reworked to KIP to provide a limited, but useful set of features > for > > this initial KIP and laid out a very rough roadmap of what I'd envision > > this looking like in a final version. > > > > I am aware that the KIP is currently light on implementation details, but > > would like to get some feedback on the general approach before fully > > speccing everything. 
> > > > The KIP can be found at > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-317%3A+Add+end-to-end+data+encryption+functionality+to+Apache+Kafka > > > > > > I would very much appreciate any feedback! > > > > Best regards, > > Sönke > > >
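The claim in this message that a hashed key "preserves equality" and so could keep compacted topics working can be checked against a simplified model of compaction. The `compact` function below is a toy model (keep the latest value per key), not Kafka's actual log cleaner, and the byte strings standing in for ciphertext are placeholders.

```python
import hashlib


def hashed_key(record_key: bytes) -> bytes:
    """Replace the cleartext key by its hash. Equal keys give equal hashes."""
    return hashlib.sha256(record_key).digest()


def compact(log):
    """Toy log compaction: keep only the latest value for each key,
    ordered by the position of that latest occurrence."""
    latest = {}
    for key, value in log:
        latest.pop(key, None)  # drop the older entry so the order follows the newest
        latest[key] = value
    return list(latest.items())


# Values are opaque placeholders for the ciphertext of (original key, original value).
log = [
    (hashed_key(b"user-1"), b"ciphertext-1"),
    (hashed_key(b"user-2"), b"ciphertext-2"),
    (hashed_key(b"user-1"), b"ciphertext-3"),
]

compacted = compact(log)
```

The compaction step never sees a cleartext key or value, yet the older record for `user-1` is still superseded by the newer one.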
Re: [DISCUSS] KIP-317 - Add end-to-end data encryption functionality to Apache Kafka
Thanks Sönke, this is an area in which Kafka is really, really far behind. I've built secure systems around Kafka as laid out in the KIP. One issue that is not addressed in the KIP is re-encryption of records after a key rotation. When a key is compromised, it's important that any data encrypted using that key is immediately destroyed or re-encrypted with a new key. Ideally first-class support for end-to-end encryption in Kafka would make this possible natively, or else I'm not sure what the point would be. It seems to me that the brokers would need to be involved in this process, so perhaps a client-first approach will be painting ourselves into a corner. Not sure. Another issue is whether materialized tables, e.g. in Kafka Streams, would see unencrypted or encrypted records. If we implemented the KIP as written, it would still result in a bunch of plain text data in RocksDB everywhere. Again, I'm not sure what the point would be. Perhaps using custom serdes would actually be a more holistic approach, since Kafka Streams etc could leverage these as well. Similarly, if the whole record is encrypted, it becomes impossible to do joins, group bys etc, which just need the record key and maybe don't have access to the encryption key. Maybe only record _values_ should be encrypted, and maybe Kafka Streams could defer decryption until the actual value is inspected. That way joins etc are possible without the encryption key, and RocksDB would not need to decrypt values before materializing to disk. This is why I've implemented encryption on a per-field basis, not at the record level, when addressing kafka security in the past. And I've had to build external pipelines that purge, re-encrypt, and re-ingest records when keys are compromised. This KIP might be a step in the right direction, not sure. But I'm hesitant to support the idea of end-to-end encryption without a plan to address the myriad other problems. That said, we need this badly and I hope something shakes out. 
Ryanne On Tue, Apr 28, 2020, 6:26 PM Sönke Liebau wrote: > All, > > I've asked for comments on this KIP in the past, but since I didn't really > get any feedback I've decided to reduce the initial scope of the KIP a bit > and try again. > > I have reworked to KIP to provide a limited, but useful set of features for > this initial KIP and laid out a very rough roadmap of what I'd envision > this looking like in a final version. > > I am aware that the KIP is currently light on implementation details, but > would like to get some feedback on the general approach before fully > speccing everything. > > The KIP can be found at > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-317%3A+Add+end-to-end+data+encryption+functionality+to+Apache+Kafka > > > I would very much appreciate any feedback! > > Best regards, > Sönke >
Re: [DISCUSS] KIP-317 - Add end-to-end data encryption functionality to Apache Kafka
Hi, I have just spotted this. I would be a little -1 on encrypting headers; these are NOT safe to encrypt. The original reason for headers was to carry non-sensitive transport or other meta information, very much akin to TCP headers, which are also not encrypted. They should remain unencrypted so that tools that simply bridge messages between brokers/systems can rely on headers for this, without needing to peek inside the business payload (or decrypt it). Second, I would suggest we do not add an additional section to the record specifically for this (again, I would be a little -1 here). The whole point of adding headers was that additional bits such as this would be layered on top of headers, e.g. the AES or other data that needs to be transported with the record should be set into keys. Please see the original KIP-82 and, more importantly, the cases and uses that headers were added for. https://cwiki.apache.org/confluence/display/KAFKA/KIP-82+-+Add+Record+Headers https://cwiki.apache.org/confluence/display/KAFKA/A+Case+for+Kafka+Headers Best, Mike On 1 May 2020 at 23:18, Sönke Liebau wrote: Hi Tom, thanks for taking a look! Regarding your questions, I've answered below, but will also add more detail to the KIP around these questions. 1. The functionality in this first phase could indeed be achieved with custom serializers, that would then need to wrap the actual serializer that is to be used. However, looking forward I intend to add functionality that allows configuration to be configured broker-side via topic level configs and investigate encrypting entire batches of messages for performance. Both those things would require us to move past doing this in a serializer, so I think we should take that plunge now to avoid unnecessary refactoring later on. 2. Absolutely! I am currently working on a very (very) rough implementation to kind of prove the principle. I'll add those to the KIP as soon as I think they are in a somewhat final form. 
There are a lot of design details missing from the KIP, I didn't want to go all the way just for people to hate what I designed and have to start over ;) 3. Yes. I plan to create a LocalKeystoreKeyManager (name tbd) as part of this KIP that allows configuring keys per topic pattern and will read the keys from a local file. This will provide encryption, but users would have to manually sync keystores across consumer and producer systems. Proper key management with rollover and retrieval from central vaults would come in a later phase. 4. I'm not 100% sure I follow your meaning here tbh. But I think the question may be academic in this first instance, as compression happens at batch level, so we can't encrypt at the record level after that. If we want to stick with encrypting individual records, that would have to happen pre-compression, unless I am mistaken about the internals here. Best regards, Sönke On Fri, 1 May 2020 at 18:19, Tom Bentley wrote: Hi Sönke, I never looked at the original version, but what you describe in the new version makes sense to me. Here are a few things which sprang to mind while I was reading: 1. It wasn't immediately obvious why this can't be achieved using custom serializers and deserializers. 2. It would be useful to fully define the Java interfaces you're talking about. 3 Would a KeyManager implementation be provided? 4. About compression+encryption: My understanding is CRIME used a chosen plaintext attack. AFAICS using compression would potentially allow a known plaintext attack, which is a weaker way of attacking a cipher. Even without compression in the picture known plaintext attacks would be possible, for example if the attacker knew the key was JSON encoded. Kind regards, Tom On Wed, Apr 29, 2020 at 12:32 AM Sönke Liebau wrote: All, I've asked for comments on this KIP in the past, but since I didn't really get any feedback I've decided to reduce the initial scope of the KIP a bit and try again. 
I have reworked the KIP to provide a limited, but useful set of features for this initial KIP and laid out a very rough roadmap of what I'd envision this looking like in a final version. I am aware that the KIP is currently light on implementation details, but would like to get some feedback on the general approach before fully speccing everything. The KIP can be found at https://cwiki.apache.org/confluence/display/KAFKA/KIP-317%3A+Add+end-to-end+data+encryption+functionality+to+Apache+Kafka I would very much appreciate any feedback! Best regards, Sönke
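Point 3 of the quoted reply mentions a key manager that "allows configuring keys per topic pattern". The lookup part of such a component might be sketched as below. The class name, the first-match-wins rule and the key identifiers are all hypothetical, and reading the actual key material from a local keystore file is left out.

```python
import re


class TopicPatternKeyManager:
    """Hypothetical sketch: map topic-name patterns to key identifiers."""

    def __init__(self, rules):
        # rules: ordered list of (topic regex, key id). The first match wins,
        # so more specific patterns should come before catch-all ones.
        self._rules = [(re.compile(pattern), key_id) for pattern, key_id in rules]

    def key_id_for(self, topic):
        """Return the key id configured for this topic, or None if no rule matches."""
        for regex, key_id in self._rules:
            if regex.fullmatch(topic):
                return key_id
        return None


manager = TopicPatternKeyManager([
    (r"payments\..*", "payments-key"),
    (r".*", "default-key"),
])

payments_key = manager.key_id_for("payments.eu")
fallback_key = manager.key_id_for("clickstream")
unmatched = TopicPatternKeyManager([]).key_id_for("clickstream")
```

As the mail notes, producers and consumers would each need the same rules and keys, which is why manual keystore syncing is called out as a limitation of this first phase.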
Re: [DISCUSS] KIP-317 - Add end-to-end data encryption functionality to Apache Kafka
Small typo correction: I meant headers, not keys, at the end of this paragraph (sorry, long week already). Corrected: "Second, I would suggest we do not add an additional section to the record specifically for this (again, I would be a little -1 here). The whole point of adding headers was that additional bits such as this would be layered on top of headers, e.g. the AES or other data that needs to be transported with the record should be set into headers." On 7 May 2020 at 8:47, Michael André Pearce wrote: Hi I have just spotted this. I would be a little -1 encrypting headers these are NOT safe to encrypt. The whole original reason for headers was for non-sensitive but transport or other meta information details, very akin to tcp headers, e.g. those also are not encrypted. These should remain un-encrypted so tools that are simply bridging messages between brokers/systems, can rely on headers for this, without needing to peek inside the business payload part (or decrypting it). Second i would suggest we do not add additional section (again i would be a little -1 here) into the record specifically for this the whole point of headers being added, is additional bits such as this would levy on top of headers, e.g. the aes or other data that needs to transport with the record should be set into keys. Please see both the original KIP-82 but more importantly the case and uses that they were added for. https://cwiki.apache.org/confluence/display/KAFKA/KIP-82+-+Add+Record+Headers https://cwiki.apache.org/confluence/display/KAFKA/A+Case+for+Kafka+Headers Best Mike On 1 May 2020 at 23:18, Sönke Liebau wrote: Hi Tom, thanks for taking a look! Regarding your questions, I've answered below, but will also add more detail to the KIP around these questions. 1. The functionality in this first phase could indeed be achieved with custom serializers, that would then need to wrap the actual serializer that is to be used. 
However, looking forward I intend to add functionality that allows configuration to be configured broker-side via topic level configs and investigate encrypting entire batches of messages for performance. Both those things would require us to move past doing this in a serializer, so I think we should take that plunge now to avoid unnecessary refactoring later on. 2. Absolutely! I am currently working on a very (very) rough implementation to kind of prove the principle. I'll add those to the KIP as soon as I think they are in a somewhat final form. There are a lot of design details missing from the KIP, I didn't want to go all the way just for people to hate what I designed and have to start over ;) 3. Yes. I plan to create a LocalKeystoreKeyManager (name tbd) as part of this KIP that allows configuring keys per topic pattern and will read the keys from a local file. This will provide encryption, but users would have to manually sync keystores across consumer and producer systems. Proper key management with rollover and retrieval from central vaults would come in a later phase. 4. I'm not 100% sure I follow your meaning here tbh. But I think the question may be academic in this first instance, as compression happens at batch level, so we can't encrypt at the record level after that. If we want to stick with encrypting individual records, that would have to happen pre-compression, unless I am mistaken about the internals here. Best regards, Sönke On Fri, 1 May 2020 at 18:19, Tom Bentley wrote: Hi Sönke, I never looked at the original version, but what you describe in the new version makes sense to me. Here are a few things which sprang to mind while I was reading: 1. It wasn't immediately obvious why this can't be achieved using custom serializers and deserializers. 2. It would be useful to fully define the Java interfaces you're talking about. 3 Would a KeyManager implementation be provided? 4. 
About compression+encryption: My understanding is CRIME used a chosen plaintext attack. AFAICS using compression would potentially allow a known plaintext attack, which is a weaker way of attacking a cipher. Even without compression in the picture, known plaintext attacks would be possible, for example if the attacker knew the key was JSON encoded.

Kind regards,

Tom

On Wed, Apr 29, 2020 at 12:32 AM Sönke Liebau wrote:

All,

I've asked for comments on this KIP in the past, but since I didn't really get any feedback I've decided to reduce the initial scope of the KIP a bit and try again.

I have reworked the KIP to provide a limited but useful set of features for this initial KIP and laid out a very rough roadmap of what I'd envision this looking like in a final version.

I am aware that the KIP is currently light on implementation details, but would like to get some feedback on the general approach before fully speccing everything.

The KIP can be found at
https://cwiki.apache.org/confluence/display/KAFKA/KIP-317%3A+Add+end-to-end+data+encryption+functionality+t
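To illustrate Mike's suggestion above that the payload be encrypted while headers stay readable and carry the encryption metadata, here is a minimal sketch. It rests on several assumptions: `SketchRecord` is a simplified stand-in and not Kafka's record class, the header names `enc.key.id` and `enc.iv` are made up for this example, and AES-GCM from the JDK's `javax.crypto` is just one possible cipher choice that neither the KIP nor KIP-82 prescribes.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

class HeaderMetadataSketch {
    // Hypothetical header names; nothing in the KIP or KIP-82 fixes these.
    static final String KEY_ID_HEADER = "enc.key.id";
    static final String IV_HEADER = "enc.iv";

    // Simplified stand-in for a Kafka record: clear-text headers plus a value.
    static final class SketchRecord {
        final Map<String, byte[]> headers = new LinkedHashMap<>();
        byte[] value;
    }

    // Encrypts only the value; key id and IV travel as additional clear-text headers.
    static SketchRecord encrypt(SketchRecord in, String keyId, SecretKey key) {
        try {
            byte[] iv = new byte[12];
            new SecureRandom().nextBytes(iv); // fresh IV per record
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
            SketchRecord out = new SketchRecord();
            out.headers.putAll(in.headers); // existing headers stay readable
            out.headers.put(KEY_ID_HEADER, keyId.getBytes(StandardCharsets.UTF_8));
            out.headers.put(IV_HEADER, iv);
            out.value = cipher.doFinal(in.value);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    // Reads the IV back from the headers and decrypts the value.
    static byte[] decrypt(SketchRecord in, SecretKey key) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key,
                    new GCMParameterSpec(128, in.headers.get(IV_HEADER)));
            return cipher.doFinal(in.value);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("decryption failed", e);
        }
    }

    public static void main(String[] args) {
        // All-zero demo key, never use this for real data.
        SecretKey key = new SecretKeySpec(new byte[16], "AES");
        SketchRecord plain = new SketchRecord();
        plain.headers.put("trace-id", "abc".getBytes(StandardCharsets.UTF_8));
        plain.value = "order-42".getBytes(StandardCharsets.UTF_8);

        SketchRecord encrypted = encrypt(plain, "key-1", key);
        // A bridging tool can still read "trace-id" without holding the key.
        System.out.println(new String(encrypted.headers.get("trace-id"), StandardCharsets.UTF_8));
        System.out.println(new String(decrypt(encrypted, key), StandardCharsets.UTF_8));
    }
}
```

The point of the layout is that a tool bridging messages between clusters only ever touches `headers`, so it needs neither the key nor any knowledge of the cipher.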
Re: [DISCUSS] KIP-317 - Add end-to-end data encryption functionality to Apache Kafka
Hi Sönke,

Replies inline

> 1. The functionality in this first phase could indeed be achieved with custom serializers, which would then need to wrap the actual serializer that is to be used. However, looking forward I intend to add functionality that allows encryption settings to be configured broker-side via topic-level configs and investigate encrypting entire batches of messages for performance. Both those things would require us to move past doing this in a serializer, so I think we should take that plunge now to avoid unnecessary refactoring later on.

I suspect you might have a hard time getting this KIP approved when the immediate use cases it serves can already be implemented using custom serialization. Having a working implementation using custom serialization would:

* prove there's interest in these features amongst end users
* prove that there's interest in the specific features which would require end-to-end encryption to be implemented in Kafka itself
* validate that the interfaces/abstractions in this proposal are the right ones

All of those things would strengthen the argument for getting this into Apache Kafka eventually.

> 2. Absolutely! I am currently working on a very (very) rough implementation to kind of prove the principle. I'll add those to the KIP as soon as I think they are in a somewhat final form. There are a lot of design details missing from the KIP; I didn't want to go all the way just for people to hate what I designed and have to start over ;)
>
> 3. Yes. I plan to create a LocalKeystoreKeyManager (name tbd) as part of this KIP that allows configuring keys per topic pattern and will read the keys from a local file. This will provide encryption, but users would have to manually sync keystores across consumer and producer systems. Proper key management with rollover and retrieval from central vaults would come in a later phase.

I think this is the hard part in many respects.
Having a working implementation for at least one key management system would presumably be a prerequisite for getting this merged. Even if this KIP got merged, I think it's likely that there would be a desire to limit the number of implementations of the interfaces within Apache Kafka because of the maintenance and testing burden. (We've seen this in other areas previously, ConfigProviders being one example.) So again, this suggests to me that you might make more progress implementing this outside Apache Kafka for the moment.

Having said all that, these are just my thoughts second-guessing what the community might do. I might be wrong.

Kind regards,

Tom
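For readers wondering what the custom-serialization route discussed above might look like, here is a rough sketch, not a proposal for the KIP. The nested `Serializer` interface is a local stand-in that only mirrors the `serialize(topic, data)` shape of Kafka's serializer interface so the example compiles without the Kafka client jar. `EncryptingSerializer`, the "IV followed by ciphertext" wire layout and the choice of AES-GCM are all assumptions made for illustration.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

class EncryptingSerializerSketch {
    static final int IV_LEN = 12;    // IV size commonly used with GCM
    static final int TAG_BITS = 128; // authentication tag length

    // Local stand-in mirroring the shape of Kafka's serialize(topic, data).
    interface Serializer<T> {
        byte[] serialize(String topic, T data);
    }

    // Hypothetical wrapper: delegates to the actual serializer, then encrypts the bytes.
    static final class EncryptingSerializer<T> implements Serializer<T> {
        private final Serializer<T> inner;
        private final SecretKey key;
        private final SecureRandom random = new SecureRandom();

        EncryptingSerializer(Serializer<T> inner, SecretKey key) {
            this.inner = inner;
            this.key = key;
        }

        @Override
        public byte[] serialize(String topic, T data) {
            byte[] plain = inner.serialize(topic, data);
            byte[] iv = new byte[IV_LEN];
            random.nextBytes(iv); // fresh IV per record
            try {
                Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
                cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
                byte[] ct = cipher.doFinal(plain);
                // Wire layout of this sketch: IV followed by ciphertext and tag.
                return ByteBuffer.allocate(IV_LEN + ct.length).put(iv).put(ct).array();
            } catch (GeneralSecurityException e) {
                throw new IllegalStateException("encryption failed", e);
            }
        }
    }

    // What a wrapping deserializer would run before delegating to the actual one.
    static byte[] decrypt(byte[] wire, SecretKey key) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, wire, 0, IV_LEN));
            return cipher.doFinal(wire, IV_LEN, wire.length - IV_LEN);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("decryption failed", e);
        }
    }

    public static void main(String[] args) {
        // All-zero demo key, never use this for real data.
        SecretKey key = new SecretKeySpec(new byte[16], "AES");
        Serializer<String> strings = (topic, s) -> s.getBytes(StandardCharsets.UTF_8);
        Serializer<String> encrypting = new EncryptingSerializer<>(strings, key);
        byte[] wire = encrypting.serialize("payments", "hello");
        System.out.println(new String(decrypt(wire, key), StandardCharsets.UTF_8));
    }
}
```

A wrapper like this is enough for per-record encryption, but it has no way to see a whole batch or to pick up topic-level settings from the broker, which is the limitation Sönke raises in point 1.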
Re: [DISCUSS] KIP-317 - Add end-to-end data encryption functionality to Apache Kafka
Hi Tom,

thanks for taking a look! Regarding your questions, I've answered below, but will also add more detail to the KIP around these questions.

1. The functionality in this first phase could indeed be achieved with custom serializers, which would then need to wrap the actual serializer that is to be used. However, looking forward I intend to add functionality that allows encryption settings to be configured broker-side via topic-level configs and investigate encrypting entire batches of messages for performance. Both those things would require us to move past doing this in a serializer, so I think we should take that plunge now to avoid unnecessary refactoring later on.

2. Absolutely! I am currently working on a very (very) rough implementation to kind of prove the principle. I'll add those to the KIP as soon as I think they are in a somewhat final form. There are a lot of design details missing from the KIP; I didn't want to go all the way just for people to hate what I designed and have to start over ;)

3. Yes. I plan to create a LocalKeystoreKeyManager (name tbd) as part of this KIP that allows configuring keys per topic pattern and will read the keys from a local file. This will provide encryption, but users would have to manually sync keystores across consumer and producer systems. Proper key management with rollover and retrieval from central vaults would come in a later phase.

4. I'm not 100% sure I follow your meaning here tbh. But I think the question may be academic in this first instance, as compression happens at batch level, so we can't encrypt at the record level after that. If we want to stick with encrypting individual records, that would have to happen pre-compression, unless I am mistaken about the internals here.

Best regards,
Sönke

On Fri, 1 May 2020 at 18:19, Tom Bentley wrote:

> Hi Sönke,
>
> I never looked at the original version, but what you describe in the new version makes sense to me.
>
> Here are a few things which sprang to mind while I was reading:
>
> 1. It wasn't immediately obvious why this can't be achieved using custom serializers and deserializers.
> 2. It would be useful to fully define the Java interfaces you're talking about.
> 3. Would a KeyManager implementation be provided?
> 4. About compression+encryption: My understanding is CRIME used a chosen plaintext attack. AFAICS using compression would potentially allow a known plaintext attack, which is a weaker way of attacking a cipher. Even without compression in the picture, known plaintext attacks would be possible, for example if the attacker knew the key was JSON encoded.
>
> Kind regards,
>
> Tom
>
> On Wed, Apr 29, 2020 at 12:32 AM Sönke Liebau wrote:
>
> > All,
> >
> > I've asked for comments on this KIP in the past, but since I didn't really get any feedback I've decided to reduce the initial scope of the KIP a bit and try again.
> >
> > I have reworked the KIP to provide a limited but useful set of features for this initial KIP and laid out a very rough roadmap of what I'd envision this looking like in a final version.
> >
> > I am aware that the KIP is currently light on implementation details, but would like to get some feedback on the general approach before fully speccing everything.
> >
> > The KIP can be found at
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-317%3A+Add+end-to-end+data+encryption+functionality+to+Apache+Kafka
> >
> > I would very much appreciate any feedback!
> >
> > Best regards,
> > Sönke

--
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany
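As a way of picturing the file-backed key manager from point 3 (keys configured per topic pattern, read from a local file), here is a small sketch. Everything concrete in it is an assumption: the class name, the one-line-per-entry format of `<topic-regex> <base64-AES-key>`, and the "first matching pattern wins" rule are invented for illustration, since the KIP does not yet define the interface.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.Optional;
import java.util.regex.Pattern;
import javax.crypto.SecretKey;
import javax.crypto.spec.SecretKeySpec;

class TopicPatternKeyManagerSketch {
    // One configured mapping from a topic regex to an AES key.
    private static final class Entry {
        final Pattern topics;
        final SecretKey key;

        Entry(Pattern topics, SecretKey key) {
            this.topics = topics;
            this.key = key;
        }
    }

    private final List<Entry> entries = new ArrayList<>();

    // Assumed (hypothetical) format: one "<topic-regex> <base64-AES-key>" pair per line.
    static TopicPatternKeyManagerSketch load(Reader source) {
        TopicPatternKeyManagerSketch manager = new TopicPatternKeyManagerSketch();
        try (BufferedReader in = new BufferedReader(source)) {
            String line;
            while ((line = in.readLine()) != null) {
                line = line.trim();
                if (line.isEmpty() || line.startsWith("#")) {
                    continue; // skip blank lines and comments
                }
                String[] parts = line.split("\\s+", 2);
                if (parts.length != 2) {
                    throw new IllegalArgumentException("expected '<regex> <key>': " + line);
                }
                byte[] raw = Base64.getDecoder().decode(parts[1]);
                manager.entries.add(new Entry(Pattern.compile(parts[0]), new SecretKeySpec(raw, "AES")));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return manager;
    }

    // First matching pattern wins; empty means no key is configured for the topic.
    Optional<SecretKey> keyFor(String topic) {
        for (Entry entry : entries) {
            if (entry.topics.matcher(topic).matches()) {
                return Optional.of(entry.key);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Demo keys only: 16 zero bytes, base64-encoded.
        String demoKey = Base64.getEncoder().encodeToString(new byte[16]);
        String file = "payments\\..* " + demoKey + "\n" + "audit " + demoKey + "\n";
        TopicPatternKeyManagerSketch manager = load(new StringReader(file));
        System.out.println(manager.keyFor("payments.eu").isPresent());
        System.out.println(manager.keyFor("clicks").isPresent());
    }
}
```

In practice the `Reader` would wrap the local key file, and the same file would have to be copied to every producer and consumer host, which is the manual syncing mentioned in point 3.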
Re: [DISCUSS] KIP-317 - Add end-to-end data encryption functionality to Apache Kafka
Hi Sönke,

I never looked at the original version, but what you describe in the new version makes sense to me.

Here are a few things which sprang to mind while I was reading:

1. It wasn't immediately obvious why this can't be achieved using custom serializers and deserializers.
2. It would be useful to fully define the Java interfaces you're talking about.
3. Would a KeyManager implementation be provided?
4. About compression+encryption: My understanding is CRIME used a chosen plaintext attack. AFAICS using compression would potentially allow a known plaintext attack, which is a weaker way of attacking a cipher. Even without compression in the picture, known plaintext attacks would be possible, for example if the attacker knew the key was JSON encoded.

Kind regards,

Tom

On Wed, Apr 29, 2020 at 12:32 AM Sönke Liebau wrote:

> All,
>
> I've asked for comments on this KIP in the past, but since I didn't really get any feedback I've decided to reduce the initial scope of the KIP a bit and try again.
>
> I have reworked the KIP to provide a limited but useful set of features for this initial KIP and laid out a very rough roadmap of what I'd envision this looking like in a final version.
>
> I am aware that the KIP is currently light on implementation details, but would like to get some feedback on the general approach before fully speccing everything.
>
> The KIP can be found at
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-317%3A+Add+end-to-end+data+encryption+functionality+to+Apache+Kafka
>
> I would very much appreciate any feedback!
>
> Best regards,
> Sönke
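On point 4, the concern with combining compression and encryption can be made concrete with a toy example. The sketch below is only an illustration of the length side channel that CRIME-style attacks rely on, in the chosen-plaintext setting where an attacker controls part of the data compressed alongside a secret. It is not an attack on Kafka, and the class name, the sample strings and the use of `java.util.zip.Deflater` with AES-GCM are my assumptions for the demo.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.zip.Deflater;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

class CompressionLeakSketch {
    // Compresses the input with the JDK's Deflater.
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[256];
        while (!deflater.finished()) {
            out.write(buffer, 0, deflater.deflate(buffer));
        }
        deflater.end();
        return out.toByteArray();
    }

    // Ciphertext length an eavesdropper would see when the secret and the
    // attacker-controlled text are compressed together and then encrypted.
    static int observedLength(String secret, String attackerControlled) {
        byte[] plain = (secret + "|" + attackerControlled).getBytes(StandardCharsets.UTF_8);
        try {
            byte[] iv = new byte[12];
            new SecureRandom().nextBytes(iv);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            // All-zero demo key, never use this for real data.
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(new byte[16], "AES"),
                    new GCMParameterSpec(128, iv));
            return cipher.doFinal(compress(plain)).length;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    public static void main(String[] args) {
        String secret = "token=s3cr3tvalue42";
        // Both guesses have the same length; only their overlap with the secret differs.
        System.out.println("matching guess: " + observedLength(secret, "token=s3cr3tvalue42"));
        System.out.println("wrong guess:    " + observedLength(secret, "token=qzxwjkbnmpl07"));
    }
}
```

Because AES-GCM adds only a constant overhead, any difference in compressed size shows up unchanged in the ciphertext length, so a guess that repeats the secret produces a shorter message than a wrong guess of the same length.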