Tbh tom is right it is entirely possible to support end 2 end encryption today 
without broker or client changes with serializers. Infact i know many companies 
doing this.As such maybe a good approach would be to provide a default 
encryption and decryption serde thats able to be used rather than any client or 
broker changes at all. This way those who already have a working solution does 
not change and basically youre providing a default solution to those who have 
not already made one so that its easier to adopt.Sent from my Samsung Galaxy 
smartphone.
-------- Original message --------From: Sönke Liebau 
<soenke.lie...@opencore.com.INVALID> Date: 08/05/2020  10:05  (GMT+00:00) To: 
dev <dev@kafka.apache.org> Subject: Re: [DISCUSS] KIP-317 - Add end-to-end data 
encryption functionality to Apache Kafka Hey everybody,thanks a lot for reading 
and giving feedback!! I'll try and answer allpoints that I found going through 
the thread in this mail, but if I misssomething please feel free to let me 
know! I've added a running number tothe discussed topics for ease of reference 
down the road.I'll go through the KIP and update it with everything that I have 
writtenbelow after sending this mail.@Tom:(1) If I understand your concerns 
correctly you feel that thisfunctionality would have a hard time getting 
approved into Apache Kafkabecause it can be achieved with custom Serializers in 
the same way and thatwe should maybe develop this outside of Apache Kafka at 
first.I feel like it is precisely the fact that this is not part of core 
ApacheKafka that makes people think twice about doing end-to-end encryption. 
Imay be working in a market (Germany) that is a bit special when compared tothe 
rest of the world where encryption and things like that are concerned,but I've 
personally sat in multiple meetings where this feature wasdiscussed. It is not 
necessarily the end-to-end encryption itself, but theat-rest encryption that 
you get with it.When people hear that this is not part of Apache Kafka itself, 
but thatwould need to develop something themselves that more often than not is 
theend of that discussion. Using something that is not "stock" is quite 
oftensimply not an option.Even if they decide to go forward with it, they'll 
find Hendrik's blog postfrom 4 years ago on this, probably the Whitepapers from 
Confluent andLenses and maybe a few implementations on github - all of which 
just serveto further muddy the waters. Not because any of these resources are 
bad orwrong, but just because information and implementations are spread out 
overa lot of different places. Developing this outside of Apache Kafka 
wouldsimply serve to add one more item to this list that would not really 
matterI'm afraid.I strongly feel that this is a needed feature in Kafka and 
that there is alarge number of people out there that would want to use it - but 
I may verywell be mistaken, responses to this thread have not exactly been 
plentifulthis last year and a half..@Mike:(2) Regarding the encryption of 
headers, my current idea is to keep thisconfigurable. I have seen customers use 
headers for stuff like accountnumbers which under the GDPR are considered to be 
personal data that shouldbe encrypted wherever possible. So in some instances 
it might be useful toencrypt header fields as well.My current PoC 
implementation allows specifying a Regex for headers thatshould be encrypted, 
which would allow having encrypted and unencryptedheaders in the same record to 
hopefully suit most use cases.(3) Also, my plan is to not change the message 
format, but to"encrypt-in-place" and add a header field with the necessary 
informationfor decryption, which would then be removed by the decrypting 
consumer.There may be some out-of-date intentions still in the KIP, I'll go 
throughit and update.@Ryanne:First off, I fully agree that we should avoid 
painting ourselves into acorner with an early client-only implementation. I 
scaled down this Kipfrom earlier attempts that included things like key 
rollover andbroker-side implementations because I could not get any feedback 
from thecommunity on those for a long time and felt that maybe there was 
noappetite for the full-blown solution. So I decided to try with a morelimited 
scope. I am very happy to discuss/go for the fully featured versionagain :)(4) 
Regarding plaintext data in RocksDB instances, I am a bit torn to behonest. On 
the one hand, I feel like this scenario is not something that wecan fully 
control. Kafka Streams in this case is a client that takes datafrom Kafka, 
decrypts it and then puts it somewhere in plaintext. To me thisscenario differs 
only slightly from for example someone writing a backupjob that reads a topic 
and writes it to a textfile - not much we can doabout it.That being said, Kafka 
Streams is part of Apache Kafka, so does meritspecial consideration. I'll have 
to dig into how StateStores are used a bit(I am not the worlds largest expert - 
or any kind of expert on that) to tryand come up with an idea.(5) On key 
encryption and hashing, this is definitely an issue that we needa solution for. 
I currently have key encryption configurable in myimplementation. When 
encryption is enabled, an option would of course be tohash the original key and 
store the key data together with the value in anencrypted form. Any salt added 
to the key before hashing could be encryptedalong with the data. This would 
allow all key-based functionality likecompaction, joins etc. to keep working 
without having to know the cleartextkey.I've also considered deterministic 
encryption which would keep theencrypted key the same, but I am fairly certain 
that we will want to allowregular key rotation (more on this in next paragraph) 
without re-encryptingolder data and that would then change the encrypted key 
and break all thesethings.Regarding re-encrypting existing keys when a crypto 
key is compromised, Ithink we need to be very careful with this if we do it 
in-place on thebroker. If we add functionality along the lines of compaction, 
which readsre-encrypts and rewrites segment files we have to make sure that 
producerschose partitions on the cleartext value, otherwise all records 
startingfrom the key change may go to a different partition of the topic..(6) 
Key rollover would be a cool feature to have. I was up until now onlythinking 
about supporting regular key rollover functionality that wouldchange keys for 
all records going forward tbh - mostly for complexityreasons - I think there 
was actually a sentence in the original KIP to thisregard. But if you and 
others feel this is needed then I am happy todiscuss this.If we implement this 
on the broker we could use topic compaction forinspiration, read all segment 
files and check records one by one, if thekey used for that record has been 
"retired/compromised/..." re-encrypt withnew key and write a new segment file. 
Lots of things to consider aroundthis regarding performance, how to trigger 
etc. but in principle this couldwork I think.One issue I can see with this is 
if we use envelope encryption for the keysto address the rogue admin issue, so 
the broker doesn't have access to theactual key encrypting the data, this would 
make that operation impossible.I hope I got to all items that were raised, but 
may very well haveoverlooked something, please let me know if I did - and of 
course yourthoughts on what I wrote!I'll update the KIP today as well.Best 
regards,SönkeOn Thu, 7 May 2020 at 19:54, Ryanne Dolan <ryannedo...@gmail.com> 
wrote:> Tom, good point, I've done exactly that -- hashing record keys -- but 
it's> unclear to me what should happen when the hash key must be rotated. In 
my> case the (external) solution involved rainbow tables, versioned keys, and> 
custom materializers that were aware of older keys for each record.>> In 
particular I had a pipeline that would re-key records and re-ingest> them, 
while opportunistically overwriting records materialized with the old> key.>> 
For a native solution I think maybe we'd need to carry around any old> versions 
of each record key, perhaps as metadata. Then brokers and> materializers can 
compact records based on _any_ overlapping key, maybe?> Not sure.>> Ryanne>> On 
Thu, May 7, 2020, 12:05 PM Tom Bentley <tbent...@redhat.com> wrote:>> > Hi 
Rayanne,> >> > You raise some good points there.> >> > Similarly, if the whole 
record is encrypted, it becomes impossible to do> > > joins, group bys etc, 
which just need the record key and maybe don't> have> > > access to the 
encryption key. Maybe only record _values_ should be> > > encrypted, and maybe 
Kafka Streams could defer decryption until the> > actual> > > value is 
inspected. That way joins etc are possible without the> > encryption> > > key, 
and RocksDB would not need to decrypt values before materializing> to> > > 
disk.> > >> >> > It's getting a bit late here, so maybe I overlooked something, 
but> wouldn't> > the natural thing to do be to make the "encrypted" key a hash 
of the> > original key, and let the value of the encrypted value be the cipher 
text> > of the (original key, original value) pair. A scheme like this would> > 
preserve equality of the key (strictly speaking there's a chance of> > 
collision of course). I guess this could also be a solution for the> > 
compacted topic issue Sönke mentioned.> >> > Cheers,> >> > Tom> >> >> >> > On 
Thu, May 7, 2020 at 5:17 PM Ryanne Dolan <ryannedo...@gmail.com>> wrote:> >> > 
> Thanks Sönke, this is an area in which Kafka is really, really far> > 
behind.> > >> > > I've built secure systems around Kafka as laid out in the 
KIP. One> issue> > > that is not addressed in the KIP is re-encryption of 
records after a> key> > > rotation. When a key is compromised, it's important 
that any data> > encrypted> > > using that key is immediately destroyed or 
re-encrypted with a new key.> > > Ideally first-class support for end-to-end 
encryption in Kafka would> make> > > this possible natively, or else I'm not 
sure what the point would be.> It> > > seems to me that the brokers would need 
to be involved in this process,> > so> > > perhaps a client-first approach will 
be painting ourselves into a> corner.> > > Not sure.> > >> > > Another issue is 
whether materialized tables, e.g. in Kafka Streams,> > would> > > see 
unencrypted or encrypted records. If we implemented the KIP as> > written,> > > 
it would still result in a bunch of plain text data in RocksDB> > everywhere.> 
> > Again, I'm not sure what the point would be. Perhaps using custom> serdes> 
> > would actually be a more holistic approach, since Kafka Streams etc> could> 
> > leverage these as well.> > >> > > Similarly, if the whole record is 
encrypted, it becomes impossible to> do> > > joins, group bys etc, which just 
need the record key and maybe don't> have> > > access to the encryption key. 
Maybe only record _values_ should be> > > encrypted, and maybe Kafka Streams 
could defer decryption until the> > actual> > > value is inspected. That way 
joins etc are possible without the> > encryption> > > key, and RocksDB would 
not need to decrypt values before materializing> to> > > disk.> > >> > > This 
is why I've implemented encryption on a per-field basis, not at> the> > > 
record level, when addressing kafka security in the past. And I've had> to> > > 
build external pipelines that purge, re-encrypt, and re-ingest records> > when> 
> > keys are compromised.> > >> > > This KIP might be a step in the right 
direction, not sure. But I'm> > hesitant> > > to support the idea of end-to-end 
encryption without a plan to address> > the> > > myriad other problems.> > >> > 
> That said, we need this badly and I hope something shakes out.> > >> > > 
Ryanne> > >> > > On Tue, Apr 28, 2020, 6:26 PM Sönke Liebau> > > 
<soenke.lie...@opencore.com.invalid> wrote:> > >> > > > All,> > > >> > > > I've 
asked for comments on this KIP in the past, but since I didn't> > > really> > > 
> get any feedback I've decided to reduce the initial scope of the KIP> a> > > 
bit> > > > and try again.> > > >> > > > I have reworked to KIP to provide a 
limited, but useful set of> features> > > for> > > > this initial KIP and laid 
out a very rough roadmap of what I'd> envision> > > > this looking like in a 
final version.> > > >> > > > I am aware that the KIP is currently light on 
implementation details,> > but> > > > would like to get some feedback on the 
general approach before fully> > > > speccing everything.> > > >> > > > The KIP 
can be found at> > > >> > > >> > >> >> 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-317%3A+Add+end-to-end+data+encryption+functionality+to+Apache+Kafka>
 > > >> > > >> > > > I would very much appreciate any feedback!> > > >> > > > 
Best regards,> > > > Sönke> > > >> > >> >>-- Sönke LiebauPartnerTel. +49 179 
7940878OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany

Reply via email to