@Ryanne
> Seems that could still get us per-topic keys (vs encrypting the entire
> volume), which would be my main requirement.

Agreed, I think that per-topic separation of keys would be very valuable
for multi-tenancy.
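
To make the per-topic key idea a bit more concrete, here is a rough sketch
(purely illustrative, not a proposed API - key storage/KMS integration is
hand-waved): every topic gets its own AES data key (DEK), and only a wrapped
copy of that DEK, encrypted with a master key (KEK), would ever be persisted.

    import javax.crypto.Cipher;
    import javax.crypto.KeyGenerator;
    import javax.crypto.SecretKey;
    import javax.crypto.spec.GCMParameterSpec;
    import java.nio.charset.StandardCharsets;
    import java.security.SecureRandom;

    public class PerTopicEnvelopeSketch {

        public static void main(String[] args) throws Exception {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(256);
            SecretKey kek = gen.generateKey();       // master key, e.g. held in a KMS
            SecretKey topicDek = gen.generateKey();  // per-topic data key

            // Wrap the topic DEK with the KEK; only the wrapped form is stored.
            Cipher wrap = Cipher.getInstance("AESWrap");
            wrap.init(Cipher.WRAP_MODE, kek);
            byte[] wrappedDek = wrap.wrap(topicDek);

            // Encrypt a record value with the topic DEK (fresh IV per record).
            byte[] iv = new byte[12];
            new SecureRandom().nextBytes(iv);
            Cipher enc = Cipher.getInstance("AES/GCM/NoPadding");
            enc.init(Cipher.ENCRYPT_MODE, topicDek, new GCMParameterSpec(128, iv));
            byte[] cipherText = enc.doFinal("record value".getBytes(StandardCharsets.UTF_8));

            System.out.printf("wrapped DEK: %d bytes, ciphertext: %d bytes%n",
                    wrappedDek.length, cipherText.length);
        }
    }

Rotating or revoking a tenant's access then only touches that topic's DEK (or
its wrapping), not the whole volume.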


My 2 cents is that if encryption at rest is sufficient to satisfy GDPR +
other similar data protection measures, then we should aim to do that
first. The demand is real and privacy laws won't likely be loosening any
time soon. That being said, I am not sufficiently familiar with the myriad
of data laws. I will look into it some more though, as I am now curious.


On Sat, May 9, 2020 at 6:12 PM Maulin Vasavada <maulin.vasav...@gmail.com>
wrote:

> Hi Sönke
>
> Thanks for bringing this up for discussion. There are a lot of considerations
> even if we assume we have end-to-end encryption done. For example, depending
> upon the company's setup there could be restrictions on how/which encryption
> keys are shared. The environment could have multiple security and network
> boundaries beyond which keys are not allowed to be shared. That will mean
> that consumers may not be able to decrypt the messages at all if the data
> is moved from one zone to another. If we have mirroring done, are
> mirror-makers supposed to decrypt and re-encrypt, or would they stay with
> the bytes-in/bytes-out paradigm they follow today? Also, a polyglot Kafka
> client base will force you to support encryption/decryption libraries for
> all of those languages, which may not be feasible depending upon the scope
> of the team owning the Kafka infrastructure.
>
> Combining disk encryption with TLS+ACLs could be enough instead of having
> end-to-end message level encryption. What is your opinion on that?
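>
> To make the comparison concrete, the kind of setup I mean would roughly be
> the following broker snippet (hostnames and paths are placeholders, and the
> OS-level volume encryption, e.g. dm-crypt/LUKS, happens outside of Kafka;
> keystore/truststore passwords omitted):
>
>     # log dirs placed on an encrypted volume (encryption handled by the OS)
>     log.dirs=/encrypted-volume/kafka-logs
>     # TLS for data in transit
>     listeners=SSL://broker1.example.com:9093
>     security.inter.broker.protocol=SSL
>     ssl.keystore.location=/var/private/ssl/broker.keystore.jks
>     ssl.truststore.location=/var/private/ssl/broker.truststore.jks
>     ssl.client.auth=required
>     # ACLs for authorization
>     authorizer.class.name=kafka.security.authorizer.AclAuthorizer
>     allow.everyone.if.no.acl.found=false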
>
> We have experimented with end-to-end encryption with custom
> serializers/deserializers and I felt that was good enough because
> other challenges I mentioned before may not be easy to address with a
> generic solution.
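>
> Just to illustrate the general shape of that approach (a simplified sketch,
> not our actual code - key management and the matching deserializer are
> omitted entirely):
>
>     import org.apache.kafka.common.serialization.Serializer;
>     import javax.crypto.Cipher;
>     import javax.crypto.SecretKey;
>     import javax.crypto.spec.GCMParameterSpec;
>     import java.nio.ByteBuffer;
>     import java.security.SecureRandom;
>
>     // Wraps any existing Serializer and AES-GCM-encrypts its output.
>     public class EncryptingSerializer<T> implements Serializer<T> {
>
>         private final Serializer<T> inner;
>         private final SecretKey key;
>         private final SecureRandom random = new SecureRandom();
>
>         public EncryptingSerializer(Serializer<T> inner, SecretKey key) {
>             this.inner = inner;
>             this.key = key;
>         }
>
>         @Override
>         public byte[] serialize(String topic, T data) {
>             try {
>                 byte[] plain = inner.serialize(topic, data);
>                 byte[] iv = new byte[12];
>                 random.nextBytes(iv);
>                 Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
>                 cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
>                 byte[] cipherText = cipher.doFinal(plain);
>                 // prepend the IV so the deserializer can read it back
>                 return ByteBuffer.allocate(iv.length + cipherText.length)
>                         .put(iv).put(cipherText).array();
>             } catch (Exception e) {
>                 throw new RuntimeException("encryption failed for topic " + topic, e);
>             }
>         }
>     }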
>
> Thanks
> Maulin
>
>
> On Sat, May 9, 2020 at 2:05 PM Ryanne Dolan <ryannedo...@gmail.com> wrote:
>
> > Adam, I agree, seems reasonable to limit the broker's responsibility to
> > encrypting only data at rest. I guess whole segment files could be
> > encrypted with the same key, and rotating keys would just involve
> > re-encrypting entire segments. Maybe a key rotation would involve closing
> > all affected segments and kicking off a background task to re-encrypt
> > them. Certainly that would not impede ingestion of new records, and seems
> > consumers could use the old segments until they are replaced with the
> > newly encrypted ones.
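> >
> > A rough sketch of what such a background task could do per segment file
> > (purely illustrative - no real broker/LogSegment APIs here, AES-CTR chosen
> > arbitrarily, and the new IV would have to be stored alongside the segment):
> >
> >     import javax.crypto.Cipher;
> >     import javax.crypto.CipherInputStream;
> >     import javax.crypto.CipherOutputStream;
> >     import javax.crypto.SecretKey;
> >     import javax.crypto.spec.IvParameterSpec;
> >     import java.io.FileInputStream;
> >     import java.io.FileOutputStream;
> >     import java.nio.file.*;
> >     import java.security.SecureRandom;
> >
> >     public class SegmentReencryptSketch {
> >
> >         // Decrypt a closed segment with the old key, re-encrypt with the new
> >         // key into a temp file, then atomically swap the files. Consumers can
> >         // keep reading the old copy until the swap happens.
> >         static void reencrypt(Path segment, SecretKey oldKey, byte[] oldIv,
> >                               SecretKey newKey) throws Exception {
> >             Path tmp = Paths.get(segment.toString() + ".reencrypt");
> >             byte[] newIv = new byte[16];
> >             new SecureRandom().nextBytes(newIv);
> >
> >             Cipher decrypt = Cipher.getInstance("AES/CTR/NoPadding");
> >             decrypt.init(Cipher.DECRYPT_MODE, oldKey, new IvParameterSpec(oldIv));
> >             Cipher encrypt = Cipher.getInstance("AES/CTR/NoPadding");
> >             encrypt.init(Cipher.ENCRYPT_MODE, newKey, new IvParameterSpec(newIv));
> >
> >             try (CipherInputStream in =
> >                          new CipherInputStream(new FileInputStream(segment.toFile()), decrypt);
> >                  CipherOutputStream out =
> >                          new CipherOutputStream(new FileOutputStream(tmp.toFile()), encrypt)) {
> >                 in.transferTo(out);   // Java 9+; copy via a byte[] buffer on Java 8
> >             }
> >             Files.move(tmp, segment, StandardCopyOption.ATOMIC_MOVE,
> >                     StandardCopyOption.REPLACE_EXISTING);
> >         }
> >     }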
> >
> > Seems that could still get us per-topic keys (vs encrypting the entire
> > volume), which would be my main requirement.
> >
> > Not really "end-to-end", but combined with TLS or something, seems
> > reasonable.
> >
> > Ryanne
> >
> > On Sat, May 9, 2020, 11:00 AM Adam Bellemare <adam.bellem...@gmail.com>
> > wrote:
> >
> > > Hi All
> > >
> > > I typed up a number of replies which I have below, but I have one major
> > > overriding question: Is there a reason we aren't implementing
> > > encryption-at-rest almost exactly the same way that most relational
> > > databases do? ie:
> > > https://wiki.postgresql.org/wiki/Transparent_Data_Encryption
> > >
> > > I ask this because it seems like we're going to end up with something
> > > similar to what they did in terms of requirements, plus...
> > >
> > > "For the *past 16 months*, there has been discussion about whether and
> > how
> > > to implement Transparent Data Encryption (tde) in Postgres. Many other
> > > relational databases support tde, and *some security standards require*
> > it.
> > > However, it is also debatable how much security value tde provides.
> > > The tde *400-email
> > > thread* became difficult for people to follow..."
> > >
> > > What still isn't clear to me is the scope that we're trying to cover here.
> > > Encryption at rest suggests that we need to have the data encrypted on the
> > > brokers, and *only* on the brokers, since they're the durable units of
> > > storage. Any encryption over the wire should be covered by TLS. I think
> > > that our goals for this should be (from
> > > https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#Threat_models)
> > >
> > > > TDE protects data from theft when file system access controls are
> > > > compromised:
> > > >
> > > >    - Malicious user steals storage devices and reads database files
> > > >    directly.
> > > >    - Malicious backup operator takes backup.
> > > >    - Protecting data at rest (persistent data)
> > > >
> > > > This does not protect from users who can read system memory, e.g., shared
> > > > buffers, which root users can do.
> > > >
> > >
> > > I am not a security expert nor am I an expert on relational databases.
> > > However, I can't identify any reason why the approach outlined by
> > > PostgreSQL, which is very similar to MySQL/InnoDB and IBM's (from my
> > > understanding), wouldn't work for data-at-rest encryption. In addition,
> > > we'd get the added benefit of being consistent with other solutions, which
> > > is an easier sell when discussing security with management ("Kafka? Oh
> > > yeah, their encryption solution is just like the one we already have in
> > > place for our Postgres solutions"), and may let us avoid reinventing a
> > > good part of the wheel.
> > >
> > >
> > > ------------------
> > >
> > > @Ryanne
> > > One more complicating factor, regarding joins - the foreign key joiner
> > > requires access to the value to extract the foreign key - if it's
> > > encrypted, the FKJ would need to decrypt it to apply the value extractor.
> > >
> > > @Sönke re (1)
> > > > When people hear that this is not part of Apache Kafka itself, but that
> > > > they would need to develop something themselves, that is more often than
> > > > not the end of that discussion. Using something that is not "stock" is
> > > > quite often simply not an option.
> > >
> > > > I strongly feel that this is a needed feature in Kafka and that there is
> > > > a large number of people out there that would want to use it - but I may
> > > > very well be mistaken, responses to this thread have not exactly been
> > > > plentiful this last year and a half..
> > >
> > > I agree with you on the default vs. non-default points made. We must all
> > > note that this mailing list is *not* representative of the typical users
> > > of Kafka, and that many organizations are predominantly looking to use
> > > out-of-the-box solutions. This will only become more common as hosted
> > > Kafka solutions (think AWS hosted Kafka) gain more traction. I think the
> > > goal of this KIP to provide that out-of-the-box experience is extremely
> > > important, especially for all the reasons noted so far (GDPR, privacy,
> > > financials, interest by many parties but no default solution).
> > >
> > > re: (4)
> > > >> Regarding plaintext data in RocksDB instances, I am a bit torn to be
> > > >> honest. On the one hand, I feel like this scenario is not something
> > > >> that we can fully control.
> > >
> > > I agree with this in principle. I think that our responsibility to encrypt
> > > data at rest ends the moment that data leaves the broker. That being said,
> > > the concern isn't unreasonable. I am going to think more about this and
> > > see if I can come up with something.
> > >
> > >
> > >
> > >
> > >
> > > On Fri, May 8, 2020 at 5:05 AM Sönke Liebau
> > > <soenke.lie...@opencore.com.invalid> wrote:
> > >
> > > > Hey everybody,
> > > >
> > > > thanks a lot for reading and giving feedback!! I'll try and answer all
> > > > points that I found going through the thread in this mail, but if I miss
> > > > something please feel free to let me know! I've added a running number to
> > > > the discussed topics for ease of reference down the road.
> > > >
> > > > I'll go through the KIP and update it with everything that I have written
> > > > below after sending this mail.
> > > >
> > > > @Tom:
> > > > (1) If I understand your concerns correctly you feel that this
> > > > functionality would have a hard time getting approved into Apache Kafka
> > > > because it can be achieved with custom Serializers in the same way, and
> > > > that we should maybe develop this outside of Apache Kafka at first.
> > > > I feel like it is precisely the fact that this is not part of core Apache
> > > > Kafka that makes people think twice about doing end-to-end encryption. I
> > > > may be working in a market (Germany) that is a bit special when compared
> > > > to the rest of the world where encryption and things like that are
> > > > concerned, but I've personally sat in multiple meetings where this
> > > > feature was discussed. It is not necessarily the end-to-end encryption
> > > > itself, but the at-rest encryption that you get with it.
> > > > When people hear that this is not part of Apache Kafka itself, but that
> > > > they would need to develop something themselves, that is more often than
> > > > not the end of that discussion. Using something that is not "stock" is
> > > > quite often simply not an option.
> > > > Even if they decide to go forward with it, they'll find Hendrik's blog
> > > > post from 4 years ago on this, probably the whitepapers from Confluent
> > > > and Lenses and maybe a few implementations on GitHub - all of which just
> > > > serve to further muddy the waters. Not because any of these resources are
> > > > bad or wrong, but just because information and implementations are spread
> > > > out over a lot of different places. Developing this outside of Apache
> > > > Kafka would simply serve to add one more item to this list that would not
> > > > really matter, I'm afraid.
> > > >
> > > > I strongly feel that this is a needed feature in Kafka and that there is
> > > > a large number of people out there that would want to use it - but I may
> > > > very well be mistaken, responses to this thread have not exactly been
> > > > plentiful this last year and a half..
> > > >
> > > > @Mike:
> > > > (2) Regarding the encryption of headers, my current idea is to keep this
> > > > configurable. I have seen customers use headers for stuff like account
> > > > numbers, which under the GDPR are considered to be personal data that
> > > > should be encrypted wherever possible. So in some instances it might be
> > > > useful to encrypt header fields as well.
> > > > My current PoC implementation allows specifying a regex for headers that
> > > > should be encrypted, which would allow having encrypted and unencrypted
> > > > headers in the same record to hopefully suit most use cases.
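> > > >
> > > > Roughly how I picture this working (illustrative only, not the PoC code;
> > > > the actual encryption call is stubbed out here):
> > > >
> > > >     import org.apache.kafka.common.header.Header;
> > > >     import org.apache.kafka.common.header.Headers;
> > > >     import java.util.ArrayList;
> > > >     import java.util.List;
> > > >     import java.util.regex.Pattern;
> > > >
> > > >     public class HeaderEncryptionSketch {
> > > >
> > > >         private final Pattern headersToEncrypt;
> > > >
> > > >         public HeaderEncryptionSketch(String regex) {
> > > >             this.headersToEncrypt = Pattern.compile(regex);
> > > >         }
> > > >
> > > >         // Replace the value of every header whose key matches the
> > > >         // configured pattern, leave all other headers untouched.
> > > >         public void encryptMatching(Headers headers) {
> > > >             List<Header> matching = new ArrayList<>();
> > > >             for (Header header : headers) {
> > > >                 if (headersToEncrypt.matcher(header.key()).matches()) {
> > > >                     matching.add(header);
> > > >                 }
> > > >             }
> > > >             for (Header header : matching) {
> > > >                 headers.remove(header.key());
> > > >                 headers.add(header.key(), encrypt(header.value()));
> > > >             }
> > > >         }
> > > >
> > > >         private byte[] encrypt(byte[] plain) {
> > > >             // placeholder - the real code would use the record's data key
> > > >             return plain;
> > > >         }
> > > >     }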
> > > >
> > > > (3) Also, my plan is to not change the message format, but to
> > > > "encrypt-in-place" and add a header field with the necessary information
> > > > for decryption, which would then be removed by the decrypting consumer.
> > > > There may be some out-of-date intentions still in the KIP, I'll go
> > > > through it and update.
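> > > >
> > > > As a sketch of that round trip (header name and byte layout are made up
> > > > purely for illustration):
> > > >
> > > >     import org.apache.kafka.common.header.Headers;
> > > >     import java.nio.ByteBuffer;
> > > >     import java.nio.charset.StandardCharsets;
> > > >
> > > >     public class EncryptionHeaderSketch {
> > > >
> > > >         static final String HEADER = "encryption.metadata"; // hypothetical name
> > > >
> > > >         // Producer side: attach everything the consumer needs to decrypt,
> > > >         // e.g. a key reference and the IV used for this record.
> > > >         static void attach(Headers headers, String keyId, byte[] iv) {
> > > >             byte[] keyIdBytes = keyId.getBytes(StandardCharsets.UTF_8);
> > > >             ByteBuffer buf = ByteBuffer.allocate(4 + keyIdBytes.length + iv.length);
> > > >             buf.putInt(keyIdBytes.length).put(keyIdBytes).put(iv);
> > > >             headers.add(HEADER, buf.array());
> > > >         }
> > > >
> > > >         // Consumer side: read the metadata, then strip the header so the
> > > >         // decrypted record looks like a regular one again.
> > > >         static byte[] readAndStrip(Headers headers) {
> > > >             byte[] metadata = headers.lastHeader(HEADER).value();
> > > >             headers.remove(HEADER);
> > > >             return metadata;
> > > >         }
> > > >     }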
> > > >
> > > > @Ryanne:
> > > > First off, I fully agree that we should avoid painting ourselves into a
> > > > corner with an early client-only implementation. I scaled down this KIP
> > > > from earlier attempts that included things like key rollover and
> > > > broker-side implementations because I could not get any feedback from the
> > > > community on those for a long time and felt that maybe there was no
> > > > appetite for the full-blown solution. So I decided to try with a more
> > > > limited scope. I am very happy to discuss/go for the fully featured
> > > > version again :)
> > > >
> > > > (4) Regarding plaintext data in RocksDB instances, I am a bit torn to be
> > > > honest. On the one hand, I feel like this scenario is not something that
> > > > we can fully control. Kafka Streams in this case is a client that takes
> > > > data from Kafka, decrypts it and then puts it somewhere in plaintext. To
> > > > me this scenario differs only slightly from, for example, someone writing
> > > > a backup job that reads a topic and writes it to a text file - not much
> > > > we can do about it.
> > > > That being said, Kafka Streams is part of Apache Kafka, so it does merit
> > > > special consideration. I'll have to dig into how StateStores are used a
> > > > bit (I am not the world's largest expert - or any kind of expert on that)
> > > > to try and come up with an idea.
> > > >
> > > >
> > > > (5) On key encryption and hashing, this is definitely an issue that we
> > > > need a solution for. I currently have key encryption configurable in my
> > > > implementation. When encryption is enabled, an option would of course be
> > > > to hash the original key and store the key data together with the value
> > > > in an encrypted form. Any salt added to the key before hashing could be
> > > > encrypted along with the data. This would allow all key-based
> > > > functionality like compaction, joins etc. to keep working without having
> > > > to know the cleartext key.
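> > > >
> > > > Roughly the shape I have in mind (sketch only, not a worked-out wire
> > > > format - note the salt would have to be stable per topic or per key,
> > > > otherwise the hashed key changes per record and compaction/joins break):
> > > >
> > > >     import javax.crypto.Cipher;
> > > >     import javax.crypto.SecretKey;
> > > >     import javax.crypto.spec.GCMParameterSpec;
> > > >     import java.nio.ByteBuffer;
> > > >     import java.security.MessageDigest;
> > > >     import java.security.SecureRandom;
> > > >
> > > >     public class HashedKeySketch {
> > > >
> > > >         // The broker only ever sees hash(salt || key) as the record key.
> > > >         static byte[] hashedKey(byte[] salt, byte[] originalKey) throws Exception {
> > > >             MessageDigest digest = MessageDigest.getInstance("SHA-256");
> > > >             digest.update(salt);
> > > >             digest.update(originalKey);
> > > >             return digest.digest();
> > > >         }
> > > >
> > > >         // Salt, original key and value travel together inside the
> > > >         // encrypted record value (naive length-prefixed layout).
> > > >         static byte[] encryptedValue(SecretKey dataKey, byte[] salt,
> > > >                                      byte[] originalKey, byte[] value) throws Exception {
> > > >             ByteBuffer plain = ByteBuffer.allocate(
> > > >                     4 + salt.length + 4 + originalKey.length + value.length);
> > > >             plain.putInt(salt.length).put(salt);
> > > >             plain.putInt(originalKey.length).put(originalKey);
> > > >             plain.put(value);
> > > >
> > > >             byte[] iv = new byte[12];
> > > >             new SecureRandom().nextBytes(iv);
> > > >             Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
> > > >             cipher.init(Cipher.ENCRYPT_MODE, dataKey, new GCMParameterSpec(128, iv));
> > > >             byte[] cipherText = cipher.doFinal(plain.array());
> > > >             // the IV would be carried in the decryption header next to the key reference
> > > >             return ByteBuffer.allocate(iv.length + cipherText.length)
> > > >                     .put(iv).put(cipherText).array();
> > > >         }
> > > >     }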
> > > >
> > > > I've also considered deterministic encryption, which would keep the
> > > > encrypted key the same, but I am fairly certain that we will want to
> > > > allow regular key rotation (more on this in the next paragraph) without
> > > > re-encrypting older data, and that would then change the encrypted key
> > > > and break all these things.
> > > > Regarding re-encrypting existing keys when a crypto key is compromised, I
> > > > think we need to be very careful with this if we do it in-place on the
> > > > broker. If we add functionality along the lines of compaction, which
> > > > reads, re-encrypts and rewrites segment files, we have to make sure that
> > > > producers choose partitions based on the cleartext value of the key,
> > > > otherwise all records starting from the key change may go to a different
> > > > partition of the topic..
> > > >
> > > > (6) Key rollover would be a cool feature to have. I was up until now only
> > > > thinking about supporting regular key rollover functionality that would
> > > > change keys for all records going forward, tbh - mostly for complexity
> > > > reasons - I think there was actually a sentence to this regard in the
> > > > original KIP. But if you and others feel this is needed then I am happy
> > > > to discuss this.
> > > > If we implement this on the broker we could use topic compaction for
> > > > inspiration: read all segment files and check records one by one, and if
> > > > the key used for a record has been "retired/compromised/...", re-encrypt
> > > > it with the new key and write a new segment file. Lots of things to
> > > > consider around this regarding performance, how to trigger it etc., but
> > > > in principle this could work I think.
> > > > One issue I can see with this is that if we use envelope encryption for
> > > > the keys to address the rogue admin issue, so the broker doesn't have
> > > > access to the actual key encrypting the data, this would make that
> > > > operation impossible.
> > > >
> > > >
> > > >
> > > > I hope I got to all items that were raised, but may very well have
> > > > overlooked something, please let me know if I did - and of course your
> > > > thoughts on what I wrote!
> > > >
> > > > I'll update the KIP today as well.
> > > >
> > > > Best regards,
> > > > Sönke
> > > >
> > > >
> > > >
> > > >
> > > > On Thu, 7 May 2020 at 19:54, Ryanne Dolan <ryannedo...@gmail.com>
> > wrote:
> > > >
> > > > > Tom, good point, I've done exactly that -- hashing record keys -- but
> > > > > it's unclear to me what should happen when the hash key must be
> > > > > rotated. In my case the (external) solution involved rainbow tables,
> > > > > versioned keys, and custom materializers that were aware of older keys
> > > > > for each record.
> > > > >
> > > > > In particular I had a pipeline that would re-key records and re-ingest
> > > > > them, while opportunistically overwriting records materialized with the
> > > > > old key.
> > > > >
> > > > > For a native solution I think maybe we'd need to carry around any old
> > > > > versions of each record key, perhaps as metadata. Then brokers and
> > > > > materializers can compact records based on _any_ overlapping key,
> > > > > maybe? Not sure.
> > > > >
> > > > > Ryanne
> > > > >
> > > > > On Thu, May 7, 2020, 12:05 PM Tom Bentley <tbent...@redhat.com>
> > wrote:
> > > > >
> > > > > > Hi Ryanne,
> > > > > >
> > > > > > You raise some good points there.
> > > > > >
> > > > > > > Similarly, if the whole record is encrypted, it becomes impossible
> > > > > > > to do joins, group bys etc, which just need the record key and
> > > > > > > maybe don't have access to the encryption key. Maybe only record
> > > > > > > _values_ should be encrypted, and maybe Kafka Streams could defer
> > > > > > > decryption until the actual value is inspected. That way joins etc
> > > > > > > are possible without the encryption key, and RocksDB would not need
> > > > > > > to decrypt values before materializing to disk.
> > > > > > >
> > > > > >
> > > > > > It's getting a bit late here, so maybe I overlooked something, but
> > > > > > wouldn't the natural thing to do be to make the "encrypted" key a
> > > > > > hash of the original key, and let the encrypted value be the cipher
> > > > > > text of the (original key, original value) pair? A scheme like this
> > > > > > would preserve equality of the key (strictly speaking there's a
> > > > > > chance of collision of course). I guess this could also be a solution
> > > > > > for the compacted topic issue Sönke mentioned.
> > > > > >
> > > > > > Cheers,
> > > > > >
> > > > > > Tom
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Thu, May 7, 2020 at 5:17 PM Ryanne Dolan <
> ryannedo...@gmail.com
> > >
> > > > > wrote:
> > > > > >
> > > > > > > Thanks Sönke, this is an area in which Kafka is really, really far
> > > > > > > behind.
> > > > > > >
> > > > > > > I've built secure systems around Kafka as laid out in the KIP. One
> > > > > > > issue that is not addressed in the KIP is re-encryption of records
> > > > > > > after a key rotation. When a key is compromised, it's important
> > > > > > > that any data encrypted using that key is immediately destroyed or
> > > > > > > re-encrypted with a new key. Ideally first-class support for
> > > > > > > end-to-end encryption in Kafka would make this possible natively,
> > > > > > > or else I'm not sure what the point would be. It seems to me that
> > > > > > > the brokers would need to be involved in this process, so perhaps a
> > > > > > > client-first approach will be painting ourselves into a corner. Not
> > > > > > > sure.
> > > > > > >
> > > > > > > Another issue is whether materialized tables, e.g. in Kafka
> > > > > > > Streams, would see unencrypted or encrypted records. If we
> > > > > > > implemented the KIP as written, it would still result in a bunch of
> > > > > > > plain text data in RocksDB everywhere. Again, I'm not sure what the
> > > > > > > point would be. Perhaps using custom serdes would actually be a
> > > > > > > more holistic approach, since Kafka Streams etc could leverage
> > > > > > > these as well.
> > > > > > >
> > > > > > > Similarly, if the whole record is encrypted, it becomes impossible
> > > > > > > to do joins, group bys etc, which just need the record key and
> > > > > > > maybe don't have access to the encryption key. Maybe only record
> > > > > > > _values_ should be encrypted, and maybe Kafka Streams could defer
> > > > > > > decryption until the actual value is inspected. That way joins etc
> > > > > > > are possible without the encryption key, and RocksDB would not need
> > > > > > > to decrypt values before materializing to disk.
> > > > > > >
> > > > > > > This is why I've implemented encryption on a per-field basis, not
> > > > > > > at the record level, when addressing Kafka security in the past.
> > > > > > > And I've had to build external pipelines that purge, re-encrypt,
> > > > > > > and re-ingest records when keys are compromised.
> > > > > > >
> > > > > > > This KIP might be a step in the right direction, not sure. But I'm
> > > > > > > hesitant to support the idea of end-to-end encryption without a
> > > > > > > plan to address the myriad other problems.
> > > > > > >
> > > > > > > That said, we need this badly and I hope something shakes out.
> > > > > > >
> > > > > > > Ryanne
> > > > > > >
> > > > > > > On Tue, Apr 28, 2020, 6:26 PM Sönke Liebau
> > > > > > > <soenke.lie...@opencore.com.invalid> wrote:
> > > > > > >
> > > > > > > > All,
> > > > > > > >
> > > > > > > > I've asked for comments on this KIP in the past, but since I
> > > > > > > > didn't really get any feedback I've decided to reduce the initial
> > > > > > > > scope of the KIP a bit and try again.
> > > > > > > >
> > > > > > > > I have reworked the KIP to provide a limited, but useful set of
> > > > > > > > features for this initial KIP and laid out a very rough roadmap
> > > > > > > > of what I'd envision this looking like in a final version.
> > > > > > > >
> > > > > > > > I am aware that the KIP is currently light on implementation
> > > > > > > > details, but would like to get some feedback on the general
> > > > > > > > approach before fully speccing everything.
> > > > > > > >
> > > > > > > > The KIP can be found at
> > > > > > > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-317%3A+Add+end-to-end+data+encryption+functionality+to+Apache+Kafka
> > > > > > > >
> > > > > > > > I would very much appreciate any feedback!
> > > > > > > >
> > > > > > > > Best regards,
> > > > > > > > Sönke
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > > >
> > > > --
> > > > Sönke Liebau
> > > > Partner
> > > > Tel. +49 179 7940878
> > > > OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany
> > > >
> > >
> >
>
