Re: [DISCUSS] KIP-317 - Add end-to-end data encryption functionality to Apache Kafka

2020-05-16 Thread Ryanne Dolan
Maybe worth taking a look at TDE in HDFS:

https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/TransparentEncryption.html

A complete solution requires several Hadoop services. I suspect that would
scare the Kafka community a bit, but maybe it's unreasonable to expect
Kafka brokers to do all we've mentioned.

Of particular note, TDE seems to use multiple layers of keys to avoid
re-encrypting data when keys are rotated, if I understand correctly.
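
That layering is basically envelope encryption. A quick Java sketch of the idea
(nothing HDFS-specific, names made up, not proposing this as the implementation):

import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

public class EnvelopeSketch {

    // Encrypt the payload with a fresh data key (DEK), then wrap only the DEK
    // with the master key (KEK). Sketch only: a real version would use an AEAD
    // mode such as AES/GCM with a proper IV instead of plain "AES".
    static byte[][] encrypt(byte[] plaintext, SecretKey kek) throws Exception {
        SecretKey dek = KeyGenerator.getInstance("AES").generateKey();
        Cipher data = Cipher.getInstance("AES");
        data.init(Cipher.ENCRYPT_MODE, dek);
        byte[] ciphertext = data.doFinal(plaintext);

        Cipher wrap = Cipher.getInstance("AESWrap");
        wrap.init(Cipher.WRAP_MODE, kek);
        return new byte[][] { ciphertext, wrap.wrap(dek) };
    }

    // Rotating the KEK only re-wraps the small DEK; the bulk ciphertext is untouched.
    static byte[] rotate(byte[] wrappedDek, SecretKey oldKek, SecretKey newKek) throws Exception {
        Cipher unwrap = Cipher.getInstance("AESWrap");
        unwrap.init(Cipher.UNWRAP_MODE, oldKek);
        SecretKey dek = (SecretKey) unwrap.unwrap(wrappedDek, "AES", Cipher.SECRET_KEY);

        Cipher rewrap = Cipher.getInstance("AESWrap");
        rewrap.init(Cipher.WRAP_MODE, newKek);
        return rewrap.wrap(dek);
    }
}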

Ryanne

On Sat, May 16, 2020, 9:04 AM Adam Bellemare 
wrote:

> Hi Sönke
>
> I've been giving it more thought over the last few days, and looking into
> other systems as well, and I think that I've derailed your proposal a bit
> with suggesting that at-rest encryption may be sufficient. I believe that
> many of us are lacking the context of the sorts of discussions you have had
> with stakeholders concerned about encryption. Anyway, a very brief
> summary of my thoughts:
>
> 1) We should look to do encryption at-rest, but it should be outside the
> scope of this KIP. (Is disk encryption as provided by the OS or cloud
> provider sufficient?)
>
> 2) For end-to-end encryption, the part that concerns me is the various
> roles that the broker may play in this plan. For instance, in Phase 2:
>
> > This phase will concentrate on server-side configuration of encryption.
> Topic settings will be added that allow the specification of encryption
> settings that consumers and producers should use. Producers and Consumers
> will be enabled to fetch these settings and use them for encryption without
> the end-user having to configure anything in addition.
>
> > Brokers will be extended with pluggable Key Managers that will allow for
> automatic key rotation later on. A basic, keystore based implementation
> will be created.
> Again, I am not a security expert, but it seems to me that if we want
> end-to-end encryption on par with the sort of encryption we see in our
> RelationalDB cousins, it would require that the broker (which could be
> hosted remotely, with a potentially malicious admin) have no knowledge of
> any of the keys, nor be responsible for any sort of key rotation. I believe
> that all of this would be required to be handled by the clients themselves
> (though please correct me if I am misinterpreting this), and that to reduce
> attack surface possibilities, we should handle the encryption + decryption
> keys in a manner similar to how we handle TLS keys (client must supply
> their own).
>
> Ryanne does point out that automatic key-rotation of end-to-end encrypted
> data would be an incredibly useful feature to have. However, I am not sure
> how to square this against what is done with relational databases, as it
> seems that they require that the client perform any updates or changes to
> the encryption keys and data and wash their hands completely of that duty
> (which makes sense - keep the database out of it, reduce the attack
> surface). End-to-end, by definition requires that the broker be unable to
> decrypt any of the data, and having it responsible for rolling keys, while
> seemingly useful, does deftly throw end-to-end out the window.
>
> Final Q:
> Would it be reasonable to create a new optional service in the Kafka
> project that is strictly responsible for these sorts of encryption matters?
> Something like Confluent's schema registry, but as a mechanism for
> coordinating key rotations with clients, encryption key registrations per
> topic, etc.? KeyManager would plug into here, could use Kafka as the
> storage layer for the keys (as we do with schemas, but encrypted themselves
> of course) or use the whole thing as just a thin layer over a full blown
> remote KeyManager that simply coordinates the producers, consumers, and
> keys required for the data per topic. This independent service would give
> organizations the ability to host it locally for security purposes, while
> farming out the brokers to perhaps less trustworthy sources?
>
> Adam
>
>
>
>
>
>
> On Sun, May 10, 2020 at 7:52 PM Adam Bellemare 
> wrote:
>
> > @Ryanne
> > > Seems that could still get us per-topic keys (vs encrypting the entire
> > > volume), which would be my main requirement.
> >
> > Agreed, I think that per-topic separation of keys would be very valuable
> > for multi-tenancy.
> >
> >
> > My 2 cents is that if encryption at rest is sufficient to satisfy GDPR +
> > other similar data protection measures, then we should aim to do that
> > first. The demand is real and privacy laws won't likely be loosening any
> > time soon. That being said, I am not sufficiently familiar with the
> myriad
> > of data laws. I will look into it some more though, as I am now curious.
> >
> >
> > On Sat, May 9, 2020 at 6:12 PM Maulin Vasavada <
> maulin.vasav...@gmail.com>
> > wrote:
> >
> >> Hi Sonke
> >>
> >> Thanks for bringing this up for discussion. There are a lot of considerations
> >> even if we assume we have end-to-end encryption done. Example depending
> >> upon company's setup there cou

Re: [DISCUSS] KIP-317 - Add end-to-end data encryption functionality to Apache Kafka

2020-05-16 Thread Adam Bellemare
Hi Sönke

I've been giving it more thought over the last few days, and looking into
other systems as well, and I think that I've derailed your proposal a bit
with suggesting that at-rest encryption may be sufficient. I believe that
many of us are lacking the context of the sorts of discussions you have had
with stakeholders concerned about encryption. Anyway, a very brief
summary of my thoughts:

1) We should look to do encryption at-rest, but it should be outside the
scope of this KIP. (Is disk encryption as provided by the OS or cloud
provider sufficient?)

2) For end-to-end encryption, the part that concerns me is the various
roles that the broker may play in this plan. For instance, in Phase 2:

> This phase will concentrate on server-side configuration of encryption.
Topic settings will be added that allow the specification of encryption
settings that consumers and producers should use. Producers and Consumers
will be enabled to fetch these settings and use them for encryption without
the end-user having to configure anything in addition.

> Brokers will be extended with pluggable Key Managers that will allow for
automatic key rotation later on. A basic, keystore based implementation
will be created.

Again, I am not a security expert, but it seems to me that if we want
end-to-end encryption on par with the sort of encryption we see in our
RelationalDB cousins, it would require that the broker (which could be
hosted remotely, with a potentially malicious admin) have no knowledge of
any of the keys, nor be responsible for any sort of key rotation. I believe
that all of this would be required to be handled by the clients themselves
(though please correct me if I am misinterpreting this), and that to reduce
attack surface possibilities, we should handle the encryption + decryption
keys in a manner similar to how we handle TLS keys (client must supply
their own).

Ryanne does point out that automatic key-rotation of end-to-end encrypted
data would be an incredibly useful feature to have. However, I am not sure
how to square this against what is done with relational databases, as it
seems that they require that the client perform any updates or changes to
the encryption keys and data and wash their hands completely of that duty
(which makes sense - keep the database out of it, reduce the attack
surface). End-to-end, by definition requires that the broker be unable to
decrypt any of the data, and having it responsible for rolling keys, while
seemingly useful, does deftly throw end-to-end out the window.

Final Q:
Would it be reasonable to create a new optional service in the Kafka
project that is strictly responsible for these sorts of encryption matters?
Something like Confluent's schema registry, but as a mechanism for
coordinating key rotations with clients, encryption key registrations per
topic, etc.? The KeyManager would plug in here; it could use Kafka as the
storage layer for the keys (as we do with schemas, but encrypted themselves
of course), or the whole thing could be just a thin layer over a full-blown
remote KeyManager that simply coordinates the producers, consumers, and
keys required for the data per topic. This independent service would give
organizations the ability to host it locally for security purposes, while
farming out the brokers to perhaps less trustworthy sources?

Adam






On Sun, May 10, 2020 at 7:52 PM Adam Bellemare 
wrote:

> @Ryanne
> > Seems that could still get us per-topic keys (vs encrypting the entire
> > volume), which would be my main requirement.
>
> Agreed, I think that per-topic separation of keys would be very valuable
> for multi-tenancy.
>
>
> My 2 cents is that if encryption at rest is sufficient to satisfy GDPR +
> other similar data protection measures, then we should aim to do that
> first. The demand is real and privacy laws won't likely be loosening any
> time soon. That being said, I am not sufficiently familiar with the myriad
> of data laws. I will look into it some more though, as I am now curious.
>
>
> On Sat, May 9, 2020 at 6:12 PM Maulin Vasavada 
> wrote:
>
>> Hi Sonke
>>
> Thanks for bringing this up for discussion. There are a lot of considerations
>> even if we assume we have end-to-end encryption done. Example depending
>> upon company's setup there could be restrictions on how/which encryption
>> keys are shared. Environment could have multiple security and network
>> boundaries beyond which keys are not allowed to be shared. That will mean
>> that consumers may not be able to decrypt the messages at all if the data
> is moved from one zone to another. If we have mirroring, are mirror-makers
> supposed to decrypt and re-encrypt, or would they stay the pretty much
> bytes-in, bytes-out pass-through that they are today? Also having a polyglot
>> Kafka client base will force you to support encryption/decryption
>> libraries
>> that work for all the languages and that may not work depending upon the
>> scope of the team owning Kafka Infrastructure.
>>
>

Re: [DISCUSS] KIP-317 - Add end-to-end data encryption functionality to Apache Kafka

2020-05-10 Thread Adam Bellemare
@Ryanne
> Seems that could still get us per-topic keys (vs encrypting the entire
> volume), which would be my main requirement.

Agreed, I think that per-topic separation of keys would be very valuable
for multi-tenancy.


My 2 cents is that if encryption at rest is sufficient to satisfy GDPR +
other similar data protection measures, then we should aim to do that
first. The demand is real and privacy laws won't likely be loosening any
time soon. That being said, I am not sufficiently familiar with the myriad
of data laws. I will look into it some more though, as I am now curious.


On Sat, May 9, 2020 at 6:12 PM Maulin Vasavada 
wrote:

> Hi Sonke
>
> Thanks for bringing this up for discussion. There are a lot of considerations
> even if we assume we have end-to-end encryption done. Example depending
> upon company's setup there could be restrictions on how/which encryption
> keys are shared. Environment could have multiple security and network
> boundaries beyond which keys are not allowed to be shared. That will mean
> that consumers may not be able to decrypt the messages at all if the data
> is moved from one zone to another. If we have mirroring, are mirror-makers
> supposed to decrypt and re-encrypt, or would they stay the pretty much
> bytes-in, bytes-out pass-through that they are today? Also having a polyglot
> Kafka client base will force you to support encryption/decryption libraries
> that work for all the languages and that may not work depending upon the
> scope of the team owning Kafka Infrastructure.
>
> Combining disk encryption with TLS+ACLs could be enough instead of having
> end-to-end message level encryption. What is your opinion on that?
>
> We have experimented with end-to-end encryption with custom
> serializers/deserializers and I felt that was good enough because
> other challenges I mentioned before may not be easy to address with a
> generic solution.
>
> Thanks
> Maulin
>
>
>
> Thanks
> Maulin
>
>
>
>
> On Sat, May 9, 2020 at 2:05 PM Ryanne Dolan  wrote:
>
> > Adam, I agree, seems reasonable to limit the broker's responsibility to
> > encrypting only data at rest. I guess whole segment files could be
> > encrypted with the same key, and rotating keys would just involve
> > re-encrypting entire segments. Maybe a key rotation would involve closing
> > all affected segments and kicking off a background task to re-encrypt
> them.
> > Certainly that would not impede ingestion of new records, and seems
> > consumers could use the old segments until they are replaced with the
> newly
> > encrypted ones.
> >
> > Seems that could still get us per-topic keys (vs encrypting the entire
> > volume), which would be my main requirement.
> >
> > Not really "end-to-end", but combined with TLS or something, seems
> > reasonable.
> >
> > Ryanne
> >
> > On Sat, May 9, 2020, 11:00 AM Adam Bellemare 
> > wrote:
> >
> > > Hi All
> > >
> > > I typed up a number of replies which I have below, but I have one major
> > > overriding question: Is there a reason we aren't implementing
> > > encryption-at-rest almost exactly the same way that most relational
> > > databases do? ie:
> > > https://wiki.postgresql.org/wiki/Transparent_Data_Encryption
> > >
> > > I ask this because it seems like we're going to end up with something
> > > similar to what they did in terms of requirements, plus...
> > >
> > > "For the *past 16 months*, there has been discussion about whether and
> > how
> > > to implement Transparent Data Encryption (tde) in Postgres. Many other
> > > relational databases support tde, and *some security standards require*
> > it.
> > > However, it is also debatable how much security value tde provides.
> > > The tde *400-email
> > > thread* became difficult for people to follow..."
> > > What still isn't clear to me is the scope that we're trying to cover
> > here.
> > > Encryption at rest suggests that we need to have the data encrypted on
> > the
> > > brokers, and *only* on the brokers, since they're the durable units of
> > > storage. Any encryption over the wire should be covered by TLS.  I
> think
> > > that our goals for this should be (from
> > >
> >
> https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#Threat_models
> > > )
> > >
> > > > TDE protects data from theft when file system access controls are
> > > > compromised:
> > > >
> > > >- Malicious user steals storage devices and reads database files
> > > >directly.
> > > >- Malicious backup operator takes backup.
> > > >- Protecting data at rest (persistent data)
> > > >
> > > > This does not protect from users who can read system memory, e.g.,
> > shared
> > > > buffers, which root users can do.
> > > >
> > >
> > > I am not a security expert nor am I an expert on relational databases.
> > > However, I can't identify any reason why the approach outlined by
> > > PostgresDB, which is very similar to MySQL/InnoDB and IBM (from my
> > > understanding) wouldn't work for data-at-rest encryption. In addition,
> > we'd
> > > get 

Re: [DISCUSS] KIP-317 - Add end-to-end data encryption functionality to Apache Kafka

2020-05-09 Thread Maulin Vasavada
Hi Sonke

Thanks for bringing this up for discussion. There are a lot of considerations
even if we assume we have end-to-end encryption done. For example, depending
upon a company's setup there could be restrictions on how/which encryption
keys are shared. An environment could have multiple security and network
boundaries beyond which keys are not allowed to be shared. That will mean
that consumers may not be able to decrypt the messages at all if the data
is moved from one zone to another. If we have mirroring, are mirror-makers
supposed to decrypt and re-encrypt, or would they stay the pretty much
bytes-in, bytes-out pass-through that they are today? Also, having a polyglot
Kafka client base will force you to support encryption/decryption libraries
that work for all the languages, which may not be feasible depending upon the
scope of the team owning the Kafka infrastructure.

Combining disk encryption with TLS+ACLs could be enough instead of having
end-to-end message level encryption. What is your opinion on that?

We have experimented with end-to-end encryption with custom
serializers/deserializers and I felt that was good enough because
other challenges I mentioned before may not be easy to address with a
generic solution.

Thanks
Maulin



Thanks
Maulin




On Sat, May 9, 2020 at 2:05 PM Ryanne Dolan  wrote:

> Adam, I agree, seems reasonable to limit the broker's responsibility to
> encrypting only data at rest. I guess whole segment files could be
> encrypted with the same key, and rotating keys would just involve
> re-encrypting entire segments. Maybe a key rotation would involve closing
> all affected segments and kicking off a background task to re-encrypt them.
> Certainly that would not impede ingestion of new records, and seems
> consumers could use the old segments until they are replaced with the newly
> encrypted ones.
>
> Seems that could still get us per-topic keys (vs encrypting the entire
> volume), which would be my main requirement.
>
> Not really "end-to-end", but combined with TLS or something, seems
> reasonable.
>
> Ryanne
>
> On Sat, May 9, 2020, 11:00 AM Adam Bellemare 
> wrote:
>
> > Hi All
> >
> > I typed up a number of replies which I have below, but I have one major
> > overriding question: Is there a reason we aren't implementing
> > encryption-at-rest almost exactly the same way that most relational
> > databases do? ie:
> > https://wiki.postgresql.org/wiki/Transparent_Data_Encryption
> >
> > I ask this because it seems like we're going to end up with something
> > similar to what they did in terms of requirements, plus...
> >
> > "For the *past 16 months*, there has been discussion about whether and
> how
> > to implement Transparent Data Encryption (tde) in Postgres. Many other
> > relational databases support tde, and *some security standards require*
> it.
> > However, it is also debatable how much security value tde provides.
> > The tde *400-email
> > thread* became difficult for people to follow..."
> > What still isn't clear to me is the scope that we're trying to cover
> here.
> > Encryption at rest suggests that we need to have the data encrypted on
> the
> > brokers, and *only* on the brokers, since they're the durable units of
> > storage. Any encryption over the wire should be covered by TLS.  I think
> > that our goals for this should be (from
> >
> https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#Threat_models
> > )
> >
> > > TDE protects data from theft when file system access controls are
> > > compromised:
> > >
> > >- Malicious user steals storage devices and reads database files
> > >directly.
> > >- Malicious backup operator takes backup.
> > >- Protecting data at rest (persistent data)
> > >
> > > This does not protect from users who can read system memory, e.g.,
> shared
> > > buffers, which root users can do.
> > >
> >
> > I am not a security expert nor am I an expert on relational databases.
> > However, I can't identify any reason why the approach outlined by
> > PostgresDB, which is very similar to MySQL/InnoDB and IBM (from my
> > understanding) wouldn't work for data-at-rest encryption. In addition,
> we'd
> > get the added benefit of being consistent with other solutions, which is
> an
> > easier sell when discussing security with management (Kafka? Oh yeah,
> their
> > encryption solution is just like the one we already have in place for our
> > Postgres solutions), and may let us avoid reinventing a good part of the
> > wheel.
> >
> >
> > --
> >
> > @Ryanne
> > One more complicating factor, regarding joins - the foreign key joiner
> > requires access to the value to extract the foreign key - if it's
> > encrypted, the FKJ would need to decrypt it to apply the value extractor.
> >
> > @Sönke re (1)
> > > When people hear that this is not part of Apache Kafka itself, but that
> > > they would need to develop something themselves, that more often than not is
> > the
> > > end of that discussion. Using something that is not "stock" is quit

Re: [DISCUSS] KIP-317 - Add end-to-end data encryption functionality to Apache Kafka

2020-05-09 Thread Ryanne Dolan
Adam, I agree, seems reasonable to limit the broker's responsibility to
encrypting only data at rest. I guess whole segment files could be
encrypted with the same key, and rotating keys would just involve
re-encrypting entire segments. Maybe a key rotation would involve closing
all affected segments and kicking off a background task to re-encrypt them.
Certainly that would not impede ingestion of new records, and seems
consumers could use the old segments until they are replaced with the newly
encrypted ones.
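
Something along these lines, very hand-wavy (hypothetical sketch, not an existing
broker API; a real version would use an AEAD mode and handle indexes, flushing
and atomicity properly):

import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;

public class SegmentReencryptor {

    // Background task for one closed segment: decrypt with the old key, rewrite
    // under the new key, then swap the files. Consumers keep reading the old file
    // until the (ideally atomic) move replaces it.
    static void reencrypt(Path segment, SecretKey oldKey, SecretKey newKey) throws Exception {
        Cipher decrypt = Cipher.getInstance("AES");
        decrypt.init(Cipher.DECRYPT_MODE, oldKey);
        byte[] plain = decrypt.doFinal(Files.readAllBytes(segment));

        Cipher encrypt = Cipher.getInstance("AES");
        encrypt.init(Cipher.ENCRYPT_MODE, newKey);
        Path tmp = segment.resolveSibling(segment.getFileName() + ".reencrypting");
        Files.write(tmp, encrypt.doFinal(plain));

        Files.move(tmp, segment, StandardCopyOption.REPLACE_EXISTING);
    }
}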

Seems that could still get us per-topic keys (vs encrypting the entire
volume), which would be my main requirement.

Not really "end-to-end", but combined with TLS or something, seems
reasonable.

Ryanne

On Sat, May 9, 2020, 11:00 AM Adam Bellemare 
wrote:

> Hi All
>
> I typed up a number of replies which I have below, but I have one major
> overriding question: Is there a reason we aren't implementing
> encryption-at-rest almost exactly the same way that most relational
> databases do? ie:
> https://wiki.postgresql.org/wiki/Transparent_Data_Encryption
>
> I ask this because it seems like we're going to end up with something
> similar to what they did in terms of requirements, plus...
>
> "For the *past 16 months*, there has been discussion about whether and how
> to implement Transparent Data Encryption (tde) in Postgres. Many other
> relational databases support tde, and *some security standards require* it.
> However, it is also debatable how much security value tde provides.
> The tde *400-email
> thread* became difficult for people to follow..."
> What still isn't clear to me is the scope that we're trying to cover here.
> Encryption at rest suggests that we need to have the data encrypted on the
> brokers, and *only* on the brokers, since they're the durable units of
> storage. Any encryption over the wire should be covered by TLS.  I think
> that our goals for this should be (from
> https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#Threat_models
> )
>
> > TDE protects data from theft when file system access controls are
> > compromised:
> >
> >- Malicious user steals storage devices and reads database files
> >directly.
> >- Malicious backup operator takes backup.
> >- Protecting data at rest (persistent data)
> >
> > This does not protect from users who can read system memory, e.g., shared
> > buffers, which root users can do.
> >
>
> I am not a security expert nor am I an expert on relational databases.
> However, I can't identify any reason why the approach outlined by
> PostgresDB, which is very similar to MySQL/InnoDB and IBM (from my
> understanding) wouldn't work for data-at-rest encryption. In addition, we'd
> get the added benefit of being consistent with other solutions, which is an
> easier sell when discussing security with management (Kafka? Oh yeah, their
> encryption solution is just like the one we already have in place for our
> Postgres solutions), and may let us avoid reinventing a good part of the
> wheel.
>
>
> --
>
> @Ryanne
> One more complicating factor, regarding joins - the foreign key joiner
> requires access to the value to extract the foreign key - if it's
> encrypted, the FKJ would need to decrypt it to apply the value extractor.
>
> > @Sönke re (1)
> > When people hear that this is not part of Apache Kafka itself, but that
> > they would need to develop something themselves, that more often than not is
> the
> > end of that discussion. Using something that is not "stock" is quite
> often
> > simply not an option.
>
> > I strongly feel that this is a needed feature in Kafka and that there is
> a
> > large number of people out there that would want to use it - but I may
> very
> > well be mistaken, responses to this thread have not exactly been
> plentiful
> > this last year and a half..
>
> I agree with you on the default vs. non-default points made. We must all
> note that this mailing list is *not* representative of the typical users of
> Kafka, and that many organizations are predominantly looking to use
> out-of-the-box solutions. This will only become more common as hosted Kafka
> solutions (think AWS hosted Kafka) gain more traction. I think the goal of
> this KIP to provide that out-of-the-box experience is extremely important,
> especially for all the reasons noted so far (GDPR, privacy, financials,
> interest by many parties but no default solution).
>
> re: (4)
> >> Regarding plaintext data in RocksDB instances, I am a bit torn to be
> >> honest. On the one hand, I feel like this scenario is not something that
> we
> >> can fully control.
>
> I agree with this in principle. I think that our responsibility to encrypt
> data at rest ends the moment that data leaves the broker. That being said,
> it isn't unreasonable. I am going to think more about this and see if I can
> come up with something.
>
>
>
>
>
> On Fri, May 8, 2020 at 5:05 AM Sönke Liebau
>  wrote:
>
> > Hey everybody,
> >
> > thanks a lot for reading and gi

Re: [DISCUSS] KIP-317 - Add end-to-end data encryption functionality to Apache Kafka

2020-05-09 Thread Adam Bellemare
Hi All

I typed up a number of replies which I have below, but I have one major
overriding question: Is there a reason we aren't implementing
encryption-at-rest almost exactly the same way that most relational
databases do? ie:
https://wiki.postgresql.org/wiki/Transparent_Data_Encryption

I ask this because it seems like we're going to end up with something
similar to what they did in terms of requirements, plus...

"For the *past 16 months*, there has been discussion about whether and how
to implement Transparent Data Encryption (tde) in Postgres. Many other
relational databases support tde, and *some security standards require* it.
However, it is also debatable how much security value tde provides.
The tde *400-email
thread* became difficult for people to follow..."
What still isn't clear to me is the scope that we're trying to cover here.
Encryption at rest suggests that we need to have the data encrypted on the
brokers, and *only* on the brokers, since they're the durable units of
storage. Any encryption over the wire should be covered by TLS.  I think
that our goals for this should be (from
https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#Threat_models)

> TDE protects data from theft when file system access controls are
> compromised:
>
>- Malicious user steals storage devices and reads database files
>directly.
>- Malicious backup operator takes backup.
>- Protecting data at rest (persistent data)
>
> This does not protect from users who can read system memory, e.g., shared
> buffers, which root users can do.
>

I am not a security expert nor am I an expert on relational databases.
However, I can't identify any reason why the approach outlined by
PostgreSQL, which is very similar to MySQL/InnoDB and IBM (from my
understanding) wouldn't work for data-at-rest encryption. In addition, we'd
get the added benefit of being consistent with other solutions, which is an
easier sell when discussing security with management (Kafka? Oh yeah, their
encryption solution is just like the one we already have in place for our
Postgres solutions), and may let us avoid reinventing a good part of the
wheel.


--

@Ryanne
One more complicating factor, regarding joins - the foreign key joiner
requires access to the value to extract the foreign key - if it's
encrypted, the FKJ would need to decrypt it to apply the value extractor.

@Sönke re (1)
> When people hear that this is not part of Apache Kafka itself, but that
> they would need to develop something themselves, that more often than not is the
> end of that discussion. Using something that is not "stock" is quite often
> simply not an option.

> I strongly feel that this is a needed feature in Kafka and that there is a
> large number of people out there that would want to use it - but I may
very
> well be mistaken, responses to this thread have not exactly been plentiful
> this last year and a half..

I agree with you on the default vs. non-default points made. We must all
note that this mailing list is *not* representative of the typical users of
Kafka, and that many organizations are predominantly looking to use
out-of-the-box solutions. This will only become more common as hosted Kafka
solutions (think AWS hosted Kafka) gain more traction. I think the goal of
this KIP to provide that out-of-the-box experience is extremely important,
especially for all the reasons noted so far (GDPR, privacy, financials,
interest by many parties but no default solution).

re: (4)
>> Regarding plaintext data in RocksDB instances, I am a bit torn to be
>> honest. On the one hand, I feel like this scenario is not something that
we
>> can fully control.

I agree with this in principle. I think that our responsibility to encrypt
data at rest ends the moment that data leaves the broker. That being said,
it isn't unreasonable. I am going to think more about this and see if I can
come up with something.





On Fri, May 8, 2020 at 5:05 AM Sönke Liebau
 wrote:

> Hey everybody,
>
> thanks a lot for reading and giving feedback!! I'll try and answer all
> points that I found going through the thread in this mail, but if I miss
> something please feel free to let me know! I've added a running number to
> the discussed topics for ease of reference down the road.
>
> I'll go through the KIP and update it with everything that I have written
> below after sending this mail.
>
> @Tom:
> (1) If I understand your concerns correctly you feel that this
> functionality would have a hard time getting approved into Apache Kafka
> because it can be achieved with custom Serializers in the same way and that
> we should maybe develop this outside of Apache Kafka at first.
> I feel like it is precisely the fact that this is not part of core Apache
> Kafka that makes people think twice about doing end-to-end encryption. I
> may be working in a market (Germany) that is a bit special when compared to
> the rest of the world where encryption and things like that are concerned,

Re: [DISCUSS] KIP-317 - Add end-to-end data encryption functionality to Apache Kafka

2020-05-09 Thread michael.andre.pearce
TBH Tom is right, it is entirely possible to support end-to-end encryption today
without broker or client changes, with serializers. In fact I know many companies
doing this. As such, maybe a good approach would be to provide a default
encryption and decryption serde that's able to be used, rather than any client or
broker changes at all. This way those who already have a working solution don't
have to change, and basically you're providing a default solution to those who
have not already made one, so that it's easier to adopt.

Sent from my Samsung Galaxy smartphone.

 Original message 
From: Sönke Liebau
Date: 08/05/2020 10:05 (GMT+00:00)
To: dev
Subject: Re: [DISCUSS] KIP-317 - Add end-to-end data encryption functionality to Apache Kafka

Hey everybody,

thanks a lot for reading and giving feedback!! I'll try and answer all
points that I found going through the thread in this mail, but if I miss
something please feel free to let me know! I've added a running number to
the discussed topics for ease of reference down the road.

I'll go through the KIP and update it with everything that I have written
below after sending this mail.

@Tom:
(1) If I understand your concerns correctly you feel that this
functionality would have a hard time getting approved into Apache Kafka
because it can be achieved with custom Serializers in the same way and that
we should maybe develop this outside of Apache Kafka at first.
I feel like it is precisely the fact that this is not part of core Apache
Kafka that makes people think twice about doing end-to-end encryption. I
may be working in a market (Germany) that is a bit special when compared to
the rest of the world where encryption and things like that are concerned,
but I've personally sat in multiple meetings where this feature was
discussed. It is not necessarily the end-to-end encryption itself, but the
at-rest encryption that you get with it.
When people hear that this is not part of Apache Kafka itself, but that
they would need to develop something themselves, that more often than not
is the end of that discussion. Using something that is not "stock" is quite
often simply not an option.
Even if they decide to go forward with it, they'll find Hendrik's blog post
from 4 years ago on this, probably the Whitepapers from Confluent and
Lenses and maybe a few implementations on github - all of which just serve
to further muddy the waters. Not because any of these resources are bad or
wrong, but just because information and implementations are spread out over
a lot of different places. Developing this outside of Apache Kafka would
simply serve to add one more item to this list that would not really
matter, I'm afraid.

I strongly feel that this is a needed feature in Kafka and that there is a
large number of people out there that would want to use it - but I may very
well be mistaken; responses to this thread have not exactly been plentiful
this last year and a half...

@Mike:
(2) Regarding the encryption of headers, my current idea is to keep this
configurable. I have seen customers use headers for stuff like account
numbers which under the GDPR are considered to be personal data that should
be encrypted wherever possible. So in some instances it might be useful to
encrypt header fields as well.
My current PoC implementation allows specifying a Regex for headers that
should be encrypted, which would allow having encrypted and unencrypted
headers in the same record to hopefully suit most use cases.

(3) Also, my plan is to not change the message format, but to
"encrypt-in-place" and add a header field with the necessary information
for decryption, which would then be removed by the decrypting consumer.
There may be some out-of-date intentions still in the KIP, I'll go through
it and update.

@Ryanne:
First off, I fully agree that we should avoid painting ourselves into a
corner with an early client-only implementation. I scaled down this KIP
from earlier attempts that included things like key rollover and
broker-side implementations because I could not get any feedback from the
community on those for a long time and felt that maybe there was no
appetite for the full-blown solution. So I decided to try with a more
limited scope. I am very happy to discuss/go for the fully featured version
again :)

(4) Regarding plaintext data in RocksDB instances, I am a bit torn to be
honest. On the one hand, I feel like this scenario is not something that we
can fully control. Kafka Streams in this case is a client that takes data
from Kafka, decrypts it and then puts it somewhere in plaintext. To me this
scenario differs only slightly from, for example, someone writing a backup
job that reads a topic and writes it to a textfile - not much we can do
about it.
That being said, Kafka Streams is part of Apache Kafka, so does merit
special consideration. I'll have to dig into how StateStores are used a bit
(I am not the world's largest expert - or any

Re: [DISCUSS] KIP-317 - Add end-to-end data encryption functionality to Apache Kafka

2020-05-08 Thread Sönke Liebau
Hey everybody,

thanks a lot for reading and giving feedback!! I'll try and answer all
points that I found going through the thread in this mail, but if I miss
something please feel free to let me know! I've added a running number to
the discussed topics for ease of reference down the road.

I'll go through the KIP and update it with everything that I have written
below after sending this mail.

@Tom:
(1) If I understand your concerns correctly you feel that this
functionality would have a hard time getting approved into Apache Kafka
because it can be achieved with custom Serializers in the same way and that
we should maybe develop this outside of Apache Kafka at first.
I feel like it is precisely the fact that this is not part of core Apache
Kafka that makes people think twice about doing end-to-end encryption. I
may be working in a market (Germany) that is a bit special when compared to
the rest of the world where encryption and things like that are concerned,
but I've personally sat in multiple meetings where this feature was
discussed. It is not necessarily the end-to-end encryption itself, but the
at-rest encryption that you get with it.
When people hear that this is not part of Apache Kafka itself, but that
they would need to develop something themselves, that more often than not
is the end of that discussion. Using something that is not "stock" is quite
often simply not an option.
Even if they decide to go forward with it, they'll find Hendrik's blog post
from 4 years ago on this, probably the Whitepapers from Confluent and
Lenses and maybe a few implementations on github - all of which just serve
to further muddy the waters. Not because any of these resources are bad or
wrong, but just because information and implementations are spread out over
a lot of different places. Developing this outside of Apache Kafka would
simply serve to add one more item to this list that would not really matter
I'm afraid.

I strongly feel that this is a needed feature in Kafka and that there is a
large number of people out there that would want to use it - but I may very
well be mistaken; responses to this thread have not exactly been plentiful
this last year and a half...

@Mike:
(2) Regarding the encryption of headers, my current idea is to keep this
configurable. I have seen customers use headers for stuff like account
numbers which under the GDPR are considered to be personal data that should
be encrypted wherever possible. So in some instances it might be useful to
encrypt header fields as well.
My current PoC implementation allows specifying a Regex for headers that
should be encrypted, which would allow having encrypted and unencrypted
headers in the same record to hopefully suit most use cases.
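
The selection itself is just a pattern match over the header keys, roughly like
the following (sketch only, names invented, not the actual PoC code):

import java.util.regex.Pattern;
import org.apache.kafka.common.header.Header;

public class HeaderSelector {

    private final Pattern encryptedHeaders;

    public HeaderSelector(String regex) {
        // e.g. "account_.*|pii\\..*" would encrypt account_* and pii.* headers only
        this.encryptedHeaders = Pattern.compile(regex);
    }

    // True if this header's value should be encrypted, false if it stays readable.
    public boolean shouldEncrypt(Header header) {
        return encryptedHeaders.matcher(header.key()).matches();
    }
}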

(3) Also, my plan is to not change the message format, but to
"encrypt-in-place" and add a header field with the necessary information
for decryption, which would then be removed by the decrypting consumer.
There may be some out-of-date intentions still in the KIP, I'll go through
it and update.
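
On the producer side that would look roughly like this (illustrative only; the
header name, key id handling and exact wiring are made up and not what the KIP
specifies yet):

import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import org.apache.kafka.clients.producer.ProducerRecord;

public class InPlaceEncryption {

    private static final String HEADER = "x-kip317-enc";   // hypothetical header name

    // Encrypt the value "in place" and attach the information a consumer needs to
    // decrypt it (key id + IV) as a header; the decrypting consumer strips the header.
    static ProducerRecord<byte[], byte[]> encryptValue(ProducerRecord<byte[], byte[]> record,
                                                       String keyId, SecretKey dek) throws Exception {
        byte[] iv = new byte[12];
        new SecureRandom().nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, dek, new GCMParameterSpec(128, iv));
        byte[] ciphertext = cipher.doFinal(record.value());

        ProducerRecord<byte[], byte[]> out = new ProducerRecord<>(record.topic(),
                record.partition(), record.timestamp(), record.key(), ciphertext, record.headers());
        out.headers().add(HEADER,
                (keyId + ":" + Base64.getEncoder().encodeToString(iv)).getBytes());
        return out;
    }
}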

@Ryanne:
First off, I fully agree that we should avoid painting ourselves into a
corner with an early client-only implementation. I scaled down this KIP
from earlier attempts that included things like key rollover and
broker-side implementations because I could not get any feedback from the
community on those for a long time and felt that maybe there was no
appetite for the full-blown solution. So I decided to try with a more
limited scope. I am very happy to discuss/go for the fully featured version
again :)

(4) Regarding plaintext data in RocksDB instances, I am a bit torn to be
honest. On the one hand, I feel like this scenario is not something that we
can fully control. Kafka Streams in this case is a client that takes data
from Kafka, decrypts it and then puts it somewhere in plaintext. To me this
scenario differs only slightly from for example someone writing a backup
job that reads a topic and writes it to a textfile - not much we can do
about it.
That being said, Kafka Streams is part of Apache Kafka, so does merit
special consideration. I'll have to dig into how StateStores are used a bit
(I am not the worlds largest expert - or any kind of expert on that) to try
and come up with an idea.


(5) On key encryption and hashing, this is definitely an issue that we need
a solution for. I currently have key encryption configurable in my
implementation. When encryption is enabled, an option would of course be to
hash the original key and store the key data together with the value in an
encrypted form. Any salt added to the key before hashing could be encrypted
along with the data. This would allow all key-based functionality like
compaction, joins etc. to keep working without having to know the cleartext
key.
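
Concretely I am thinking of a keyed hash, so equality of keys is preserved but
the cleartext key cannot simply be brute-forced by hashing candidate values
(sketch only; the shared secret would have to be distributed alongside the
encryption keys):

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public class KeyHasher {

    // Deterministic keyed hash of the original record key: equal keys hash to equal
    // bytes, so partitioning, compaction and joins keep working, while the cleartext
    // key stays hidden from anyone without the shared secret.
    static byte[] hashKey(byte[] originalKey, byte[] sharedSecret) throws Exception {
        Mac mac = Mac.getInstance("HmacSHA256");
        mac.init(new SecretKeySpec(sharedSecret, "HmacSHA256"));
        return mac.doFinal(originalKey);
    }
}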

I've also considered deterministic encryption which would keep the
encrypted key the same, but I am fairly certain that we will want to allow
regular key rotation (more on this in next paragraph) without re-encrypting
older 

Re: [DISCUSS] KIP-317 - Add end-to-end data encryption functionality to Apache Kafka

2020-05-07 Thread Ryanne Dolan
Tom, good point, I've done exactly that -- hashing record keys -- but it's
unclear to me what should happen when the hash key must be rotated. In my
case the (external) solution involved rainbow tables, versioned keys, and
custom materializers that were aware of older keys for each record.

In particular I had a pipeline that would re-key records and re-ingest
them, while opportunistically overwriting records materialized with the old
key.

For a native solution I think maybe we'd need to carry around any old
versions of each record key, perhaps as metadata. Then brokers and
materializers can compact records based on _any_ overlapping key, maybe?
Not sure.

Ryanne

On Thu, May 7, 2020, 12:05 PM Tom Bentley  wrote:

> Hi Ryanne,
>
> You raise some good points there.
>
> Similarly, if the whole record is encrypted, it becomes impossible to do
> > joins, group bys etc, which just need the record key and maybe don't have
> > access to the encryption key. Maybe only record _values_ should be
> > encrypted, and maybe Kafka Streams could defer decryption until the
> actual
> > value is inspected. That way joins etc are possible without the
> encryption
> > key, and RocksDB would not need to decrypt values before materializing to
> > disk.
> >
>
> It's getting a bit late here, so maybe I overlooked something, but wouldn't
> the natural thing to do be to make the "encrypted" key a hash of the
> original key, and let the value of the encrypted value be the cipher text
> of the (original key, original value) pair. A scheme like this would
> preserve equality of the key (strictly speaking there's a chance of
> collision of course). I guess this could also be a solution for the
> compacted topic issue Sönke mentioned.
>
> Cheers,
>
> Tom
>
>
>
> On Thu, May 7, 2020 at 5:17 PM Ryanne Dolan  wrote:
>
> > Thanks Sönke, this is an area in which Kafka is really, really far
> behind.
> >
> > I've built secure systems around Kafka as laid out in the KIP. One issue
> > that is not addressed in the KIP is re-encryption of records after a key
> > rotation. When a key is compromised, it's important that any data
> encrypted
> > using that key is immediately destroyed or re-encrypted with a new key.
> > Ideally first-class support for end-to-end encryption in Kafka would make
> > this possible natively, or else I'm not sure what the point would be. It
> > seems to me that the brokers would need to be involved in this process,
> so
> > perhaps a client-first approach will be painting ourselves into a corner.
> > Not sure.
> >
> > Another issue is whether materialized tables, e.g. in Kafka Streams,
> would
> > see unencrypted or encrypted records. If we implemented the KIP as
> written,
> > it would still result in a bunch of plain text data in RocksDB
> everywhere.
> > Again, I'm not sure what the point would be. Perhaps using custom serdes
> > would actually be a more holistic approach, since Kafka Streams etc could
> > leverage these as well.
> >
> > Similarly, if the whole record is encrypted, it becomes impossible to do
> > joins, group bys etc, which just need the record key and maybe don't have
> > access to the encryption key. Maybe only record _values_ should be
> > encrypted, and maybe Kafka Streams could defer decryption until the
> actual
> > value is inspected. That way joins etc are possible without the
> encryption
> > key, and RocksDB would not need to decrypt values before materializing to
> > disk.
> >
> > This is why I've implemented encryption on a per-field basis, not at the
> > record level, when addressing kafka security in the past. And I've had to
> > build external pipelines that purge, re-encrypt, and re-ingest records
> when
> > keys are compromised.
> >
> > This KIP might be a step in the right direction, not sure. But I'm
> hesitant
> > to support the idea of end-to-end encryption without a plan to address
> the
> > myriad other problems.
> >
> > That said, we need this badly and I hope something shakes out.
> >
> > Ryanne
> >
> > On Tue, Apr 28, 2020, 6:26 PM Sönke Liebau
> >  wrote:
> >
> > > All,
> > >
> > > I've asked for comments on this KIP in the past, but since I didn't
> > really
> > > get any feedback I've decided to reduce the initial scope of the KIP a
> > bit
> > > and try again.
> > >
> > > I have reworked the KIP to provide a limited, but useful set of features
> > for
> > > this initial KIP and laid out a very rough roadmap of what I'd envision
> > > this looking like in a final version.
> > >
> > > I am aware that the KIP is currently light on implementation details,
> but
> > > would like to get some feedback on the general approach before fully
> > > speccing everything.
> > >
> > > The KIP can be found at
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-317%3A+Add+end-to-end+data+encryption+functionality+to+Apache+Kafka
> > >
> > >
> > > I would very much appreciate any feedback!
> > >
> > > Best regards,
> > > Sönke
> > >
> >
>


Re: [DISCUSS] KIP-317 - Add end-to-end data encryption functionality to Apache Kafka

2020-05-07 Thread Tom Bentley
Hi again,

Of course I was overlooking at least one thing. Anyone who could guess the
record keys could hash them and compare. To make it work the producer and
consumer would need a shared secret to include in the hash computation. But
the key management service could furnish them with this in addition to the
encryption/decryption keys, so I think it should still work. Unless I've
overlooked something else.

Cheers,

Tom

On Thu, May 7, 2020 at 6:04 PM Tom Bentley  wrote:

> Hi Ryanne,
>
> You raise some good points there.
>
> Similarly, if the whole record is encrypted, it becomes impossible to do
>> joins, group bys etc, which just need the record key and maybe don't have
>> access to the encryption key. Maybe only record _values_ should be
>> encrypted, and maybe Kafka Streams could defer decryption until the actual
>> value is inspected. That way joins etc are possible without the encryption
>> key, and RocksDB would not need to decrypt values before materializing to
>> disk.
>>
>
> It's getting a bit late here, so maybe I overlooked something, but
> wouldn't the natural thing to do be to make the "encrypted" key a hash of
> the original key, and let the value of the encrypted value be the cipher
> text of the (original key, original value) pair. A scheme like this would
> preserve equality of the key (strictly speaking there's a chance of
> collision of course). I guess this could also be a solution for the
> compacted topic issue Sönke mentioned.
>
> Cheers,
>
> Tom
>
>
>
> On Thu, May 7, 2020 at 5:17 PM Ryanne Dolan  wrote:
>
>> Thanks Sönke, this is an area in which Kafka is really, really far behind.
>>
>> I've built secure systems around Kafka as laid out in the KIP. One issue
>> that is not addressed in the KIP is re-encryption of records after a key
>> rotation. When a key is compromised, it's important that any data
>> encrypted
>> using that key is immediately destroyed or re-encrypted with a new key.
>> Ideally first-class support for end-to-end encryption in Kafka would make
>> this possible natively, or else I'm not sure what the point would be. It
>> seems to me that the brokers would need to be involved in this process, so
>> perhaps a client-first approach will be painting ourselves into a corner.
>> Not sure.
>>
>> Another issue is whether materialized tables, e.g. in Kafka Streams, would
>> see unencrypted or encrypted records. If we implemented the KIP as
>> written,
>> it would still result in a bunch of plain text data in RocksDB everywhere.
>> Again, I'm not sure what the point would be. Perhaps using custom serdes
>> would actually be a more holistic approach, since Kafka Streams etc could
>> leverage these as well.
>>
>> Similarly, if the whole record is encrypted, it becomes impossible to do
>> joins, group bys etc, which just need the record key and maybe don't have
>> access to the encryption key. Maybe only record _values_ should be
>> encrypted, and maybe Kafka Streams could defer decryption until the actual
>> value is inspected. That way joins etc are possible without the encryption
>> key, and RocksDB would not need to decrypt values before materializing to
>> disk.
>>
>> This is why I've implemented encryption on a per-field basis, not at the
>> record level, when addressing kafka security in the past. And I've had to
>> build external pipelines that purge, re-encrypt, and re-ingest records
>> when
>> keys are compromised.
>>
>> This KIP might be a step in the right direction, not sure. But I'm
>> hesitant
>> to support the idea of end-to-end encryption without a plan to address the
>> myriad other problems.
>>
>> That said, we need this badly and I hope something shakes out.
>>
>> Ryanne
>>
>> On Tue, Apr 28, 2020, 6:26 PM Sönke Liebau
>>  wrote:
>>
>> > All,
>> >
>> > I've asked for comments on this KIP in the past, but since I didn't
>> really
>> > get any feedback I've decided to reduce the initial scope of the KIP a
>> bit
>> > and try again.
>> >
>> > I have reworked the KIP to provide a limited, but useful set of features
>> for
>> > this initial KIP and laid out a very rough roadmap of what I'd envision
>> > this looking like in a final version.
>> >
>> > I am aware that the KIP is currently light on implementation details,
>> but
>> > would like to get some feedback on the general approach before fully
>> > speccing everything.
>> >
>> > The KIP can be found at
>> >
>> >
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-317%3A+Add+end-to-end+data+encryption+functionality+to+Apache+Kafka
>> >
>> >
>> > I would very much appreciate any feedback!
>> >
>> > Best regards,
>> > Sönke
>> >
>>
>


Re: [DISCUSS] KIP-317 - Add end-to-end data encryption functionality to Apache Kafka

2020-05-07 Thread Tom Bentley
Hi Ryanne,

You raise some good points there.

Similarly, if the whole record is encrypted, it becomes impossible to do
> joins, group bys etc, which just need the record key and maybe don't have
> access to the encryption key. Maybe only record _values_ should be
> encrypted, and maybe Kafka Streams could defer decryption until the actual
> value is inspected. That way joins etc are possible without the encryption
> key, and RocksDB would not need to decrypt values before materializing to
> disk.
>

It's getting a bit late here, so maybe I overlooked something, but wouldn't
the natural thing to do be to make the "encrypted" key a hash of the
original key, and let the encrypted value be the cipher text
of the (original key, original value) pair. A scheme like this would
preserve equality of the key (strictly speaking there's a chance of
collision of course). I guess this could also be a solution for the
compacted topic issue Sönke mentioned.

Cheers,

Tom



On Thu, May 7, 2020 at 5:17 PM Ryanne Dolan  wrote:

> Thanks Sönke, this is an area in which Kafka is really, really far behind.
>
> I've built secure systems around Kafka as laid out in the KIP. One issue
> that is not addressed in the KIP is re-encryption of records after a key
> rotation. When a key is compromised, it's important that any data encrypted
> using that key is immediately destroyed or re-encrypted with a new key.
> Ideally first-class support for end-to-end encryption in Kafka would make
> this possible natively, or else I'm not sure what the point would be. It
> seems to me that the brokers would need to be involved in this process, so
> perhaps a client-first approach will be painting ourselves into a corner.
> Not sure.
>
> Another issue is whether materialized tables, e.g. in Kafka Streams, would
> see unencrypted or encrypted records. If we implemented the KIP as written,
> it would still result in a bunch of plain text data in RocksDB everywhere.
> Again, I'm not sure what the point would be. Perhaps using custom serdes
> would actually be a more holistic approach, since Kafka Streams etc could
> leverage these as well.
>
> Similarly, if the whole record is encrypted, it becomes impossible to do
> joins, group bys etc, which just need the record key and maybe don't have
> access to the encryption key. Maybe only record _values_ should be
> encrypted, and maybe Kafka Streams could defer decryption until the actual
> value is inspected. That way joins etc are possible without the encryption
> key, and RocksDB would not need to decrypt values before materializing to
> disk.
>
> This is why I've implemented encryption on a per-field basis, not at the
> record level, when addressing kafka security in the past. And I've had to
> build external pipelines that purge, re-encrypt, and re-ingest records when
> keys are compromised.
>
> This KIP might be a step in the right direction, not sure. But I'm hesitant
> to support the idea of end-to-end encryption without a plan to address the
> myriad other problems.
>
> That said, we need this badly and I hope something shakes out.
>
> Ryanne
>
> On Tue, Apr 28, 2020, 6:26 PM Sönke Liebau
>  wrote:
>
> > All,
> >
> > I've asked for comments on this KIP in the past, but since I didn't
> really
> > get any feedback I've decided to reduce the initial scope of the KIP a
> bit
> > and try again.
> >
> > I have reworked the KIP to provide a limited, but useful set of features
> for
> > this initial KIP and laid out a very rough roadmap of what I'd envision
> > this looking like in a final version.
> >
> > I am aware that the KIP is currently light on implementation details, but
> > would like to get some feedback on the general approach before fully
> > speccing everything.
> >
> > The KIP can be found at
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-317%3A+Add+end-to-end+data+encryption+functionality+to+Apache+Kafka
> >
> >
> > I would very much appreciate any feedback!
> >
> > Best regards,
> > Sönke
> >
>


Re: [DISCUSS] KIP-317 - Add end-to-end data encryption functionality to Apache Kafka

2020-05-07 Thread Ryanne Dolan
Thanks Sönke, this is an area in which Kafka is really, really far behind.

I've built secure systems around Kafka as laid out in the KIP. One issue
that is not addressed in the KIP is re-encryption of records after a key
rotation. When a key is compromised, it's important that any data encrypted
using that key is immediately destroyed or re-encrypted with a new key.
Ideally first-class support for end-to-end encryption in Kafka would make
this possible natively, or else I'm not sure what the point would be. It
seems to me that the brokers would need to be involved in this process, so
perhaps a client-first approach will be painting ourselves into a corner.
Not sure.

Another issue is whether materialized tables, e.g. in Kafka Streams, would
see unencrypted or encrypted records. If we implemented the KIP as written,
it would still result in a bunch of plain text data in RocksDB everywhere.
Again, I'm not sure what the point would be. Perhaps using custom serdes
would actually be a more holistic approach, since Kafka Streams etc could
leverage these as well.

Similarly, if the whole record is encrypted, it becomes impossible to do
joins, group bys etc, which just need the record key and maybe don't have
access to the encryption key. Maybe only record _values_ should be
encrypted, and maybe Kafka Streams could defer decryption until the actual
value is inspected. That way joins etc are possible without the encryption
key, and RocksDB would not need to decrypt values before materializing to
disk.

This is why I've implemented encryption on a per-field basis, not at the
record level, when addressing kafka security in the past. And I've had to
build external pipelines that purge, re-encrypt, and re-ingest records when
keys are compromised.
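
For illustration, per-field means something like the following (toy sketch over a
map of fields, not my actual implementation):

import java.util.Map;
import java.util.Set;
import javax.crypto.Cipher;
import javax.crypto.SecretKey;

public class FieldLevelEncryption {

    // Encrypt only the sensitive fields of a record; fields needed for joins,
    // group-bys etc. stay readable. Sketch only: a real version would use an
    // AEAD mode with per-record IVs rather than plain "AES".
    static void encryptFields(Map<String, byte[]> fields, Set<String> sensitive,
                              SecretKey key) throws Exception {
        Cipher cipher = Cipher.getInstance("AES");
        cipher.init(Cipher.ENCRYPT_MODE, key);
        for (Map.Entry<String, byte[]> field : fields.entrySet()) {
            if (sensitive.contains(field.getKey())) {
                field.setValue(cipher.doFinal(field.getValue()));
            }
        }
    }
}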

This KIP might be a step in the right direction, not sure. But I'm hesitant
to support the idea of end-to-end encryption without a plan to address the
myriad other problems.

That said, we need this badly and I hope something shakes out.

Ryanne

On Tue, Apr 28, 2020, 6:26 PM Sönke Liebau
 wrote:

> All,
>
> I've asked for comments on this KIP in the past, but since I didn't really
> get any feedback I've decided to reduce the initial scope of the KIP a bit
> and try again.
>
> I have reworked the KIP to provide a limited but useful set of features for
> this initial KIP and laid out a very rough roadmap of what I'd envision
> this looking like in a final version.
>
> I am aware that the KIP is currently light on implementation details, but
> would like to get some feedback on the general approach before fully
> speccing everything.
>
> The KIP can be found at
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-317%3A+Add+end-to-end+data+encryption+functionality+to+Apache+Kafka
>
>
> I would very much appreciate any feedback!
>
> Best regards,
> Sönke
>


Re: [DISCUSS] KIP-317 - Add end-to-end data encryption functionality to Apache Kafka

2020-05-07 Thread Michael André Pearce

Hi 


I have just spotted this.


I would be a little -1 on encrypting headers; these are NOT safe to encrypt. The
whole original reason for headers was to carry non-sensitive transport or other
meta information, very much akin to TCP headers, which are likewise not
encrypted. Headers should remain unencrypted so tools that are simply bridging
messages between brokers/systems can rely on them for this, without needing to
peek inside the business payload part (or decrypt it).


Second, I would suggest we do not add an additional section to the record
format specifically for this (again, I would be a little -1 here). The whole
point of adding headers was that additional bits such as this could layer on
top of them, e.g. the AES or other data that needs to travel with the record
should be set into keys.
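
Concretely, the kind of thing that means on the producer side is sketched below
(header names are invented, not a proposal): key id, IV and algorithm ride
along as plain, unencrypted headers next to the encrypted value, so a bridging
tool can route on the headers without ever touching the payload.

    import java.nio.charset.StandardCharsets;
    import org.apache.kafka.clients.producer.ProducerRecord;

    // Illustration only: encryption metadata travels as plain record headers,
    // the business payload is the only part that is ciphertext.
    public class HeaderMetadataSketch {

        static ProducerRecord<String, byte[]> encryptedRecord(
                String topic, String key, byte[] ciphertext, byte[] iv, String keyId) {
            ProducerRecord<String, byte[]> record = new ProducerRecord<>(topic, key, ciphertext);
            record.headers()
                  .add("enc.key.id", keyId.getBytes(StandardCharsets.UTF_8))
                  .add("enc.iv", iv)
                  .add("enc.algo", "AES/GCM/NoPadding".getBytes(StandardCharsets.UTF_8));
            return record;
        }
    }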


Please see both the original KIP-82 but more importantly the case and uses that 
they were added for.


https://cwiki.apache.org/confluence/display/KAFKA/KIP-82+-+Add+Record+Headers
https://cwiki.apache.org/confluence/display/KAFKA/A+Case+for+Kafka+Headers


Best
Mike



On 1 May 2020 at 23:18, Sönke Liebau  wrote:


Hi Tom,

thanks for taking a look!

Regarding your questions, I've answered below, but will also add more
detail to the KIP around these questions.

1. The functionality in this first phase could indeed be achieved with
custom serializers, that would then need to wrap the actual serializer that
is to be used. However, looking forward I intend to add functionality that
allows encryption to be configured broker-side via topic-level configs
and investigate encrypting entire batches of messages for performance. Both
those things would require us to move past doing this in a serializer, so I
think we should take that plunge now to avoid unnecessary refactoring later
on.

2. Absolutely! I am currently working on a very (very) rough implementation
to kind of prove the principle. I'll add those to the KIP as soon as I
think they are in a somewhat final form.
There are a lot of design details missing from the KIP, I didn't want to go
all the way just for people to hate what I designed and have to start over
;)

3. Yes. I plan to create a LocalKeystoreKeyManager (name tbd) as part of
this KIP that allows configuring keys per topic pattern and will read the
keys from a local file. This will provide encryption, but users would have
to manually sync keystores across consumer and producer systems. Proper key
management with rollover and retrieval from central vaults would come in a
later phase.
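
To make that slightly more concrete (all names below are invented, the KIP does
not define this interface yet), such a key manager could boil down to a
regex-to-alias mapping over a local keystore:

    import java.io.FileInputStream;
    import java.security.KeyStore;
    import java.util.LinkedHashMap;
    import java.util.Map;
    import java.util.regex.Pattern;
    import javax.crypto.SecretKey;

    // Hypothetical keystore-backed key manager: topic patterns map to key aliases,
    // keys are read from a local JCEKS file. First matching pattern wins.
    public class LocalKeystoreKeyManagerSketch {

        private final KeyStore keyStore;
        private final char[] password;
        private final Map<Pattern, String> topicPatternToAlias = new LinkedHashMap<>();

        public LocalKeystoreKeyManagerSketch(String path, char[] password) throws Exception {
            this.password = password;
            this.keyStore = KeyStore.getInstance("JCEKS");   // JCEKS can hold secret keys
            try (FileInputStream in = new FileInputStream(path)) {
                keyStore.load(in, password);
            }
        }

        public void mapTopics(String topicRegex, String keyAlias) {
            topicPatternToAlias.put(Pattern.compile(topicRegex), keyAlias);
        }

        /** Key for the first pattern matching the topic, or null if none matches. */
        public SecretKey keyFor(String topic) throws Exception {
            for (Map.Entry<Pattern, String> entry : topicPatternToAlias.entrySet()) {
                if (entry.getKey().matcher(topic).matches()) {
                    return (SecretKey) keyStore.getKey(entry.getValue(), password);
                }
            }
            return null;
        }
    }

Producer and consumer would then both point at a copy of the same keystore
file, which is exactly the manual-syncing limitation mentioned above.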

4. I'm not 100% sure I follow your meaning here tbh. But I think the
question may be academic in this first instance, as compression happens at
batch level, so we can't encrypt at the record level after that. If we want
to stick with encrypting individual records, that would have to happen
pre-compression, unless I am mistaken about the internals here.

Best regards,
Sönke


On Fri, 1 May 2020 at 18:19, Tom Bentley  wrote:


Hi Sönke,


I never looked at the original version, but what you describe in the new
version makes sense to me.


Here are a few things which sprang to mind while I was reading:


1. It wasn't immediately obvious why this can't be achieved using custom
serializers and deserializers.
2. It would be useful to fully define the Java interfaces you're talking
about.
3. Would a KeyManager implementation be provided?
4. About compression+encryption: My understanding is CRIME used a chosen
plaintext attack. AFAICS using compression would potentially allow a known
plaintext attack, which is a weaker way of attacking a cipher. Even without
compression in the picture known plaintext attacks would be possible, for
example if the attacker knew the key was JSON encoded.


Kind regards,


Tom


On Wed, Apr 29, 2020 at 12:32 AM Sönke Liebau
 wrote:



All,

I've asked for comments on this KIP in the past, but since I didn't really
get any feedback I've decided to reduce the initial scope of the KIP a bit
and try again.

I have reworked the KIP to provide a limited but useful set of features for
this initial KIP and laid out a very rough roadmap of what I'd envision
this looking like in a final version.

I am aware that the KIP is currently light on implementation details, but
would like to get some feedback on the general approach before fully
speccing everything.

The KIP can be found at



https://cwiki.apache.org/confluence/display/KAFKA/KIP-317%3A+Add+end-to-end+data+encryption+functionality+to+Apache+Kafka



I would very much appreciate any feedback!

Best regards,
Sönke






--
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


Re: [DISCUSS] KIP-317 - Add end-to-end data encryption functionality to Apache Kafka

2020-05-07 Thread Michael André Pearce

Small typo correction: I meant headers at the end of this paragraph, not keys
(sorry, long week already).


corrected:


"
Second, I would suggest we do not add an additional section to the record
format specifically for this (again, I would be a little -1 here). The whole
point of adding headers was that additional bits such as this could layer on
top of them, e.g. the AES or other data that needs to travel with the record
should be set into headers.
"

On 7 May 2020 at 8:47, Michael André Pearce  
wrote:


Hi 


I have just spotted this.


I would be a little -1 on encrypting headers; these are NOT safe to encrypt. The
whole original reason for headers was to carry non-sensitive transport or other
meta information, very much akin to TCP headers, which are likewise not
encrypted. Headers should remain unencrypted so tools that are simply bridging
messages between brokers/systems can rely on them for this, without needing to
peek inside the business payload part (or decrypt it).


Second, I would suggest we do not add an additional section to the record
format specifically for this (again, I would be a little -1 here). The whole
point of adding headers was that additional bits such as this could layer on
top of them, e.g. the AES or other data that needs to travel with the record
should be set into keys.


Please see both the original KIP-82 but more importantly the case and uses that 
they were added for.


https://cwiki.apache.org/confluence/display/KAFKA/KIP-82+-+Add+Record+Headers
https://cwiki.apache.org/confluence/display/KAFKA/A+Case+for+Kafka+Headers


Best
Mike



On 1 May 2020 at 23:18, Sönke Liebau  wrote:


Hi Tom,

thanks for taking a look!

Regarding your questions, I've answered below, but will also add more
detail to the KIP around these questions.

1. The functionality in this first phase could indeed be achieved with
custom serializers, that would then need to wrap the actual serializer that
is to be used. However, looking forward I intend to add functionality that
allows encryption to be configured broker-side via topic-level configs
and investigate encrypting entire batches of messages for performance. Both
those things would require us to move past doing this in a serializer, so I
think we should take that plunge now to avoid unnecessary refactoring later
on.

2. Absolutely! I am currently working on a very (very) rough implementation
to kind of prove the principle. I'll add those to the KIP as soon as I
think they are in a somewhat final form.
There are a lot of design details missing from the KIP, I didn't want to go
all the way just for people to hate what I designed and have to start over
;)

3. Yes. I plan to create a LocalKeystoreKeyManager (name tbd) as part of
this KIP that allows configuring keys per topic pattern and will read the
keys from a local file. This will provide encryption, but users would have
to manually sync keystores across consumer and producer systems. Proper key
management with rollover and retrieval from central vaults would come in a
later phase.

4. I'm not 100% sure I follow your meaning here tbh. But I think the
question may be academic in this first instance, as compression happens at
batch level, so we can't encrypt at the record level after that. If we want
to stick with encrypting individual records, that would have to happen
pre-compression, unless I am mistaken about the internals here.

Best regards,
Sönke


On Fri, 1 May 2020 at 18:19, Tom Bentley  wrote:


Hi Sönke,


I never looked at the original version, but what you describe in the new
version makes sense to me.


Here are a few things which sprang to mind while I was reading:


1. It wasn't immediately obvious why this can't be achieved using custom
serializers and deserializers.
2. It would be useful to fully define the Java interfaces you're talking
about.
3. Would a KeyManager implementation be provided?
4. About compression+encryption: My understanding is CRIME used a chosen
plaintext attack. AFAICS using compression would potentially allow a known
plaintext attack, which is a weaker way of attacking a cipher. Even without
compression in the picture known plaintext attacks would be possible, for
example if the attacker knew the key was JSON encoded.


Kind regards,


Tom


On Wed, Apr 29, 2020 at 12:32 AM Sönke Liebau
 wrote:



All,

I've asked for comments on this KIP in the past, but since I didn't really
get any feedback I've decided to reduce the initial scope of the KIP a bit
and try again.

I have reworked the KIP to provide a limited but useful set of features for
this initial KIP and laid out a very rough roadmap of what I'd envision
this looking like in a final version.

I am aware that the KIP is currently light on implementation details, but
would like to get some feedback on the general approach before fully
speccing everything.

The KIP can be found at



https://cwiki.apache.org/confluence/display/KAFKA/KIP-317%3A+Add+end-to-end+data+encryption+functionality+to+Apache+Kafka

Re: [DISCUSS] KIP-317 - Add end-to-end data encryption functionality to Apache Kafka

2020-05-07 Thread Tom Bentley
Hi Sönke,

Replies inline

1. The functionality in this first phase could indeed be achieved with
> custom serializers, that would then need to wrap the actual serializer that
> is to be used. However, looking forward I intend to add functionality that
> allows encryption to be configured broker-side via topic-level configs
> and investigate encrypting entire batches of messages for performance. Both
> those things would require us to move past doing this in a serializer, so I
> think we should take that plunge now to avoid unnecessary refactoring later
> on.
>

I suspect you might have a hard time getting this KIP approved when the
immediate use cases it serves can already be implemented using custom
serialization.

Having a working implementation using custom serialization would:

* prove there's interest in these features amongst end users
* prove that there's interest in the specific features which would require
end-to-end encryption to be implemented in Kafka itself
* validate that the interfaces/abstractions in this proposal are the right
ones

All of those things would strengthen the argument for getting this into
Apache Kafka eventually.
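
For a sense of what "outside Apache Kafka" would mean in practice: an
out-of-tree encrypting serializer plugs in through configuration alone, with no
broker or core-client changes. The class and property names below are invented
placeholders, not existing artifacts:

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class OutOfTreeWiring {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            // "com.example.EncryptingSerializer" stands in for a user-supplied wrapper.
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "com.example.EncryptingSerializer");
            // Made-up settings the wrapper would read in its configure() method:
            props.put("encrypting.serializer.inner.class", StringSerializer.class.getName());
            props.put("encrypting.serializer.keystore.path", "/etc/kafka/keys.jceks");

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // produce as usual; encryption happens inside the value serializer
            }
        }
    }

Something along those lines would be enough to demonstrate demand and to
exercise the proposed abstractions before anything lands in Kafka itself.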


> 2. Absolutely! I am currently working on a very (very) rough implementation
> to kind of prove the principle. I'll add those to the KIP as soon as I
> think they are in a somewhat final form.
> There are a lot of design details missing from the KIP, I didn't want to go
> all the way just for people to hate what I designed and have to start over
> ;)
>
> 3. Yes. I plan to create a LocalKeystoreKeyManager (name tbd) as part of
> this KIP that allows configuring keys per topic pattern and will read the
> keys from a local file. This will provide encryption, but users would have
> to manually sync keystores across consumer and producer systems. Proper key
> management with rollover and retrieval from central vaults would come in a
> later phase.
>

I think this is the hard part in many respects. Having a working
implementation for at least one key management system would presumably be a
prerequisite for getting this merged.

Even if this KIP got merged I think it's likely that there would be a
desire to limit the number of implementations of the interfaces within
Apache Kafka because of the maintenance and testing burden. (We've seen
this in other areas previously, ConfigProviders being one example.)

So again, this suggests to me that you might make more progress
implementing this outside Apache Kafka for the moment.

Having said all that, these are just my thoughts second guessing what the
community might do. I might be wrong.

Kind regards,

Tom


Re: [DISCUSS] KIP-317 - Add end-to-end data encryption functionality to Apache Kafka

2020-05-01 Thread Sönke Liebau
Hi Tom,

thanks for taking a look!

Regarding your questions, I've answered below, but will also add more
detail to the KIP around these questions.

1. The functionality in this first phase could indeed be achieved with
custom serializers, that would then need to wrap the actual serializer that
is to be used. However, looking forward I intend to add functionality that
allows encryption to be configured broker-side via topic-level configs
and investigate encrypting entire batches of messages for performance. Both
those things would require us to move past doing this in a serializer, so I
think we should take that plunge now to avoid unnecessary refactoring later
on.

2. Absolutely! I am currently working on a very (very) rough implementation
to kind of prove the principle. I'll add those to the KIP as soon as I
think they are in a somewhat final form.
There are a lot of design details missing from the KIP, I didn't want to go
all the way just for people to hate what I designed and have to start over
;)

3. Yes. I plan to create a LocalKeystoreKeyManager (name tbd) as part of
this KIP that allows configuring keys per topic pattern and will read the
keys from a local file. This will provide encryption, but users would have
to manually sync keystores across consumer and producer systems. Proper key
management with rollover and retrieval from central vaults would come in a
later phase.

4. I'm not 100% sure I follow your meaning here tbh. But I think the
question may be academic in this first instance, as compression happens at
batch level, so we can't encrypt at the record level after that. If we want
to stick with encrypting individual records, that would have to happen
pre-compression, unless I am mistaken about the internals here.

Best regards,
Sönke


On Fri, 1 May 2020 at 18:19, Tom Bentley  wrote:

> Hi Sönke,
>
> I never looked at the original version, but what you describe in the new
> version makes sense to me.
>
> Here are a few things which sprang to mind while I was reading:
>
> 1. It wasn't immediately obvious why this can't be achieved using custom
> serializers and deserializers.
> 2. It would be useful to fully define the Java interfaces you're talking
> about.
> 3. Would a KeyManager implementation be provided?
> 4. About compression+encryption: My understanding is CRIME used a chosen
> plaintext attack. AFAICS using compression would potentially allow a known
> plaintext attack, which is a weaker way of attacking a cipher. Even without
> compression in the picture known plaintext attacks would be possible, for
> example if the attacker knew the key was JSON encoded.
>
> Kind regards,
>
> Tom
>
> On Wed, Apr 29, 2020 at 12:32 AM Sönke Liebau
>  wrote:
>
> > All,
> >
> > I've asked for comments on this KIP in the past, but since I didn't really
> > get any feedback I've decided to reduce the initial scope of the KIP a bit
> > and try again.
> >
> > I have reworked the KIP to provide a limited but useful set of features for
> > this initial KIP and laid out a very rough roadmap of what I'd envision
> > this looking like in a final version.
> >
> > I am aware that the KIP is currently light on implementation details, but
> > would like to get some feedback on the general approach before fully
> > speccing everything.
> >
> > The KIP can be found at
> >
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-317%3A+Add+end-to-end+data+encryption+functionality+to+Apache+Kafka
> >
> >
> > I would very much appreciate any feedback!
> >
> > Best regards,
> > Sönke
> >
>


-- 
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel - Germany


Re: [DISCUSS] KIP-317 - Add end-to-end data encryption functionality to Apache Kafka

2020-05-01 Thread Tom Bentley
Hi Sönke,

I never looked at the original version, but what you describe in the new
version makes sense to me.

Here are a few things which sprang to mind while I was reading:

1. It wasn't immediately obvious why this can't be achieved using custom
serializers and deserializers.
2. It would be useful to fully define the Java interfaces you're talking
about.
3. Would a KeyManager implementation be provided?
4. About compression+encryption: My understanding is CRIME used a chosen
plaintext attack. AFAICS using compression would potentially allow a known
plaintext attack, which is a weaker way of attacking a cipher. Even without
compression in the picture known plaintext attacks would be possible, for
example if the attacker knew the key was JSON encoded.

Kind regards,

Tom

On Wed, Apr 29, 2020 at 12:32 AM Sönke Liebau
 wrote:

> All,
>
> I've asked for comments on this KIP in the past, but since I didn't really
> get any feedback I've decided to reduce the initial scope of the KIP a bit
> and try again.
>
> I have reworked the KIP to provide a limited but useful set of features for
> this initial KIP and laid out a very rough roadmap of what I'd envision
> this looking like in a final version.
>
> I am aware that the KIP is currently light on implementation details, but
> would like to get some feedback on the general approach before fully
> speccing everything.
>
> The KIP can be found at
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-317%3A+Add+end-to-end+data+encryption+functionality+to+Apache+Kafka
>
>
> I would very much appreciate any feedback!
>
> Best regards,
> Sönke
>