My concern is specifically around the rules for SOX compliance, or rules
around PII, PCI, or HIPAA compliance. The audits get very complicated,
but my understanding is that the general rule is that sensitive data
should be encrypted at rest and only decrypted when needed. And we don't
just need to be concerned about a malicious user. Consider a "typical"
technology environment where many people have administrative access to
systems. That is exactly the case where the data must not be visible to
anyone unless they have a specific use for it, which means having it
encrypted. In almost any audit situation, you need to be able to show a
trail of exactly who modified the data, and exactly who viewed the data.

Now, I do agree that not everything has to be done within Kafka, and the
producers and consumers can coordinate their own encryption. But I think
it's useful to have the concept of an envelope for a message within Kafka.
This can be used to hold all sorts of useful information, such as hashes
of the encryption keys that were used to encrypt a message, or the
signature of the message itself (so that you can have both confidentiality
and integrity). It can also be used to hold things like the time a message
was received into your infrastructure, or the specific Kafka cluster it
was stored in. A special consumer and producer, such as the mirror maker,
would be able to preserve this envelope across clusters.

-Todd


On 6/5/14, 2:18 PM, "Jay Kreps" <jay.kr...@gmail.com> wrote:

>Hey Todd,
>
>Can you elaborate on this? Certainly restricting access to and
>modification of data is important. But this doesn't imply storing the
>data encrypted. Are we assuming the attacker can (1) get on the network,
>(2) get on the Kafka server as a non-root and non-kafka user or (3) get
>root on the Kafka server? If we assume (3) then it seems we are in a
>pretty bad state as almost any facility Kafka provides can be subverted
>by the root user just changing the Kafka code to not enforce that
>facility. Which of these levels of access are we assuming?
>
>Also which things actually need to be done inside Kafka and which can be
>done externally? Nothing prevents users from encrypting data they put into
>Kafka today, it is just that Kafka doesn't do this for you. But is there a
>reason you want Kafka to do this?
>
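One way an application can already encrypt data before handing it to Kafka, sketched with only the JDK: AES-256 in GCM mode, with a fresh 12-byte IV prepended to each ciphertext. The class `ClientSideCrypto` is hypothetical, key distribution is out of scope, and no Kafka client API is shown; the returned `byte[]` is simply what the application would pass to its producer as the message value.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical helper: application-side encryption of a message value.
final class ClientSideCrypto {
    private static final int IV_BYTES = 12;  // recommended nonce size for GCM
    private static final int TAG_BITS = 128; // authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    static SecretKey newKey() throws GeneralSecurityException {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(256);
        return generator.generateKey();
    }

    // Producer side: returns iv || ciphertext || tag as one opaque byte[].
    static byte[] encrypt(SecretKey key, byte[] plaintext) throws GeneralSecurityException {
        byte[] iv = new byte[IV_BYTES];
        RANDOM.nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] ciphertext = cipher.doFinal(plaintext);
        return ByteBuffer.allocate(iv.length + ciphertext.length).put(iv).put(ciphertext).array();
    }

    // Consumer side: split off the IV and decrypt; throws if the bytes were altered.
    static byte[] decrypt(SecretKey key, byte[] message) throws GeneralSecurityException {
        ByteBuffer buffer = ByteBuffer.wrap(message);
        byte[] iv = new byte[IV_BYTES];
        buffer.get(iv);
        byte[] ciphertext = new byte[buffer.remaining()];
        buffer.get(ciphertext);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        return cipher.doFinal(ciphertext);
    }
}
```

Because GCM is authenticated, `decrypt` also rejects modified ciphertext, so the consumer gets an integrity check in addition to confidentiality.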
>The reason I am pushing on these things a bit is because I want to make
>sure we don't end up with a set of requirements so broad we can never
>really get them implemented...
>
>-Jay
>
>
>
>
>On Thu, Jun 5, 2014 at 2:05 PM, Todd Palino <tpal...@linkedin.com.invalid>
>wrote:
>
>> No, at-rest encryption is definitely important. When you start talking
>> about data that is used for financial reporting, restricting access to
>> it (both modification and visibility) is a critical component.
>>
>> -Todd
>>
>>
>> On 6/5/14, 2:01 PM, "Jay Kreps" <jay.kr...@gmail.com> wrote:
>>
>> >Hey Joe,
>> >
>> >I don't really understand the sections you added to the wiki. Can you
>> >clarify them?
>> >
>> >Is non-repudiation what SASL would call integrity checks? If so, don't
>> >SSL and many of the SASL schemes already support this as well as
>> >on-the-wire encryption?
>> >
>> >Or are you proposing an on-disk encryption scheme? Is this actually
>> >needed? Isn't on-the-wire encryption, when combined with mutual
>> >authentication and permissions, sufficient for most uses?
>> >
>> >On-disk encryption seems unnecessary because if an attacker can get root
>> >on the Kafka boxes, the attacker can potentially modify Kafka to do
>> >anything he or she wants with the data. So this seems to break any
>> >security model.
>> >
>> >I understand the problem of a large organization not really having a
>> >trusted network and wanting to secure data transfer and limit and audit
>> >data access. The uses for these other things I don't totally
>> >understand.
>> >
>> >Also it would be worth understanding the state of other messaging and
>> >storage systems (Hadoop, databases, etc.). What features do they
>> >support? I think there is a sense in which you don't have to run faster
>> >than the bear, but only faster than your friends. :-)
>> >
>> >-Jay
>> >
>> >
>> >On Wed, Jun 4, 2014 at 5:57 PM, Joe Stein <joe.st...@stealth.ly> wrote:
>> >
>> >> I like the idea of working on the spec and prioritizing. I will
>> >> update the wiki.
>> >>
>> >> - Joestein
>> >>
>> >>
>> >> On Wed, Jun 4, 2014 at 1:11 PM, Jay Kreps <jay.kr...@gmail.com> wrote:
>> >>
>> >> > Hey Joe,
>> >> >
>> >> > Thanks for kicking this discussion off! I totally agree that for
>> >> > something that acts as a central message broker, security is a
>> >> > critical feature. I think a number of people have been interested in
>> >> > this topic and several people have put effort into special-purpose
>> >> > security efforts.
>> >> >
>> >> > Since most of the LinkedIn folks are working on the consumer right
>> >> > now, I think this would be a great project for any other interested
>> >> > people to take on. There are some challenges in doing these things
>> >> > in a distributed system, but it can also be a lot of fun.
>> >> >
>> >> > I think a good first step would be to get a written plan we can all
>> >> > agree on for how things should work. Then we can break things down
>> >> > into chunks that can be done independently while still aiming at a
>> >> > good end state.
>> >> >
>> >> > I had tried to write up some notes that summarized at least the
>> >> > thoughts I had had on security:
>> >> > https://cwiki.apache.org/confluence/display/KAFKA/Security
>> >> >
>> >> > What do you think of that?
>> >> >
>> >> > One assumption I had (which may be incorrect) is that although we
>> >> > want all the things in your list, the two most pressing would be
>> >> > authentication and authorization, and that was all that write-up
>> >> > covered. You have more experience in this domain, so I wonder how
>> >> > you would prioritize?
>> >> >
>> >> > Those notes are really sketchy, so I think the first goal I would
>> >> > have would be to get to a real spec we can all agree on and discuss.
>> >> > A lot of the security stuff has a high human interaction element and
>> >> > needs to work in pretty different domains and different companies,
>> >> > so getting this kind of review is important.
>> >> >
>> >> > -Jay
>> >> >
>> >> >
>> >> > On Tue, Jun 3, 2014 at 12:57 PM, Joe Stein <joe.st...@stealth.ly> wrote:
>> >> >
>> >> > > Hi, I wanted to re-ignite the discussion around Apache Kafka
>> >> > > Security. This is a huge bottleneck (non-starter in some cases) for
>> >> > > a lot of organizations (due to regulatory, compliance and other
>> >> > > requirements). Below are my suggestions for specific changes in
>> >> > > Kafka to accommodate security requirements. This comes from what
>> >> > > folks are doing "in the wild" to work around and implement security
>> >> > > with Kafka as it is today and also what I have discovered from
>> >> > > organizations about their blockers. It also picks up from the wiki
>> >> > > (which I should have time to update later in the week based on the
>> >> > > below and feedback from the thread).
>> >> > >
>> >> > > 1) Transport Layer Security (i.e. SSL)
>> >> > >
>> >> > > This also includes client authentication in addition to the
>> >> > > in-transit security layer. This work has been picked up here
>> >> > > https://issues.apache.org/jira/browse/KAFKA-1477 and I'd appreciate
>> >> > > any thoughts, comments, feedback, tomatoes, whatever for this patch.
>> >> > > It is a pickup from the fork of the work first done here
>> >> > > https://github.com/relango/kafka/tree/kafka_security.
>> >> > >
>> >> > > 2) Data encryption at rest.
>> >> > >
>> >> > > This is very important and something that can be facilitated within
>> >> > > the wire protocol. It requires an additional map data structure for
>> >> > > the "encrypted [data encryption key]". With this map (either in your
>> >> > > object or in the wire protocol) you can store the dynamically
>> >> > > generated symmetric key (one per message) and then encrypt the data
>> >> > > using that dynamically generated key. You then encrypt the
>> >> > > symmetric key with the public key of each party that is expected to
>> >> > > be able to decrypt it (and, with it, the message). Each public-key
>> >> > > encrypted symmetric key (the "encrypted [data encryption key]") is
>> >> > > stored along with the public key it was encrypted with, so the
>> >> > > field is a map (a chain) of [publicKey] = encryptedDataEncryptionKey
>> >> > > entries. Other patterns can be implemented, but this is a pretty
>> >> > > standard digital enveloping [0] pattern with only 1 field added.
>> >> > > Other patterns should be able to use that field to do their
>> >> > > implementation too.
>> >> > >
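The digital envelope in item 2 can be sketched with the JDK's own ciphers. Assumptions, all illustrative: `DigitalEnvelope` is a hypothetical class, the per-message data encryption key is AES-256-GCM, recipients hold RSA key pairs and the key is wrapped with RSA-OAEP, and the map is keyed by a Base64 SHA-256 fingerprint of each recipient's public key rather than by the raw key bytes.

```java
import java.security.GeneralSecurityException;
import java.security.Key;
import java.security.KeyPair;
import java.security.MessageDigest;
import java.security.PublicKey;
import java.security.SecureRandom;
import java.util.Base64;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical digital envelope: payload encrypted once, key wrapped per recipient.
final class DigitalEnvelope {
    private static final SecureRandom RANDOM = new SecureRandom();
    private static final String DATA_CIPHER = "AES/GCM/NoPadding";
    private static final String WRAP_CIPHER = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";

    final byte[] iv;
    final byte[] ciphertext;
    // The "encrypted [data encryption key]" map: recipient key id -> wrapped key.
    final Map<String, byte[]> encryptedDataEncryptionKeys;

    private DigitalEnvelope(byte[] iv, byte[] ciphertext, Map<String, byte[]> keys) {
        this.iv = iv;
        this.ciphertext = ciphertext;
        this.encryptedDataEncryptionKeys = Map.copyOf(keys);
    }

    // Identifies a recipient by a digest of its public key.
    static String keyId(PublicKey publicKey) throws GeneralSecurityException {
        byte[] digest = MessageDigest.getInstance("SHA-256").digest(publicKey.getEncoded());
        return Base64.getEncoder().encodeToString(digest);
    }

    static DigitalEnvelope seal(byte[] plaintext, List<PublicKey> recipients)
            throws GeneralSecurityException {
        // 1. Fresh symmetric key, used for this one message only.
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(256);
        SecretKey dataKey = generator.generateKey();

        // 2. Encrypt the payload once with that key.
        byte[] iv = new byte[12];
        RANDOM.nextBytes(iv);
        Cipher aes = Cipher.getInstance(DATA_CIPHER);
        aes.init(Cipher.ENCRYPT_MODE, dataKey, new GCMParameterSpec(128, iv));
        byte[] ciphertext = aes.doFinal(plaintext);

        // 3. Wrap the symmetric key with each recipient's public key.
        Map<String, byte[]> wrapped = new HashMap<>();
        for (PublicKey recipient : recipients) {
            Cipher rsa = Cipher.getInstance(WRAP_CIPHER);
            rsa.init(Cipher.WRAP_MODE, recipient);
            wrapped.put(keyId(recipient), rsa.wrap(dataKey));
        }
        return new DigitalEnvelope(iv, ciphertext, wrapped);
    }

    // A recipient unwraps its copy of the key, then decrypts the payload.
    byte[] open(KeyPair recipient) throws GeneralSecurityException {
        byte[] wrappedKey = encryptedDataEncryptionKeys.get(keyId(recipient.getPublic()));
        if (wrappedKey == null) {
            throw new GeneralSecurityException("not a recipient of this message");
        }
        Cipher rsa = Cipher.getInstance(WRAP_CIPHER);
        rsa.init(Cipher.UNWRAP_MODE, recipient.getPrivate());
        Key dataKey = rsa.unwrap(wrappedKey, "AES", Cipher.SECRET_KEY);
        Cipher aes = Cipher.getInstance(DATA_CIPHER);
        aes.init(Cipher.DECRYPT_MODE, dataKey, new GCMParameterSpec(128, iv));
        return aes.doFinal(ciphertext);
    }
}
```

In this design, adding a reader costs one more wrapped copy of a 32-byte key; the payload itself is never re-encrypted.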
>> >> > > 3) Non-repudiation and long term non-repudiation.
>> >> > >
>> >> > > Non-repudiation is proving data hasn't changed (and who produced
>> >> > > it). This is often (if not always) done with X.509 public
>> >> > > certificates (chained to a certificate authority).
>> >> > >
>> >> > > Long term non-repudiation is what happens when the certificates of
>> >> > > the certificate authority are expired (or revoked) and everything
>> >> > > ever signed (ever) with that certificate's public key then becomes
>> >> > > "no longer provable as ever being authentic". That is where RFC3126
>> >> > > [1] and RFC3161 [2] come in (or WORM drives [hardware], etc.).
>> >> > >
>> >> > > For either (or both) of these it is an operation of the encryptor to
>> >> > > sign/hash the data (with or without a third-party trusted timestamp
>> >> > > of the signing event), encrypt that with their own private key, and
>> >> > > distribute the results (before and after encrypting if required)
>> >> > > along with their public key. This structure is a bit more complex
>> >> > > but feasible: it is a map of digital signature formats and the
>> >> > > chain of digital signature attestations. The map's key is the
>> >> > > method (e.g. CRC32, PKCS7 [3], XmlDigSig [4]) and its value is a
>> >> > > list of maps where the key is the "purpose" of the signature (what
>> >> > > you're attesting to). As a sibling field to the list, another field
>> >> > > holds "the attester" as bytes (e.g. their PKCS12 [5] for the map of
>> >> > > PKCS7 signatures).
>> >> > >
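A sketch of the attestation structure from item 3, with the substitutions stated plainly: the JDK has no public PKCS7 or XML-DSig signing API, so `SHA256withRSA` stands in as the only "method", and "the attester" is the signer's X.509-encoded public key rather than a PKCS12 bundle. `SignatureChain` and `Signatures` are hypothetical names. The purpose string is length-prefixed and signed together with the data, so a signature made for one purpose cannot be replayed as another. Trusted timestamps (RFC3161) for long-term non-repudiation are not shown.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyFactory;
import java.security.KeyPair;
import java.security.PublicKey;
import java.security.Signature;
import java.security.spec.X509EncodedKeySpec;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// One entry per signature method: the list of attestations plus "the attester".
final class SignatureChain {
    final List<Map<String, byte[]>> attestations = new ArrayList<>(); // purpose -> signature
    byte[] attester; // signer's X.509-encoded public key (stand-in for PKCS12)
}

final class Signatures {
    static final String METHOD = "SHA256withRSA"; // stand-in for PKCS7 / XmlDigSig
    final Map<String, SignatureChain> byMethod = new HashMap<>();

    // Length-prefixed purpose followed by the data, so the two cannot be confused.
    private static byte[] signedBytes(String purpose, byte[] data) {
        byte[] p = purpose.getBytes(StandardCharsets.UTF_8);
        return ByteBuffer.allocate(4 + p.length + data.length)
                .putInt(p.length).put(p).put(data).array();
    }

    void attest(KeyPair signer, String purpose, byte[] data) throws GeneralSecurityException {
        SignatureChain chain = byMethod.computeIfAbsent(METHOD, m -> new SignatureChain());
        byte[] attester = signer.getPublic().getEncoded();
        if (chain.attester != null && !Arrays.equals(chain.attester, attester)) {
            // This sketch keeps a single attester per method, as in the proposal.
            throw new IllegalStateException("chain already has a different attester");
        }
        Signature s = Signature.getInstance(METHOD);
        s.initSign(signer.getPrivate());
        s.update(signedBytes(purpose, data));
        chain.attestations.add(Map.of(purpose, s.sign()));
        chain.attester = attester;
    }

    boolean verify(String purpose, byte[] data) throws GeneralSecurityException {
        SignatureChain chain = byMethod.get(METHOD);
        if (chain == null || chain.attester == null) {
            return false;
        }
        PublicKey key = KeyFactory.getInstance("RSA")
                .generatePublic(new X509EncodedKeySpec(chain.attester));
        for (Map<String, byte[]> attestation : chain.attestations) {
            byte[] sig = attestation.get(purpose);
            if (sig == null) {
                continue;
            }
            Signature s = Signature.getInstance(METHOD);
            s.initVerify(key);
            s.update(signedBytes(purpose, data));
            if (s.verify(sig)) {
                return true;
            }
        }
        return false;
    }
}
```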
>> >> > > 4) Authorization
>> >> > >
>> >> > > We should have a policy of "404" for data, topics, partitions (etc.)
>> >> > > if authenticated connections do not have access. In "secure mode"
>> >> > > any non-authenticated connections should get a "404" type message on
>> >> > > everything. Knowing "something is there" is a security risk in many
>> >> > > use cases, so if you don't have access you don't even see it. Baking
>> >> > > "that" into Kafka along with some interface for entitlement (access
>> >> > > management) systems (pretty standard) is all that I think needs to
>> >> > > be done to the core project. I want to tackle this item later in the
>> >> > > year, after the summer, once the other three are complete.
>> >> > >
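The "404" policy in item 4 amounts to never letting the answer for "forbidden" differ from the answer for "missing". A minimal Java sketch, assuming a hypothetical `Entitlements` interface standing in for the external access-management system and a hypothetical `TopicMetadataService`; a `null` principal models an unauthenticated connection.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

// Hypothetical hook for an external entitlement (access management) system.
interface Entitlements {
    boolean canDescribe(String principal, String topic);
}

final class TopicMetadataService {
    private final Map<String, Integer> partitionsByTopic; // topic -> partition count
    private final Entitlements entitlements;

    TopicMetadataService(Map<String, Integer> partitionsByTopic, Entitlements entitlements) {
        this.partitionsByTopic = Map.copyOf(partitionsByTopic);
        this.entitlements = entitlements;
    }

    // principal == null models an unauthenticated connection.
    // Missing and forbidden topics both yield Optional.empty(), so a caller
    // cannot use the response to learn that a topic exists.
    Optional<Integer> describe(String principal, String topic) {
        if (principal == null || !entitlements.canDescribe(principal, topic)) {
            return Optional.empty();
        }
        return Optional.ofNullable(partitionsByTopic.get(topic));
    }

    // Listings only ever contain topics the principal is entitled to see.
    List<String> listTopics(String principal) {
        if (principal == null) {
            return List.of();
        }
        return partitionsByTopic.keySet().stream()
                .filter(topic -> entitlements.canDescribe(principal, topic))
                .sorted()
                .collect(Collectors.toList());
    }
}
```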
>> >> > > I look forward to thoughts on this and anyone else interested in
>> >> > > working with us on these items.
>> >> > >
>> >> > > [0] http://www.emc.com/emc-plus/rsa-labs/standards-initiatives/what-is-a-digital-envelope.htm
>> >> > > [1] http://tools.ietf.org/html/rfc3126
>> >> > > [2] http://tools.ietf.org/html/rfc3161
>> >> > > [3] http://www.emc.com/emc-plus/rsa-labs/standards-initiatives/pkcs-7-cryptographic-message-syntax-standar.htm
>> >> > > [4] http://en.wikipedia.org/wiki/XML_Signature
>> >> > > [5] http://en.wikipedia.org/wiki/PKCS_12
>> >> > >
>> >> > > /*******************************************
>> >> > >  Joe Stein
>> >> > >  Founder, Principal Consultant
>> >> > >  Big Data Open Source Security LLC
>> >> > >  http://www.stealth.ly
>> >> > >  Twitter: @allthingshadoop <http://www.twitter.com/allthingshadoop>
>> >> > > ********************************************/