Re: [SECURITY DISCUSSION] Refactoring Brokers to support multiple ports

2014-12-05 Thread Gwen Shapira
Hi,

I posted first draft of the change here:
https://issues.apache.org/jira/browse/KAFKA-1809

Its missing a lot of the feedback from Jun (IPv6, upgrade path, could
use more tests probably).
However, it has most of the structure that I had in mind.

So, comments are more than welcome :)

Gwen


On Tue, Dec 2, 2014 at 4:10 PM, Jun Rao jun...@gmail.com wrote:
 1. What would the format of advertised listener looks like? If we have two
 hosts separated by a colon, it may make parsing IP v6 harder.

 3.1 Currently, the only public api that exposes requests/responses is the
 SimpleConsumer. Since most people probably use the high level consumer,
 breaking the api in SimpleConsumer may be ok. Alternatively, we can keep
 Broker as it is, but add sth like BrokerProfile to represent the full
 broker.

 3.2 We haven't made any intra-broker protocol changes yet. The idea is to
 have a config, sth like use.new.wire.protocol that defaults to false. In
 phase 1, we do a rolling upgrade of every broker to the new code. In phase
 2, we set use.new.wire.protocol to true and do a rolling bounce of every
 broker again. Yes, we should bump up the protocol version now. As for
 multiple protocol changes within the same release, we can discuss that in
 the mailing list separately. One way that could work is that once a
 protocol change is stable, we can have a discussion in the mailing list and
 declare it stable. If there are new changes after this, we will bump up the
 version. This way, people deploying from trunk will know when it's safe to
 use the new feature.

 Thanks,

 Jun


 On Tue, Dec 2, 2014 at 2:56 PM, Gwen Shapira gshap...@cloudera.com wrote:

 Thanks you so much for your help here Jun!
 Highlighting the specific protocols is very useful.

 See some detailed comments below.

 On Tue, Dec 2, 2014 at 1:58 PM, Jun Rao jun...@gmail.com wrote:
  Hi, Gwen,
 
  Thanks for writing up the wiki. Some comments below.
 
  1. To make it more general, should we support a binding and an advertised
  host for each protocol (e.g. plaintext, ssl, etc)? We will also need to
  figure out how to specify the wildcard binding host.

 Yes, thats the idea. Two lines of config, one with list of listeners
 (protocol://host:port) and one with list of advertised listeners.
 Advertised listeners are optional. I think wildcard binding is
 normally done with 0.0.0.0 host (at least for HDFS), so I was planning
 to keep that convention.

 
  2. Broker format change in ZK
  The broker registration in ZK needs to store the host/port for all
  protocols. We will need to bump up the version of the broker registration
  data. Since this is an intra-cluster protocol change, we need an extra
  config for rolling upgrades. So, in the first step, each broker is
 upgraded
  and is ready to parse brokers registered in the new format, but not
  registering using the new format yet. In the second step, when that new
  config is enabled, the broker will register using the new format.
 

 I'm not sure this is necessary in this case. We'll bump the version for
 sure.
 And as long as the new format contains all the fields of the previous
 formats, the JSON de-serialization should work and just ignore the new
 fields.
 So the new brokers can register with the new format right away and the
 old brokers will be able to read that registration with no issues.
 New brokers will be able to use old registration but will also know
 about the extra ports and protocols from the additional field.

  3. Wire protocol changes. Currently, the broker info is used in the
  following requests/responses: TopicMetadataResponse ,
  ConsumerMetadataResponse, LeaderAndIsrRequest  and UpdateMetadataRequest.


  3.1 TopicMetadataResponse and ConsumerMetadataResponse:
  These two are used between the clients and the broker. I am not sure that
  we need to make a wire protocol change for them. Currently, the protocol
  includes a single host/port pair in those responses. Based on the type of
  the port on which the request is sent, it seems that we can just pick the
  corresponding host and port to include in the response.

 The wire protocol will not change here, but the Scala API (i.e method
 signatures for response and request classes) will change from getting
 brokers (which no longer represent single host+port pair) to getting
 endpoints (which do).

 I assumed the Scala API is public, but perhaps I was wrong there.


  3.2 UpdateMetadataRequest:
  This is used between the controller and the broker. Since each broker
 needs
  to cache the host/port of all protocols, we need to make a wire protocol
  change. We also need to change the broker format in MetadataCache
  accordingly. This is also an intra-cluster protocol change. So the
 upgrade
  path will need to follow that in item 2.

 Yes. Because the wire protocol is byte-array based, the existing
 brokers will not be able to parse messages from new brokers without
 the upgrade path you described.

 Is this something that was done in 

Re: [SECURITY DISCUSSION] Refactoring Brokers to support multiple ports

2014-12-02 Thread Jun Rao
Hi, Gwen,

Thanks for writing up the wiki. Some comments below.

1. To make it more general, should we support a binding and an advertised
host for each protocol (e.g. plaintext, ssl, etc)? We will also need to
figure out how to specify the wildcard binding host.

2. Broker format change in ZK
The broker registration in ZK needs to store the host/port for all
protocols. We will need to bump up the version of the broker registration
data. Since this is an intra-cluster protocol change, we need an extra
config for rolling upgrades. So, in the first step, each broker is upgraded
and is ready to parse brokers registered in the new format, but not
registering using the new format yet. In the second step, when that new
config is enabled, the broker will register using the new format.

3. Wire protocol changes. Currently, the broker info is used in the
following requests/responses: TopicMetadataResponse ,
ConsumerMetadataResponse, LeaderAndIsrRequest  and UpdateMetadataRequest.
3.1 TopicMetadataResponse and ConsumerMetadataResponse:
These two are used between the clients and the broker. I am not sure that
we need to make a wire protocol change for them. Currently, the protocol
includes a single host/port pair in those responses. Based on the type of
the port on which the request is sent, it seems that we can just pick the
corresponding host and port to include in the response.
3.2 UpdateMetadataRequest:
This is used between the controller and the broker. Since each broker needs
to cache the host/port of all protocols, we need to make a wire protocol
change. We also need to change the broker format in MetadataCache
accordingly. This is also an intra-cluster protocol change. So the upgrade
path will need to follow that in item 2.
3.3 LeaderAndIsrRequest:
This is also used between the controller and the broker. The receiving
broker uses the host/port of the leader replica to send the fetch request.
I am not sure if we need a wire protocol change in this case. I was
imagining that we will just add a new broker config, sth like
replication.socket.protocol. Base on this config, the controller will pick
the right host/port to include in the request.

4. Should we plan to support security just on the new java clients?
Supporting security in both the old and the new clients adds more work and
gives people less incentive to migrate off the old clients.

Thanks,

Jun


On Tue, Nov 25, 2014 at 11:13 AM, Gwen Shapira gshap...@cloudera.com
wrote:

 Hi Everyone,

 One of the pre-requisites we have for supporting multiple security
 protocols (SSL, Kerberos) is to support them on separate ports.

 This is done in KAFKA-1684 (The SSL Patch), but that patch addresses
 several different issues - Multiple ports, enriching the channels, SSL
 implementation - which makes it more challenging to review and to test.

 I'd like to split this into 3 separate patches: multi-port brokers,
 enriching SocketChannel, and  the actual security implementations.

 Since even just adding support for multiple listeners per broker is
 somewhat involved and touches multiple components, I wrote a short design
 document that covers the necessary changes and the upgrade process:


 https://cwiki.apache.org/confluence/display/KAFKA/Multiple+Listeners+for+Kafka+Brokers

 Comments are more than welcome :)

 If this is acceptable, hope to have a patch ready in few days.

 Gwen Shapira



Re: [SECURITY DISCUSSION] Refactoring Brokers to support multiple ports

2014-12-02 Thread Todd Palino
Leaving aside the rest of this, on #1, while I consider being able to
advertise the ports a good idea, I don't want to lose the ability for
maintaining multiple ports with the same protocol. For example, being able
to have 2 plaintext ports, one that only brokers communicate over, and one
that general clients use. The ability to segregate this traffic is useful
in a number of situations, over and above other controls like quotas, and
is relatively easy to do once we support multiple ports.

-Todd


On Tue, Dec 2, 2014 at 1:58 PM, Jun Rao jun...@gmail.com wrote:

 Hi, Gwen,

 Thanks for writing up the wiki. Some comments below.

 1. To make it more general, should we support a binding and an advertised
 host for each protocol (e.g. plaintext, ssl, etc)? We will also need to
 figure out how to specify the wildcard binding host.

 2. Broker format change in ZK
 The broker registration in ZK needs to store the host/port for all
 protocols. We will need to bump up the version of the broker registration
 data. Since this is an intra-cluster protocol change, we need an extra
 config for rolling upgrades. So, in the first step, each broker is upgraded
 and is ready to parse brokers registered in the new format, but not
 registering using the new format yet. In the second step, when that new
 config is enabled, the broker will register using the new format.

 3. Wire protocol changes. Currently, the broker info is used in the
 following requests/responses: TopicMetadataResponse ,
 ConsumerMetadataResponse, LeaderAndIsrRequest  and UpdateMetadataRequest.
 3.1 TopicMetadataResponse and ConsumerMetadataResponse:
 These two are used between the clients and the broker. I am not sure that
 we need to make a wire protocol change for them. Currently, the protocol
 includes a single host/port pair in those responses. Based on the type of
 the port on which the request is sent, it seems that we can just pick the
 corresponding host and port to include in the response.
 3.2 UpdateMetadataRequest:
 This is used between the controller and the broker. Since each broker needs
 to cache the host/port of all protocols, we need to make a wire protocol
 change. We also need to change the broker format in MetadataCache
 accordingly. This is also an intra-cluster protocol change. So the upgrade
 path will need to follow that in item 2.
 3.3 LeaderAndIsrRequest:
 This is also used between the controller and the broker. The receiving
 broker uses the host/port of the leader replica to send the fetch request.
 I am not sure if we need a wire protocol change in this case. I was
 imagining that we will just add a new broker config, sth like
 replication.socket.protocol. Base on this config, the controller will pick
 the right host/port to include in the request.

 4. Should we plan to support security just on the new java clients?
 Supporting security in both the old and the new clients adds more work and
 gives people less incentive to migrate off the old clients.

 Thanks,

 Jun


 On Tue, Nov 25, 2014 at 11:13 AM, Gwen Shapira gshap...@cloudera.com
 wrote:

  Hi Everyone,
 
  One of the pre-requisites we have for supporting multiple security
  protocols (SSL, Kerberos) is to support them on separate ports.
 
  This is done in KAFKA-1684 (The SSL Patch), but that patch addresses
  several different issues - Multiple ports, enriching the channels, SSL
  implementation - which makes it more challenging to review and to test.
 
  I'd like to split this into 3 separate patches: multi-port brokers,
  enriching SocketChannel, and  the actual security implementations.
 
  Since even just adding support for multiple listeners per broker is
  somewhat involved and touches multiple components, I wrote a short design
  document that covers the necessary changes and the upgrade process:
 
 
 
 https://cwiki.apache.org/confluence/display/KAFKA/Multiple+Listeners+for+Kafka+Brokers
 
  Comments are more than welcome :)
 
  If this is acceptable, hope to have a patch ready in few days.
 
  Gwen Shapira
 



Re: [SECURITY DISCUSSION] Refactoring Brokers to support multiple ports

2014-12-02 Thread Gwen Shapira
Thanks you so much for your help here Jun!
Highlighting the specific protocols is very useful.

See some detailed comments below.

On Tue, Dec 2, 2014 at 1:58 PM, Jun Rao jun...@gmail.com wrote:
 Hi, Gwen,

 Thanks for writing up the wiki. Some comments below.

 1. To make it more general, should we support a binding and an advertised
 host for each protocol (e.g. plaintext, ssl, etc)? We will also need to
 figure out how to specify the wildcard binding host.

Yes, thats the idea. Two lines of config, one with list of listeners
(protocol://host:port) and one with list of advertised listeners.
Advertised listeners are optional. I think wildcard binding is
normally done with 0.0.0.0 host (at least for HDFS), so I was planning
to keep that convention.


 2. Broker format change in ZK
 The broker registration in ZK needs to store the host/port for all
 protocols. We will need to bump up the version of the broker registration
 data. Since this is an intra-cluster protocol change, we need an extra
 config for rolling upgrades. So, in the first step, each broker is upgraded
 and is ready to parse brokers registered in the new format, but not
 registering using the new format yet. In the second step, when that new
 config is enabled, the broker will register using the new format.


I'm not sure this is necessary in this case. We'll bump the version for sure.
And as long as the new format contains all the fields of the previous
formats, the JSON de-serialization should work and just ignore the new
fields.
So the new brokers can register with the new format right away and the
old brokers will be able to read that registration with no issues.
New brokers will be able to use old registration but will also know
about the extra ports and protocols from the additional field.

 3. Wire protocol changes. Currently, the broker info is used in the
 following requests/responses: TopicMetadataResponse ,
 ConsumerMetadataResponse, LeaderAndIsrRequest  and UpdateMetadataRequest.


 3.1 TopicMetadataResponse and ConsumerMetadataResponse:
 These two are used between the clients and the broker. I am not sure that
 we need to make a wire protocol change for them. Currently, the protocol
 includes a single host/port pair in those responses. Based on the type of
 the port on which the request is sent, it seems that we can just pick the
 corresponding host and port to include in the response.

The wire protocol will not change here, but the Scala API (i.e method
signatures for response and request classes) will change from getting
brokers (which no longer represent single host+port pair) to getting
endpoints (which do).

I assumed the Scala API is public, but perhaps I was wrong there.


 3.2 UpdateMetadataRequest:
 This is used between the controller and the broker. Since each broker needs
 to cache the host/port of all protocols, we need to make a wire protocol
 change. We also need to change the broker format in MetadataCache
 accordingly. This is also an intra-cluster protocol change. So the upgrade
 path will need to follow that in item 2.

Yes. Because the wire protocol is byte-array based, the existing
brokers will not be able to parse messages from new brokers without
the upgrade path you described.

Is this something that was done in the past? I'm wondering about few things:
* Is the move from phase 1 (read new protocol but send old protocol)
to phase 2 (send and receive new protocol) done via a config
parameter? Or is there other methods?
* When do we actually bump the version? I'm planning to bump it now
and hopefully this will be the last bump for 0.9. However, if
additional patches make more protocol modifications, do we assume a
single version per release? Or do we assume people will want to
upgrade between random trunk states?


 3.3 LeaderAndIsrRequest:
 This is also used between the controller and the broker. The receiving
 broker uses the host/port of the leader replica to send the fetch request.
 I am not sure if we need a wire protocol change in this case. I was
 imagining that we will just add a new broker config, sth like
 replication.socket.protocol. Base on this config, the controller will pick
 the right host/port to include in the request.

I agree here. It looks feasible.


 4. Should we plan to support security just on the new java clients?
 Supporting security in both the old and the new clients adds more work and
 gives people less incentive to migrate off the old clients.

I'd love to do that. Lets plan on just new clients.
I'll update if any of my customers yells loudly and we'll need to
support old clients too :)

 Thanks,

 Jun


 On Tue, Nov 25, 2014 at 11:13 AM, Gwen Shapira gshap...@cloudera.com
 wrote:

 Hi Everyone,

 One of the pre-requisites we have for supporting multiple security
 protocols (SSL, Kerberos) is to support them on separate ports.

 This is done in KAFKA-1684 (The SSL Patch), but that patch addresses
 several different issues - Multiple ports, enriching the channels, SSL
 

Re: [SECURITY DISCUSSION] Refactoring Brokers to support multiple ports

2014-12-02 Thread Gwen Shapira
Hey Todd,

You say lose the ability - you mean this ability actually exist now?
Or is this something you hope the new patch will support?

Gwen

On Tue, Dec 2, 2014 at 2:08 PM, Todd Palino tpal...@gmail.com wrote:
 Leaving aside the rest of this, on #1, while I consider being able to
 advertise the ports a good idea, I don't want to lose the ability for
 maintaining multiple ports with the same protocol. For example, being able
 to have 2 plaintext ports, one that only brokers communicate over, and one
 that general clients use. The ability to segregate this traffic is useful
 in a number of situations, over and above other controls like quotas, and
 is relatively easy to do once we support multiple ports.

 -Todd


 On Tue, Dec 2, 2014 at 1:58 PM, Jun Rao jun...@gmail.com wrote:

 Hi, Gwen,

 Thanks for writing up the wiki. Some comments below.

 1. To make it more general, should we support a binding and an advertised
 host for each protocol (e.g. plaintext, ssl, etc)? We will also need to
 figure out how to specify the wildcard binding host.

 2. Broker format change in ZK
 The broker registration in ZK needs to store the host/port for all
 protocols. We will need to bump up the version of the broker registration
 data. Since this is an intra-cluster protocol change, we need an extra
 config for rolling upgrades. So, in the first step, each broker is upgraded
 and is ready to parse brokers registered in the new format, but not
 registering using the new format yet. In the second step, when that new
 config is enabled, the broker will register using the new format.

 3. Wire protocol changes. Currently, the broker info is used in the
 following requests/responses: TopicMetadataResponse ,
 ConsumerMetadataResponse, LeaderAndIsrRequest  and UpdateMetadataRequest.
 3.1 TopicMetadataResponse and ConsumerMetadataResponse:
 These two are used between the clients and the broker. I am not sure that
 we need to make a wire protocol change for them. Currently, the protocol
 includes a single host/port pair in those responses. Based on the type of
 the port on which the request is sent, it seems that we can just pick the
 corresponding host and port to include in the response.
 3.2 UpdateMetadataRequest:
 This is used between the controller and the broker. Since each broker needs
 to cache the host/port of all protocols, we need to make a wire protocol
 change. We also need to change the broker format in MetadataCache
 accordingly. This is also an intra-cluster protocol change. So the upgrade
 path will need to follow that in item 2.
 3.3 LeaderAndIsrRequest:
 This is also used between the controller and the broker. The receiving
 broker uses the host/port of the leader replica to send the fetch request.
 I am not sure if we need a wire protocol change in this case. I was
 imagining that we will just add a new broker config, sth like
 replication.socket.protocol. Base on this config, the controller will pick
 the right host/port to include in the request.

 4. Should we plan to support security just on the new java clients?
 Supporting security in both the old and the new clients adds more work and
 gives people less incentive to migrate off the old clients.

 Thanks,

 Jun


 On Tue, Nov 25, 2014 at 11:13 AM, Gwen Shapira gshap...@cloudera.com
 wrote:

  Hi Everyone,
 
  One of the pre-requisites we have for supporting multiple security
  protocols (SSL, Kerberos) is to support them on separate ports.
 
  This is done in KAFKA-1684 (The SSL Patch), but that patch addresses
  several different issues - Multiple ports, enriching the channels, SSL
  implementation - which makes it more challenging to review and to test.
 
  I'd like to split this into 3 separate patches: multi-port brokers,
  enriching SocketChannel, and  the actual security implementations.
 
  Since even just adding support for multiple listeners per broker is
  somewhat involved and touches multiple components, I wrote a short design
  document that covers the necessary changes and the upgrade process:
 
 
 
 https://cwiki.apache.org/confluence/display/KAFKA/Multiple+Listeners+for+Kafka+Brokers
 
  Comments are more than welcome :)
 
  If this is acceptable, hope to have a patch ready in few days.
 
  Gwen Shapira
 



Re: [SECURITY DISCUSSION] Refactoring Brokers to support multiple ports

2014-12-02 Thread Gwen Shapira
The good news is that I'm not using a map to represent protocol list.

The bad news is that as mentioned in the wiki: producers, consumers
and broker configuration specify security protocol, so we'll know
which host/port pair to use for communication. This implicitly assumes
there will be only one host/port per protocol.

I'll think a bit on how this assumption can be relaxed.

Gwen

On Tue, Dec 2, 2014 at 3:14 PM, Todd Palino tpal...@gmail.com wrote:
 Gwen - just my reading of what we could expect from what you had described
 so far. Without having gone into implementation details, there didn't seem
 to be anything that would block the ability to run two ports with the same
 protocol configuration, at least from the way you proposed to represent it
 in Zookeeper. I'd just like to not go down the path of using something like
 a map for representing the protocol list that would eliminate this
 possibility, unless there's a pretty good reason to do so.

 -Todd

 On Tue, Dec 2, 2014 at 3:00 PM, Gwen Shapira gshap...@cloudera.com wrote:

 Hey Todd,

 You say lose the ability - you mean this ability actually exist now?
 Or is this something you hope the new patch will support?

 Gwen

 On Tue, Dec 2, 2014 at 2:08 PM, Todd Palino tpal...@gmail.com wrote:
  Leaving aside the rest of this, on #1, while I consider being able to
  advertise the ports a good idea, I don't want to lose the ability for
  maintaining multiple ports with the same protocol. For example, being
 able
  to have 2 plaintext ports, one that only brokers communicate over, and
 one
  that general clients use. The ability to segregate this traffic is useful
  in a number of situations, over and above other controls like quotas, and
  is relatively easy to do once we support multiple ports.
 
  -Todd
 
 
  On Tue, Dec 2, 2014 at 1:58 PM, Jun Rao jun...@gmail.com wrote:
 
  Hi, Gwen,
 
  Thanks for writing up the wiki. Some comments below.
 
  1. To make it more general, should we support a binding and an
 advertised
  host for each protocol (e.g. plaintext, ssl, etc)? We will also need to
  figure out how to specify the wildcard binding host.
 
  2. Broker format change in ZK
  The broker registration in ZK needs to store the host/port for all
  protocols. We will need to bump up the version of the broker
 registration
  data. Since this is an intra-cluster protocol change, we need an extra
  config for rolling upgrades. So, in the first step, each broker is
 upgraded
  and is ready to parse brokers registered in the new format, but not
  registering using the new format yet. In the second step, when that new
  config is enabled, the broker will register using the new format.
 
  3. Wire protocol changes. Currently, the broker info is used in the
  following requests/responses: TopicMetadataResponse ,
  ConsumerMetadataResponse, LeaderAndIsrRequest  and
 UpdateMetadataRequest.
  3.1 TopicMetadataResponse and ConsumerMetadataResponse:
  These two are used between the clients and the broker. I am not sure
 that
  we need to make a wire protocol change for them. Currently, the protocol
  includes a single host/port pair in those responses. Based on the type
 of
  the port on which the request is sent, it seems that we can just pick
 the
  corresponding host and port to include in the response.
  3.2 UpdateMetadataRequest:
  This is used between the controller and the broker. Since each broker
 needs
  to cache the host/port of all protocols, we need to make a wire protocol
  change. We also need to change the broker format in MetadataCache
  accordingly. This is also an intra-cluster protocol change. So the
 upgrade
  path will need to follow that in item 2.
  3.3 LeaderAndIsrRequest:
  This is also used between the controller and the broker. The receiving
  broker uses the host/port of the leader replica to send the fetch
 request.
  I am not sure if we need a wire protocol change in this case. I was
  imagining that we will just add a new broker config, sth like
  replication.socket.protocol. Base on this config, the controller will
 pick
  the right host/port to include in the request.
 
  4. Should we plan to support security just on the new java clients?
  Supporting security in both the old and the new clients adds more work
 and
  gives people less incentive to migrate off the old clients.
 
  Thanks,
 
  Jun
 
 
  On Tue, Nov 25, 2014 at 11:13 AM, Gwen Shapira gshap...@cloudera.com
  wrote:
 
   Hi Everyone,
  
   One of the pre-requisites we have for supporting multiple security
   protocols (SSL, Kerberos) is to support them on separate ports.
  
   This is done in KAFKA-1684 (The SSL Patch), but that patch addresses
   several different issues - Multiple ports, enriching the channels, SSL
   implementation - which makes it more challenging to review and to
 test.
  
   I'd like to split this into 3 separate patches: multi-port brokers,
   enriching SocketChannel, and  the actual security implementations.
  
   Since even just adding 

Re: [SECURITY DISCUSSION] Refactoring Brokers to support multiple ports

2014-12-02 Thread Todd Palino
Thanks. Just to add more detail as to why I think it's a good idea to be
able to segregate traffic like that...

One reason would just be to separate out the intra-cluster communication to
a separate port. This can allow you to run it over a different interface
(for example, you could have dedicated links for the brokers to do
replication), though you could do that with a wildcard bind and multiple
interfaces with a little care on the broker config. It also allows for
firewalling off clients from a cluster while leaving the broker
communication intact. We've run into this situation a couple times where it
was advantageous to do this during recovery to prevent things like runaway
file descriptor allocation. It also gives the ability to use QOS tools to
work with networking guarantees for broker traffic.

Maybe it's enough to be able to specify a special protocol name to do this.
Meaning you could configure a port with protocol broker (or something
like that) which could be used by the brokers if it exists. Otherwise they
would default back to something else.

-Todd

-TOdd

On Tue, Dec 2, 2014 at 3:23 PM, Gwen Shapira gshap...@cloudera.com wrote:

 The good news is that I'm not using a map to represent protocol list.

 The bad news is that as mentioned in the wiki: producers, consumers
 and broker configuration specify security protocol, so we'll know
 which host/port pair to use for communication. This implicitly assumes
 there will be only one host/port per protocol.

 I'll think a bit on how this assumption can be relaxed.

 Gwen

 On Tue, Dec 2, 2014 at 3:14 PM, Todd Palino tpal...@gmail.com wrote:
  Gwen - just my reading of what we could expect from what you had
 described
  so far. Without having gone into implementation details, there didn't
 seem
  to be anything that would block the ability to run two ports with the
 same
  protocol configuration, at least from the way you proposed to represent
 it
  in Zookeeper. I'd just like to not go down the path of using something
 like
  a map for representing the protocol list that would eliminate this
  possibility, unless there's a pretty good reason to do so.
 
  -Todd
 
  On Tue, Dec 2, 2014 at 3:00 PM, Gwen Shapira gshap...@cloudera.com
 wrote:
 
  Hey Todd,
 
  You say lose the ability - you mean this ability actually exist now?
  Or is this something you hope the new patch will support?
 
  Gwen
 
  On Tue, Dec 2, 2014 at 2:08 PM, Todd Palino tpal...@gmail.com wrote:
   Leaving aside the rest of this, on #1, while I consider being able to
   advertise the ports a good idea, I don't want to lose the ability for
   maintaining multiple ports with the same protocol. For example, being
  able
   to have 2 plaintext ports, one that only brokers communicate over, and
  one
   that general clients use. The ability to segregate this traffic is
 useful
   in a number of situations, over and above other controls like quotas,
 and
   is relatively easy to do once we support multiple ports.
  
   -Todd
  
  
   On Tue, Dec 2, 2014 at 1:58 PM, Jun Rao jun...@gmail.com wrote:
  
   Hi, Gwen,
  
   Thanks for writing up the wiki. Some comments below.
  
   1. To make it more general, should we support a binding and an
  advertised
   host for each protocol (e.g. plaintext, ssl, etc)? We will also need
 to
   figure out how to specify the wildcard binding host.
  
   2. Broker format change in ZK
   The broker registration in ZK needs to store the host/port for all
   protocols. We will need to bump up the version of the broker
  registration
   data. Since this is an intra-cluster protocol change, we need an
 extra
   config for rolling upgrades. So, in the first step, each broker is
  upgraded
   and is ready to parse brokers registered in the new format, but not
   registering using the new format yet. In the second step, when that
 new
   config is enabled, the broker will register using the new format.
  
   3. Wire protocol changes. Currently, the broker info is used in the
   following requests/responses: TopicMetadataResponse ,
   ConsumerMetadataResponse, LeaderAndIsrRequest  and
  UpdateMetadataRequest.
   3.1 TopicMetadataResponse and ConsumerMetadataResponse:
   These two are used between the clients and the broker. I am not sure
  that
   we need to make a wire protocol change for them. Currently, the
 protocol
   includes a single host/port pair in those responses. Based on the
 type
  of
   the port on which the request is sent, it seems that we can just pick
  the
   corresponding host and port to include in the response.
   3.2 UpdateMetadataRequest:
   This is used between the controller and the broker. Since each broker
  needs
   to cache the host/port of all protocols, we need to make a wire
 protocol
   change. We also need to change the broker format in MetadataCache
   accordingly. This is also an intra-cluster protocol change. So the
  upgrade
   path will need to follow that in item 2.
   3.3 LeaderAndIsrRequest:
   This is also used 

Re: [SECURITY DISCUSSION] Refactoring Brokers to support multiple ports

2014-12-02 Thread Jun Rao
1. What would the format of advertised listener looks like? If we have two
hosts separated by a colon, it may make parsing IP v6 harder.

3.1 Currently, the only public api that exposes requests/responses is the
SimpleConsumer. Since most people probably use the high level consumer,
breaking the api in SimpleConsumer may be ok. Alternatively, we can keep
Broker as it is, but add sth like BrokerProfile to represent the full
broker.

3.2 We haven't made any intra-broker protocol changes yet. The idea is to
have a config, sth like use.new.wire.protocol that defaults to false. In
phase 1, we do a rolling upgrade of every broker to the new code. In phase
2, we set use.new.wire.protocol to true and do a rolling bounce of every
broker again. Yes, we should bump up the protocol version now. As for
multiple protocol changes within the same release, we can discuss that in
the mailing list separately. One way that could work is that once a
protocol change is stable, we can have a discussion in the mailing list and
declare it stable. If there are new changes after this, we will bump up the
version. This way, people deploying from trunk will know when it's safe to
use the new feature.

Thanks,

Jun


On Tue, Dec 2, 2014 at 2:56 PM, Gwen Shapira gshap...@cloudera.com wrote:

 Thanks you so much for your help here Jun!
 Highlighting the specific protocols is very useful.

 See some detailed comments below.

 On Tue, Dec 2, 2014 at 1:58 PM, Jun Rao jun...@gmail.com wrote:
  Hi, Gwen,
 
  Thanks for writing up the wiki. Some comments below.
 
  1. To make it more general, should we support a binding and an advertised
  host for each protocol (e.g. plaintext, ssl, etc)? We will also need to
  figure out how to specify the wildcard binding host.

 Yes, thats the idea. Two lines of config, one with list of listeners
 (protocol://host:port) and one with list of advertised listeners.
 Advertised listeners are optional. I think wildcard binding is
 normally done with 0.0.0.0 host (at least for HDFS), so I was planning
 to keep that convention.

 
  2. Broker format change in ZK
  The broker registration in ZK needs to store the host/port for all
  protocols. We will need to bump up the version of the broker registration
  data. Since this is an intra-cluster protocol change, we need an extra
  config for rolling upgrades. So, in the first step, each broker is
 upgraded
  and is ready to parse brokers registered in the new format, but not
  registering using the new format yet. In the second step, when that new
  config is enabled, the broker will register using the new format.
 

 I'm not sure this is necessary in this case. We'll bump the version for
 sure.
 And as long as the new format contains all the fields of the previous
 formats, the JSON de-serialization should work and just ignore the new
 fields.
 So the new brokers can register with the new format right away and the
 old brokers will be able to read that registration with no issues.
 New brokers will be able to use old registration but will also know
 about the extra ports and protocols from the additional field.

  3. Wire protocol changes. Currently, the broker info is used in the
  following requests/responses: TopicMetadataResponse ,
  ConsumerMetadataResponse, LeaderAndIsrRequest  and UpdateMetadataRequest.


  3.1 TopicMetadataResponse and ConsumerMetadataResponse:
  These two are used between the clients and the broker. I am not sure that
  we need to make a wire protocol change for them. Currently, the protocol
  includes a single host/port pair in those responses. Based on the type of
  the port on which the request is sent, it seems that we can just pick the
  corresponding host and port to include in the response.

 The wire protocol will not change here, but the Scala API (i.e method
 signatures for response and request classes) will change from getting
 brokers (which no longer represent single host+port pair) to getting
 endpoints (which do).

 I assumed the Scala API is public, but perhaps I was wrong there.


  3.2 UpdateMetadataRequest:
  This is used between the controller and the broker. Since each broker
 needs
  to cache the host/port of all protocols, we need to make a wire protocol
  change. We also need to change the broker format in MetadataCache
  accordingly. This is also an intra-cluster protocol change. So the
 upgrade
  path will need to follow that in item 2.

 Yes. Because the wire protocol is byte-array based, the existing
 brokers will not be able to parse messages from new brokers without
 the upgrade path you described.

 Is this something that was done in the past? I'm wondering about few
 things:
 * Is the move from phase 1 (read new protocol but send old protocol)
 to phase 2 (send and receive new protocol) done via a config
 parameter? Or is there other methods?
 * When do we actually bump the version? I'm planning to bump it now
 and hopefully this will be the last bump for 0.9. However, if
 additional patches make more 

Re: [SECURITY DISCUSSION] Refactoring Brokers to support multiple ports

2014-12-02 Thread Jun Rao
Todd,

Does that imply the intra-broker protocol is always plaintext?

Thanks,

Jun

On Tue, Dec 2, 2014 at 3:31 PM, Todd Palino tpal...@gmail.com wrote:

 Thanks. Just to add more detail as to why I think it's a good idea to be
 able to segregate traffic like that...

 One reason would just be to separate out the intra-cluster communication to
 a separate port. This can allow you to run it over a different interface
 (for example, you could have dedicated links for the brokers to do
 replication), though you could do that with a wildcard bind and multiple
 interfaces with a little care on the broker config. It also allows for
 firewalling off clients from a cluster while leaving the broker
 communication intact. We've run into this situation a couple times where it
 was advantageous to do this during recovery to prevent things like runaway
 file descriptor allocation. It also gives the ability to use QOS tools to
 work with networking guarantees for broker traffic.

 Maybe it's enough to be able to specify a special protocol name to do this.
 Meaning you could configure a port with protocol broker (or something
 like that) which could be used by the brokers if it exists. Otherwise they
 would default back to something else.

 -Todd

 -TOdd

 On Tue, Dec 2, 2014 at 3:23 PM, Gwen Shapira gshap...@cloudera.com
 wrote:

  The good news is that I'm not using a map to represent protocol list.
 
  The bad news is that as mentioned in the wiki: producers, consumers
  and broker configuration specify security protocol, so we'll know
  which host/port pair to use for communication. This implicitly assumes
  there will be only one host/port per protocol.
 
  I'll think a bit on how this assumption can be relaxed.
 
  Gwen
 
  On Tue, Dec 2, 2014 at 3:14 PM, Todd Palino tpal...@gmail.com wrote:
   Gwen - just my reading of what we could expect from what you had
  described
   so far. Without having gone into implementation details, there didn't
  seem
   to be anything that would block the ability to run two ports with the
  same
   protocol configuration, at least from the way you proposed to represent
  it
   in Zookeeper. I'd just like to not go down the path of using something
  like
   a map for representing the protocol list that would eliminate this
   possibility, unless there's a pretty good reason to do so.
  
   -Todd
  
   On Tue, Dec 2, 2014 at 3:00 PM, Gwen Shapira gshap...@cloudera.com
  wrote:
  
   Hey Todd,
  
   You say lose the ability - you mean this ability actually exist now?
   Or is this something you hope the new patch will support?
  
   Gwen
  
   On Tue, Dec 2, 2014 at 2:08 PM, Todd Palino tpal...@gmail.com
 wrote:
Leaving aside the rest of this, on #1, while I consider being able
 to
advertise the ports a good idea, I don't want to lose the ability
 for
maintaining multiple ports with the same protocol. For example,
 being
   able
to have 2 plaintext ports, one that only brokers communicate over,
 and
   one
that general clients use. The ability to segregate this traffic is
  useful
in a number of situations, over and above other controls like
 quotas,
  and
is relatively easy to do once we support multiple ports.
   
-Todd
   
   
On Tue, Dec 2, 2014 at 1:58 PM, Jun Rao jun...@gmail.com wrote:
   
Hi, Gwen,
   
Thanks for writing up the wiki. Some comments below.
   
1. To make it more general, should we support a binding and an
   advertised
host for each protocol (e.g. plaintext, ssl, etc)? We will also
 need
  to
figure out how to specify the wildcard binding host.
   
2. Broker format change in ZK
The broker registration in ZK needs to store the host/port for all
protocols. We will need to bump up the version of the broker
   registration
data. Since this is an intra-cluster protocol change, we need an
  extra
config for rolling upgrades. So, in the first step, each broker is
   upgraded
and is ready to parse brokers registered in the new format, but not
registering using the new format yet. In the second step, when that
  new
config is enabled, the broker will register using the new format.
   
3. Wire protocol changes. Currently, the broker info is used in the
following requests/responses: TopicMetadataResponse ,
ConsumerMetadataResponse, LeaderAndIsrRequest  and
   UpdateMetadataRequest.
3.1 TopicMetadataResponse and ConsumerMetadataResponse:
These two are used between the clients and the broker. I am not
 sure
   that
we need to make a wire protocol change for them. Currently, the
  protocol
includes a single host/port pair in those responses. Based on the
  type
   of
the port on which the request is sent, it seems that we can just
 pick
   the
corresponding host and port to include in the response.
3.2 UpdateMetadataRequest:
This is used between the controller and the broker. Since each
 broker
   needs
to cache the host/port of all 

Re: [SECURITY DISCUSSION] Refactoring Brokers to support multiple ports

2014-12-02 Thread Todd Palino
I don't think it necessarily has to be. I was thinking about that while I
was typing it out, and I realized that as well. With a special broker port,
my biggest concern is making sure that nothing other than a broker uses it
(and so cannot bypass security controls like authentication and
authorization). Outside of that assurance, I don't see a reason to add
overhead like TLS to the broker communication. Someone else might have one,
though.

-Todd


On Tue, Dec 2, 2014 at 4:12 PM, Jun Rao j...@confluent.io wrote:

 Todd,

 Does that imply the intra-broker protocol is always plaintext?

 Thanks,

 Jun

 On Tue, Dec 2, 2014 at 3:31 PM, Todd Palino tpal...@gmail.com wrote:

  Thanks. Just to add more detail as to why I think it's a good idea to be
  able to segregate traffic like that...
 
  One reason would just be to separate out the intra-cluster communication
 to
  a separate port. This can allow you to run it over a different interface
  (for example, you could have dedicated links for the brokers to do
  replication), though you could do that with a wildcard bind and multiple
  interfaces with a little care on the broker config. It also allows for
  firewalling off clients from a cluster while leaving the broker
  communication intact. We've run into this situation a couple times where
 it
  was advantageous to do this during recovery to prevent things like
 runaway
  file descriptor allocation. It also gives the ability to use QOS tools to
  work with networking guarantees for broker traffic.
 
  Maybe it's enough to be able to specify a special protocol name to do
 this.
  Meaning you could configure a port with protocol broker (or something
  like that) which could be used by the brokers if it exists. Otherwise
 they
  would default back to something else.
 
  -Todd
 
  -TOdd
 
  On Tue, Dec 2, 2014 at 3:23 PM, Gwen Shapira gshap...@cloudera.com
  wrote:
 
   The good news is that I'm not using a map to represent protocol list.
  
   The bad news is that as mentioned in the wiki: producers, consumers
   and broker configuration specify security protocol, so we'll know
   which host/port pair to use for communication. This implicitly assumes
   there will be only one host/port per protocol.
  
   I'll think a bit on how this assumption can be relaxed.
  
   Gwen
  
   On Tue, Dec 2, 2014 at 3:14 PM, Todd Palino tpal...@gmail.com wrote:
Gwen - just my reading of what we could expect from what you had
   described
so far. Without having gone into implementation details, there didn't
   seem
to be anything that would block the ability to run two ports with the
   same
protocol configuration, at least from the way you proposed to
 represent
   it
in Zookeeper. I'd just like to not go down the path of using
 something
   like
a map for representing the protocol list that would eliminate this
possibility, unless there's a pretty good reason to do so.
   
-Todd
   
On Tue, Dec 2, 2014 at 3:00 PM, Gwen Shapira gshap...@cloudera.com
   wrote:
   
Hey Todd,
   
You say lose the ability - you mean this ability actually exist
 now?
Or is this something you hope the new patch will support?
   
Gwen
   
On Tue, Dec 2, 2014 at 2:08 PM, Todd Palino tpal...@gmail.com
  wrote:
 Leaving aside the rest of this, on #1, while I consider being able
  to
 advertise the ports a good idea, I don't want to lose the ability
  for
 maintaining multiple ports with the same protocol. For example,
  being
able
 to have 2 plaintext ports, one that only brokers communicate over,
  and
one
 that general clients use. The ability to segregate this traffic is
   useful
 in a number of situations, over and above other controls like
  quotas,
   and
 is relatively easy to do once we support multiple ports.

 -Todd


 On Tue, Dec 2, 2014 at 1:58 PM, Jun Rao jun...@gmail.com wrote:

 Hi, Gwen,

 Thanks for writing up the wiki. Some comments below.

 1. To make it more general, should we support a binding and an
advertised
 host for each protocol (e.g. plaintext, ssl, etc)? We will also
  need
   to
 figure out how to specify the wildcard binding host.

 2. Broker format change in ZK
 The broker registration in ZK needs to store the host/port for
 all
 protocols. We will need to bump up the version of the broker
registration
 data. Since this is an intra-cluster protocol change, we need an
   extra
 config for rolling upgrades. So, in the first step, each broker
 is
upgraded
 and is ready to parse brokers registered in the new format, but
 not
 registering using the new format yet. In the second step, when
 that
   new
 config is enabled, the broker will register using the new format.

 3. Wire protocol changes. Currently, the broker info is used in
 the
 following requests/responses: TopicMetadataResponse ,
 ConsumerMetadataResponse, 

Re: [SECURITY DISCUSSION] Refactoring Brokers to support multiple ports

2014-12-01 Thread Joe Stein
+1 to doing this, can you sub ticket in the security ticket when you create
the JIRA for this (unless you did it already and I missed it). I made one
comment in regards to the JSON returned on the confluence otherwise this
matches what we already agreed and discussed and like how it is breaking it
down into smaller chunks so that the different implementations can utilize
it.

/***
 Joe Stein
 Founder, Principal Consultant
 Big Data Open Source Security LLC
 http://www.stealth.ly
 Twitter: @allthingshadoop http://www.twitter.com/allthingshadoop
/

On Tue, Nov 25, 2014 at 2:13 PM, Gwen Shapira gshap...@cloudera.com wrote:

 Hi Everyone,

 One of the pre-requisites we have for supporting multiple security
 protocols (SSL, Kerberos) is to support them on separate ports.

 This is done in KAFKA-1684 (The SSL Patch), but that patch addresses
 several different issues - Multiple ports, enriching the channels, SSL
 implementation - which makes it more challenging to review and to test.

 I'd like to split this into 3 separate patches: multi-port brokers,
 enriching SocketChannel, and  the actual security implementations.

 Since even just adding support for multiple listeners per broker is
 somewhat involved and touches multiple components, I wrote a short design
 document that covers the necessary changes and the upgrade process:


 https://cwiki.apache.org/confluence/display/KAFKA/Multiple+Listeners+for+Kafka+Brokers

 Comments are more than welcome :)

 If this is acceptable, hope to have a patch ready in few days.

 Gwen Shapira



[SECURITY DISCUSSION] Refactoring Brokers to support multiple ports

2014-11-25 Thread Gwen Shapira
Hi Everyone,

One of the pre-requisites we have for supporting multiple security
protocols (SSL, Kerberos) is to support them on separate ports.

This is done in KAFKA-1684 (The SSL Patch), but that patch addresses
several different issues - Multiple ports, enriching the channels, SSL
implementation - which makes it more challenging to review and to test.

I'd like to split this into 3 separate patches: multi-port brokers,
enriching SocketChannel, and  the actual security implementations.

Since even just adding support for multiple listeners per broker is
somewhat involved and touches multiple components, I wrote a short design
document that covers the necessary changes and the upgrade process:

https://cwiki.apache.org/confluence/display/KAFKA/Multiple+Listeners+for+Kafka+Brokers

Comments are more than welcome :)

If this is acceptable, hope to have a patch ready in few days.

Gwen Shapira