Re: Config for new clients (and server)

2014-02-10 Thread Jay Kreps
Hey Jun,

I think that is reasonable, but would you object to having it be debug logging?
I think logging out a bunch of noise during normal operation in a client
library is pretty ugly. Also, is there value in exposing the final configs
programmatically?

-Jay



On Sun, Feb 9, 2014 at 9:23 PM, Jun Rao jun...@gmail.com wrote:

 +1 on the new config. Just one comment. Currently, when instantiating a
 config (e.g. ProducerConfig), we log the overridden property values and
 unused property keys (likely due to misspelling). This has been very useful
 for config verification. It would be good to add similar support in the new
 config.

 Thanks,

 Jun


Re: Config for new clients (and server)

2014-02-10 Thread Jun Rao
I actually prefer to see those at INFO level. The reason is that the config
system in an application can be complex. Some configs can be overridden in
different layers and it may not be easy to determine what the final binding
value is. The logging in Kafka will serve as the source of truth.

For reference, the ZK client logs all overridden values during initialization.
It's a one-time thing during startup, so it shouldn't add much noise. It's
very useful for debugging subtle config issues.

Exposing final configs programmatically is potentially useful. If we don't
want to log overridden values out of the box, an app can achieve the same
thing using the programming api. The only missing thing is that we won't know
the unused property keys, which is probably less important than seeing the
overridden values.
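
(For illustration, a sketch of what such a programmatic api might look like;
none of these method names exist, they are purely hypothetical:)

    // hypothetical accessors; nothing like this exists in the client today
    Map<String, Object> finals = producer.finalConfigs(); // effective values after defaults and overrides
    Set<String> unused = producer.unusedConfigs();        // supplied keys that were never read
    for (String key : unused)
        log.warn("Unused config key: " + key);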

Thanks,

Jun


Re: Config for new clients (and server)

2014-02-10 Thread Sriram Subramanian
+1 on Jun's suggestion.

Re: Config for new clients (and server)

2014-02-10 Thread Pradeep Gollakota
+1 Jun.


Re: Config for new clients (and server)

2014-02-10 Thread Jay Kreps
Yeah, I am aware of how zookeeper behaves; I think it is kind of gross.

I think logging it at DEBUG gets you what you want--by default we don't
pollute logs, but anyone who wants to log this can enable DEBUG logging on
org.apache.kafka.clients.producer.ProducerConfig.
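
(E.g., assuming a log4j backend, the override is a one-liner in
log4j.properties:)

    log4j.logger.org.apache.kafka.clients.producer.ProducerConfig=DEBUG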

If we want this on by default at LinkedIn we can just set this logger to
debug in our wrapper; we don't need to inflict this on everyone.

The point is that spewing out each config IS a debug message according to our
definition:
  http://kafka.apache.org/coding-guide.html

-Jay


Re: Config for new clients (and server)

2014-02-06 Thread Jay Kreps
Joel,

Ah, I actually don't think the internal usage is a problem for *us*. We
just use config in one place, whereas it gets set in 1000s of apps, so I am
implicitly optimizing for the application interface. I agree that we can
add getters and setters on the ProducerConfig if we like. Basically I was
just concerned about the user: the nice thing about a pojo is that when you
type
   config.set
the IDE pops up a list of configs with documentation right there, which is
just very convenient.
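
(For illustration, a hypothetical pojo-style config of the kind being
contrasted here; the setter names are invented:)

    ProducerPojoConfig config = new ProducerPojoConfig();
    config.setMetadataTimeoutMs(60 * 1000); // an IDE can complete and document these setters
    config.setMaxPartitionSize(16384);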

-Jay


Re: Config for new clients (and server)

2014-02-05 Thread Guozhang Wang
+1 for the key-value approach.

Guozhang


On Tue, Feb 4, 2014 at 9:34 AM, Jay Kreps jay.kr...@gmail.com wrote:

 We touched on this a bit in previous discussions, but I wanted to draw out
 the approach to config specifically as an item of discussion.

 The new producer and consumer use a similar key-value config approach as
 the existing scala clients but have different implementation code to help
 define these configs. The plan is to use the same approach on the server,
 once the new clients are complete; so if we agree on this approach it will
 be the new default across the board.

 Let me split this into two parts. First I will try to motivate the use of
 key-value pairs as a configuration api. Then let me discuss the mechanics
 of specifying and parsing these. If we agree on the public api then the
 implementation details are interesting as this will be
 shared across producer, consumer, and broker and potentially some tools;
 but if we disagree about the api then there is no point in discussing the
 implementation.

 Let me explain the rationale for this. In a sense a key-value map of
 configs is the worst possible API to the programmer using the clients. Let
 me contrast the pros and cons versus a POJO and motivate why I think it is
 still superior overall.

 Pro: An application can externalize the configuration of its kafka clients
 into its own configuration. Whatever config management system the client
 application is using will likely support key-value pairs, so the client
 should be able to directly pull whatever configurations are present and use
 them in its client. This means that any configuration the client supports
 can be added to any application at runtime. With the pojo approach the
 client application has to expose each pojo getter as some config parameter.
 The result of many applications doing this is that the config is different
 for each and it is very hard to have a standard client config shared
 across. Moving config into config files allows the usual tooling (version
 control, review, audit, config deployments separate from code pushes,
 etc.).
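
 (For concreteness, a sketch of the pattern this enables, assuming an
 application properties file and a producer constructor that accepts
 Properties; the "kafka." prefix is just an illustrative convention:)

     import java.io.FileInputStream;
     import java.util.Properties;

     Properties app = new Properties();
     app.load(new FileInputStream("application.properties"));
     // forward everything prefixed "kafka." to the client, untouched
     Properties kafka = new Properties();
     for (String name : app.stringPropertyNames())
         if (name.startsWith("kafka."))
             kafka.put(name.substring("kafka.".length()), app.getProperty(name));
     KafkaProducer producer = new KafkaProducer(kafka);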

 Pro: Backwards and forwards compatibility. Provided we stick to our java
 api many internals can evolve and expose new configs. The application can
 support both the new and old client by just specifying a config that will
 be unused in the older version (and of course the reverse--we can remove
 obsolete configs).

 Pro: We can use a similar mechanism for both the client and the server.
 Since most people run the server as a stand-alone process it needs a config
 file.

 Pro: Systems like Samza that need to ship configs across the network can
 easily do so as configs have a natural serialized form. This can be done
 with pojos using java serialization but it is ugly and has bizarre failure
 cases.

 Con: The IDE gives nice auto-completion for pojos.

 Con: There are some advantages to javadoc as a documentation mechanism for
 java people.

 Basically to me this is about operability versus niceness of api and I
 think operability is more important.

 Let me now give some details of the config support classes in
 kafka.common.config and how they are intended to be used.

 The goal of this code is the following:
 1. Make specifying configs and their expected types (strings, numbers,
 lists, etc) simple and declarative
 2. Allow for simple validation checks (numeric range checks, etc)
 3. Make the config self-documenting. I.e. we should be able to write code
 that generates the configuration documentation off the config def.
 4. Specify default values.
 5. Track which configs actually get used.
 6. Make it easy to get config values.

 There are two classes there: ConfigDef and AbstractConfig. ConfigDef
 defines the specification of the accepted configurations and AbstractConfig
 is a helper class for implementing the configuration class. The difference
 is kind of like the difference between a class and an object: ConfigDef
 is for specifying the configurations that are accepted, AbstractConfig is
 the base class for an instance of these configs.

 You can see this in action here:

 https://git-wip-us.apache.org/repos/asf?p=kafka.git;a=blob_plain;f=clients/src/main/java/kafka/clients/producer/ProducerConfig.java;hb=HEAD

 (Ignore the static config names in there for now...I'm not actually sure
 that is the best approach).

 So the way this works is that the config specification is defined as:

 config = new ConfigDef().define("bootstrap.brokers", Type.LIST, "documentation")
                         .define("metadata.timeout.ms", Type.LONG, 60 * 1000, atLeast(0), "documentation")
                         .define("max.partition.size", Type.INT, 16384, atLeast(0), "documentation");


 This is used in a ProducerConfig class which extends AbstractConfig to get
 access to some helper methods as well as the logic for tracking which
 configs get accessed.
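
 (A sketch of that shape, with assumed details; see the linked ProducerConfig
 for the real thing. The typed getters such as getLong come from
 AbstractConfig:)

     public class ProducerConfig extends AbstractConfig {

         private static final ConfigDef config = new ConfigDef()
             .define("metadata.timeout.ms", Type.LONG, 60 * 1000, atLeast(0), "documentation");

         public ProducerConfig(Map<String, Object> props) {
             super(config, props); // parsing, validation, and usage tracking live in the base class
         }
     }

     // usage: a typed lookup against the parsed config
     long timeout = new ProducerConfig(props).getLong("metadata.timeout.ms");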

 Currently I have included static String variables for each config.

Re: Config for new clients (and server)

2014-02-05 Thread Neha Narkhede
I'm not so sure about the static config names used in the producer, but I'm
+1 on using the key-value approach for configs to ease operability.

Thanks,
Neha


Re: Config for new clients (and server)

2014-02-05 Thread Joel Koshy
Overall, +1 on sticking with key-values for configs.


 Con: The IDE gives nice auto-completion for pojos.
 
 Con: There are some advantages to javadoc as a documentation mechanism for
 java people.

Optionally, both the above cons can be addressed (to some degree) by
wrapper config POJOs that read in the config. I.e., the client will
provide a KV config, but then we (internally) would load that into a
specific config POJO that will be helpful for auto-completion and
javadocs and convenience for our internal implementation (as opposed
to using getLong/getString, etc., which could cause runtime exceptions
if done incorrectly). The javadoc in the pojo would need a {@value} link
to the original config key string if it is to show up in the generated
javadoc.
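
(A sketch of the wrapper idea; the class and constant names are invented, and
{@value} is what pulls the key string into the generated javadoc:)

    public class ProducerSettings {
        /** The config key backing the getter below. */
        public static final String METADATA_TIMEOUT = "metadata.timeout.ms";

        private final ProducerConfig config;

        public ProducerSettings(ProducerConfig config) {
            this.config = config;
        }

        /** Metadata fetch timeout in ms, read from the {@value #METADATA_TIMEOUT} key. */
        public long metadataTimeoutMs() {
            return config.getLong(METADATA_TIMEOUT);
        }
    }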


 show you the value of the constant, just the variable name (unless you
 discover how to unhide it). That is fine for the clients, but for the

Figuring out a way to un-hide it would be preferable to having to keep
the website as the single source of documentation (even if it is
generated from the javadoc) and make the javadoc link to it. I tried,
but was unsuccessful, so unless someone knows how to do that, the above
approach is the next-best alternative.

 server would be very weird especially for non-java people. We could attempt
 to duplicate documentation between the javadoc and the ConfigDef but given
 our struggle to get well-documented config in a single place this seems
 unwise.
 
 So I recommend we have a single source for documentation of these and that
 that source be the website documentation on configuration that covers
 clients and server and that that be generated off the config defs. The
 javadoc on KafkaProducer will link to this table so it should be quite
 convenient to discover.