Improving the Kafka client ecosystem

2014-07-18 Thread Jay Kreps
A question was asked in another thread about what an effective way
would be to contribute to the Kafka project for people who weren't very
enthusiastic about writing Java/Scala code.

I wanted to advocate for an area I think is really important and not
as good as it could be: the client ecosystem. I think our goal is to
make Kafka effective as a general-purpose, centralized data
subscription system. This vision only really works if all your
applications are able to integrate easily, whatever language they are
written in.

We have a number of pretty good non-Java producers. What we have been
lacking are the server-side features that make writing non-Java
consumers easy. We are fixing that now as part of the ongoing consumer
work (which moves a lot of the functionality in the Java consumer to
the server side).

But apart from this I think there may be a lot more we can do to make
the client ecosystem better.

Here are some concrete ideas. If anyone has additional ideas, please
reply to this thread and share them. If you are interested in picking
up any of these, please do.

1. The most obvious way to improve the ecosystem is to help work on
clients. This doesn't necessarily mean writing new clients, since in
many cases we already have a client in a given language. Anything we
can do to encourage fewer, better clients rather than many
half-working ones is worth doing. Also, since we are now working on
server-side consumer co-ordination, it should become possible to
write much simpler consumers.

2. It would be great if someone put together a mailing list just for
client developers to share tips, tricks, problems, and so on. We can
make sure all the main contributors are on it too. I think this could
be a forum for directing improvements in this area.

3. Help improve the documentation on how to implement a client. We
have tried to make the protocol spec not just a dry document but one
that also shares best practices, rationale, and intentions. I think it
could be even better, since there is really a range of options, from a
very simple quick implementation to a more complex, highly optimized
one. It would be good to document some of these options and their
trade-offs.

4. Come up with a standard way of documenting the features of clients.
In an ideal world it would be possible to get the same information
(author, language, feature set, download link, source code, etc.) for
all clients. It would be great to standardize the documentation for
each client as well, for example by having one or two basic examples
that are repeated for every client in a standardized way. This would
let someone who is not a Java developer come to the Kafka site, click
on the link for their language, and view examples of interacting with
Kafka in the language they know, using the client they would
eventually use.

5. Build a Kafka Client Compatibility Kit (KCCK) :-) The idea is this:
anyone who wants to implement a client would implement a simple
command line program with a set of standardized options. The
compatibility kit would be a standard set of scripts that run their
client through this command line driver and validate its behavior.
E.g. for a producer it would test that it can send messages correctly,
that ordering is retained, that the client correctly handles
reconnection and metadata refresh, and that compression works. The
output would be a list of the features that passed, which would be
certified, and perhaps some basic performance information. This would
be an easy way to help client developers write correct clients, as
well as providing a standardized comparison showing which clients work
correctly.
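The checking half of such a kit could be quite small. Below is a minimal sketch, assuming a hypothetical driver that reports each message it produced and each message it later consumed as a (partition, payload) pair; the driver flags (--mode, --topic, --messages) are invented for illustration and are not part of any existing tool. Only delivery and per-partition ordering are checked, since Kafka guarantees ordering within a partition rather than across partitions; reconnection, metadata refresh, and compression checks would need a live cluster.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# (partition, payload) as reported by a hypothetical client driver.
Record = Tuple[int, str]


def driver_command(executable: str, mode: str, topic: str, count: int) -> List[str]:
    """Build the argv for a client's command line driver.

    The option names are hypothetical; a real kit would fix them in its spec.
    """
    return [executable, "--mode", mode, "--topic", topic, "--messages", str(count)]


def all_delivered(sent: List[Record], received: List[Record]) -> bool:
    """Sent and received contain the same messages (order ignored)."""
    return sorted(sent) == sorted(received)


def ordering_retained(sent: List[Record], received: List[Record]) -> bool:
    """Within each partition, messages were consumed in the order produced."""
    per_sent: Dict[int, List[str]] = defaultdict(list)
    per_recv: Dict[int, List[str]] = defaultdict(list)
    for partition, payload in sent:
        per_sent[partition].append(payload)
    for partition, payload in received:
        per_recv[partition].append(payload)
    return dict(per_sent) == dict(per_recv)


# Feature name -> check; the report lists which features passed.
CHECKS: Dict[str, Callable[[List[Record], List[Record]], bool]] = {
    "send": all_delivered,
    "ordering": ordering_retained,
}


def certify(sent: List[Record], received: List[Record]) -> Dict[str, bool]:
    """Run every check and report feature -> passed."""
    return {name: check(sent, received) for name, check in CHECKS.items()}


if __name__ == "__main__":
    print(driver_command("./client-driver", "produce", "kcck-test", 3))
    sent = [(0, "a"), (1, "b"), (0, "c")]
    received = [(1, "b"), (0, "a"), (0, "c")]
    print(certify(sent, received))  # {'send': True, 'ordering': True}
```

Interleaving across partitions is allowed in the example above, which is why both checks pass; swapping "a" and "c" within partition 0 would fail only the ordering check.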

-Jay


Re: Improving the Kafka client ecosystem

2014-08-19 Thread Mark Roberts
Did this mailing list ever get created? Was there consensus on whether
it needed to be created?

-Mark

> On Jul 18, 2014, at 14:34, Jay Kreps  wrote:


Re: Improving the Kafka client ecosystem

2014-07-18 Thread Jun Rao
Another important part of the ecosystem could be adaptors for getting
data from other systems into Kafka and vice versa. For ingestion, this
could include things like getting data from MySQL, syslog, Apache
server logs, etc. For egress, this could include putting Kafka data
into HDFS, S3, etc.
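As a rough illustration, ingestion and egress adaptors could share a pair of small interfaces: a source that yields records from an external system and a sink that writes batches of them out. The names Source, Sink, and pump are hypothetical, and the in-memory classes merely stand in for systems like MySQL, syslog, HDFS, or S3 (and for Kafka itself); this is a sketch of the shape, not an existing API.

```python
from abc import ABC, abstractmethod
from typing import Iterable, Iterator, List


class Source(ABC):
    """Ingestion side: reads records from an external system (e.g. syslog)."""

    @abstractmethod
    def read(self) -> Iterator[bytes]:
        ...


class Sink(ABC):
    """Egress side: writes records to an external system (e.g. HDFS or S3)."""

    @abstractmethod
    def write(self, records: List[bytes]) -> None:
        ...


class ListSource(Source):
    """In-memory stand-in for a real source such as a log file."""

    def __init__(self, lines: Iterable[bytes]) -> None:
        self._lines = list(lines)

    def read(self) -> Iterator[bytes]:
        return iter(self._lines)


class ListSink(Sink):
    """In-memory stand-in for Kafka or a downstream store."""

    def __init__(self) -> None:
        self.written: List[bytes] = []

    def write(self, records: List[bytes]) -> None:
        self.written.extend(records)


def pump(source: Source, sink: Sink, batch_size: int = 100) -> int:
    """Copy every record from source to sink in batches; return the count."""
    batch: List[bytes] = []
    total = 0
    for record in source.read():
        batch.append(record)
        if len(batch) >= batch_size:
            sink.write(batch)
            total += len(batch)
            batch = []  # start a fresh list; the sink keeps its own copy
    if batch:  # flush the final partial batch
        sink.write(batch)
        total += len(batch)
    return total


if __name__ == "__main__":
    sink = ListSink()
    copied = pump(ListSource([b"GET /", b"GET /about", b"POST /login"]), sink, batch_size=2)
    print(copied, sink.written)  # 3 [b'GET /', b'GET /about', b'POST /login']
```

Batching is included because both Kafka producers and bulk stores such as HDFS tend to favor larger writes; the default batch size here is arbitrary.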

Would a separate mailing list be convenient? Could we just use the
Kafka mailing list?

Thanks,

Jun


On Fri, Jul 18, 2014 at 2:34 PM, Jay Kreps  wrote:


Re: Improving the Kafka client ecosystem

2014-07-18 Thread Jay Kreps
Basically my thought with a separate mailing list was to have a place
specifically to discuss issues around clients. I don't see a lot of
discussion about them on the main list. I thought perhaps this was
because people don't like to ask questions about adjacent
projects/code bases. But basically, whatever will lead to robust
discussion, bug tracking, etc. on clients.

-Jay

On Fri, Jul 18, 2014 at 3:49 PM, Jun Rao  wrote:


Re: Improving the Kafka client ecosystem

2014-07-19 Thread Timothy Chen
The certified client test suite will really benefit all client developers,
as writing a Kafka client is often not just about speaking the protocol but
about correctly handling all the cases, errors and situations, and about
performance as well.

From my experience writing a C# client, I definitely feel that a lot of test
scenarios could be generalized and used for all clients.

I was reviewing another client implementation, and there were errors and cases
it didn't handle. Having a suite that exposes those would keep users from
running into such problems and then having to work out whether it's a client
or server bug, which is sometimes hard to figure out.

Tim

> On Jul 18, 2014, at 3:57 PM, Jay Kreps  wrote:

Re: Improving the Kafka client ecosystem

2014-07-19 Thread Mark Roberts
Hi all,

As a client engineer on the Python client, I would really appreciate a
separate mailing list for client implementation discussion and a
language-agnostic test suite. What might also be really useful is an
enumerated list of error conditions and the expected behavior for each.
For instance, what do you do if you have a multi-partition producer that
tries to produce to a non-existent topic? The metadata request is going to
return nothing, which means you don't know where to send the request at
all. You could just arbitrarily send it to a broker, I guess?
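One way such an enumerated list could be written down is as a table from protocol error code to expected client action. The numeric codes below are a subset of those in the Kafka protocol guide; the actions and the retry limit are a hypothetical policy for illustration, not anything agreed on this list. For the non-existent topic case, the metadata response typically carries a per-topic error code (UnknownTopicOrPartition, or LeaderNotAvailable while a broker with topic auto-creation enabled creates it), so under this sketch a client would refresh metadata and retry a few times before failing.

```python
from enum import Enum


class ErrorCode(Enum):
    """A subset of Kafka protocol error codes."""
    NONE = 0
    OFFSET_OUT_OF_RANGE = 1
    UNKNOWN_TOPIC_OR_PARTITION = 3
    LEADER_NOT_AVAILABLE = 5
    NOT_LEADER_FOR_PARTITION = 6
    REQUEST_TIMED_OUT = 7
    MESSAGE_SIZE_TOO_LARGE = 10


class Action(Enum):
    SUCCEED = "succeed"
    REFRESH_METADATA_AND_RETRY = "refresh metadata, back off, retry"
    RETRY = "retry with backoff"
    FAIL = "fail the request to the caller"


# Hypothetical expected-behavior table; a real one would need consensus.
EXPECTED = {
    ErrorCode.NONE: Action.SUCCEED,
    ErrorCode.UNKNOWN_TOPIC_OR_PARTITION: Action.REFRESH_METADATA_AND_RETRY,
    ErrorCode.LEADER_NOT_AVAILABLE: Action.REFRESH_METADATA_AND_RETRY,
    ErrorCode.NOT_LEADER_FOR_PARTITION: Action.REFRESH_METADATA_AND_RETRY,
    ErrorCode.REQUEST_TIMED_OUT: Action.RETRY,
    ErrorCode.OFFSET_OUT_OF_RANGE: Action.FAIL,
    ErrorCode.MESSAGE_SIZE_TOO_LARGE: Action.FAIL,
}

RETRYABLE = (Action.RETRY, Action.REFRESH_METADATA_AND_RETRY)


def next_action(code: int, attempt: int, max_retries: int = 3) -> Action:
    """Decide what a client should do with an error code from a response."""
    try:
        action = EXPECTED[ErrorCode(code)]
    except (ValueError, KeyError):
        return Action.FAIL  # unknown codes are surfaced to the caller
    if action in RETRYABLE and attempt >= max_retries:
        return Action.FAIL  # give up after max_retries attempts
    return action


if __name__ == "__main__":
    # Topic missing from the cluster: refresh and retry, then give up.
    for attempt in range(4):
        print(attempt, next_action(3, attempt).value)
```

Keeping the policy as data rather than scattered conditionals would also make it easy for a language-agnostic test suite to check every client against the same table.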

At any rate, I have lots of questions about a formalized "certified client"
process. I'm not against the idea (in fact, quite the opposite), but I'm
concerned that non-Java clients will be constrained purely to the currently
existing Java API in the name of client uniformity and standardization.

-Mark



On Sat, Jul 19, 2014 at 12:30 AM, Timothy Chen  wrote: