Hey guys,

We had a bit of a compatibility slip-up in 0.8.2 with the offset commit
stuff. We caught this one before the final release so it's not too bad. But
I do think it kind of points to an area we could do better.

One piece of feedback we have gotten from going out and talking to users is
that compatibility is really, really important to them. Kafka is getting
deployed in big environments where the clients are embedded in lots of
applications and any kind of incompatibility is a huge pain for people
using it and generally makes upgrades difficult or impossible.

In practice what I think this means for development is a lot more pressure
to really think about the public interfaces we are making and try our best
to get them right. This can be hard, since changes come in as patches
and it is hard to follow every single review on Review Board (rb) closely
enough to know which ones touch a public interface.

Compatibility really covers several things:
1. Protocol changes
2. Binary data format changes
3. Changes to the public APIs in the clients
4. Configs
5. Metric names
6. Command line tools

I think 1-2 are critical. 3 is very important. And 4, 5 and 6 are pretty
important but not critical.

One thing this implies is that we are really going to have to do a good job
of thinking about APIs and use cases. You can definitely see a number of
places in the old clients and in a couple of the protocols where enough
care was not given to thinking things through. Some of those were from long,
long ago, but we should really try to avoid adding to that set because
increasingly we will have to carry around these mistakes for a long time.

Here are a few things I thought we could do that might help us get better
in this area:

1. Technically we are just in a really bad place with the protocol because
it is defined twice--once in the old scala request objects, and once in the
new protocol format for the clients. This makes changes massively painful.
The good news is that the new request definition DSL was intended to make
adding new protocol versions a lot easier and clearer. It will also make it
a lot more obvious when the protocol is changed since you will be checking
in or reviewing a change to Protocol.java. Getting the server moved over to
the new request objects and protocol definition will be a bit of a slog but
it will really help here I think.

2. We need to get some testing in place on cross-version compatibility.
This is work and no tests here will be perfect, but I suspect with some
effort we could catch a lot of things.

3. I was also thinking it might be worth it to get a little bit more formal
about the review and discussion process for changes that will have an impact
on these public areas, to ensure we end up with something we are happy with.
Python has a PEP (Python Enhancement Proposal) process
(https://www.python.org/dev/peps/pep-0257/) through which major changes are
made, and it might be worth it for us to do a
similar thing. We have essentially been doing this already--major changes
almost always have an associated wiki, but I think just getting a little
more rigorous might be good. The idea would be to just call out these wikis
as official proposals and do a full Apache discuss/vote thread for these
important changes. We would use these for big features (security, log
compaction, etc.) as well as for small changes that introduce or change a
public API, config, etc. This is a little heavier weight, but I think it is
really just critical that we get these things right and this would be a way
to call out this kind of change so that everyone would take the time to
look at them.
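
To make point 1 a bit more concrete, here is a rough sketch (in Python, purely for illustration) of what defining every version of a request in exactly one place buys you. This is not Kafka's actual Protocol.java DSL: `Field`, `Schema`, `COMMIT_V0`, `COMMIT_V1`, `COMMIT_REQUEST` and `schema_for` are hypothetical names, and the field layout is made up.

```python
import struct
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass(frozen=True)
class Field:
    """One field in a hypothetical request schema: a name plus a struct format."""
    name: str
    fmt: str  # big-endian struct code, e.g. ">i" (int32) or ">q" (int64)


@dataclass(frozen=True)
class Schema:
    fields: Tuple[Field, ...]

    def write(self, values: Dict[str, Any]) -> bytes:
        # Serialize fields in declaration order; a missing field raises KeyError.
        return b"".join(struct.pack(f.fmt, values[f.name]) for f in self.fields)

    def read(self, data: bytes) -> Dict[str, Any]:
        # Parse fields in declaration order, advancing a byte offset.
        values, offset = {}, 0
        for f in self.fields:
            (values[f.name],) = struct.unpack_from(f.fmt, data, offset)
            offset += struct.calcsize(f.fmt)
        return values


# Hypothetical offset-commit-style request. Every version lives in this one
# table, so any protocol change shows up as a diff to this definition.
COMMIT_V0 = Schema((Field("partition", ">i"), Field("offset", ">q")))
COMMIT_V1 = Schema(COMMIT_V0.fields + (Field("timestamp", ">q"),))
COMMIT_REQUEST = {0: COMMIT_V0, 1: COMMIT_V1}


def schema_for(version: int) -> Schema:
    # Fail loudly on a version that was never defined instead of guessing.
    if version not in COMMIT_REQUEST:
        raise ValueError(f"unsupported request version {version}")
    return COMMIT_REQUEST[version]
```

In this sketch, adding a version means adding one entry to `COMMIT_REQUEST`, which is exactly the kind of diff a reviewer can't miss.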
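
Similarly for point 2, here is one possible shape for a cross-version test. Again this is a Python sketch with hypothetical names (`encode_commit`, `decode_commit`, `GOLDEN`, `check_cross_version_compat`) and a made-up byte layout, not anything in the Kafka codebase. The idea: check in the bytes an older release produced for each request version, then assert that the current code decodes them to the same values and re-encodes them byte for byte.

```python
import struct
from typing import Dict

# Hypothetical golden fixtures: bytes a previous release produced for each
# request version, paired with the values they should decode to.
GOLDEN = {
    0: (bytes.fromhex("00000003000000000000002a"),
        {"partition": 3, "offset": 42}),
    1: (bytes.fromhex("00000003000000000000002a00000000000003e8"),
        {"partition": 3, "offset": 42, "timestamp": 1000}),
}


# Hypothetical current codec under test: version 1 appends an int64 timestamp.
def encode_commit(version: int, partition: int, offset: int, timestamp: int = -1) -> bytes:
    data = struct.pack(">iq", partition, offset)  # int32 + int64, no padding
    if version >= 1:
        data += struct.pack(">q", timestamp)
    return data


def decode_commit(version: int, data: bytes) -> Dict[str, int]:
    partition, offset = struct.unpack_from(">iq", data, 0)
    result = {"partition": partition, "offset": offset}
    if version >= 1:
        # The timestamp starts right after the 12-byte v0 prefix.
        (result["timestamp"],) = struct.unpack_from(">q", data, 12)
    return result


def check_cross_version_compat() -> None:
    for version, (old_bytes, expected) in GOLDEN.items():
        # Old bytes must still decode to the same values...
        assert decode_commit(version, old_bytes) == expected, f"v{version} decode changed"
        # ...and re-encoding those values must reproduce the old bytes exactly.
        assert encode_commit(version, **expected) == old_bytes, f"v{version} encode changed"
```

In practice the fixtures would need to be captured from the real previous release rather than written by hand, and a check like this only catches wire-format drift, not changes in semantics.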

Thoughts?

-Jay
