Hey Jun,

I think the existing Scala clients should just remain as they are. There is no point updating them, and, as you say, it would be quite fragile. The conversion to the new requests would just be for the server usage.
-Jay

On Mon, Feb 9, 2015 at 9:48 AM, Jun Rao <j...@confluent.io> wrote:

> We need to be a bit careful when doing 2b. Currently, our public APIs include SimpleConsumer, which unfortunately exposes our RPC requests/responses. Doing 2b would mean API changes to SimpleConsumer. So, if we want to do 2b before 3, we would need to agree on making such API changes. Otherwise, 2b will need to be done after 3.
>
> Thanks,
>
> Jun
>
> On Sun, Feb 8, 2015 at 7:26 AM, Jay Kreps <jay.kr...@gmail.com> wrote:
>
> > Hey all,
> >
> > Someone asked about why there is code duplication between org.apache.common and core. The answer seemed like it might be useful to others, so including it here:
> >
> > Originally Kafka was more of a proof of concept and we didn't separate the clients from the server. LinkedIn was much smaller and it wasn't open source, and keeping those separate always adds a lot of overhead. So we ended up with just one big jar.
> >
> > Next thing we know the Kafka jar is embedded everywhere. Lots of fallout from that:
> > - It has to be really sensitive to dependencies
> > - Scala causes all kinds of pain for users. Ironically it causes the most pain for people using Scala because of compatibility. I think the single biggest Kafka complaint was the Scala clients and resulting scary exceptions, lack of javadoc, etc.
> > - Many of the client interfaces weren't well thought out as permanent long-term commitments.
> > - We knew we had to rewrite both clients due to technical deficiencies anyway. The clients really needed to move to non-blocking I/O which is basically a rewrite on its own.
> >
> > So how to go about that?
> >
> > Well we felt we needed to maintain the old client interfaces for a good period of time. Any kind of breaking cut-over was kind of a non-starter.
> > But a major refactoring in place was really hard since so many classes were public and so little attention had been paid to the difference between public and private classes.
> >
> > Naturally since the client and server do the inverse of each other there is a ton of shared logic. So we thought we needed to break it up into three independent chunks:
> > 1. common - shared helper code used by both clients and server
> > 2. clients - the producer, consumer, and eventually admin Java interfaces. This depends on common.
> > 3. server - the server (and legacy clients). This is currently called core. This will depend on common and clients (because sometimes the server needs to make client requests)
> >
> > Common and clients were left as a single jar and just logically separate so that people wouldn't have to deal with two jars (and hence the possibility of getting different versions of each).
> >
> > The dependency is actually a little counter-intuitive to people--they usually think of the client as depending on the server since the client calls the server. But in terms of code dependencies it is the other way--if you depend on the client you obviously don't want to drag in the server.
> >
> > So to get all this done we decided to just go big and do a rewrite of the clients in Java. A result of this is that any shared code would have to move to Java (so the clients don't pull in Scala). We felt this was probably a good thing in its own right as it gave a chance to improve a few of these utility libraries like config parsing, etc.
> >
> > So the plan was and is:
> > 1. Rewrite producer, release and roll out
> > 2a. Rewrite consumer, release and roll out
> > 2b. Migrate server from Scala code to org.apache.common classes
> > 3. Deprecate Scala clients
> >
> > (2a) is in flight now, and that means (2b) is totally up for grabs.
> > Of these the request conversion is definitely the most pressing since having those defined twice duplicates a ton of work. We will have to be hyper-conscientious during the conversion about making the shared code in common really solve the problem well and conveniently on the server as well (so we don't end up just shoe-horning it in). My hope is that we can treat this common code really well--it isn't as permanent as the public classes but ends up heavily used so we should take good care of it. Most of the shared code is private so we can refactor the stuff in common to meet the needs of the server if we find mismatches or missing functionality. I tried to keep in mind the eventual server usage while writing it, but I doubt it will be as trivial as just deleting the old and adding the new.
> >
> > In terms of the simplicity:
> > - Converting exceptions should be trivial
> > - Converting utils is straightforward but we should evaluate the individual utilities and see if they actually make sense, have tests, are used, etc.
> > - Converting the requests may not be too complex but touches a huge hunk of code and may require some effort to decouple the network layer.
> > - Converting the network code will be delicate and may require some changes in org.apache.common.network to meet the server's needs
> >
> > This is all a lot of work, but if we stick to it at the end we will have really nice clients and a nice modular code base. :-)
> >
> > Cheers,
> >
> > -Jay
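
As a rough illustration of why defining requests once in common removes duplicated work, here is a minimal Java sketch. Everything in it is hypothetical: `PingRequest`, its made-up wire format, and the `clientSend`/`serverHandle` helpers are not Kafka classes, they only show one request definition being serialized by client-side code and parsed by server-side code.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class SharedRequestSketch {

    /** Hypothetical request type that would live in the shared "common" layer. */
    static final class PingRequest {
        final int correlationId;
        final String clientId;

        PingRequest(int correlationId, String clientId) {
            this.correlationId = correlationId;
            this.clientId = clientId;
        }

        /** Made-up wire format: int32 correlation id, int16 length, UTF-8 bytes. */
        ByteBuffer serialize() {
            byte[] id = clientId.getBytes(StandardCharsets.UTF_8);
            ByteBuffer buf = ByteBuffer.allocate(4 + 2 + id.length);
            buf.putInt(correlationId);
            buf.putShort((short) id.length);
            buf.put(id);
            buf.flip(); // switch the buffer from writing to reading
            return buf;
        }

        /** Inverse of serialize(); the only parser either side needs. */
        static PingRequest parse(ByteBuffer buf) {
            int correlationId = buf.getInt();
            int length = buf.getShort();
            byte[] id = new byte[length];
            buf.get(id);
            return new PingRequest(correlationId, new String(id, StandardCharsets.UTF_8));
        }
    }

    /** Stand-in for "clients" code: builds the request and puts it on the wire. */
    static ByteBuffer clientSend(int correlationId, String clientId) {
        return new PingRequest(correlationId, clientId).serialize();
    }

    /** Stand-in for "server" code: reads the request using the same class. */
    static String serverHandle(ByteBuffer wire) {
        PingRequest request = PingRequest.parse(wire);
        return request.clientId + ":" + request.correlationId;
    }

    public static void main(String[] args) {
        // prints "demo-client:7"
        System.out.println(serverHandle(clientSend(7, "demo-client")));
    }
}
```

In this sketch a change to the wire format touches only `PingRequest`, whereas with the request defined separately in the clients and in core the same change would have to be made, and kept in sync, in two places.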