Hey Guys, Just a friendly (and hopefully not too annoying) reminder about code layering. We have a couple of different systems that need to remain independent and not know about each other. Some examples of this are the network layer, the log system, the replication system, the controller code, etc. Various pieces of API-specific logic sit on top of these layers and integrate them into the functionality the system provides. The goal is to make these lower layers have the cleanest, most logical, and most general possible APIs that wrap up all the complicated stuff that they do.
An easy way to tell if we are doing the right thing is to think about how well the lowest levels of code makes sense if picked up and used on its own without the rest of kafka. Here is an example where we ended up breaking that layering: Log.scala: def getOffsetsBefore(request: OffsetRequest): Array[Long] We clearly didn't get the layering right here either. The method takes a OffsetRequest object, but the Log shouldn't know anything about the API layer. The implementation of the method is a bunch of logic very specific to some of the oddities of this API. The right way to implement this would be to add a more general API to get LogSegments from the log. Then in the API layer add all the specific types of filtering and munging needed to fullfill the contract of the API. What I have seen is that when we don't focus on this we end up with lots and lots of odd APIs and it tends to make the code really hard to evolve or to reason about what it is that the subsystem does. This is not a major problem, but being a little more rigorous at these layers helps keep the code beautiful. :-) Cheers, -Jay