Thomas,

Thanks for your comments. It is an intriguing list of points you lay out below, and in some sense it highlights the fact that there is still work to be done. I find it sad, though, that you decided to frame it in such a destructive way, picturing ZooKeeper as a poorly designed proof-of-concept system. I certainly don't share your view, and the fact that people use it and invest in it makes me think that it is not as bad as you make it out to be. There are other (and possibly better) ways of implementing the same or similar functionality, and it is great to hear that you have good ideas for how to do it. If you are able to develop such a system and form a community around it, I'd certainly consider contributing to it.

-Flavio 

On May 19, 2011, at 11:21 AM, Thomas Koch wrote:

Hi,

As you may have noticed, I haven't been active in the ZooKeeper project for a
couple of months. I have been a full-time student again since March, so any
further activity in Hadoop/ZooKeeper would have to be self-motivated.

Since I don't want to just fade away, and since I will still give a talk about
ZooKeeper at the Berlin Buzzwords conference (Berlin, June 6/7), I have listed
the reasons why I would no longer like to work on the current ZooKeeper code
base.

I plan the following structure for my talk:

1) theoretical model / protocol of ZooKeeper
2) practical applications, projects using ZooKeeper
3) shortcomings of the current ZooKeeper code base

A tentative brain dump of part three is listed below. I appreciate any
comments that could help me to give a balanced presentation of the ZooKeeper
project.

If I needed a ZooKeeper implementation right now, I'd probably do a
minimal-feature rewrite in Scala + Akka. I do appreciate ZooKeeper as an
invaluable proof-of-concept implementation and pioneer. But as in American
history, the pioneers should be followed by others who no longer look like
Clint Eastwood and who build tidier things.

The list:

* The code is tightly coupled
* most so-called "unit tests" are actually integration tests: they run the
whole application to test one specific piece of functionality.

* no uniform configuration: a mix of command-line parameters, system
properties and a configuration file (Java properties)
* configuration properties copied to static class members

* feature bloat on a fragile foundation: e.g. chroot combined with automatic
resubscription does not work

* implementation differs from the specification: e.g. the characters allowed
in a path

* still on Ant instead of Maven (depending on how you see Ant vs. Maven)

* circular object dependencies (e.g. ZooKeeper <-> ClientCnxn)

* methods with 100+ lines of code and conditions nested well over 5 levels
deep

* a general attitude against refactoring; no knowledge or appreciation of
"Effective Java" (Josh Bloch) or "Clean Code" (Robert C. Martin)

* magic numbers instead of enums

* still bound to an inline copy of Jute (HadoopIO, an Avro predecessor)
* even hand-coded (de)serialization in leader election

* no client-only jar. Every client gets the full server code.

* an unwieldy API that has spawned (at least) two client API wrappers:
zkClient and cages

* insane amounts of code duplication

* horrible, fragile thread programming: plenty of "XYZ extends Thread"
instead of
 - implements Runnable
 - or better: the executor framework
 - or much better: actors (see Akka)
 -> this leads to fear of refactoring, because nobody understands all the
synchronization requirements.
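To illustrate the thread-programming point, here is a minimal Java sketch of the "implements Runnable + executor framework" alternative to subclassing Thread. It is not ZooKeeper code: the names `WorkerDemo`, `CountingTask` and `runTasks` are hypothetical, and the actor-based option (Akka) is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical example class, not part of ZooKeeper.
public class WorkerDemo {

    // Instead of "class Worker extends Thread", the unit of work is a plain
    // Runnable: it says nothing about which thread runs it or when.
    static final class CountingTask implements Runnable {
        private final AtomicInteger counter;

        CountingTask(AtomicInteger counter) {
            this.counter = counter;
        }

        @Override
        public void run() {
            counter.incrementAndGet(); // the only shared state, updated atomically
        }
    }

    // The executor owns thread creation and reuse; the caller only submits
    // tasks, waits for their results and shuts the pool down.
    static int runTasks(int n) throws Exception {
        AtomicInteger counter = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                futures.add(pool.submit(new CountingTask(counter)));
            }
            for (Future<?> f : futures) {
                f.get(); // blocks until done; a task failure surfaces as ExecutionException
            }
        } finally {
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        }
        return counter.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(runTasks(4));
    }
}
```

Because the tasks carry no lifecycle logic of their own, the threading policy (pool size, shutdown) can be changed in one place without touching them.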

Best regards,

Thomas Koch, http://www.koch.ro

Flavio Junqueira
Research Scientist

f...@yahoo-inc.com
direct +34 93-183-8828

avinguda diagonal 177, 8th floor, barcelona, 08018, es
phone (408) 349 3300    fax (408) 349 3301

