Hi, as you may have noticed, I haven't been active in the ZooKeeper project for a couple of months. I have been a full-time student again since March, so any further activity in Hadoop/ZooKeeper would need to be self-motivated.
Since I don't want to just fade away, and I'll still give a talk about ZooKeeper at the Berlin Buzzwords conference (Berlin, June 6/7), I've listed the reasons why I would no longer like to work on the current ZooKeeper code base. I plan the following structure for my talk:

1) theoretical model / protocol of ZooKeeper
2) practical applications, projects using ZooKeeper
3) shortcomings of the current ZooKeeper code base

A tentative brain dump of part three is listed below. I'd appreciate any comments that could help me give a balanced presentation of the ZooKeeper project.

If I needed a ZooKeeper implementation right now, I'd probably do a minimal-feature rewrite in Scala + Akka. I do appreciate ZooKeeper as an invaluable proof-of-concept implementation and pioneer. But as in American history, after the pioneers there should come others who don't look like Clint Eastwood anymore and build tidier things.

The list:

* The code is tightly coupled.
* Most so-called "unit tests" are actually integration tests: they run the whole application and test one specific piece of functionality.
* No uniform configuration: command-line parameters, system properties, configuration file (Java properties).
* Configuration properties are copied to static class members.
* Feature bloat on a fragile foundation: e.g. chroot + automatic resubscription does not work.
* Implementation differs from specification: e.g. the allowed characters in a path.
* Still on Ant instead of Maven (depends on how you see Ant vs. Maven).
* Circular object dependencies (e.g. ZooKeeper <-> ClientCnxn).
* Methods with more than 100 lines of code and nesting depth well over 5.
* General attitude against refactoring; no knowledge or appreciation of "Effective Java" (Josh Bloch) or "Clean Code" (Robert C. Martin).
* Magic numbers instead of enums.
* Still bound to an inline copy of Jute (the Hadoop I/O library that preceded Avro).
* Even hand-coded (de)serialization in the leader election.
* No client-only jar: every client gets the full server code.
* The awkward API has triggered (at least) two client API wrappers: zkClient and Cages.
* Insane amounts of code duplication.
* Horrible, fragile thread programming: plenty of "XYZ extends Thread" instead of
  - implements Runnable,
  - or better: the executor framework,
  - or much better: actors (see Akka).
  This leads to fear of refactoring, because nobody understands all the synchronization needs.

Best regards,

Thomas Koch, http://www.koch.ro
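P.S.: To make the thread-programming point concrete, here is a minimal sketch of the executor-framework alternative to subclassing Thread. The class and task names are hypothetical and not taken from the ZooKeeper code base:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ExecutorSketch {
    public static void main(String[] args) throws Exception {
        // Instead of "XYZ extends Thread", tasks go to a pool that owns the threads.
        ExecutorService pool = Executors.newFixedThreadPool(2);

        // A task is just a Callable (or Runnable); no Thread subclass, no shared
        // mutable state scattered across thread fields.
        Callable<String> task = () -> "pong";

        // submit() returns a Future; get() blocks until the result is available.
        Future<String> result = pool.submit(task);
        System.out.println(result.get()); // prints "pong"

        pool.shutdown();
    }
}
```

The point is that lifecycle management (starting, pooling, joining) lives in one place, which makes refactoring far less scary than with ad-hoc Thread subclasses and hand-rolled synchronization.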