Have you seen my grool system which allows simple MR programs to be written
simply?


Yes, I did take a look. Your success with it was part of the reason I went with Groovy first, instead of Jython or JRuby.

In addition, I have been working on a layer over Zookeeper that collects feed-oriented information about the availability of data files. This is similar in some sense to Amazon's Simple Queue Service, except that it describes content as files rather than as opaque blobs passed through queues. This allows simpler retrospective processing of data. It would make a very good substrate for something like Cascades, since it would allow clean coordination semantics between multiple workers on independent machines and provide notification of new files (if desired) without polling. That would allow much lower-latency systems to be built.
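
As a rough illustration of the idea above, and not the actual layer (whose design is not shown in this thread), here is a minimal in-memory sketch in Python. The class `FileFeed` and its methods `announce`, `watch`, and `replay` are hypothetical names invented for this example; a real version would keep the log in Zookeeper and use its watch mechanism rather than local callbacks. It shows the two properties mentioned: workers are notified of new files without polling, and the record of files can be re-read later for retrospective processing.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical callback shape: (feed name, sequence number, file path).
Watcher = Callable[[str, int, str], None]


class FileFeed:
    """In-memory stand-in for a coordination service (assumption: the
    real layer would store this state in Zookeeper)."""

    def __init__(self) -> None:
        # Per-feed, append-only log of file paths.
        self._log: Dict[str, List[str]] = defaultdict(list)
        # Per-feed list of callbacks to fire on each new file.
        self._watchers: Dict[str, List[Watcher]] = defaultdict(list)

    def announce(self, feed: str, path: str) -> int:
        """Record that a file is available and notify watchers."""
        log = self._log[feed]
        log.append(path)
        seq = len(log) - 1
        for callback in list(self._watchers[feed]):
            callback(feed, seq, path)
        return seq

    def watch(self, feed: str, callback: Watcher) -> None:
        """Register for push notification of new files (no polling)."""
        self._watchers[feed].append(callback)

    def replay(self, feed: str, start: int = 0) -> List[Tuple[int, str]]:
        """Return (sequence, path) pairs from `start` onward, which is
        what makes retrospective processing straightforward."""
        return list(enumerate(self._log[feed]))[start:]


# Usage: one worker watches a feed while a producer announces files.
feed = FileFeed()
seen: List[Tuple[int, str]] = []
feed.watch("clicks", lambda name, seq, path: seen.append((seq, path)))
feed.announce("clicks", "/data/clicks/part-0000")
feed.announce("clicks", "/data/clicks/part-0001")
```

Because the feed records file paths instead of consuming opaque messages, a worker that starts late can call `replay` from any sequence number and catch up, which a plain queue does not offer once messages have been delivered.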


That sounds really cool. I haven't played with Zookeeper yet. Most of our coordination needs have been easily satisfied with SQS and Cascading's internal topological scheduler (and its associated event listener interfaces). But that will only go so far.

We are currently testing an Amazon EC2/Hadoop 'on demand' cluster tool that was extraordinarily trivial to implement (it's not generic enough to share yet, though). But I can see that this could fall apart without something like Zookeeper as things get more sophisticated or need to run outside AWS.

Chris K Wensel
[EMAIL PROTECTED]
http://chris.wensel.net/
http://www.cascading.org/



