Have you seen my grool system, which allows simple MR programs to be
written simply?

Yes, I did take a look. Your success with it was part of the reason I
went with Groovy first, instead of Jython or JRuby.

In addition, I have been working on a layer over Zookeeper to handle
collection of data-feed-oriented information about the availability of
files containing data. This is similar in some sense to Amazon's Simple
Queue Service, except that it describes content as files rather than
opaque blobs passed through queues. This allows simpler retrospective
processing of data. It would make a very good substrate for something
like Cascading, since it would allow clean coordination semantics
between multiple workers on independent machines, as well as provide
notification of new files (if desired) without polling. That would
allow much lower-latency systems to be built.

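To make the idea concrete, here is a rough sketch of the feed semantics
described above: producers announce files, consumers are notified by
callback instead of polling, and the full history stays available for
retrospective processing. It is an in-memory stand-in only, not the
Zookeeper API and not the actual layer; the FileFeed class, its method
names, and the example paths are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List


class FileFeed:
    """In-memory stand-in for a file-feed registry (hypothetical sketch).

    A real version would keep this state in Zookeeper and use its watch
    mechanism for notification; none of that is modelled here.
    """

    def __init__(self) -> None:
        # feed name -> ordered list of announced file paths
        self._files: Dict[str, List[str]] = defaultdict(list)
        # feed name -> callbacks to invoke when a new file is announced
        self._watchers: Dict[str, List[Callable[[str], None]]] = defaultdict(list)

    def announce(self, feed: str, path: str) -> None:
        """Record that a file is available, then notify current watchers."""
        self._files[feed].append(path)
        for callback in list(self._watchers[feed]):
            callback(path)

    def watch(self, feed: str, callback: Callable[[str], None]) -> None:
        """Register for push notification of files announced from now on."""
        self._watchers[feed].append(callback)

    def history(self, feed: str) -> List[str]:
        """Return every file announced so far, for retrospective processing."""
        return list(self._files[feed])


feed = FileFeed()
seen: List[str] = []

feed.announce("clicks", "/data/clicks/part-0000")  # announced before anyone watches
feed.watch("clicks", seen.append)                  # consumer subscribes, no polling
feed.announce("clicks", "/data/clicks/part-0001")  # pushed to the watcher

print(seen)
print(feed.history("clicks"))
```

Because content is described as files rather than consumed messages, a
late-joining worker can still walk the history, which is the difference
from a plain queue that the paragraph above points at.
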
That sounds really cool. I haven't played with Zookeeper yet. Most of
our coordination has been easily satisfied with SQS and Cascading's
internal topological scheduler (and associated event listener
interfaces). But that will only go so far.

We are currently testing an Amazon EC2/Hadoop 'on demand' cluster tool
that was extraordinarily trivial to implement (it's not generic enough
to share yet, though). But I can see this could fall apart without
something like Zookeeper as things get more sophisticated, or need to
run outside AWS.

Chris K Wensel
[EMAIL PROTECTED]
http://chris.wensel.net/
http://www.cascading.org/