Re: Groovy Scripting for Hadoop

Chris K Wensel Tue, 06 May 2008 08:09:07 -0700

Have you seen my grool system which allows simple MR programs to be written
simply?

Yes, I did take a look. Your success with it was part of the reason I went with Groovy first, instead of Jython or JRuby.

In addition, I have been working on a layer over Zookeeper to handle
collection of data feed oriented information about availability of files containing data. This is similar in some sense to Amazon's Simple Queue Service except that it describes content as files, rather than opaque blobs passed through queues. This allows simpler retrospective processing of data. It would make a very good substrate for something like Cascades since it would allow clean coordination semantics between multiple workers on independent machines as well as provide notification of new (if desired) without polling. That would allow much lower latency systems to be built.

That sounds really cool. I haven't played with Zookeeper yet. Most of our coordination has been easily satisfied with SQS and Cascading's internal topological scheduler (and associated event listener interfaces). But that will only go so far.

We are currently testing an Amazon EC2/Hadoop 'on demand' cluster tool that was extraordinarily trivial to implement (it's not generic enough to share yet, though). But I can see this could fall apart without something like Zookeeper as things get more sophisticated, or need to run outside AWS.


Chris K Wensel
[EMAIL PROTECTED]
http://chris.wensel.net/
http://www.cascading.org/
