> was thinking of using cascading, but cascading, requires me for each change 
> in the data flow, to recompile and deploy. Maybe cascading can be part of the 
> implementation but not the solution.

Cascading is well suited for this.

Multitool is written with Cascading. It lets you run reasonably complex 
filtering, conversion, and joins from the command line (no recompiling). Amazon 
promotes it for searching S3 buckets from EMR.

Cascading.JRuby lets you create complex jobs from a JRuby script, no 
compiling. Etsy uses this for their website funnel analysis.

Cascalog is much more sophisticated and can be driven from a Clojure shell 
(REPL), so obviously no compiling there either. Quite a few companies use it to 
power their analytics and analysis.

All of these can be found here:
http://www.cascading.org/modules.html

A number of companies have also built proprietary web UIs to Hadoop with 
Cascading as the query planner and processing engine, some of which will ship 
as products this year.

FYI, there will be a Cascalog workshop this Saturday (I'll be attending):
http://www.cascading.org/2011/02/cascalog-workshop-february-19t.html

cheers,
chris

--
Chris K Wensel
ch...@concurrentinc.com
http://www.concurrentinc.com
