> was thinking of using cascading, but cascading requires me, for each change
> in the data flow, to recompile and deploy. Maybe cascading can be part of the
> implementation but not the solution.
Cascading is well suited for this.

Multitool was written with Cascading; it lets you spawn reasonably complex filtering, conversion, and join jobs from the command line (no recompiling). Amazon promotes this for searching S3 buckets from EMR.

Cascading.JRuby allows you to create complex jobs from a JRuby script, no compiling. Etsy uses this for their web site funnel analysis.

Cascalog is much more sophisticated, and can be driven from a Clojure shell (REPL), so obviously no compiling there either. Quite a few companies use it to power their analytics.

All of these can be found here: http://www.cascading.org/modules.html

A number of companies have also built proprietary web UIs to Hadoop with Cascading as the query planner and processing engine, some of which will ship as products this year.

FYI, there will be a Cascalog workshop this Saturday (I'll be attending): http://www.cascading.org/2011/02/cascalog-workshop-february-19t.html

cheers,
chris

--
Chris K Wensel
ch...@concurrentinc.com
http://www.concurrentinc.com