Thanks for sharing. We need to expose our Hadoop cluster to 'casual' users for ad-hoc queries, and I find it difficult to ask them to write MapReduce programs; Pig Latin comes in very handy in that case. For continuous production data processing, however, Hadoop + Cascading sounds like a good option.
Haijun

-----Original Message-----
From: Ted Dunning [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, June 11, 2008 5:01 PM
To: core-user@hadoop.apache.org
Subject: Re: does anyone have idea on how to run multiple sequential jobs with bash script

Pig is much more ambitious than Cascading. Because of those ambitions, simple things got overlooked. For instance, something as simple as computing a file name to load is not possible in Pig, nor is it possible to write functions in Pig. You can hook to Java functions (for some things), but you can't really write programs in Pig. On the other hand, Pig may eventually provide really incredible capabilities, including program rewriting and optimization, that would be incredibly hard to write directly in Java.

The point of Cascading was simply to make life easier for a normal Java/map-reduce programmer. It provides an abstraction for gluing together several map-reduce programs and for doing a few common things like joins. Because you are still writing Java (or Groovy) code, you have all of the functionality you always had. But this same benefit costs you the future in terms of what optimizations are likely to ever be possible.

The summary for us (especially 4-6 months ago when we were deciding) is that Cascading is good enough to use now and Pig will probably be more useful later.

On Wed, Jun 11, 2008 at 4:19 PM, Haijun Cao <[EMAIL PROTECTED]> wrote:
> I find Cascading very similar to Pig, do you care to provide your comment
> here? If map reduce programmers are to go to the next level (scripting/query
> language), which way to go?
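For what it's worth, the thread's original subject (chaining sequential jobs from bash) can be sketched quite simply. This is a minimal illustration, not anyone's production script: `run_job` here is a hypothetical wrapper standing in for a real `hadoop jar` invocation, and the step names and paths are made up.

```shell
set -e                        # abort the whole chain if any job fails
jobs_run=0

run_job() {
  # In a real script this line would be something like:
  #   hadoop jar myjob.jar "$@"
  # Here we just log the invocation so the sketch is self-contained.
  echo "running job: $*"
  jobs_run=$((jobs_run + 1))
}

# Each job consumes the previous job's output directory.
run_job step1 input/ tmp1/
run_job step2 tmp1/ output/
```

With `set -e`, a non-zero exit from any job stops the script, so later jobs never run against missing or partial input — which is most of what the glue logic in such scripts has to get right.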