Thanks for sharing. We need to expose our Hadoop cluster to 'casual' users for ad-hoc queries, and I find it difficult to ask them to write MapReduce programs; Pig Latin comes in very handy in that case. For continuous production data processing, however, Hadoop + Cascading sounds like a good option.
Haijun

-----Original Message-----
From: Ted Dunning [mailto:[EMAIL PROTECTED]]
Sent: Wednesday, June 11, 2008 5:01 PM
To: core-user@hadoop.apache.org
Subject: Re: does anyone have idea on how to run multiple sequential jobs with bash script

Pig is much more ambitious than Cascading. Because of those ambitions, simple things got overlooked. For instance, something as simple as computing a file name to load is not possible in Pig, nor is it possible to write functions in Pig. You can hook to Java functions (for some things), but you can't really write programs in Pig. On the other hand, Pig may eventually provide really incredible capabilities, including program rewriting and optimization, that would be incredibly hard to write directly in Java.

The point of Cascading was simply to make life easier for a normal Java/map-reduce programmer. It provides an abstraction for gluing together several map-reduce programs and for doing a few common things like joins. Because you are still writing Java (or Groovy) code, you have all of the functionality you always had. But this same benefit costs you the future in terms of what optimizations are likely to ever be possible.

The summary for us (especially 4-6 months ago when we were deciding) is that Cascading is good enough to use now and Pig will probably be more useful later.

On Wed, Jun 11, 2008 at 4:19 PM, Haijun Cao <[EMAIL PROTECTED]> wrote:
> I find Cascading very similar to Pig, do you care to provide your comment
> here? If map reduce programmers are to go to the next level (scripting/query
> language), which way to go?
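For what it's worth, the thread's original subject (chaining sequential jobs from bash) can be sketched quite simply. This is a minimal illustration, not anyone's production script: `run_job` here is a hypothetical wrapper standing in for a real `hadoop jar` invocation, and the step names and paths are made up.

```shell
set -e                        # abort the whole chain if any job fails
jobs_run=0

run_job() {
  # In a real script this line would be something like:
  #   hadoop jar myjob.jar "$@"
  # Here we just log the invocation so the sketch is self-contained.
  echo "running job: $*"
  jobs_run=$((jobs_run + 1))
}

# Each job consumes the previous job's output directory.
run_job step1 input/ tmp1/
run_job step2 tmp1/ output/
```

With `set -e`, a non-zero exit from any job stops the script, so later jobs never run against missing or partial input — which is most of what the glue logic in such scripts has to get right.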