It depends; personally, I have the opposite opinion.

IMO expressing pipelines in a functional language feels natural; you just
have to get used to the language (Scala).
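
For instance, a word count is just a short chain of RDD transformations
(a minimal sketch; the input/output paths are made up for illustration):

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.SparkContext._   // pair-RDD implicits for reduceByKey

object WordCount {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("WordCount"))
    sc.textFile("hdfs:///input/text")            // hypothetical input path
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .saveAsTextFile("hdfs:///output/counts")   // hypothetical output path
    sc.stop()
  }
}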

Testing Spark jobs is easy, whereas testing a Pig script is much harder and
less natural.
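
The same transformations can be exercised against a local SparkContext with
in-memory data; a rough sketch, assuming ScalaTest:

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._   // pair-RDD implicits
import org.scalatest.FunSuite

class WordCountSuite extends FunSuite {
  test("counts words in a small in-memory dataset") {
    val sc = new SparkContext("local", "test")
    try {
      val counts = sc.parallelize(Seq("a b", "a"))
        .flatMap(_.split(" "))
        .map(word => (word, 1))
        .reduceByKey(_ + _)
        .collectAsMap()
      assert(counts("a") === 2)
      assert(counts("b") === 1)
    } finally {
      sc.stop()
    }
  }
}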

If you want a higher-level language that deals with the RDDs for you, you
can use Spark SQL:
http://people.apache.org/~pwendell/catalyst-docs/sql-programming-guide.html
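
Roughly along the lines of that guide (a sketch; the people.txt file and its
comma-separated layout are assumptions):

import org.apache.spark.SparkContext
import org.apache.spark.sql.SQLContext

case class Person(name: String, age: Int)

object SqlExample {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext("local", "sql-example")
    val sqlContext = new SQLContext(sc)
    import sqlContext._   // implicit RDD-to-SchemaRDD conversion and sql()

    val people = sc.textFile("people.txt")        // hypothetical input
      .map(_.split(","))
      .map(p => Person(p(0), p(1).trim.toInt))
    people.registerAsTable("people")

    val teenagers = sql("SELECT name FROM people WHERE age >= 13 AND age <= 19")
    teenagers.map(t => "Name: " + t(0)).collect().foreach(println)
    sc.stop()
  }
}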

Of course you can express fewer things that way, but if you have some
complex logic, I think it makes more sense to write a classic Spark job,
which will be more robust in the long term.


2014-04-25 15:30 GMT+02:00 Mark Baker <dist...@acm.org>:

> I've only had a quick look at Pig, but it seems that a declarative
> layer on top of Spark couldn't be anything other than a big win, as it
> allows developers to declare *what* they want, permitting the compiler
> to determine how best to poke at the RDD API to implement it.
>
> In my brief time with Spark, I've often thought that it feels very
> unnatural to use imperative code to declare a pipeline.
>
