What do you think is preventing you from optimizing your own RDD-level
transformations and actions?  AFAIK, nothing that has been added in
Catalyst precludes you from doing that.  The fact of the matter is, though,
that Spark has less type and semantic information available to it from the
raw RDD API than from Spark SQL, DataFrames or Datasets.  That means that
Spark itself can't optimize raw RDDs the same way that it can optimize
higher-level constructs that leverage Catalyst; but if you want to write
your own optimizations based on your own knowledge of the data types and
semantics that are hiding in your raw RDDs, there's no reason that you
can't do that -- see the sketch below.
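
For what it's worth, here is a minimal, self-contained sketch of the kind
of hand-rolled rewrite I mean.  The (key, amount) data is made up for
illustration; the point is that replacing groupByKey with reduceByKey is
exactly the sort of semantics-driven optimization Catalyst applies
automatically to a DataFrame groupBy().sum(), but that you have to apply
yourself on raw RDDs:

    import org.apache.spark.{SparkConf, SparkContext}

    object RddOptimizationSketch {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(
          new SparkConf().setAppName("rdd-opt-sketch").setMaster("local[*]"))

        // Hypothetical (key, amount) pairs standing in for your real data.
        val pairs = sc.parallelize(Seq(("a", 1L), ("b", 2L), ("a", 3L)))

        // Naive: groupByKey shuffles every single value across the network,
        // then sums on the reduce side.
        val naive = pairs.groupByKey().mapValues(_.sum)

        // Hand-optimized: because *you* know this aggregation is associative
        // and commutative -- semantic information Spark can't see inside an
        // opaque closure -- you can use reduceByKey, which combines map-side
        // and shuffles only partial sums per key.
        val optimized = pairs.reduceByKey(_ + _)

        println(optimized.collect().mkString(", "))
        sc.stop()
      }
    }

Both produce the same result; the second just ships far less data through
the shuffle.  Nothing in Catalyst stops you from making that kind of choice
yourself at the RDD level.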

On Mon, Jan 25, 2016 at 9:35 AM, Nirav Patel <npa...@xactlycorp.com> wrote:

> Hi,
>
> Perhaps I should write a blog post about why Spark is focusing more on
> making Spark jobs easier to write while hiding the underlying performance
> optimization details from seasoned Spark users. It's one thing to provide
> such an abstract framework that does the optimization for you, so that as
> a data scientist or data analyst you don't have to worry about it, but
> what about developers who do not want the overhead of SQL, optimizers,
> and unnecessary abstractions? An application designer who knows their
> data and queries should be able to optimize at the level of RDD
> transformations and actions. Does Spark provide a way to achieve the same
> level of optimization with raw RDD transformations as with SQL and
> Catalyst?
>
> Thanks
>
