[ https://issues.apache.org/jira/browse/SPARK-9999?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14936268#comment-14936268 ]

Sen Fang commented on SPARK-9999:
---------------------------------

Another idea is to do something similar to the F# TypeProvider approach:
http://fsharp.github.io/FSharp.Data/
I haven't looked into this extensively yet, but as far as I understand it
uses compile-time macros to generate classes from data sources. In that
sense it is somewhat similar to protobuf, where you generate Java classes
from a schema definition. This makes the dataframe type safe at the very
upstream end. With a bit of IDE plugin support, you would even be able to
get autocompletion and type checking as you write code, which would be
very nice. I'm not sure whether it would scale to propagate this type
information downstream (into aggregated or transformed dataframes), though.
As I understand it, macros and type providers in Scala offer similar
capabilities.
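
To make the idea concrete, here is a minimal, hand-written sketch of the
kind of wrapper such a compile-time generator might emit. The `User` and
`TypedUsers` names and the `name`/`age` schema are hypothetical, the macro
machinery itself is elided, and the Spark 1.x DataFrame API is assumed:

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SQLContext}

// Hypothetical generated code: a provider-style macro reads an external
// schema (say, JSON samples with fields `name: string` and `age: integer`)
// and emits a case class plus a typed facade over the untyped DataFrame.
// (JSON schema inference maps integers to LongType, hence `age: Long`.)
case class User(name: String, age: Long)

class TypedUsers(val df: DataFrame) {
  // Downstream code works against User, so field access is checked by the
  // compiler and completable by an IDE; only this generated accessor
  // touches the stringly typed Row API.
  def toUsers: RDD[User] =
    df.map(r => User(r.getAs[String]("name"), r.getAs[Long]("age")))
}

object TypedUsers {
  // Hypothetical entry point the generator would also emit.
  def load(sqlContext: SQLContext, path: String): TypedUsers =
    new TypedUsers(sqlContext.read.json(path))
}

A schema change at the source would then surface as a compile error in
downstream code rather than a runtime analysis failure, though, as noted
above, carrying these types through aggregations and transformations is
the hard part.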

> RDD-like API on top of Catalyst/DataFrame
> -----------------------------------------
>
>                 Key: SPARK-9999
>                 URL: https://issues.apache.org/jira/browse/SPARK-9999
>             Project: Spark
>          Issue Type: Story
>          Components: SQL
>            Reporter: Reynold Xin
>
> The RDD API is very flexible, but as a result its execution is harder to 
> optimize in some cases. The DataFrame API, on the other hand, is much easier 
> to optimize, but lacks some of the nice perks of the RDD API (e.g., UDFs are 
> harder to use, and there are no strong types in Scala/Java).
> As a Spark user, I want an API that sits somewhere in the middle of the 
> spectrum so I can write most of my applications with that API, and yet it can 
> be optimized well by Spark to achieve performance and stability.
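
For concreteness, a small sketch of the trade-off the description refers
to; the `Person` record and the local data are made up for illustration,
and the Spark 1.x APIs are assumed:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Hypothetical record type used only for illustration.
case class Person(name: String, age: Int)

object ApiTradeoff {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("tradeoff").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    val people = Seq(Person("Ann", 34), Person("Bo", 12))

    // RDD API: typed end to end with arbitrary lambdas, but the closures
    // are opaque to Spark, so Catalyst cannot optimize them.
    val namesRdd = sc.parallelize(people).filter(_.age >= 18).map(_.name)

    // DataFrame API: Catalyst can optimize the plan, but columns are
    // addressed by string, so a typo in "age" fails only at runtime.
    val namesDf = people.toDF().filter($"age" >= 18).select("name")

    println(namesRdd.collect().mkString(", "))
    namesDf.show()
    sc.stop()
  }
}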


