[jira] [Commented] (SPARK-9325) Support `collect` on DataFrame columns

Russell Pierce (JIRA) Sat, 03 Oct 2015 08:07:37 -0700

    [ https://issues.apache.org/jira/browse/SPARK-9325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14942345#comment-14942345 ]


Russell Pierce commented on SPARK-9325:
---------------------------------------

From an outside perspective, I'll +1 this.

In an earlier version of SparkR this wasn't supported, and its absence seemed 
ridiculous to me.  It didn't help that every error looked the same, which left 
the source of the problem unclear to me (thank goodness that is resolved: 
https://issues.apache.org/jira/browse/SPARK-8742).  As someone who programs 
heavily in R, I see this as a key feature for SparkR to be of use to me.  In 
my use case (which, again, leverages my existing skills rather than much of 
the power of Spark), I need Spark to provide a large, cached back-end data 
warehouse that I can subset, pulling workably sized pieces into R for 
arbitrary processing.  If you give a user like me collect(Column), then there 
is no need to give me count(Column), sum(Column), avg(Column), etc.; I'll do 
whatever processing I still need in R.  If I /really/ need to do it for the 
entire frame, I can just combine the results of my subsets (in R), but in that 
case I'm going to have to thrash out to disk for everything anyway and may 
just opt to do it via Hadoop/Hive.

> Support `collect` on DataFrame columns
> --------------------------------------
>
>                 Key: SPARK-9325
>                 URL: https://issues.apache.org/jira/browse/SPARK-9325
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SparkR
>            Reporter: Shivaram Venkataraman
>
> This is to support code of the form 
> ```
> ages <- collect(df$Age)
> ```
> Right now `df$Age` returns a Column, which has no functions supported.
> Similarly we might consider supporting `head(df$Age)` etc.



