[ https://issues.apache.org/jira/browse/SPARK-9325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14942345#comment-14942345 ]
Russell Pierce commented on SPARK-9325:
---------------------------------------

From an outside perspective, I'll +1 this. In an earlier version of SparkR this wasn't supported, and its absence seemed ridiculous to me. It didn't help that the errors were all the same, which left the source of the error unclear to me (thank goodness that is resolved: https://issues.apache.org/jira/browse/SPARK-8742).

In my view, as someone who programs heavily in R, this is a key feature for SparkR to be of use to me. In my use case (again, not leveraging much of the power of Spark, but leveraging my existing skills), I need Spark to provide a large, cached back-end data warehouse that I can subset to pull workable-size pieces into R for arbitrary processing. If you give a user like me collect(Column), then there is no need for you to give me count(Column), sum(Column), Ave(Column), etc.; I'll do whatever processing I still need in R. If I /really/ need to do it for the entire frame, I can just combine the results of my subsets (in R); but in that case I'm going to have to thrash out to disk for everything anyway, and may just opt to do it via Hadoop/Hive.

> Support `collect` on DataFrame columns
> --------------------------------------
>
>                 Key: SPARK-9325
>                 URL: https://issues.apache.org/jira/browse/SPARK-9325
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SparkR
>            Reporter: Shivaram Venkataraman
>
> This is to support code of the form
> ```
> ages <- collect(df$Age)
> ```
> Right now `df$Age` returns a Column, which has no functions supported.
> Similarly we might consider supporting `head(df$Age)` etc.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)