Weiqiang Zhuang created SPARK-10894:
---------------------------------------

             Summary: Add 'drop' support for DataFrame's subset function
                 Key: SPARK-10894
                 URL: https://issues.apache.org/jira/browse/SPARK-10894
             Project: Spark
          Issue Type: Improvement
          Components: SparkR
            Reporter: Weiqiang Zhuang


A SparkR DataFrame can be subset to select one or more columns of the dataset. The 
current '[' implementation does not honor the 'drop' argument when just one 
column is requested. This is inconsistent with the R syntax:
x[i, j, ... , drop = TRUE]

# in R, when drop is FALSE, the result remains a data.frame
> class(iris[, "Sepal.Width", drop=F])
[1] "data.frame"
# when drop is TRUE (the default), the result is dropped to a vector
> class(iris[, "Sepal.Width", drop=T])
[1] "numeric"
> class(iris[,"Sepal.Width"])
[1] "numeric"

> df <- createDataFrame(sqlContext, iris)
# in SparkR, the 'drop' argument has no effect
> class(df[,"Sepal_Width", drop=F])
[1] "DataFrame"
attr(,"package")
[1] "SparkR"
# this should have been dropped to a Column instead
> class(df[,"Sepal_Width", drop=T])
[1] "DataFrame"
attr(,"package")
[1] "SparkR"
> class(df[,"Sepal_Width"])
[1] "DataFrame"
attr(,"package")
[1] "SparkR"

We should add support for the 'drop' argument.
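
To make the requested behaviour concrete, here is a minimal sketch in plain R. It does not use SparkR: ToyFrame and ToyColumn are hypothetical stand-ins for the DataFrame and Column classes, and the method body is only an assumption about how 'drop' could be honored, not the actual SparkR implementation.

```r
library(methods)

# Hypothetical stand-ins for SparkR's DataFrame and Column classes.
setClass("ToyColumn", representation(name = "character"))
setClass("ToyFrame", representation(cols = "character"))

# '[' method that honors 'drop': a single selected column collapses to a
# ToyColumn when drop = TRUE, and stays a ToyFrame otherwise.
setMethod("[", signature(x = "ToyFrame"),
          function(x, i, j, ..., drop = TRUE) {
            selected <- if (missing(j)) x@cols else j
            unknown <- setdiff(selected, x@cols)
            if (length(unknown) > 0) {
              stop("unknown column(s): ", paste(unknown, collapse = ", "))
            }
            if (isTRUE(drop) && length(selected) == 1) {
              new("ToyColumn", name = selected)
            } else {
              new("ToyFrame", cols = selected)
            }
          })

tf <- new("ToyFrame", cols = c("Sepal_Length", "Sepal_Width"))
class(tf[, "Sepal_Width", drop = FALSE])  # stays a ToyFrame
class(tf[, "Sepal_Width"])                # collapses to a ToyColumn
```

Defaulting 'drop' to TRUE mirrors base R's data.frame behaviour. Whether SparkR should adopt the same default is a separate design question, since it would change what existing single-column subsets return.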



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
