Weiqiang Zhuang created SPARK-10894: ---------------------------------------
Summary: Add 'drop' support for DataFrame's subset function
Key: SPARK-10894
URL: https://issues.apache.org/jira/browse/SPARK-10894
Project: Spark
Issue Type: Improvement
Components: SparkR
Reporter: Weiqiang Zhuang

A SparkR DataFrame can be subset to get one or more columns of the dataset. The current '[' implementation does not support the 'drop' argument when only one column is requested. This is inconsistent with the R syntax:

x[i, j, ... , drop = TRUE]

# in R, when drop is FALSE, the result remains a data.frame
> class(iris[, "Sepal.Width", drop=F])
[1] "data.frame"

# when drop is TRUE (the default), the result is dropped to a vector
> class(iris[, "Sepal.Width", drop=T])
[1] "numeric"
> class(iris[,"Sepal.Width"])
[1] "numeric"

> df <- createDataFrame(sqlContext, iris)

# in SparkR, the 'drop' argument has no effect
> class(df[,"Sepal_Width", drop=F])
[1] "DataFrame"
attr(,"package")
[1] "SparkR"

# this should have been dropped to a Column instead
> class(df[,"Sepal_Width", drop=T])
[1] "DataFrame"
attr(,"package")
[1] "SparkR"
> class(df[,"Sepal_Width"])
[1] "DataFrame"
attr(,"package")
[1] "SparkR"

We should add support for the 'drop' argument.