[jira] [Commented] (SPARK-17902) collect() ignores stringsAsFactors
[ https://issues.apache.org/jira/browse/SPARK-17902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16214263#comment-16214263 ]

Apache Spark commented on SPARK-17902:
--------------------------------------

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/19551

> collect() ignores stringsAsFactors
> ----------------------------------
>
>                 Key: SPARK-17902
>                 URL: https://issues.apache.org/jira/browse/SPARK-17902
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 2.0.1
>            Reporter: Hossein Falaki
>
> The `collect()` function signature includes an optional flag named `stringsAsFactors`. It seems to be completely ignored.
> {code}
> str(collect(createDataFrame(iris), stringsAsFactors = TRUE))
> {code}

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-17902) collect() ignores stringsAsFactors
[ https://issues.apache.org/jira/browse/SPARK-17902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16209958#comment-16209958 ]

Hossein Falaki commented on SPARK-17902:
----------------------------------------

A simple unit test we could add would be:
{code}
> df <- createDataFrame(iris)
> sapply(iris, typeof) == sapply(collect(df, stringsAsFactors = TRUE), typeof)
Sepal.Length  Sepal.Width Petal.Length  Petal.Width      Species
        TRUE         TRUE         TRUE         TRUE        FALSE
{code}
As for the solution, I suggest performing the conversion inside [this loop|https://github.com/apache/spark/blob/master/R/pkg/R/DataFrame.R#L1168].
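The FALSE under Species in the comparison above can be reproduced without a Spark session. The following is only a rough sketch: `collected_like` is a hypothetical local stand-in for what `collect()` appears to return when the flag is ignored (Species as a character column); it is not the actual SparkR result.

```r
# Base R only; no SparkR session is needed for this illustration.
# Assumption: when stringsAsFactors is ignored, the collected Species
# column comes back as character instead of factor.
collected_like <- iris
collected_like$Species <- as.character(collected_like$Species)

# A factor is stored as integer codes, so typeof() differs from character.
same_type <- sapply(iris, typeof) == sapply(collected_like, typeof)
print(same_type)
```

Only the Species entry differs, because `typeof()` reports "integer" for a factor and "character" for a string column.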
[jira] [Commented] (SPARK-17902) collect() ignores stringsAsFactors
[ https://issues.apache.org/jira/browse/SPARK-17902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16209927#comment-16209927 ]

Shivaram Venkataraman commented on SPARK-17902:
-----------------------------------------------

I think [~falaki] might have a test case that we could test against?
[jira] [Commented] (SPARK-17902) collect() ignores stringsAsFactors
[ https://issues.apache.org/jira/browse/SPARK-17902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16208685#comment-16208685 ]

Hyukjin Kwon commented on SPARK-17902:
--------------------------------------

Hi [~falaki] and [~shivaram], I was thinking of just a simple approach such as:
{quote}
if (stringsAsFactors) {
  df[sapply(df, is.character)] <- lapply(df[sapply(df, is.character)], as.factor)
}
{quote}
Would that make sense?
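The conversion proposed above can be tried on an ordinary local data.frame. This is a minimal sketch, not the change made in SparkR: the helper name `strings_to_factors` and the sample data are hypothetical, and the column mask is computed once with `vapply()` rather than twice with `sapply()`.

```r
# Hypothetical helper illustrating the proposed conversion on a local
# data.frame; it is not part of SparkR.
strings_to_factors <- function(df, stringsAsFactors = FALSE) {
  if (stringsAsFactors) {
    # Logical mask of the character columns, computed once.
    is_chr <- vapply(df, is.character, logical(1))
    if (any(is_chr)) {
      df[is_chr] <- lapply(df[is_chr], as.factor)
    }
  }
  df
}

# Made-up sample data with one character column.
local_df <- data.frame(id = 1:3,
                       name = c("a", "b", "a"),
                       stringsAsFactors = FALSE)

converted <- strings_to_factors(local_df, stringsAsFactors = TRUE)
untouched <- strings_to_factors(local_df)
str(converted)
```

With the flag set, only the character column becomes a factor; with the default, the data.frame is returned unchanged.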
[jira] [Commented] (SPARK-17902) collect() ignores stringsAsFactors
[ https://issues.apache.org/jira/browse/SPARK-17902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572689#comment-15572689 ]

Hossein Falaki commented on SPARK-17902:
----------------------------------------

Thanks for the pointer [~shivaram]. I will submit a patch with a regression test. The only obvious side effect of this bug is that the collected type will be String when it should have been a Factor. What makes it bad is that the flag is in our documentation and it used to work, so this is a regression.
[jira] [Commented] (SPARK-17902) collect() ignores stringsAsFactors
[ https://issues.apache.org/jira/browse/SPARK-17902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572472#comment-15572472 ]

Shivaram Venkataraman commented on SPARK-17902:
-----------------------------------------------

Good catch. It looks like this was changed in https://github.com/apache/spark/commit/71a138cd0e0a14e8426f97877e3b52a562bbd02c, which is part of 1.6.0. Do you have a small test case that fails?