[jira] [Commented] (SPARK-17902) collect() ignores stringsAsFactors

2017-10-22 Thread Apache Spark (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16214263#comment-16214263
 ] 

Apache Spark commented on SPARK-17902:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/19551

> collect() ignores stringsAsFactors
> --
>
> Key: SPARK-17902
> URL: https://issues.apache.org/jira/browse/SPARK-17902
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.0.1
>Reporter: Hossein Falaki
>
> `collect()` function signature includes an optional flag named 
> `stringsAsFactors`. It seems it is completely ignored.
> {code}
> str(collect(createDataFrame(iris), stringsAsFactors = TRUE)))
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17902) collect() ignores stringsAsFactors

2017-10-18 Thread Hossein Falaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16209958#comment-16209958
 ] 

Hossein Falaki commented on SPARK-17902:


A simple unit test we could add would be:

{code}
> df <- createDataFrame(iris)
> sapply(iris, typeof) == sapply(collect(df, stringsAsFactors = T), typeof)
Sepal.Length  Sepal.Width Petal.Length  Petal.Width  Species
TRUE TRUE TRUE TRUEFALSE
{code}

As for the solution, I suggest performing the conversion inside [this 
loop|https://github.com/apache/spark/blob/master/R/pkg/R/DataFrame.R#L1168].

> collect() ignores stringsAsFactors
> --
>
> Key: SPARK-17902
> URL: https://issues.apache.org/jira/browse/SPARK-17902
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.0.1
>Reporter: Hossein Falaki
>
> `collect()` function signature includes an optional flag named 
> `stringsAsFactors`. It seems it is completely ignored.
> {code}
> str(collect(createDataFrame(iris), stringsAsFactors = TRUE)))
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17902) collect() ignores stringsAsFactors

2017-10-18 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16209927#comment-16209927
 ] 

Shivaram Venkataraman commented on SPARK-17902:
---

I think [~falaki] might have a test case that we could test against ?

> collect() ignores stringsAsFactors
> --
>
> Key: SPARK-17902
> URL: https://issues.apache.org/jira/browse/SPARK-17902
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.0.1
>Reporter: Hossein Falaki
>
> `collect()` function signature includes an optional flag named 
> `stringsAsFactors`. It seems it is completely ignored.
> {code}
> str(collect(createDataFrame(iris), stringsAsFactors = TRUE)))
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17902) collect() ignores stringsAsFactors

2017-10-17 Thread Hyukjin Kwon (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16208685#comment-16208685
 ] 

Hyukjin Kwon commented on SPARK-17902:
--

Hi [~falaki] and [~shivaram], I was thinking a just simple way such as :

{quote}
if (stringsAsFactors) {
  df[sapply(df, is.character)] <- lapply(df[sapply(df, is.character)], 
as.factor)
}
{quote}

Would it make sense?

> collect() ignores stringsAsFactors
> --
>
> Key: SPARK-17902
> URL: https://issues.apache.org/jira/browse/SPARK-17902
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.0.1
>Reporter: Hossein Falaki
>
> `collect()` function signature includes an optional flag named 
> `stringsAsFactors`. It seems it is completely ignored.
> {code}
> str(collect(createDataFrame(iris), stringsAsFactors = TRUE)))
> {code}



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17902) collect() ignores stringsAsFactors

2016-10-13 Thread Hossein Falaki (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572689#comment-15572689
 ] 

Hossein Falaki commented on SPARK-17902:


Thanks for the pointer [~shivaram]. I will submit it patch with a regression 
test. The only obvious side-effect of this bug, is that collected type will be 
String, while it should have been a Factor. What makes it bad is that it is in 
our documentation and it used to work, so it is a regression.

> collect() ignores stringsAsFactors
> --
>
> Key: SPARK-17902
> URL: https://issues.apache.org/jira/browse/SPARK-17902
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.0.1
>Reporter: Hossein Falaki
>
> `collect()` function signature includes an optional flag named 
> `stringsAsFactors`. It seems it is completely ignored.
> {code}
> str(collect(createDataFrame(iris), stringsAsFactors = TRUE)))
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-17902) collect() ignores stringsAsFactors

2016-10-13 Thread Shivaram Venkataraman (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-17902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15572472#comment-15572472
 ] 

Shivaram Venkataraman commented on SPARK-17902:
---

Good catch - Looks like this was changed in 
https://github.com/apache/spark/commit/71a138cd0e0a14e8426f97877e3b52a562bbd02c 
which is a part of 1.6.0. Do you have a small test case that fails ?

> collect() ignores stringsAsFactors
> --
>
> Key: SPARK-17902
> URL: https://issues.apache.org/jira/browse/SPARK-17902
> Project: Spark
>  Issue Type: Bug
>  Components: SparkR
>Affects Versions: 2.0.1
>Reporter: Hossein Falaki
>
> `collect()` function signature includes an optional flag named 
> `stringsAsFactors`. It seems it is completely ignored.
> {code}
> str(collect(createDataFrame(iris), stringsAsFactors = TRUE)))
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org