[ https://issues.apache.org/jira/browse/SPARK-16468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15371188#comment-15371188 ]
Shivaram Venkataraman commented on SPARK-16468:
-----------------------------------------------

Is the character-column problem fixed by SPARK-16429?

Regarding the rounding, I think we just create an R data.frame and then let R format it. Could you check what the output of `options("digits")` is in your R session?

cc [~dongjoon]

> Confusing results when describe() used on DataFrame with chr columns
> --------------------------------------------------------------------
>
>                 Key: SPARK-16468
>                 URL: https://issues.apache.org/jira/browse/SPARK-16468
>             Project: Spark
>          Issue Type: Bug
>          Components: SparkR
>    Affects Versions: 1.6.1
>         Environment: Databricks.com
>            Reporter: Neil Dewar
>            Priority: Minor
>
> The describe() function returns statistical summaries on numeric columns of a
> DataFrame. If the DataFrame contains columns of type chr, only the count,
> min and max stats are returned.
> When a dataframe contains a mixture of numeric and chr columns, the results
> become jumbled together.
> Example:
> sdfR <- createDataFrame(sqlContext, ToothGrowth)
> collect(describe(sdfR))
> Results:
>   summary                len supp               dose
> 1   count                 60   60                 60
> 2    mean 18.813333333333336      1.1666666666666667
> 3  stddev  7.649315171887615      0.6288721857330792
> 4     min                4.2   OJ                0.5
> 5     max               33.9   VC                2.0
> There appear to be two problems here:
> (1) The mean and stdev values have not been rounded for the columns where
> there are valid values
> (2) There is no ability to distinguish that the supp column has no values in
> mean and stdev rows.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
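
One way to follow up on the `options("digits")` question is to check the column types of the collected result in plain R. The sketch below needs no Spark. The `stats` data.frame is a hypothetical stand-in for `collect(describe(sdfR))`, with values copied from the report, and it assumes the statistic columns arrive as character strings. If that assumption holds, R prints them verbatim and the `digits` option never applies.

```r
# Hypothetical stand-in for collect(describe(sdfR)); values copied from the report.
# Assumption: the statistic columns are character vectors, not numerics.
stats <- data.frame(
  summary = c("count", "mean", "stddev", "min", "max"),
  len  = c("60", "18.813333333333336", "7.649315171887615", "4.2", "33.9"),
  dose = c("60", "1.1666666666666667", "0.6288721857330792", "0.5", "2.0"),
  stringsAsFactors = FALSE
)

getOption("digits")  # 7 unless it was changed in this session
str(stats)           # shows whether each column is chr or num

# Character columns are printed as-is, whatever `digits` is set to.
print(stats)

# After conversion to numeric, R's own formatting (and `digits`) applies.
stats_num <- stats
stats_num$len  <- as.numeric(stats$len)
stats_num$dose <- as.numeric(stats$dose)
print(stats_num, digits = 4)
```

The `supp` column is left out on purpose: its min and max are strings such as "OJ", so `as.numeric` would turn them into NA with a warning.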