[ https://issues.apache.org/jira/browse/SPARK-6117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Apache Spark reassigned SPARK-6117:
-----------------------------------

    Assignee: Apache Spark

> describe function for summary statistics
> ----------------------------------------
>
>                 Key: SPARK-6117
>                 URL: https://issues.apache.org/jira/browse/SPARK-6117
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>            Reporter: Reynold Xin
>            Assignee: Apache Spark
>              Labels: starter
>
> DataFrame.describe should return a DataFrame with summary statistics.
> {code}
> def describe(cols: String*): DataFrame
> {code}
> If cols is empty, then run describe on all numeric columns.
> The returned DataFrame should have 5 rows (count, mean, stddev, min, max) and
> n + 1 columns. The 1st column is the name of the aggregate function, and the
> next n columns are the numeric columns of interest in the input DataFrame.
> Similar to Pandas (but removing percentile since accurate percentiles are too
> expensive to compute for Big Data).
> {code}
> In [19]: df.describe()
> Out[19]:
>               A         B         C         D
> count  6.000000  6.000000  6.000000  6.000000
> mean   0.073711 -0.431125 -0.687758 -0.233103
> std    0.843157  0.922818  0.779887  0.973118
> min   -0.861849 -2.104569 -1.509059 -1.135632
> max    1.212112  0.567020  0.276232  1.071804
> {code}
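To pin down the shape of the result the proposal asks for (5 rows, n + 1 columns, numeric columns only when no names are given), here is a minimal pure-Python sketch. It assumes a plain dict of column name to values stands in for a DataFrame. The helper name `describe`, the `summary` label for the first column, and the use of sample standard deviation (n - 1 denominator, as pandas reports in its `std` row) are assumptions for illustration, not Spark's implementation.

```python
import math
from typing import Dict, List, Sequence, Union

Cell = Union[str, float]
STATS = ("count", "mean", "stddev", "min", "max")


def _is_numeric(values: Sequence[object]) -> bool:
    # Treat a column as numeric if every value is an int or float (bools excluded).
    return all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values)


def describe(columns: Dict[str, Sequence[object]], *cols: str) -> List[List[Cell]]:
    # Hypothetical helper: `columns` stands in for a DataFrame (name -> values).
    # With no `cols`, describe every numeric column, as the proposal specifies.
    names = list(cols) if cols else [k for k, vs in columns.items() if _is_numeric(vs)]
    header: List[Cell] = ["summary"] + names  # "summary" label is an assumption
    rows: List[List[Cell]] = [[stat] for stat in STATS]
    nan = float("nan")
    for name in names:
        values = [float(v) for v in columns[name]]
        n = len(values)
        mean = sum(values) / n if n else nan
        # Sample variance (n - 1 denominator), matching pandas' "std" row.
        var = sum((v - mean) ** 2 for v in values) / (n - 1) if n > 1 else nan
        stats = (
            float(n),
            mean,
            math.sqrt(var),
            min(values, default=nan),
            max(values, default=nan),
        )
        # Append this column's statistic to each of the 5 summary rows.
        for row, value in zip(rows, stats):
            row.append(value)
    return [header] + rows
```

Calling `describe(data)` on a dict with columns `"A"`, `"B"`, and a string column `"name"` yields a header `["summary", "A", "B"]` followed by the count, mean, stddev, min, and max rows; `describe(data, "B")` restricts the table to one column. On a real DataFrame these five statistics would be computed as aggregate expressions rather than by materializing lists; the sketch only illustrates the expected layout.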