[ https://issues.apache.org/jira/browse/SPARK-10385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15081538#comment-15081538 ]
Gaurav Kumar edited comment on SPARK-10385 at 1/4/16 6:35 PM:
--------------------------------------------------------------

Is this done? This has also been quoted on the Spark-1.6 release page.

was (Author: gauravkumar37):
I believe this is done and has also been quoted on the Spark-1.6 release page.

> Bivariate statistics in DataFrames
> ----------------------------------
>
>                 Key: SPARK-10385
>                 URL: https://issues.apache.org/jira/browse/SPARK-10385
>             Project: Spark
>          Issue Type: Umbrella
>          Components: ML, SQL
>            Reporter: Xiangrui Meng
>            Assignee: Burak Yavuz
>
> Similar to SPARK-10384, it would be nice to have bivariate statistics support
> in DataFrames (defined as UDAFs). This JIRA discusses the general
> implementation and tracks subtasks. Bivariate statistics include:
> * continuous: covariance, Pearson's correlation (SPARK-9298), and Spearman's
> correlation
> * categorical: ??
> If we define them as UDAFs, it would be flexible to use them with DataFrames,
> e.g.,
> {code}
> df.groupBy("key").agg(corr("x", "y"))
> {code}
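
To make the "defined as UDAFs" idea in the description concrete, here is a minimal plain-Python sketch of the kind of mergeable state a grouped Pearson correlation aggregate might keep. It is not Spark's implementation: the names `CorrState` and `grouped_corr` are hypothetical, and the raw-sum formula is chosen for readability rather than numerical robustness.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class CorrState:
    """Hypothetical partial state for a Pearson correlation aggregate."""
    n: int = 0
    sum_x: float = 0.0
    sum_y: float = 0.0
    sum_xx: float = 0.0
    sum_yy: float = 0.0
    sum_xy: float = 0.0

    def update(self, x, y):
        # Fold one (x, y) observation into the running sums.
        self.n += 1
        self.sum_x += x
        self.sum_y += y
        self.sum_xx += x * x
        self.sum_yy += y * y
        self.sum_xy += x * y

    def merge(self, other):
        # Combine two partial states, e.g. from different partitions.
        return CorrState(
            self.n + other.n,
            self.sum_x + other.sum_x,
            self.sum_y + other.sum_y,
            self.sum_xx + other.sum_xx,
            self.sum_yy + other.sum_yy,
            self.sum_xy + other.sum_xy,
        )

    def evaluate(self):
        # Return Pearson's r, or None when it is undefined
        # (fewer than two rows, or zero variance in x or y).
        if self.n < 2:
            return None
        co = self.sum_xy - self.sum_x * self.sum_y / self.n
        var_x = self.sum_xx - self.sum_x ** 2 / self.n
        var_y = self.sum_yy - self.sum_y ** 2 / self.n
        if var_x <= 0 or var_y <= 0:
            return None
        return co / math.sqrt(var_x * var_y)


def grouped_corr(rows):
    """Compute Pearson's r per key from an iterable of (key, x, y) rows."""
    states = defaultdict(CorrState)
    for key, x, y in rows:
        states[key].update(x, y)
    return {key: state.evaluate() for key, state in states.items()}
```

Calling `grouped_corr(rows)` on an iterable of `(key, x, y)` tuples plays roughly the role of the `groupBy("key").agg(corr("x", "y"))` call in the description. The `merge` step is what would let partial results from separate partitions be combined.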