Re: dataframe stat corr for multiple columns

2016-05-19 Thread Sun Rui
There is an existing JIRA issue for it: https://issues.apache.org/jira/browse/SPARK-11057 There is also a PR. Maybe we should help review it and get it merged with a higher priority.
> On May 20, 2016, at 00:09, Xiangrui Meng

Re: dataframe stat corr for multiple columns

2016-05-19 Thread Xiangrui Meng
This would be nice to have. Please create a JIRA for it. For now, you can merge all the columns into a single vector column using RFormula or VectorAssembler, convert it into an RDD, and call corr from MLlib.
On Tue, May 17, 2016, 7:09 AM Ankur Jain wrote:
> Hello Team,
>
> In my
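
The workaround described in this reply (assemble a vector column, convert to an RDD, call corr from MLlib) might look roughly like the PySpark sketch below. It is not code from the thread. The function name `correlation_matrix` and the column name `features` are made up for illustration, and the `Vectors.fromML` conversion assumes Spark 2.0 or later, where VectorAssembler emits `pyspark.ml` vectors; on the 1.6 line current at the time of this thread that conversion would not apply.

```python
def correlation_matrix(df, columns, method="pearson"):
    """Correlation matrix of `columns` in the Spark DataFrame `df`.

    Hypothetical helper: a sketch of the VectorAssembler + MLlib route,
    not an official Spark API.
    """
    # Imported lazily so the sketch can be loaded without Spark installed.
    from pyspark.ml.feature import VectorAssembler
    from pyspark.mllib.linalg import Vectors
    from pyspark.mllib.stat import Statistics

    # 1. Merge the numeric columns into a single vector column.
    assembler = VectorAssembler(inputCols=list(columns), outputCol="features")
    assembled = assembler.transform(df).select("features")

    # 2. Convert to an RDD of MLlib vectors.
    #    Assumption: Spark 2.0+, where ML vectors need Vectors.fromML.
    vectors = assembled.rdd.map(lambda row: Vectors.fromML(row[0]))

    # 3. Call corr from MLlib; the result is a square matrix whose
    #    rows and columns follow the order of `columns`.
    return Statistics.corr(vectors, method=method)
```

With a DataFrame loaded from CSV, a call such as `correlation_matrix(df, ["a", "b", "c"])` (column names hypothetical) would give a 3 x 3 matrix. VectorAssembler expects numeric columns, so string columns read from CSV would need casting first.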

dataframe stat corr for multiple columns

2016-05-17 Thread Ankur Jain
Hello Team, In my current use case I am loading data from CSV using spark-csv and trying to correlate all variables. As of now, if we want to correlate two columns in a DataFrame, df.stat.corr works great, but it does not work if we want to correlate multiple columns. In R we can use corrplot
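
Another possible workaround, not suggested in the thread, is to call the two-column correlation once per pair of columns and collect the results into a matrix. The sketch below shows the idea in plain Python so it runs without Spark. `pearson` and `pairwise_corr` are illustrative names, and the sample data is invented.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def pairwise_corr(columns, corr):
    """Build a symmetric correlation matrix from a two-column function.

    `columns` is a list of column names; `corr(a, b)` returns the
    correlation between columns a and b.
    """
    matrix = {a: {a: 1.0} for a in columns}
    for a, b in combinations(columns, 2):
        r = corr(a, b)
        matrix[a][b] = r
        matrix[b][a] = r  # correlation is symmetric
    return matrix


# Invented sample data standing in for DataFrame columns.
data = {
    "x": [1.0, 2.0, 3.0, 4.0],
    "y": [2.0, 4.0, 6.0, 8.0],
    "z": [4.0, 3.0, 2.0, 1.0],
}
result = pairwise_corr(list(data), lambda a, b: pearson(data[a], data[b]))
```

Against a Spark DataFrame, the second argument could instead be `lambda a, b: df.stat.corr(a, b)`. That launches one Spark job per pair of columns, so it is likely to be practical only for a small number of columns.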