This is nice to have. Please create a JIRA for it. Right now, you can merge
all numeric columns into a single vector column using RFormula or
VectorAssembler, then convert it into an RDD of vectors and call
Statistics.corr from MLlib.

On Tue, May 17, 2016, 7:09 AM Ankur Jain <ankur.j...@yash.com> wrote:

> Hello Team,
>
> In my current use case I am loading data from CSV using spark-csv and
> trying to correlate all variables.
>
> As of now, if we want to correlate 2 columns in a DataFrame, df.stat.corr
> works great, but if we want to correlate multiple columns this won't work.
>
> In R, we can use corrplot and correlate all numeric columns in a
> single line of code. Can you guide me on how to achieve the same with a
> DataFrame or SQL?
>
> There seems to be a way in spark-mllib:
>
> http://spark.apache.org/docs/latest/mllib-statistics.html
>
> But it seems that it doesn't take a DataFrame as input…
>
> Regards,
>
> Ankur
