Hello Team,

In my current use case I am loading data from CSV using spark-csv and trying to 
correlate all variables.

As of now, if we want to correlate 2 columns in a DataFrame, df.stat.corr works 
great, but it does not work if we want to correlate multiple columns at once.
In R we can use corrplot and correlate all numeric columns in a single 
line of code. Can you guide me on how to achieve the same with DataFrames or SQL?
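
To make the limitation concrete, here is a minimal PySpark sketch of the only workaround I can see with df.stat.corr alone: calling it once per pair of columns. The helper names numeric_columns and pairwise_corr are hypothetical (not part of Spark), and the sketch assumes df.dtypes reports numeric columns with the type names listed below. Each df.stat.corr call runs its own Spark job, so this is slow for many columns.

```python
# Hypothetical helpers; only df.dtypes and df.stat.corr are Spark APIs.
NUMERIC_TYPES = {"tinyint", "smallint", "int", "bigint", "float", "double", "decimal"}


def numeric_columns(df):
    """Return the names of columns whose Spark SQL type looks numeric."""
    # df.dtypes is a list of (name, type string); strip "(10,2)" from decimals.
    return [name for name, dtype in df.dtypes
            if dtype.split("(")[0] in NUMERIC_TYPES]


def pairwise_corr(df, cols):
    """Build a nested dict {col_a: {col_b: r}} from one corr call per pair."""
    result = {c: {} for c in cols}
    for i, a in enumerate(cols):
        # Assumption: the diagonal is set to 1.0 without computing it
        # (a constant column would really give NaN).
        result[a][a] = 1.0
        for b in cols[i + 1:]:
            r = df.stat.corr(a, b)  # Pearson correlation of two columns
            result[a][b] = r
            result[b][a] = r
    return result
```

Usage would be something like pairwise_corr(df, numeric_columns(df)), where df is the DataFrame loaded through spark-csv.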

There seems to be a way in spark-mllib:
http://spark.apache.org/docs/latest/mllib-statistics.html

But it seems that it doesn't take a DataFrame as input...
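
One possible bridge, sketched here under assumptions and not verified against my data, is to convert the DataFrame rows into MLlib vectors and pass the resulting RDD to Statistics.corr. The helper names row_to_floats and corr_matrix_mllib are hypothetical. The sketch assumes every selected column can be cast to float and simply drops rows that contain nulls.

```python
def row_to_floats(row, cols):
    """Extract the given columns from a row as a list of floats."""
    return [float(row[c]) for c in cols]


def corr_matrix_mllib(df, cols, method="pearson"):
    """Correlation matrix over `cols`, computed by MLlib in a single pass."""
    # Imported here so the helper above can be used without Spark installed.
    from pyspark.mllib.linalg import Vectors
    from pyspark.mllib.stat import Statistics

    # Assumption: rows with nulls are dropped, since float(None) would fail.
    rows = df.select(*cols).na.drop().rdd
    vectors = rows.map(lambda row: Vectors.dense(row_to_floats(row, cols)))
    # Returns a square matrix ordered like `cols`.
    return Statistics.corr(vectors, method=method)
```

If this is a reasonable approach, corr_matrix_mllib(df, ["a", "b", "c"]) (column names made up) should give the full matrix in one job instead of one job per pair.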

Regards,
Ankur