Hello,

I'm attempting to create a data frame with correlations between every pair
of variables in a data frame, so that I can then sort by the value of the
correlation coefficient and see which pairs of variables are most strongly
correlated.

The sm2vec function in the corpcor package works very nicely, as shown here:

library(Hmisc)
library(corpcor)

# Create example data
x1 = runif(50)
x2 = runif(50)
x3 = runif(50)
d = data.frame(x1=x1,x2=x2,x3=x3)
label(d$x1) = "Variable x1"
label(d$x2) = "Variable x2"
label(d$x3) = "Variable x3"

# Get correlations
cormat = cor(d)

# Get vector form of lower triangular elements and their (row, column) indices
cors = sm2vec(cormat,diag=FALSE)
inds = sm.index(cormat,diag=FALSE)

# Create a data frame
var1 = dimnames(cormat)[[1]][inds[,1]]
var2 = dimnames(cormat)[[2]][inds[,2]]
lbl1 = label(d[,var1])
lbl2 = label(d[,var2])
cor_df = data.frame(Var1=lbl1,Var2=lbl2,Cor=cors)

The problem arises when I try to get the labels in lbl1 and lbl2.  I get
the warning:

In mapply(FUN = label, x = x, default = default, MoreArgs = list(self = TRUE),  :
  longer argument not a multiple of length of shorter

My usage of label seems ambiguous since the data frame could also have a label
attached to it, aside from labels attached to variables within the data
frame.  However, the code above does work, with the warning.  Aside from
using a loop to get the label of one variable at a time, is there any other
way of getting the labels for all variables in the data frame?
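
For illustration, one loop-free possibility is sketched below. It assumes that Hmisc stores each variable's label in the "label" attribute of the column, so the sketch sets and reads that attribute with base R only; the names lbls and get_label are hypothetical. With Hmisc loaded, sapply(d, label) would presumably give the same lookup vector.

```r
# Sketch: collect per-variable labels without an explicit loop.
# Assumption: labels live in the "label" attribute of each column,
# which is where Hmisc's label<- is understood to put them.
set.seed(1)
d <- data.frame(x1 = runif(50), x2 = runif(50), x3 = runif(50))
attr(d$x1, "label") <- "Variable x1"
attr(d$x2, "label") <- "Variable x2"
attr(d$x3, "label") <- "Variable x3"

# Hypothetical helper: return the label of one column, or "" if none is set
get_label <- function(v) {
  lab <- attr(v, "label", exact = TRUE)
  if (is.null(lab)) "" else as.character(lab)
}

# Named character vector with one label per column of d
lbls <- vapply(d, get_label, character(1))

# Look up labels by variable name; repeated names are fine
var1 <- c("x2", "x3", "x3")
lbl1 <- unname(lbls[var1])
```

Building the lookup vector once and indexing it by name avoids calling label on a data frame with duplicated columns, which is where the warning seems to come from.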

Also, if there is a better way to achieve my goal of getting the
correlations between all variable pairs, I'd love to know.
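
As a possible alternative that needs no extra packages, the pairwise table can be sketched with base R, using lower.tri and which(..., arr.ind = TRUE) in place of sm2vec and sm.index. This is only a minimal sketch; the names idx and cor_df are illustrative, and sorting by absolute value is an assumption about what "most strongly correlated" should mean.

```r
# Sketch: one row per variable pair, sorted by strength of correlation.
set.seed(1)
d <- data.frame(x1 = runif(50), x2 = runif(50), x3 = runif(50))
cormat <- cor(d)

# (row, col) positions of the strictly lower triangle of the matrix
idx <- which(lower.tri(cormat), arr.ind = TRUE)

cor_df <- data.frame(
  Var1 = rownames(cormat)[idx[, "row"]],
  Var2 = colnames(cormat)[idx[, "col"]],
  Cor  = cormat[idx],          # matrix indexing by (row, col) pairs
  stringsAsFactors = FALSE
)

# Strongest correlations first, regardless of sign
cor_df <- cor_df[order(abs(cor_df$Cor), decreasing = TRUE), ]
rownames(cor_df) <- NULL
```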

Thanks in advance for any responses!

--Krishna

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
