Salvatore,

I won't comment on whether to use log weight "to increase the correlation" -- that depends on whether the transformation makes sense, and whether the relationships with the other variables are more nearly linear.
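
One way to look at the linearity question is to compare the correlations of raw and log weight with the three linear measurements. This is only a rough sketch: it uses just the first 10 animals from the data posted below (a subset chosen for brevity, so the numbers will differ from the full data), and the object names are made up for illustration.

```r
# First 10 animals from the posted data (a subset, for brevity only)
weight_mg <- 1000 * c(0.0980, 0.0622, 0.0600, 0.1098, 0.0538,
                      0.0701, 0.1138, 0.0540, 0.0629, 0.0930)
linear <- data.frame(
  interoc = c(0.853, 0.865, 0.811, 0.840, 0.783,
              0.868, 0.818, 0.847, 0.838, 0.799),
  cwidth  = c(3.152, 3.046, 3.139, 3.181, 3.023,
              3.452, 2.803, 3.050, 3.160, 3.186),
  clength = c(3.889, 3.733, 3.762, 4.059, 3.911,
              3.822, 3.768, 3.814, 3.721, 3.794)
)

# Pearson correlations of weight with each linear measurement,
# on the raw scale and on the log scale
cor_raw <- cor(weight_mg, linear)
cor_log <- cor(log(weight_mg), linear)
print(round(rbind(raw = cor_raw[1, ], log = cor_log[1, ]), 3))
```

A correlation coefficient alone says little about the shape of a relationship, so a scatterplot matrix, e.g. pairs() on the full data frame, is probably the better check.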

Try this with your pca of the correlation matrix:

biplot(pca_morpho)

You'll see that the first component is defined largely by the
large correlations among clength, interoc, and cwidth,
while component 2 is largely determined by weight.
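
The same thing can be read off numerically from the correlation matrix and the loadings. The sketch below again uses only the first 10 animals from the posted data (so the values will not match the full analysis), and the names df10, pca10, and size_score are made up for illustration. The scores on component 1 are one candidate for a single "size" value, but whether that is appropriate here is a judgment call, not something this code settles.

```r
# First 10 animals from the posted data (a subset, for brevity only)
df10 <- data.frame(
  weight  = log(1000 * c(0.0980, 0.0622, 0.0600, 0.1098, 0.0538,
                         0.0701, 0.1138, 0.0540, 0.0629, 0.0930)),
  interoc = c(0.853, 0.865, 0.811, 0.840, 0.783,
              0.868, 0.818, 0.847, 0.838, 0.799),
  cwidth  = c(3.152, 3.046, 3.139, 3.181, 3.023,
              3.452, 2.803, 3.050, 3.160, 3.186),
  clength = c(3.889, 3.733, 3.762, 4.059, 3.911,
              3.822, 3.768, 3.814, 3.721, 3.794)
)

# Correlation matrix that the PCA is based on
print(round(cor(df10), 2))

pca10 <- princomp(df10, cor = TRUE)

# Show all loadings, including the small ones that print() blanks by default
print(loadings(pca10), cutoff = 0)

# Scores on component 1, one value per animal
L <- unclass(loadings(pca10))
size_score <- pca10$scores[, 1]

# The sign of a component is arbitrary. If the loadings on component 1
# are mostly negative, flip the scores so larger animals score higher.
if (sum(L[, 1]) < 0) size_score <- -size_score
print(round(size_score, 3))
```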

You should probably do some reading on PCA or get some
statistical consulting at OSU to decide what to do with this.

Hope this helps

-Michael


On 11/13/16 12:46 AM, Sidoti, Salvatore A. wrote:
Let's say I perform 4 measurements on an animal: three are linear measurements 
in millimeters and the fourth is its weight in milligrams. So, we have a data 
set with mixed units.

Based on these four correlated measurements, I would like to obtain one "score" or value 
that describes an individual animal's size. I considered simply taking the geometric mean of these 
4 measurements, and that would give me a "score" - larger values would be for larger 
animals, etc.
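
The geometric-mean score described above can be sketched as follows. This is only an illustration on the first 10 animals from the data set posted further down, using the untransformed measurements (weight in mg, lengths in mm); geo_mean and size_gm are made-up names.

```r
# First 10 animals, untransformed measurements (a subset, for brevity only)
m <- cbind(
  weight  = 1000 * c(0.0980, 0.0622, 0.0600, 0.1098, 0.0538,
                     0.0701, 0.1138, 0.0540, 0.0629, 0.0930),
  interoc = c(0.853, 0.865, 0.811, 0.840, 0.783,
              0.868, 0.818, 0.847, 0.838, 0.799),
  cwidth  = c(3.152, 3.046, 3.139, 3.181, 3.023,
              3.452, 2.803, 3.050, 3.160, 3.186),
  clength = c(3.889, 3.733, 3.762, 4.059, 3.911,
              3.822, 3.768, 3.814, 3.721, 3.794)
)

# Geometric mean of positive values: exp of the mean of the logs
geo_mean <- function(x) exp(mean(log(x)))

# One score per animal (row)
size_gm <- apply(m, 1, geo_mean)
print(round(size_gm, 3))
```

On the log scale this is just the unweighted average of the four log measurements, which is the "equal contribution" assumption in explicit form.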

However, this assumes that all 4 of these measurements contribute equally to an 
animal's size. Of course, more than likely this is not the case. I then 
performed a PCA to discover how much influence each variable had on the overall 
data set. I was hoping to use this analysis to refine my original approach.

I honestly do not know how to apply the information from the PCA to this 
particular problem...

I do know, however, that principal components 1 and 2 capture enough of the 
variation to reduce the number of dimensions down to 2 (see analysis below with 
the original data set).

Note: animal weights were ln() transformed to increase correlation with the 3 
other variables.

df <- data.frame(
  weight = log(1000 * c(
    0.0980, 0.0622, 0.0600, 0.1098, 0.0538, 0.0701, 0.1138, 0.0540, 0.0629, 0.0930,
    0.0443, 0.1115, 0.1157, 0.0734, 0.0616, 0.0640, 0.0480, 0.1339, 0.0547, 0.0844,
    0.0431, 0.0472, 0.0752, 0.0604, 0.0713, 0.0658, 0.0538, 0.0585, 0.0645, 0.0529,
    0.0448, 0.0574, 0.0577, 0.0514, 0.0758, 0.0424, 0.0997, 0.0758, 0.0649, 0.0465,
    0.0748, 0.0540, 0.0819, 0.0732, 0.0725, 0.0730, 0.0777, 0.0630, 0.0466)),
  interoc = c(
    0.853, 0.865, 0.811, 0.840, 0.783, 0.868, 0.818, 0.847, 0.838, 0.799,
    0.737, 0.788, 0.731, 0.777, 0.863, 0.877, 0.814, 0.926, 0.767, 0.746,
    0.700, 0.768, 0.807, 0.753, 0.809, 0.788, 0.750, 0.815, 0.757, 0.737,
    0.759, 0.863, 0.747, 0.838, 0.790, 0.676, 0.857, 0.728, 0.743, 0.870,
    0.787, 0.773, 0.829, 0.785, 0.746, 0.834, 0.829, 0.750, 0.842),
  cwidth = c(
    3.152, 3.046, 3.139, 3.181, 3.023, 3.452, 2.803, 3.050, 3.160, 3.186,
    2.801, 2.862, 3.183, 2.770, 3.207, 3.188, 2.969, 3.033, 2.972, 3.291,
    2.772, 2.875, 2.978, 3.094, 2.956, 2.966, 2.896, 3.149, 2.813, 2.935,
    2.839, 3.152, 2.984, 3.037, 2.888, 2.723, 3.342, 2.562, 2.827, 2.909,
    3.093, 2.990, 3.097, 2.751, 2.877, 2.901, 2.895, 2.721, 2.942),
  clength = c(
    3.889, 3.733, 3.762, 4.059, 3.911, 3.822, 3.768, 3.814, 3.721, 3.794,
    3.483, 3.863, 3.856, 3.457, 3.996, 3.876, 3.642, 3.978, 3.534, 3.967,
    3.429, 3.518, 3.766, 3.755, 3.706, 3.785, 3.607, 3.922, 3.453, 3.589,
    3.508, 3.861, 3.706, 3.593, 3.570, 3.341, 3.916, 3.336, 3.504, 3.688,
    3.735, 3.724, 3.860, 3.405, 3.493, 3.586, 3.545, 3.443, 3.640))

pca_morpho <- princomp(df, cor = TRUE)

summary(pca_morpho)

Importance of components:
                         Comp.1    Comp.2    Comp.3    Comp.4
Standard deviation     1.604107 0.8827323 0.7061206 0.3860275
Proportion of Variance 0.643290 0.1948041 0.1246516 0.0372543
Cumulative Proportion  0.643290 0.8380941 0.9627457 1.0000000

Loadings:
        Comp.1 Comp.2 Comp.3 Comp.4
weight  -0.371  0.907        -0.201
interoc -0.486 -0.227 -0.840
cwidth  -0.537 -0.349  0.466 -0.611
clength -0.582         0.278  0.761

               Comp.1 Comp.2 Comp.3 Comp.4
SS loadings      1.00   1.00   1.00   1.00
Proportion Var   0.25   0.25   0.25   0.25
Cumulative Var   0.25   0.50   0.75   1.00

Any guidance will be greatly appreciated!

Salvatore A. Sidoti
PhD Student
The Ohio State University
Behavioral Ecology


______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
