Re: [R] problem with PCA

2017-03-13 Thread David L Carlson
The manual is talking about the angle between the variables in p-dimensional 
space, where p is the number of variables. The angle can appear different in 
2 dimensions depending on your viewing angle (that is, on which dimensions you 
are ignoring; think of how the 2-dimensional shadow of a 3-dimensional object 
changes as the sun moves across the sky). Imagine two vectors originating at 
the origin on a plane, one pointing northeast and one pointing southeast. Now 
rotate those vectors in a 3rd dimension by bringing the northeast vector toward 
you. As you do that, the southeast vector will move away and the angle between 
the 2 will appear to decrease. When you have rotated 90 degrees toward you, the 
vectors will be on top of one another: their angle appears to have changed from 
90 degrees to 0 degrees. 
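
The shadow analogy above can be checked numerically. What follows is only a rough sketch in base R: the helper names angle_deg and rot_x and the two example vectors are made up for illustration and are not part of any package. It rotates the two vectors about the east-west axis and compares the angle measured in all 3 dimensions with the angle seen when the third coordinate is ignored.

```r
# Angle in degrees between two vectors (hypothetical helper).
angle_deg <- function(a, b) {
  cosang <- sum(a * b) / sqrt(sum(a^2) * sum(b^2))
  acos(max(-1, min(1, cosang))) * 180 / pi
}

# Rotation about the east-west (x) axis by theta radians (hypothetical helper).
rot_x <- function(theta) {
  matrix(c(1, 0,           0,
           0, cos(theta), -sin(theta),
           0, sin(theta),  cos(theta)),
         nrow = 3, byrow = TRUE)
}

ne <- c(1,  1, 0)   # "northeast" vector lying in the plane
se <- c(1, -1, 0)   # "southeast" vector lying in the plane

for (deg in c(0, 45, 90)) {
  R <- rot_x(deg * pi / 180)
  a <- drop(R %*% ne)
  b <- drop(R %*% se)
  true_angle   <- angle_deg(a, b)            # measured in all 3 dimensions
  shadow_angle <- angle_deg(a[1:2], b[1:2])  # third coordinate ignored
  cat(sprintf("rotation %2d: true angle %.1f, apparent angle %.1f\n",
              deg, true_angle, shadow_angle))
}
```

The true angle should stay at 90 degrees throughout, while the apparent angle shrinks as the rotation increases.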

As you indicated, in your first example, your variables are only moderately 
correlated. The first 2 components capture only 78 percent of the variation 
among the original 4 variables:

> summary(pca.mx_fus)
Importance of components:
                          PC1        PC2    PC3     PC4
Standard deviation     1.5264     0.8950 0.7234 0.58793
Proportion of Variance 0.5825     0.2003 0.1308 0.08642
Cumulative Proportion  0.5825 <<0.7828>> 0.9136     1.0
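
For anyone wondering where the proportions in that table come from, here is a small sketch. It assumes only the four standard deviations that print(pca) reports elsewhere in this thread; the variable names are made up. Each component's variance is its squared standard deviation, and the proportions are those variances divided by their total.

```r
# Standard deviations of the 4 components, copied from the printed pca object.
sdev <- c(1.5264292, 0.8950379, 0.7233671, 0.5879295)

prop_var <- sdev^2 / sum(sdev^2)   # proportion of variance per component
cum_prop <- cumsum(prop_var)       # cumulative proportion

round(rbind(sdev, prop_var, cum_prop), 4)
```

The second entry of cum_prop is the share of variation that a biplot of PC1 and PC2 can show.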

In your second example, the variables are more highly correlated and the first 
2 components capture almost 98 percent of the variation among the original 5 
variables. As a result, plotting the first two components gives you a better 
perspective on variation in the original 5 dimensions:

> summary(pca.tb)
Importance of components:
                          PC1        PC2     PC3     PC4       PC5
Standard deviation     1.9694     1.0063 0.32647 0.04681 3.868e-17
Proportion of Variance 0.7757     0.2025 0.02132 0.00044 0.000e+00
Cumulative Proportion  0.7757 <<0.9782>> 0.99956     1.0 1.000e+00

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352




Re: [R] problem with PCA

2017-03-11 Thread Denis Francisci
Thank you, David, for your answer.
If I understood correctly, the relative positions of the variable arrows
don't reflect the correlation coefficients of the original variables. In
fact, these positions change if I use different PC axes.
But in a manual about PCA in R I read: "Pairs of variables that form
acute angles at the origin, close to 0°, should be highly and positively
correlated; variables close to right angles tend to have low correlation;
variables at obtuse angles, close to 180°, tend to have high negative
correlation".

And if I run a made-up test, it seems to be true:

tb <- data.frame(
  c(1,2,3,4,5,6,7,8,9),                # original data
  c(2,4,5,8,10,12,14,16,18),           # strong positive correlation
  c(25,29,52,63,110,111,148,161,300),  # weaker positive correlation
  c(-1,-2,-3,-4,-5,-6,-7,-8,-9),       # strong negative correlation
  c(3,8,4,6,1,3,2,5,7)                 # no correlation
)
names(tb) <- c("orig","corr+","corr+2","corr-","random")

pca <- prcomp(as.matrix(tb), scale. = TRUE)
biplot(pca, choices = c(1,2))

On the first 2 PCs the positions of the arrows reflect the original
correlations perfectly.

My data behave differently, maybe because my original variables are not
strongly correlated?
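
To compare the arrows with the actual correlations, the made-up data can be inspected directly. This is only a sketch: the column names below are replaced by syntactic ones (pos, pos2, neg) for convenience, and pca_tb and cum_prop are names invented here.

```r
# Same numbers as the made-up test data, with syntactic column names.
tb <- data.frame(
  orig   = c(1, 2, 3, 4, 5, 6, 7, 8, 9),
  pos    = c(2, 4, 5, 8, 10, 12, 14, 16, 18),
  pos2   = c(25, 29, 52, 63, 110, 111, 148, 161, 300),
  neg    = c(-1, -2, -3, -4, -5, -6, -7, -8, -9),
  random = c(3, 8, 4, 6, 1, 3, 2, 5, 7)
)

# Correlation matrix of the original variables.
print(round(cor(tb), 2))

# PCA on the standardized variables, and the share of variance per component.
pca_tb   <- prcomp(tb, scale. = TRUE)
cum_prop <- cumsum(pca_tb$sdev^2) / sum(pca_tb$sdev^2)
print(round(cum_prop, 4))
```

If the first two entries of cum_prop are close to 1, the PC1/PC2 biplot loses very little, so the angles between arrows in that plane should track the correlations closely.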


Re: [R] problem with PCA

2017-03-10 Thread David L Carlson
This is more a question about principal components analysis than about R. You 
have 4 variables and they are moderately correlated with one another (weight 
and hole are only .2). When the data consist of measurements, this usually 
suggests that the overall size of the object is being partly measured by each 
variable. In your case object size is measured by the first principal component 
(PC1), with larger objects having more negative scores, so larger objects are on 
the left and smaller ones are on the right of the biplot. 

The biplot can only display 2 of the 4 dimensions of your data at one time. In 
the first 2 dimensions, diam and height are close together, but in the 3rd 
dimension (PC3), they are on opposite sides of the component. If you plot 
different pairs of dimensions (e.g. 1 with 3 or 2 with 3, see below), the 
arrows will look different because you are looking from different directions.

> pca
Standard deviations:
[1] 1.5264292 0.8950379 0.7233671 0.5879295

Rotation:
              PC1         PC2         PC3        PC4
height -0.5210224 -0.06545193  0.80018012 -0.2897646
diam   -0.5473677  0.06309163 -0.57146893 -0.6081376
hole   -0.4598646 -0.70952862 -0.17476677  0.5045297
weight -0.4663141  0.69878797 -0.05090785  0.5400508

> biplot(pca, choices=c(1, 3))
> biplot(pca, choices=c(2, 3))
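
The effect of the viewing direction can also be put into numbers. The sketch below uses only the standard deviations and rotation matrix printed above. It assumes the default scaling of biplot() for prcomp objects (scale = 1), where each arrow is the loading multiplied by the component's standard deviation; the common sqrt(n) factor is left out because it does not change angles. The names arrows_all, angle_deg and apparent_angle are invented for illustration.

```r
sdev <- c(1.5264292, 0.8950379, 0.7233671, 0.5879295)
rotation <- matrix(
  c(-0.5210224, -0.06545193,  0.80018012, -0.2897646,
    -0.5473677,  0.06309163, -0.57146893, -0.6081376,
    -0.4598646, -0.70952862, -0.17476677,  0.5045297,
    -0.4663141,  0.69878797, -0.05090785,  0.5400508),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("height", "diam", "hole", "weight"), paste0("PC", 1:4))
)

# Arrow coordinates in all 4 dimensions: loadings scaled by sdev (assumption:
# default biplot scaling, up to a constant factor).
arrows_all <- sweep(rotation, 2, sdev, "*")

# Angle in degrees between two vectors (hypothetical helper).
angle_deg <- function(a, b) {
  cosang <- sum(a * b) / sqrt(sum(a^2) * sum(b^2))
  acos(max(-1, min(1, cosang))) * 180 / pi
}

# Angle between two variable arrows as seen in the chosen components.
apparent_angle <- function(v1, v2, choices) {
  angle_deg(arrows_all[v1, choices], arrows_all[v2, choices])
}

angle_12   <- apparent_angle("height", "diam", c(1, 2))
angle_13   <- apparent_angle("height", "diam", c(1, 3))
angle_23   <- apparent_angle("height", "diam", c(2, 3))
angle_full <- apparent_angle("height", "diam", 1:4)

print(round(c(PC1_PC2 = angle_12, PC1_PC3 = angle_13,
              PC2_PC3 = angle_23, all_four = angle_full), 1))

# Cosine of the angle in the full space; for a PCA on standardized variables
# this should be close to cor(height, diam).
print(cos(angle_full * pi / 180))
```

Only the angle computed over all four components corresponds to the correlation; the angles seen in any single pair of components are projections of it.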

-
David L Carlson
Department of Anthropology
Texas A&M University
College Station, TX 77840-4352


-Original Message-
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Denis Francisci
Sent: Friday, March 10, 2017 4:45 AM
To: R-help Mailing List 
Subject: [R] problem with PCA

Hi all.
I'm a newbie in PCA and I don't understand a behaviour of R.
I have this data matrix:

>mx_fus
  height diam  hole  weight
1    2.3  3.5  1.1   18
2    2.0  3.5  0.9   17
3    3.8  4.3  0.7   34
4    2.1  3.4  0.9   15
5    2.3  3.8  1.0   19
6    2.2  3.8  1.0   19
7    3.2  4.4  0.9   34
8    3.0  4.3  1.0   30
9    2.8  3.9  0.9   21
10   3.3  4.2  1.1   33
11   2.3  3.9  0.9   25
12   2.3  3.3  0.5   17
13   0.9  2.4  0.4   10
14   1.4  2.4  0.5   10
15   2.2  3.6  0.7   22
16   2.9  3.8  0.8   30
17   2.9  3.5  0.6   27
18   2.3  3.5  0.5   24
19   1.8  2.3  0.5   29
20   1.4  2.5  0.6   34
21   0.8  2.3  0.6   21
22   1.8  2.4  0.6   23
23   1.5  2.2  0.6    7
24   0.9  1.7  0.4   14
25   2.1  2.2  0.5   25
26   1.3  2.4  0.6   33
27   1.3  2.7  0.4   39
28   0.5  2.2  0.5   13
29   1.4  4.2  0.8   23
30   1.6  2.0  0.4   30
31   1.4  2.2  0.6   25
32   1.8  2.5  0.6   28
33   1.4  2.6  0.6   41
34   1.6  2.3  0.3   32
35   1.6  2.5  0.5   41
36   2.8  2.9  0.8   47
37   0.6  2.5  0.8   21
38   1.6  2.8  0.7   13
39   1.7  3.3  0.8   17
40   1.6  3.9  1.9   20
41   1.4  4.7  0.9   26
42   1.2  4.2  0.7   21
43   3.5  4.2  0.9   47
44   2.3  3.6  0.7   24
45   2.3  3.4  0.4   21
46   1.9  2.6  0.7   14
47   1.9  3.0  0.7   15
48   2.7  3.7  0.9   26
49   3.0  3.8  0.7   35
50   1.2  2.0  0.7    5
51   1.6  2.5  0.5   15
52   1.3  2.6  0.5   16
53   2.5  3.9  0.9   32
54   0.9  3.3  0.6    9
55   1.8  2.4  0.5   17
56   2.4  3.7  1.1   30
57   2.1  3.5  1.1   22
58   2.6  3.9  1.0   38
59   2.6  3.6  1.0   27
60   2.6  4.1  1.0   34
61   2.9  3.6  0.8   32
62   2.6  3.3  0.7   22
63   1.8  2.5  0.7   26
64   3.0  2.8  1.3    2
65   0.5  2.2  0.4    3
66   1.9  3.4  0.7   14
67   1.4  3.8  0.9   18
68   2.0  4.0  1.0   30
69   3.1  4.0  1.3   21
70   2.5  4.0  0.8   19
71   2.5  4.5  1.0   20
72   1.8  3.5  1.4   18
73   2.1  3.5  1.4   25
74   1.5  2.6  0.5    9
75   2.8  3.2  1.2   16
76   1.0  5.0  0.3   32
77   0.3  5.8  0.5   56
78   0.5  1.5  0.2    1
79   0.7  1.4  0.2    1
80   0.5  1.3  0.2    1
81   0.7  3.3  0.4    7
82   1.9  4.7  1.0   24
83   3.1  4.2  0.9   49
84   2.8  3.6  0.7   28
85   2.7  3.2  0.7   29
86   3.0  4.0  0.9   36
87   1.7  2.7  0.7   14
88   1.5  2.9  0.7   18
89   2.9  3.5  0.7   30
90   3.0  3.4  0.8   30
91   2.0  2.8  0.5   14
92   2.4  3.5  0.7   24
93   0.8  4.1  0.6   12
94   1.7  2.5  0.5   23
95   1.4  2.4  0.8   31
96   1.5  2.7  0.4   20
97   2.6  3.7  0.6   31
98   2.6  3.0  0.6   18
99   2.5  5.0  0.7   40
100  2.5  3.7  0.5   30
101  2.4  2.9  0.7   17
102  2.3  3.0  0.5   15
103  2.2  3.3  0.6   19
104  1.5  2.1  0.5    5
105  2.0  2.2  0.5   10
106  2.6  3.5  0.6   26
107  2.3  3.0  0.6   15
108  2.5  4.5  0.7   40
109  2.1  3.1  0.5   15
110  1.3  2.1  0.8   14
111  0.8  2.5  0.2    5
112  0.6  3.1  0.7    8

I perform a PCA in R

>pca<-prcomp(mx_fus,scale=TRUE)
>biplot(pca, choices = c(1,2), cex=0.7)

The biplot puts the arrows of diam and height very close together along the
first component axis.
So I understand that these 2 variables are well represented in PC1 and
that they are correlated with each other.
But if I test the correlation, the value of the correlation coefficient is low

>cor(mx_fus[,1],mx_fus[,2])
0.4828185

Why does the plot say one thing and the correlation function say the opposite?
Two near arrows don't 

Re: [R] problem with PCA loading plot

2009-06-10 Thread David Winsemius


On Jun 10, 2009, at 1:26 PM, Fireblast wrote:

> Hi,
>
> I am a beginner with R. I would like to get a loading plot of PC 3 vs PC 1.
>
> For PC 1 vs PC 2 I use
>
> library(pls)
> loadingplot(pca.result, comps = 1:2, scatter = TRUE, labels=names)
>
> if I try
>
> loadingplot(pca.result, comps = 1:3, scatter = TRUE, labels=names)
>
> I get the loading plots of PC 1 vs PC 2, PC 1 vs PC 3 and PC 2 vs PC 3. What
> do I have to do to get just a single loading plot of PC 3 vs PC 1?

It seems blindingly obvious. Manual says comps is a vector. Make one:

 , comps=c(3,1), ...
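
The same idea of passing a two-element vector of component numbers can be sketched without the pls package. The example below is an assumption-laden stand-in: it uses prcomp() on the built-in USArrests data rather than the poster's pca.result object, and draws the loadings with base graphics instead of loadingplot().

```r
# Stand-in PCA on built-in data (not the poster's object).
pca.result <- prcomp(USArrests, scale. = TRUE)

# Order matters: the first entry goes on the x axis, the second on the y axis.
comps <- c(3, 1)
ld <- pca.result$rotation[, comps]

# Single loading plot of the two chosen components, labelled by variable name.
plot(ld, type = "n", xlab = colnames(ld)[1], ylab = colnames(ld)[2])
text(ld, labels = rownames(ld))
```

Using c(3, 1) rather than 1:3 selects exactly one pair, so only one panel is drawn.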

David Winsemius, MD
Heritage Laboratories
West Hartford, CT

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] Problem with PCA

2008-03-06 Thread Gavin Simpson

On Wed, 2008-03-05 at 19:43 -0800, phuong thao wrote:
> Dear Ms/Mr,
> In a package, I want to use the PCA function. The structure I used follows
> this page: http://www.statmethods.net/advstats/factor.html.
> fit <- principle(mydata, nfactors=9, rotation=TRUE)
> or:
> result <- PCA(mydata)
>
> But I don't know why R on my computer reports: not found
> principle, not found PCA.

You have several problems:

It is 'principal' not 'principle' components analysis; the instructions
on that page clearly say:

fit <- principal(mydata, nfactors=5, rotation=TRUE)
              ^

And finally, you are not following all the steps on that page. You
*have* to load a package to use functions contained within it.

principal() is in the psych package, so load it by executing:

library(psych)

PCA() is in the FactoMineR package, so load it by executing:

library(FactoMineR)
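
A cautious way to combine the two steps (check that the package is available, then call the function) is sketched below. It assumes the psych package may or may not be installed, uses the built-in mtcars data as a stand-in for mydata, and relies on the argument being called rotate in psych::principal(); has_psych and fit are names made up here.

```r
# TRUE if the psych package is installed and can be loaded, FALSE otherwise.
has_psych <- requireNamespace("psych", quietly = TRUE)

if (has_psych) {
  # Stand-in data; replace with your own numeric data frame.
  mydata <- mtcars[, 1:6]
  fit <- psych::principal(mydata, nfactors = 2, rotate = "varimax")
  print(fit$loadings)
} else {
  message("psych is not installed; run install.packages(\"psych\") first")
}
```

Calling library(psych) once per session, as described above, has the same effect as the psych:: prefix used here.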

On the site you mention, all the R code chunks are highlighted by being
in monospace font with a thick green border to the left of the block.
Make sure you enter all the lines in those code chunks exactly as typed
otherwise you'll get errors like the ones reported here.

HTH

G

> I downloaded and installed R-2.6.2-win32.exe.
> Thanks a lot for answering me.
> Hue University, VietNam.
 
-- 
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%
 Dr. Gavin Simpson [t] +44 (0)20 7679 0522
 ECRC, UCL Geography,  [f] +44 (0)20 7679 0565
 Pearson Building, [e] gavin.simpsonATNOSPAMucl.ac.uk
 Gower Street, London  [w] http://www.ucl.ac.uk/~ucfagls/
 UK. WC1E 6BT. [w] http://www.freshwaters.org.uk
%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%~%



Re: [R] Problem with PCA

2008-03-03 Thread Liviu Andronic
On 3/3/08, Richard Rowe wrote:
> This is a homework problem.  I know how to do a PCA, you need to learn.
> I suggest you visit your textbook, then check the documentation for R's
> various PCA implementations to work out how to effect the analysis.

Check Rcmdr. There you can perform it graphically. This would be a
starting point. This page [1] should also be of interest.

[1] http://www.statmethods.net/advstats/factor.html

Liviu



Re: [R] Problem with PCA

2008-03-02 Thread Richard Rowe
This is a homework problem.  I know how to do a PCA, you need to learn.
I suggest you visit your textbook, then check the documentation for R's 
various PCA implementations to work out how to effect the analysis.
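
As a pointer to what those implementations look like, base R ships two of them, prcomp() and princomp(). The sketch below runs both on the built-in USArrests data (a stand-in, not the exercise data) with standardized variables; p1 and p2 are names made up here.

```r
# SVD-based PCA on standardized variables.
p1 <- prcomp(USArrests, scale. = TRUE)

# Eigen-decomposition of the correlation matrix.
p2 <- princomp(USArrests, cor = TRUE)

# Component standard deviations from both functions, side by side.
print(round(cbind(prcomp = p1$sdev, princomp = unname(p2$sdev)), 4))

# Variance explained, and the loadings of the first two components.
print(summary(p1))
print(round(p1$rotation[, 1:2], 3))
```

The help pages ?prcomp and ?princomp describe how the two differ; on standardized variables their component standard deviations should agree.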



phthao05 wrote:
> I have an exercise with 3 kinds of yoghurt: a, b and c. 25 participants
> rate 3 criteria, taste (Va, Vb, Vc), structure (Ca, Cb, Cc) and price
> (Ga, Gb, Gc), giving each a mark from 1 to 5. I don't know how to run a
> PCA on these data. Please help me!
> The data file follows:
> Va  Vb  Vc  Ca  Cb  Cc  Ga  Gb  Gc
> 4   2   4   5   5   5   4   4   2
> 2   2   4   3   2   5   4   5   1
> 2   2   1   2   3   3   3   1   4
> 1   1   2   2   3   3   4   3   2
> 3   4   4   4   3   1   2   1   2
> 1   1   1   1   2   4   3   2   4
> 4   4   2   2   2   1   2   1   3
> 2   3   3   3   4   3   1   1   1
> 4   5   1   3   3   2   4   2   4
> 2   2   5   1   1   3   2   3   3
> 4   2   4   5   3   3   4   4   4
> 3   4   2   1   2   2   1   2   4
> 1   2   1   2   3   3   3   1   4
> 3   4   1   1   2   1   2   3   3
> 5   4   3   4   3   1   1   1   2
> 4   4   2   2   2   1   4   2   4
> 2   2   1   1   2   4   3   2   4
> 3   1   3   4   2   5   3   4   1
> 1   1   2   2   3   3   4   3   2
> 2   3   4   3   4   4   4   3   1
> 2   3   3   3   4   3   1   1   1
> 4   3   1   1   1   2   2   3   3
> 1   1   1   1   2   4   3   2   4
> 3   4   2   1   2   2   3   1   3
> 3   4   2   1   2   2   1   2   4


-- 
Dr Richard Rowe
Zoology & Tropical Ecology
School of Marine & Tropical Biology
James Cook University
Townsville 4811
AUSTRALIA

ph +61 7 47 81 4851
fax +61 7 47 25 1570
JCU has CRICOS Provider Code 00117J
