On Tue, Jun 12, 2012 at 1:03 AM, Justin R <justinbr...@gmail.com> wrote:
> operating system Windows 7
> matplotlib version : 1.1.0
> obtained from sourceforge
>
> The class seems to generate the same Wt matrix for every input.
> Every element of the weight matrix is either +sqrt(1/2) or -sqrt(1/2).
>
> dat1 = 4*np.random.randn(200,1) + 2
> dat2 = dat1*.25 + 1*np.random.randn(200,1)
> pcaObj1 = PCA(np.hstack((dat1,dat2)))
> print pcaObj1.Wt
>
> dat3 = 2*np.random.randn(200,1) + 2
> dat4 = dat3*2 + 3*np.random.randn(200,1)
> pcaObj2 = PCA(np.hstack((dat3,dat4)))
> print pcaObj2.Wt
>
> The output Y seems to be correct, and the projection function works.
> Only the Wt matrix seems to be messed up. Am I using this class
> incorrectly, or could this be a bug?

Hi,


I wouldn't call myself a PCA expert - so don't weight my answer too
heavily - but here is what I think is happening:

Looking at the code, the input data array is centered and scaled to
unit variance in each dimension. The attribute .a of the class is a
copy of the array that is actually sent to the SVD; note the
centering/scaling. I don't have a proof of this, but intuitively I
expect that the PCA axes associated with a 2-dimensional
centered/scaled array will always be at 45-degree angles (e.g., [1,1],
[-1,1], etc., which are normalized to [sqrt(1/2), sqrt(1/2)], etc.).
I think one way to describe this is that after centering/scaling there
are no degrees of freedom left if you only started with 2 dimensions.
So I don't think there is a bug, but it is maybe unclear what the PCA
class is doing. If you increase to more than 2 dimensions, you can see
there is random fluctuation in Wt:

In [102]: pcaObj = PCA(np.random.randn(200,2))
In [103]: pcaObj.Wt
Out[103]:
array([[-0.70710678, -0.70710678],
       [-0.70710678,  0.70710678]])

In [104]: pcaObj = PCA(np.random.randn(200,3))
In [105]: pcaObj.Wt
Out[105]:
array([[ 0.65456366, -0.24141116, -0.7164266 ],
       [ 0.39843462,  0.91551401,  0.05553329],
       [ 0.64249223, -0.32179924,  0.69544877]])

In [106]: pcaObj = PCA(np.random.randn(200,3))
In [107]: pcaObj.Wt
Out[107]:
array([[-0.29885902, -0.67436982,  0.67521007],
       [-0.95428685,  0.21449891, -0.20815098],
       [-0.00446109, -0.70655189, -0.70764718]])
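
The 2-column behaviour can also be checked without the PCA class. What
follows is only a sketch using plain numpy, assuming the same
centering/scaling as described above (subtract the column mean, divide
by the column standard deviation) followed by an SVD; the helper name
standardized_pca_axes is made up for illustration and is not part of
matplotlib. With two standardized columns, the matrix being decomposed
is proportional to [[1, r], [r, 1]], where r is the sample correlation,
and its eigenvectors are [1,1] and [1,-1] for any nonzero r, which
matches the +/-sqrt(1/2) entries:

```python
import numpy as np

def standardized_pca_axes(data):
    """Hypothetical helper: PCA axes (as rows) of column-standardized data."""
    # Center each column and scale it to unit standard deviation.
    a = (data - data.mean(axis=0)) / data.std(axis=0)
    # The rows of vh are the principal axes of the standardized array.
    _, _, vh = np.linalg.svd(a, full_matrices=False)
    return vh

rng = np.random.RandomState(0)

# Two correlated columns with different scales, as in the original question.
x = 4 * rng.randn(200, 1) + 2
y = 0.25 * x + rng.randn(200, 1)
wt_2d = standardized_pca_axes(np.hstack((x, y)))
print(np.abs(wt_2d))  # every entry is sqrt(1/2), up to rounding

# With three columns the axes depend on the data.
wt_3d = standardized_pca_axes(rng.randn(200, 3))
print(wt_3d)
```

Only the signs of the rows can differ between runs in the 2-column
case; the magnitudes stay fixed no matter how the columns were scaled
or how strongly they are correlated.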


Hope that helps,
Aronne

_______________________________________________
Matplotlib-users mailing list
Matplotlib-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/matplotlib-users
