Hi,

I am interfacing the PCA class from scikit-learn into a package I'm
writing that uses a numpy data structure, and I have a few questions about
its implementation.

I noticed that the PCA class methods fit() and fit_transform() take a
keyword argument "y=None" that is never actually used.  I was curious why
this is, but it's not all that important.

My second question is in regard to the choice of singular value
decomposition.  I will be performing PCA on spectral data, which generally
has a much higher feature dimension than sample dimension.  For example, I
may have 2000 features (wavelengths) but only 10 time points (samples).
 The data is not sparse, however.  My question, basically, is: will the
SVD give the same results that the brute-force computation of the
eigenvectors of the covariance matrix would give?  Are there caveats, or
do you think it's safe to use with confidence?

Additionally, I noticed the SVD option "full_matrices" is set to False.  I
gather this is meant to speed up the computation of the SVD, and in the
numpy example, np.allclose() is used to verify that the difference is
insignificant.  Can I be confident that setting full_matrices to False is
always a good idea, or are there cases where it may introduce error?
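For reference, here is the sanity check I tried (toy shapes again; my
reading is that the "thin" SVD only drops the parts multiplied by zero
singular values, but please correct me if I'm missing something):

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(10, 300))           # short-and-wide, like the spectral case

U_full, s_full, Vt_full = np.linalg.svd(X, full_matrices=True)
U_thin, s_thin, Vt_thin = np.linalg.svd(X, full_matrices=False)

# full_matrices=True:  U is (10, 10), Vt is (300, 300)
# full_matrices=False: U is (10, 10), Vt is (10, 300)

# The singular values are identical either way...
assert np.allclose(s_full, s_thin)

# ...and both factorizations reconstruct X to machine precision; the
# extra rows of the full Vt span the null space and contribute nothing.
k = len(s_full)
assert np.allclose((U_thin * s_thin) @ Vt_thin, X)
assert np.allclose((U_full[:, :k] * s_full) @ Vt_full[:k, :], X)
```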

Thanks.
_______________________________________________
Scikit-learn-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
