Hi Xiangrui,

Thanks a lot for your answer.
I fixed my Julia code and also calculated the PCA in R.

R Code:
-------------
data <- read.csv('/home/upul/Desktop/iris.csv')
X <- data[,1:4]                                  # the four numeric columns
pca <- prcomp(X, center = TRUE, scale. = FALSE)  # center, but do not rescale
transformed <- predict(pca, newdata = X)         # scores of X on the components

Julia Code (Fixed)
--------------
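data = readcsv("/home/upul/temp/iris.csv");
X = data[:,1:end-1];            # drop the last column
meanX = mean(X,1);              # column means
m,n = size(X);
X = X - repmat(meanX, m,1);     # center each column
u,s,v = svd(X);
transformed =  X*v;             # project onto the right singular vectors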
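
As a quick check of why the centering step matters, here is a minimal sketch
in plain Python. The numbers are made up for illustration (they are not the
Iris values) and the helper name scaled_gram is hypothetical. It shows that
(1/n)*X'*X equals the covariance (dividing by n) plus the outer product of
the column means, so the two only agree when the means are zero. Dividing by
n or by n-1 only rescales the eigenvalues and leaves the directions unchanged.

```python
# Hypothetical toy data (not the Iris values): 4 rows, 2 columns.
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [6.0, 4.0]]
n, d = len(X), len(X[0])

# Column means.
mean = [sum(row[j] for row in X) / n for j in range(d)]


def scaled_gram(rows):
    """Return (1/n) * rows' * rows as a nested list."""
    count = len(rows)
    return [[sum(r[i] * r[j] for r in rows) / count for j in range(d)]
            for i in range(d)]


# Uncentered version, as in the first attempt: (1/n) * X' * X.
gram = scaled_gram(X)

# Centered version: subtract the column means first, then do the same.
centered = [[row[j] - mean[j] for j in range(d)] for row in X]
cov = scaled_gram(centered)

# Identity: gram[i][j] == cov[i][j] + mean[i] * mean[j]
for i in range(d):
    for j in range(d):
        assert abs(gram[i][j] - (cov[i][j] + mean[i] * mean[j])) < 1e-12
```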

Now the PCA results from Julia and R are identical, but I still see a small
difference between the PCA values given by Spark and those from the other two.
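
One possible source of the remaining gap, offered as a guess rather than a
confirmed diagnosis: if the Spark side projects the original (uncentered)
rows onto the principal components while R and Julia project the centered
rows, every score along a given component shifts by the same constant, namely
the projection of the mean. The sign of a component is also arbitrary, so a
whole column of scores can come out negated. The sketch below uses plain
Python, made-up data, and a hypothetical direction v (not a real principal
component) to illustrate both effects.

```python
# Hypothetical toy data and direction (not Iris, not a real component).
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [6.0, 4.0]]
v = [0.6, 0.8]  # unit-length direction standing in for one component

n = len(X)
mean = [sum(row[j] for row in X) / n for j in range(len(v))]


def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def center(row):
    """Subtract the column means from one row."""
    return [x - m for x, m in zip(row, mean)]


# Scores when the rows are centered before projecting.
centered_scores = [dot(center(row), v) for row in X]
# Scores when the raw rows are projected directly.
raw_scores = [dot(row, v) for row in X]

# The two differ by one constant: the projection of the mean onto v.
offset = dot(mean, v)
assert all(abs(r - c - offset) < 1e-12
           for r, c in zip(raw_scores, centered_scores))

# Flipping the sign of v negates every score; both directions are valid.
neg_v = [-x for x in v]
flipped_scores = [dot(center(row), neg_v) for row in X]
assert all(abs(f + c) < 1e-12
           for f, c in zip(flipped_scores, centered_scores))
```

If the Spark values differ from the R values by a fixed amount per column, or
by a sign per column, this would account for it.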

Thanks,
Upul

On Sat, Jan 10, 2015 at 11:17 AM, Xiangrui Meng <men...@gmail.com> wrote:

> You need to subtract mean values to obtain the covariance matrix
> (http://en.wikipedia.org/wiki/Covariance_matrix).
>
> On Fri, Jan 9, 2015 at 6:41 PM, Upul Bandara <upulband...@gmail.com>
> wrote:
> > Hi Xiangrui,
> >
> > Thanks for the reply.
> >
> > The Julia code also uses the covariance matrix:
> > (1/n)*X'*X ;
> >
> > Thanks,
> > Upul
> >
> > On Fri, Jan 9, 2015 at 2:11 AM, Xiangrui Meng <men...@gmail.com> wrote:
> >>
> >> The Julia code is computing the SVD of the Gram matrix. PCA should be
> >> applied to the covariance matrix. -Xiangrui
> >>
> >> On Thu, Jan 8, 2015 at 8:27 AM, Upul Bandara <upulband...@gmail.com>
> >> wrote:
> >> > Hi All,
> >> >
> >> > I tried to do PCA for the Iris dataset
> >> > [https://archive.ics.uci.edu/ml/datasets/Iris] using MLlib
> >> > [http://spark.apache.org/docs/1.1.1/mllib-dimensionality-reduction.html].
> >> > Also, PCA was calculated in Julia using the following method:
> >> >
> >> > Sigma = (1/numRow(X))*X'*X ;
> >> > [U, S, V] = svd(Sigma);
> >> > Ureduced = U(:, 1:k);
> >> > Z = X*Ureduced;
> >> >
> >> > However, I'm seeing a small difference between the values given by
> >> > MLlib and the method shown above.
> >> >
> >> > Does anyone have any idea about this difference?
> >> >
> >> > Additionally, I have attached two visualizations, one for each
> >> > approach.
> >> >
> >> > Thanks,
> >> > Upul
> >
> >
>
