Hi Xiangrui,

Thanks a lot for your answer. I fixed my Julia code and also calculated PCA using R.
R Code:
-------------
data <- read.csv('/home/upul/Desktop/iris.csv')
X <- data[, 1:4]
pca <- prcomp(X, center = TRUE, scale. = FALSE)
transformed <- predict(pca, newdata = X)

Julia Code (Fixed)
--------------
data = readcsv("/home/upul/temp/iris.csv");
X = data[:, 1:end-1];          # drop the last (class label) column
meanX = mean(X, 1);            # 1 x n row vector of column means
m, n = size(X);
X = X - repmat(meanX, m, 1);   # center each column (was repmat(x, ...))
u, s, v = svd(X);
transformed = X * v;           # principal component scores

Now the PCA results from Julia and R are identical, but I still see a small difference between the PCA values given by Spark and those from the other two.

Thanks,
Upul

On Sat, Jan 10, 2015 at 11:17 AM, Xiangrui Meng <men...@gmail.com> wrote:
> You need to subtract mean values to obtain the covariance matrix
> (http://en.wikipedia.org/wiki/Covariance_matrix).
>
> On Fri, Jan 9, 2015 at 6:41 PM, Upul Bandara <upulband...@gmail.com>
> wrote:
> > Hi Xiangrui,
> >
> > Thanks for the reply.
> >
> > Julia code is also using the covariance matrix:
> > (1/n)*X'*X ;
> >
> > Thanks,
> > Upul
> >
> > On Fri, Jan 9, 2015 at 2:11 AM, Xiangrui Meng <men...@gmail.com> wrote:
> >>
> >> The Julia code is computing the SVD of the Gram matrix. PCA should be
> >> applied to the covariance matrix. -Xiangrui
> >>
> >> On Thu, Jan 8, 2015 at 8:27 AM, Upul Bandara <upulband...@gmail.com>
> >> wrote:
> >> > Hi All,
> >> >
> >> > I tried to do PCA for the Iris dataset
> >> > [https://archive.ics.uci.edu/ml/datasets/Iris] using MLLib
> >> > [http://spark.apache.org/docs/1.1.1/mllib-dimensionality-reduction.html].
> >> > Also, PCA was calculated in Julia using the following method:
> >> >
> >> > Sigma = (1/numRow(X))*X'*X ;
> >> > [U, S, V] = svd(Sigma);
> >> > Ureduced = U(:, 1:k);
> >> > Z = X*Ureduced;
> >> >
> >> > However, I'm seeing a little difference between values given by MLLib
> >> > and the method shown above.
> >> >
> >> > Does anyone have any idea about this difference?
> >> >
> >> > Additionally, I have attached two visualizations, related to two
> >> > approaches.
> >> >
> >> > Thanks,
> >> > Upul
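Xiangrui's point about the Gram matrix versus the covariance matrix can be checked numerically. Below is a minimal NumPy sketch on a small made-up data set (not the Iris data): the uncentered matrix (1/n) X'X differs from the covariance matrix by the outer product of the column-mean vector, so the two coincide only when the data already have zero mean. Whether you divide by n or n - 1 only rescales the eigenvalues; it does not change the principal directions.

```python
import numpy as np

# Hypothetical toy data: 4 observations (rows) of 2 features (columns).
X = np.array([[2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 6.0]])
n = X.shape[0]
mu = X.mean(axis=0)               # column means

# Uncentered second-moment ("Gram") matrix, as in the first Julia attempt.
gram = X.T @ X / n

# Covariance matrix: subtract the column means first.
Xc = X - mu
cov = Xc.T @ Xc / n

# Agrees with NumPy's population covariance (bias=True divides by n).
assert np.allclose(cov, np.cov(X, rowvar=False, bias=True))

# The difference is exactly the outer product of the mean vector,
# so gram and cov coincide only when mu is zero.
assert np.allclose(gram - cov, np.outer(mu, mu))

print("Gram matrix:\n", gram)
print("Covariance matrix:\n", cov)
```

With this toy data the column means are both 3.5, so every entry of gram - cov is 3.5 * 3.5 = 12.25.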
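The fixed Julia procedure (center the columns, take the SVD of the centered data, project onto the right singular vectors) can be sketched in NumPy as follows. The random data are a hypothetical stand-in for the four Iris measurements. The sketch also checks that this route agrees with an eigendecomposition of the sample covariance matrix, up to the sign of each component.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for the four Iris measurements:
# 50 observations of 4 correlated features with non-zero means.
X = rng.normal(size=(50, 4)) @ rng.normal(size=(4, 4)) + 3.0
m = X.shape[0]

# Step 1: center each column (prcomp(center = TRUE) / the meanX subtraction).
Xc = X - X.mean(axis=0)

# Step 2: thin SVD of the centered data, Xc = U @ diag(s) @ Vt.
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
V = Vt.T                           # columns are the principal axes

# Step 3: project the centered data onto the axes to get the scores.
scores = Xc @ V
assert np.allclose(scores, U * s)  # same thing, since Xc @ V = U @ diag(s)

# Cross-check: eigendecomposition of the sample covariance (n - 1 as in R).
cov = Xc.T @ Xc / (m - 1)
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]  # eigh returns eigenvalues in ascending order
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Component variances agree, and the axes agree up to a sign per column.
assert np.allclose(s**2 / (m - 1), eigvals)
assert np.allclose(np.abs(V.T @ eigvecs), np.eye(4), atol=1e-8)
print("component variances:", eigvals)
```

Here `scores` plays the role of `transformed` in the Julia code and of `pca$x` in R; the SVD route avoids forming the covariance matrix explicitly, which is also how R's prcomp works internally.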
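When comparing PCA scores across tools, two kinds of difference can appear even if every tool is correct, and the sketch below illustrates both on hypothetical random data. First, the sign of each component is arbitrary. Second, projecting raw (uncentered) rows onto the same axes shifts every row of the scores by the constant vector mu @ V. This thread does not establish which of these, if either, explains the Spark discrepancy; `align_signs` is a hypothetical helper introduced only for this example.

```python
import numpy as np


def align_signs(reference, other):
    """Flip each column of `other` so it points the same way as the
    matching column of `reference` (hypothetical helper for this example).

    Singular-vector and eigenvector signs are arbitrary, so two correct
    PCA implementations can disagree by a sign per component.
    """
    signs = np.sign(np.sum(reference * other, axis=0))
    signs[signs == 0] = 1.0  # leave orthogonal/zero columns unchanged
    return other * signs


rng = np.random.default_rng(1)
# Hypothetical data with a clearly non-zero mean in every column.
X = rng.normal(size=(30, 3)) + np.array([5.0, -2.0, 1.0])
mu = X.mean(axis=0)
Xc = X - mu
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
V = Vt.T

centered_scores = Xc @ V     # analogous to R's predict(pca, newdata = X)
uncentered_scores = X @ V    # raw rows projected onto the same axes

# The two score matrices differ by the same row vector mu @ V in every row.
offset = mu @ V
assert np.allclose(uncentered_scores - centered_scores, offset)

# Simulate a second tool that returned the same axes with some signs flipped.
flipped = centered_scores * np.array([-1.0, 1.0, -1.0])
assert np.allclose(align_signs(centered_scores, flipped), centered_scores)
print("per-component offset:", offset)
```

Aligning signs and subtracting the column means of each score matrix before comparing makes it easier to see whether any remaining difference is real.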