[R] Issue:CCC Doesn't Match in R and SAS
Hi, I was trying to calculate the CCC metric in R with the help of "NbClust" source code and the available SAS manual for CCC. All the Pseudo-F and R-square are matching exactly with the SAS output except for the E_R2 and hence CCC. I have tried and tested in multiple ways but couldn’t get any explanation for this. I have attached sample data, initial seed and also the SAS cluster output in Rdata format which I used for E_R2 and CCC calculation. FYI, following are the values of E_R2 in SAS and R respectively: E_R2=0.4630339 (R); but ERSQ=0.3732597284 (SAS) Could you please help me out with finding what's going wrong in the background? - Kindly find below the codes I have used for this: -- -- proc fastclus data=sample_data maxiter=100 seed=initial_seed maxc=5 outstat=metrics out=output; Var v1 v2 v3 v4 v5 v6; run; -- -- load("C:\\Users\\sagnik\\Desktop\\SAS_cluster.Rdata") clust.perf.metrics <- function(data, cl) { data1 <- as.matrix(data) numberObsBefore <- dim(data1)[1] data <- na.omit(data1) nn <- numberObsAfter <- dim(data)[1] pp <- dim(data)[2] qq <- max(cl) TT <- t(data) %*% data sizeEigenTT <- length(eigen(TT)$value) eigenValues <- eigen(TT/(nn - 1))$value for (i in 1:sizeEigenTT) { if (eigenValues[i] < 0) { cat(paste("There are only", numberObsAfter, "non-missing observations out of a possible", numberObsBefore, "observations.")) stop("The TSS matrix is indefinite. There must be too many missing values. The index cannot be calculated.") } } s1 <- sqrt(eigenValues) ss <- rep(1, sizeEigenTT) for (i in 1:sizeEigenTT) { if (s1[i] != 0) ss[i] = s1[i] } vv <- prod(ss) z <- matrix(0, ncol = qq, nrow = nn) clX <- as.matrix(cl) for (i in 1:nn) for (j in 1:qq) { z[i, j] == 0 if (clX[i, 1] == j) z[i, j] = 1 } xbar <- solve(t(z) %*% z) %*% t(z) %*% data B <- t(xbar) %*% t(z) %*% z %*% xbar W <- TT - B R2 <- 1 - (sum(diag(W))/sum(diag(TT))) PseudoF <- (sum(diag(B))/(qq-1))/(sum(diag(W))/(nn-qq)) v1 <- 1 u1 <- rep(0, pp) c1 <- (vv/qq)^(1/pp) u1 <- ss/c1 k1 <- sum((u1 >= 1) == TRUE) p1 <- min(k1, qq - 1) if (all(p1 > 0, p1 < pp)) { for (i in 1:p1) { v1 <- v1 * ss[i]} c <- (v1/qq)^(1/p1) u <- ss/c b1 <- sum(1/(nn + u[1:p1])) b2 <- sum(u[(p1 + 1):pp]^2/(nn + u[(p1 + 1):pp]), na.rm = TRUE) E_R2 <- 1 - ((b1 + b2)/sum(u^2)) * ((nn - qq)^2/nn) * (1 + (4/nn)) ccc <- log((1 - E_R2)/(1 - R2)) * (sqrt(nn * p1/2)/((0.001 + E_R2)^1.2)) } else { b1 <- sum(1/(nn + u)) E_R2 <- 1 - (b1/sum(u^2)) * ((nn - qq)^2/nn) * (1 + 4/nn) ccc <- log((1 - E_R2)/(1 - R2)) * (sqrt(nn * pp/2)/((0.001 + E_R2)^1.2)) } results <- list(R_2=R2, PseudoF=PseudoF, CCC = ccc, E_R2=E_R2); return(results) } clust.perf.metrics(output[,1:6],output[,7]) #--- THANKS IN ADVANCE, REGARDS, SAGNIK __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Issues with fa() function in "psych"
Hi William, Recently I noticed that if the requested rotation is not available, "principal" function also defaults to rotate=“none” without any WARNING. You had earlier fixed the same issue with "fa" in version 1.4.4. Kindly include the same for "principal" also. Also, as I had pointed out earlier in my trailing mails, is there any update on the following suggestion: "The fa() function doesn't account for 'Heywood cases' (communality greater than 1) and never ever throws out any error related to that which other softwares do. This is a serious and common issue in iterative factor analysis and hence should have been accounted for." Awaiting your revert, Thanks and Regards, Sagnik On Fri, May 16, 2014 at 10:54 AM, sagnik chakravarty wrote: > Hi William, > > Thanks for the update. I see this package has so many capabilities ! I > will suggest further for its development if anything else comes to my mind. > > Regards, > Sagnik > > > On Thu, May 15, 2014 at 6:34 AM, William Revelle > wrote: > >> Sagnik, >> >> I did some more checking and in fact you can do equamax through GPA >> rotation. (Gunter Nickel pointed this out in a post to R-help). I will >> implement this in version 1.4.6 (1.4.5 is now working its way through the >> various CRAN mirrors). >> >> You might like 1.4.5 in that I have added various ways of displaying >> confidence intervals (cats eye plots) as well as upper and lower confidence >> limits for correlations (cor.plot.upperLowerCi) >> >> Bill >> >> On Apr 10, 2014, at 1:22 AM, sagnik chakravarty >> wrote: >> >> > Thanks a lot Bill and Revelle for your helpful response. >> > It would have been great if I could know when we can expect the release >> of the edited version 1.4.4. >> > >> > Sagnik >> > >> > >> > On Wed, Apr 9, 2014 at 8:05 PM, William Revelle >> wrote: >> > Sagnik raises the question as to why the psych package does not offer >> the ‘equamax’ rotation. >> > It is because all rotations are handled through the GPArotation package >> which does not offer equamax. >> > >> > Sagnik also points out that if the requested rotation is not available, >> fa defaults to rotate=“none” without any warning. I have fixed that for >> the next release (1.4.4). >> > (1.4.4 also will fix a bug in corr.test introduced into 1.4.3). >> > >> > >> > The question about why printing just the loadings matrix leaves blank >> cells? That is because the loadings matrix of class “loadings” which the >> default print function prints with a cut = .3. >> > Using the example from Sagnik, print(efa_pa$loadings,cut=0) will match >> the output of efa_pa. >> > >> > The fm=“pa” option runs conventional principal axis factor analysis >> (ala SPSS). As documented, this iterates max.iter times >> > >> > "Not all factor programs that do principal axes do iterative solutions. >> The example from the SAS manual (Chapter 26) is such a case. To achieve >> that solution, it is necessary to specify that the max.iterations = 1. >> Comparing that solution to an iterated one (the default) shows that >> iterations improve the solution. In addition, fm="minres" or fm="mle" >> produces even better solutions for this example.” >> > >> > The com column is factor complexity using the index developed by >> Hofmann (1978). It is a row wise measure of item complexity. >> > I have added more documentation to this in 1.4.4 >> > >> > Bill >> > >> > >> > On Apr 8, 2014, at 2:28 AM, Pascal Oettli wrote: >> > >> > > Hello, >> > > >> > > And what about submitting your suggestions directly to the package >> > > author/maintainer? >> > > >> > > And please don't post in HTML. >> > > >> > > Regards, >> > > Pascal >> > > >> > > On Tue, Apr 8, 2014 at 3:13 PM, sagnik chakravarty >> > > wrote: >> > >> Hi Team, >> > >> >> > >> I was using your "psych" package for factor analysis and was also >> comparing >> > >> the results with SAS results. I have some suggestions and/or >> confusions >> > >> regarding the fa() function in the package: >> > >> >> > >> - The fa() function *doesn't account for Heywood cases* >> (communality >> > >> greater than 1) and never ever throws
Re: [R] Issues with fa() function in "psych"
Hi William, Thanks for the update. I see this package has so many capabilities ! I will suggest further for its development if anything else comes to my mind. Regards, Sagnik On Thu, May 15, 2014 at 6:34 AM, William Revelle wrote: > Sagnik, > > I did some more checking and in fact you can do equamax through GPA > rotation. (Gunter Nickel pointed this out in a post to R-help). I will > implement this in version 1.4.6 (1.4.5 is now working its way through the > various CRAN mirrors). > > You might like 1.4.5 in that I have added various ways of displaying > confidence intervals (cats eye plots) as well as upper and lower confidence > limits for correlations (cor.plot.upperLowerCi) > > Bill > > On Apr 10, 2014, at 1:22 AM, sagnik chakravarty > wrote: > > > Thanks a lot Bill and Revelle for your helpful response. > > It would have been great if I could know when we can expect the release > of the edited version 1.4.4. > > > > Sagnik > > > > > > On Wed, Apr 9, 2014 at 8:05 PM, William Revelle > wrote: > > Sagnik raises the question as to why the psych package does not offer > the âequamaxâ rotation. > > It is because all rotations are handled through the GPArotation package > which does not offer equamax. > > > > Sagnik also points out that if the requested rotation is not available, > fa defaults to rotate=ânoneâ without any warning. I have fixed that for > the next release (1.4.4). > > (1.4.4 also will fix a bug in corr.test introduced into 1.4.3). > > > > > > The question about why printing just the loadings matrix leaves blank > cells? That is because the loadings matrix of class âloadingsâ which the > default print function prints with a cut = .3. > > Using the example from Sagnik, print(efa_pa$loadings,cut=0) will match > the output of efa_pa. > > > > The fm=âpaâ option runs conventional principal axis factor analysis (ala > SPSS). As documented, this iterates max.iter times > > > > "Not all factor programs that do principal axes do iterative solutions. > The example from the SAS manual (Chapter 26) is such a case. To achieve > that solution, it is necessary to specify that the max.iterations = 1. > Comparing that solution to an iterated one (the default) shows that > iterations improve the solution. In addition, fm="minres" or fm="mle" > produces even better solutions for this example.â > > > > The com column is factor complexity using the index developed by Hofmann > (1978). It is a row wise measure of item complexity. > > I have added more documentation to this in 1.4.4 > > > > Bill > > > > > > On Apr 8, 2014, at 2:28 AM, Pascal Oettli wrote: > > > > > Hello, > > > > > > And what about submitting your suggestions directly to the package > > > author/maintainer? > > > > > > And please don't post in HTML. > > > > > > Regards, > > > Pascal > > > > > > On Tue, Apr 8, 2014 at 3:13 PM, sagnik chakravarty > > > wrote: > > >> Hi Team, > > >> > > >> I was using your "psych" package for factor analysis and was also > comparing > > >> the results with SAS results. I have some suggestions and/or > confusions > > >> regarding the fa() function in the package: > > >> > > >> - The fa() function *doesn't account for Heywood cases* (communality > > >> greater than 1) and never ever throws out any error related to that > which > > >> other softwares do. This is a serious and common issue in iterative > factor > > >> analysis and hence should have been accounted for. > > >> > > >> > > >> - The fa() function doesn't provide "equamax" rotation in its > rotation > > >> list and still if you specify "*rotation=equamax*", it will run > without > > >> throwing out any error and even mentioning in the result that > "equamax" has > > >> been applied. But I have thoroughly compared results from " > > >> *rotation=none*" and "*rotation=equamax*" options and they are > exactly > > >> same. *That means fa() is not doing the rotation at all and yet > telling > > >> that it is doing that!!* I have even mentioned "*rotation=crap*" > option > > >> just to check and surprisingly it ran(without any error) with the > result > > >> showing: > > >> > > >> *Factor Analysis using method
Re: [R] Issues with fa() function in "psych"
Thanks a lot Bill and Revelle for your helpful response. It would have been great if I could know when we can expect the release of the edited version 1.4.4. Sagnik On Wed, Apr 9, 2014 at 8:05 PM, William Revelle wrote: > Sagnik raises the question as to why the psych package does not offer the > 'equamax' rotation. > It is because all rotations are handled through the GPArotation package > which does not offer equamax. > > Sagnik also points out that if the requested rotation is not available, fa > defaults to rotate="none" without any warning. I have fixed that for the > next release (1.4.4). > (1.4.4 also will fix a bug in corr.test introduced into 1.4.3). > > > The question about why printing just the loadings matrix leaves blank > cells? That is because the loadings matrix of class "loadings" which the > default print function prints with a cut = .3. > Using the example from Sagnik, print(efa_pa$loadings,cut=0) will match the > output of efa_pa. > > The fm="pa" option runs conventional principal axis factor analysis (ala > SPSS). As documented, this iterates max.iter times > > "Not all factor programs that do principal axes do iterative solutions. > The example from the SAS manual (Chapter 26) is such a case. To achieve > that solution, it is necessary to specify that the max.iterations = 1. > Comparing that solution to an iterated one (the default) shows that > iterations improve the solution. In addition, fm="minres" or fm="mle" > produces even better solutions for this example." > > The com column is factor complexity using the index developed by Hofmann > (1978). It is a row wise measure of item complexity. > I have added more documentation to this in 1.4.4 > > Bill > > > On Apr 8, 2014, at 2:28 AM, Pascal Oettli wrote: > > > Hello, > > > > And what about submitting your suggestions directly to the package > > author/maintainer? > > > > And please don't post in HTML. > > > > Regards, > > Pascal > > > > On Tue, Apr 8, 2014 at 3:13 PM, sagnik chakravarty > > wrote: > >> Hi Team, > >> > >> I was using your "psych" package for factor analysis and was also > comparing > >> the results with SAS results. I have some suggestions and/or confusions > >> regarding the fa() function in the package: > >> > >> - The fa() function *doesn't account for Heywood cases* (communality > >> greater than 1) and never ever throws out any error related to that > which > >> other softwares do. This is a serious and common issue in iterative > factor > >> analysis and hence should have been accounted for. > >> > >> > >> - The fa() function doesn't provide "equamax" rotation in its rotation > >> list and still if you specify "*rotation=equamax*", it will run > without > >> throwing out any error and even mentioning in the result that > "equamax" has > >> been applied. But I have thoroughly compared results from " > >> *rotation=none*" and "*rotation=equamax*" options and they are exactly > >> same. *That means fa() is not doing the rotation at all and yet > telling > >> that it is doing that!!* I have even mentioned "*rotation=crap*" > option > >> just to check and surprisingly it ran(without any error) with the > result > >> showing: > >> > >> *Factor Analysis using method = gls* > >> * Call: fa(r = cor_mat, nfactors = 4, n.obs = 69576, rotate = > >> "crap", fm = "gls")* > >> > >>I hope you understand the severity of this bug and hence > >> request you to correct this. > >> > >> - To my sense, there might be some problem with "fm=ml" and "fm=pa" > >> options since the convergence issue should be with MLE method and not > PA > >> method but while running factor analysis with PA, I am getting the > >> following warning: > >> > >>*maximum iteration exceeded* > >> *The estimated weights for the factor scores are probably > >> incorrect. Try a different factor extraction method.* > >> > >> If I compare the results of R and SAS,* I am getting > >> convergence error for MLE in SAS whereas I am getting the same error > for PA > >> in R *!! I am not being able to understand this mismatch. > >> > >> - If I call the *loading matrix like ef
[R] Issues with fa() function in "psych"
Hi Team, I was using your "psych" package for factor analysis and was also comparing the results with SAS results. I have some suggestions and/or confusions regarding the fa() function in the package: - The fa() function *doesn't account for Heywood cases* (communality greater than 1) and never ever throws out any error related to that which other softwares do. This is a serious and common issue in iterative factor analysis and hence should have been accounted for. - The fa() function doesn't provide "equamax" rotation in its rotation list and still if you specify "*rotation=equamax*", it will run without throwing out any error and even mentioning in the result that "equamax" has been applied. But I have thoroughly compared results from " *rotation=none*" and "*rotation=equamax*" options and they are exactly same. *That means fa() is not doing the rotation at all and yet telling that it is doing that!!* I have even mentioned "*rotation=crap*" option just to check and surprisingly it ran(without any error) with the result showing: *Factor Analysis using method = gls* * Call: fa(r = cor_mat, nfactors = 4, n.obs = 69576, rotate = "crap", fm = "gls")* I hope you understand the severity of this bug and hence request you to correct this. - To my sense, there might be some problem with "fm=ml" and "fm=pa" options since the convergence issue should be with MLE method and not PA method but while running factor analysis with PA, I am getting the following warning: *maximum iteration exceeded* *The estimated weights for the factor scores are probably incorrect. Try a different factor extraction method.* If I compare the results of R and SAS,* I am getting convergence error for MLE in SAS whereas I am getting the same error for PA in R *!! I am not being able to understand this mismatch. - If I call the *loading matrix like efa_pa$loadings, the matrix shown has many blank cells whereas the final result showing the loadings doesn't have so* !! *Loadings:* * PA1PA2PA3PA4 * *Var10.401 -0.243* *Var20.336 -0.1040.710* *Var30.624 0.123 0.170 * - Could you please explain* what the "com" column means* in the output:? * PA1 PA3 PA2 PA4 h2 u2 com* *Var1 0.44 0.14 -0.03 -0.10 0.22665 0.773 1.3* *Var2 0.08 0.11 0.02 0.78 0.62951 0.370 1.1* *Var3 0.62 0.12 0.15 0.14 0.43578 0.564 1.3* - Request you to add option for *"equamax" rotation* also if possible. I have come across the above issues until now. Please do correct me if I am wrong. Awaiting your revert which would clear out my confusions, Thanks for your valuable time, Sagnik -- Regards, *SAGNIK CHAKRAVARTY* *Mob:* +919972865435 *Email:* sagnik.st...@gmail.com sagnik@gmail.com [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.