[R] Issue:CCC Doesn't Match in R and SAS

2015-02-20 Thread sagnik chakravarty
Hi,

I was trying to calculate the CCC metric in R with the help of "NbClust"
source code and the available SAS manual for CCC. All the Pseudo-F and
R-square are matching exactly with the SAS output except for the E_R2 and
hence CCC. I have tried and tested in multiple ways but couldn’t get any
explanation for this.

I have attached sample data, initial seed and also the SAS cluster output
in Rdata format which I used for E_R2 and CCC calculation.

FYI, following are the values of E_R2 in SAS and R respectively:

E_R2=0.4630339 (R); but ERSQ=0.3732597284 (SAS)

Could you please help me out with finding what's going wrong in the
background?

-

Kindly find below the codes I have used for this:

--



--

proc fastclus data=sample_data

maxiter=100 seed=initial_seed maxc=5 outstat=metrics out=output;

Var v1 v2 v3 v4 v5 v6;

run;



--



--

load("C:\\Users\\sagnik\\Desktop\\SAS_cluster.Rdata")



clust.perf.metrics <- function(data, cl) {

  data1 <- as.matrix(data)

  numberObsBefore <- dim(data1)[1]

  data <- na.omit(data1)

  nn <- numberObsAfter <- dim(data)[1]

  pp <- dim(data)[2]

  qq <- max(cl)

  TT <- t(data) %*% data

  sizeEigenTT <- length(eigen(TT)$value)

  eigenValues <- eigen(TT/(nn - 1))$value

  for (i in 1:sizeEigenTT) {

if (eigenValues[i] < 0) {

  cat(paste("There are only", numberObsAfter, "non-missing observations
out of a possible",

numberObsBefore, "observations."))

  stop("The TSS matrix is indefinite. There must be too many missing
values. The index cannot be calculated.")

}

  }



  s1 <- sqrt(eigenValues)

  ss <- rep(1, sizeEigenTT)

  for (i in 1:sizeEigenTT) {

if (s1[i] != 0)

  ss[i] = s1[i]

  }

  vv <- prod(ss)

  z <- matrix(0, ncol = qq, nrow = nn)

  clX <- as.matrix(cl)

  for (i in 1:nn)

for (j in 1:qq) {

  z[i, j] == 0

  if (clX[i, 1] == j) z[i, j] = 1

}

  xbar <- solve(t(z) %*% z) %*% t(z) %*% data

  B <- t(xbar) %*% t(z) %*% z %*% xbar

  W <- TT - B

  R2 <- 1 - (sum(diag(W))/sum(diag(TT)))

  PseudoF <- (sum(diag(B))/(qq-1))/(sum(diag(W))/(nn-qq))



  v1 <- 1

  u1 <- rep(0, pp)

  c1 <- (vv/qq)^(1/pp)

  u1 <- ss/c1

  k1 <- sum((u1 >= 1) == TRUE)

  p1 <- min(k1, qq - 1)



  if (all(p1 > 0, p1 < pp)) {

for (i in 1:p1) { v1 <- v1 * ss[i]}

c <- (v1/qq)^(1/p1)

u <- ss/c

b1 <- sum(1/(nn + u[1:p1]))

b2 <- sum(u[(p1 + 1):pp]^2/(nn + u[(p1 + 1):pp]), na.rm = TRUE)

E_R2 <- 1 - ((b1 + b2)/sum(u^2)) * ((nn - qq)^2/nn) * (1 + (4/nn))

ccc <- log((1 - E_R2)/(1 - R2)) * (sqrt(nn * p1/2)/((0.001 + E_R2)^1.2))

  } else {

b1 <- sum(1/(nn + u))

E_R2 <- 1 - (b1/sum(u^2)) * ((nn - qq)^2/nn) * (1 + 4/nn)

ccc <- log((1 - E_R2)/(1 - R2)) * (sqrt(nn * pp/2)/((0.001 + E_R2)^1.2))

  }

  results <- list(R_2=R2, PseudoF=PseudoF, CCC = ccc, E_R2=E_R2);
return(results)

}



clust.perf.metrics(output[,1:6],output[,7])

#---

THANKS IN ADVANCE,

REGARDS,

SAGNIK
__
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Issues with fa() function in "psych"

2014-09-01 Thread sagnik chakravarty
Hi William,

Recently I noticed that if the requested rotation is not available,
"principal" function also defaults to rotate=“none” without any WARNING.
You had earlier fixed the same issue with "fa" in version 1.4.4. Kindly
include the same for "principal" also.

Also, as I had pointed out earlier in my trailing mails, is there any
update on the following suggestion:

"The fa() function doesn't account for 'Heywood cases' (communality greater
than 1) and never ever throws out any error related to that which other
softwares do. This is a serious and common issue in iterative factor
analysis and hence should have been accounted for."

Awaiting your revert,

Thanks and Regards,

Sagnik


On Fri, May 16, 2014 at 10:54 AM, sagnik chakravarty  wrote:

> Hi William,
>
> Thanks for the update. I see this package has so many capabilities ! I
> will suggest further for its development if anything else comes to my mind.
>
> Regards,
> Sagnik
>
>
> On Thu, May 15, 2014 at 6:34 AM, William Revelle 
> wrote:
>
>> Sagnik,
>>
>> I did some more checking and in fact you can do equamax through GPA
>> rotation.  (Gunter Nickel pointed this out in a post to R-help).  I will
>> implement this in version 1.4.6 (1.4.5 is now working its way through the
>> various CRAN mirrors).
>>
>> You might like 1.4.5 in that I have added various ways of displaying
>> confidence intervals (cats eye plots) as well as upper and lower confidence
>> limits for correlations (cor.plot.upperLowerCi)
>>
>> Bill
>>
>> On Apr 10, 2014, at 1:22 AM, sagnik chakravarty 
>> wrote:
>>
>> > Thanks a lot Bill and Revelle for your helpful response.
>> > It would have been great if I could know when we can expect the release
>> of the edited version 1.4.4.
>> >
>> > Sagnik
>> >
>> >
>> > On Wed, Apr 9, 2014 at 8:05 PM, William Revelle 
>> wrote:
>> > Sagnik raises the question as to why the psych package does not offer
>> the ‘equamax’ rotation.
>> > It is because all rotations are handled through the GPArotation package
>> which does not offer equamax.
>> >
>> > Sagnik also points out that if the requested rotation is not available,
>> fa defaults to rotate=“none” without any warning.  I have fixed that for
>> the  next release (1.4.4).
>> > (1.4.4 also will fix a bug in corr.test introduced into 1.4.3).
>> >
>> >
>> > The question about why printing just the loadings matrix leaves blank
>> cells?  That is because the loadings matrix of class “loadings” which the
>> default print function prints with a cut = .3.
>> > Using the example from Sagnik, print(efa_pa$loadings,cut=0) will match
>> the output of efa_pa.
>> >
>> > The fm=“pa” option runs conventional principal axis factor analysis
>> (ala SPSS).  As documented, this iterates max.iter times
>> >
>> > "Not all factor programs that do principal axes do iterative solutions.
>> The example from the SAS manual (Chapter 26) is such a case. To achieve
>> that solution, it is necessary to specify that the max.iterations = 1.
>> Comparing that solution to an iterated one (the default) shows that
>> iterations improve the solution. In addition, fm="minres" or fm="mle"
>> produces even better solutions for this example.”
>> >
>> > The com column is factor complexity using the index developed by
>> Hofmann (1978).  It is a row wise measure of item complexity.
>> > I have added more documentation to this in 1.4.4
>> >
>> > Bill
>> >
>> >
>> > On Apr 8, 2014, at 2:28 AM, Pascal Oettli  wrote:
>> >
>> > > Hello,
>> > >
>> > > And what about submitting your suggestions directly to the package
>> > > author/maintainer?
>> > >
>> > > And please don't post in HTML.
>> > >
>> > > Regards,
>> > > Pascal
>> > >
>> > > On Tue, Apr 8, 2014 at 3:13 PM, sagnik chakravarty
>> > >  wrote:
>> > >> Hi Team,
>> > >>
>> > >> I was using your "psych" package for factor analysis and was also
>> comparing
>> > >> the results with SAS results. I have some suggestions and/or
>> confusions
>> > >> regarding the fa() function in the package:
>> > >>
>> > >>   - The fa() function *doesn't account for Heywood cases*
>> (communality
>> > >>   greater than 1) and never ever throws 

Re: [R] Issues with fa() function in "psych"

2014-05-15 Thread sagnik chakravarty
Hi William,

Thanks for the update. I see this package has so many capabilities ! I will
suggest further for its development if anything else comes to my mind.

Regards,
Sagnik


On Thu, May 15, 2014 at 6:34 AM, William Revelle  wrote:

> Sagnik,
>
> I did some more checking and in fact you can do equamax through GPA
> rotation.  (Gunter Nickel pointed this out in a post to R-help).  I will
> implement this in version 1.4.6 (1.4.5 is now working its way through the
> various CRAN mirrors).
>
> You might like 1.4.5 in that I have added various ways of displaying
> confidence intervals (cats eye plots) as well as upper and lower confidence
> limits for correlations (cor.plot.upperLowerCi)
>
> Bill
>
> On Apr 10, 2014, at 1:22 AM, sagnik chakravarty 
> wrote:
>
> > Thanks a lot Bill and Revelle for your helpful response.
> > It would have been great if I could know when we can expect the release
> of the edited version 1.4.4.
> >
> > Sagnik
> >
> >
> > On Wed, Apr 9, 2014 at 8:05 PM, William Revelle 
> wrote:
> > Sagnik raises the question as to why the psych package does not offer
> the ‘equamax’ rotation.
> > It is because all rotations are handled through the GPArotation package
> which does not offer equamax.
> >
> > Sagnik also points out that if the requested rotation is not available,
> fa defaults to rotate=“none” without any warning.  I have fixed that for
> the  next release (1.4.4).
> > (1.4.4 also will fix a bug in corr.test introduced into 1.4.3).
> >
> >
> > The question about why printing just the loadings matrix leaves blank
> cells?  That is because the loadings matrix of class “loadings” which the
> default print function prints with a cut = .3.
> > Using the example from Sagnik, print(efa_pa$loadings,cut=0) will match
> the output of efa_pa.
> >
> > The fm=“pa” option runs conventional principal axis factor analysis (ala
> SPSS).  As documented, this iterates max.iter times
> >
> > "Not all factor programs that do principal axes do iterative solutions.
> The example from the SAS manual (Chapter 26) is such a case. To achieve
> that solution, it is necessary to specify that the max.iterations = 1.
> Comparing that solution to an iterated one (the default) shows that
> iterations improve the solution. In addition, fm="minres" or fm="mle"
> produces even better solutions for this example.”
> >
> > The com column is factor complexity using the index developed by Hofmann
> (1978).  It is a row wise measure of item complexity.
> > I have added more documentation to this in 1.4.4
> >
> > Bill
> >
> >
> > On Apr 8, 2014, at 2:28 AM, Pascal Oettli  wrote:
> >
> > > Hello,
> > >
> > > And what about submitting your suggestions directly to the package
> > > author/maintainer?
> > >
> > > And please don't post in HTML.
> > >
> > > Regards,
> > > Pascal
> > >
> > > On Tue, Apr 8, 2014 at 3:13 PM, sagnik chakravarty
> > >  wrote:
> > >> Hi Team,
> > >>
> > >> I was using your "psych" package for factor analysis and was also
> comparing
> > >> the results with SAS results. I have some suggestions and/or
> confusions
> > >> regarding the fa() function in the package:
> > >>
> > >>   - The fa() function *doesn't account for Heywood cases* (communality
> > >>   greater than 1) and never ever throws out any error related to that
> which
> > >>   other softwares do. This is a serious and common issue in iterative
> factor
> > >>   analysis and hence should have been accounted for.
> > >>
> > >>
> > >>   - The fa() function doesn't provide "equamax" rotation in its
> rotation
> > >>   list and still if you specify "*rotation=equamax*", it will run
> without
> > >>   throwing out any error and even mentioning in the result that
> "equamax" has
> > >>   been applied. But I have thoroughly compared results from "
> > >>   *rotation=none*" and "*rotation=equamax*" options and they are
> exactly
> > >>   same. *That means fa() is not doing the rotation at all and yet
> telling
> > >>   that it is doing that!!* I have even mentioned "*rotation=crap*"
> option
> > >>   just to check and surprisingly it ran(without any error) with the
> result
> > >>   showing:
> > >>
> > >>   *Factor Analysis using method 

Re: [R] Issues with fa() function in "psych"

2014-04-09 Thread sagnik chakravarty
Thanks a lot Bill and Revelle for your helpful response.
It would have been great if I could know when we can expect the release of
the edited version 1.4.4.

Sagnik


On Wed, Apr 9, 2014 at 8:05 PM, William Revelle  wrote:

> Sagnik raises the question as to why the psych package does not offer the
> 'equamax' rotation.
> It is because all rotations are handled through the GPArotation package
> which does not offer equamax.
>
> Sagnik also points out that if the requested rotation is not available, fa
> defaults to rotate="none" without any warning.  I have fixed that for the
>  next release (1.4.4).
> (1.4.4 also will fix a bug in corr.test introduced into 1.4.3).
>
>
> The question about why printing just the loadings matrix leaves blank
> cells?  That is because the loadings matrix of class "loadings" which the
> default print function prints with a cut = .3.
> Using the example from Sagnik, print(efa_pa$loadings,cut=0) will match the
> output of efa_pa.
>
> The fm="pa" option runs conventional principal axis factor analysis (ala
> SPSS).  As documented, this iterates max.iter times
>
> "Not all factor programs that do principal axes do iterative solutions.
> The example from the SAS manual (Chapter 26) is such a case. To achieve
> that solution, it is necessary to specify that the max.iterations = 1.
> Comparing that solution to an iterated one (the default) shows that
> iterations improve the solution. In addition, fm="minres" or fm="mle"
> produces even better solutions for this example."
>
> The com column is factor complexity using the index developed by Hofmann
> (1978).  It is a row wise measure of item complexity.
> I have added more documentation to this in 1.4.4
>
> Bill
>
>
> On Apr 8, 2014, at 2:28 AM, Pascal Oettli  wrote:
>
> > Hello,
> >
> > And what about submitting your suggestions directly to the package
> > author/maintainer?
> >
> > And please don't post in HTML.
> >
> > Regards,
> > Pascal
> >
> > On Tue, Apr 8, 2014 at 3:13 PM, sagnik chakravarty
> >  wrote:
> >> Hi Team,
> >>
> >> I was using your "psych" package for factor analysis and was also
> comparing
> >> the results with SAS results. I have some suggestions and/or confusions
> >> regarding the fa() function in the package:
> >>
> >>   - The fa() function *doesn't account for Heywood cases* (communality
> >>   greater than 1) and never ever throws out any error related to that
> which
> >>   other softwares do. This is a serious and common issue in iterative
> factor
> >>   analysis and hence should have been accounted for.
> >>
> >>
> >>   - The fa() function doesn't provide "equamax" rotation in its rotation
> >>   list and still if you specify "*rotation=equamax*", it will run
> without
> >>   throwing out any error and even mentioning in the result that
> "equamax" has
> >>   been applied. But I have thoroughly compared results from "
> >>   *rotation=none*" and "*rotation=equamax*" options and they are exactly
> >>   same. *That means fa() is not doing the rotation at all and yet
> telling
> >>   that it is doing that!!* I have even mentioned "*rotation=crap*"
> option
> >>   just to check and surprisingly it ran(without any error) with the
> result
> >>   showing:
> >>
> >>   *Factor Analysis using method =  gls*
> >> *   Call: fa(r = cor_mat, nfactors = 4, n.obs = 69576, rotate =
> >> "crap", fm = "gls")*
> >>
> >>I hope you understand the severity of this bug and hence
> >> request you to correct this.
> >>
> >>   - To my sense, there might be some problem with "fm=ml" and "fm=pa"
> >>   options since the convergence issue should be with MLE method and not
> PA
> >>   method but while running factor analysis with PA, I am getting the
> >>   following warning:
> >>
> >>*maximum iteration exceeded*
> >> *The estimated weights for the factor scores are probably
> >> incorrect.  Try a different factor extraction method.*
> >>
> >> If I compare the results of R and SAS,* I am getting
> >> convergence error for MLE in SAS whereas I am getting the same error
> for PA
> >> in R *!! I am not being able to understand this mismatch.
> >>
> >>   - If I call the *loading matrix like ef

[R] Issues with fa() function in "psych"

2014-04-07 Thread sagnik chakravarty
Hi Team,

I was using your "psych" package for factor analysis and was also comparing
the results with SAS results. I have some suggestions and/or confusions
regarding the fa() function in the package:

   - The fa() function *doesn't account for Heywood cases* (communality
   greater than 1) and never ever throws out any error related to that which
   other softwares do. This is a serious and common issue in iterative factor
   analysis and hence should have been accounted for.


   - The fa() function doesn't provide "equamax" rotation in its rotation
   list and still if you specify "*rotation=equamax*", it will run without
   throwing out any error and even mentioning in the result that "equamax" has
   been applied. But I have thoroughly compared results from "
   *rotation=none*" and "*rotation=equamax*" options and they are exactly
   same. *That means fa() is not doing the rotation at all and yet telling
   that it is doing that!!* I have even mentioned "*rotation=crap*" option
   just to check and surprisingly it ran(without any error) with the result
   showing:

   *Factor Analysis using method =  gls*
*   Call: fa(r = cor_mat, nfactors = 4, n.obs = 69576, rotate =
"crap", fm = "gls")*

I hope you understand the severity of this bug and hence
request you to correct this.

   - To my sense, there might be some problem with "fm=ml" and "fm=pa"
   options since the convergence issue should be with MLE method and not PA
   method but while running factor analysis with PA, I am getting the
   following warning:

*maximum iteration exceeded*
*The estimated weights for the factor scores are probably
incorrect.  Try a different factor extraction method.*

 If I compare the results of R and SAS,* I am getting
convergence error for MLE in SAS whereas I am getting the same error for PA
in R *!! I am not being able to understand this mismatch.

   - If I call the *loading matrix like efa_pa$loadings, the matrix shown
   has many blank cells whereas the final result showing the loadings doesn't
   have so* !!

*Loadings:*
* PA1PA2PA3PA4   *
*Var10.401   -0.243*
*Var20.336 -0.1040.710*
*Var30.624  0.123 0.170  *


   - Could you please explain* what the "com" column means* in the output:?


*   PA1   PA3   PA2   PA4 h2  u2  com*
*Var1  0.44  0.14 -0.03  -0.10 0.22665  0.773  1.3*
*Var2  0.08  0.11  0.02   0.78  0.62951  0.370  1.1*
*Var3  0.62  0.12  0.15   0.14  0.43578  0.564  1.3*

   - Request you to add option for *"equamax" rotation* also if possible.


I have come across the above issues until now. Please do correct me if I am
wrong.

Awaiting your revert which would clear out my confusions,

Thanks for your valuable time,

Sagnik

-- 
Regards,

*SAGNIK CHAKRAVARTY*

*Mob:*  +919972865435
*Email:* sagnik.st...@gmail.com
   sagnik@gmail.com

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.