Re: [R] CFA with lavaan or with SEM

2013-02-01 Thread Rick Bilonick
Not sure if you are aware of the OpenMx SEM package 
(http://openmx.psyc.virginia.edu/). It's a very full-featured structural 
equation modeling package.


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] CFA with lavaan or with SEM

2013-01-25 Thread yrosseel

I am trying to use the cfa command in the lavaan package to run a CFA
however I am unsure over a couple of issues.

I have @25 dichotomous variables, 300 observations and an EFA on a
training dataset suggests a 3 factor model.


That is a lot of variables, and a rather small sample size (for binary 
data).



After defining the model I use the command

fit.dat - cfa(model.1, data=my.dat, std.lv = T, estimator=WLSMV,
ordered=c(var1,var2 and so on for the other 23 variables))


To avoid having to type var? 25 times, you can say

ordered=paste(var,1:25,sep=)


Is it right that I define the variables as ordered (the output
returns thresholds suggesting I should).


Yes!

Does the cfa command

calculate tetrachoric correlations in the background?


Yes, indeed. You can 'see' it by typing

inspect(fit, sampstat)

lavaan also computes an asymptotic variance matrix of these 
correlations, so you should get correct standard errors and a correct 
test statistic. By default, lavaan will provide robust standard errors 
and a mean and variance adjusted test statistic (estimator=WLSMV).



However, output for the command returns two variables with  small
negative variances (-0.002) which I think is due to the correlation
matrix not being positive definite.  Is it reasonable to force these
to be zero when defining the model or is this more a sign of problems
with the model?


You can NOT force these to be equal (at least not in the current version 
of lavaan - 0.5-11, where the residual variance is a function of other 
model parameters). I don't think this is caused by a non-pd correlation 
matrix (you should get a big warning if this was the case). Perhaps the 
sample size is too small. Could you remove some items, or regroup them?



As an alternative is it possible to calculate the tetrachoric
correlations using hetcor (which applies smoothing) and then use the
smoothed sample correlation as the input to the model, such as

fit.cor - cfa(model.1, sample.cov=my.hetcor, sample.nobs=300, std.lv
= T,estimator=ML, ordered=c(var1,var2 and so on for the other
23 variables)).


This will work only if you omit the 'ordered' argument. Perhaps in 
combination with estimator=ULS. But do not trust/report the standard 
errors in this case.



Final question is I have a lot of missing data - listwise deletion
leaves 90 subjects. Is there a way to calculate estimates using
pairwise deletion (this is another reason why I tried using the
correlation matrix as the input).


You could do this, and use estimator=ULS. But again, you can not use 
the standard errors.


Yves.
--
http://lavaan.org

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] CFA with lavaan or with SEM

2013-01-23 Thread David Purves
Hi

Sorry for the rather long message.

I am trying to use the cfa command in the lavaan package to run a CFA however I 
am unsure over a couple of issues.

I have @25 dichotomous variables, 300 observations and an EFA on a training 
dataset suggests a 3 factor model.

After defining the model I use the command

fit.dat - cfa(model.1, data=my.dat, std.lv = T, estimator=WLSMV, 
ordered=c(var1,var2 and so on for the other 23 variables))

Is it right that I define the variables as ordered (the output returns 
thresholds suggesting I should). Does the cfa command calculate tetrachoric 
correlations in the background?

However, output for the command returns two variables with  small negative 
variances (-0.002) which I think is due to the correlation matrix not being 
positive definite. Is it reasonable to force these to be zero when defining the 
model or is this more a sign of problems with the model?

As an alternative is it possible to calculate the tetrachoric correlations 
using hetcor (which applies smoothing) and then use the smoothed sample 
correlation as the input to the model, such as

fit.cor - cfa(model.1, sample.cov=my.hetcor, sample.nobs=300, std.lv = 
T,estimator=ML, ordered=c(var1,var2 and so on for the other 23 
variables)).

This however does not produce thresholds suggesting what I have tried is 
nonsense but is there a way to do this?

Final question is I have a lot of missing data - listwise deletion leaves 90 
subjects. Is there a way to calculate estimates using pairwise deletion (this 
is another reason why I tried using the correlation matrix as the input).



I have tried the analysis using John Fox's SEM package / command.

I calculate the correlation matrix with smoothing

my.cor-hetcor(north.dat.sub,use=pairwise.complete.obs)$correlations

This returns the warning indicating that the correlation matrix was adjusted to 
make it positive definite. However the following sem model does not run, with 
the error message that the matrix is non-invertible.

mod1-sem::sem(sem .model.1, S=my.cor, 300)

Should the smoothing not allow it to be inverted?

thanks for help, david




The University of Glasgow, charity number SC004401

[[alternative HTML version deleted]]

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] CFA with lavaan or with SEM

2013-01-23 Thread John Fox
Dear David,

On Wed, 23 Jan 2013 11:19:09 +
 David Purves david.pur...@glasgow.ac.uk wrote:
 Hi
 
 Sorry for the rather long message.
 

. . .

 
 I have tried the analysis using John Fox's SEM package / command.
 
 I calculate the correlation matrix with smoothing
 
 my.cor-hetcor(north.dat.sub,use=pairwise.complete.obs)$correlations
 
 This returns the warning indicating that the correlation matrix was adjusted 
 to make it positive definite. However the following sem model does not run, 
 with the error message that the matrix is non-invertible.
 
 mod1-sem::sem(sem .model.1, S=my.cor, 300)
 
 Should the smoothing not allow it to be inverted?
 

If the input correlation matrix is really positive definite, then it has an 
inverse. You could check directly, e.g., by looking at the eignevalues of the 
tetrachoric correlation matrix. There's very little here to go on, not even the 
error message produced by sem(). By the way, I assume that you didn't really 
call sem in the sem package as sem::sem in a session in which lavann was 
loaded. I'm not sure what would happen if you did that.

Best,
 John


John Fox
Sen. William McMaster Prof. of Social Statistics
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/

 
 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] CFA with lavaan or with SEM

2013-01-23 Thread David Purves
Hi John

Thanks for your quick reply.

The full warning I got is

' Error in csem(model = model.description, start, opt.flag = 1, typsize = 
typsize,  :
  The matrix is non-invertable.'

The eigenvalues of the tetrachoric correlations are non negative. So it is must 
be how I am defining my model.

I have also tried it without having lavaan in the session.

A wee example of my error (whether it is sensible);

library(sem)

my.cor-matrix(c( 1.000  ,  0.7600616  ,  0.3653309 ,   0.4377949 , 
0.2917927 ,   0.5133697,
0.7600616 ,   1.000,   0.6335519 ,   0.8288809 , 0.6223942  ,  
0.6355725,
 0.3653309 ,  0.6335519  ,  1.000 ,   0.9098309 , 0.9098309  ,  
0.7693395,
 0.4377949 , 0.8288809  ,  0.9098309  ,  1.000  ,0.9136967   , 
0.7829854,
  0.2917927  ,0.6223942  ,  0.9098309  ,  0.9136967  ,1.000   , 
0.7354562,
 0.5133697  ,0.6355725  ,  0.7693395  ,  0.7829854 , 0.7354562   , 
1.000),
nrow=6,byrow=T)

colnames(my.cor)-rownames(my.cor)-c(a,b,c,d,e,g)

eigen(my.cor)
solve(my.cor)

#i tried defining the model in two ways

model.1-matrix(c(
#   arrow   #parameter  #start
f - a,   g1,   NA,
f - b,   g2,   NA,
f - c,   g3,   NA,
f - d,   g4,   NA,
f - e,   g5,   NA,
f - g,   g6,   NA,
f - f,  NA, 1),
ncol=3,byrow=T)

out-sem(model.1,S=my.cor,200)

model.1 - specifyEquations()
 f1 = gam11*a + gam12*b + gam13*c + gam14*d + gam15*e + gam16*g
 f1 = 1* f1

out-sem(model.1,S=my.cor,200)

But the same error.

I would be very grateful if you could indicate where the error in my code is 
please.


thanks, david




-Original Message-
From: John Fox [mailto:j...@mcmaster.ca]
Sent: 23 January 2013 14:00
To: David Purves
Cc: r-help@R-project.org
Subject: Re: [R] CFA with lavaan or with SEM

Dear David,

On Wed, 23 Jan 2013 11:19:09 +
 David Purves david.pur...@glasgow.ac.uk wrote:
 Hi

 Sorry for the rather long message.


. . .


 I have tried the analysis using John Fox's SEM package / command.

 I calculate the correlation matrix with smoothing

 my.cor-hetcor(north.dat.sub,use=pairwise.complete.obs)$correlations

 This returns the warning indicating that the correlation matrix was adjusted 
 to make it positive definite. However the following sem model does not run, 
 with the error message that the matrix is non-invertible.

 mod1-sem::sem(sem .model.1, S=my.cor, 300)

 Should the smoothing not allow it to be inverted?


If the input correlation matrix is really positive definite, then it has an 
inverse. You could check directly, e.g., by looking at the eignevalues of the 
tetrachoric correlation matrix. There's very little here to go on, not even the 
error message produced by sem(). By the way, I assume that you didn't really 
call sem in the sem package as sem::sem in a session in which lavann was 
loaded. I'm not sure what would happen if you did that.

Best,
 John


John Fox
Sen. William McMaster Prof. of Social Statistics Department of Sociology 
McMaster University Hamilton, Ontario, Canada http://socserv.mcmaster.ca/jfox/


 __
 R-help@r-project.org mailing list
 https://stat.ethz.ch/mailman/listinfo/r-help
 PLEASE do read the posting guide
 http://www.R-project.org/posting-guide.html
 and provide commented, minimal, self-contained, reproducible code.




The University of Glasgow, charity number SC004401

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] CFA with lavaan or with SEM

2013-01-23 Thread John Fox
Dear David,

It certainly helps to have a reproducible example.

You've left out the error variances (uniquenesses) for the observed
variables. You're also making the specification *much* harder than it needs
to be:

-- snip ---
 cfa.mod.1 - cfa()
1: F: a, b, c, d, e, g
2: 
Read 1 item
NOTE: adding 6 variances to the model

 cfa.mod.1
   PathParameter StartValue
1  F - a  fixed   1 
2  F - b  lam[b:F]
3  F - c  lam[c:F]
4  F - d  lam[d:F]
5  F - e  lam[e:F]
6  F - g  lam[g:F]
7  F - F V[F]
8  a - a V[a]
9  b - b V[b]
10 c - c V[c]
11 d - d V[d]
12 e - e V[e]
13 g - g V[g]   

 cfa.sem.1 - sem(cfa.mod.1, S=my.cor, N=200)
 summary(cfa.sem.1)

 Model Chisquare =  543.6442   Df =  9 Pr(Chisq) = 2.565155e-111
 AIC =  567.6442
 BIC =  495.9594

 Normalized Residuals
 Min.   1st Qu.Median  Mean   3rd Qu.  Max. 
-1.536000 -0.135500  0.002829  0.294500  0.353400  5.337000 

 R-square for Endogenous Variables
 a  b  c  d  e  g 
0.1841 0.6969 0.8172 1.0084 0.8269 0.6007 

 Parameter Estimates
 Estimate Std Error   z value   Pr(|z|) 
lam[b:F]  1.945376727 0.302785547  6.424933 1.319280e-10 b --- F
lam[c:F]  2.106647980 0.320689035  6.569130 5.061006e-11 c --- F
lam[d:F]  2.340103148 0.347900207  6.726363 1.739560e-11 d --- F
lam[e:F]  2.119171567 0.322095480  6.579327 4.725816e-11 e --- F
lam[g:F]  1.806192591 0.287680436  6.278469 3.419240e-10 g --- F
V[F]  0.184137740 0.057758730  3.188050 1.432356e-03 F -- F
V[a]  0.815862342 0.081641551  9.993224 1.631854e-23 a -- a
V[b]  0.303132223 0.030545714  9.923887 3.277381e-23 b -- b
V[c]  0.182802929 0.019248279  9.497105 2.158058e-21 c -- c
V[d] -0.008353614 0.008298643 -1.006624 3.141154e-01 d -- d
V[e]  0.173057855 0.018375461  9.417878 4.602950e-21 e -- e
V[g]  0.399281457 0.039935977  9.998039 1.554445e-23 g -- g

 Iterations =  59 

-- snip ---

Note that the default in cfa() is to use a reference indicator, and that the
solution is improper -- there's a negative estimated error variance, V[d]. 

An alternative specification sets the variance of the factor to 1, but then
cfa() fails to converge:

-- snip ---

 cfa.mod.2 - cfa(reference.indicators=FALSE)
1: F: a, b, c, d, e, g
2: 
Read 1 item
NOTE: adding 6 variances to the model

 cfa.mod.2
   PathParameter StartValue
1  F - a  lam[a:F]
2  F - b  lam[b:F]
3  F - c  lam[c:F]
4  F - d  lam[d:F]
5  F - e  lam[e:F]
6  F - g  lam[g:F]
7  F - F fixed   1 
8  a - a V[a]
9  b - b V[b]
10 c - c V[c]
11 d - d V[d]
12 e - e V[e]
13 g - g V[g]

 cfa.sem.2 - sem(cfa.mod.2, S=my.cor, N=200, debug=TRUE)

. . .

Start values:
  lam[a:F]   lam[b:F]   lam[c:F]   lam[d:F]   lam[e:F]   lam[g:F]   V[a]
V[b]   V[c]   V[d]   V[e]   V[g] 
0.65781335 0.87500031 0.89597921 0.95169707 0.87357655 0.86645865 0.56728160
0.23437445 0.19722125 0.09427268 0.23686401 0.24924941 

iteration = 0
Step:
 [1] 0 0 0 0 0 0 0 0 0 0 0 0
Parameter:
 [1] 0.65781335 0.87500031 0.89597921 0.95169707 0.87357655 0.86645865
0.56728160 0.23437445 0.19722125 0.09427268 0.23686401 0.24924941
Function Value
[1] 3.346898
Gradient:
 [1]  0.4583916  0.3957443 -0.2067868 -0.4369468 -0.2629929  0.2431501
-0.5501220 -1.672  0.6543088  3.0031327  0.7820309 -1.0122023

. . .

iteration = 21
Parameter:
 [1]  0.4428  0.68987016  0.99055402  1.15651371  0.99812990  0.75293242
0.82441291  1.01174284  0.01185904 -1.30253783 -0.01183159
[12]  0.71942353
Function Value
[1] -316143
Gradient:
 [1]  83431722 105921661   12975044375-137927630  -13105242109
162575760 -22404848 -36111801 -541872735153
[10] -61232522 -552802111412 -85072888

Successive iterates within tolerance.
Current iterate is probably solution.

Warning message:
In eval(expr, envir, enclos) :
  Could not compute QR decomposition of Hessian.
Optimization probably did not converge.

-- snip ---

The problem seems ill-conditioned, and in any event the standard errors that
you get using tetrachoric correlations won't be right (I expect you know
that).

I hope this helps,
 John

 -Original Message-
 From: David Purves [mailto:david.pur...@glasgow.ac.uk]
 Sent: Wednesday, January 23, 2013 10:23 AM
 To: John Fox
 Cc: r-help@R-project.org
 Subject: RE: [R] CFA with lavaan or with SEM
 
 Hi John
 
 Thanks for your quick reply.
 
 The full warning I got is
 
 ' Error in csem(model = model.description, start, opt.flag = 1, typsize
 = typsize,  :
   The matrix is non-invertable.'
 
 The eigenvalues of the tetrachoric correlations are non negative. So

Re: [R] CFA with lavaan or with SEM

2013-01-23 Thread John Fox
Dear Daniel,

Oh, I see I forgot to comment on your second specification in my last reply:

 -Original Message-
 From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org]
 On Behalf Of David Purves
 Sent: Wednesday, January 23, 2013 10:23 AM
 To: John Fox
 Cc: r-help@R-project.org
 Subject: Re: [R] CFA with lavaan or with SEM
 

. . .

 
 model.1 - specifyEquations()
  f1 = gam11*a + gam12*b + gam13*c + gam14*d + gam15*e + gam16*g
  f1 = 1* f1
 

First, this is backwards: the observed variables depend on the factor, and
not vice-versa; e.g., a = gam11*f1. Second, the factor has an error-variance
parameter; it doesn't depend on itself: V(f1) = 1. As I mentioned in my
previous message, it's easier to use cfa() for this kind of model.

Best,
 John

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.