Re: REML for Dummies?

2002-03-01 Thread John Uebersax

The Encyclopedia of Biostatistics (Armitage P, Colton T; Wiley,
1998) has an article on REML.

I have not seen the article, but their articles usually explain
statistical concepts well to non-statisticians.

The Encyclopedia is a resource you might find helpful in general.  For
more info, see:

http://www.wiley.co.uk/wileychi/eob/


John Uebersax, PhD (858) 597-5571 
La Jolla, California   (858) 625-0155 (fax)
email: [EMAIL PROTECTED]

Statistics:  http://ourworld.compuserve.com/homepages/jsuebersax/agree.htm
Psychology:  http://members.aol.com/spiritualpsych


Dr Jonathan Newman [EMAIL PROTECTED] 
 I'm trying to find a good introduction to REML (restricted maximum
 likelihood).


=
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at
  http://jse.stat.ncsu.edu/
=



Re: factor Analysis

2002-01-28 Thread John Uebersax

A program like SAS or SPSS will calculate factor scores for you.  A
factor score is an estimated location of an object (not a variable)
relative to a factor.  If your factors are orthogonal, then you can
plot each case using that case's score on Factor 1 and its score on
Factor 2 as the X and Y coordinates in a 2-dimensional space.

I believe the formula for estimating factor scores of a common-factor
model is not trivial (unless all communalities are 1).  Therefore one
might as well let the software calculate factor scores.  The topic is
well explained in the SAS manual (PROC FACTOR)--perhaps also in the
SPSS manual.
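
For the common-factor case, the usual regression (Thurstone) estimator
is F = Z * inv(R) * L.  Here is a minimal sketch in Python/numpy,
added for illustration (Z, R, and L are hypothetical inputs; packages
use this or closely related scoring formulas):

    import numpy as np

    def factor_scores(Z, R, L):
        """Regression (Thurstone) factor-score estimates, F = Z R^-1 L.

        Z: (n cases x p variables) standardized data
        R: (p x p) correlation matrix
        L: (p x m) factor loading matrix
        """
        B = np.linalg.solve(R, L)   # score coefficients, B = R^-1 L
        return Z @ B                # (n x m) estimated factor scores

    # With orthogonal factors, columns 0 and 1 of the result serve
    # directly as the X and Y plotting coordinates.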


John Uebersax, PhD (805) 384-7688 
Thousand Oaks, California  (805) 383-1726 (fax)
email: [EMAIL PROTECTED]

Agreement Stats:   http://ourworld.compuserve.com/homepages/jsuebersax/agree.htm
Latent Structure:  http://ourworld.compuserve.com/homepages/jsuebersax
Existential Psych: http://members.aol.com/spiritualpsych
Diet & Fitness:    http://members.aol.com/WeightControl101


Huxley [EMAIL PROTECTED] wrote in message news:a2u3sa$q3e$[EMAIL PROTECTED]...
 Hi,
 I've got a question.  Does anyone know how to place objects in a
 2-factor dimensional space?
 I heard that the factor score for a product is equal to the sum of
 products of the corresponding factor loadings and variable means, i.e.
 f(m,p) = a(1,m)u(1,p) + a(2,m)u(2,p) + ... + a(j,m)u(j,p)
 where: f(m,p) - factor score for factor m and product p; a(j,m) -
 loading of variable j on factor m; u(j,p) - mean of variable j for
 product p.
 Could you tell me if this is true?  How can it be proved formally?





Re: Measure of Association Question.

2002-01-02 Thread John Uebersax

[EMAIL PROTECTED] (Petrus Nel) wrote in message 
news:000201c18fe2$f73aeee0$ed9e22c4@oemcomputer...

 I require some advice regarding the following: One set of variables is 
 the grades obtained by students for different high school subjects (i.e. 
 the symbols candidates obtained such as A, B, C, D, etc. for each 
 subject). The other set of variables are the scores obtained for a 
 college level subject (i.e. no symbols, just their percentages 
 ... 
 The grades obtained for their high school subjects were coded on the 
 questionnaire as follows - 1=A, 2=B, 3=C, 4=D, 5=E, 6=F.  
 ...
 How do I proceed?

Simpler answer:

First, change the coding to 1=F, 2=E, 3=D, 4=C, 5=B, 6=A.  In the US,
at least, there is no 'E' grade; if that is true of your data, the
correct coding would be 1=F, 2=D, 3=C, 4=B, 5=A.

If the former coding is used where no 'E' occurs, the unused category
leaves the scale unequally spaced, so calculate the Spearman rank
correlation between the grade in a given high school course and the
college score.

If the latter coding is used, you can use either the Pearson
correlation or the Spearman rank correlation; the Pearson correlation
would probably be better.
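
If your software lacks a recoding shortcut, both steps take only a few
lines.  A sketch in Python with scipy, added for illustration (the
data here are made up):

    from scipy.stats import pearsonr, spearmanr

    grades_original = [1, 2, 2, 3, 4, 6, 5, 3]         # 1=A ... 6=F
    college_scores  = [88, 81, 79, 70, 66, 45, 58, 72]

    # Reverse the coding so that a larger number = a better grade.
    grades_recoded = [7 - g for g in grades_original]  # now 1=F ... 6=A

    print(pearsonr(grades_recoded, college_scores)[0])   # Pearson r
    print(spearmanr(grades_recoded, college_scores)[0])  # Spearman rho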

More complex answer:

The approach above ignores the fact that within each letter grade
there is variation--e.g., not all students who get a 'B' are at the
same level.  Further, there is censoring at the upper and lower ends
of the scale--e.g., no matter how well a person does, the highest
grade they can get is an 'A'.

The polyserial correlation can account for this.  The polyserial
correlation estimates what the correlation of grade and score would be
if grades were measured on a continuous scale.  An assumption is that
there is a bivariate normal distribution between (1) the continuous
latent variable of which grade is a manifest representation and (2)
the percentage score.
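
For readers who want to see the mechanics, here is a sketch of the
common two-step ("ad hoc") polyserial estimator--thresholds from the
ordinal margins, then the Pearson r rescaled--in Python with
numpy/scipy.  It is an illustration under the bivariate-normal
assumption just described, not full maximum likelihood:

    import numpy as np
    from scipy.stats import norm, pearsonr

    def polyserial(x, y):
        """Two-step polyserial correlation.

        x: continuous scores; y: ordinal codes 1, 2, ..., K.
        """
        x, y = np.asarray(x, float), np.asarray(y)
        cats = np.sort(np.unique(y))
        # Cumulative proportions below each threshold -> normal deviates.
        cum = np.cumsum([np.mean(y == c) for c in cats])[:-1]
        tau = norm.ppf(cum)
        r_xy = pearsonr(x, y)[0]
        # rho = r_xy * SD(y) / sum of normal ordinates at the thresholds.
        return r_xy * np.std(y) / norm.pdf(tau).sum()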

The polyserial correlation is related to the polychoric correlation. 
For information about the polychoric correlation, see:

http://ourworld.compuserve.com/homepages/jsuebersax/tetra.htm

Drasgow F. Polychoric and polyserial correlations. In Kotz S, Johnson
NL (Eds.), Encyclopedia of statistical sciences. Vol. 7 (pp. 69-74).
New York: Wiley, 1988.

I don't know if SPSS will calculate the polyserial correlation--the
last I heard, it did not.  If not, the polyserial correlation can be
calculated with the program PRELIS, which is distributed with LISREL.
Many universities have copies of LISREL/PRELIS.

If you are interested in comparing which high school classes best
predict college scores, then, as a practical matter, I would expect
you to draw the same conclusions regardless of whether you used the
Pearson, the Spearman, or the polyserial correlation coefficients.

Good luck!


John Uebersax, PhD (805) 384-7688 
Thousand Oaks, California  (805) 383-1726 (fax)
email: [EMAIL PROTECTED]

Agreement Stats:   http://ourworld.compuserve.com/homepages/jsuebersax/agree.htm
Latent Structure:  http://ourworld.compuserve.com/homepages/jsuebersax
Existential Psych: http://members.aol.com/spiritualpsych
Diet & Fitness:    http://members.aol.com/WeightControl101






Re: Most Frequently Used Clustering Algorithm

2001-11-16 Thread John Uebersax

Chia C Chong [EMAIL PROTECTED] wrote in message 
news:9t1qd9$k6m$[EMAIL PROTECTED]...

 I wonder which clustering algorithm is the most
 frequently used and maybe the most robust??
 
 I intend to use some kind of clustering to identify two random variables in
 observations I have got.

Which is your goal:  to find groups of similar objects (object cluster
analysis), or to find groups of similar variables (variable cluster
analysis)?

John

John Uebersax, PhD (805) 384-7688 
Thousand Oaks, California  (805) 383-1726 (fax)
email: [EMAIL PROTECTED]

Agreement Stats:   http://ourworld.compuserve.com/homepages/jsuebersax/agree.htm
Latent Structure:  http://ourworld.compuserve.com/homepages/jsuebersax
Existential Psych: http://members.aol.com/spiritualpsych






Re: Good Book on Clustering Algorithm??

2001-11-13 Thread John Uebersax

Chia C Chong [EMAIL PROTECTED] wrote in message 
news:9sk4p9$1e9$[EMAIL PROTECTED]...

 Any recommendation for books on Clustering Algorithm??

Two suggestions:

 Anderberg, M.R. (1973), Cluster Analysis for Applications, New York:
 Academic Press, Inc.

 Hartigan, J.A. (1975), Clustering Algorithms, New York:  John Wiley &
 Sons, Inc.


John Uebersax, PhD (805) 384-7688 
Thousand Oaks, California  (805) 383-1726 (fax)
email: [EMAIL PROTECTED]

Agreement Stats:   http://ourworld.compuserve.com/homepages/jsuebersax/agree.htm
Latent Structure:  http://ourworld.compuserve.com/homepages/jsuebersax
Existential Psych: http://members.aol.com/spiritualpsych






Re: PCA source code

2001-10-09 Thread John Uebersax

Per Kallblad [EMAIL PROTECTED] wrote in message 
news:[EMAIL PROTECTED]...
 Hi,
 
 I am looking for high-quality source code (f77, f90 or C) to perform
 Principal Component Analysis (PCA). I would be most grateful for
 information on where to find such code.

You can find PCA code in f77 and C at Fionn Murtagh's Multivariate
Data Analysis Software and Resources Page:

   http://astro.u-strasbg.fr/~fmurtagh/mda-sw/
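
If a short self-contained routine would do, PCA is only a few lines on
top of the SVD.  The following Python/numpy sketch is added for
illustration (it is not the Fortran/C code at the page above, but it
is the same computation):

    import numpy as np

    def pca(X, n_components):
        """PCA of data matrix X (n cases x p variables) via the SVD."""
        Xc = X - X.mean(axis=0)                 # center each column
        U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
        scores = U[:, :n_components] * s[:n_components]
        loadings = Vt[:n_components].T          # component directions
        variances = s ** 2 / (len(X) - 1)       # variance per component
        return scores, loadings, variances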

Hope this helps.

John Uebersax

John Uebersax, PhD (805) 384-7688 
Thousand Oaks, California  (805) 383-1726 (fax)
email: [EMAIL PROTECTED]

Agreement Stats:   http://ourworld.compuserve.com/homepages/jsuebersax/agree.htm
Latent Structure:  http://ourworld.compuserve.com/homepages/jsuebersax
Existential Psych: http://members.aol.com/spiritualpsych






Definitions of Likert scale, Likert item, etc.

2001-09-06 Thread John Uebersax

A recent question made me realize the extent of ambiguity in the use
of "Likert scale" and related terms.  I'd like to see things made
clearer.  Here are my thoughts (I don't claim they are correct;
they're just a starting point for discussion).  Concise responses are
encouraged.  If there are enough, I'll post a summary.

1.  "Likert scaling" strictly refers to the scaling method developed
by Likert in the 1930s.  It refers to the entire process of scaling a
set of many items (i.e., as an alternative to Thurstone scaling).  One
step of this is administering many items to individuals.  Each item
has integer-labeled rating levels.

Likert used the method only for attitude measurement, and with 
response categories indicating levels of agreement to specific
statements, like:

"I believe the work week should be reduced to 32 hours."

1.  strongly disagree
2.  mildly disagree
3.  neither agree nor disagree
4.  mildly agree
5.  strongly agree

2.  A "Likert scale," strictly speaking, refers to a set of many such
items.

3.  I do not know if Likert also used a visual analog format such as:
  
                        neither
  strongly    mildly   agree nor    mildly    strongly
  disagree   disagree  disagree     agree      agree

      1          2         3          4          5
      +----------+---------+----------+----------+

4.  It seems reasonable to refer to a single such item as a "Likert
item."  However, many people seem to refer to a single item of this
type as a "Likert scale"; that would seem to invite confusion, as
Likert's original intent was to produce a scale composed of many such
items.

5.  Many researchers use such items outside the area of attitude
measurement; it seems reasonable to refer to such items as
"Likert-type items," to distinguish them from strict Likert items as
described above.

If anyone has any definitive references that clarify this, I would
greatly appreciate learning of them.


John Uebersax, PhD (805) 384-7688 
Thousand Oaks, California  (805) 383-1726 (fax)
email: [EMAIL PROTECTED]

Agreement Stats:   http://ourworld.compuserve.com/homepages/jsuebersax/agree.htm
Latent Structure:  http://ourworld.compuserve.com/homepages/jsuebersax
Existential Psych: http://members.aol.com/spiritualpsych






Re: Factor analysis - which package is best for Windows?

2001-08-31 Thread John Uebersax

Thanks for the tip on KyPlot.  It does seem very nice.  

Two questions:

1.  As best I can tell, the Factor Analysis routines work from
a correlation or covariance matrix.  At least from a perusal
of the Help index, I can't see how to run Factor Analysis on
raw data, or how to calculate a correlation/covariance matrix from
raw data (short of applying matrix manipulations).  Is there
a way to produce a corr/cov matrix within KyPlot?

2.  Does anyone know the current homepage for KyPlot?

Thanks

John Uebersax, PhD (805) 384-7688 
Thousand Oaks, California  (805) 383-1726 (fax)
email: [EMAIL PROTECTED]

Agreement Stats:   http://ourworld.compuserve.com/homepages/jsuebersax/agree.htm
Latent Structure:  http://ourworld.compuserve.com/homepages/jsuebersax
Existential Psych: http://members.aol.com/spiritualpsych


 [EMAIL PROTECTED] (Richard Wright) wrote in message 
news:[EMAIL PROTECTED]...
 KyPlot runs under Windows, is freeware and gives you several factor
 analysis algorithms to choose from.
 
 http://www.rocketdownload.com/Details/Math/kyplot.htm





Re: MDS, the radex, and indices of multidimensionality agreement

2001-08-27 Thread John Uebersax

[EMAIL PROTECTED] (Niko Tiliopoulos) wrote in message 
news:[EMAIL PROTECTED]...

 Q1. I have run a multidimensional scaling analysis (MDS) and the
 2D-map suggests that the variables are arranged in a circular-like
 fashion. I have found a paper that presents a 2D-map showing a similar
 arrangement. 

Louis Guttman did work on circular MDS structures in the '70s.  If the
paper you refer to is not one of his, you might look at some of
Guttman's work.

 Q2. I have also run a factor analysis on the same dataset, and I would
 like to compare the level of agreement between the  FA factors and the
 MDS dimensions.

There is a mathematical identity between Euclidean metric MDS and
principal components analysis of Pearson correlations.  The solutions
are the same, I believe, except for a scaling of individual
dimensions/components and perhaps rotation.  This is possibly
described in Torgerson WS (1958), Theory and Methods of Scaling.

More generally, you could perform a canonical correlation analysis
between the two solutions, and measure agreement with the R^2.

Another possibility is to calculate the Pearson correlation between
the pairwise distances among all points in the MDS solution and the
corresponding pairwise distances in the factor analysis solution.  I
believe the usual significance test for such a correlation is not
valid (because the distance pairs are not independent), but the r^2
is still a measure of the proportion of variance in one structure
explained by the other.
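
The last suggestion takes only a few lines.  A sketch in Python with
scipy, added for illustration (coords_mds and coords_fa stand for
hypothetical coordinate matrices, one row per object):

    import numpy as np
    from scipy.spatial.distance import pdist
    from scipy.stats import pearsonr

    def configuration_agreement(coords_mds, coords_fa):
        """r^2 between corresponding pairwise interpoint distances."""
        d1 = pdist(np.asarray(coords_mds, float))  # condensed distances
        d2 = pdist(np.asarray(coords_fa, float))
        r = pearsonr(d1, d2)[0]
        # The usual p-value is not valid here (distances are not
        # independent); use r^2 descriptively.
        return r ** 2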

Hope this helps.

John

John Uebersax, PhD (805) 384-7688 
Thousand Oaks, California  (805) 383-1726 (fax)
email: [EMAIL PROTECTED]

Agreement Stats:   http://ourworld.compuserve.com/homepages/jsuebersax/agree.htm
Latent Structure:  http://ourworld.compuserve.com/homepages/jsuebersax
Existential Psych: http://members.aol.com/spiritualpsych






Re: Venn diagram program?

2001-08-20 Thread John Uebersax

No, I had more in mind:

1.  "The Argument Room"

and perhaps:

2.  "Well, I didn't expect the Spanish Inquisition!"

It's like asking, "Excuse me, can you tell me how to get to First and
Main Street?" and getting 5 replies like, "Oh come now, why would
anybody want to go to First and Main Street?"

[EMAIL PROTECTED] (Robert J. MacG. Dawson) wrote in message 
news:[EMAIL PROTECTED]...
  Thanks Alan for the constructive reply.  The others so far remind me
  of a Monty Python routine.
 
   Let me guess - the one in which the film producer fires everybody who
 comments on his idea?





Re: Venn diagram program?

2001-08-17 Thread John Uebersax

Thanks Alan for the constructive reply.  The others so far remind me
of a Monty Python routine.

Yes, I am using PowerPoint now.  It's harder than it sounds, because
one must calculate the radii that give appropriately scaled circle
areas; and one can only guess how close to move the circles to give
the correct overlap area.
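
Both calculations can be automated.  A sketch in Python with scipy,
added for illustration: the radius giving a target circle area, the
lens (overlap) area of two circles whose centers are distance d apart,
and a root-finder that inverts the overlap relation.  The target
overlap must lie between 0 and the area of the smaller circle:

    import numpy as np
    from scipy.optimize import brentq

    def radius(area):
        """Radius of a circle with the given area."""
        return np.sqrt(area / np.pi)

    def lens_area(d, r1, r2):
        """Intersection area of two circles with center distance d."""
        if d >= r1 + r2:                  # disjoint
            return 0.0
        if d <= abs(r1 - r2):             # one inside the other
            return np.pi * min(r1, r2) ** 2
        a1 = r1**2 * np.arccos((d**2 + r1**2 - r2**2) / (2 * d * r1))
        a2 = r2**2 * np.arccos((d**2 + r2**2 - r1**2) / (2 * d * r2))
        tri = 0.5 * np.sqrt((-d + r1 + r2) * (d + r1 - r2)
                            * (d - r1 + r2) * (d + r1 + r2))
        return a1 + a2 - tri

    def distance_for_overlap(r1, r2, target):
        """Center distance d such that lens_area(d, r1, r2) = target."""
        return brentq(lambda d: lens_area(d, r1, r2) - target,
                      abs(r1 - r2) + 1e-9, r1 + r2 - 1e-9)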

John

[EMAIL PROTECTED] (Alan McLean) wrote in message 
news:[EMAIL PROTECTED]...
 You can draw Venn diagrams very easily in Powerpoint using the
 ellipse/circle and box/rectangle tools. Draw the diagram, group all the
 bits together, and copy it into Word or whatever.


John Uebersax, PhD (805) 384-7688 
Thousand Oaks, California  (805) 383-1726 (fax)
email: [EMAIL PROTECTED]

Existential Psych: http://members.aol.com/spiritualpsych
Agreement Stats:   http://ourworld.compuserve.com/homepages/jsuebersax/agree.htm
Latent Structure:  http://ourworld.compuserve.com/homepages/jsuebersax






Re: likert scale items - why not PCA?

2001-07-26 Thread John Uebersax

The common factor model is compatible with the idea that you have
unobserved constructs that you wish to estimate using item responses. 
The constructs are presumed measured with error.  A common factor
model takes this error into account, whereas PCA does not.

When we're talking about multiple psychological traits, these are
often correlated--so one often wishes to relax the requirement of
orthogonality.

John Uebersax

Magenta [EMAIL PROTECTED] wrote in message 
news:LIN77.634$[EMAIL PROTECTED]...

 Why a factor analysis and not a principal components analysis?  I've
 been taught that a principal components analysis makes fewer
 assumptions on the data, so assuming that one can perform a factor
 analysis then automatically one can also perform a principal
 components analysis.
 
 I think I have a preference for orthogonal rotations.





Re: likert scale items

2001-07-25 Thread John Uebersax

If your items are visually anchored so as to imply equal spacing,
like:

    +--------+--------+--------+--------+
    0        1        2        3        4
    least                            most
    possible                     possible

then one might accept the data as interval-level, on the assumption
that respondents interpret them as such.

Also keep in mind that after you sum responses over several items,
minor deviations of the response categories from equal spacing may
matter less.

In my substance abuse and personality research with teens, I have done
a lot of factor analysis on ordered-category response items.  One way
to avoid the assumption of equally-spaced categories (though
introducing an assumption of normally distributed traits) is to
perform factor analysis of polychoric correlation coefficients.

For more information on polychoric correlations and their factor
analysis, see:

http://ourworld.compuserve.com/homepages/jsuebersax/tetra.htm
http://ourworld.compuserve.com/homepages/jsuebersax/irt.htm

With my data, factor analysis produced mostly the same results
regardless of whether polychoric correlations or regular Pearson
correlations were used.

If you are concerned about creating scales by summing ordered-category
responses, there is the alternative of latent trait modeling.  See:

http://ourworld.compuserve.com/homepages/jsuebersax/lta.htm

and some of the links there.  Again, one often finds it makes little
or no practical difference.  Scale scores produced by simply adding
item responses and scores produced by more complex latent trait models
may correlate .99 or better with each other.

BTW, the original study you describe sounds so much like one I did
the analysis for that I wonder if they are the same.  You aren't by
any chance referring to a study done in Winston-Salem, North Carolina,
are you?

John Uebersax

Teen Assessment Project [EMAIL PROTECTED] wrote in message 
news:[EMAIL PROTECTED]...
 I am using a measure with likert scale items.  Original psychometrics
 for the measure
 included factor analysis to reduce the 100 variables to 20 composites.
 However, since the variables are not interval, shouldn't non-parametric
 tests be done to determine group differences (by gender, age, income) on
 the variables?  Can I still use the composites...was it appropriate to
 do the original factor analysis on ordinal data?





Re: Alscal vs. NCSS

2001-07-16 Thread John Uebersax

As the other reply suggested, perhaps there is a problem with local
maxima.

Or maybe, since these are different programs, the commands in one case
were incorrect.  Why not run a metric MDS for comparison purposes? 
That might help you decide whether the Alscal or NCSS results are
suspect.

John Uebersax
[EMAIL PROTECTED]

[EMAIL PROTECTED] (Niko Tiliopoulos) wrote in message 
news:[EMAIL PROTECTED]...
 Dear all,
 
 I have two questions regarding MDS:
 
 1. I have run an NMDS through Alscal (SPSS) and NCSS, and the
 representations of the variables on a 2-dimensional map look
 completely different. As far as I can tell, I am using the same
 procedure in both algorithms, so I cannot understand why I get
 different results, and which one I should prefer as more accurate.
 
 2. Does anyone know which of the following two stress indices should
 be used with data from psychometric instruments (e.g. personality
 questionnaire):
 
 Kruskal's or Guttman-Lingoes?
 
 Thank you
 
 Niko Tiliopoulos





Re: How calculate 95%=1.96 stdv

2001-07-06 Thread John Uebersax

Jon Cryer [EMAIL PROTECTED] astutely noted an error in the formula
(below) that I gave for the standard normal cumulative distribution
function.  The integral, of course, should go from -infinity to z, not
from -infinity to +infinity (the latter integral will always equal 1).

I apologize for the error and thank Jon for pointing it out.

John Uebersax

John Uebersax wrote:
 
                    +infinity     [-- should be z, not +infinity]
   p = PHI(z) = INTEGRAL  phi(z)
                    -infinity
 
 where:
    z      =  the standard normal deviate
    PHI(z) =  the probability (p) of observing a score at or below z
    phi(z) =  the formula for the standard normal curve:
 
                 1/sqrt(2*pi) * exp(-z^2/2)
 
 Note that PHI() and phi() (the upper- and lower-case Greek letters,
 respectively) are different:  PHI() is the integral of phi().





Re: How calculate 95%=1.96 stdv

2001-07-05 Thread John Uebersax

Hi Stefan,

s.petersson [EMAIL PROTECTED] wrote in message 
news:XBE07.7641$[EMAIL PROTECTED]...

 Let's say I want to calculate this constant with a security level of
 93.4563, how do I do that? Basically I want to unfold a function like
 this:
 
 f(95)=1.96
 
 Where I can replace 95 with any number ranging from 0-100.

To Eric's reply I'd just add that use of a table is unnecessary. 
Especially in a computer program, it is easier to use a numerical
function to calculate the confidence interval.

The tables you've seen are for the cumulative probabilities of the
standard normal curve--otherwise known as the standard normal
cumulative distribution function (cdf).  The standard normal cdf is
the function:

                      +infinity
    p = PHI(z) = INTEGRAL  phi(z)
                      -infinity

where:
   z      =  the standard normal deviate
   PHI(z) =  the probability (p) of observing a score at or below z
   phi(z) =  the formula for the standard normal curve:

                1/sqrt(2*pi) * exp(-z^2/2)

Note that PHI() and phi() (the upper- and lower-case Greek letters,
respectively) are different:  PHI() is the integral of phi().

With the function above, one supplies a value for z, and is given a
cumulative probability.

You seek the inverse function for PHI(), sometimes called the "probit"
function.  With the probit function, one supplies a value for p and
is returned the value of z such that the area under the standard
normal curve from -inf to z equals p.  (As Eric noted, you may need to
adjust p to handle issues of 1- vs 2-tailed intervals.)

Both the PHI() and probit() functions are well approximated in simple
applications (such as calculating confidence intervals) by simple
polynomial formulas of a few terms.  Some of these take as few as 2 or
3 lines of code.  A good reference for such approximations is:

Abramowitz, M., and I. A. Stegun, 1972: Handbook of Mathematical
Functions. Dover.
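
For example, here is that handbook's low-order rational approximation
26.2.23 for the probit (absolute error < 4.5e-4), with PHI() itself
obtained from the error function.  A sketch in Python, added for
illustration:

    import math

    def Phi(z):
        """Standard normal cdf via the error function."""
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

    def probit(p):
        """Inverse of Phi for 0 < p < 1 (Abramowitz & Stegun 26.2.23)."""
        if p > 0.5:
            return -probit(1.0 - p)
        t = math.sqrt(-2.0 * math.log(p))
        num = 2.515517 + t * (0.802853 + t * 0.010328)
        den = 1.0 + t * (1.432788 + t * (0.189269 + t * 0.001308))
        return -(t - num / den)

For instance, probit(0.975) returns about 1.960--the familiar constant
for a two-sided 95% interval.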

Hope this helps.

John Uebersax





Re: factor analysis of dichotomous variables

2001-05-01 Thread John Uebersax

A list of such programs and discussion can be found at:

http://ourworld.compuserve.com/homepages/jsuebersax/binary.htm

The results of Knol & Berger (1991) and Parry & McArdle (1991)
(see above web page for citations) suggest that there is not much
difference in results between the Muthen method and the simpler
method of factoring tetrachoric correlations.  For additional
information (including examples using PRELIS/LISREL and SAS) on
factoring tetrachorics, see

http://ourworld.compuserve.com/homepages/jsuebersax/irt.htm 
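
For the curious, here is what such a program does for a single 2x2
table: thresholds from the margins, then solve for the rho whose
bivariate-normal quadrant probability matches the observed cell.  A
sketch in Python with scipy, for illustration only; it assumes the
observed proportions are compatible with the model:

    import numpy as np
    from scipy.optimize import brentq
    from scipy.stats import norm, multivariate_normal

    def tetrachoric(table):
        """table = [[n00, n01], [n10, n11]] for two items scored 0/1."""
        t = np.asarray(table, float)
        n = t.sum()
        h = norm.ppf(t[0].sum() / n)     # threshold: P(item 1 = 0)
        k = norm.ppf(t[:, 0].sum() / n)  # threshold: P(item 2 = 0)
        p00 = t[0, 0] / n                # observed "both 0" proportion

        def gap(rho):
            bvn = multivariate_normal([0.0, 0.0],
                                      [[1.0, rho], [rho, 1.0]])
            return bvn.cdf([h, k]) - p00

        return brentq(gap, -0.999, 0.999)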

Hope this helps.

John Uebersax





Re: IRT/Rasch Modeling with SAS?

2001-03-13 Thread John Uebersax

Hi Lee,
 
If you go to my web page for Latent Trait and Item Response Theory (IRT)
Models,
 
http://ourworld.compuserve.com/homepages/jsuebersax/lta.htm
 
(please let me know if this link doesn't work)
 
that will point to several other pages that might help.  
 
 Then the IRT curve that I am looking for (something they call a 3-parameter
 logistic, which I think is not a 100% correct name) is described by the
 following function (best viewed in a fixed-width font):
 
A well-kept secret is that it is just as easy to estimate a probit
(cumulative Gaussian) latent trait model.  The probit model is
theoretically more appropriate in many applications.
 
Of course, you will need to decide, if you haven't already, whether to
pursue a 1-, 2-, or 3-parameter model.
 
 find a reference that tells me exactly the recipe for finding it, but the
 best I can tell is that the algorithm would start with an initial guess for
 T, fit the curve parameters a, b, and c, then use this curve to re-estimate
 T. The process repeats until some convergence criterion is reached.
 
That's one approach.  Another is "brute force" optimization, where one
uses a general-purpose optimization routine to (simultaneously) find
the set of parameter values that maximizes a given criterion--usually
the log-likelihood.
 
Here's a good book that covers the material without making things more
complicated than necessary:
 
Hulin, C. L., F. Drasgow, C. K. Parsons, Item Response Theory,
Homewood, Illinois, Dow Jones-Irwin, 1983.
 
I'd also recommend looking at some of Bock's work, such as:
 
Bock, R. D., and Aitkin, M. (1981). "Marginal Maximum Likelihood
Estimation of Item Parameters:  Application of an EM Algorithm,"
Psychometrika, 46, 443-459.
 
Of course, the "bibles" are still:
 
Lazarsfeld, P. F., and Henry, N. W. (1968), Latent Structure
Analysis, Boston:  Houghton Mifflin.
 
Lord FM, Novick MR. (1968).  Statistical theories of mental test
scores.  Reading, Massachusetts:  Addison-Wesley.
 
 Does anyone know if SAS will do this?
 
One of my pages describes how to estimate a 2-parameter latent trait
model by factor-analyzing a matrix of tetrachoric correlations.  SAS
(via a macro available on the SAS site) can produce a matrix of
tetrachoric correlations.  And the matrix can be supplied to and
factored by PROC FACTOR.
 
This works pretty well for estimating the item parameters (slopes and
thresholds).  However, if you also want to score respondents (i.e.,
estimate their latent trait levels), that takes a little more work (a
separate page on my site talks about this).
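
For reference, the conversion from a one-factor solution on
tetrachorics to normal-ogive item parameters uses the standard
identities a = lambda / sqrt(1 - lambda^2) and b = tau / lambda, with
tau the probit of the proportion answering incorrectly.  A minimal
sketch (Python with numpy/scipy; the inputs are hypothetical):

    import numpy as np
    from scipy.stats import norm

    def item_parameters(loadings, p_correct):
        """Normal-ogive a (discrimination) and b (difficulty)."""
        lam = np.asarray(loadings, float)
        tau = norm.ppf(1.0 - np.asarray(p_correct, float))
        a = lam / np.sqrt(1.0 - lam ** 2)
        b = tau / lam
        return a, b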
 
A 1-parameter Rasch model can be formulated as a loglinear model.
Therefore it might be possible to use say, PROC CATMOD or something like
that to estimate a Rasch model.
 
 I have found a piece of software
 that claims to fit "Rasch models", but the classical Rasch model is a
 one-parameter version of what I'm looking for (set b and c to zero, and
 you have a Rasch model).
 
Correct.  I prefer 2-parameter models, unless there is some theoretical
reason to expect a 1-parameter model (i.e., that all items have the same
correlation with the latent trait).
 
I maintain that the choice of logistic IRT vs probit IRT vs Rasch model
should be made based on the theoretical assumptions of each model and
the assumptions about your data.  For example, Rasch has a very nice
theory about how people answer test items that justifies use of
Rasch modeling.  (I don't necessarily agree with the model, but
it is interesting).  On the other hand, if you have a familiar:
 
manifest trait = latent trait + error
 
model, where error is (a) normally distributed, and (b) homoscedastic (
error variance not correlated with latent trait level), and where
one assumes discretizing thresholds that convert latent continuous
responses to observed binary responses, then a probit latent trait
model is appropriate.
 
 Plus, the software costs about $1000, and I don't have that to spare.
 The software (one called "BIGSTEPS" is the only one I can find that will
 deal with the 89,000 students I have to deal with) is not exactly
 "Microsoft Bob" in its ease of use.
 
Check my web site.  One page talks about software for estimating IRT and
Rasch models.  Personally, for Rasch models, I use MIRA or WINMIRA; for
IRT models I use my own programs for "discrete latent trait" modeling:
 
Heinen T. Latent class and discrete latent trait models:
Similarities and differences. Thousand Oaks, California: Sage, 1996.
 
I also have a FAQ on the Rasch model on the site, including information
specifically on Rasch software.
 
Hope this helps.
 
John Uebersax
[EMAIL PROTECTED]
http://ourworld.compuserve.com/homepages/jsuebersax
 
P.S.  The limiting factor on IRT software is usually the number of
items, rather than the number of subjects.



Re: goodness of fit for mixture of multinomials

2001-01-17 Thread John Uebersax

Gimenez Olivier [EMAIL PROTECTED] wrote:
 
 ... we have three samples arising from three multinomials
 with the same number of cells. This can be represented as a table:

 n11 n12 ... n1k  (1)
 n21 n22 ... n2k  (2)
 n31 n32 ... n3k  (3)

 We would like to know whether the last sample (3) can be
 considered a mixture of (1) and (2).

 Some help would be appreciated, especially references.
 
If you know the mixing proportions with which (1) and (2) combine, a
simple approach would be:
 
1.  Convert (1) and (2) to expected probability distributions:
 
   p11 p12 ... p1k(4)
   p21 p22 ... p2k(5)
 
by dividing each nij by the appropriate row total.
 
2.  From the results, calculate a table of expected proportions
for the mixture,
 
   q1  q2  ...  qk
 
where
 
   q1 = r(p11) + (1 - r)(p21)
   q2 = r(p12) + (1 - r)(p22)
   ...
   qk = r(p1k) + (1 - r)(p2k)
 
and r, (1 - r) are the mixing proportions, with 0 < r < 1.
 
3.  Let N3 be the number of observations in (3) above.
Calculate expected frequencies e1, e2, ... ek as
 
   e1 = N3 q1
   e2 = N3 q2
   ...
   ek = N3 qk
 
4.  Compare the observed frequency distribution:
 
   n31  n32  ...  n3k
 
with the expected frequency distribution:
 
   e1   e2   ...  ek
 
using the likelihood ratio (LR) chi-squared test.  For large
samples, the statistic is distributed as approximately
chi-squared with k-1 df.  A nonsignificant result is consistent
with the hypothesis that (3) is a mixture of (1) and (2).
 
You can also use the Pearson chi-squared test to compare
the distributions.  It would also have k-1 df.
 
If you don't know the mixing proportion a priori, you would need to
estimate it.  The usual criterion is maximum likelihood--i.e., the value
of r that maximizes the likelihood of observing n31, n32, ..., n3k given
q1, q2, ..., qk.  However, the maximum likelihood value of r is the same
as the value that gives the lowest LR chi-squared; so you could just use
trial-and-error to test different values of r until you find the best
value.
 
If you estimate r, the df for the LR chi-squared test are k - 2.
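
Steps 1-4, plus the trial-and-error search for r, take only a few
lines.  A sketch in Python/numpy, added for illustration (a crude grid
search stands in for a proper optimizer):

    import numpy as np

    def lr_chisq(observed, expected):
        """G2 = 2 * sum obs * ln(obs / exp), with 0 * ln(0) taken as 0."""
        obs, exp = np.asarray(observed, float), np.asarray(expected, float)
        mask = obs > 0
        return 2.0 * np.sum(obs[mask] * np.log(obs[mask] / exp[mask]))

    def fit_mixture(n1, n2, n3):
        """Best mixing proportion r and its LR chi-squared (k - 2 df)."""
        p1 = np.asarray(n1, float) / np.sum(n1)   # step 1
        p2 = np.asarray(n2, float) / np.sum(n2)
        N3 = np.sum(n3)
        g2, r = min((lr_chisq(n3, N3 * (r * p1 + (1 - r) * p2)), r)
                    for r in np.linspace(0.001, 0.999, 999))
        return g2, r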
 
For the formulas to calculate the LR and Pearson chi-squared statistics,
you could check:
 
 Bishop YMM, Fienberg SE, Holland PW.  Discrete multivariate
 analysis: theory and practice.  Cambridge, Massachusetts:  MIT
 Press, 1975
 
or any text on loglinear modeling, or one of Alan Agresti's books on
categorical data analysis.
 
--
John Uebersax
http://ourworld.compuserve.com/homepages/jsuebersax
[EMAIL PROTECTED]
 
 
 





Re: OT: psychological test for recruitment in Statistics

2000-12-26 Thread John Uebersax

I've never heard of any statistician position requiring a psychological
test.  Even when I worked at the RAND Corporation, where the position
involved some degree of defense-related research, it was not required.
(Frankly, if a firm required such a test, I would take that as a sign
that it is not a place to consider working for.)

I would think that such tests present more problems than they solve.
For example, suppose a test suggests a person has a bipolar mental
disorder.  Would that be grounds not to consider them?  If so, might
the person have legal recourse, since that psychiatric diagnosis might
legitimately be considered a medical disability?

IMHO, psychological tests in this case should not substitute for a
thorough interview and human judgment.

Just my $.02 worth.

--
John Uebersax


In article 9211so$9kt$[EMAIL PROTECTED],
  T.S. Lim [EMAIL PROTECTED] wrote:
 My apology for posting an off-topic message.

 I was wondering if it's a common practice in Statistics to require job
 applicants to take a psychological test. At the MS/PhD level (in the
 US), I don't think it's common. However, some companies ask job
 applicants to take a test like the GRE Quantitative one.

 By a psychological test, I mean a test that attempts to probe
 applicants' "personality". It actually consists of several tests that
 may include drawing tests.

 Any idea which field uses such tests? Thanks in advance for any
pointer.

 --
 T.S. Lim
 [EMAIL PROTECTED]
 www.Recursive-Partitioning.com
 _
 Get paid to write reviews! http://recursive-partitioning.epinions.com









Re: EdStat: Factoring tetrachoric matrix in SAS

2000-12-11 Thread John Uebersax

I think all the comments supplied by other posters are relevant.
Of course you should check to make sure that SAS is reading the input
matrix correctly, as was pointed out.  However, even assuming that you
did everything correctly I'm not surprised that SAS has a problem
factoring the matrix.  A correlation matrix composed of tetrachorics
may not be factorable--especially if there is a large number of items.
That can be remedied by "conditioning" the matrix.  For a discussion,
see the paper by Knol and Berger (the Parry & McArdle paper might also
talk about this):

Knol DL, Berger MP. Empirical comparison between factor analysis and
multidimensional item response models. Multivariate Behavioral
Research, 1991, 26, 457-477.

Parry CD, McArdle JJ. An applied comparison of methods for least-
squares factor analysis of dichotomous variables. Applied Psychological
Measurement, 1991, 15, 35-46.

Note that conditioning the matrix in this way is a completely "ad hoc"
procedure.

Hope this helps.

--
John Uebersax
[EMAIL PROTECTED]









Re: Statistical Methods in Psychology Journals

2000-05-31 Thread John Uebersax

Yes, but garbage in, garbage out.  :)

--
John Uebersax
[EMAIL PROTECTED]





Re: Multidimensional Models IRT

2000-04-01 Thread John Uebersax

Based on more research, here are some updates and corrections to
my reply of yesterday -- John Uebersax
 
 
MULTIDIMENSIONAL LATENT TRAIT AND ITEM RESPONSE THEORY (IRT)
MODELS
 
As mentioned in yesterday's post, this does not include
information on logistic-ogive and Rasch-type multidimensional
latent trait/IRT models.
 
SOFTWARE
 
--
 
TESTFACT (D. T. Wilson, R. Wood, R. D. Gibbons)
 
Available from:
 
* Assessment Systems Corporation
* Scientific Software International
* ProGAMMA (Netherlands)
 
(see end of this section for distributor contact information)
 
With TESTFACT, the user can choose either factoring of tetrachoric
correlations or full-information maximum-likelihood estimation.
TESTFACT will calculate factor scores, which may be needed in some
applications.
 
The ProGAMMA site lists the latest version (TESTFACT 3), but
possibly the other distributors listed above also have the latest
version.
 
For an online description, check the ProGAMMA website
http://www.gamma.rug.nl , or http://www.assess.com/testfact.html
 
--
 
MicroFACT  (Niels G. Waller)
 
Available from:
 
* Assessment Systems Corporation
* ProGAMMA (Netherlands)
 
MicroFACT appears to work by factoring tetrachoric correlations.
For an online description, check the ProGAMMA website
http://www.gamma.rug.nl , or http://www.assess.com/MicroFACT.html
 
--
 
Mplus (Bengt and Linda Muthen)
 
Available from:
 
* Muthen & Muthen
 
This possibly replaces the earlier program, LISCOMP, which
estimates the dichotomous/polytomous data factor analysis models
described by B. Muthen. (Mplus estimates a wide range of other
latent variable models as well.)
 
--
 
NOHARM (Colin Fraser)
 
NOHARM (Fraser, 198?) can be used to estimate unidimensional and
multidimensional latent trait (IRT) models.  For more information,
one might check with Jack McArdle at [EMAIL PROTECTED] .  He used
to have the program available by ftp.
 
--
 
PRELIS (Karl Joreskog and Dag Sorbom)
 
* Scientific Software International
* Assessment Systems Corporation
* ProGAMMA (Netherlands)
 
Will calculate tetrachoric and polychoric correlations.  These can
be output and factor-analyzed to estimate a unidimensional or
multidimensional latent trait/IRT model.
 
--
Software distributor contact information:
 
Assessment Systems Corporation
2233 University Ave, Suite 200
St. Paul, MN  55114
United States
Tel:   (651) 647-9220
Fax:   (651) 647-0412
Web:   http://www.assess.com
Email: [EMAIL PROTECTED]
 
Muthen & Muthen
11965 Venice Blvd, Suite 407
Los Angeles, CA  90066
United States
Tel:   (310) 391-9971, Toll Free (888) 814-9144
Fax:   (310) 391-8971
Web:   http://www.statmodel.com
Email: [EMAIL PROTECTED]
 
ProGAMMA bv
PO Box 841   (mailing address?)
9700 AV Groningen
Grote Rosensraat 15  (street address?)
9712 TG Groningen
Tel:   +31 50 3636900
Fax:   +31 50 3636687
Web:   http://www.gamma.rug.nl
Email: [EMAIL PROTECTED]
 
Scientific Software International
7383 N Lincoln Ave, Suite 100
Lincolnwood, IL  60712-1704
United States
Tel:   (800) 247-6113 or (847) 675-0720
Fax:   (847) 675-2140
Web:   http://www.ssicentral.com
Email: [EMAIL PROTECTED]
 
==
 
BIBLIOGRAPHY
 
Bartholomew, D. J.  Factor analysis for categorical data (with
discussion).  J Royal Statist Soc, B. 1980, 42, 293-321.
 
Bartholomew, D. J.  Latent variable models for ordered categorical
data.  Journal of Econometrics, 1983, 22, 229-243.
 
Bartholomew, D. J.  Latent variable models and factor analysis.
New York:  Oxford University Press, 1987.
 
Bock, R. D., and Aitkin, M.  Marginal maximum likelihood
estimation of item parameters:  Application of an EM algorithm.
Psychometrika, 1981, 46, 443-459.
 
Bock, R. D., Gibbons, R., and Muraki, E.  Full-information item
factor analysis.  Applied Psychological Measurement, 1988, 12,
261-280.
 
Christoffersson, A.  Factor analysis of dichotomized variables.
Psychometrika, 1975, 40, 5-32.
 
Fraser, C.  (19??).  NOHARM II:  A FORTRAN program for fitting
unidimensional and multidimensional normal ogive models of latent
trait theory.  Center for Behavioral Studies, the University of
New England, Armidale, NSW, Australia.
 
Fraser C, McDonald R. (1988).  [possibly another reference for
NOHARM]
 
Knol DL, Berger MP. (1991). Empirical comparison between factor
analysis and multidimensional item response models.  Multivariate
Behavioral Research, 26, 457-477
 
McDonald, R. P.  Linear versus non-linear models in item response
theory.  Applied Psychological Measurement, 1982, 6, 379-396.
 
McDonald, R. P.  Unidimensional and multidimens

Re: Weighted Kappa

2000-03-08 Thread John Uebersax

Your post makes it unclear whether kappa is the right statistic.
 
Usually one uses kappa when each rater/clinician rates a sample of
patients or cases.  But you merely describe a questionnaire (sp?) that
each clinician completes.  Assuming each clinician completes the
questionnaire only one time (as opposed to, say, one time in relation to
each of a sample of patients), then I don't see that kappa is
appropriate.  Instead, one would use simpler statistics--such as
calculating the standard deviation, across clinicians, for each item.
 
You also raise the issue of many possible rater pairs--note, though,
that there are only (16 * 15) / 2 = 120 unique pairs that involve
different raters.  Rather than calculate 120 different kappa
coefficients, a simpler alternative might be to calculate the general
kappa that measures agreement between any two raters--considering all
raters simultaneously.  That is done with Fleiss' kappa (as opposed to
Cohen's kappa, which only applies for pairwise comparisons).  For a
discussion of the difference between these two types of kappa, see
Joseph Fleiss, Statistical Methods for Rates and Proportions, 1981.
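
For reference, Fleiss' kappa is short to compute.  A sketch in
Python/numpy, added for illustration; the input is the N subjects x k
categories matrix of rating counts, with the same number of raters
per subject:

    import numpy as np

    def fleiss_kappa(counts):
        """Fleiss' kappa from an (N subjects x k categories) count matrix."""
        counts = np.asarray(counts, float)
        N = counts.shape[0]
        n = counts[0].sum()                    # raters per subject
        p_j = counts.sum(axis=0) / (N * n)     # overall category props
        P_i = (np.sum(counts ** 2, axis=1) - n) / (n * (n - 1))
        P_bar, Pe_bar = P_i.mean(), np.sum(p_j ** 2)
        return (P_bar - Pe_bar) / (1.0 - Pe_bar)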
 
--
John Uebersax
[EMAIL PROTECTED]





Re: Correlation - Constraints on Variables

2000-01-03 Thread John Uebersax

bkamen [EMAIL PROTECTED] wrote:
 
 This practical question arose between myself and a colleague at work.
 It concerns whether we can use correlation analysis if one of the
 variables is non-continuous or "categorical."
 ...
 I would appreciate clarification of any such constraints on the
 practical use of correlation analysis.
 
I'm assuming that by "discrete" you mean that X is constrained to take
certain discrete values (e.g., 0, 1, 2, 3), but that the values
themselves are either (a) valid interval-level data (e.g., a value of
2 is truly 1 unit more than a value of 1) or (b) ordinal (e.g., a
value of 2 necessarily means a greater level of the trait than a value
of 1).
 
My understanding of how this works is as follows:
 
If X is "discrete" in this way and Y is a usual continuous measure,
then the correlation r(X,Y) will tend to be constrained in magnitude.
For example, it might be difficult or impossible to obtain a
correlation of 1 or -1 in this situation.  In that sense, a test of
r(X, Y) would seem to be conservative in principle, which I believe
your message alluded to.
 
The question appears to be how this situation affects formal
significance testing.  I do not know for sure, but it would not
surprise me if the significance test assumes that both measures are
truly continuous.  If one is "discrete" (in the sense above), I don't
know how that affects the significance test.
 
However, there is an alternative.  You could consider use of the
biserial correlation (if X has only two values) or the polyserial
correlation (if X can have more than two values--as a practical
matter, this might only apply if the number of different X values is
relatively low, say less than 8 or 10; of course, if there are many X
values, then the impact of the "discreteness" may be relatively
little).
 
The biserial/polyserial correlation estimates the correlation you
would have obtained if both X and Y were truly continuous.  The main
assumption is that, fundamentally, the traits associated with X and Y
are normally distributed (and jointly distributed as bivariate
normal).  However, the biserial/polyserial correlation allows that one
of the variables has been "discretized."
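
In the two-value case, the biserial estimate is just a rescaling of
the ordinary (point-biserial) Pearson correlation:
r_bis = r_pb * sqrt(p(1-p)) / phi(PHI^-1(p)), where p is the
proportion of cases in the upper group.  A sketch in Python with
scipy, added for illustration:

    import numpy as np
    from scipy.stats import norm, pearsonr

    def biserial(x_binary, y):
        """Biserial correlation; x_binary coded 0/1, y continuous."""
        x = np.asarray(x_binary, float)
        p = x.mean()                        # proportion coded 1
        r_pb = pearsonr(x, np.asarray(y, float))[0]
        return r_pb * np.sqrt(p * (1.0 - p)) / norm.pdf(norm.ppf(p))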
 
You might want to consider this option.  For more information, you
could check Kendall & Stuart, "The Advanced Theory of Statistics."
 
Hope this helps.
 
John Uebersax
[EMAIL PROTECTED]