Re: REML for Dummies?
The Encyclopedia of Biostatistics (Armitage P, Colton T; Wiley, 1999?) has an article on REML. I have not seen the article, but their articles usually explain statistical concepts well to non-statisticians. The Encyclopedia is a resource you might find helpful in general. For more info, see: http://www.wiley.co.uk/wileychi/eob/ John Uebersax, PhD (858) 597-5571 La Jolla, California (858) 625-0155 (fax) email: [EMAIL PROTECTED] Statistics: http://ourworld.compuserve.com/homepages/jsuebersax/agree.htm Psychology: http://members.aol.com/spiritualpsych Dr Jonathan Newman [EMAIL PROTECTED] wrote: "I'm trying to find a good introduction to REML (restricted maximum likelihood)." = Instructions for joining and leaving this list, remarks about the problem of INAPPROPRIATE MESSAGES, and archives are available at http://jse.stat.ncsu.edu/ =
Re: Factor Analysis
A program like SAS or SPSS will calculate factor scores for you. A factor score is an estimated location of an object (not a variable) relative to a factor. If your factors are orthogonal, then you can plot each case using that case's score on Factor 1 and its score on Factor 2 as the X- and Y-coordinates in a two-dimensional space. I believe the formula for estimating factor scores of a common-factor model is not trivial (unless all communalities are 1). Therefore one might as well let the software calculate factor scores. The topic is well explained in the SAS manual (PROC FACTOR)--perhaps also in the SPSS manual. John Uebersax, PhD (805) 384-7688 Thousand Oaks, California (805) 383-1726 (fax) email: [EMAIL PROTECTED] Agreement Stats: http://ourworld.compuserve.com/homepages/jsuebersax/agree.htm Latent Structure: http://ourworld.compuserve.com/homepages/jsuebersax Existential Psych: http://members.aol.com/spiritualpsych Diet Fitness: http://members.aol.com/WeightControl101 Huxley [EMAIL PROTECTED] wrote in message news:a2u3sa$q3e$[EMAIL PROTECTED]... Hi, I've got a question. Does anyone know how to place objects in a two-factor dimensional space? I heard that the factor score for a product is equal to the product of the corresponding factor loadings and variable means, i.e. f(m,p) = a(1,m)u(1,p) + a(2,m)u(2,p) + ... + a(j,m)u(j,p), where f(m,p) is the factor score for the m-th factor and p-th consumer product, and u(j,p) is the mean for variable j and product p. Could you tell me if this is true? How does one prove this formally?
Re: Measure of Association Question.
[EMAIL PROTECTED] (Petrus Nel) wrote in message news:000201c18fe2$f73aeee0$ed9e22c4@oemcomputer... I require some advice regarding the following: One set of variables is the grades obtained by students for different high school subjects (i.e., the symbols candidates obtained, such as A, B, C, D, etc., for each subject). The other set of variables is the scores obtained for a college-level subject (i.e., no symbols, just their percentages). ... The grades obtained for their high school subjects were coded on the questionnaire as follows - 1=A, 2=B, 3=C, 4=D, 5=E, 6=F. ... How do I proceed? Simpler answer: First, change the coding to 1=F, 2=E, 3=D, 4=C, 5=B, 6=A. In the US at least there is no 'E'; in that case, the correct coding would be 1=F, 2=D, 3=C, 4=B, 5=A. If the latter coding is used, calculate the Spearman rank correlation between the grade in a given high school course and the college score. If the former coding is used, you can use either the Pearson correlation or the Spearman rank correlation; the Pearson correlation would probably be better. More complex answer: The approach above ignores the fact that within each letter grade there is variation--e.g., all students who get a 'B' are not at the same level. Further, there is censoring at the upper and lower ends of the scale--e.g., no matter how well a person does, the highest grade they can get is an 'A'. The polyserial correlation can account for this. The polyserial correlation estimates what the correlation of grade and score would be if grades were measured on a continuous scale. An assumption is that there is a bivariate normal distribution between (1) the continuous latent variable of which grade is a manifest representation and (2) the percentage score. The polyserial correlation is related to the polychoric correlation. For information about the polychoric correlation, see: http://ourworld.compuserve.com/homepages/jsuebersax/tetra.htm Drasgow F. Polychoric and polyserial correlations. 
In Kotz S, Johnson NL (Eds.), Encyclopedia of Statistical Sciences, Vol. 7 (pp. 69-74). New York: Wiley, 1988. I don't know if SPSS will calculate the polyserial correlation--the last I heard it did not. If not, the polyserial correlation can be calculated with the program PRELIS, which is distributed with LISREL. Many universities have copies of LISREL/PRELIS. If you are interested in comparing to see which high school classes best predict college scores, then, as a practical matter, I would expect you would draw the same conclusions regardless of whether you used the Pearson, the Spearman, or the polyserial correlation coefficients. Good luck! John Uebersax, PhD
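The simpler recipe above (reverse the grade coding so that a larger number means a better grade, then correlate with the college percentage) can be sketched in Python. The data below are invented for illustration, and the helper functions are hand-rolled so the sketch needs nothing beyond the standard library; Spearman's coefficient is computed as the Pearson correlation of midranks.

```python
def pearson(x, y):
    # Plain product-moment correlation.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def ranks(x):
    # 1-based ranks, with average (mid-) ranks assigned to ties.
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and x[order[j + 1]] == x[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(x, y):
    # Spearman rank correlation = Pearson correlation of the ranks.
    return pearson(ranks(x), ranks(y))

# Original questionnaire coding was 1=A ... 6=F; reverse it so that a
# higher number means a better grade before correlating.
grades_raw = [1, 2, 2, 3, 4, 1, 5, 3, 2, 6]
grades = [7 - g for g in grades_raw]           # now 6=A ... 1=F
college = [88, 75, 80, 66, 58, 91, 45, 70, 77, 40]

r_s = spearman(grades, college)
r_p = pearson(grades, college)
```

With made-up data this monotone, both coefficients come out strongly positive, which illustrates the practical point made above: for ranking which courses predict college scores, the choice of coefficient rarely changes the conclusion.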
Re: Most Frequently Used Clustering Algorithm
Chia C Chong [EMAIL PROTECTED] wrote in message news:9t1qd9$k6m$[EMAIL PROTECTED]... I wonder which clustering algorithm is the most frequently used, and maybe the most robust? I intend to use some kind of clustering to identify two random variables in the observations I have. Which is your goal: to find groups of similar objects (object cluster analysis), or to find groups of similar variables (variable cluster analysis)? John
Re: Good Book on Clustering Algorithm??
Chia C Chong [EMAIL PROTECTED] wrote in message news:9sk4p9$1e9$[EMAIL PROTECTED]... Any recommendation for books on clustering algorithms? Two suggestions: Anderberg, M.R. (1973), Cluster Analysis for Applications, New York: Academic Press, Inc. Hartigan, J.A. (1975), Clustering Algorithms, New York: John Wiley & Sons, Inc. John Uebersax, PhD
Re: PCA source code
Per Kallblad [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]... Hi, I am looking for high-quality source code (f77, f90 or C) to perform Principal Component Analysis (PCA). I would be most grateful for information on where to find such code. You can find PCA code in f77 and C at Fionn Murtagh's Multivariate Data Analysis Software and Resources Page: http://astro.u-strasbg.fr/~fmurtagh/mda-sw/ Hope this helps. John Uebersax
Definitions of Likert scale, Likert item, etc.
A recent question made me realize the extent of ambiguity in the use of "Likert scale" and related terms. I'd like to see things be more clear. Here are my thoughts (I don't claim they are correct; they're just a starting point for discussion). Concise responses are encouraged. If there are enough, I'll post a summary. 1. Likert scaling strictly refers to the scaling method developed by Likert in the 1930's. It refers to the entire process of scaling a set of many items (i.e., as an alternative to Thurstone scaling). One step of this is administering many items to individuals. Each item has integer-labeled rating levels. Likert used the method only for attitude measurement, and with response categories indicating levels of agreement with specific statements, like: I believe the work week should be reduced to 32 hours. 1. strongly disagree 2. mildly disagree 3. neither agree nor disagree 4. mildly agree 5. strongly agree 2. A Likert scale, strictly speaking, refers to a set of many such items. 3. I do not know if Likert also used a visual analog format such as:

   strongly   mildly     neither agree    mildly    strongly
   disagree   disagree   nor disagree     agree     agree
      1          2             3            4          5
      +----------+-------------+------------+----------+

4. It seems reasonable to refer to a single such item as a Likert item. However, many people seem to refer to a single item of this type as a Likert scale; that would seem to invite confusion, as Likert's original intent was to produce a scale composed of many such items. 5. Many researchers use such items outside the area of attitude measurement; it seems reasonable to refer to such items as Likert-type items, to distinguish them from strict Likert items as described above. If anyone has any definitive references that clarify this, I would greatly appreciate learning of them. 
John Uebersax, PhD
Re: Factor analysis - which package is best for Windows?
Thanks for the tip on KyPlot. It does seem very nice. Two questions: 1. As best I can tell, the Factor Analysis routines work off a correlation or covariance matrix. At least from a perusal of the Help index, I can't see how to run Factor Analysis from raw data, or how to calculate a correlation/covariance matrix from raw data (short of applying matrix manipulations). Is there a way to produce a corr/cov matrix within KyPlot? 2. Does anyone know the current homepage for KyPlot? Thanks. John Uebersax [EMAIL PROTECTED] (Richard Wright) wrote in message news:[EMAIL PROTECTED]... KyPlot runs under Windows, is freeware, and gives you several factor analysis algorithms to choose from. http://www.rocketdownload.com/Details/Math/kyplot.htm
Re: MDS, the radex, and indices of multidimensionality agreement
[EMAIL PROTECTED] (Niko Tiliopoulos) wrote in message news:[EMAIL PROTECTED]... Q1. I have run a multidimensional scaling analysis (MDS) and the 2D map suggests that the variables are arranged in a circular-like fashion. I have found a paper that presents a 2D map showing a similar arrangement. Louis Guttman did work on circular MDS structures in the '70s. If the paper you refer to is not one of his, you might look at some of Guttman's work. Q2. I have also run a factor analysis on the same dataset, and I would like to compare the level of agreement between the FA factors and the MDS dimensions. There is a mathematical identity between Euclidean metric MDS and principal components analysis of Pearson correlations. The solutions are the same, I believe, except for a scaling of individual dimensions/components and perhaps a rotation. This is possibly described in Torgerson WS (1958), Theory and Methods of Scaling. More generally, you could perform a canonical correlation analysis between the two solutions and measure agreement with the R^2. Another possibility is to calculate the Pearson correlation between all pairwise distances between points in the MDS solution and the corresponding pairwise distances in the factor analysis solution; the nominal statistical significance of such a correlation is not valid (because the pairwise distances are not independent), but the r^2 is still a measure of the proportion of variance in one structure explained by the other. Hope this helps. John
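The last suggestion above (correlating the pairwise inter-point distances of the two configurations) can be sketched in a few lines of Python. The coordinates here are invented: the "factor analysis" configuration is just the "MDS" one rotated and rescaled, so the distance correlation should be essentially 1, illustrating that the measure ignores rotation and uniform scaling.

```python
import math

def pairwise_distances(points):
    # All n*(n-1)/2 inter-point Euclidean distances, in a fixed order.
    return [math.dist(p, q)
            for i, p in enumerate(points)
            for q in points[i + 1:]]

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Invented 2-D coordinates standing in for an MDS solution.
mds = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0), (2.0, 2.0)]

# Stand-in "factor analysis" configuration: the same shape rotated
# 30 degrees and doubled in size.
t = math.radians(30)
fa = [(2 * (x * math.cos(t) - y * math.sin(t)),
       2 * (x * math.sin(t) + y * math.cos(t))) for x, y in mds]

r = pearson(pairwise_distances(mds), pairwise_distances(fa))
```

As noted in the post, the r^2 of this correlation is a descriptive measure of shared structure; its nominal p-value should not be trusted, since the distances are not independent observations.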
Re: Venn diagram program?
No, I had more in mind: 1. the argument room, and perhaps 2. "Well, I didn't expect the Spanish Inquisition!" It's like asking a question like, "Excuse me, can you tell me how to get to First and Main Street?", and getting 5 replies like, "Oh come now, why would anybody want to go to First and Main Street?" [EMAIL PROTECTED] (Robert J. MacG. Dawson) wrote in message news:[EMAIL PROTECTED]... Thanks Alan for the constructive reply. The others so far remind me of a Monty Python routine. Let me guess - the one in which the film producer fires everybody who comments on his idea?
Re: Venn diagram program?
Thanks Alan for the constructive reply. The others so far remind me of a Monty Python routine. Yes, I am using Powerpoint now. It's harder than it sounds, because one must calculate the radii that give appropriately scaled circle areas; and one can only guess how closely to move the circles to give the correct overlap area. John [EMAIL PROTECTED] (Alan McLean) wrote in message news:[EMAIL PROTECTED]... You can draw Venn diagrams very easily in Powerpoint using the ellipse/circle and box/rectangle tools. Draw the diagram, group all the bits together, and copy it into Word or whatever. John Uebersax, PhD
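The two calculations described above (radii for area-proportional circles, and how far apart to place two circle centers to get a desired overlap area) need not be guessed; both can be computed. A sketch, using the standard circle-lens area formula and bisection on the center distance (the overlap shrinks monotonically as the centers move apart):

```python
import math

def radius_for_area(area):
    # Circle whose area equals the set's "size": r = sqrt(A / pi).
    return math.sqrt(area / math.pi)

def lens_area(r1, r2, d):
    # Overlap area of two circles (radii r1, r2) whose centers are d apart.
    if d >= r1 + r2:
        return 0.0                       # disjoint
    if d <= abs(r1 - r2):
        return math.pi * min(r1, r2) ** 2  # smaller circle fully inside
    a1 = math.acos((d * d + r1 * r1 - r2 * r2) / (2 * d * r1))
    a2 = math.acos((d * d + r2 * r2 - r1 * r1) / (2 * d * r2))
    # Sum of the two circular segments (half-angle form).
    return (r1 * r1 * (a1 - math.sin(2 * a1) / 2)
            + r2 * r2 * (a2 - math.sin(2 * a2) / 2))

def distance_for_overlap(r1, r2, target):
    # Bisection: lens_area decreases monotonically in d.
    lo, hi = abs(r1 - r2), r1 + r2
    for _ in range(100):
        mid = (lo + hi) / 2
        if lens_area(r1, r2, mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Example: sets of size 50 and 30 with an intersection of size 10.
r1 = radius_for_area(50.0)
r2 = radius_for_area(30.0)
d = distance_for_overlap(r1, r2, 10.0)
```

With r1, r2, and d in hand, the circles can be drawn to scale in PowerPoint (or any drawing tool) rather than adjusted by eye.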
Re: likert scale items - why not PCA?
The common factor model is compatible with the idea that you have unobserved constructs that you wish to estimate using item responses. The constructs are presumed to be measured with error. A common factor model takes this error into account, whereas PCA does not. When we're talking about multiple psychological traits, these are often correlated--so one often wishes to relax the requirement of orthogonality. John Uebersax Magenta [EMAIL PROTECTED] wrote in message news:LIN77.634$[EMAIL PROTECTED]... Why a factor analysis and not a principal components analysis? I've been taught that a principal components analysis makes fewer assumptions about the data, so assuming that one can perform a factor analysis, then automatically one can also perform a principal components analysis. I think I have a preference for orthogonal rotations.
Re: likert scale items
If your items are visually anchored so as to imply equal spacing, like:

      +----+----+----+----+
      0    1    2    3    4
    least                most
   possible           possible

then one might accept the data as interval-level, on the assumption that respondents interpret them as such. Also keep in mind that after you add responses on several items, minor deviations of the response categories from equal spacing may matter less. In my substance abuse and personality research with teens, I have done a lot of factor analysis on ordered-category response items. One way to avoid the assumption of equally-spaced categories (though introducing an assumption of normally distributed traits) is to perform factor analysis of polychoric correlation coefficients. For more information on polychoric correlations and their factor analysis, see: http://ourworld.compuserve.com/homepages/jsuebersax/tetra.htm http://ourworld.compuserve.com/homepages/jsuebersax/irt.htm With my data, factor analysis produced mostly the same results regardless of whether polychoric correlations or regular Pearson correlations were used. If you are concerned about creating scales by summing ordered-category responses, there is the alternative of latent trait modeling. See: http://ourworld.compuserve.com/homepages/jsuebersax/lta.htm and some of the links there. Again, one often finds it makes little or no practical difference. Scale scores produced by simply adding item responses and scores produced by more complex latent trait models may correlate .99 or better with each other. BTW, the original study you describe sounds so much like one I did the analysis for that I wonder if they are the same. You aren't by any chance referring to a study done in Winston-Salem, North Carolina, are you? John Uebersax Teen Assessment Project [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]... I am using a measure with Likert scale items. Original psychometrics for the measure included factor analysis to reduce the 100 variables to 20 composites. 
However, since the variables are not interval, shouldn't non-parametric tests be done to determine group differences (by gender, age, income) on the variables? Can I still use the composites... was it appropriate to do the original factor analysis on ordinal data?
Re: Alscal vs. NCSS
As the other reply suggested, perhaps there is a problem with local maxima. Or maybe, since these are different programs, the commands in one case were incorrect. Why not run a metric MDS for comparison purposes? That might help you decide whether the Alscal or NCSS results are suspect. John Uebersax [EMAIL PROTECTED] [EMAIL PROTECTED] (Niko Tiliopoulos) wrote in message news:[EMAIL PROTECTED]... Dear all, I have two questions regarding MDS: 1. I have run an NMDS through Alscal (SPSS) and NCSS, and the representations of the variables on a 2-dimensional map look completely different. As far as I can tell, I am using the same procedure in both algorithms, so I cannot understand why I get different results, and which one I should prefer as more accurate. 2. Does anyone know which of the following two stress indices should be used with data from psychometric instruments (e.g., a personality questionnaire): Kruskal's or Guttman-Lingoes? Thank you Niko Tiliopoulos
Re: How calculate 95%=1.96 stdv
Jon Cryer [EMAIL PROTECTED] astutely noted an error in the formula (below) that I gave for the standard normal cumulative distribution function. The integral, of course, should go from -infinity to z, not from -infinity to +infinity (the latter integral will always equal 1). I apologize for the error and thank Jon for pointing it out. John Uebersax John Uebersax wrote:

                   +infinity     [-- should be z, not +infinity]
   p = PHI(z) = INTEGRAL  phi(z)
                   -infinity

where: z = standard normal deviate; PHI(z) = the probability (p) of observing a score at or below z; phi(z) = the formula for the standard normal curve: 1/sqrt(2*pi) * exp(-z^2/2). Note that PHI() and phi() (the upper-case and lower-case Greek letters, respectively) are different: PHI() is the cumulative integral of phi().
Re: How calculate 95%=1.96 stdv
Hi Stefan, s.petersson [EMAIL PROTECTED] wrote in message news:XBE07.7641$[EMAIL PROTECTED]... Let's say I want to calculate this constant with a security level of 93.4563, how do I do that? Basically I want to unfold a function like this: f(95)=1.96, where I can replace 95 with any number ranging from 0-100. To Eric's reply I'd just add that use of a table is unnecessary. Especially in a computer program, it is easier to use a numerical function to calculate the confidence interval. The tables you've seen are for the cumulative probabilities of the standard normal curve--otherwise known as the standard normal cumulative distribution function (cdf). The standard normal cdf is the function:

                   +infinity
   p = PHI(z) = INTEGRAL  phi(z)
                   -infinity

where: z = standard normal deviate; PHI(z) = the probability (p) of observing a score at or below z; phi(z) = the formula for the standard normal curve: 1/sqrt(2*pi) * exp(-z^2/2). Note that PHI() and phi() (the upper-case and lower-case Greek letters, respectively) are different: PHI() is the cumulative integral of phi(). With the function above, one supplies a value for z and is given a cumulative probability. You seek the inverse function of PHI(), sometimes called the probit function. With the probit function, one supplies a value for p and is returned the value of z such that the area under the standard normal curve from -inf to z equals p. (As Eric noted, you may need to adjust p to handle issues of 1- vs. 2-tailed intervals.) Both the PHI() and probit() functions are well approximated, in simple applications such as calculating confidence intervals, by simple polynomial formulas of a few terms. Some of these take as few as 2 or 3 lines of code. A good reference for such approximations is: Abramowitz, M., and I. A. Stegun, 1972: Handbook of Mathematical Functions. Dover. Hope this helps. 
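A sketch of the no-table approach in Python. Since version 3.8 the standard library's statistics.NormalDist supplies both PHI() (the cdf method) and its inverse, the probit function (inv_cdf), so the polynomial approximations need not even be hand-coded:

```python
from statistics import NormalDist

def z_for_confidence(level_pct):
    # Two-sided interval: split the leftover probability between the two
    # tails, then apply the probit (inverse of PHI) to the upper point.
    p = 1 - (1 - level_pct / 100) / 2
    return NormalDist().inv_cdf(p)

z95 = z_for_confidence(95)        # close to the familiar 1.96
z93 = z_for_confidence(93.4563)   # the arbitrary level asked about
```

This is exactly the "f(95) = 1.96" unfolding requested: any level between 0 and 100 (exclusive) can be supplied.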
John Uebersax
Re: factor analysis of dichotomous variables
A list of such programs and discussion can be found at: http://ourworld.compuserve.com/homepages/jsuebersax/binary.htm The results of Knol & Berger (1991) and Parry & McArdle (1991) (see the above web page for citations) suggest that there is not much difference in results between the Muthen method and the simpler method of factoring tetrachoric correlations. For additional information (including examples using PRELIS/LISREL and SAS) on factoring tetrachorics, see http://ourworld.compuserve.com/homepages/jsuebersax/irt.htm Hope this helps. John Uebersax
Re: IRT/Rasch Modeling with SAS?
Hi Lee, If you go to my web page for Latent Trait and Item Response Theory (IRT) Models, http://ourworld.compuserve.com/homepages/jsuebersax/lta.htm (please let me know if this link doesn't work), that will point to several other pages that might help. You wrote: "Then the IRT curve that I am looking for (something they call a 3-parameter logistic, which I think is not a 100% correct name) is described by the following function (best viewed in a fixed-width font): ..." A well-kept secret is that it is just as easy to estimate a probit (cumulative gaussian) latent trait model. The probit model is theoretically more appropriate in many applications. Of course, you will need to decide, if you haven't already, whether to pursue a 1-, 2-, or 3-parameter model. You wrote: "[I cannot] find a reference that tells me exactly the recipe for finding it, but the best I can tell is that the algorithm would start with an initial guess for T, fit the curve parameters a, b, and c, then use this curve to re-estimate T. The process repeats until some convergence criterion is reached." That's one approach. Another is "brute force" optimization, where one uses a general-purpose optimization routine to (simultaneously) find the set of parameter values that maximizes a given criterion--usually the log-likelihood. Here's a good book that covers the material without making things more complicated than necessary: Hulin, C. L., F. Drasgow, C. K. Parsons, Item Response Theory, Homewood, Illinois: Dow Jones-Irwin, 1983. I'd also recommend looking at some of Bock's work, such as: Bock, R. D., and Aitkin, M. (1981). "Marginal Maximum Likelihood Estimation of Item Parameters: Application of an EM Algorithm," Psychometrika, 46, 443-459. Of course, the "bibles" are still: Lazarsfeld, P. F., and Henry, N. W. (1968), Latent Structure Analysis, Boston: Houghton Mifflin. Lord FM, Novick MR. (1968). Statistical theories of mental test scores. Reading, Massachusetts: Addison-Wesley. You also asked: "Does anyone know if SAS will do this?" 
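For concreteness, the "3-parameter logistic" curve discussed above can be written out in a few lines. This is a sketch using the conventional parameter roles (a = slope/discrimination, b = location/difficulty, c = lower asymptote or "guessing" floor); the original post's exact function was not reproduced in the thread, so this is the standard textbook form rather than a quote of it:

```python
import math

def p_correct(theta, a, b, c=0.0):
    # Probability of a keyed ("correct") response at latent trait level
    # theta under the 3-parameter logistic curve.  With a fixed across
    # items and c = 0 this reduces to the 1-parameter (Rasch-type) form.
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))
```

The curve rises from c (at low theta) toward 1 (at high theta) and passes through its midpoint, (c + 1) / 2, at theta = b; a controls how steeply it rises there.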
One of my pages describes how to estimate a 2-parameter latent trait model by factor-analyzing a matrix of tetrachoric correlations. SAS (via a macro available on the SAS site) can produce a matrix of tetrachoric correlations, and the matrix can be supplied to and factored by PROC FACTOR. This works pretty well for estimating the item parameters (slopes and thresholds). However, if you also want to score respondents (i.e., estimate their latent trait levels), that takes a little more work (a separate page on my site talks about this). A 1-parameter Rasch model can be formulated as a loglinear model. Therefore it might be possible to use, say, PROC CATMOD or something like that to estimate a Rasch model. You wrote: "I have found a piece of software that claims to fit 'Rasch models', but the classical Rasch model is a one-parameter version of what I'm looking for (set b and c to zero, and you have a Rasch model)." Correct. I prefer 2-parameter models, unless there is some theoretical reason to expect a 1-parameter model (i.e., that all items have the same correlation with the latent trait). I maintain that the choice of logistic IRT vs. probit IRT vs. Rasch model should be made based on the theoretical assumptions of each model and the assumptions about your data. For example, Rasch has a very nice theory about how people answer test items that justifies use of Rasch modeling. (I don't necessarily agree with the model, but it is interesting.) On the other hand, if you have the familiar: manifest trait = latent trait + error model, where error is (a) normally distributed and (b) homoscedastic (error variance not correlated with latent trait level), and where one assumes discretizing thresholds that convert latent continuous responses to observed binary responses, then a probit latent trait model is appropriate. You also wrote: "Plus, the software costs about $1000, and I don't have that to spare." 
And: "The software (one called 'BIGSTEPS' is the only one I can find that will deal with the 89,000 students I have to deal with) is not exactly 'Microsoft Bob' in its ease of use." Check my web site; one page talks about software for estimating IRT and Rasch models. Personally, for Rasch models, I use MIRA or WINMIRA; for IRT models I use my own programs for "discrete latent trait" modeling: Heinen T. Latent class and discrete latent trait models: Similarities and differences. Thousand Oaks, California: Sage, 1996. I also have a FAQ on the Rasch model on the site, including information specifically on Rasch software. Hope this helps. John Uebersax [EMAIL PROTECTED] http://ourworld.compuserve.com/homepages/jsuebersax P.S. The limiting factor with IRT software is usually the number of items, rather than the number of subjects.
Re: goodness of fit for mixture of multinomials
Gimenez Olivier [EMAIL PROTECTED] wrote: ... we have three samples arising from three multinomials with the same number of cells. This can be represented as a table:

   n11 n12 ... n1k   (1)
   n21 n22 ... n2k   (2)
   n31 n32 ... n3k   (3)

We would like to know whether the last sample (3) can be considered a mixture of (1) and (2). Some help would be appreciated, especially references. If you know the mixing proportions with which (1) and (2) combine, a simple approach would be: 1. Convert (1) and (2) to expected probability distributions:

   p11 p12 ... p1k   (4)
   p21 p22 ... p2k   (5)

by dividing each nij by the appropriate row total. 2. From the results, calculate a table of expected proportions for the mixture, q1 q2 ... qk, where

   q1 = r(p11) + (1 - r)(p21)
   q2 = r(p12) + (1 - r)(p22)
   ...
   qk = r(p1k) + (1 - r)(p2k)

and r, (1 - r) are the mixing proportions, with 0 < r < 1. 3. Let N3 be the number of observations in (3) above. Calculate expected frequencies e1, e2, ..., ek as

   e1 = N3 q1
   e2 = N3 q2
   ...
   ek = N3 qk

4. Compare the observed frequency distribution n31 n32 ... n3k with the expected frequency distribution e1 e2 ... ek using the likelihood-ratio (LR) chi-squared test. For large samples, the statistic is distributed as approximately chi-squared with k-1 df. A nonsignificant result is consistent with the hypothesis that (3) is a mixture of (1) and (2). You can also use the Pearson chi-squared test to compare the distributions; it would also have k-1 df. If you don't know the mixing proportion a priori, you would need to estimate it. The usual criterion is maximum likelihood--i.e., the value of r that maximizes the likelihood of observing n31, n32, ..., n3k given q1, q2, ..., qk. However, the maximum-likelihood value of r is the same as the value that gives the lowest LR chi-squared, so you could just use trial and error to test different values of r until you find the best value. If you estimate r, the df for the LR chi-squared test are k - 2. 
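The steps above, including the trial-and-error estimation of r, can be sketched in Python. The counts are invented (chosen so that sample 3 really is close to a 60/40 mixture of samples 1 and 2), and the grid search at the end implements the "test different values of r until you find the best one" idea:

```python
import math

def lr_chisq(observed, expected):
    # Likelihood-ratio statistic G^2 = 2 * sum n * ln(n / e);
    # cells with an observed count of zero contribute nothing.
    return 2 * sum(n * math.log(n / e)
                   for n, e in zip(observed, expected) if n > 0)

def mixture_expected(n1, n2, n3_total, r):
    # Steps 1-3: convert rows (1) and (2) to proportions, mix them with
    # weight r, and scale up to the size of sample (3).
    p1 = [x / sum(n1) for x in n1]
    p2 = [x / sum(n2) for x in n2]
    return [n3_total * (r * a + (1 - r) * b) for a, b in zip(p1, p2)]

n1 = [30, 50, 20]   # sample (1)
n2 = [10, 20, 70]   # sample (2)
n3 = [22, 38, 40]   # sample (3): is it a mixture of (1) and (2)?

# Estimate r by minimizing G^2 over a grid, which (as noted above) is
# the same as maximizing the likelihood.
best_r, best_g2 = min(
    ((r / 1000, lr_chisq(n3, mixture_expected(n1, n2, sum(n3), r / 1000)))
     for r in range(1, 1000)),
    key=lambda t: t[1],
)
```

With these counts the best-fitting r is 0.6 and the G^2 is essentially zero; with real data, best_g2 would be referred to a chi-squared distribution with k - 2 df (k - 1 if r were known in advance).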
For the formulas to calculate the LR and Pearson chi-squared statistics, you could check: Bishop YMM, Fienberg SE, Holland PW. Discrete multivariate analysis: theory and practice. Cambridge, Massachusetts: MIT Press, 1975, or any text on loglinear modeling, or one of Alan Agresti's books on categorical data analysis. -- John Uebersax http://ourworld.compuserve.com/homepages/jsuebersax [EMAIL PROTECTED]
Re: OT: psychological test for recruitment in Statistics
I've never heard of any statistician position requiring a psychological test. Even when I worked at the RAND Corporation, where the position involved some degree of defense-related research, it was not required. (Frankly, if a firm required such a test, I would take that as a sign that it is not a place to consider working for.) I would think that such tests present more problems than they solve. For example, suppose a test suggests a person has a bipolar mental disorder. Would that be grounds not to consider them? If so, might the person have legal recourse, since that psychiatric diagnosis might legitimately be considered a medical disability? IMHO, psychological tests in this case should not substitute for a thorough interview and human judgment. Just my .02 worth. -- John Uebersax In article 9211so$9kt$[EMAIL PROTECTED], T.S. Lim [EMAIL PROTECTED] wrote: My apology for posting an off-topic message. I was wondering if it's a common practice in Statistics to require job applicants to take a psychological test. At the MS/PhD level (in the US), I don't think it's common. However, some companies ask job applicants to take a test like the GRE Quantitative one. By a psychological test, I mean a test that attempts to probe applicants' "personality". It actually consists of several tests that may include drawing tests. Any idea which field uses such tests? Thanks in advance for any pointer. -- T.S. Lim [EMAIL PROTECTED] www.Recursive-Partitioning.com
Re: EdStat: Factoring tetrachoric matrix in SAS
I think all the comments supplied by other posters are relevant. Of course you should check to make sure that SAS is reading the input matrix correctly, as was pointed out. However, even assuming that you did everything correctly, I'm not surprised that SAS has a problem factoring the matrix. A correlation matrix composed of tetrachorics may not be factorable--especially if there is a large number of items. That can be remedied by "conditioning" the matrix. For a discussion, see the paper by Knol and Berger (the Parry and McArdle paper might also talk about this): Knol DL, Berger MP. Empirical comparison between factor analysis and multidimensional item response models. Multivariate Behavioral Research, 1991, 26, 457-477. Parry CD, McArdle JJ. An applied comparison of methods for least-squares factor analysis of dichotomous variables. Applied Psychological Measurement, 1991, 15, 35-46. Note that conditioning the matrix in this way is a completely "ad hoc" procedure. Hope this helps. -- John Uebersax [EMAIL PROTECTED]
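The problem is that a matrix of pairwise tetrachorics need not be positive definite, and factoring routines then fail. One common conditioning ("smoothing") approach is to clip the negative eigenvalues and rebuild the matrix; a minimal sketch under that assumption (the function name is hypothetical, and this is one ad hoc variant, not necessarily the exact procedure discussed by Knol and Berger):

```python
import numpy as np

def smooth_correlation_matrix(r, eps=1e-4):
    """Condition a possibly indefinite "pseudo-correlation" matrix:
    clip negative eigenvalues, rebuild, and rescale so the diagonal
    is exactly 1 (eigenvalue-clipping smoothing)."""
    vals, vecs = np.linalg.eigh(r)
    vals = np.clip(vals, eps, None)     # force all eigenvalues positive
    r_pd = (vecs * vals) @ vecs.T       # rebuild: V diag(vals) V'
    d = np.sqrt(np.diag(r_pd))
    return r_pd / np.outer(d, d)        # restore unit diagonal

# A 3 x 3 matrix of pairwise "correlations" that is indefinite (its
# smallest eigenvalue is about -0.18), as can happen with tetrachorics
r = np.array([[1.0, 0.9, 0.2],
              [0.9, 1.0, 0.9],
              [0.2, 0.9, 1.0]])
r_fixed = smooth_correlation_matrix(r)
```

The smoothed matrix can then be fed to an ordinary factoring routine; note that, as the post says, the adjustment is purely ad hoc and the larger the negative eigenvalues, the more the off-diagonal values get distorted.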
Re: Statistical Methods in Psychology Journals
Yes, but garbage in, garbage out. :) -- John Uebersax [EMAIL PROTECTED] === This list is open to everyone. Occasionally, less thoughtful people send inappropriate messages. Please DO NOT COMPLAIN TO THE POSTMASTER about these messages because the postmaster has no way of controlling them, and excessive complaints will result in termination of the list. For information about this list, including information about the problem of inappropriate messages and information about how to unsubscribe, please see the web page at http://jse.stat.ncsu.edu/ ===
Re: Multidimensional Models IRT
Based on more research, here are some updates and corrections to my reply of yesterday. -- John Uebersax

MULTIDIMENSIONAL LATENT TRAIT AND ITEM RESPONSE THEORY (IRT) MODELS

As mentioned in yesterday's post, this does not include information on logistic-ogive and Rasch-type multidimensional latent trait/IRT models.

SOFTWARE

-- TESTFACT (D. T. Wilson, R. Wood, R. D. Gibbons)
Available from:
* Assessment Systems Corporation
* Scientific Software International
* ProGAMMA (Netherlands)
(see end of this section for distributor contact information)
With TESTFACT, the user can choose either factoring of tetrachoric correlations or full-information maximum-likelihood estimation. TESTFACT will calculate factor scores, which may be needed in some applications. The ProGAMMA site lists the latest version (TESTFACT 3), but possibly the other distributors listed above also have the latest version. For an online description, check the ProGAMMA website http://www.gamma.rug.nl , or http://www.assess.com/testfact.html

-- MicroFACT (Niels G. Waller)
Available from:
* Assessment Systems Corporation
* ProGAMMA (Netherlands)
MicroFACT appears to work by factoring tetrachoric correlations. For an online description, check the ProGAMMA website http://www.gamma.rug.nl , or http://www.assess.com/MicroFACT.html

-- Mplus (Bengt and Linda Muthen)
Available from:
* Muthen & Muthen
This possibly replaces the earlier program, LISCOMP, which estimates the dichotomous/polytomous data factor analysis models described by B. Muthen. (Mplus estimates a wide range of other latent variable models as well.)

-- NOHARM (Colin Fraser)
NOHARM (Fraser, 198?) can be used to estimate unidimensional and multidimensional latent trait (IRT) models. For more information, one might check with Jack McArdle at [EMAIL PROTECTED] . He used to have the program available by ftp.

-- PRELIS (Karl Joreskog and Dag Sorbom)
Available from:
* Scientific Software International
* Assessment Systems Corporation
* ProGAMMA (Netherlands)
Will calculate tetrachoric and polychoric correlations. These can be output and factor-analyzed to estimate a unidimensional or multidimensional latent trait/IRT model.

-- Software distributor contact information:

Assessment Systems Corporation
2233 University Ave, Suite 200
St. Paul, MN 55114, United States
Tel: (651) 647-9220  Fax: (651) 647-0412
Web: http://www.assess.com  Email: [EMAIL PROTECTED]

Muthen & Muthen
11965 Venice Blvd, Suite 407
Los Angeles, CA 90066, United States
Tel: (310) 391-9971, Toll Free (888) 814-9144  Fax: (310) 391-8971
Web: http://www.statmodel.com  Email: [EMAIL PROTECTED]

ProGAMMA bv
PO Box 841 (mailing address?), 9700 AV Groningen
Grote Rozenstraat 15 (street address?), 9712 TG Groningen
Tel: +31 50 3636900  Fax: +31 50 3636687
Web: http://www.gamma.rug.nl  Email: [EMAIL PROTECTED]

Scientific Software International
7383 N Lincoln Ave, Suite 100
Lincolnwood, IL 60712-1704, United States
Tel: (800) 247-6113 or (847) 675-0720  Fax: (847) 675-2140
Web: http://www.ssicentral.com  Email: [EMAIL PROTECTED]

==

BIBLIOGRAPHY

Bartholomew, D. J. Factor analysis for categorical data (with discussion). Journal of the Royal Statistical Society, Series B, 1980, 42, 293-321.

Bartholomew, D. J. Latent variable models for ordered categorical data. Journal of Econometrics, 1983, 22, 229-243.

Bartholomew, D. J. Latent variable models and factor analysis. New York: Oxford University Press, 1987.

Bock, R. D., and Aitkin, M. Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 1981, 46, 443-459.

Bock, R. D., Gibbons, R., and Muraki, E. Full-information item factor analysis. Applied Psychological Measurement, 1988, 12, 261-280.

Christoffersson, A. Factor analysis of dichotomized variables. Psychometrika, 1975, 40, 5-32.

Fraser, C. NOHARM II: A FORTRAN program for fitting unidimensional and multidimensional normal ogive models of latent trait theory. Armidale, NSW, Australia: Center for Behavioral Studies, the University of New England, 19??.

Fraser, C., and McDonald, R. (1988). [possibly another reference for NOHARM]

Knol, D. L., and Berger, M. P. Empirical comparison between factor analysis and multidimensional item response models. Multivariate Behavioral Research, 1991, 26, 457-477.

McDonald, R. P. Linear versus non-linear models in item response theory. Applied Psychological Measurement, 1982, 6, 379-396.

McDonald, R. P. Unidimensional and multidimens
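For the "factor the tetrachorics" route that PRELIS, MicroFACT, and TESTFACT support, the standard unidimensional conversion from a normal-ogive factor loading lambda and an item's proportion-correct to IRT parameters is a = lambda / sqrt(1 - lambda^2) and b = tau / lambda, where tau is the normal threshold implied by the pass rate. A minimal sketch (the helper name is mine):

```python
import math
from statistics import NormalDist

def loading_to_irt(loading, p_correct):
    """Convert a (unidimensional) normal-ogive factor loading and an
    item's proportion-correct into IRT parameters:
        discrimination a = lambda / sqrt(1 - lambda^2)
        difficulty     b = tau / lambda,  tau = Phi^{-1}(1 - p_correct)
    """
    tau = NormalDist().inv_cdf(1.0 - p_correct)   # item threshold
    a = loading / math.sqrt(1.0 - loading ** 2)   # discrimination
    b = tau / loading                             # difficulty
    return a, b

# A loading of .6 and a 50% pass rate give a = .75 and b = 0
a, b = loading_to_irt(0.6, 0.5)
```

This is why loadings near 1 correspond to very highly discriminating items: the denominator sqrt(1 - lambda^2) shrinks toward zero.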
Re: Weighted Kappa
From your post, it is not clear that kappa is the right statistic. Usually one uses kappa when each rater/clinician rates a sample of patients or cases. But you merely describe a questionnaire that each clinician completes. Assuming each clinician completes the questionnaire only one time (as opposed to, say, one time in relation to each of a sample of patients), then I don't see that kappa is appropriate. Instead, one would use simpler statistics--such as calculating the standard deviation, across clinicians, for each item. You also raise the issue of many possible rater pairs--note, though, that there are only (16 * 15) / 2 = 120 unique pairs that involve different raters. Rather than calculate 120 different kappa coefficients, a simpler alternative might be to calculate the general kappa that measures agreement between any two raters--considering all raters simultaneously. That is done with Fleiss' kappa (as opposed to Cohen's kappa, which applies only to pairwise comparisons). For a discussion of the difference between these two types of kappa, see Joseph Fleiss, Statistical Methods for Rates and Proportions, 1981. -- John Uebersax [EMAIL PROTECTED]
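Fleiss' kappa is straightforward to compute from a subjects-by-categories table of rating counts: average the per-subject agreement, compute chance agreement from the marginal category proportions, and form the usual chance-corrected ratio. A minimal sketch (function name mine, following the formulas in Fleiss's book):

```python
import numpy as np

def fleiss_kappa(counts):
    """Fleiss' kappa: chance-corrected agreement among m raters who each
    classify N subjects into k categories.  counts[i][j] = number of
    raters who assigned subject i to category j (rows must sum to m)."""
    counts = np.asarray(counts, dtype=float)
    n_subjects = counts.shape[0]
    m = counts[0].sum()                            # raters per subject
    p_j = counts.sum(axis=0) / (n_subjects * m)    # marginal category proportions
    # Observed agreement among the m ratings of each subject
    p_i = ((counts ** 2).sum(axis=1) - m) / (m * (m - 1))
    p_bar = p_i.mean()                             # mean observed agreement
    p_e = (p_j ** 2).sum()                         # chance-expected agreement
    return (p_bar - p_e) / (1.0 - p_e)

# 3 subjects, 3 raters, 2 categories, perfect agreement -> kappa = 1
kappa_perfect = fleiss_kappa([[3, 0], [0, 3], [3, 0]])
```

Note that the input is one row per subject, not one row per rater, so the 16 clinicians never have to be paired up at all.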
Re: Correlation - Constraints on Variables
bkamen [EMAIL PROTECTED] wrote: This practical question arose between myself and a colleague at work. It concerns whether we can use correlation analysis if one of the variables is non-continuous or "categorical." ... I would appreciate clarification of any such constraints on the practical use of correlation analysis. I'm assuming that by "discrete" you mean that X is constrained to take certain discrete values (e.g., 0, 1, 2, 3), but that the values themselves are either (a) valid interval-level data (e.g., a value of 2 is truly 1 unit more than a value of 1) or (b) ordinal (e.g., a value of 2 necessarily means a greater level of the trait than a value of 1). My understanding of how this works is as follows: If X is "discrete" in this way and Y is a usual continuous measure, then the correlation r(X,Y) will tend to be constrained in magnitude. For example, it might be difficult or impossible to obtain a correlation of 1 or -1 in this situation. In that sense, a test of r(X, Y) would seem to be conservative in principle, which I believe your message alluded to. The question appears to be how this situation affects formal significance testing. I do not know for sure, but it would not surprise me if the significance test assumes that both measures are truly continuous. If one is "discrete" (in the sense above), I don't know how that affects the significance test. However, there is an alternative. You could consider use of the biserial correlation (if X has only two values) or the polyserial correlation (if X can have more than two values--as a practical matter, this might only apply if the number of different X values is relatively low, say less than 8 or 10; of course, if there are many X values, then the impact of the "discreteness" may be relatively small). The biserial/polyserial correlation estimates the correlation you would have obtained if both X and Y were truly continuous.
The main assumption is that, fundamentally, the traits associated with X and Y are normally distributed (and jointly distributed as bivariate normal). However, the biserial/polyserial correlation allows that one of the variables has been "discretized." You might want to consider this option. For more information, you could check Kendall & Stuart, "The Advanced Theory of Statistics." Hope this helps. John Uebersax [EMAIL PROTECTED]
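The attenuation, and the biserial correction for it, are easy to see in a simulation. Below is a sketch (variable names mine) that generates a bivariate-normal pair with latent correlation .6, dichotomizes X at its mean, and then recovers the latent correlation with the classical biserial formula r_b = r_pb * sqrt(p*q) / phi(tau), where phi(tau) is the normal density at the threshold:

```python
import math
from statistics import NormalDist

import numpy as np

# Simulate a latent bivariate-normal pair (X*, Y) with correlation rho,
# then dichotomize X* to see how much the observed correlation shrinks.
rng = np.random.default_rng(0)
n = 200_000
rho = 0.6                                         # true latent correlation
x_star = rng.standard_normal(n)                   # latent continuous X
y = rho * x_star + math.sqrt(1 - rho ** 2) * rng.standard_normal(n)
x = (x_star > 0).astype(float)                    # observed dichotomous X

r_pb = np.corrcoef(x, y)[0, 1]                    # attenuated point-biserial r
# Biserial correction: r_b = r_pb * sqrt(p*q) / phi(tau), with tau the
# normal threshold implied by the proportion p in the upper category.
p = x.mean()
q = 1.0 - p
phi_tau = NormalDist().pdf(NormalDist().inv_cdf(p))
r_bis = r_pb * math.sqrt(p * q) / phi_tau         # estimate of the latent rho
```

With a median split, the observed Pearson correlation shrinks to roughly 0.8 times the latent value (0.6 becomes about 0.48 here), while the biserial estimate recovers approximately 0.6, illustrating the point about constrained magnitude.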