Setting bounds for repeatability of scores
On 11 August, in a thread about "How to determine adequate samples", Ken Mintz [EMAIL PROTECTED] wrote:
The std err (se) around the mean is given by the formula:
se = sd / sqrt(n) (68% conf)
se = (1.96*sd) / sqrt(n) (95% conf)
se = (2.58*sd) / sqrt(n) (99% conf)
where sd is the std dev. Suppose you want to be 99% confident that the population avg is within +/-3% of the sample avg. Then, se = 0.03*avg. (We choose 3% or whatever arbitrarily.) Then the minimum sample size (n) is:
x = (2.58*sd) / (0.03*avg)
n = x*x
My question is: is it sensible to use this formula to do something else? Situation: 35 'examiners' award a percentage score for the performance of examinees. In the course of a year each examiner will see about 500 examinees, perhaps 30 or so in a single session. I'm using a Microsoft Access database to *enter* the scores awarded (not to analyse them! That's to be done using Minitab). I want to set up a 'rule' in Access that indicates that there is less than 99% certainty that a session is within +/- 3% (or similar arbitrary cut-off points) of a 'gold standard'. For example, compare the session mean for a single examiner with (A) the grand mean of all that examiner's scores, and (B) the grand mean for the population of examiners. The basic question is: are A and B within the +/- 3% bounds? If yes, accept; if no, check and adjust. Help and advice appreciated. Jonathan Robbins J H Robbins FRSA FRPS posting from Dorset in the UK.
= Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
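As a rough illustration of the arithmetic in this message: setting 0.03*avg equal to (2.58*sd)/sqrt(n) and solving for n gives n = ((2.58*sd) / (0.03*avg))^2. The sketch below computes that minimum n and applies a simple +/-3% acceptance rule. The function names and the example figures (sd = 10, avg = 60) are made up for illustration; the rule is only one possible reading of the 'accept / check and adjust' idea, not a recommended procedure.

```python
import math

def min_sample_size(sd, avg, rel_margin=0.03, z=2.58):
    """Smallest n for which z*sd/sqrt(n) <= rel_margin*avg.

    z = 2.58 is the 99% multiplier quoted in the message.
    """
    x = (z * sd) / (rel_margin * avg)
    return math.ceil(x * x)

def within_bounds(session_mean, reference_mean, rel_margin=0.03):
    """Hypothetical acceptance rule: True if the session mean lies
    within +/- rel_margin (as a fraction) of the reference mean."""
    return abs(session_mean - reference_mean) <= rel_margin * reference_mean

# Illustrative numbers only (not from the original post).
n_needed = min_sample_size(sd=10, avg=60)
print(n_needed)
print(within_bounds(61, 60), within_bounds(63, 60))
```

With sd = 10 and avg = 60 the required n is well above a 30-examinee session, which is one way to see how wide the 99% band around a single session mean is likely to be.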
Re: large N, categorical outcomes, significance?
One approach: (I assume that by "residual" you mean (O-E)/sqrt(E) for each cell of a two-way frequency table, where O = observed frequency and E = expected frequency under the null hypothesis.) For the several largest residuals (or the single largest), report O and E as proportions of the total N. Express the residual in terms of proportions, which will turn out to include sqrt(N) as a factor. Show that the residual can be whatever it was (105.6, say) only if N is as large as it is in your dataset, and that the same proportions for some smaller (more reasonable?) N would _not_ produce a significant residual. For purposes of this exercise, you could express the total chi-square in terms of proportions and N, and show that for the observed proportions only values of N larger than some value would produce a significant result; or you could take, for any single cell, a critical value for chi-square with one d.f. (One could argue for d.f. = (r-1)(c-1)/(rc), since the table has rc cells but only (r-1)(c-1) d.f., but 1 d.f. is arguably conservative, and finding critical values for fractional d.f. may be difficult.)
On 17 Aug 2001, JDriscoll wrote: I have a large dataset (N can be 2,000-9,000) with mostly categorical outcome variables. Any chi square is significant, with residuals of 100+ for tiny differences. I know one can determine effect size for continuous variables and show that a result is significant only due to the size of the N, but... how do I do this for categorical outcome variables? Thanks!
Donald F. Burrill [EMAIL PROTECTED] 184 Nashua Road, Bedford, NH 03110 603-471-7128
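A minimal sketch of the suggestion above, assuming independence as the null hypothesis. With O = N*p_obs and E = N*p_exp, the cell residual becomes sqrt(N)*(p_obs - p_exp)/sqrt(p_exp), and the total chi-square becomes N times the sum of (p_obs - p_exp)^2 / p_exp over cells. The function names and the 2x2 table of proportions are hypothetical; 3.841 is the usual 5% critical value for chi-square with 1 d.f.

```python
import math

def expected_proportions(p):
    """Expected cell proportions under independence, from the margins."""
    row_tot = [sum(row) for row in p]
    col_tot = [sum(col) for col in zip(*p)]
    return [[r * c for c in col_tot] for r in row_tot]

def chi_square_per_unit_n(p):
    """Sum over cells of (p_obs - p_exp)^2 / p_exp, i.e. chi-square / N."""
    e = expected_proportions(p)
    return sum((p[i][j] - e[i][j]) ** 2 / e[i][j]
               for i in range(len(p)) for j in range(len(p[0])))

def min_n_for_significance(p, critical_value):
    """Smallest N at which these proportions reach the critical value."""
    return math.ceil(critical_value / chi_square_per_unit_n(p))

def pearson_residual(p_obs, p_exp, n):
    """(O - E)/sqrt(E) written in terms of proportions and N."""
    return math.sqrt(n) * (p_obs - p_exp) / math.sqrt(p_exp)

# Made-up 2x2 table of observed proportions with a tiny departure
# from independence (each expected proportion is 0.25).
table = [[0.26, 0.24],
         [0.24, 0.26]]
print(min_n_for_significance(table, critical_value=3.841))
print(pearson_residual(0.26, 0.25, n=2500))
```

For this made-up table the same proportions are non-significant at any N below the printed threshold, which is the kind of statement the reply proposes reporting.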
Min n CFA clarification
Let me clarify my previous posting. I want to do a confirmatory factor analysis to validate a questionnaire. There are 45 questions (subjects answer using a 1-5 scale). Theoretically, there are 3 subscales with 15 items on each. In a CFA, that gives me 3 factors, 45 error terms, 14 factor loadings on each of 3 factors, and 3 covariances. I figure that gives me 93 parameters. That's the part I need somebody to verify for me. Have I counted the number of parameters correctly? If so, then I should have at least a 10:1 and at best a 20:1 ratio of subjects to parameters. Hence, the estimate of 930-1860 subjects. Is this correct? Thanks to all who are helping me with this. Marianne
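The count in this message can be reproduced with a few lines of arithmetic. The sketch below assumes a simple-structure CFA (no cross-loadings, uncorrelated errors) in which one loading per factor is fixed to set the scale, so each factor has 14 free loadings plus a free variance. The function name and that identification choice are assumptions for illustration; other identification choices distribute the count differently.

```python
def cfa_free_parameters(n_factors, items_per_factor):
    """Count free parameters in a simple-structure CFA.

    Assumes one loading per factor is fixed (marker variable),
    uncorrelated error terms, and freely correlated factors.
    """
    n_items = n_factors * items_per_factor
    error_variances = n_items
    free_loadings = n_factors * (items_per_factor - 1)
    factor_variances = n_factors
    factor_covariances = n_factors * (n_factors - 1) // 2
    return (error_variances + free_loadings
            + factor_variances + factor_covariances)

params = cfa_free_parameters(n_factors=3, items_per_factor=15)
print(params)                      # 45 + 42 + 3 + 3
print(10 * params, 20 * params)    # subjects at 10 and 20 per parameter
```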
bootstrap hypothesis testing
Hi: I am new to the list and have a question about bootstrap hypothesis testing. I am testing the equality of two means according to Algorithm 16.2 in An Introduction to the Bootstrap by Efron and Tibshirani (1993). They define the estimated ASL as #{t(x*b) >= tobs}/B. It seems to me that this is a one-sided estimated ASL. I can easily determine the significance or lack of significance by changing the order of the means I subtract. My question is: why is the ASL defined with >=? Why would I not wish to examine both ends of the null distribution? Thanks for your help. Joe
Joe Horton Psychology and Social Sciences Department 7373 Admiral Peary Highway Mount Aloysius College Cresson, PA 16630 (814) 886-6437 [EMAIL PROTECTED]
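To make the one-sided versus two-sided distinction concrete, here is a sketch of a bootstrap test for equal means. It follows the general outline of the algorithm as I understand it (shift both samples to a common mean, resample each group, compare a studentized difference with the observed one); it is not a transcription of the book. The data, function names, B and the seed are made up for illustration.

```python
import random
import statistics
from math import sqrt

def t_stat(z, y):
    """Studentized difference of means (unequal variances allowed)."""
    se = sqrt(statistics.variance(z) / len(z)
              + statistics.variance(y) / len(y))
    return (statistics.fmean(z) - statistics.fmean(y)) / se

def bootstrap_asl(z, y, B=2000, seed=0):
    """Return (t_obs, one-sided ASL, two-sided ASL)."""
    rng = random.Random(seed)
    t_obs = t_stat(z, y)
    pooled_mean = statistics.fmean(z + y)
    # Shift each sample so both share the pooled mean (the null).
    z0 = [v - statistics.fmean(z) + pooled_mean for v in z]
    y0 = [v - statistics.fmean(y) + pooled_mean for v in y]
    upper = both = 0
    for _ in range(B):
        zb = rng.choices(z0, k=len(z0))
        yb = rng.choices(y0, k=len(y0))
        try:
            t_b = t_stat(zb, yb)
        except ZeroDivisionError:
            continue  # degenerate resample with zero variance; not counted
        upper += t_b >= t_obs            # one-sided: #{t* >= t_obs}
        both += abs(t_b) >= abs(t_obs)   # two-sided: #{|t*| >= |t_obs|}
    return t_obs, upper / B, both / B

# Made-up scores for two groups.
z = [12.1, 14.3, 11.8, 15.2, 13.7, 16.0, 12.9, 14.8]
y = [11.2, 12.5, 10.9, 13.1, 12.0, 11.7, 13.4, 10.5]
t_obs, asl_one, asl_two = bootstrap_asl(z, y)
print(t_obs, asl_one, asl_two)
```

Counting |t*| >= |t_obs| looks at both tails of the null distribution at once, so the result no longer depends on the order in which the two means are subtracted.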
canonical R and Mancova
Re: previous discussion. My old computer program MANOVA has a built-in test of parallelism in multivariate ANCOVA. It's really standard multivariate regression theory, although it isn't widely known. (T. W. Anderson gave MANOVA and CanR as two different eigenproblems.) They are easily shown to be equivalent, with the same vectors and with eigenvalues related by R^2 = L/(1+L). The basic theory is discussed in Bock's multivariate text. Multivariate regression IS canonical correlation, just as multiple regression IS multiple correlation. The correlations are the natural measures of association. Add an ANOVA or MANOVA structure and you get ANCOVA or MANCOVA. The statistical tests are all the standard tests involving eigenvalues. Partial correlations can be generalized also. See my papers:
Cramer, E. M. (1974). Brief report: The distribution of partial correlations and generalizations. Multivariate Behavioral Research, 9, 119-122.
Cramer, E. M. (1973). Note: A simple derivation of the canonical correlation equations. Biometrics, 29, 379-380.
If you write the equations for MANCOVA you see immediately that it involves multivariate regression equations for different subgroups, and that the test of parallelism is in fact a test of equality of slopes for different canonical correlation problems.
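The eigenvalue relation quoted above, R^2 = L/(1+L), and its inverse L = R^2/(1-R^2) are easy to tabulate. In the sketch below the function names are hypothetical, and reading L as a MANOVA-type eigenvalue (hypothesis relative to error) is my assumption; the message itself does not define L.

```python
def canonical_r2_from_eigenvalue(L):
    """Squared canonical correlation from eigenvalue L: R^2 = L/(1+L)."""
    return L / (1.0 + L)

def eigenvalue_from_canonical_r2(r2):
    """Inverse relation: L = R^2/(1 - R^2), defined for R^2 < 1."""
    return r2 / (1.0 - r2)

for L in (0.25, 1.0, 4.0):
    r2 = canonical_r2_from_eigenvalue(L)
    print(L, r2, eigenvalue_from_canonical_r2(r2))
```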
citation: orthog vs Cholesky
While cleaning my office I found a 1973 paper by Golub and Styan which says "the matrix X'X is greatly influenced by roundoff errors and is often ill-conditioned ... An excellent way of solving (the LS equations) is through an orthogonal triangular decomposition of X." At a training session, (with some trepidation) I challenged C. R. Rao on the best way to solve the LS equations; he said it was reparameterization. Rao said that Golub would be present the next day to tell us the best way; he did, and Rao graciously conceded that he was wrong. It just shows how slowly good computational techniques are accepted in statistics. For the record, my MANOVA program used orthogonalization following Bock's methodology in 1964.
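As a small illustration of solving least squares through an orthogonal triangular decomposition of X rather than by forming X'X, here is a sketch using modified Gram-Schmidt. This particular variant is my choice for brevity; it is not necessarily the algorithm in the paper or in the MANOVA program mentioned above, and production code would normally call a library QR routine. The function name and the toy data are made up, and X is assumed to have full column rank.

```python
import math

def qr_least_squares(X, y):
    """Solve min ||X b - y|| via X = QR (modified Gram-Schmidt).

    X is a list of rows with full column rank; X'X is never formed.
    """
    n, p = len(X), len(X[0])
    cols = [[X[i][j] for i in range(n)] for j in range(p)]
    Q = []                                   # orthonormal columns
    R = [[0.0] * p for _ in range(p)]        # upper triangular factor
    for j in range(p):
        v = cols[j][:]
        for k in range(j):
            # Project out earlier directions from the updated vector.
            R[k][j] = sum(Q[k][i] * v[i] for i in range(n))
            v = [v[i] - R[k][j] * Q[k][i] for i in range(n)]
        R[j][j] = math.sqrt(sum(a * a for a in v))
        Q.append([a / R[j][j] for a in v])
    qty = [sum(Q[j][i] * y[i] for i in range(n)) for j in range(p)]
    # Back-substitution on the triangular system R b = Q'y.
    b = [0.0] * p
    for j in reversed(range(p)):
        s = qty[j] - sum(R[j][k] * b[k] for k in range(j + 1, p))
        b[j] = s / R[j][j]
    return b

# Toy data lying exactly on y = 1 + 2x.
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]]
y = [1.0, 3.0, 5.0]
beta = qr_least_squares(X, y)
print(beta)
```

Working on X directly avoids squaring its condition number, which is the numerical point the quoted passage makes about X'X.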
Venn Diagrams
The program I would choose to accomplish this is Visio. There are various versions of this program, ranging from a simple drawing package to a version that will accommodate CAD. It is a Microsoft product that has made drawing so much easier for those of us who do not wish to struggle with CAD. Pamela Auburn, PhD
Min n CFA clarification
Dear Marianne, this example was extracted from LISREL's manual (Structural Equation Modeling with the SIMPLIS Command Language), page 53: "(...) There are three sets of parameters in the model: (1) the four factor loadings corresponding to the paths from Verbal and Math to the observed variables, (2) the correlation between Verbal and Math, and (3) the four error variances of the observed variables." I hope this helps. Best regards, Alexandre Moura. P.S. It may help to post your message on SEMNET. http://bama.ua.edu/cgi-bin/wa?A0=semnetD=1H=0O=DT=1
- Original Message - From: Marianne and Dimitrios To: [EMAIL PROTECTED] Sent: Saturday, August 18, 2001 12:49 PM Subject: Min n CFA clarification
Let me clarify my previous posting. I want to do a confirmatory factor analysis to validate a questionnaire. There are 45 questions (subjects answer using a 1-5 scale). Theoretically, there are 3 subscales with 15 items on each. In a CFA, that gives me 3 factors, 45 error terms, 14 factor loadings on each of 3 factors, and 3 covariances. I figure that gives me 93 parameters. That's the part I need somebody to verify for me. Have I counted the number of parameters correctly? If so, then I should have at least a 10:1 and at best a 20:1 ratio of subjects to parameters. Hence, the estimate of 930-1860 subjects. Is this correct? Thanks to all who are helping me with this. Marianne
Re: Venn diagram program?
I have read some of the Venn diagram e-mails recently. If we restrict the sets to be represented by (perfect) circles only, it may happen that for certain situations the circle-only Venn diagram does not exist.
For 2 sets, the circle-version Venn diagram always exists. First you draw 2 circles, with areas equal to the sizes of the 2 given sets. Then you adjust the distance between the 2 centers, and sooner or later the common area will be equal to the proper intersection. And that's it.
For 3 sets, you can do the same thing pairwise. First you do 2 sets, say A and B, as before. Then you do A and C. Here C lies along SOME direction on the plane. Adjust C so that the common area between A and C is equal to the proper intersection. Finally, consider a circle C' and do the same for B and C', with C' along some direction on the plane. In this construction, we let C and C' have the same area, and let the intersection of B and C' have the same area as the required intersection of B and C. For the 3-circle Venn diagram to exist, we need C and C' to coincide. This means the center must lie at the fixed distance(A,C) from A and at the fixed distance(B,C') from B, with C=C'. There are only 2 points (at most) on the plane that satisfy the above condition. But if C exists, then the common part of A, B and C is fixed, i.e., not free. This means that for certain A, B and C (with all 8 areas pre-determined), the proper circle-based Venn diagram does not exist. Min-Te Chao
- Original Message - From: Tom Johnson [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Saturday, August 18, 2001 2:26 AM Subject: Re: Venn diagram program?
Yes, I am using PowerPoint now. It's harder than it sounds, because one must calculate the radii that give appropriately scaled circle areas; and one can only guess how close to move the circles to give the correct overlap area.
I use rectangular areas in PowerPoint. That makes it easy to get the proportions you want. It of course does not help with the problem of observers incorrectly perceiving relative sizes. Therefore, if the relative size is important, I label the parts. TJ
Tom Johnson [EMAIL PROTECTED] tel: (919) 515 4620 fax: (919) 515 1824 Box 8109 4336 Nelson Hall North Carolina State University Raleigh, NC 27695-8109
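The two-circle step described in this thread (pick radii from the set sizes, then adjust the distance between centers until the common area equals the required intersection) can be done numerically instead of by guessing. The sketch below uses the standard lens-area formula for two overlapping circles and a bisection search; the function names and example areas are made up for illustration.

```python
import math

def radius_for_area(area):
    """Radius of a circle with the given area."""
    return math.sqrt(area / math.pi)

def overlap_area(r1, r2, d):
    """Area common to two circles with radii r1, r2 and centers d apart."""
    if d >= r1 + r2:
        return 0.0                              # disjoint circles
    if d <= abs(r1 - r2):
        return math.pi * min(r1, r2) ** 2       # one circle inside the other
    a1 = r1 * r1 * math.acos((d * d + r1 * r1 - r2 * r2) / (2 * d * r1))
    a2 = r2 * r2 * math.acos((d * d + r2 * r2 - r1 * r1) / (2 * d * r2))
    tri = 0.5 * math.sqrt((-d + r1 + r2) * (d + r1 - r2)
                          * (d - r1 + r2) * (d + r1 + r2))
    return a1 + a2 - tri

def distance_for_overlap(r1, r2, target, iterations=100):
    """Center distance giving the target overlap, found by bisection.

    The overlap shrinks as the centers move apart, so bisection works.
    """
    if not 0.0 <= target <= math.pi * min(r1, r2) ** 2:
        raise ValueError("target overlap is impossible for these radii")
    lo, hi = abs(r1 - r2), r1 + r2
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if overlap_area(r1, r2, mid) > target:
            lo = mid        # too much overlap: move the centers apart
        else:
            hi = mid
    return (lo + hi) / 2.0

# Example: sets of size 100 and 60 sharing 20 units of area.
rA, rB = radius_for_area(100), radius_for_area(60)
d = distance_for_overlap(rA, rB, 20)
print(rA, rB, d, overlap_area(rA, rB, d))
```

For three sets the same search fixes each pairwise distance, which is why the triple intersection is then determined rather than free, as the message argues.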