Setting bounds for repeatability of scores

2001-08-18 Thread Jonathan Robbins

On 11 August, in a thread about "How to determine adequate samples", Ken
Mintz [EMAIL PROTECTED] wrote:


The std err (se) around the mean is given by the formula:

 se =        sd  / sqrt(n)  (68% conf)
 se = (1.96*sd) / sqrt(n)  (95% conf)
 se = (2.58*sd) / sqrt(n)  (99% conf)

  where sd is the std dev.  Suppose you want to be 99% confident that the
  population avg is within +/-3% of the sample avg.  Then, se = 0.03*avg.
  (We choose 3% or whatever arbitrarily.)  Then the minimum sample size (n)
  is:

 x = (2.58*sd) / (0.03*avg)
 n = x*x


My question is - Is it sensible to use this formula to do something
else?

Situation:  35 'examiners' award a percentage score for the performance
of examinees.  In the course of a year each examiner will see about 500
examinees, perhaps 30 or so in a single session.

I'm using a Microsoft Access database to *enter* the scores awarded (not
to analyse them! - that's to be done using Minitab).  I want to set up a
'rule' in Access that flags when there is less than 99% certainty that a
session is within +/-3% (or some similar arbitrary cut-off) of a
'gold standard'.

(e.g. to compare the session mean for a single examiner with: A, the
grand mean of all that examiner's scores; and B, the grand mean for the
population of examiners.  The basic question is: are A and B within the
+/-3% bounds?  If yes, accept; if no, check and adjust.)
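
A minimal sketch (in Python rather than as an Access rule) of the kind of
check described above; the 3% / 99% cut-offs, the decision rule, and all
names are purely illustrative:

# Sketch: flag a session whose mean falls outside a +/-3% band around a
# 'gold standard' mean, allowing for the 99% margin of error of the session
# mean from the quoted formula.  Function and variable names are illustrative.
import math

def session_within_bounds(scores, gold_mean, rel_tol=0.03, z=2.58):
    n = len(scores)
    mean = sum(scores) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in scores) / (n - 1))
    margin = z * sd / math.sqrt(n)      # 99% margin of error of the session mean
    bound = rel_tol * gold_mean         # the +/-3% band around the gold standard
    return abs(mean - gold_mean) <= bound + margin

# Example: one session of ~30 scores against a gold-standard mean of 62.
# session = [...]   # the 30 or so percentage scores from one session
# if not session_within_bounds(session, 62.0):
#     print("check and adjust")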

Help and advice appreciated.

Jonathan Robbins 

J H Robbins FRSA FRPS posting from Dorset in the UK.





Re: large N, categorical outcomes, significance?

2001-08-18 Thread Donald Burrill

One approach:  (I assume that by residual you mean (O-E)/sqrt(E) for 
each cell of a two-way frequency table, where O=observed frequency and 
E=expected frequency under the null hypothesis).  For the several (or 
the single) largest residual(s), report O and E as proportions (of total 
N).  Express the residual in terms of proportions, which will turn out 
to include N (or its square root) as a factor.  Show that the residual 
can be whatever it was (105.6, say) only if N is as large as it is in 
your dataset, and that the same proportions for some smaller (more 
reasonable?) N would _not_ produce a significant residual.

For purposes of this exercise, you could express the total chi-square 
in terms of proportions and N, and show that for the observed proportions 
only values of N larger than some value would produce a significant 
result;  or you could take, for any single cell, a critical value for 
chi-square with one d.f.  
 (One could argue for d.f. = (r-1)(c-1)/(rc), since the table has rc 
cells but only (r-1)(c-1) d.f., but 1 d.f. is arguably conservative, 
and finding critical values for fractional d.f. may be difficult.) 
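
A small numerical sketch of that argument (the proportions are illustrative
only, not from the original posting):

# Sketch: the standardized residual (O-E)/sqrt(E) rewritten in proportions,
# so that N appears explicitly as a sqrt(N) factor.
import math

def residual_from_proportions(p_obs, p_exp, n):
    # (O - E)/sqrt(E) = sqrt(N) * (p_obs - p_exp) / sqrt(p_exp)
    return math.sqrt(n) * (p_obs - p_exp) / math.sqrt(p_exp)

def min_n_for_significance(p_obs, p_exp, crit=3.84):
    # Smallest N at which the squared residual reaches a 1-d.f. critical
    # value (3.84 is the 5% point of chi-square on 1 d.f.).
    return crit * p_exp / (p_obs - p_exp) ** 2

# Illustrative cell: observed proportion 0.53 vs expected 0.50.
print(residual_from_proportions(0.53, 0.50, 9000))  # ~4.0; its square ~16 > 3.84
print(min_n_for_significance(0.53, 0.50))           # ~2133; below this N the same
                                                    # proportions are not significant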

On 17 Aug 2001, JDriscoll wrote:

 I have a large dataset (N can be 2,000-9,000) with
 mostly categorical outcome variables.  Any
 chi square is significant, with residuals of 100+
 for tiny differences.  I know one can determine
 effect size for continuous variables and show that a
 result is significant only due to the size of N, but... how
 do I do this for categorical outcome variables?
 Thanks!

 
 Donald F. Burrill [EMAIL PROTECTED]
 184 Nashua Road, Bedford, NH 03110  603-471-7128






Min n CFA clarification

2001-08-18 Thread Marianne and Dimitrios

Let me clarify my previous posting. I want to do a confirmatory factor
analysis to validate a questionnaire. There are 45 questions (subjects
answer using a 1-5 scale). Theoretically, there are 3 subscales with
15 items on each.   In a CFA, that gives me 3 factors, 45 error terms,
14 factor loadings on each of 3 factors, and 3 covariances. I figure
that gives me 93 parameters. That's the part I need somebody to verify
for me. Have I counted the number of parameters correctly?  If so,
then I should have at least a 1:10 and at best a 1:20 ratio of
subjects to parameters. Hence, the estimate of 930-1860 subjects. Is
this correct?
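
A short sketch of that count (the split into 42 free loadings plus 3 factor
variances assumes one loading per factor is fixed for scaling, which is my
reading of your description, so treat it as something to verify):

# Sketch of the parameter count for 3 factors x 15 items, 45 items in all.
n_items, n_factors, items_per_factor = 45, 3, 15

error_variances    = n_items                              # 45
free_loadings      = n_factors * (items_per_factor - 1)   # 14 per factor = 42
factor_variances   = n_factors                            # 3
factor_covariances = n_factors * (n_factors - 1) // 2     # 3

n_params = error_variances + free_loadings + factor_variances + factor_covariances
print(n_params)                      # 93, matching the count in the posting

# Rule-of-thumb sample sizes at 10 and 20 subjects per parameter
print(10 * n_params, 20 * n_params)  # 930  1860

# For reference: distinct variances/covariances among 45 items, and model d.f.
moments = n_items * (n_items + 1) // 2
print(moments, moments - n_params)   # 1035  942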

Thanks to all who are helping me with this.

Marianne





bootstrap hypothesis testing

2001-08-18 Thread Joseph Horton

Hi: I am new to the list and have a question about bootstrap hypothesis testing.  I am
testing the equality of two means according to Algorithm 16.2 in An Introduction to
the Bootstrap by Efron and Tibshirani (1993).  They define the estimated ASL as
#{t(x*b) >= t_obs}/B.  It seems to me that this is a one-sided estimated ASL.  I can
easily determine the significance or lack of significance by changing the order of the
means I subtract.

My question is: why is the ASL defined with >= (one tail only)?  Why would I not wish
to examine both ends of the null distribution?
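
A rough sketch of the one-sided versus two-sided counts side by side (the
translate-both-samples-to-the-combined-mean step is my reading of Algorithm
16.2, so check it against the book; all names are mine):

# Sketch of a bootstrap test of equal means with one- and two-sided ASL counts.
import random, statistics as st

def t_stat(x, y):
    nx, ny = len(x), len(y)
    se = (st.variance(x) / nx + st.variance(y) / ny) ** 0.5
    return (st.mean(x) - st.mean(y)) / se

def bootstrap_asl(x, y, B=2000, two_sided=False, seed=1):
    random.seed(seed)
    t_obs = t_stat(x, y)
    grand = st.mean(list(x) + list(y))
    xs = [v - st.mean(x) + grand for v in x]   # impose the null: a common mean
    ys = [v - st.mean(y) + grand for v in y]
    count = 0
    for _ in range(B):
        xb = random.choices(xs, k=len(xs))     # resample each shifted sample
        yb = random.choices(ys, k=len(ys))
        tb = t_stat(xb, yb)
        if two_sided:
            count += abs(tb) >= abs(t_obs)     # both tails of the null distribution
        else:
            count += tb >= t_obs               # one tail, as in the book's count
    return count / B

# x, y = [...], [...]
# print(bootstrap_asl(x, y), bootstrap_asl(x, y, two_sided=True))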

Thanks for your help.
Joe

Joe Horton
Psychology and Social Sciences Department
7373 Admiral Peary Highway
Mount Aloysius College
Cresson, PA  16630

(814) 886-6437
[EMAIL PROTECTED]






canonical R and Mancova

2001-08-18 Thread Elliot Cramer

Re the previous discussion:

My old computer program MANOVA has a built-in test of parallelism in
multivariate ANCOVA.  It's really standard multivariate regression theory,
although it isn't widely known.  (T. W. Anderson gave MANOVA and CanR as two
different eigenproblems.)
They are easily shown to be equivalent, with the same vectors and with
eigenvalues related by

R^2 = L/(1+L)

The basic theory is discussed in Bock's multivariate text.
Multivariate regression IS canonical correlation, just as multiple
regression IS multiple correlation.  The correlations are the natural
measures of association.  Add an ANOVA or MANOVA structure and you get
ANCOVA or MANCOVA.  The statistical tests are all the standard tests
involving eigenvalues.  Partial correlations can be generalized also.
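
A short numerical check of the R^2 = L/(1+L) relation on simulated data
(only a sketch; H and E here are the regression and residual SSCP matrices,
and this is not the MANOVA program mentioned above):

# Check that the canonical-correlation and MANOVA eigenproblems have the
# same roots, related by R^2 = L / (1 + L).  Simulated data.
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 200, 4, 3
X = rng.standard_normal((n, p))
Y = X @ rng.standard_normal((p, q)) + rng.standard_normal((n, q))
X = X - X.mean(0)
Y = Y - Y.mean(0)

Sxx, Syy, Sxy = X.T @ X, Y.T @ Y, X.T @ Y
H = Sxy.T @ np.linalg.solve(Sxx, Sxy)   # "hypothesis" (regression) SSCP
E = Syy - H                             # "error" (residual) SSCP

R2 = np.sort(np.linalg.eigvals(np.linalg.solve(Syy, H)).real)[::-1]  # CanR^2
L  = np.sort(np.linalg.eigvals(np.linalg.solve(E,   H)).real)[::-1]  # MANOVA roots

print(np.allclose(R2, L / (1 + L)))     # True: same roots after the transformation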

See my papers
Cramer, E. M. (1974).  Brief report:  The distribution of partial
correlations and generalizations.  Multivariate Behavioral Research, 9,
119-122.

Cramer, E. M. (1973).  Note:  A simple derivation of the canonical
correlation equations.  Biometrics, 29, 379-380.

If you write the equations for MANCOVA, you see immediately that it
involves multivariate regression equations for different subgroups, and
that the test of parallelism is in fact a test of equality of slopes
across different canonical correlation problems.






citation: orthog vs Cholesky

2001-08-18 Thread Elliot Cramer

While cleaning my office I found a 1973 paper by Golub and Styan which
says:

"the matrix X'X is greatly influenced by roundoff errors and is often
ill-conditioned ... An excellent way of solving (the LS equations) is
through an orthogonal-triangular decomposition of X."

At a training session I challenged C. R. Rao (with some trepidation) on
the best way to solve the LS equations; he said it was reparameterization.
Rao said that Golub would be present the next day to tell us the best way;
he did, and Rao graciously conceded that he was wrong.

It just shows how slow good computational techniques are to be accepted in
statistics.  For the record, my MANOVA program used orthogonalization
following Bock's methodology in 1964.
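
As a small illustration of the Golub-Styan point, a sketch with a
deliberately near-collinear simulated design (not from the paper; numpy's
qr and lstsq stand in for the orthogonal-triangular route):

# Solving least squares via the normal equations X'X b = X'y versus via a QR
# decomposition of X, on an ill-conditioned design.  Forming X'X squares the
# condition number, which is where the accuracy is lost.
import numpy as np

rng = np.random.default_rng(0)
n = 100
t = rng.standard_normal(n)
X = np.column_stack([np.ones(n), t, t + 1e-7 * rng.standard_normal(n)])
y = X @ np.array([1.0, 2.0, 3.0]) + 1e-3 * rng.standard_normal(n)

b_normal = np.linalg.solve(X.T @ X, X.T @ y)    # normal-equations route

Q, R = np.linalg.qr(X)                          # orthogonal-triangular route
b_qr = np.linalg.solve(R, Q.T @ y)

b_ref, *_ = np.linalg.lstsq(X, y, rcond=None)   # SVD-based reference solution

print(np.linalg.cond(X), np.linalg.cond(X.T @ X))   # cond(X'X) ~ cond(X)^2
print(np.abs(b_normal - b_ref).max())               # noticeably off
print(np.abs(b_qr - b_ref).max())                   # much closer to the reference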






Venn Diagrams

2001-08-18 Thread Pam .

The program I would choose to accomplish this is Visio.  There are various
versions of this program, ranging from a simple drawing package to a version
that will accommodate CAD.  It is a Microsoft product that has made drawing so
much easier for those of us who do not wish to struggle with CAD.

Pamela Auburn, PhD








Min n CFA clarification

2001-08-18 Thread Alexandre Moura



Dear Marianne,

This example was extracted from LISREL's manual (Structural Equation
Modeling with the SIMPLIS Command Language), page 53:

"(...) There are three sets of parameters in the model: (1) the four factor
loadings corresponding to the paths from Verbal and Math to the observed
variables, (2) the correlation between Verbal and Math, and (3) the four
error variances of the observed variables."

I hope this helps.

Best regards,
Alexandre Moura.

P.S. It may help to post your message on SEMNET:
http://bama.ua.edu/cgi-bin/wa?A0=semnetD=1H=0O=DT=1




Re: Venn diagram program?

2001-08-18 Thread M. T. Chao

I have read some of the Venn diagram e-mails recently.  If we restrict the
sets to be represented by (perfect) circles only, it may happen that in
certain situations the circles-only Venn diagram does not exist.

For 2 sets, the circle-version Venn diagram always exists.  First you draw 2
circles with areas equal to those of the 2 given sets.  Then you adjust the
distance between the 2 centres, and sooner or later the common area will
equal the proper intersection.  And that's it.
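
To make the "adjust the distance" step concrete, a small sketch (the
circular-lens area formula and the bisection are standard; the names and
tolerance are mine):

# Given two circle areas and a target intersection area, find the centre
# distance whose lens (overlap) area matches the target, by bisection.
import math

def lens_area(r1, r2, d):
    # Area of overlap of two circles with radii r1, r2 and centre distance d.
    if d >= r1 + r2:
        return 0.0                            # disjoint
    if d <= abs(r1 - r2):
        return math.pi * min(r1, r2) ** 2     # one circle inside the other
    a1 = r1 * r1 * math.acos((d * d + r1 * r1 - r2 * r2) / (2 * d * r1))
    a2 = r2 * r2 * math.acos((d * d + r2 * r2 - r1 * r1) / (2 * d * r2))
    tri = 0.5 * math.sqrt((-d + r1 + r2) * (d + r1 - r2) *
                          (d - r1 + r2) * (d + r1 + r2))
    return a1 + a2 - tri

def distance_for_overlap(area1, area2, target, tol=1e-10):
    # Overlap shrinks monotonically as the centres move apart, so bisect on d.
    r1, r2 = math.sqrt(area1 / math.pi), math.sqrt(area2 / math.pi)
    lo, hi = abs(r1 - r2), r1 + r2
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if lens_area(r1, r2, mid) > target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Example: |A| = 10, |B| = 6, target intersection area = 2
d = distance_for_overlap(10, 6, 2)
print(d, lens_area(math.sqrt(10 / math.pi), math.sqrt(6 / math.pi), d))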

For 3 sets, you can do the same thing pairwise.  First you do 2 sets, say A
and B, as before.  Then you do A and C, where C lies along SOME direction in
the plane.  Adjust C so that the common area between A and C equals the
proper intersection.  Finally, consider a set C' and do the same for B and
C', with C' along some direction in the plane.  In this construction, C and
C' have the same area, and the intersection of B and C' has the same area as
the intersection of B and C.

For the 3-circle Venn diagram to exist, we need C and C' to coincide.  This
means

   distance(A,C) = distance(B,C'), and C = C'.

There are at most 2 points in the plane that satisfy this condition.  But if
such a C exists, then the common part of A, B and C is fixed --- i.e., not
free.  This means that for certain A, B and C (with all 8 areas
pre-determined), the proper circle-based Venn diagram does not exist.

Min-Te Chao






- Original Message -
From: Tom Johnson [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Saturday, August 18, 2001 2:26 AM
Subject: Re: Venn diagram program?



   Yes, I am using PowerPoint now.  It's harder than it sounds, because
   one must calculate the radii that give appropriately scaled circle
   areas; and one can only guess how close to move the circles to give
   the correct overlap area.
  
 

 I use rectangular areas in PowerPoint.  That makes it easy to get
 the proportions you want.  It of course does not help with the
 problem of observers incorrectly perceiving relative sizes.
 Therefore, if relative size is important, I label the parts.

 TJ

 


 Tom Johnson
 [EMAIL PROTECTED]
 tel: (919) 515 4620
 fax: (919) 515 1824
 Box 8109
 4336 Nelson Hall
 North Carolina State University
 Raleigh, NC 27695-8109





=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=