Re: Normality in Factor Analysis

2001-06-17 Thread Haytham Siala

 I have checked some of the books but could not find this statement (e.g.,
Using Multivariate Statistics (Tabachnick 1996), Latent Variable Models
(Loehlin 1998), An Easy Guide to Factor Analysis (Kline 1994)).

Can you please give me some example references?  I really need a reference
because I have already conducted a factor analysis on a sample of data
containing some non-normal variables.

- Original Message -
From: "Eric Bohlman" <[EMAIL PROTECTED]>
Newsgroups: sci.stat.consult,sci.stat.edu,sci.stat.math
Sent: Sunday, June 17, 2001 2:08 AM
Subject: Re: Normality in Factor Analysis


> In sci.stat.consult haytham siala <[EMAIL PROTECTED]> wrote:
> > I have a question regarding factor analysis: Is normality an important
> > precondition for using factor analysis?
>
> It's necessary for testing hypotheses about factors extracted by
> Joreskog's maximum-likelihood method.  Otherwise, no.
>
> > If not, are there any books that justify this?
>
> Any book on factor analysis or multivariate statistics in general.
>


"Eric Bohlman" <[EMAIL PROTECTED]> wrote in message
9ggvug$451$[EMAIL PROTECTED]">news:9ggvug$451$[EMAIL PROTECTED]...
> In sci.stat.consult haytham siala <[EMAIL PROTECTED]> wrote:
> > I have a question regarding factor analysis: Is normality an important
> > precondition for using factor analysis?
>
> It's necessary for testing hypotheses about factors extracted by
> Joreskog's maximum-likelihood method.  Otherwise, no.
>
> > If no, are there any books that justify this.
>
> Any book on factor analysis or multivariate statistics in general.
>




=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



meta-analysis

2001-06-17 Thread Marc

I have to summarize the results of some clinical trials.
Unfortunately the reported information is not complete.
The information given in the trials contains:

(1) Mean effect in the treatment group (days of hospitalization)

(2) Mean effect in the control group (days of hospitalization)

(3) Numbers of patients in the control and treatment group

(4) p-values of a t-test (of the difference between treatment
and control)

My question:
How can I calculate the variance of the treatment difference, which I
need to perform a meta-analysis?  Note that the numbers of patients in
the control and treatment groups are not equal.  Is it possible to do it
like this:

s^2 = (difference between contr and treatm in days)^2 / ((1/n1 + 1/n2) * t^2)

How exact would such an approximation be?

I know that meta-analysis is a complex thing and my abilities in
statistics are limited.

Many thanks for your help.

Marc, Germany





cigs & figs

2001-06-17 Thread EugeneGall

On Slate, there is quite a good discussion of the meaning and probabilistic
basis of the statement that 1 in 3 teen smokers will die of cancer.  It is
written by a math professor and is one of the most effective lay discussions
I've seen of the use of probabilities in describing health risks.

http://slate.msn.com/math/01-06-14/math.asp





Re: Normality in Factor Analysis

2001-06-17 Thread Herman Rubin

In article <9gg7ht$qa3$[EMAIL PROTECTED]>,
haytham siala <[EMAIL PROTECTED]> wrote:
>Hi,

>I have a question regarding factor analysis: Is normality an important
>precondition for using factor analysis?

>If not, are there any books that justify this?

Factor analysis is quite robust against non-normality.
The essential factor structure is hardly affected by it
at all, although the representation may become somewhat
sensitive if data-dependent normalizations are used, such
as using correlations rather than covariances, or forcing
a normalization on the covariance matrix of the factors.

Some of this is in my paper with Anderson in the
Proceedings of the Third Berkeley Symposium.  The result
on the asymptotic distribution, not at all difficult to
derive, is in one of my abstracts in _Annals of
Mathematical Statistics_, 1955.  It is basically this:

Suppose the factor model is 

x = \Lambda f + s,

f the common factors and s the specific factors.  Further
suppose that f and s, and also the elements of s, are
uncorrelated, and there is adequate normalization and
smooth identification of the model by the elements of
\Lambda alone.  Now estimate \Lambda, M, the covariance
matrix of f, and S, the diagonal covariance matrix of s.
Assuming the usual assumptions for asymptotic normality of
the sample covariances of the elements of f with s, and of
the pairs of different elements of s, the asymptotic
distribution of the deviations of the estimates of \Lambda
and the SAMPLE values of M and S from their actual values
will be the expected joint normal distribution.  This makes
no assumption about the distribution of M and S about
their expected values, which is the main place where there
is an effect of normality.
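
As a concrete check of the covariance identity behind this (an
illustration only -- the loadings, sample size, and distributions below
are arbitrary), a short Python sketch can simulate x = \Lambda f + s
with deliberately non-normal factors and compare the empirical
covariance of x with \Lambda M \Lambda' + S, which needs only the
uncorrelatedness assumptions above, not normality:

    import numpy as np

    rng = np.random.default_rng(0)
    n, p, k = 200_000, 6, 2

    # Arbitrary illustrative loading matrix \Lambda (p x k).
    Lam = rng.normal(size=(p, k))

    # Deliberately non-normal factors, standardized so that
    # M = cov(f) = I and S = cov(s) = I, all mutually independent.
    f = rng.exponential(size=(n, k)) - 1.0                  # skewed, mean 0, var 1
    s = rng.uniform(-1.0, 1.0, size=(n, p)) * np.sqrt(3.0)  # mean 0, var 1

    x = f @ Lam.T + s                         # x = \Lambda f + s, row by row

    # The implied covariance \Lambda M \Lambda' + S holds with no
    # normality assumption at all:
    implied = Lam @ Lam.T + np.eye(p)
    empirical = np.cov(x, rowvar=False)
    print(np.abs(empirical - implied).max())  # small sampling error only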



-- 
This address is for information only.  I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907-1399
[EMAIL PROTECTED] Phone: (765)494-6054   FAX: (765)494-0558





Re: meta-analysis

2001-06-17 Thread Rich Ulrich

On 17 Jun 2001 04:34:26 -0700, [EMAIL PROTECTED] (Marc)
wrote:

> I have to summarize the results of some clinical trials.
> Unfortunately the reported information is not complete.
> The information given in the trials contain:
> 
> (1) Mean effect in the treatment group (days of hospitalization)
> 
> (2) Mean effect in the control group (days of hospitalization)
> 
> (3) Numbers of patients in the control and treatment group
> 
> (4) p-values of a t-test (of the difference between treatment
> and control)
> My question:
> How can I calculate the variance of the treatment difference, which I
> need to perform a meta-analysis?  Note that the numbers of patients in the

Aren't you going too far?  You said you have to summarize.
Well, summarize.  The difference is in terms of days,
or in terms of percentage increase.

And you have the t-test and p-values.  

You might be right in what you propose, but I think
you are much more likely to produce a useful report 
if you keep it simple.

You are right; meta-analyses are complex.  And a 
majority of the published ones are (in my opinion) awful.
--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: Marijuana

2001-06-17 Thread Rich Ulrich

On 15 Jun 2001 02:04:36 -0700, [EMAIL PROTECTED] (Eamon) wrote:

[ snip, Paul Jones.  About marijuana statistics.]

> 
> Surely this whole research is based upon a false premise. Isn't it
> like saying that 90%, say, of heroin users previously used soft drugs.
> Therefore, soft-drug use usually leads to hard-drug use - which does
> not logically follow. (A => B =/= B => A)
> 
> Conclusions drawn from the set of people who have had heart attacks
> cannot be validly applied to the set of people who smoke dope.
> Rather than collect data from a large number of people who had heart
> attacks and look for a backward link, they should monitor a large
> number of people who smoke dope. But, of course this is much more
> expensive.

It is much more expensive, but it is also totally stupid to carry out
the expensive research if the *cheap* and lousy research didn't
give you a hint that there might be something going on.

The numbers that he was asking about do pass the simple
test.  I mean, there were not 1 million people contributing one
hour each, but we should still ask, *Would*  this say something?
If it would not, then the whole question is *totally*  arid.  The 2x2
table is approximately
(dividing the first column by 100, and subtracting from a total):

      10687    124
        175      9

That gives a contingency test of 21.2 or 18.2, with p-values 
under .001.  The Odds Ratio on that is 4.4.
That is pretty convincing that there is SOMETHING
going on, POSSIBLY something that merits an explanation.  
The expectation for the cell with 9  is just 2.2 -- the tiny cell is
the cell that matters for contributions to the test -- which is why it
is okay to lop the "hundreds"  off the first column (to make it
readable).
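
For anyone who wants to verify the arithmetic, a short scipy sketch
reproduces these figures from the table above (the two chi-square values
appear to be the uncorrected and Yates-corrected statistics):

    from scipy.stats import chi2_contingency

    table = [[10687, 124],
             [  175,   9]]

    chi2_raw, p_raw, _, expected = chi2_contingency(table, correction=False)
    chi2_yates, p_yates, _, _ = chi2_contingency(table, correction=True)

    print(round(chi2_raw, 1), round(chi2_yates, 1))  # 21.2 18.2
    print(p_raw < .001, p_yates < .001)              # True True
    print(round(10687 * 9 / (124 * 175), 1))         # odds ratio: 4.4
    print(round(float(expected[1][1]), 1))           # tiny-cell expectation: 2.2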

Now, you may return to your discussion of why the table is
not any good, and what is needed for a proper test.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: meta-analysis

2001-06-17 Thread Donald Burrill

On 17 Jun 2001, Marc wrote (edited):

> I have to summarize the results of some clinical trials.
> The information given in the trials contain:
> 
> Mean effects (days of hospitalization) in treatment & control groups; 
> numbers of patients in the groups;  p-values of a t-test (of the 
> difference between treatment and control).
> My question:  How can I calculate the variance of the treatment 
> difference, which I need to perform meta-analysis?  Note that the 
> numbers of patients in the groups are not equal.  
> Is it possible to do it like this:
> 
> s^2 = (difference between contr and treatm)^2/ ((1/n1+1/n2)*t^2)

Yes, if you know t.  If all you know is that p < alpha for some alpha, 
then you know only that t > the t corresponding to alpha (AND you need 
to know whether the test had been one-sided or two-sided -- of course, 
you need to know that in any case).  You can then substitute that 
corresponding t to obtain an upper bound on s^2 -- ASSUMING that the t 
was calculated using a pooled variance (your s^2), not the expression 
with separate variances in the denominator:  (s1^2/n1 + s2^2/n2).

Note that this s^2 is NOT "the variance of the treatment difference", 
which you said you wanted to know;  it is the pooled estimate of the 
variance within each group.  
 The variance of the difference in treatment means, which _may_ be what 
you are interested in, would be 

(difference)^2 / t^2 

with the same caveats concerning what you know about t.

> How exact would such an approximation be?

Depends on the precision with which  p  was reported.
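
A small sketch of the whole recipe (Python; the summary numbers are
made up for illustration, and a two-sided test based on a
pooled-variance t is assumed):

    from scipy.stats import t as t_dist

    # Reported summary data (hypothetical values):
    mean_treat, mean_ctrl = 8.1, 10.3    # days of hospitalization
    n1, n2 = 48, 61
    p = 0.03                             # two-sided p-value of the t-test

    df = n1 + n2 - 2
    diff = mean_treat - mean_ctrl

    # Invert the two-sided p-value to recover |t|:
    t_val = t_dist.ppf(1 - p / 2, df)

    # Pooled within-group variance (the s^2 discussed above):
    s2 = diff**2 / ((1/n1 + 1/n2) * t_val**2)

    # Variance of the difference in means, as needed for meta-analysis:
    var_diff = diff**2 / t_val**2

    print(t_val, s2, var_diff)

If only "p < alpha" is reported, plugging in the critical t for alpha
gives an upper bound on both variances, as noted above.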

 
 Donald F. Burrill [EMAIL PROTECTED]
 184 Nashua Road, Bedford, NH 03110  603-471-7128






Re: individual item analysis

2001-06-17 Thread Rich Ulrich

On 15 Jun 2001 14:24:39 -0700, [EMAIL PROTECTED] (Doug
Sawyer) wrote:

> I am trying to locate a journal article or textbook that addresses
> whether or not exam questions can be normalized, when the questions are
> grouped differently.  For example, could a question bank be developed
> where any subset of questions could be selected, and the assembled exam
> is normalized?
> 
> What is name of this area of statistics?  What authors or keywords would
> I use for such a search?  Do you know whether or not this can be done?


I believe that they do this sort of thing in scholastic achievement
tests, as a matter of course.  Isn't that how they make the transition
from year to year?  I guess this would be "norming".

A few weeks ago, I discovered that there is a whole series of
tech-reports put out by one of the big test companies.  I would 
look back to those for this sort of question.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: Factor Analysis

2001-06-17 Thread Ken Reed

It's not really possible to explain this in lay person's terms.  The
difference between principal components analysis and common factor
analysis is roughly that PCA uses raw scores, whereas common factor
analysis uses scores predicted from the other variables and does not
include the residuals.  That's as close to lay terms as I can get.

I have never heard a simple explanation of maximum likelihood estimation,
but -- MLE compares the observed covariance matrix with the covariance
matrix predicted by the model and uses that information to estimate the
factor loadings etc. that would best 'fit' a multivariate normal
distribution.

MLE factor analysis is commonly used in structural equation modelling,
hence Tracey Continelli's conflation of it with SEM.  This is not
correct, though.

I'd love to hear a simple explanation of MLE!
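
For the curious, here is a minimal sketch of the PCA/ML contrast using
scikit-learn, whose FactorAnalysis estimator fits the common-factor
model by maximum likelihood (the iris data is used purely for
illustration):

    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA, FactorAnalysis
    from sklearn.preprocessing import StandardScaler

    X = StandardScaler().fit_transform(load_iris().data)

    # PCA decomposes the total variance, residual variance included.
    pca = PCA(n_components=2).fit(X)

    # Factor analysis fits the common-factor model by maximum likelihood,
    # estimating a separate unique (residual) variance per variable.
    fa = FactorAnalysis(n_components=2).fit(X)

    print(pca.components_)     # principal-component weights
    print(fa.components_)      # ML estimates of the factor loadings
    print(fa.noise_variance_)  # estimated unique variances (residual part)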



> From: [EMAIL PROTECTED] (Tracey Continelli)
> Organization: http://groups.google.com/
> Newsgroups: sci.stat.consult,sci.stat.edu,sci.stat.math
> Date: 15 Jun 2001 20:26:48 -0700
> Subject: Re: Factor Analysis
> 
> Hi there,
> 
> would someone please explain in lay person's terms the difference
> between principal components, common factors, and maximum likelihood
> estimation procedures for factor analyses?
> 
> Should I expect my factors obtained through maximum likelihood
> estimation to be highly correlated?  Why?  When should I use a maximum
> likelihood estimation procedure, and when should I not use it?
> 
> Thanks.
> 
> Rita
> 
> [EMAIL PROTECTED]
> 
> 
> Unlike the other methods, maximum likelihood allows you to estimate
> the entire structural model *simultaneously* [i.e., the effects of
> every independent variable upon every dependent variable in your
> model].  Most other methods only permit you to estimate the model in
> pieces, i.e., as a series of regressions whereby you regress every
> dependent variable upon every independent variable that has an arrow
> directly pointing to it.  Moreover, maximum likelihood actually
> provides a statistical test of significance, unlike many other methods
> which only provide generally accepted cut-off points but not an actual
> test of statistical significance.  There are very few cases in which I
> would use anything except a maximum likelihood approach, which you can
> use in LISREL or, if you use SPSS, via the add-on module AMOS, which
> will do this as well.
> 
> 
> Tracey



=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=