Re: What's the Mahalanobis distance?

2000-04-17 Thread dim.brumath

The Mahalanobis distance (MD) is the distance between each observation and
the mean of the data, taking the covariance among the variables into
account.  For observation i,

    MD(i)^2 = (X(i)-mean(X)) * inverse(S) * transpose(X(i)-mean(X))

where mean(X) is the vector of means of the variables X, X(i) is the row
vector of values of the variables X for observation i, and S is the
variance-covariance matrix of X.
A relation with the hat matrix H is MD(i)^2 = (n-1)*(h(ii) - 1/n), where n
is the number of observations, h(ii) is the i-th diagonal element of the
hat matrix H = X*inverse(transpose(X)*X)*transpose(X), and X here includes
a leading column of ones (the intercept).
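
As a quick numerical check of both formulas, here is a minimal sketch in
Python (mine, not the poster's; the data are simulated and the function
name is illustrative):

    import numpy as np

    def mahalanobis_sq(X):
        # MD(i)^2 = (X(i)-mean(X)) * inverse(S) * transpose(X(i)-mean(X)),
        # computed row-wise; S is the sample variance-covariance matrix.
        X = np.asarray(X, dtype=float)
        diffs = X - X.mean(axis=0)
        S_inv = np.linalg.inv(np.cov(X, rowvar=False))
        return np.einsum('ij,jk,ik->i', diffs, S_inv, diffs)

    # Hat-matrix relation: MD(i)^2 = (n-1)*(h(ii) - 1/n), with the
    # intercept column included when forming H.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(50, 3))
    n = len(X)
    X1 = np.column_stack([np.ones(n), X])
    h = np.diag(X1 @ np.linalg.inv(X1.T @ X1) @ X1.T)
    assert np.allclose(mahalanobis_sq(X), (n - 1) * (h - 1 / n))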

Hope this helps

--
===
Dr SAULEAU Erik-A.
DIM
--
Etablissement Public de Santé Alsace Nord
141, Ave de Strasbourg
67170 Brumath
Tel : 03-88-64-61-81
E-Mail: [EMAIL PROTECTED]
--
Centre Hospitalier d'Erstein
13, Route de Krafft
BP F
67151 Erstein Cedex
E-Mail: [EMAIL PROTECTED]
===
Teo [EMAIL PROTECTED] wrote in message:
[EMAIL PROTECTED]
 Does anyone know what the Mahalanobis distance consists of?
 I have to measure the distance between two histograms...


 Thanks,

Teo.









Re: hyp test:better def

2000-04-17 Thread Juha Puranen


Milo Schield wrote:
 
 I agree with Dennis that students need to be exposed to the use of Bayesian
 priors within the process of teaching classical hypothesis testing.

Using Bayesian priors can be very difficult for some students. (Why do we
take the uniform prior?)

To teach decision making in class, I have made some WWW pages.

If you have Netscape 4 or better (this is not for IE users), please look at
my DHTML pages:

http://noppa5.pc.helsinki.fi/koe/dhtml.html

and there:

Making decisions 1 (Critical value)
Making decisions 2 (Probability)

Some of my students said "it was useful".


Regards 

Juha Puranen


-- 
Juha Puranen
Department of Statistics 
P.O.Box 54 (Unioninkatu 37), 00014 University of Helsinki, Finland
http://noppa5.pc.helsinki.fi





The best effect size

2000-04-17 Thread Robert McGrath

I would appreciate feedback on the following from list members.

I recently participated in a discussion at a conference that revolved around
effect sizes.  The discussion had to do with the clinical value of a set of
predictors based on field studies.  In these studies, the predictors (which
were all dichotomous, positive-negative judgments based on a clinical test)
were related to a series of criteria, some of which were dichotomous and
some of which were quantitative.  Pearson correlations were computed for all
comparisons, and d statistics were also generated for all comparisons
involving quantitative criteria.  An important point to keep in mind was
that the base rates for the predictors (and many of the criteria) were quite
skewed; in general, only about 1 in 15 members of the sample were positive
on any one of the predictors.  These were field studies, so the skew
presumably represents real-world base rates.

The basis for the discussion was the extreme difference in conclusions one
would draw based on whether you computed correlations or d.  Because of the
skew, a d value of .71 (generally considered a "large" effect) translated
into an r value of .15.  A d value of .31 (medium-sized) transformed to an r
value of .07.
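
For concreteness, one common approximation relating d and the point-biserial
r for a dichotomous predictor with base rate p is r = d / sqrt(d^2 + 1/(p*q)).
A minimal sketch (this may not be the exact formula used in the studies
described):

    import math

    def d_to_r(d, p):
        # Point-biserial r implied by Cohen's d at base rate p (q = 1 - p).
        q = 1.0 - p
        return d / math.sqrt(d * d + 1.0 / (p * q))

    print(d_to_r(0.71, 1 / 15))  # ~0.17, near the .15 reported above
    print(d_to_r(0.31, 1 / 15))  # ~0.08, near the .07 reported above
    print(d_to_r(0.71, 0.50))    # ~0.33 with balanced groups, for contrast

The shrinkage of r at a 1-in-15 base rate is exactly the phenomenon under
discussion.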

The discussion that followed focused on which is the "better" effect size
for understanding the usefulness of these predictors.  Some of the key
points raised:

1. r is more useful here for several reasons:
a. It is generally applicable to both the dichotomous and quantitative
criteria.
b. The concept of "proportion of overlapping variance" has more general
usefulness than "mean difference as a proportion of standard deviation."
c. The results of the correlational analyses were more consistent with
the results of significance tests, that is, even with large samples (N >
1000), many of the expected relationships proved to be nonsignificant.

2. d is more useful precisely because it is relatively resistant to the
impact of skew, unless group SDs are markedly different.

3. A third, less important issue was raised in response to point 2.  If
effect size measures that are resistant to skew are more desirable, is there
one that could be applied to both dichotomous and quantitative criteria?  If
not, which would be the "better" effect size measure for dichotomous
criteria:
a. the tetrachoric r: one person recommended this on the grounds that it
is conceptually similar to the Pearson r and therefore more familiar to
researchers.
b. the odds ratio: recommended because it does not require the
distributional assumptions of the tetrachoric r.
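
A small illustration of why the odds ratio resists skewed base rates:
scaling a row or column of the 2x2 table (i.e., changing a base rate)
leaves it unchanged. A minimal sketch with invented counts, not data from
the studies described:

    import numpy as np

    table = np.array([[12, 55],     # predictor positive: criterion +, -
                      [88, 845]])   # predictor negative: criterion +, -
    a, b = table[0]
    c, d = table[1]
    print((a * d) / (b * c))        # odds ratio, ~2.1

    doubled = table * np.array([[1], [2]])   # double the negative row
    a, b = doubled[0]
    c, d = doubled[1]
    print((a * d) / (b * c))        # unchanged, unlike r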

The key issue on which I'd like your input, although please feel free to
comment on any aspect, is this.  Given there is real-world skew in the
occurrence of positives, does r or d present a more accurate picture?
Should we think of these as small or medium-to-large effect sizes?

-

Robert McGrath, Ph.D.
School of Psychology T110A
Fairleigh Dickinson University, Teaneck NJ 07666
voice: 201-692-2445   fax: 201-692-2304







Re: Data Mining

2000-04-17 Thread Rich Ulrich

 ( how did we get to HERE, from Data Mining?)

On 15 Apr 2000 17:50:05 GMT, [EMAIL PROTECTED] (Radford Neal)
wrote:

 In article [EMAIL PROTECTED],
 Rich Ulrich  [EMAIL PROTECTED] wrote:
 
 One thing that remains true about stock investment schemes:  There may
 be some overall growth, somewhere, but in a specific, narrow
 perspective, the whole market makes up a zero-sum game.  If someone
 wins, someone else has to lose.  
 
 The above is internally contradictory, but the final statement is
 clearly false.  

Hey, the final statement is a DEFINITION of zero-sum game.
Where is YOUR mind wandering to?

I have no objection to wise investments, and that is
why I tried to specify a different context,
that is, "schemes."


 Of course, short-term "day trading" is largely a zero-sum game, as the
 return to be expected over such a short time period is very small.

 - much of it only becomes zero-sum, when the time period is LONG.
There are fortunes made on a soaring market.

 - actually, I expect there are a few Wise Guys who will extract most
of the profit,  so techno-stocks will be negative-sum for most
investors.  There is a LONG history like that:  In the 1830s and 1840s
investors poured money into building canals in the U.S. and England.
The countries benefitted from canals; a few manipulators got rich;
most of the companies went broke and most of the investors lost money.
Railroads followed the same pattern in the second half of that
century.  

In the 1910s, the "wireless telegraph" had the investors flocking --
the U.S. government got involved in prosecuting traders for fraudulent
offerings.  But I don't know if that was as big as Railroads, in terms
of dollars.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: hyp testing -Reply

2000-04-17 Thread dennis roberts

At 10:32 AM 4/17/00 -0300, Robert Dawson wrote:

 There's a chapter in J. Utts' mostly wonderful but flawed low-math intro
text "Seeing Through Statistics", in which she does much the same. She
presents a case study based on some of her own work in which she looked at
the question of gender discrimination in pay at her own university, and
fails to reject the null hypothesis [no systemic difference in pay between
male and female faculty]. She heads the example "Important, but not
significant, differences in salaries"; comments (_perhaps_ technically
correctly but misleadingly) that "a statistically naive reader could
conclude that there is no problem" and in closing states:

the flaw here is that ... she has population data i presume ... or about as 
close as one can come to it ... within the institution ... via the budget 
or comptroller's office ... THE salary data are known ... so, whatever 
differences are found ... DEMS are it!

the notion of statistical significance in this case seems IRRELEVANT ... 
the real issue is ... given that there are a variety of factors that might 
account for such differences (numbers in ranks, time in ranks, etc. etc.) 
 is the remaining difference (if there is one) IMPORTANT TO DEAL WITH ...








Re: hyp testing

2000-04-17 Thread Bruce Weaver



On 15 Apr 2000, Donald F. Burrill wrote:

 (2) My second objection is that if the positive-discrete 
   probability is retained for the value "0" (or whatever value the former 
   "no" is held to represent), the distribution of the observed quantity 
   cannot be one of the standard distributions.  (In particular, it is not 
   normal.)  One then has no basis for asserting the probability of error 
   in rejecting the null hypothesis (at least, not by invoking the standard 
   distributions, as computers do, or the standard tables, as humans do 
   when they aren't relying on computers).  Presumably one could derive the 
   sampling distribution in enough detail to handle simple problems, but 
   that still looks like a lot more work than one can imagine most 
   investigators -- psychologists, say -- cheerfully undertaking.
  
  This would not be a problem if the alternative was one-tailed, would it?
 
 Sorry, Bruce, I do not see your point.  How does 1-tailed vs. 2-tailed 
 make a difference in whatever the underlying probability distribution is? 
 

Donald,
It was clear at the time, but now I'm not sure if I can see my
point either!  I think what I was driving at was the idea that a point
null hypothesis is often false a priori.  But if you have a one-tailed
alternative, then you don't have a point null, because the null
encompasses a whole range of values.  For example, if your alternative is
that a treatment improves performance, then the null states that
performance remains the same or worsens as a result of the treatment.  It
seems that this kind of null hypothesis certainly can be true.  And I
think it is perfectly legitimate to use the appropriate continuous
distribution (e.g., t-distribution) in carrying out a test.  Or am I
missing something? 

Cheers,
Bruce






Re: hyp testing -Reply

2000-04-17 Thread Robert Dawson


- Original Message -
From: dennis roberts
 At 10:32 AM 4/17/00 -0300, Robert Dawson wrote:

  There's a chapter in J. Utts' mostly wonderful but flawed low-math intro
 text "Seeing Through Statistics", in which she does much the same. She
 presents a case study based on some of her own work in which she looked at
 the question of gender discrimination in pay at her own university, and
 fails to reject the null hypothesis [no systemic difference in pay between
 male and female faculty]. She heads the example "Important, but not
 significant, differences in salaries"; comments (_perhaps_ technically
 correctly but misleadingly) that "a statistically naive reader could
 conclude that there is no problem" and in closing states:

and Dennis Roberts replied:

 the flaw here is that ... she has population data i presume ... or about as
 close as one can come to it ... within the institution ... via the budget
 or comptroller's office ... THE salary data are known ... so, whatever
 differences are found ... DEMS are it!

 the notion of statistical significance in this case seems IRRELEVANT ...
 the real issue is ... given that there are a variety of factors that might
 account for such differences (numbers in ranks, time in ranks, etc. etc.)
  is the remaining difference (if there is one) IMPORTANT TO DEAL WITH ...


If one can totally explain all contributing factors, so that a model
with significantly fewer parameters than there are faculty fits everybody to
within a practically significant margin of error, then yes, either the model
continues to work with gender removed or it doesn't.

If, on the other hand, there are unknown sources of variation (a
reasonable assumption in any situation involving people), or more sources of
variation than there are data (another good bet if one thought hard enough),
one cannot automatically go from the observation

(*)  "The average pay of female faculty members here is less than that of
male faculty members"

to the apparently desired conclusion

(**)  "There is a gender-based _pattern_ of discrimination in faculty
salaries"

without considering the study as a pseudo-experiment, and analyzing it as
such.  One would be trying to decide: is the difference between mean male
and female faculty salaries greater than one would expect if one took N1
males and N2 females and assigned factors such as experience, rank,
skill/luck at negotiating a first contract, demand for specialties, and merit
pay actually deserved [as opposed to given on a gender basis], etc. at
random?
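
A minimal sketch of that pseudo-experiment logic as a label-permutation test
(my illustration of the framing above, not the analysis Utts and her
coauthors actually ran; it permutes gender labels rather than re-assigning
the listed factors, and the names are hypothetical):

    import numpy as np

    def permutation_p(salary, is_female, n_perm=10_000, seed=0):
        # One-sided: how often does a random relabelling of gender give a
        # male-minus-female mean gap at least as large as the observed one?
        rng = np.random.default_rng(seed)
        salary = np.asarray(salary, dtype=float)
        is_female = np.asarray(is_female, dtype=bool)
        observed = salary[~is_female].mean() - salary[is_female].mean()
        hits = 0
        for _ in range(n_perm):
            perm = rng.permutation(is_female)
            hits += salary[~perm].mean() - salary[perm].mean() >= observed
        return hits / n_perm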

This is what Utts and her coauthors were, it seems, trying to do.
However, when the tests were not significant at the chosen level they seem
to have fallen back on inferring (**) directly from (*).

-Robert Dawson






Re: cluster analysis in one-dimensional circular space

2000-04-17 Thread Rich Strauss

Since clustering methods begin with pairwise distances among observations,
why not measure these distances as minimum arc-lengths along the
best-fitting circle (or min chord lengths, or min angular deviations with
respect to the centroid, etc)?  This is how geographic distances are
measured (in 2 dimensions, rather than one) and clustered, and also how
distances are measured among observations in Kendall's shape spaces (e.g.,
Procrustes distances), so there's a well-established literature.
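
A minimal sketch of the arc-length suggestion for the one-dimensional
circular case (mine, not Strauss's code; angles are assumed to be in
radians already):

    import numpy as np
    from scipy.spatial.distance import squareform
    from scipy.cluster.hierarchy import linkage

    def circular_distances(theta):
        # Pairwise minimum arc-lengths on the unit circle: take whichever
        # way around is shorter.
        theta = np.asarray(theta, dtype=float)
        diff = np.abs(theta[:, None] - theta[None, :])
        return np.minimum(diff, 2 * np.pi - diff)

    theta = np.random.default_rng(1).uniform(0, 2 * np.pi, 20)
    D = circular_distances(theta)
    Z = linkage(squareform(D, checks=False), method='average')

Any clustering routine that accepts a precomputed distance matrix will work
from there.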

Rich Strauss

At 05:32 PM 4/14/00 +0200, you wrote:
Hi everybody.
I face the problem of clustering one-dimensional data that can range in a
circular way. Does anybody know the best way to solve this problem without
the aid of an additional variable? Using a suitable trigonometric
transform? Using an ad hoc metric?
Thanks.

Carl




 


Dr Richard E Strauss
Biological Sciences  
Texas Tech University   
Lubbock TX 79409-3131

Email: [EMAIL PROTECTED]
Phone: 806-742-2719
Fax: 806-742-2963 






Re: hyp testing -Reply

2000-04-17 Thread ssolla

Response embedded within message:

In article [EMAIL PROTECTED],
  [EMAIL PROTECTED] wrote:

 The way this world is ---
A master's candidate, or a phD candidate, or a professor,
or a working scientist, has put a lot into his project.
In terms of time, in terms of money, and more important
still, in terms of emotional commitment, (S)he has lived
with this project for two years or more.

 That is a source of subjective bias:  (S)he WANTS the data to
 show something, preferably to support the original idea behind
 the research, but even failing that, to show something.

 There needs to be an objective brake on this wish.  An hypothesis
 test is that brake.  NOT rejecting the null hypothesis means
 that the data has no information (about whatever aspect of the
 data the test was designed to look at),  STOP THERE; go no
 further.

I hope not to get too off topic here, but sometimes the failure to
reject the null hypothesis has more implications than successfully
rejecting it. I understand your point here, and certainly have seen it
happen both personally and in the literature. However, as long as the
experiment has a sufficient sample size to detect a meaningful effect
(not necessarily just a null of an effect size of zero), then there is
something to say. For example, the literature has been overflowing with
reports of "estrogenic compounds" such as DDT/DDE that affect sexual
development of exposed animals. If someone found that DDE has little
ability to competitively bind to estrogen receptors (which someone has
found), at least to an extent necessary to elicit strong estrogenic
activity, this would not only mean that the null hypothesis that DDE is
estrogenic was rejected, but that something ELSE must be happening; i.e.,
that the known alterations to sexual development after exposure to DDE
are not due to estrogenic activity. I am sure that this sort of thing must
be happening in other fields.


 Without some objective brake, the master's student, etc. will
 go ahead to say something about the data, even when the test
 would have told her(im) there is nothing to say.

Failure to reject null hypotheses that have been "successfully rejected"
in numerous previous experiments, and thus are generally accepted by the
scientific community at large, can have big implications, even if the
alternative explanations were not tested and thus remain unknown. It may
not happen often, but failure to reject a null hypothesis, particularly
one that was expected to be rejected, may indicate a poorly executed
study, or it may signal that the underlying theory on which the
experiment is based is wrong. That alone is valuable.

Shane de Solla
[EMAIL PROTECTED]

snip







Re: hyp testing -Reply

2000-04-17 Thread dennis roberts

At 08:07 PM 4/17/00, Charles D Madewell wrote:

As a working engineer and part time graduate student I do not even
understand why anyone would want to do away with hypothesis testing.
I have spent many, many hours of my graduate school life learning,
reading, calculating, and analyzing using hypothesis tests.
Hypothesis testing is not bad.  It is errors in designing the
experiment that are bad and this comes from PEOPLE not the math.  What
is the fuss?  Are you guys telling me that all of this knowledge I am
being taught will be worthless?  Come on, find something else to say

some of us find it very difficult ... given how we learned/or were taught a 
subject matter ... AND how we have been practicing it for dozens and dozens 
of years ... to come to the realization that perhaps ... what we have been 
taught ... and what we have practiced ... is disproportional to its benefit 
and utility ...

if we take all the courses that teach (particularly at the more 
introductory levels) statistical material ... and try to establish some 
percent of that that deals with hypothesis testing and related matters ... 
VERSUS time spent on other things ... and then ask: is all that time worth 
the investment of energy?

i think the answer is clearly no ...

but, we are so slow to change ... if we change at all ...

i grew up like that ... and have spent all these years teaching that (have 
to fill those students with sufficient statistical info) ... but, the 
reality is: hypothesis testing the way we do it ... has limited utility ... 
and is overblown to the nth degree

now, that does not mean it is not important ... it is ... just not nearly 
as important as our expenditure of time suggests ... for us AND for students

sure, design is much more important than inferential statistics ... but we
have to share some of the blame ... when we push it so ... and as the ONLY 
way to go about things ... this is not only using our time unwisely ... but 
also doing a disservice to students ...






Finding statistical significance between 2 groups with categorical variables

2000-04-17 Thread einsetein

We have 2 groups reporting on the major problem for
attending college.  We are trying to see if the different
response numbers are statistically significant.  For example,
one group responded with an answer 13 times and the other
group 32 times.  Since a chi-square takes the mean of the
two, it doesn't show how to compare one to the other.  We
have no expected mean since these are opinions.  Do we have
to assume that equal %s of people will choose each category,
even though we categorized them after reading the answers on
the surveys?  Thank you for your time.
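
A minimal sketch of the comparison the question seems to be after: a
chi-square test of homogeneity on the two-groups-by-categories table of
counts. The expected counts come from the pooled margins, so no assumption
that equal percentages choose each category is required. Only the 13 and 32
are from the question; the other counts are invented:

    import numpy as np
    from scipy.stats import chi2_contingency

    table = np.array([[13, 40, 27],    # group 1's counts per category
                      [32, 35, 20]])   # group 2's counts per category
    chi2, p, dof, expected = chi2_contingency(table)
    print(chi2, p)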

einstein







Re: split half reliability

2000-04-17 Thread dennis roberts

At 04:26 PM 4/17/00 -0500, Paul R Swank wrote:
I disagree with the statement that the split-half reliability coefficient
is of no use anymore. Coefficient alpha, while being an excellent estimator
of reliability, does have one rather stringent requirement. The items must
be homogeneous.

i don't ever seem to recall that coefficient alpha ... REQUIRES homogeneous
content ... but rather, the SIZE of it will be impacted BY item homogeneity








Re: hyp testing -Reply

2000-04-17 Thread Joe Ward

Hi, Robert and all --

Yes, there occasionally were discussions in our Air Force research about
whether or not we were working with the POPULATION or a SAMPLE.

As Dennis comments:
| 
|  the flaw here is that ... she has population data i presume ... or about
| as
|  close as one can come to it ... within the institution ... via the budget
|  or comptroller's office ... THE salary data are known ... so, whatever
|  differences are found ... DEMS are it!
| 

One of my Professors used to use the Invertebrate Paleontologists as his
example of a POPULATION.  I think at that time there were fewer than 20
people who were Invertebrate Paleontologists.

-- Joe
 
* Joe Ward  Health Careers High School *
* 167 East Arrowhead Dr 4646 Hamilton Wolfe*
* San Antonio, TX 78228-2402San Antonio, TX 78229  *
* Phone: 210-433-6575   Phone: 210-617-5400*
* Fax: 210-433-2828 Fax: 210-617-5423  *
* [EMAIL PROTECTED]*
* http://www.ijoa.org/joeward/wardindex.html   *




- Original Message - 
From: Robert Dawson [EMAIL PROTECTED]
To: dennis roberts [EMAIL PROTECTED]; [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Monday, April 17, 2000 9:54 AM
Subject: Re: hyp testing -Reply


| [Robert Dawson's message quoted in full -- snipped; see the message above]

Re: split half reliability

2000-04-17 Thread Paul Gardner

Paul R Swank wrote:
 
 I disagree with the statement that the split-half reliability coefficient
 is of no use anymore. Coefficient alpha, while being an excellent estimator
 of reliability, does have one rather stringent requirement. The items must
 be homogeneous. This is not always the case with many kinds of scales, nor
 should it be. In many cases homogeneity of item content may lead to reduced
 validity if the construct is too narrowly defined. Screening measures often
 have this problem. They need to be short but they also need to be broad in
 scope. Internal consistency for such scales would suffer but a split half
 procedure, which is much less sensitive to item homogeneity, would fit the
 bill nicely.

I have four responses to this:

1. Split-half requires the items to be divided into two "equal" halves. 
How is this to be done?  Odd/even?  First half/second half?  Randomly?
Cronbach's alpha does not depend on this arbitrary division into halves.

2. Stanley and Hopkins (1972) demonstrated that Cronbach's alpha is
essentially equivalent to the "mean of all possible split-half
reliability estimates". DeVellis (1991) demonstrates that if the items
in a scale have similar variances (a condition frequently met in
well-designed scales), the value of alpha (called standardised alpha)
is algebraically equivalent to the Spearman-Brown formula for
estimating split-half reliability.  In other words, there is no great
difference conceptually between the two (a numerical sketch follows
after this list).

3.  Many writers use the term 'homogeneity' to bolster arguments in
discussions of reliability and validity.  In a paper I have completed
recently which is currently under review for publication, I show that
the term has about six different meanings in the literature.  Whenever I
read the word now, I respond, What exactly does the writer mean by
homogeneity here?

4.  If, by homogeneity, you mean all the items are measuring a similar
construct, i.e. the item scores all inter-correlate with each other
because they are indicators of a unidimensional construct, then the
assertion that Cronbach's alpha depends on this being the case is
demonstrably untrue.  Cronbach's alpha will be high as long as every
item in a scale correlates well with at least some other items, but not
necessarily all of them. Homogeneity is not a "stringent requirement"
for a high Cronbach alpha level at all.  Cronbach's alpha is simply a
measure of reliability;  it is not an indicator of unidimensionality, a
point widely misunderstood in the literature.
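
To make point 2 concrete, here is a minimal numerical sketch (mine, not
from the papers cited): Cronbach's alpha next to the mean of all
Spearman-Brown split-half estimates, which agree closely when item
variances are similar.

    import numpy as np
    from itertools import combinations

    def cronbach_alpha(X):
        # alpha = k/(k-1) * (1 - sum of item variances / variance of total)
        k = X.shape[1]
        return k / (k - 1) * (1 - X.var(axis=0, ddof=1).sum()
                              / X.sum(axis=1).var(ddof=1))

    def split_half(X, half):
        # Spearman-Brown-corrected correlation between the two half-test sums.
        mask = np.zeros(X.shape[1], dtype=bool)
        mask[list(half)] = True
        r = np.corrcoef(X[:, mask].sum(axis=1), X[:, ~mask].sum(axis=1))[0, 1]
        return 2 * r / (1 + r)

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 1)) + rng.normal(size=(200, 6))  # 6 items
    halves = [h for h in combinations(range(6), 3) if 0 in h]  # 10 unique splits
    print(cronbach_alpha(X), np.mean([split_half(X, h) for h in halves]))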

Paul Gardner

