Re: On inappropriate hypothesis testing. Was: MIT Sexism statistical bunk

2001-03-15 Thread Thom Baguley

Irving Scheffe wrote:
 First, you're addressing the wrong question.
 We are not interested, in the example, in the "ability" of the
 players. We are interested in whether, over the course of the
 preceding 162 games, the Yanks outhomered the Tigers by a substantial

I think that illustrates my point. There is no single "we" here. My question
can't be wrong, except when contrasted with the question someone else asked or
wants to ask. (I repeat that the thread had moved off its original narrow
frame of reference, which I was most definitely not addressing).

 amount. They did. [This is not to say that "ability" isn't an
 interesting question. But your proposed randomization test doesn't
 address that issue well at all.]

I think this is a crucial point. Yes, it doesn't address it particularly well,
but it isn't irrelevant. I'd lump it with quick and dirty exploratory and
descriptive stats that people do when eye-balling data for the first time.

 Second of all, you have chosen a suboptimal unit of analysis, if
 you are really interested in assessing "ability."

To be fair, I didn't choose any units of analysis at all. I wrote in
response to the units of analysis already being discussed. The implicit
assumption was that this is the data you have.

Thom


=
Instructions for joining and leaving this list and remarks about
the problem of INAPPROPRIATE MESSAGES are available at
  http://jse.stat.ncsu.edu/
=



Re: On inappropriate hypothesis testing. Was: MIT Sexism statistical bunk

2001-03-15 Thread Rich Ulrich

On 14 Mar 2001 21:55:48 GMT, [EMAIL PROTECTED] (Radford Neal)
wrote:

 In article [EMAIL PROTECTED],
 Rich Ulrich  [EMAIL PROTECTED] wrote:
 
 (This guy is already posting irrelevant rants as if 
 I've driven him up the wall or something.  So this 
 is just another poke in the eye with a blunt stick, to see
 what he will swing at next)
 
 I think we may take this as an admission by Mr. Ulrich that he is
 incapable of advancing any sensible argument in favour of his
 position.  Certainly he's never made any sensible response to my
 criticism.  

 - In a new thread, I have now provided a response that is sensible, 
or, at least, somewhat numeric.

I notice that Jim C.  has taken up the cudgel, in trying to explain
the basics of t-tests to Jim S, and that  "furthers my position."

I figure that after I state my position in one post, explicate it in
another, and try that again while refining the language -- then
I may as well call it quits with JS, when he still doesn't get the
points from the first (or from the couple of other people who
were posting them before I was).

I may not be saying it all that well, but I wasn't inventing the
position.

You and I are in agreement, now, on one minor conclusion:  
"The t-test isn't good evidence about a difference in averages."
But for me, that's true because the numbers are crappy 
indicators of performance -- which was clued *first*  by the 
distribution.

Whereas, you seem to have much more respect for crude
averages, compared to the several of us who object.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: On inappropriate hypothesis testing. Was: MIT Sexism statistical bunk

2001-03-15 Thread Rich Ulrich

 - I hate having to explain jokes -

On 14 Mar 2001 15:34:45 -0800, [EMAIL PROTECTED] (dennis roberts) wrote:

 At 04:10 PM 3/14/01 -0500, Rich Ulrich wrote:
 
 Oh, I see.   You do the opposite.  Your own
 flabby rationalizations might be subtly valid,
 and, on close examination,
 *do*  have some relationship to the questions
 
 
 could we ALL please lower a notch or two ... the darts and arrows? i can't 
 keep track of who started what and who is tossing the latest flames but ... 
 somehow, i think we can do a little better than this ... 

Dennis,
Please, where is YOUR sense of humor?   

My post was a literary exercise -- I intentionally posted his lines
immediately before mine, so the reader could follow my re-write 
phrase by phrase. 
I'm still hoping "Irving" will lighten up.

You chopped out the original that I was paraphrasing, and you did
*not*  indicate those important [snip]s -- you would mislead the
casual reader into thinking that someone other than JimS originated
lines like that, or intended them as critique in this group.
 - I'm not always kind, but I think I am never that wild.  
 - It's probably been a dozen years since I purely flamed like that.

(Or maybe I never flamed, if you talk about the really empty ones.  
In the olden days of local Bulletin Boards, with political topics, I
discarded 1/3 of my compositions without ever posting, because of 
poor content or tone.  I still use some judgment in what I post.)


Compare his original line about  'little or no ... relationship'  with
my clever reversal,   "... on close examination, *do*  have some
relationship to the questions."

Well, I was trying for humor, anyway.  Sorry if I missed.
-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html





Re: On inappropriate hypothesis testing. Was: MIT Sexism statistical bunk

2001-03-15 Thread Irving Scheffe

On Thu, 15 Mar 2001 18:09:26 GMT, Jerry Dallal
[EMAIL PROTECTED] wrote:

Irving Scheffe wrote:

 Original MIT Report on the Status of Women Faculty:
  http://mindit.netmind.com/proxy/http://web.mit.edu/fnl/


It is frustrating to keep getting errors when I try to access a
printable version of the report, whether by using IE or Netscape. Is
there a known workaround?

Many people have had problems double-clicking on the Adobe Acrobat
link. This has to do with various integration problems between
Acrobat and Internet Explorer.  It is, in general, much better to
right-click on the link, then choose to save the file locally. After
downloading the whole file, you should (if you have Adobe Acrobat
on your system) be able to read it.

--Jim Steiger





Re: On inappropriate hypothesis testing. Was: MIT Sexism statistical bunk

2001-03-14 Thread Irving Scheffe

Thanks again for the clarification, Jim. I think we
are in essential agreement.

To reply succinctly to your message:

1. Certainly, as a general rule
one should *always* look at distributional
shape as well as summary statistics. Feminists seldom
do, by the way, in advancing arguments about
discrimination. Indeed, as those of us who've
heard the "women make 73 cents on the dollar"
mantra for years know all too well, they'll
repeat the most inane statistic if it sounds 
good.

2. In the MIT data, Mr. Ulrich seemed to be implying
that the mean differences *favoring* the men might be
due to one or two outliers. However, there is a serious
question whether the men in the 10,000+ range should actually
be considered outliers. If you don't want to address that,
fine. It seemed like you were agreeing with his position.
It now seems you were not. Sorry if I misread.

3. In the modern academic environment, I think that Nobel
Prize Winners generally make above average salaries, and
tend to be highly productive people as well. I may be wrong,
but some data I've seen suggest otherwise.

4. I probably would not be inclined to use formal inferential
procedures with the MIT data, even if it were provided. Keep in
mind that, in a perfectly fair society, there is a "balance of
unfairness."  What I'd probably do is a regression analysis,
and try to decide, on the basis of some fairly extensive consultation,
when a residual is large enough to merit recompense.

There is a real problem with some of the recommendations
recently agreed to at MIT. Salaries have a natural error variance:
if you take two groups of "equally" performing people,
they will almost certainly have differences both in pay and
in performance.  The way it now stands, feminists planning
to use MIT as a template want the right to demand a pay increase
anytime they can identify a salary decrement, regardless of
(a) whether any performance figures have been taken into account,
and (b) whether "natural variation" has been examined. Similar
avenues are not open to men. So, in the future, we may find
rapid "fixing" of even minor, well-deserved differences when
women find themselves on the short end, but no such "fixes"
when men find themselves on the short end. This merely perpetuates
more unfairness, and will almost certainly result in a backlash
some time in the future.

BTW, I would like to rebut any notion that I am, in general, against
salary equity procedures. It is a matter of record that, in 1988,
when serving as a member of the salary negotiation team at UBC,
I pointed out that an across-the-board raise of $2700, requested
for all women, would unfairly benefit those who had started
working recently, and not make up the balance for those who had
been working there a long time. As a result, a regression based
procedure was adopted that more equitably distributed the money.
I supported this, as did most of the other members of the team,
among them several women.

What I am against is poorly designed, unfair procedures that reward
people solely on the basis of their race or gender and their
willingness to gripe.


Best regards,

Jim Steiger

--
James H. Steiger, Professor
Dept. of Psychology
University of British Columbia
Vancouver, B.C., V6T 1Z4
-


Note: I urge all members of this list to read
the following and inform themselves carefully
of the truth about the MIT Report on the Status
of Women Faculty. 

Patricia Hausman and James Steiger Article,
"Confession Without Guilt?" :
  http://www.iwf.org/news/mitfinal.pdf  

Judith Kleinfeld's Article Critiquing the MIT Report:
 http://www.uaf.edu/northern/mitstudy/#note9back

Original MIT Report on the Status of Women Faculty:
 http://mindit.netmind.com/proxy/http://web.mit.edu/fnl/

On Mon, 12 Mar 2001 13:10:47 -0600, jim clark [EMAIL PROTECTED]
wrote:

Hi

On Mon, 12 Mar 2001, Irving Scheffe wrote:
 Jim:
 For example, suppose you had a department
 in which the citation data were
 
Males   Females
12220 1298
 2297 1102

When I said outlier, I had in mind hypothetical data of the
following sort (it doesn't matter to me whether it is the
salaries or the citation rates):

Males   Females
17000      1000
 1000      1000
 1000      1000
 1000      1000

Avg  5000      1000

vs.
Males   Females
 5000      1000
 5000      1000
 5000      1000
 5000      1000

Avg  5000      1000

I would view the latter somewhat differently than the former with
respect to differences between these samples of males and
females, and with respect to the kinds of explanations I would
seek (e.g., something general to males, something specific to
male 1).
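
A minimal Python sketch (not part of the original posts) makes the contrast
concrete: both hypothetical male groups have a mean of 5000, but an
outlier-resistant summary such as the median tells them apart.

    # Same mean, different structure: the two hypothetical male groups above.
    from statistics import mean, median

    outlier_males = [17000, 1000, 1000, 1000]   # one extreme case drives the mean
    uniform_males = [5000, 5000, 5000, 5000]    # a uniform pattern

    for label, males in [("outlier", outlier_males), ("uniform", uniform_males)]:
        print(label, "mean =", mean(males), "median =", median(males))
    # outlier: mean = 5000, median = 1000;  uniform: mean = 5000, median = 5000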

 The male with 12220 is, let's imagine, a Nobel Prize
 winner. The salaries for the 4 people are 
 
Males   Females
   156,880  121,176
   112,120  114,324

Of course if the salaries were:
Males   Females
   112,120   121,176
   156,880   114,324

You probably might not want to promote the hypothesis of
productivity 

Re: On inappropriate hypothesis testing. Was: MIT Sexism statistical bunk

2001-03-13 Thread Irving Scheffe



On Fri, 09 Mar 2001 15:53:12 +, Thom Baguley
[EMAIL PROTECTED] wrote:

Irving Scheffe wrote:
 Imagine it is 1961. Our question is, which outfield has better
 home run hitters, the Yankees or Detroit? Here are the numbers
 for the Yankee and Tiger starting Outfields.
 
 Yanks   Tigers
 -----   ------
   61       45
   54       19
   22       17
 --------------
 
 Now, the t-test isn't significant, nor is the permutation test.
 But is either relevant to the question? If you have a reasonable
 understanding of the notion of "home run," the answer is no.
 
snip
 It was, by definition, the population of interest, so it appears that
 you are flat wrong. The question we were asking was, "if we take the
 large identifiable cluster of senior MIT women who graduated between
 1970 and 1976, and compare them with their natural cohort, the men who
 graduated in the same time frame, do we see performance differences?"
 
 The answer is, as shown by the data above: yes. We see huge
 performance differences. Just like with the Yankees and Tigers in
 1961.

It seems to me that you are unnecessarily restricting the questions that can be
asked by others. 

I was presenting a counterexample to an erroneous assertion by
Mr. Ulrich. This in no way is "restricting" the discussion at all.
Indeed, if you read my preceding posts carefully enough, 
you'll find an explicit disclaimer to the contrary. I recognize
that the "utility function" relating citations and publications
to quality is complex, and that there are questions of
natural variability to be addressed. 

You are not even restricting them to the interesting
questions. 

Again, please do not engage in straw man mischaracterization.
I'm not "restricting" anybody to anything. Indeed, it is the rigid
and improper insistence on a useless significance test
that is "restrictive," misleading, and lacking a rationale. 

I've simply presented an example of how a t-test not only fails to
add useful information, but provides a misleading conclusion.
If you think otherwise, please provide an example, with a rationale.
But please read on, because I think I'm going to help answer your
questions for you.

For example, asking who scored more in 1961 is different to asking which
players were better. 

I cannot imagine anyone, least of all myself, disagreeing. Why you
think it is relevant to my critique of a randomization test is
a mystery. As someone with a lifelong fascination with baseball
statistics, I'd freely admit that virtually any measure of anything in
baseball is impure. 

The key structural point in my argument is this. If you accept
the assumption that the players' performance in the previous
season is the thing being evaluated, reference to what might
have happened under some fictitious random sampling process
is irrelevant.

The Yanks outhomered the heck out of the Tigers in 1961. Whether
this indicates they are "better hitters," "more Christian," 
"superior human beings," or even "better home run hitters
in the long run" etc. is, of course, another matter,
and possibly very interesting. But you're not going to address
any of those issues with a t-test or randomization test. If you think
you can, please present a rationale.
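
The numeric claim is easy to check. A minimal Python sketch (not part of the
original thread; scipy is assumed to be installed) shows that on these six
1961 home-run totals, neither the pooled t-test nor the exact permutation
test comes anywhere near the .05 level.

    # t-test and exact permutation test on the 1961 outfield home-run totals.
    from itertools import combinations
    from scipy import stats

    yanks, tigers = [61, 54, 22], [45, 19, 17]
    print(stats.ttest_ind(yanks, tigers))        # t ~ 1.24, p ~ 0.28 (two-sided)

    pool = yanks + tigers
    observed = sum(yanks) - sum(tigers)          # 137 - 81 = 56
    # every way of relabeling 3 of the 6 players as "Yankees"
    diffs = [2 * sum(c) - sum(pool) for c in combinations(pool, 3)]
    p = sum(abs(d) >= abs(observed) for d in diffs) / len(diffs)
    print("exact two-sided permutation p =", p)  # 4/20 = 0.20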

Imagine the Tigers approached the
media in late 1961 and said, "Actually, Dr. Randomo isn't
sure that Maris, Mantle and Berra outhomered us in any 
meaningful sense, because, if you think about it, this
difference might be produced by 6 players of equal ability
influenced by a large number of random factors." 

If they were ordinary sportswriters, they'd simply 
say "are you nuts?"

But, if they were statisticians, they'd say (a) you 
are asking the wrong question, and (b) you have the 
wrong model. The question is not whether Mantle, Maris,
and Berra are better collective home run hitters over some
hypothetical long run than Kaline, Colavito, and Bruton. [Actually,
virtually anyone familiar with baseball would agree that they were, as
a group, better players, but that is another matter. All 6 were
outstanding players.]

In a similar vein, the question in the MIT case was not
whether the MIT male senior biologists are better people
than their female counterparts. It is simply: how true is the
implied assertion in the MIT report that there were no
performance differences that might account for [undocumented]
differences in salary and resources between senior
men and women? MIT stated that to assert that differences in resource
allocation might be due to performance differences is "the last refuge
of the bigot." Hausman and I were documenting major performance
differences.


Why not think of it in terms of "Could this difference be
produced by 6 players of equal ability influenced by a large number of random
factors". In that case a significance test might have some value in evaluating
the hypothesis that one group was better.

Again, you're slipping in an alternative question to
the one that was asked. 


Re: On inappropriate hypothesis testing. Was: MIT Sexism statistical bunk

2001-03-12 Thread dennis roberts

At 02:25 PM 3/12/01 +, Radford Neal wrote:


In this context, all that matters is that there is a difference.  As
explained in many previous posts by myself and others, it is NOT
appropriate in this context to do a significance test, and ignore the
difference if you can't reject the null hypothesis of no difference in
the populations from which these people were drawn (whatever one might
think those populations are).

the problem with your argument is this ...

now, whether or not formal inferential statistical procedures are called 
for ... if there is a difference in salary ... and differences in any OTHER 
factor or factors ... one is in the realm of SPECULATION as to what may or 
may not be the "reason" or "reasons" for THAT difference

in other words ... any way you say that the difference "may be explained 
by"  is a hypothesis you have formulated ...

so, in this general context ... it still is a statistical issue ... that 
being, what (may) causes what ... and, this calls for some model 
specification ... that links difference in salaries TO differences in other 
factors/variables

if we do not view it as some kind of a statistical model ... then we are in 
no position to really talk about this case ... not in any causal or quasi 
causal way ... and, i thought that was the main purpose of this entire 
matter ... what LED to the gap in salaries?? ... was it something based on 
merit? or something based on bias?

i don't see how else we could check up on these kinds of issues other than 
some statistical questions being asked ... then tested in SOME fashion 
(though i am not specifying exactly how)




Radford Neal


Radford M. Neal   [EMAIL PROTECTED]
Dept. of Statistics and Dept. of Computer Science [EMAIL PROTECTED]
University of Toronto http://www.cs.utoronto.ca/~radford




_
dennis roberts, educational psychology, penn state university
208 cedar, AC 8148632401, mailto:[EMAIL PROTECTED]
http://roberts.ed.psu.edu/users/droberts/drober~1.htm






Re: On inappropriate hypothesis testing. Was: MIT Sexism statistical bunk

2001-03-12 Thread Irving Scheffe

Jim:

I agree with Radford Neal's comments,
and urge careful reconsideration of the
foundation behind some of the comments
made. 

For example, suppose you had a department
in which the citation data were

   Males   Females
   12220      1298
    2297      1102

The male with 12220 is, let's imagine, a Nobel Prize
winner. The salaries for the 4 people are 

   Males   Females
  156,880  121,176
  112,120  114,324

The females approach the dean of science and declare that
there is discrimination against them. They've measured
the labs, and the men have more space. Moreover, they
feel marginalized and depressed, as their status has
been slowly slipping in the department. Moreover, they
are paid less than men of the same age.

Careful examination of mean salary shows that the mean 
salaries are 134,500 for men and only 117,750 for women.

With great brouhaha, the administration, without
publishing the above data, declares that there
was a discrimination problem, and it was addressed
by giving both the women a 16,000 raise.
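
The arithmetic behind this hypothetical, as a minimal Python sketch (not part
of the original post); note that the raise leaves the two group means nearly
equal while the performance gap is untouched.

    # Group mean salaries before and after the across-the-board 16,000 raise.
    from statistics import mean

    men   = [156_880, 112_120]
    women = [121_176, 114_324]
    print(mean(men), mean(women))             # 134500 vs 117750
    print(mean([w + 16_000 for w in women]))  # 133750 after the raise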

As Radford Neal has pointed out succinctly, the argument about
outliers is irrelevant, and I want to emphasize with this example
that it is irrelevant on numerous levels. First of all,
it is not necessarily clear whether, and in which of several
senses, our Nobel Prize winner is an outlier in his group.
Second, even if he is -- so what? Surely you would not argue
that this means he didn't deserve his salary!

In fact, careful examination of the salary data [never
made public by the administration] together with the
performance data might well have led to the conclusion
that it is the male faculty who are underpaid.

Although, as Dr. Neal pointed out, it is not logically
relevant to the issue, I would like to
explore your notion, echoed without
justification by Rich Ulrich, that the
huge difference in citation performance between
MIT senior men and women might be due
to "one or two outliers."

Take a look at the data again, and tell me
which male data you consider to be outliers
within the male group, and why. For example, 
are the men with 2133 and
893 "outliers," or those with 12830 and 11313?

The data for the senior men and women:

12 year citation counts:

   Males    Females
  ------------------
   12830      2719
   11313      1690
   10628      1301
    4396      1051
    2133       935
     893
  ------------------

As for the notion of exploring the relationship between
salary, gender, and performance -- I'd be more than happy
to examine any data that MIT would make available. They
will, of course, not make such data available. It is too
private, they say.


Best regards,

Jim Steiger

--
James H. Steiger, Professor
Dept. of Psychology
University of British Columbia
Vancouver, B.C., V6T 1Z4
-


Note: I urge all members of this list to read
the following and inform themselves carefully
of the truth about the MIT Report on the Status
of Women Faculty. 

Patricia Hausman and James Steiger Article,
"Confession Without Guilt?" :
  http://www.iwf.org/news/mitfinal.pdf  

Judith Kleinfeld's Article Critiquing the MIT Report:
 http://www.uaf.edu/northern/mitstudy/#note9back

Original MIT Report on the Status of Women Faculty:
 http://mindit.netmind.com/proxy/http://web.mit.edu/fnl/


On Mon, 12 Mar 2001 08:55:17 -0600, jim clark [EMAIL PROTECTED]
wrote:

Hi

On 12 Mar 2001, Radford Neal wrote:
 Yes indeed.  And the context in this case is the question of whether
 or not the difference in performance provides an alternative
 explanation for why the men were paid more (one supposes, no actual
 salary data has been released).
 
 In this context, all that matters is that there is a difference.  As
 explained in many previous posts by myself and others, it is NOT
 appropriate in this context to do a significance test, and ignore the
 difference if you can't reject the null hypothesis of no difference in
 the populations from which these people were drawn (whatever one might
 think those populations are).

Personally, I am not interested in the question of statistical
testing to dismiss the alternative explanation being proposed;
indeed, I suspect that the original claim about gender being the
cause of salary differences would not stand up very well either
to statistical tests.  But there does seem to me to be more than
just saying ... "see there is a difference" and that statistical
procedures would have a role to play.  For example, wouldn't the
strength and consistency of the differences influence your
confidence that this was indeed the underlying factor?  The same
difference in means due to one or two outliers would surely not
mean the same thing as a uniform pattern of productivity
differences, would it?  And wouldn't you want to demonstrate that
there was a significant and ideally strong within-group
relationship between productivity and salary before claiming that
it is a reasonable alternative for the between-group differences?  
Or at least, wouldn't that 

Re: On inappropriate hypothesis testing. Was: MIT Sexism statistical bunk

2001-03-09 Thread Thom Baguley

Irving Scheffe wrote:
 Imagine it is 1961. Our question is, which outfield has better
 home run hitters, the Yankees or Detroit? Here are the numbers
 for the Yankee and Tiger starting Outfields.
 
 Yanks   Tigers
 -----   ------
   61       45
   54       19
   22       17
 --------------
 
 Now, the t-test isn't significant, nor is the permutation test.
 But is either relevant to the question? If you have a reasonable
 understanding of the notion of "home run," the answer is no.
 
snip
 It was, by definition, the population of interest, so it appears that
 you are flat wrong. The question we were asking was, "if we take the
 large identifiable cluster of senior MIT women who graduated between
 1970 and 1976, and compare them with their natural cohort, the men who
 graduated in the same time frame, do we see performance differences?"
 
 The answer is, as shown by the data above: yes. We see huge
 performance differences. Just like with the Yankees and Tigers in
 1961.

It seems to me that you are unnecessarily restricting the questions that can be
asked by others. You are not even restricting them to the interesting
questions. For example, asking who scored more in 1961 is different to asking
which players were better. Why not think of it in terms of "Could this difference be
produced by 6 players of equal ability influenced by a large number of random
factors". In that case a significance test might have some value in evaluating
the hypothesis that one group was better.

The second case is even stronger. Take any two groups and you'll almost
certainly find a difference on most measures (citation count, salary, hat size
or whatever).

Finally, what allows you to infer that any difference you observe is "huge"?
This is a relative judgement. In statistics we typically reference it to some
indication of (population) variability. In real world contexts we often use
other benchmarks.

For example, think about runs scored in the first innings of a test match by
three top order batsmen from two cricket teams

 England   Sri Lanka
 -------   ---------
     61         45
     54         19
     22         17
 -------------------

Is this a huge difference? I think not. Does it provide strong evidence that
the England top order batsmen are better than the Sri Lankans? No. What allows
you to infer a huge difference in the baseball case is your knowledge of
baseball (frequency of runs and so on). So at best, I think it is a misleading example.

Thom





Re: On inappropriate hypothesis testing. Was: MIT Sexism statistical bunk

2001-03-09 Thread Radford Neal

In article [EMAIL PROTECTED],
Thom Baguley  [EMAIL PROTECTED] wrote:

Why not think of it in terms of "Could this difference be
produced by 6 players of equal ability influenced by a large number of random
factors". In that case a significance test might have some value in evaluating
the hypothesis that one group was better.

Recall that this baseball example was intended to clarify how one
should go about determining whether or not there is reason to think
that MIT discriminated against women faculty.  From your comment, I'd
guess that you think that MIT should not pay faculty based on their
actual achievements, but rather on the basis of some estimate of their
ability, disregarding "random factors".  That's an interesting
opinion, but would a policy of paying based on actual achievement (or
a noisy estimate of actual achievement) constitute discrimination?

   Radford Neal


Radford M. Neal   [EMAIL PROTECTED]
Dept. of Statistics and Dept. of Computer Science [EMAIL PROTECTED]
University of Toronto http://www.cs.utoronto.ca/~radford






Re: On inappropriate hypothesis testing. Was: MIT Sexism statistical bunk

2001-03-08 Thread Irving Scheffe

On Fri, 02 Mar 2001 16:28:53 -0500, Rich Ulrich [EMAIL PROTECTED]
wrote:

On Tue, 27 Feb 2001 07:49:23 GMT, [EMAIL PROTECTED] (Irving
Scheffe) wrote:

My comments are written as responses to the technical 
comments to Jim Steiger's last post.  This is shorter than his post,
since I omit redundancy and mostly ignore his 'venting.'
I think I offer a little different perspective on my previous posts. 

[ snip, intro. ]

Mr. Ulrich's latest post is a thinly veiled ad hominem, and
I'd urge him to rethink this strategy, as it does not
present him in a favorable light. 

Any objective reader would notice how the post
is riddled with emotional attributions and loaded
language like 

"venting"
"exquisite sensitivity" (a claim attributed to me that I never made)
"hammering your own gavel"
"ferocity"
"angry"
"shouted down"
"blundering around"
"browbeat them"
"crude"

At the same time that Mr. Ulrich makes these disparaging
but completely inaccurate attributions, he characterizes
the posts of another discussant as "polite." Considering 
that this "polite" poster (Gene Gallagher) used terms like "Rush
Limbaugh dittohead," it is clear that Mr. Ulrich's perceptions and
attributions are badly biased.

While he invests an extraordinary amount of effort in
such irrelevant ad hominems, Mr. Ulrich seems unable
to answer the simplest statistical questions regarding
his point of view. And, in his latest post, he reveals
in more detail how he insists on remaining as uninformed
as possible while rendering such judgments.

Most disturbingly, he contradicts himself
and mischaracterizes previous discussions.

For example, 


JS
 You are the one who examined nonrandom data, representing citation
 counts over a 12 year period for senior male and female MIT biologists
 matched for year of Ph.D.  You look at these data, which
 show a HUGE difference in performance between the men and women,
 and declare that a significance test is necessary. But you
 cannot provide any mathematical justification for the test.

 I gave several examples to try to jar you into realizing that
 a statistical test on the data cannot answer the question you
 want answered.

To start with, I never examined any *data*.  I kept away from
the papers because I knew so little about the data and it looked
so messy; I made some comments about how difficult it could be.

Yet he made what appeared to be comments about data. For example:

quote from earlier Ulrich post
I can't say that I have absorbed everything that has been argued.  
But as of now, I think Gene has the better of it.  To me, it is not
very appropriate to be highly impressed at the mean-differences, 
when TESTS that are attempted can't show anything.  The samples 
are small-ish, but the means must be wrecked a bit by outliers.

This raises the question: If he never examined the data, how could he
make a statement about "outliers" in the data?  


I tossed in a couple of comments to encourage Gene G., who made
some good sense, as did Dennis.  

They made no sense, let alone "good sense."  I gave numerous
examples demonstrating this. Mr. Ulrich professes that he doesn't
see the point of them.


As I read it, you proceeded to 
browbeat them, while failing to respond to their substance.

Not true. First of all, there was virtually no substance in
their arguments. Dr. Gallagher wants to do a randomization test
because he is concerned [I'm interpreting and paraphrasing a bit]
about the scale and variability questions that
naturally surround citation data. These concerns are worthwhile,
but he failed (ever) to explain how a randomization test or a t-test
could answer such questions.

Indeed, later, he presented a "logarithmic transform" of the
citation data which made the differences look less severe,
but never provided any rationale for that, either. [See my
Yankees-Tigers example later.]
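
For context, a minimal Python sketch (not from the original posts) of the
transform in question. On a log10 scale the male-female gap is about half an
order of magnitude; whether the log scale is the right metric for citation
counts is exactly the rationale question being argued here.

    # The senior-biologist citation counts from this thread, on a log10 scale.
    import math

    men   = [12830, 11313, 10628, 4396, 2133, 893]
    women = [2719, 1690, 1301, 1051, 935]
    print([round(math.log10(x), 2) for x in men])    # [4.11, 4.05, 4.03, 3.64, 3.33, 2.95]
    print([round(math.log10(x), 2) for x in women])  # [3.43, 3.23, 3.11, 3.02, 2.97]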

I have tried to make sense of that early part of *your* argument,
where you want to leap over their critiques.

Actually, they offered no critique. Dr. Gallagher offered mainly 
name-calling and ad hominem in his early posts, using terms
like "Rush Limbaugh dittohead."  Seems like Mr. Ulrich's criticism
is misplaced.

You claim a HUGE difference.  You say you assert this because of 
exquisite sensitivity to numbers.  Dennis challenged this on the
basis of "lousy standards" -- either by their metric or content -- 
and Gene challenged this as misleading, because it was "not 
(nominally) significant."  

To declare that something is "not significant" requires a rationale.

They disagreed with you on the inference
that you drew from two means.

I agree that a huge difference may be useful.  I agree that t-tests
don't offer any final resolution.  (As I posted before,) with
nonrandom data, we have to argue contingencies, explore options, 
and make what inferences that we can.  You seem to cut that short, 
chop! -- pronouncing your own verdict as final -- but I don't see how 
hammering your own gavel can convince people who have the
choice of looking elsewhere.

Re: On inappropriate hypothesis testing. Was: MIT Sexism statistical bunk

2001-03-08 Thread Rich Ulrich

On Thu, 08 Mar 2001 10:38:59 -0800, Irving Scheffe
[EMAIL PROTECTED] wrote:

 On Fri, 02 Mar 2001 16:28:53 -0500, Rich Ulrich [EMAIL PROTECTED]
 wrote:
 
 On Tue, 27 Feb 2001 07:49:23 GMT, [EMAIL PROTECTED] (Irving
 Scheffe) wrote:
 
 My comments are written as responses to the technical 
 comments to Jim Steiger's last post.  This is shorter than his post,
 since I omit redundancy and mostly ignore his 'venting.'
 I think I offer a little different perspective on my previous posts. 
 
 [ snip, intro. ]
 
 Mr. Ulrich's latest post is a thinly veiled ad hominem, and
 I'd urge him to rethink this strategy, as it does not
 present him in a favorable light. 

 - I have a different notion of ad-hominem, since I think it is
something directed towards 'the person'  rather than at the
presentation.  Or else, I don't follow what he means by 'thinly
veiled.'

When a belligerent and nasty and arrogant tone seems to be
an essential part of an argument, I don't consider myself to be
reacting 'ad-hominem' when I complain about it -- it's not that I
hate to be ad-hominem, but I don't like to be misconstrued.

I'm willing, at times, to plunk for the 'ad-hominem'.   
For instance, since my last post on the subject, I looked at those
reports. Also, I searched with google for the IWF -- who printed the
anti-MIT critiques.  I see the organization characterized as an
'anti-feminist' organization, with some large funding from Richard
Scaife.  'Anti-feminist'  could mean a reasoned-opposition, or a
reflex opposition.  Given these papers, it appears to me to qualify as
'reflex' or kneejerk opposition.  Oh, ho! I say,  this explains where
the arguments came from, and why Jim keeps on going --  
Now, THIS PARAGRAPH   is what I consider an ad-hominem argument.  
And I'll give you some more.

Scaife is a paranoid moneybags and publisher who infests this
Pittsburgh region (which is why I have noticed him more than a
westerner like Coors).  His cash was important in persecuting Clinton
for his terms in office.   For example, Scaife kept alive Vince
Foster's suicide for years.  He held out money for anyone willing to
chase down Clinton-scandals.  Oh, he funded the chair at Pepperdine
that Starr had intended to take.

Now:  My comment on the original reports:  I am happy to say that it
looks to me as if MIT is setting a good model for other universities
to follow.  The senior administrator listens to his faculty,
especially his senior faculty, and responds.  

MIT makes no point about numbers in their statements, and it 
does seem to be wise and proper that they don't do so.  

I see now, Jim is not really arguing with MIT.  They won't argue back.

Jim's purpose  is to create a hostile presence, a shadow to threaten 
other administrators.  He goes, like, "If you try to 'cut a break'
for women, we'll be watching and threatening and undermining,
threatening your job if we can."  

I suppose state universities are more vulnerable than the private
universities like MIT.  On the other hand, with the numbers that Jim
has put into the public eye, the next administrator can point to the
precedent of MIT and assert that, clearly, the simple numbers on
'quality' are substantially irrelevant to the issues, since they were
irrelevant at MIT.

Hope this helps.

-- 
Rich Ulrich, [EMAIL PROTECTED]

http://www.pitt.edu/~wpilib/index.html





Re: On inappropriate hypothesis testing. Was: MIT Sexism statistical bunk

2001-03-08 Thread Irving Scheffe

I think we've now reached an adequate point of conclusion:

To summarize Mr. Ulrich's latest post:

1. He doesn't think his previous litany of 
unfounded emotional attributions is "ad-hominem."

Yet, he continues the same strategy here, characterizing
the Hausman-Steiger report as an attempt to "threaten"
administrators [by presenting relevant facts...]

And, he quotes ad hominem attacks by others as part of
his argument.

2. He feels my previous tone was "nasty" and "belligerent,"
although there was no such tone. [Apparently, anyone asking
Mr. Ulrich to justify a statistical conjecture with an
argument is being "nasty" and "belligerent."]

3. Mr. Ulrich then proceeds to completely ignore 
the statistical issues, and launches into another
irrelevant attack. Indeed, he uses a standard ploy,
"argument by Granting Agency." [a standard feminist
ploy, born of argumentative desperation]

Finally, Mr. Ulrich capitulates completely on the statistical,
logical, and moral issues, simply stating that he is pleased
with the outcome of the MIT report.  The final two paragraphs are
classic, and, unfortunately, only slightly more irrational than what
normally is provided to justify reverse discrimination.

It is quite amazing to see a "biostatistician" formally
arguing in print, that one university's ignoring [suppressing?]
relevant information would provide justification for other
universities to declare the same information "irrelevant."

Truly "Landgrebian"!

Of course, the moderately astute undergraduate with minimal
training in critical thinking will recognize Mr. Ulrich's
final circularity, which goes something like this

"The MIT report's misleading statements about performance
are ok, because, well, I like what MIT did, and now other
administrators can do similar things, and justify them on the basis of
what MIT did."

I think I can rest my case now.

--
James H. Steiger, Professor
Dept. of Psychology
University of British Columbia
Vancouver, B.C., V6T 1Z4
-


Note: I urge all members of this list to read
the following and inform themselves carefully
of the truth about the MIT Report on the Status
of Women Faculty. 

Patricia Hausman and James Steiger Article,
"Confession Without Guilt?" :
  http://www.iwf.org/news/mitfinal.pdf  

Judith Kleinfeld's Article Critiquing the MIT Report:
 http://www.uaf.edu/northern/mitstudy/#note9back

Original MIT Report on the Status of Women Faculty:
 http://mindit.netmind.com/proxy/http://web.mit.edu/fnl/



On Thu, 08 Mar 2001 16:03:36 -0500, Rich Ulrich [EMAIL PROTECTED]
wrote:

[ snip: full quote of Rich Ulrich's post, reproduced in its entirety in the
preceding message ]

Re: On inappropriate hypothesis testing. Was: MIT Sexism statistical bunk

2001-03-08 Thread RCKnodt

I would like to make direct contact with Dr. Scheffe.  I have some comments 
that I would like to direct to him but not to the mailing list.  I would 
appreciate it if he could contact me directly.

Dr. Robert C. Knodt
4949 Samish Way, #31
Bellingham, WA 98226
[EMAIL PROTECTED]

"The point to remember is that what the government gives, it must first take 
away."  John S. Coleman at Senate meeting.





Re: On inappropriate hypothesis testing. Was: MIT Sexism statistical bunk

2001-03-02 Thread Rich Ulrich

On Tue, 27 Feb 2001 07:49:23 GMT, [EMAIL PROTECTED] (Irving
Scheffe) wrote:

My comments are written as responses to the technical 
comments to Jim Steiger's last post.  This is shorter than his post,
since I omit redundancy and mostly ignore his 'venting.'
I think I offer a little different perspective on my previous posts. 

[ snip, intro. ]

JS
 You are the one who examined nonrandom data, representing citation
 counts over a 12 year period for senior male and female MIT biologists
 matched for year of Ph.D.  You look at these data, which
 show a HUGE difference in performance between the men and women,
 and declare that a significance test is necessary. But you
 cannot provide any mathematical justification for the test.

 I gave several examples to try to jar you into realizing that
 a statistical test on the data cannot answer the question you
 want answered.

To start with, I never examined any *data*.  I kept away from
the papers because I knew so little about the data and it looked
so messy; I made some comments about how difficult it could be.

I tossed in a couple of comments to encourage Gene G., who made
some good sense, as did Dennis.  As I read it, you proceeded to 
browbeat them, while failing to respond to their substance.
I have tried to make sense of that early part of *your* argument,
where you want to leap over their critiques.

You claim a HUGE difference.  You say you assert this because of 
exquisite sensitivity to numbers.  Dennis challenged this on the
basis of "lousy standards" -- either by their metric or content -- 
and Gene challenged this as misleading, because it was "not 
(nominally) significant."  They disagreed with you on the inference
that you drew from two means.

I agree that a huge difference may be useful.  I agree that t-tests
don't offer any final resolution.  (As I posted before,) with
nonrandom data, we have to argue contingencies, explore options, 
and make what inferences that we can.  You seem to cut that short, 
chop! -- pronouncing your own verdict as final -- but I don't see how 
hammering your own gavel can convince people who have the
choice of looking elsewhere.

You may think that you are speaking from unimpeachable epiphany; 
to the rest of us, it looks like you are jumping to a conclusion.

You offer your  *inference* that a huge citation difference explains 
the outcome.  Okay, that could be reasonable.  If the difference 
is direct but attenuated, the "difference" between citations would
be larger, by variance-accounted-for (by some measure), than the 
difference between outcomes: which, I think, we stipulate has some 
size to it.

If those measurements are on a reasonably useful metric, then a 
t-test should show it.  It is my own experience, and part of my
own learned, "exquisite" sensitivity to numbers, that
 (1) a mean difference as large as you illustrated should result 
in a t-test that is significant, unless there is something screwy 
with the numbers. 
 (2) And if there is something so screwy with the numbers, then
it is usually misleading and wrong to present the MEANS as if
their contrast was meaningful ("huge").

Now, there is not a "mathematical necessity" for a test statistic.
It is a request that you respect the conventions of statisticians,
even when we ask for a test on non-random data, for what we might 
learn from it.  Non-significant tests, which I had thought the
data were producing, really undermine your adjective "huge".
A "significant" test, which you now report, lends some credibility.
Gene's permutation test says that those sets are not disjoint, 
however, so there is some basis for direct comparison.  The
most extreme permutation would have undercut  *one* form of
comparison, and the most obvious part of one argument about
discrimination (though, I expect, not everything).  It *looks*
like you didn't want to consider a comparison because you 
figured you could win the argument by repetition and ferocity.

Your first excuse for not computing a t was that this was a 
"population" but that was flat wrong.  I asked for your textbook
references, and finally offered my own, in order to figure out
your context.  I offered "nonrandom", which you used, above.
However, the old arguments about not computing "test statistics" 
on nonrandom samples have hardly any force these days -- I 
offer epidemiology as the pervasive (and persuasive) influence.

Epidemiologists need to be reminded about the limits to their
inference, - they tend to forget it entirely - but I think you 
are standing alone if you refuse to compute, claiming that old
principle.  I don't know if your role is such that someone will 
*have*  to answer you, or if you are fated to wind up 
ignored (as non-responsive and therefore irrelevant); 
and angry.

[snip, my comment to which JS wrote:]
 Not so. If you were following the logic of the many examples I've
 presented, you could see that you can construct a reductio ad
 absurdum for any of the types of significance tests you are proposing.

Re: On inappropriate hypothesis testing. Was: MIT Sexism statistical bunk

2001-02-27 Thread Irving Scheffe

Rich,

Both Radford Neal and I have asked
for a statistical rationale supporting
your claim that a significance test
that you advocated
can provide useful information when applied
to the MIT senior biologist data. You
haven't provided one. Instead, you
cite from a web statistics guide which
in turn provides no rationale.

It is now quite apparent that you
have no rationale, only prejudices.
This may be acceptable to the people
who come to you for consulting, but
this is a different forum, with different
standards.

Further comments are interspersed:



On Thu, 22 Feb 2001 18:21:41 -0500, Rich Ulrich [EMAIL PROTECTED]
wrote:

On Mon, 19 Feb 2001 04:27:24 GMT, [EMAIL PROTECTED] (Irving
Scheffe) wrote:

 In responding to Rich, I'll intersperse selected comments with
 selected portions of his text and append his entire post below.

 - I'm not done with the topic yet.  But it is difficult to go on from
this point.

I think the difficulty is that JS has constructed his straw-man
argument about how "hypotheses" are handled; and since it 
is a stupid strategy, it is easy for him to claim that it is fatally
flawed.

All the referents are unclear. I didn't construct any straw man
arguments, and you haven't made clear what you are talking about.
You are the one who examined nonrandom data, representing citation
counts over a 12 year period for senior male and female MIT biologists
matched for year of Ph.D.  You look at these data, which
show a HUGE difference in performance between the men and women,
and declare that a significance test is necessary. But you
cannot provide any mathematical justification for the test.

I gave several examples to try to jar you into realizing that
a statistical test on the data cannot answer the question you
want answered.


From his insistence on his "examples,"  it seems to me that he
believes that someone else is committed to using p-levels in a strict
way, by beating 5%.  

Not so. If you were following the logic of the many examples I've
presented, you could see that you can construct a reductio ad
absurdum for any of the types of significance tests you are
proposing. If I believed strictly in hypothesis testing 
with a 5% significance level, I doubt that I'd have written
an extensive article advocating confidence interval replacements
for many of the classic hypothesis tests employed in the social
sciences, and giving the precise, exact procedures for
constructing these confidence intervals.


That's certainly not the case for me, and I
doubt if anyone defends or promotes it, outside of carefully designed 
Controlled Random Experiments.


It is not the case for me, either, and so everything that follows is
irrelevant.

Despite the fact that I could not make sense of WHY he wanted
his example, it turns out -- after he explains it more -- that my own
analysis covered the relevant bases.  I agree, if you don't have
"statistical power,"  then you don't ask for a 5%  test, or (maybe) 
any test at all.  The JUSTIFICATION for having a test on the MIT
data is that the power is sufficient to say something.  

In order to talk meaningfully about "power", you have to have
a statistical rationale. As I have repeated numerous times,
you have no statistical rationale. You simply "feel like"
you "should" compute a statistical test, when all the assumptions
on which the procedure is based are violated in the data you
are applying the procedure to.

Power to detect what? Under what distributional assumptions?



And what it said is that Jim did BAD INFERENCE.  I said that a 
couple of times.  I regret that I may have confused people with
unnecessary words about "inference."
 Outlier =  No central tendency =  Mean is BAD  statistic;
careful reader insists on more or better information before asserting
there's a difference.

What "outlier" are you referring to? What statistical rule did you 
use to determine the "outlier"?

The MIT paper included all the raw data. At no point did I or my 
coauthor state that we were doing inference on means. (Actually,
a 2 sample t-test done on these data is significant at the .05 
level, but we never imagined computing one.)

Here are the raw data for the citation counts for the 5 senior
MIT female biologists and 6 males who graduated from 1970-76.

Males    Females
-----------------
12830      2719
11313      1690
10628      1301
 4396      1051
 2133       935
  893
-----------------
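
The parenthetical claim above is easy to verify. A minimal Python sketch (not
part of the original post; scipy assumed):

    # Pooled-variance two-sample t-test on the citation counts above.
    from scipy import stats

    men   = [12830, 11313, 10628, 4396, 2133, 893]
    women = [2719, 1690, 1301, 1051, 935]
    t, p = stats.ttest_ind(men, women)
    print(t, p)   # t ~ 2.34, p ~ .044 with df = 9: just under the .05 level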

These data are based on 12 years worth of records, from 1989-2000.
The above could be broken down in numerous other ways. For example, we
could produce citation counts per year, try to perform some kind of
correction for the highly specific areas the individuals publish in,
etc. Time series could be examined. 

However, these data are anything but a random sample. MIT is one of
the most selective universities in the world in terms of whom it
hires. 



I asserted that more than once.

Optimistically, my own data analysis technique might be 

Re: On inappropriate hypothesis testing. Was: MIT Sexism statistical bunk

2001-02-26 Thread Rich Ulrich

 - I want to comment a little more thoroughly about the lines I cited:
what Garson said about inference, and his citation of Oakes.


On Thu, 22 Feb 2001 18:21:41 -0500, Rich Ulrich [EMAIL PROTECTED]
wrote:

[ snip, previous discussion ]

me 
 I think that Garson is wrong, and the last 40 years of epidemiological
 research have proven the worth of statistics provided on non-random,
 "observational"  samples.  When handled with care.
 
 From G. David Garson, "PA 765 Notes: An Online Textbook."
 
 On Sampling
 http://www2.chass.ncsu.edu/garson/pa765/sampling.htm
 
 Significance testing is only appropriate for random samples.
 
 Random sampling is assumed for inferential statistics
 (significance testing). "Inferential" refers to the fact
 that conclusions are drawn about relationships in the data
 based on inference from knowledge of the sampling
 distribution. Significance tests are based on a sampling
 theory which requires that every case have a chance of being
 selected known in advance of sample selection, usually an
 equal chance. Statistical inference assesses the
 significance of estimates made using random samples. For
 enumerations and censuses, such inference is not needed
 since estimates are exact. Sampling error is irrelevant and
 therefore inferential statistics dealing with sampling error
 are irrelevant. 

 - I agree with most of what he says, throughout; there will be a
matter of nuances on interpretation and actions.

For enumerations and censuses (a limited sort of statistics on 'finite
populations'), he says sampling error is irrelevant.  Irrelevant is a
good and fitting word here.  This is not 'illegal and banned,' but
rather 'unwanted and totally beside the point.'

Garson 
  Significance tests are sometimes applied
 arbitrarily to non-random samples but there is no existing
 method of assessing the validity of such estimates, though
 analysis of non-response may shed some light. The following
 is typical of a disclaimer footnote in research based on a
 non random sample: 

Here is my perspective on testing, which does not match his.
 - For a randomized experimental design,  a small p-level on 
a "test of hypothesis" establishes that *something*  seemed 
to happen, owing to the treatment; the test might stand 
pretty-much by itself.
 - For a non-random sample, a similar test establishes that
*something*  seems to exist, owing to the factor in question 
*or*  to any of a dozen factors that someone might imagine.  
The test establishes, perhaps, the  _prima facie_  case  but the
investigator has the responsibility of trying to dispute it.  

That is, it is an investigator's responsibility (and not just an
option) to consider potential confounders and covariates.  
If the small p-level stands up robustly, that is good for the 
theory -- but not definitive.  If there are vital aspects or factors
that cannot be tested, then opponents can stay unsatisfied, 
no matter WHAT the available tests may say.


Garson  
 "Because some authors (ex., Oakes, 1986) note the use of
 inferential statistics is warranted for nonprobability
 samples if the sample seems to represent the population, and
 in deference to the widespread social science practice of
 reporting significance levels for nonprobability samples as
 a convenient if arbitrary assessment criterion, significance
 levels have been reported in the tables included in this
 article." See Michael Oakes (1986). Statistical inference: A
 commentary for social and behavioral sciences. NY: Wiley. 
 

Garson is telling his readers and would-be statisticians  a way to
present p-levels,  even when the sampling doesn't justify it.
And, I would say, when the analysis doesn't justify it.
I am not happy with the lines -- The disclaimer does not assume 
that a *good*  analysis has been done, nor does it point to what 
makes up a good analysis.  

 '... if the sample seems to represent the population'  
seems to be a weak reminder of the proper effort to overcome 
'confounding factors';  it is not an assurance that the effects 
have proven to be robust.  

So, the disclaimer should recognize that the non random sample 
is potentially open to various interpretations; the present analysis
has attempted to control for several possibilities;  certain effects
do seem robust statistically, in addition to being supported by 
outside chains of inference, and data collected independently.

I suggested earlier that this is the status of epidemiological,
observational studies.  For the most part, those studies have 
been quite fruitful.  But not always.  They have been especially
likely to mislead, I think, when the designs pretend that binomial
variability is the only source of error in a large survey, and attempt
to interpret small effects.

-- 
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html



Re: On inappropriate hypothesis testing. Was: MIT Sexism statistical bunk

2001-02-23 Thread Radford Neal

In article [EMAIL PROTECTED],
Rich Ulrich  [EMAIL PROTECTED] wrote:

 I agree, if you don't have "statistical power," then you don't ask
 for a 5% test, or (maybe) any test at all.  The JUSTIFICATION for
 having a test on the MIT data is that the power is sufficient to say
 something.

The reason why one should NOT do a significance test on this data, at
any level, and regardless of how much power the test would have, was 
explained by me a while ago in the post I have repeated below.

If you think there is something wrong with my reasoning, I suggest you
explain the flaw.

   Radford Neal

--

I think the statistical issue in this discussion can be boiled down to
a question of how to calculate standard errors for regression
coefficients.

What regression?  Well, there isn't one, because there isn't any data,
but the discussion seems to presuppose the possibility of data that
for each faculty member gives their salary (the response variable, y),
their gender (x1, coded as a dummy variable), and some indicator of
performance (x2).  The question is whether one has evidence that the
regression coefficient for the dummy gender variable (x1) is non-zero.
This will require computing the standard error for the estimate of
this regression coefficient.

The accepted procedure for computing this standard error involves the
sample correlation between the two predictors, x1 and x2.  When the
sample correlation is high, the standard errors for the regression
coefficients will tend to be high, making it more difficult to
conclude that the coefficient for gender is non-zero.

The procedure apparently being advocated by some posters is to perform
a test of the null hypothesis that the correlation between x1 and x2
in the population is zero, and if there is not sufficient evidence to
reject this null hypothesis, compute the standard errors for the
regression coefficients as if the predictors were uncorrelated.

I believe that this procedure is not generally accepted, for very good
reasons.
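
[Editor's sketch, not part of the original post: a minimal simulation of
the point above, assuming Python with numpy and using invented placeholder
data.  It compares the accepted standard error for the gender coefficient,
computed from (X'X)^{-1} and hence reflecting the x1-x2 correlation, with
the understated value obtained by pretending the predictors are
uncorrelated.]

  import numpy as np

  rng = np.random.default_rng(0)   # simulated placeholder data throughout
  n = 50
  x2 = rng.normal(size=n)                            # performance indicator
  x1 = (x2 + rng.normal(size=n) > 0).astype(float)   # gender dummy, correlated with x2
  y = x2 + rng.normal(size=n)                        # salary; no true gender effect

  X = np.column_stack([np.ones(n), x1, x2])
  beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
  resid = y - X @ beta
  s2 = resid @ resid / (n - 3)                       # residual variance estimate

  # Accepted procedure: the SE reflects the correlation between x1 and x2.
  se_full = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])

  # "As if uncorrelated": the SE uses only the variability of x1 itself.
  x1c = x1 - x1.mean()
  se_naive = np.sqrt(s2 / (x1c @ x1c))

  print(se_full, se_naive)   # se_full exceeds se_naive when x1, x2 correlate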


Radford M. Neal   [EMAIL PROTECTED]
Dept. of Statistics and Dept. of Computer Science [EMAIL PROTECTED]
University of Toronto http://www.cs.utoronto.ca/~radford







Re: On inappropriate hypothesis testing. Was: MIT Sexism statistical bunk

2001-02-18 Thread Irving Scheffe

Milo:

Sure, although I don't see how that is relevant
to the MIT situation, which attributed
the current status of women there to 
discrimination, based on an undisclosed
methodology.

More generally, one CAN indeed do randomization tests
on similar data even though there is no inference toward 
a larger population, and no intention of inferring
later performance, and no random sampling from a 
larger population.

For example, suppose a department head insists that committee
assignments are made by completely random selection, but
the 4 women discover that they have (by some objective criterion)
the 4 worst assignments, while the 4 men faculty have the 4 best
assignments. One can test the hypothesis that the assignment
is random with respect to quality, and reject it at commonly used
significance levels, by asking, "What is the probability
of observing an imbalance this large if these 8 assignments
had been randomly assigned to these 8 people?"

Quality of Assignment (10 point scale)

Males    Females
  6          1
  7          3
  8          2
  9          4


Since there are 70 possible assignments,
and this one achieves the absolute worst 
imbalance, things look bad for the
department head. The probability is 
1/70 of obtaining an imbalance greater
than or equal to the one observed, given
random assignment.

Note, we need not assume that the men
and women have been sampled randomly
from a larger population to perform
this combinatorial calculation.
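
[Editor's sketch, not part of the original post: the combinatorial
calculation above, written out in Python so readers can check it.  It
enumerates all C(8,4) = 70 ways of splitting the eight assignment scores
and counts how many favor the men at least as heavily as the observed one.]

  from itertools import combinations

  scores = (6, 7, 8, 9, 1, 3, 2, 4)    # men hold 6,7,8,9; women hold 1,3,2,4
  total = sum(scores)                  # 40
  observed = 2 * sum(scores[:4]) - total   # men minus women = 20

  as_extreme = sum(1 for men in combinations(scores, 4)
                   if 2 * sum(men) - total >= observed)
  print(as_extreme, "/ 70")            # 1 / 70, matching the figure above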

On the other hand, one would not need a significance test to
evaluate the statement that "The women, at this time, have much
worse committee assignments than the men," so long as 
it can be assumed that the measurement scale is reasonable.

--Jim


James H. Steiger, Professor
Department of Psychology
University of British Columbia
Vancouver, B.C., Canada V6T 1Z4
--

Note: I urge all members of this list to read
the following and inform themselves carefully
of the truth about the MIT Report on the Status
of Women Faculty. 

Patricia Hausman and James Steiger Article,
"Confession Without Guilt?" :
  http://www.iwf.org/news/mitfinal.pdf  

Judith Kleinfeld's Article Critiquing the MIT Report:
 http://www.uaf.edu/northern/mitstudy/#note9back

Original MIT Report on the Status of Women Faculty:
 http://mindit.netmind.com/proxy/http://web.mit.edu/fnl/







On Sun, 18 Feb 2001 23:06:44 GMT, "Milo Schield" [EMAIL PROTECTED]
wrote:

Jim has consistently maintained three claims/arguments:
1. We are not trying to generalize from a small group of people to a larger
population.
2. No inference is involved (if we are not generalizing).
3. Using statistical tests is meaningless (since no inference is involved).

I agree with his 1st and 3rd points -- but not his 2nd.
Within the second, I disagree with the truth of his premise.

The generalization mentioned in #1 is NOT the only possible generalization.
Another generalization may be involved: one involving time.  We sample
things (basketball goals) for two groups of players in a few games and then
want to make an inference about whether these particular scores were
unlikely given that THESE particular players had the same average
scores in the long run -- for all games.

Given a time-based generalization, we now have an inference.  Given this
inference, the applicability of statistical tests seems quite relevant.
Milo
PS. This may be a quasi-Bayesian reinterpretation of these problems -- but
if it fits.
-
"Irving Scheffe" [EMAIL PROTECTED] wrote in message
[EMAIL PROTECTED]">news:[EMAIL PROTECTED]...
 There are a wide variety of probabilities that may be calculated
 in this situation, depending on the assumptions you want to
 make, and precisely what you mean by "this result." However, if you
 ask, "How likely is it that the for side won", the answer is that the
 for side won.  If you ask, "How likely is it that the percentage of
 for votes is higher for women than for men in this sample," the answer
 is that it is perfectly likely, because it happened.

 In an example perhaps more relevant to previous examples here, suppose
 this was an actual departmental vote, and the result was 5-3. The
 motion passed.

 If one of the men was a statistician wanting to overthrow
 the result and said, "wait a minute,
 let's perform a statistical test and see the probability of
 obtaining this gender split, given that the 5 yes votes were actually
 randomly assigned with respect to gender," I suspect people
 would think him odd. He can compute this probability, but
 it is irrelevant.




 On 15 Feb 2001 13:49:51 -0800, [EMAIL PROTECTED] (Paul R
 Swank) wrote:

I remember a question from some stat book about a situation where there
were 8 members of a group, three men and five women (or the reverse, I
can't remember which) and on some issue the vote was five to three with
all five women voting for. The question was "How 

Re: On inappropriate hypothesis testing. Was: MIT Sexism statistical bunk

2001-02-18 Thread Irving Scheffe

On Mon, 19 Feb 2001 02:12:46 GMT, "Milo Schield" [EMAIL PROTECTED]
wrote:

Snip

But in most of your examples MORE is being claimed.  In most cases, the
claim includes an inference.  Once the claim involves an inference, then a
statistical test may be relevant.

In one case, the claim was discrimination (causal explanation of observed
differences); in another the claim was greater scoring ability (causal
explanation of observed differences).  

I don't think so. I think the issue was scoring production,
not ability, and the example was deliberately restricted:


[This being a hypothetical example, assume, for the sake of argument,
that this is a valid and complete measure of basketball performance.]

  White   Black
  -----   -----
  12.8    13.7
  11.1    22.3
  19.9    20.9
  13.9
  16.8
  17.1
  13.0


I set up the example trying to
[artificially] restrict a discussion which would, naturally,
evolve in many directions. Of course "point scoring
ability" is not the same as "points scored," etc. 

And, of course, point production is, in the real world,
not the sole measure of basketball performance.
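
[Editor's sketch, not part of the original exchange: the exact
randomization calculation debated in this thread, applied to the ten
point-production figures above.  Python is assumed.  It enumerates all
C(10,3) = 120 ways of labelling three of the ten values "Black" and
reports how often the Black-minus-White gap in means is at least the
observed one.]

  from itertools import combinations

  white = [12.8, 11.1, 19.9, 13.9, 16.8, 17.1, 13.0]
  black = [13.7, 22.3, 20.9]
  pooled = white + black
  total = sum(pooled)

  def gap(black_sum):
      # difference in means: a "Black" group of 3 versus the other 7
      return black_sum / 3 - (total - black_sum) / 7

  observed = gap(sum(black))
  count = sum(1 for trio in combinations(pooled, 3)
              if gap(sum(trio)) >= observed)
  print(count, "/ 120 =", count / 120)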



In both cases, the inference involves generalizing from a small "sample" of
time to a larger "population" of time.  Thus, the strength of the argument
is influenced by the time-span of the data.  In the case of MIT, had the
data been based on only one month, the case would be much weaker in support
of discrimination than if we had data for 12 years.  In the case of
basketball scoring of white and black players, the case would be much weaker
if we included only one quarter of one game than if we had included many
games.

How can we measure the influence of the time span involved in the data?
Here is where IMHO one can make a case for statistical tests being RELEVANT.

PS.  Just because MIT can "attribute" an outcome (difference in pay/status)
to a particular cause (discrimination) does not mean their argument is
strong.  A claim involving the existence/influence of an unobservable
(discrimination) requires evidence.  In this case, I think statistical
inference may provide some of that evidence.


And?

--Jim


James H. Steiger, Professor
Department of Psychology
University of British Columbia
Vancouver, B.C., Canada V6T 1Z4
--

Note: I urge all members of this list to read
the following and inform themselves carefully
of the truth about the MIT Report on the Status
of Women Faculty. 

Patricia Hausman and James Steiger Article,
"Confession Without Guilt?" :
  http://www.iwf.org/news/mitfinal.pdf  

Judith Kleinfeld's Article Critiquing the MIT Report:
 http://www.uaf.edu/northern/mitstudy/#note9back

Original MIT Report on the Status of Women Faculty:
 http://mindit.netmind.com/proxy/http://web.mit.edu/fnl/








Re: On inappropriate hypothesis testing. Was: MIT Sexism statistical bunk

2001-02-15 Thread NoSpam54

Gene, 

You have made extended comments about
the IWF report "Confession without Guilt?"
(at http://www.iwf.org/news/mitfinal.pdf)
about women biologists at MIT. 

Some background information:


The IWF report is the second in a series criticizing
the MIT report on the Status of Women. 
The original MIT report may be downloaded
from http://mindit.netmind.com/proxy/http://web.mit.edu/fnl/

An earlier IWF report by Judith Kleinfeld revealed many
of the serious shortcomings of the
MIT report. Kleinfeld's report is at
http://www.uaf.edu/northern/mitstudy/#note9back

I would urge all students to download and read
all three of these papers, so they can get a better
feel for who, MIT or its critics, 
is more objective and scientific in
their treatment of a very touchy subject.
-

Gene, although you apparently intended
them as honest criticism, your
comments revealed a deep, fundamental
confusion about statistical inference.
I expect any trained statistician would recognize
the error in your argument immediately,
but some students may have been misled.

So let me try to alleviate some
of the confusion your posts may have
generated, and try to help you see
the error of your ways. And,
might I suggest, strongly, 
that you show this post to
your ECOS611 class? This might
make an excellent discussion piece
for them.
Snip

I'll share Dr. Steiger's comments with my statistics class.  Dr. Steiger asks
the readers of this newsgroup to read the whole set of documents on the MIT
gender-bias issue.  I must confess to not having done so.  I read his IWF
report, co-authored with Dr. Hausman, only after the Boston Globe described his
conclusions under the heading "MIT bias claims debunked."  My post was in
reference to the claims made in his IWF article, not to the other documents
that preceded it.  His article can be downloaded at:

http://www.iwf.org/news/mitfinal.pdf

Dr. Steiger  accuses me of improperly using statistical tests to make
inferences to a larger population.  I didn't do that.  Drs. Steiger and Hausman
in their IWF report claim to have found "striking," "compelling," and
"dramatic" differences in productivity between senior male and female Biology
Faculty at MIT.  I read their report and didn't see much evidence for gender
differences at all.  Most of the apparent gender differences in their graphs
disappeared when the data were plotted on a logarithmic scale:

http://www.es.umb.edu/edg/ECOS611/iwflnfigs.pdf

Dr. Steiger's post states, "There were HUGE differences in the citation rates
of senior men and women. The mean number of citations was, as I recall, roughly
7000 for the men and 1400 for the senior women."  The actual data were 7032 for
the men and 1539 for the women (with sample sizes of 6 and 5 respectively). 
The geometric means were 4800 and 1400.  A Mann-Whitney U test indicates that
12.6% of the permutations of these 11 data would produce differences in
citation number as extreme or more extreme than those reported.  Do these 11
data offer compelling or dramatic evidence for gender differences in
productivity?  Not to my way of thinking.  Was I making inferences to a larger
population?  I didn't intend to.  I was just trying to assess Steiger &
Hausman's claim of HUGE gender-based differences in productivity.
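
[Editor's sketch, not part of the original post.  The eleven raw citation
counts are not given here, so the numbers below are invented placeholders,
chosen only so that the group means match the reported 7032 and 1539; the
12.6% figure comes from the post itself, not from this code.  Python with
scipy is assumed.]

  from scipy.stats import mannwhitneyu

  men = [20000, 8000, 6000, 4000, 2500, 1692]    # hypothetical; mean 7032
  women = [4500, 1800, 700, 400, 295]            # hypothetical; mean 1539

  u, p = mannwhitneyu(men, women, alternative='two-sided')
  # The p-value depends on the full arrangement of the 11 values, not just
  # the two means; compare the 12.6% Gallagher reports for the real data.
  print(u, p)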

Dr. Steiger's post recommends that I should eschew using any formal statistical
tests on the IWF report data.  What is the alternative?  The approach used in
the IWF report is to point out an individual datum or a few data points, and to
make claims about "striking," "compelling" and "dramatic" differences between
the sexes.  I regard Dr. Steiger's evaluation of these data as being highly
subjective.  I have no idea what objective criteria he used to reach his
conclusions of "compelling," "striking" and "dramatic" differences between male
and female faculty.

Eugene D. Gallagher
ECOS

P.S.  Here are answers to your questions:

1. Suppose the IWF report had gathered data on ALL female scientists at MIT,
compared them to a group of males matched for seniority, and had shown huge
differences in performance.  Would you still be doing a significance test? 

Of course not.  I posted a note called "Florida votes and statistical errors,"
on 11/30/00 on this newsgroup on this very topic.  DejaNews is dead, but you
can find this post on Google.

 2. Suppose Mary has 1051 citations in the last 10 years, and has earned
 4 million in grants, and Fred, in the same department, and about the same
 age and seniority, has 12000+ citations and has earned 23 million in
 grants, and Fred finds out that Mary is making the same salary he is.
 Should Fred consult you about doing a statistical "significance" test
 before asking for a raise?

No.  What does this hypothetical have to do with your claim that "dramatic"
gender differences exist in the productivity of MIT biology faculty?

3. Suppose you go to a nursing school, and find 5 female faculty with 7,000
 citations apiece, and,