Re: On inappropriate hypothesis testing. Was: MIT Sexism statistical bunk
Irving Scheffe wrote:

> First, you're addressing the wrong question. We are not interested, in the example, in the "ability" of the players. We are interested in whether, over the course of the preceding 162 games, the Yanks outhomered the Tigers by a substantial amount. They did.

I think that illustrates my point. There is no single "we" here. My question can't be wrong, except when contrasted with the question someone else asked or wants to ask. (I repeat that the thread had moved off its original narrow frame of reference, which I was most definitely not addressing.)

> [This is not to say that "ability" isn't an interesting question. But your proposed randomization test doesn't address that issue well at all.]

I think this is a crucial point. Yes, it doesn't address it particularly well, but it isn't irrelevant. I'd lump it with the quick and dirty exploratory and descriptive stats that people do when eyeballing data for the first time.

> Second of all, you have chosen a suboptimal unit of analysis, if you are really interested in assessing "ability."

To be fair, I didn't choose any units of analysis at all. I wrote in response to the units of analysis already being discussed. The implicit assumption was that this is the data you have.

Thom

= Instructions for joining and leaving this list and remarks about the problem of INAPPROPRIATE MESSAGES are available at http://jse.stat.ncsu.edu/ =
Re: On inappropriate hypothesis testing. Was: MIT Sexism statistical bunk
On 14 Mar 2001 21:55:48 GMT, [EMAIL PROTECTED] (Radford Neal) wrote:

> In article [EMAIL PROTECTED], Rich Ulrich [EMAIL PROTECTED] wrote:
>
> > (This guy is already posting irrelevant rants as if I've driven him up the wall or something. So this is just another poke in the eye with a blunt stick, to see what he will swing at next)
>
> I think we may take this as an admission by Mr. Ulrich that he is incapable of advancing any sensible argument in favour of his position. Certainly he's never made any sensible response to my criticism.

- In a new thread, I have now provided a response that is sensible, or, at least, somewhat numeric.

I notice that Jim C. has taken up the cudgel, in trying to explain the basics of t-tests to Jim S, and that "furthers my position."

I figure that after I state my position in one post, explicate it in another, and try that again while refining the language -- then I may as well call it quits with JS, when he still doesn't get the points from the first (or from the couple of other people who were posting them before I was). I may not be saying it all that well, but I wasn't inventing the position.

You and I are in agreement, now, on one minor conclusion: "The t-test isn't good evidence about a difference in averages." But for me, that's true because the numbers are crappy indicators of performance -- which was clued *first* by the distribution. Whereas, you seem to have much more respect for crude averages, compared to the several of us who object.

--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
Re: On inappropriate hypothesis testing. Was: MIT Sexism statistical bunk
- I hate having to explain jokes -

On 14 Mar 2001 15:34:45 -0800, [EMAIL PROTECTED] (dennis roberts) wrote:

> At 04:10 PM 3/14/01 -0500, Rich Ulrich wrote:
>
> > Oh, I see. You do the opposite. Your own flabby rationalizations might be subtly valid, and, on close examination, *do* have some relationship to the questions
>
> could we ALL please lower a notch or two ... the darts and arrows? i can't keep track of who started what and who is tossing the latest flames but ... somehow, i think we can do a little better than this ...

Dennis,

Please, where is YOUR sense of humor? My post was a literary exercise -- I intentionally posted his lines immediately before mine, so the reader could follow my re-write phrase by phrase. I'm still hoping "Irving" will lighten up.

You chopped out the original that I was paraphrasing, and you did *not* indicate those important [snip]s -- You would mislead the casual reader to think someone other than JimS is originating lines like that, or intend them as critique in this group.

- I'm not always kind, but I think I am never that wild.
- It's probably been a dozen years since I purely flamed like that. (Or maybe I never flamed, if you talk about the really empty ones. In the olden days of local Bulletin Boards, with political topics, I discarded 1/3 of my compositions without ever posting, because of poor content or tone. I still use some judgment in what I post.)

Compare his original line about 'little or no ... relationship' with my clever reversal, "... on close examination, *do* have some relationship to the questions."

Well, I was trying for humor, anyway. Sorry, if I missed.

--
Rich Ulrich, [EMAIL PROTECTED]
http://www.pitt.edu/~wpilib/index.html
Re: On inappropriate hypothesis testing. Was: MIT Sexism statistical bunk
On Thu, 15 Mar 2001 18:09:26 GMT, Jerry Dallal [EMAIL PROTECTED] wrote:

> Irving Scheffe wrote:
>
> > Original MIT Report on the Status of Women Faculty: http://mindit.netmind.com/proxy/http://web.mit.edu/fnl/
>
> It is frustrating to keep getting errors when I try to access a printable version of the report, whether by using IE or Netscape. Is there a known workaround?

Many people have had problems double-clicking on the Adobe Acrobat link. This has to do with various integration problems between Acrobat and Internet Explorer. It is, in general, much better to right-click on the link, then choose to save the file locally. After downloading the whole file, you should (if you have Adobe Acrobat on your system) be able to read it.

--Jim Steiger
Re: On inappropriate hypothesis testing. Was: MIT Sexism statistical bunk
Thanks again for the clarification, Jim. I think we are in essential agreement. To reply succinctly to your message: 1. Certainly, as a general rule one should *always* look at distributional shape as well as summary statistics. Feminists seldom do, by the way, in advancing arguments about discrimination. Indeed, as those of us who've heard the "women make 73 cents on the dollar" mantra for years know all too well, they'll repeat the most inane statistic if it sounds good. 2. In the MIT data, Mr. Ulrich seemed to be implying that the mean differences *favoring* the men might be due to one or two outliers. However, there is a serious question whether the men in the 1 range should actually be considered outliers. If you don't want to address that, fine. It seemed like you were agreeing with his position. It now seems you were not. Sorry if I misread. 3. In the modern academic environment, I think that Nobel Prize Winners generally make above average salaries, and tend to be highly productive people as well. I may be wrong, but some data I've seen suggest otherwise. 4. I probably would not be inclined to use formal inferential procedures with the MIT data, even if it were provided. Keep in mind that, in a perfectly fair society, there is a "balance of unfairness." What I'd probably do is a regression analysis, and try to decide, on the basis of some fairly extensive consultation, when a residual is large enough to merit recompense. There is a real problem with some of the recommendations recently agreed to at MIT. Salaries have a natural error variance, if you take two groups of "equally" performing people, they will almost certainly have differences both in pay and in performance. The way it now stands, feminists planning to use MIT as a template want the right to demand a pay increase anytime they can identify a salary decrement, regardless of (a) whether any performance figures have been taken into account, and (b) whether "natural variation" has been examined. 
Similar venues are not open to men. So, in the future, we may find rapid "fixing" of even minor, well-deserved differences when women find themselves on the short end, but no such "fixes" when men find themselves on the short end. This merely perpetuates more unfairness, and will almost certainly result in a backlash some time in the future. BTW, I would like to rebut any notion that I am, in general, against salary equity procedures. It is a matter of record that, in 1988, when serving as a member of the salary negotiation team at UBC, I pointed out that an across-the-board raise of $2700, requested for all women, would unfairly benefit those who had started working recently, and not make up the balance for those who had been working there a long time. As a result, a regression based procedure was adopted that more equitably distributed the money. I supported this, as did most of the other members of the team, among them several women. What I am against is poorly designed, unfair procedures that reward people solely on the basis of their race or gender and their willingness to gripe. Best regards, Jim Steiger -- James H. Steiger, Professor Dept. of Psychology University of British Columbia Vancouver, B.C., V6T 1Z4 - Note: I urge all members of this list to read the following and inform themselves carefully of the truth about the MIT Report on the Status of Women Faculty. Patricia Hausman and James Steiger Article, "Confession Without Guilt?" 
: http://www.iwf.org/news/mitfinal.pdf
Judith Kleinfeld's Article Critiquing the MIT Report: http://www.uaf.edu/northern/mitstudy/#note9back
Original MIT Report on the Status of Women Faculty: http://mindit.netmind.com/proxy/http://web.mit.edu/fnl/

On Mon, 12 Mar 2001 13:10:47 -0600, jim clark [EMAIL PROTECTED] wrote:

> Hi
>
> On Mon, 12 Mar 2001, Irving Scheffe wrote:
>
> > Jim: For example, suppose you had a department in which the citation data were
> >
> > Males    Females
> > 12220    1298
> >  2297    1102
>
> When I said outlier, I had in mind hypothetical data of the following sort (it doesn't matter to me whether it is the salaries or the citation rates):
>
>        Males    Females
>        17000    1000
>         1000    1000
>         1000    1000
>         1000    1000
> Avg     5000    1000
>
> vs.
>
>        Males    Females
>         5000    1000
>         5000    1000
>         5000    1000
>         5000    1000
> Avg     5000    1000
>
> I would view the latter somewhat differently than the former with respect to differences between these samples of males and females, and with respect to the kinds of explanations I would seek (e.g., something general to males, something specific to male 1).
>
> > The male with 12220 is, let's imagine, a Nobel Prize winner. The salaries for the 4 people are
> >
> > Males      Females
> > 156,880    121,176
> > 112,120    114,324
>
> Of course if the salaries were:
>
> Males      Females
> 112,120    121,176
> 156,880    114,324
>
> You probably might want not to promote the hypothesis of productivity
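The contrast in jim clark's two hypothetical tables can be checked with a few lines of Python. This is only an illustrative sketch, not an analysis anyone in the thread ran; the variable names and the `summarize` helper are made up for the example, and the numbers are the hypothetical ones from his post.

```python
from statistics import mean, median

# Hypothetical figures from jim clark's post (salaries or citation counts).
females = [1000, 1000, 1000, 1000]
males_one_outlier = [17000, 1000, 1000, 1000]
males_uniform = [5000, 5000, 5000, 5000]


def summarize(values):
    """Return (mean, median) for one group."""
    return mean(values), median(values)


for label, males in (("one outlier", males_one_outlier),
                     ("uniform", males_uniform)):
    m_mean, m_median = summarize(males)
    f_mean, f_median = summarize(females)
    # Same mean gap in both scenarios; only the median gap tells them apart.
    print(f"{label:12s} mean gap = {m_mean - f_mean}, "
          f"median gap = {m_median - f_median}")
```

Both scenarios give a mean gap of 4000, but the median gap is 0 in the one-outlier case and 4000 in the uniform case, which is the distinction he is drawing. Whether that distinction matters for the salary question is exactly what the thread disputes.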
Re: On inappropriate hypothesis testing. Was: MIT Sexism statistical bunk
On Fri, 09 Mar 2001 15:53:12 +, Thom Baguley [EMAIL PROTECTED] wrote:

> Irving Scheffe wrote:
>
> > Imagine it is 1961. Our question is, which outfield has better home run hitters, the Yankees or Detroit? Here are the numbers for the Yankee and Tiger starting Outfields.
> >
> > Yanks    Tigers
> > -----    ------
> >   61       45
> >   54       19
> >   22       17
> >
> > Now, the t-test isn't significant, nor is the permutation test. But is either relevant to the question? If you have a reasonable understanding of the notion of "home run," the answer is no.
>
> snip
>
> > It was, by definition, the population of interest, so it appears that you are flat wrong. The question we were asking was, "if we take the large identifiable cluster of senior MIT women who graduated between 1970 and 1976, and compare them with their natural cohort, the men who graduated in the same time frame, do we see performance differences?" The answer is, as shown by the data above: yes. We see huge performance differences. Just like with the Yankees and Tigers in 1961.
>
> It seems to me that you are unnecessarily restricting the questions that can be asked by others.

I was presenting a counterexample to an erroneous assertion by Mr. Ulrich. This in no way is "restricting" the discussion at all. Indeed, if you read my preceding posts carefully enough, you'll find an explicit disclaimer to the contrary. I recognize that the "utility function" relating citations and publications to quality is complex, and that there are questions of natural variability to be addressed.

> You are not even restricting them to the interesting questions.

Again, please do not engage in straw man mischaracterization. I'm not "restricting" anybody to anything. Indeed, it is the rigid and improper insistence on a useless significance test that is "restrictive," misleading, and lacking a rationale. I've simply presented an example of how a t-test not only fails to add useful information, but provides a misleading conclusion. If you think otherwise, please provide an example, with a rationale.
But please read on, because I think I'm going to help answer your questions for you.

> For example, asking who scored more in 1961 - is different to which players were better.

I cannot imagine anyone, least of all myself, disagreeing. Why you think it is relevant to my critique of a randomization test is a mystery. As someone with a lifelong fascination with baseball statistics, I'd freely admit that virtually any measure of anything in baseball is impure.

The key structural point in my argument is this. If you accept the assumption that the players' performance in the previous season is the thing being evaluated, reference to what might have happened under some fictitious random sampling process is irrelevant. The Yanks outhomered the heck out of the Tigers in 1961. Whether this indicates they are "better hitters," "more Christian," "superior human beings," or even "better home run hitters in the long run" etc. is, of course, another matter, and possibly very interesting. But you're not going to address any of those issues with a t-test or randomization test. If you think you can, please present a rationale.

Imagine the Tigers approached the media in late 1961 and said, "Actually, Dr. Randomo isn't sure that Maris, Mantle and Berra outhomered us in any meaningful sense, because, if you think about it, this difference might be produced by 6 players of equal ability influenced by a large number of random factors." If they were ordinary sportswriters, they'd simply say "are you nuts?" But, if they were statisticians, they'd say (a) you are asking the wrong question, and (b) you have the wrong model. The question is not whether Mantle, Maris, and Berra are better collective home run hitters over some hypothetical long run than Kaline, Colavito, and Bruton. [Actually, virtually anyone familiar with baseball would agree that they were, as a group, better players, but that is another matter. All 6 were outstanding players.]
In a similar vein, the question in the MIT case was not whether the MIT male senior biologists are better people than their female counterparts. It is simply, how true is the implied assertion in the MIT report that there were no performance differences that might account for [undocumented] differences in salary and performance between senior men and women. MIT stated that to assert that differences in resource allocation might be due to performance differences is "the last refuge of the bigot." Hausman and I were documenting major performance differences.

> Why not think of it in terms of "Could this difference be produced by 6 players of equal ability influenced by a large number of random factors". In that case a significance test might have some value in evaluating the hypothesis that one group was better.

Again, you're slipping in an alternative question to the one that was asked.
Re: On inappropriate hypothesis testing. Was: MIT Sexism statistical bunk
At 02:25 PM 3/12/01 +, Radford Neal wrote:

> In this context, all that matters is that there is a difference. As explained in many previous posts by myself and others, it is NOT appropriate in this context to do a significance test, and ignore the difference if you can't reject the null hypothesis of no difference in the populations from which these people were drawn (whatever one might think those populations are).

the problem with your argument is this ... now, whether or not formal inferential statistical procedures are called for ... if there is a difference in salary ... and differences in any OTHER factor or factors ... one is in the realm of SPECULATION as to what may or may not be the "reason" or "reasons" for THAT difference

in other words ... any way you say that the difference "may be explained by" is a hypothesis you have formulated ... so, in this general context ... it still is a statistical issue ... that being, what (may) causes what ... and, this calls for some model specification ... that links difference in salaries TO differences in other factors/variables

if we do not view it as some kind of a statistical model ... then we are in no position to really talk about this case ... not in any causal or quasi causal way ... and, i thought that was the main purpose of this entire matter ... what LED to the gap in salaries?? ... was it something based on merit? or something based on bias?

i don't see how else we could check up on these kinds of issues other than some statistical questions being asked ... then tested in SOME fashion (though i am not specifying exactly how)

> Radford Neal
>
> Radford M. Neal [EMAIL PROTECTED]
> Dept. of Statistics and Dept. of Computer Science [EMAIL PROTECTED]
> University of Toronto http://www.cs.utoronto.ca/~radford

_
dennis roberts, educational psychology, penn state university
208 cedar, AC 8148632401, mailto:[EMAIL PROTECTED]
http://roberts.ed.psu.edu/users/droberts/drober~1.htm
Re: On inappropriate hypothesis testing. Was: MIT Sexism statistical bunk
Jim:

I agree with Radford Neal's comments, and urge careful reconsideration of the foundation behind some of the comments made. For example, suppose you had a department in which the citation data were

Males    Females
12220    1298
 2297    1102

The male with 12220 is, let's imagine, a Nobel Prize winner. The salaries for the 4 people are

Males      Females
156,880    121,176
112,120    114,324

The females approach the dean of science and declare that there is discrimination against them. They've measured the labs, and the men have more space. Moreover, they feel marginalized and depressed, as their status has been slowly slipping in the department. Moreover, they are paid less than men of the same age. Careful examination of mean salary shows that the mean salaries are 134,500 for men and only 117,750 for women. With great brouhaha, the administration, without publishing the above data, declares that there was a discrimination problem, and it was addressed by giving both the women a 16,000 raise.

As Radford Neal has pointed out succinctly, the argument about outliers is irrelevant, and I want to emphasize with this example that it is irrelevant on numerous levels. First of all, it is not necessarily clear whether, and in which of several senses, our Nobel Prize winner is an outlier in his group. Second, even if he is -- so what? Surely you would not argue that this means he didn't deserve his salary! In fact, careful examination of the salary data [never made public by the administration] together with the performance data might well have led to the conclusion that it is the male faculty who are underpaid.

Although, as Dr. Neal pointed out, it is not logically relevant to the issue, I would like to explore your notion, echoed without justification by Rich Ulrich, that the huge difference in citation performance between MIT senior men and women might be due to "one or two outliers."
Take a look at the data again, and tell me which male data you consider to be outliers within the male group, and why. For example, are the men with 2133 and 893 "outliers," or those with 12830 and 11313? The data for the senior men and women:

12 year citation counts:

Males    Females
-----    -------
12830    2719
11313    1690
10628    1301
 4396    1051
 2133     935
  893

As for the notion of exploring the relationship between salary, gender, and performance -- I'd be more than happy to examine any data that MIT would make available. They will, of course, not make such data available. It is too private, they say.

Best regards,
Jim Steiger

--
James H. Steiger, Professor
Dept. of Psychology
University of British Columbia
Vancouver, B.C., V6T 1Z4

Note: I urge all members of this list to read the following and inform themselves carefully of the truth about the MIT Report on the Status of Women Faculty.

Patricia Hausman and James Steiger Article, "Confession Without Guilt?": http://www.iwf.org/news/mitfinal.pdf
Judith Kleinfeld's Article Critiquing the MIT Report: http://www.uaf.edu/northern/mitstudy/#note9back
Original MIT Report on the Status of Women Faculty: http://mindit.netmind.com/proxy/http://web.mit.edu/fnl/

On Mon, 12 Mar 2001 08:55:17 -0600, jim clark [EMAIL PROTECTED] wrote:

> Hi
>
> On 12 Mar 2001, Radford Neal wrote:
>
> > Yes indeed. And the context in this case is the question of whether or not the difference in performance provides an alternative explanation for why the men were paid more (one supposes, no actual salary data has been released).
> >
> > In this context, all that matters is that there is a difference. As explained in many previous posts by myself and others, it is NOT appropriate in this context to do a significance test, and ignore the difference if you can't reject the null hypothesis of no difference in the populations from which these people were drawn (whatever one might think those populations are).
> Personally, I am not interested in the question of statistical testing to dismiss the alternative explanation being proposed; indeed, I suspect that the original claim about gender being the cause of salary differences would not stand up very well either to statistical tests.
>
> But there does seem to me to be more than just saying ... "see there is a difference" and that statistical procedures would have a role to play. For example, wouldn't the strength and consistency of the differences influence your confidence that this was indeed the underlying factor? The same difference in means due to one or two outliers would surely not mean the same thing as a uniform pattern of productivity differences, would it? And wouldn't you want to demonstrate that there was a significant and ideally strong within-group relationship between productivity and salary before claiming that it is a reasonable alternative for the between-group differences? Or at least, wouldn't that
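One way to look at the "one or two outliers" question raised in this exchange is simply to recompute the male mean with the largest values removed. The Python sketch below does that with the 12-year citation counts quoted in the thread (six senior men, five senior women). It is a descriptive illustration only, not an analysis performed by any poster, and the helper name `mean_without_top` is invented for the example.

```python
from statistics import mean, median

# 12-year citation counts as quoted in the thread.
males = [12830, 11313, 10628, 4396, 2133, 893]
females = [2719, 1690, 1301, 1051, 935]


def mean_without_top(values, k):
    """Mean after dropping the k largest values (k = 0 keeps everything)."""
    kept = sorted(values)[: len(values) - k]
    return mean(kept)


for k in range(3):
    print(f"men, top {k} dropped: mean = {mean_without_top(males, k):.1f}")

print(f"men:   median = {median(males)}")
print(f"women: mean = {mean(females):.1f}, median = {median(females)}")
```

On these figures the male mean stays above the largest female count (2719) even after the two highest male values are dropped. That describes the sample at hand; it says nothing by itself about salaries or about causes, which is the part of the argument the posters disagree on.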
Re: On inappropriate hypothesis testing. Was: MIT Sexism statistical bunk
Irving Scheffe wrote:

> Imagine it is 1961. Our question is, which outfield has better home run hitters, the Yankees or Detroit? Here are the numbers for the Yankee and Tiger starting Outfields.
>
> Yanks    Tigers
> -----    ------
>   61       45
>   54       19
>   22       17
>
> Now, the t-test isn't significant, nor is the permutation test. But is either relevant to the question? If you have a reasonable understanding of the notion of "home run," the answer is no.

snip

> It was, by definition, the population of interest, so it appears that you are flat wrong. The question we were asking was, "if we take the large identifiable cluster of senior MIT women who graduated between 1970 and 1976, and compare them with their natural cohort, the men who graduated in the same time frame, do we see performance differences?" The answer is, as shown by the data above: yes. We see huge performance differences. Just like with the Yankees and Tigers in 1961.

It seems to me that you are unnecessarily restricting the questions that can be asked by others. You are not even restricting them to the interesting questions.

For example, asking who scored more in 1961 is different from asking which players were better. Why not think of it in terms of "Could this difference be produced by 6 players of equal ability influenced by a large number of random factors"? In that case a significance test might have some value in evaluating the hypothesis that one group was better.

The second case is even stronger. Take any two groups and you'll almost certainly find a difference on most measures (citation count, salary, hat size or whatever).

Finally, what allows you to infer that any difference you observe is "huge"? This is a relative judgement. In statistics we typically reference it to some indication of (population) variability. In real world contexts we often use other benchmarks.
For example, think about runs scored in the first innings of a test match by three top order batsmen from two cricket teams

England    Sri Lanka
-------    ---------
   61         45
   54         19
   22         17

Is this a huge difference? I think not. Does it provide strong evidence that the England top order batsmen are better than the Sri Lankans? No.

What allows you to infer a huge difference in the baseball case is your knowledge of baseball (frequency of runs and so on). So at best, I think it is a misleading example.

Thom
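For readers who want to see what the permutation test mentioned in this exchange does with the six numbers, here is a minimal Python sketch. It enumerates every way of splitting the pooled values into two groups of three and counts how many splits give a difference in means at least as large as the observed one. The function names are invented for the example, and this is one common form of the test, not necessarily the exact procedure any poster had in mind.

```python
from fractions import Fraction
from itertools import combinations

# 1961 home run totals quoted in the thread (the cricket example reuses them).
yanks = [61, 54, 22]
tigers = [45, 19, 17]


def mean_diff(a, b):
    """Exact difference in group means, kept as a Fraction to avoid rounding."""
    return Fraction(sum(a), len(a)) - Fraction(sum(b), len(b))


def exact_permutation_test(a, b):
    """Return (observed difference, one-sided p, two-sided p)."""
    pooled = a + b
    observed = mean_diff(a, b)
    diffs = []
    # Every choice of len(a) positions is one relabelling of the pooled data.
    for idx in combinations(range(len(pooled)), len(a)):
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in idx]
        diffs.append(mean_diff(group_a, group_b))
    one_sided = Fraction(sum(d >= observed for d in diffs), len(diffs))
    two_sided = Fraction(sum(abs(d) >= abs(observed) for d in diffs), len(diffs))
    return observed, one_sided, two_sided


observed, p_one, p_two = exact_permutation_test(yanks, tigers)
print(f"observed difference in means: {float(observed):.2f}")
print(f"one-sided p = {p_one} ({float(p_one):.2f})")
print(f"two-sided p = {p_two} ({float(p_two):.2f})")
```

With three values per group there are only 20 possible splits, so the one-sided p-value here is 2/20 = 0.10 and the two-sided value is 4/20 = 0.20. The smallest two-sided p-value such a test can ever produce with distinct values is 2/20 = 0.10, so "not significant" is close to guaranteed at this sample size whatever the numbers are. Whether the test is the right tool for the question is the point under dispute in the thread; the sketch only shows the arithmetic.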
Re: On inappropriate hypothesis testing. Was: MIT Sexism statistical bunk
In article [EMAIL PROTECTED], Thom Baguley [EMAIL PROTECTED] wrote:

> Why not think of it in terms of "Could this difference be produced by 6 players of equal ability influenced by a large number of random factors". In that case a significance test might have some value in evaluating the hypothesis that one group was better.

Recall that this baseball example was intended to clarify how one should go about determining whether or not there is reason to think that MIT discriminated against women faculty.

From your comment, I'd guess that you think that MIT should not pay faculty based on their actual achievements, but rather on the basis of some estimate of their ability, disregarding "random factors". That's an interesting opinion, but would a policy of paying based on actual achievement (or a noisy estimate of actual achievement) constitute discrimination?

Radford Neal

Radford M. Neal [EMAIL PROTECTED]
Dept. of Statistics and Dept. of Computer Science [EMAIL PROTECTED]
University of Toronto http://www.cs.utoronto.ca/~radford
Re: On inappropriate hypothesis testing. Was: MIT Sexism statistical bunk
On Fri, 02 Mar 2001 16:28:53 -0500, Rich Ulrich [EMAIL PROTECTED] wrote:

> On Tue, 27 Feb 2001 07:49:23 GMT, [EMAIL PROTECTED] (Irving Scheffe) wrote:
>
> My comments are written as responses to the technical comments to Jim Steiger's last post. This is shorter than his post, since I omit redundancy and mostly ignore his 'venting.' I think I offer a little different perspective on my previous posts.
>
> [ snip, intro. ]

Mr. Ulrich's latest post is a thinly veiled ad hominem, and I'd urge him to rethink this strategy, as it does not present him in a favorable light. Any objective reader would notice how the post is riddled with emotional attributions and loaded language like

"venting"
"exquisite sensitivity" (a claim attributed to me that I never made)
"hammering your own gavel"
"ferocity"
"angry"
"shouted down"
"blundering around"
"browbeat them"
"crude"

At the same time that Mr. Ulrich makes these disparaging but completely inaccurate attributions, he characterizes the posts of another discussant as "polite." Considering that this "polite" poster (Gene Gallagher) used terms like "Rush Limbaugh dittohead," it is clear that Mr. Ulrich's perceptions and attributions are badly biased.

While he invests an extraordinary amount of effort in such irrelevant ad hominems, Mr. Ulrich seems unable to answer the simplest statistical questions regarding his point of view. And, in his latest post, he reveals in more detail how he insists on remaining as uninformed as possible while rendering such judgments. Most disturbingly, he contradicts himself and mischaracterizes previous discussions. For example,

> JS
> > You are the one who examined nonrandom data, representing citation counts over a 12 year period for senior male and female MIT biologists matched for year of Ph.D. You look at these data, which show a HUGE difference in performance between the men and women, and declare that a significance test is necessary. But you cannot provide any mathematical justification for the test.
> > I gave several examples to try to jar you into realizing that a statistical test on the data cannot answer the question you want answered.
>
> To start with, I never examined any *data*. I kept away from the papers because I knew so little about the data and it looked so messy; I made some comments about how difficult it could be.

Yet he made what appeared to be comments about data. For example:

[quote from earlier Ulrich post]

> I can't say that I have absorbed everything that has been argued. But as of now, I think Gene has the better of it. To me, it is not very appropriate to be highly impressed at the mean-differences, when TESTS that are attempted can't show anything. The samples are small-ish, but the means must be wrecked a bit by outliers.

This raises the question: If he never examined the data, how could he make a statement about "outliers" in the data?

> I tossed in a couple of comments to encourage Gene G., who made some good sense, as did Dennis.

They made no sense, let alone "good sense." I gave numerous examples demonstrating this. Mr. Ulrich professes that he doesn't see the point of them.

> As I read it, you proceeded to browbeat them, while failing to respond to their substance.

Not true. First of all, there was virtually no substance in their arguments. Dr. Gallagher wants to do a randomization test because he is concerned [I'm interpreting and paraphrasing a bit] about the scale and variability questions that naturally surround citation data. These concerns are worthwhile, but he failed (ever) to explain how a randomization test or a t-test could answer such questions. Indeed, later, he presented a "logarithmic transform" of the citation data which made the differences look less severe, but never provided any rationale for that, either. [See my Yankees-Tigers example later.]

> I have tried to make sense of that early part of *your* argument, where you want to leap over their critiques.

Actually, they offered no critique. Dr. Gallagher offered mainly name-calling and ad hominem in his early posts, using terms like "Rush Limbaugh dittohead." Seems like Mr. Ulrich's criticism is misplaced.

> You claim a HUGE difference. You say you assert this because of exquisite sensitivity to numbers. Dennis challenged this on the basis of "lousy standards" -- either by their metric or content -- and Gene challenged this as misleading, because it was "not (nominally) significant."

To declare that something is "not significant" requires a rationale.

> They disagreed with you on the inference that you drew from two means. I agree that a huge difference may be useful. I agree that t-tests don't offer any final resolution. (As I posted before,) with nonrandom data, we have to argue contingencies, explore options, and make what inferences that we can. You seem to cut that short, chop! -- pronouncing your own verdict as final -- but I don't see how hammering your own gavel can convince people who have
Re: On inappropriate hypothesis testing. Was: MIT Sexism statistical bunk
On Thu, 08 Mar 2001 10:38:59 -0800, Irving Scheffe [EMAIL PROTECTED] wrote:

> On Fri, 02 Mar 2001 16:28:53 -0500, Rich Ulrich [EMAIL PROTECTED] wrote:
>
> > On Tue, 27 Feb 2001 07:49:23 GMT, [EMAIL PROTECTED] (Irving Scheffe) wrote:
> >
> > My comments are written as responses to the technical comments to Jim Steiger's last post. This is shorter than his post, since I omit redundancy and mostly ignore his 'venting.' I think I offer a little different perspective on my previous posts.
> >
> > [ snip, intro. ]
>
> Mr. Ulrich's latest post is a thinly veiled ad hominem, and I'd urge him to rethink this strategy, as it does not present him in a favorable light.

- I have a different notion of ad-hominem, since I think it is something directed towards 'the person' rather than at the presentation. Or else, I don't follow what he means by 'thinly veiled.'

When a belligerent and nasty and arrogant tone seems to be an essential part of an argument, I don't consider myself to be reacting 'ad-hominem' when I complain about it -- it's not that I hate to be ad-hominem, but I don't like to be misconstrued.

I'm willing, at times, to plunk for the 'ad-hominem'. For instance, since my last post on the subject, I looked at those reports. Also, I searched with google for the IWF -- who printed the anti-MIT critiques. I see the organization characterized as an 'anti-feminist' organization, with some large funding from Richard Scaife. 'Anti-feminist' could mean a reasoned-opposition, or a reflex opposition. Given these papers, it appears to me to qualify as 'reflex' or kneejerk opposition.

Oh, ho! I say, this explains where the arguments came from, and why Jim keeps on going -- Now, THIS PARAGRAPH is what I consider an ad-hominem argument. And I'll give you some more.

Scaife is a paranoid moneybags and publisher who infests this Pittsburgh region (which is why I have noticed him more than a westerner like Coors). His cash was important in persecuting Clinton for his terms in office.
For example, Scaife kept alive Victor Foster's suicide for years. He held out money for anyone willing to chase down Clinton-scandals. Oh, he funded the chair at Pepperdine that Starr had intended to take. Now: My comment on the original reports: I am happy to say that it looks to me as if MIT is setting a good model for other universities to follow. The senior administrator listens to his faculty, especially his senior faculty, and responds. MIT makes no point about numbers in their statements, and it does seem to be wise and proper that they don't do so. I see now, Jim is not really arguing with MIT. They won't argue back. Jim's purpose is to create a hostile presence, a shadow to threaten other administrators. He goes, like, "If you try to 'cut a break' for women, we'll be watching and threatening and undermining, threatening your job if we can." I suppose state universities are more vulnerable than the private universities like MIT. On the other hand, with the numbers that Jim has put into the public eye, the next administrator can point to the precedent of MIT and assert that, clearly, the simple numbers on 'quality' are substantially irrelevant to the issues, since they were irrelevant at MIT. Hope this helps. -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html
Re: On inappropriate hypothesis testing. Was: MIT Sexism statistical bunk
I think we've now reached an adequate point of conclusion. To summarize Mr. Ulrich's latest post:

1. He doesn't think his previous litany of unfounded emotional attributions is "ad-hominem." Yet, he continues the same strategy here, characterizing the Hausman-Steiger report as an attempt to "threaten" administrators [by presenting relevant facts...] And, he quotes ad hominem attacks by others as part of his argument.

2. He feels my previous tone was "nasty" and "belligerent," although there was no such tone. [Apparently, anyone asking Mr. Ulrich to justify a statistical conjecture with an argument is being "nasty" and "belligerent."]

3. Mr. Ulrich then proceeds to completely ignore the statistical issues, and launches into another irrelevant attack. Indeed, he uses a standard ploy, "argument by Granting Agency." [a standard feminist ploy, born of argumentative desperation]

Finally, Mr. Ulrich capitulates completely on the statistical, logical, and moral issues, simply stating that he is pleased with the outcome of the MIT report. The final two paragraphs are classic, and, unfortunately, only slightly more irrational than what normally is provided to justify reverse discrimination. It is quite amazing to see a "biostatistician" formally arguing, in print, that one university's ignoring [suppressing?] relevant information would provide justification for other universities to declare the same information "irrelevant." Truly "Landgrebian"!

Of course, the moderately astute undergraduate with minimal training in critical thinking will recognize Mr. Ulrich's final circularity, which goes something like this: "The MIT report's misleading statements about performance are ok, because, well, I like what MIT did, and now other administrators can do similar things, and justify them on the basis of what MIT did." I think I can rest my case now.

-- James H. Steiger, Professor Dept.
of Psychology University of British Columbia Vancouver, B.C., V6T 1Z4 - Note: I urge all members of this list to read the following and inform themselves carefully of the truth about the MIT Report on the Status of Women Faculty. Patricia Hausman and James Steiger Article, "Confession Without Guilt?" : http://www.iwf.org/news/mitfinal.pdf Judith Kleinfeld's Article Critiquing the MIT Report: http://www.uaf.edu/northern/mitstudy/#note9back Original MIT Report on the Status of Women Faculty: http://mindit.netmind.com/proxy/http://web.mit.edu/fnl/
Re: On inappropriate hypothesis testing. Was: MIT Sexism statistical bunk
I would like to make direct contact with Dr. Scheffe. I have some comments that I would like to direct to him but not to the mailing list. I would appreciate it if he could contact me directly. Dr. Robert C. Knodt 4949 Samish Way, #31 Bellingham, WA 98226 [EMAIL PROTECTED] "The point to remember is that what the government gives, it must first take away." John S. Coleman at Senate meeting.
Re: On inappropriate hypothesis testing. Was: MIT Sexism statistical bunk
On Tue, 27 Feb 2001 07:49:23 GMT, [EMAIL PROTECTED] (Irving Scheffe) wrote: My comments are written as responses to the technical comments to Jim Steiger's last post. This is shorter than his post, since I omit redundancy and mostly ignore his 'venting.' I think I offer a little different perspective on my previous posts. [ snip, intro. ] JS You are the one who examined nonrandom data, representing citation counts over a 12 year period for senior male and female MIT biologists matched for year of Ph.D. You look at these data, which show a HUGE difference in performance between the men and women, and declare that a significance test is necessary. But you cannot provide any mathematical justification for the test. I gave several examples to try to jar you into realizing that a statistical test on the data cannot answer the question you want answered. To start with, I never examined any *data*. I kept away from the papers because I knew so little about the data and it looked so messy; I made some comments about how difficult it could be. I tossed in a couple of comments to encourage Gene G., who made some good sense, as did Dennis. As I read it, you proceeded to browbeat them, while failing to respond to their substance. I have tried to make sense of that early part of *your* argument, where you want to leap over their critiques. You claim a HUGE difference. You say you assert this because of exquisite sensitivity to numbers. Dennis challenged this on the basis of "lousy standards" -- either by their metric or content -- and Gene challenged this as misleading, because it was "not (nominally) significant." They disagreed with you on the inference that you drew from two means. I agree that a huge difference may be useful. I agree that t-tests don't offer any final resolution. (As I posted before,) with nonrandom data, we have to argue contingencies, explore options, and make what inferences that we can. You seem to cut that short, chop! 
-- pronouncing your own verdict as final -- but I don't see how hammering your own gavel can convince people who have the choice of looking elsewhere. You may think that you are speaking from unimpeachable epiphany; to the rest of us, it looks like you are jumping to a conclusion.

You offer your *inference* that a huge citation difference explains the outcome. Okay, that could be reasonable. If the difference is direct but attenuated, the "difference" between citations would be larger, by variance accounted for (by some measure), than the difference between outcomes: which, I think, we stipulate has some size to it. If those measurements are on a reasonably useful metric, then a t-test should show it. It is my own experience, and part of my own learned, "exquisite" sensitivity to numbers, that (1) a mean difference as large as you illustrated should result in a t-test that is significant, unless there is something screwy with the numbers. (2) And if there is something so screwy with the numbers, then it is usually misleading and wrong to present the MEANS as if their contrast was meaningful ("huge").

Now, there is not a "mathematical necessity" for a test statistic. It is a request that you respect the conventions of statisticians, even when we ask for a test on non-random data, for what we might learn from it. Non-significant tests, which I had thought the data were producing, really undermine your adjective "huge". A "significant" test, which you now report, lends some credibility. Gene's permutation test says that those sets are not disjoint, however, so there is some basis for direct comparison. The most extreme permutation would have undercut *one* form of comparison, and the most obvious part of one argument about discrimination (though, I expect, not everything). It *looks* like you didn't want to consider a comparison because you figured you could win the argument by repetition and ferocity.
Your first excuse for not computing a t was that this was a "population" but that was flat wrong. I asked for your textbook references, and finally offered my own, in order to figure out your context. I offered "nonrandom", which you used, above. However, the old arguments about not computing "test statistics" on nonrandom samples have hardly any force these days -- I offer epidemiology as the pervasive (and persuasive) influence. Epidemiologists need to be reminded about the limits to their inference, - they tend to forget it entirely - but I think you are standing alone if you refuse to compute, claiming that old principle. I don't know if your role is such that someone will *have* to answer you, or if you are fated to wind up ignored (as non-responsive and therefore irrelevant); and angry.

[snip, my comment to which JS wrote:] Not so. If you were following the logic of the many examples I've presented, you could see that you can construct a reductio ad absurdum for any of the types of significance tests
Re: On inappropriate hypothesis testing. Was: MIT Sexism statistical bunk
Rich, Both Radford Neal and I have asked for a statistical rationale supporting your claim that a significance test that you advocated can provide useful information when applied to the MIT senior biologist data. You haven't provided one. Instead, you cite from a web statistics guide which in turn provides no rationale. It is now quite apparent that you have no rationale, only prejudices. This may be acceptable to the people who come to you for consulting, but this is a different forum, with different standards. Further comments are interspersed: On Thu, 22 Feb 2001 18:21:41 -0500, Rich Ulrich [EMAIL PROTECTED] wrote: On Mon, 19 Feb 2001 04:27:24 GMT, [EMAIL PROTECTED] (Irving Scheffe) wrote: In responding to Rich, I'll intersperse selected comments with selected portions of his text and append his entire post below. - I'm not done with the topic yet. But it is difficult to go on from this point. I think the difficulty is that JS has constructed his straw-man argument about how "hypotheses" are handled; and since it is a stupid strategy, it is easy for him to claim that it is fatally flawed. All the referents are unclear. I didn't construct any straw man arguments, and you haven't made clear what you are talking about. You are the one who examined nonrandom data, representing citation counts over a 12 year period for senior male and female MIT biologists matched for year of Ph.D. You look at these data, which show a HUGE difference in performance between the men and women, and declare that a significance test is necessary. But you cannot provide any mathematical justification for the test. I gave several examples to try to jar you into realizing that a statistical test on the data cannot answer the question you want answered. From his insistence on his "examples," it seems to me that he believes that someone else is committed to using p-levels in a strict way, by beating 5%. Not so. 
If you were following the logic of the many examples I've presented, you could see that you can construct a reductio ad absurdum for any of the types of significance tests you are proposing. If I believed strictly in hypothesis testing with a 5% significance level, I doubt that I'd have written an extensive article advocating confidence interval replacements for many of the classic hypothesis tests employed in the social sciences, and giving the precise, exact procedures for constructing these confidence intervals.

That's certainly not the case for me, and I doubt if anyone defends or promotes it, outside of carefully designed Controlled Random Experiments.

It is not the case for me, either, and so everything that follows is irrelevant.

Despite the fact that I could not make sense of WHY he wanted his example, it turns out -- after he explains it more -- that my own analysis covered the relevant bases. I agree, if you don't have "statistical power," then you don't ask for a 5% test, or (maybe) any test at all. The JUSTIFICATION for having a test on the MIT data is that the power is sufficient to say something.

In order to talk meaningfully about "power", you have to have a statistical rationale. As I have repeated numerous times, you have no statistical rationale. You simply "feel like" you "should" compute a statistical test, when all the assumptions on which the procedure is based are violated in the data you are applying the procedure to. Power to detect what? Under what distributional assumptions?

And what it said is that Jim did BAD INFERENCE. I said that a couple of times. I regret that I may have confused people with unnecessary words about "inference." Outlier = No central tendency = Mean is BAD statistic; careful reader insists on more or better information before asserting there's a difference.

What "outlier" are you referring to? What statistical rule did you use to determine the "outlier"? The MIT paper included all the raw data.
At no point did I or my coauthor state that we were doing inference on means. (Actually, a 2 sample t-test done on these data is significant at the .05 level, but we never imagined computing one.) Here are the raw data for the citation counts for the 5 senior MIT female biologists and 6 males who graduated from 1970-76.

Males    Females
-----    -------
12830    2719
11313    1690
10628    1301
 4396    1051
 2133     935
  893

These data are based on 12 years worth of records, from 1989-2000. The above could be broken down in numerous other ways. For example, we could produce citation counts per year, try to perform some kind of correction for the highly specific areas the individuals publish in, etc. Time series could be examined. However, these data are anything but a random sample. MIT is one of the most selective universities in the world in terms of whom it hires.

I asserted that more than once. Optimistically, my own data analysis technique might be
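For readers who want to check the arithmetic behind the figures traded in this thread, here is a minimal sketch that recomputes the two group means and the classical pooled-variance two-sample t statistic from the counts quoted above. The choice of the equal-variance form is an assumption; the post does not say which version of the t-test was meant. Whether any such test is meaningful for nonrandom data is exactly what the thread disputes, so this only verifies numbers, it does not settle the argument.

```python
import math

# Citation counts (1989-2000) as quoted in the post above.
males = [12830, 11313, 10628, 4396, 2133, 893]
females = [2719, 1690, 1301, 1051, 935]

def mean(xs):
    return sum(xs) / len(xs)

def pooled_t(a, b):
    """Equal-variance two-sample t statistic and its degrees of freedom."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss_a = sum((x - ma) ** 2 for x in a)   # within-group sum of squares
    ss_b = sum((x - mb) ** 2 for x in b)
    df = na + nb - 2
    pooled_var = (ss_a + ss_b) / df
    se = math.sqrt(pooled_var * (1 / na + 1 / nb))
    return (ma - mb) / se, df

t_stat, df = pooled_t(males, females)
print("means:", round(mean(males)), round(mean(females)))
print("t =", round(t_stat, 2), "on", df, "df")
```

With 9 degrees of freedom the two-sided .05 critical value is about 2.262, so the statistic lands just past it, which is consistent with the "significant at the .05 level" remark.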
Re: On inappropriate hypothesis testing. Was: MIT Sexism statistical bunk
- I want to comment a little more thoroughly about the lines I cited: what Garson said about inference, and his citation of Oakes.

On Thu, 22 Feb 2001 18:21:41 -0500, Rich Ulrich [EMAIL PROTECTED] wrote: [ snip, previous discussion ]

me: I think that Garson is wrong, and the last 40 years of epidemiological research have proven the worth of statistics provided on non-random, "observational" samples. When handled with care.

From G. David Garson, "PA 765 Notes: An Online Textbook." On Sampling http://www2.chass.ncsu.edu/garson/pa765/sampling.htm

Significance testing is only appropriate for random samples. Random sampling is assumed for inferential statistics (significance testing). "Inferential" refers to the fact that conclusions are drawn about relationships in the data based on inference from knowledge of the sampling distribution. Significance tests are based on a sampling theory which requires that every case have a chance of being selected known in advance of sample selection, usually an equal chance. Statistical inference assesses the significance of estimates made using random samples. For enumerations and censuses, such inference is not needed since estimates are exact. Sampling error is irrelevant and therefore inferential statistics dealing with sampling error are irrelevant.

- I agree with most of what he says, throughout; there will be a matter of nuances on interpretation and actions. For enumerations and censuses, a limited sort of statistics on 'finite populations,' he says sampling error is irrelevant. Irrelevant is a good and fitting word here. This is not 'illegal and banned,' but rather 'unwanted and totally beside the point.'

Garson: Significance tests are sometimes applied arbitrarily to non-random samples but there is no existing method of assessing the validity of such estimates, though analysis of non-response may shed some light.
The following is typical of a disclaimer footnote in research based on a non random sample: Here is my perspective on testing, which does not match his. - For a randomized experimental design, a small p-level on a "test of hypothesis" establishes that *something* seemed to happen, owing to the treatment; the test might stand pretty-much by itself. - For a non-random sample, a similar test establishes that *something* seems to exist, owing to the factor in question *or* to any of a dozen factors that someone might imagine. The test establishes, perhaps, the _prima facie_ case but the investigator has the responsibility of trying to dispute it. That is, it is an investigator's responsibility (and not just an option) to consider potential confounders and covariates. If the small p-level stands up robustly, that is good for the theory -- but not definitive. If there are vital aspects or factors that cannot be tested, then opponents can stay unsatisfied, no matter WHAT the available tests may say. Garson "Because some authors (ex., Oakes, 1986) note the use of inferential statistics is warranted for nonprobability samples if the sample seems to represent the population, and in deference to the widespread social science practice of reporting significance levels for nonprobability samples as a convenient if arbitrary assessment criterion, significance levels have been reported in the tables included in this article." See Michael Oakes (1986). Statistical inference: A commentary for social and behavioral sciences. NY: Wiley. Garson is telling his readers and would-be statisticians a way to present p-levels, even when the sampling doesn't justify it. And, I would say, when the analysis doesn't justify it. I am not happy with the lines -- The disclaimer does not assume that a *good* analysis has been done, nor does it point to what makes up a good analysis. '... 
if the sample seems to represent the population' seems to be a weak reminder of the proper effort to overcome 'confounding factors'; it is not an assurance that the effects have proven to be robust. So, the disclaimer should recognize that the non random sample is potentially open to various interpretations; the present analysis has attempted to control for several possibilities; certain effects do seem robust statistically, in addition to being supported by outside chains of inference, and data collected independently. I suggested earlier that this is the status of epidemiological, observational studies. For the most part, those studies have been quite fruitful. But not always. They have been especially likely to mislead, I think, when the designs pretend that binomial variability is the only source of error in a large survey, and attempt to interpret small effects. -- Rich Ulrich, [EMAIL PROTECTED] http://www.pitt.edu/~wpilib/index.html
Re: On inappropriate hypothesis testing. Was: MIT Sexism statistical bunk
In article [EMAIL PROTECTED], Rich Ulrich [EMAIL PROTECTED] wrote: I agree, if you don't have "statistical power," then you don't ask for a 5% test, or (maybe) any test at all. The JUSTIFICATION for having a test on the MIT data is that the power is sufficient to say something.

The reason why one should NOT do a significance test on this data, at any level, and regardless of how much power the test would have, was explained by me a while ago in the post I have repeated below. If you think there is something wrong with my reasoning, I suggest you explain the flaw. Radford Neal

-- I think the statistical issue in this discussion can be boiled down to a question of how to calculate standard errors for regression coefficients. What regression? Well, there isn't one, because there isn't any data, but the discussion seems to presuppose the possibility of data that for each faculty member gives their salary (the response variable, y), their gender (x1, coded as a dummy variable), and some indicator of performance (x2). The question is whether one has evidence that the regression coefficient for the dummy gender variable (x1) is non-zero. This will require computing the standard error for the estimate of this regression coefficient.

The accepted procedure for computing this standard error involves the sample correlation between the two predictors, x1 and x2. When the sample correlation is high, the standard errors for the regression coefficients will tend to be high, making it more difficult to conclude that the coefficient for gender is non-zero.

The procedure apparently being advocated by some posters is to perform a test of the null hypothesis that the correlation between x1 and x2 in the population is zero, and if there is not sufficient evidence to reject this null hypothesis, compute the standard errors for the regression coefficients as if the predictors were uncorrelated. I believe that this procedure is not generally accepted, for very good reasons.

Radford M.
Neal [EMAIL PROTECTED] Dept. of Statistics and Dept. of Computer Science [EMAIL PROTECTED] University of Toronto http://www.cs.utoronto.ca/~radford
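To make the standard-error point concrete, here is a rough sketch, not anything taken from the posts. For a regression on two predictors with an intercept, the usual formula is SE(b1) = sigma / sqrt(S11 * (1 - r^2)), where S11 is the centered sum of squares of x1 and r is the sample correlation between x1 and x2. Pretending r = 0 drops the (1 - r^2) factor and so understates the standard error. As assumptions for illustration only: the citation counts quoted earlier in the thread stand in for the performance indicator x2, there is no salary data at all (as the post says), and `sigma` is a hypothetical residual SD that cancels out of the ratio.

```python
import math

# ASSUMPTION: citation counts from earlier in the thread stand in for x2.
males = [12830, 11313, 10628, 4396, 2133, 893]
females = [2719, 1690, 1301, 1051, 935]

x1 = [0] * len(males) + [1] * len(females)   # gender dummy (1 = female)
x2 = males + females                          # performance indicator

def centered_sums(u, v):
    """Sums of squares and cross-products about the means."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    return suu, svv, suv

s11, s22, s12 = centered_sums(x1, x2)
r = s12 / math.sqrt(s11 * s22)                # sample correlation of x1 and x2

sigma = 1.0  # HYPOTHETICAL residual SD; it cancels in the ratio below
se_correct = sigma / math.sqrt(s11 * (1 - r ** 2))
se_as_if_uncorrelated = sigma / math.sqrt(s11)
ratio = se_correct / se_as_if_uncorrelated

print("r =", round(r, 3), " SE understated by a factor of", round(ratio, 3))
```

The ratio depends only on r, so a correlation that fails to reach "significance" in a small sample can still inflate the correct standard error noticeably, which is the concern raised in the post.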
Re: On inappropriate hypothesis testing. Was: MIT Sexism statistical bunk
Milo: Sure, although I don't see how that is relevant to the MIT situation, which attributed the current status of women there to discrimination, based on an undisclosed methodology. More generally, one CAN indeed do randomization tests on similar data even though there is no inference toward a larger population, and no intention of inferring later performance, and no random sampling from a larger population. For example, suppose a department head insists that committee assignments are made by completely random selection, but the 4 women discover that they have (by some objective criterion) the 4 worst assignments, while the 4 men faculty have the 4 best assignments. One can test the hypothesis that the assignment is random with respect to quality, and reject it at commonly used significance levels, by asking, "What is the probability of observing an imbalance this large if these 8 assignments had been randomly assigned to these 8 people"?

Quality of Assignment (10 point scale)

Males    Females
6        1
7        3
8        2
9        4

Since there are 70 possible assignments, and this one achieves the absolute worst imbalance, things look bad for the department head. The probability is 1/70 of obtaining an imbalance greater than or equal to the one observed, given random assignment. Note, we need not assume that the men and women have been sampled randomly from a larger population to perform this combinatorial calculation.

On the other hand, one would not need a significance test to evaluate the statement that "The women, at this time, have much worse committee assignments than the men," so long as it can be assumed that the measurement scale is reasonable.

--Jim James H. Steiger, Professor Department of Psychology University of British Columbia Vancouver, B.C., Canada V6T 1Z4

On Sun, 18 Feb 2001 23:06:44 GMT, "Milo Schield" [EMAIL PROTECTED] wrote: Jim has consistently maintained three claims/arguments: 1. we are not trying to generalize from a small group of people to a larger population. 2. No inference is involved (if we are not generalizing) 3. Using statistical tests is meaningless (since no inference is involved). I agree with his 1st and 3rd points -- but not his 2nd. Within the second, I disagree with the truth of his premise. The generalization mentioned in #1 is NOT the only possible generalization. Another generalization may be involved: that involving time. We sample things (basketball goals) for two groups of players in a few games and then want to make an inference about whether these particular scores were unlikely given that THESE particular players involved had the same average scores in the long run -- for all games. Given a time-based generalization, we now have an inference. Given this inference, the applicability of statistical tests seems quite relevant. Milo PS. This may be a quasi-Bayesian reinterpretation of these problems -- but if it fits.

- "Irving Scheffe" [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED]... There are a wide variety of probabilities that may be calculated in this situation, depending on the assumptions you want to make, and precisely what you mean by "this result." However, if you ask, "How likely is it that the for side won", the answer is that the for side won. If you ask, "How likely is it that the percentage of for votes is higher for women than for men in this sample," the answer is that it is perfectly likely, because it happened.
In an example perhaps more relevant to previous examples here, suppose this was an actual departmental vote, and the result was 5-3. The motion passed. If one of the men was a statistician wanting to overthrow the result and said, "wait a minute, let's perform a statistical test and see the probability of obtaining this gender split, given that the 5 yes votes were actually randomly assigned with respect to gender," I suspect people would think him odd. He can compute this probability, but it is irrelevant. On 15 Feb 2001 13:49:51 -0800, [EMAIL PROTECTED] (Paul R Swank) wrote: I remember a question from some stat book about a situation where there were 8 members of a group, three men and five women (or the reverse, I can't remember which) and on some issue the vote was five to three with all five women voting for. The question was "How
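The two combinatorial figures in this exchange can be checked by plain enumeration. The sketch below is only an illustration: it lists every way of handing 4 of the 8 committee assignments to the women and counts how many are at least as unfavourable as the observed one, then counts the possible gender splits for the 5 "yes" votes in the 8-member vote example. The function name `tail_count` is made up for this sketch.

```python
import math
from itertools import combinations

# Committee example: quality scores of the 8 assignments (10 point scale).
scores = [6, 7, 8, 9, 1, 3, 2, 4]
observed_women = [1, 3, 2, 4]          # the four assignments the women received

def tail_count(all_scores, observed_group):
    """Count equally likely splits whose group total is <= the observed total."""
    k = len(observed_group)
    observed_total = sum(observed_group)
    splits = list(combinations(range(len(all_scores)), k))
    hits = sum(1 for idx in splits
               if sum(all_scores[i] for i in idx) <= observed_total)
    return hits, len(splits)

hits, total = tail_count(scores, observed_women)
print("committee example:", hits, "of", total)

# Vote example: 8 members, 5 "yes" votes, all 5 cast by the 5 women.
# Under random assignment of the "yes" votes, exactly one split does that.
vote_splits = math.comb(8, 5)
print("vote example: 1 of", vote_splits)
```

Both counts only describe what random assignment would produce; as the posts argue, whether that reference distribution answers the question anyone actually cares about is a separate matter.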
Re: On inappropriate hypothesis testing. Was: MIT Sexism statistical bunk
On Mon, 19 Feb 2001 02:12:46 GMT, "Milo Schield" [EMAIL PROTECTED] wrote: Snip But in most of your examples MORE is being claimed. In most cases, the claim includes an inference. Once the claim involves an inference, then a statistical test may be relevant. In one case, the claim was discrimination (causal explanation of observed differences); in another the claim was greater scoring ability (causal explanation of observed differences).

I don't think so. I think the issue was scoring production, not ability, and the example was deliberately restricted: [This being a hypothetical example, assume, for the sake of argument, that this is a valid and complete measure of basketball performance.]

White    Black
-----    -----
12.8     13.7
11.1     22.3
19.9     20.9
13.9
16.8
17.1
13.0

I set up the example trying to [artificially] restrict a discussion which would, naturally, evolve in many directions. Of course "point scoring ability" is not the same as "points scored," etc. And, of course, point production is, in the real world, not the sole measure of basketball performance.

In both cases, the inference involves generalizing from a small "sample" of time to a larger "population" of time. Thus, the strength of the argument is influenced by the time-span of the data. In the case of MIT, had the data been based on only one month, the case would be much weaker in support of discrimination than if we had data for 12 years. In the case of basketball scoring of white and black players, the case would be much weaker if we included only one quarter of one game than if we had included many games. How can we measure the influence of the time span involved in the data? Here is where IMHO one can make a case for statistical tests being RELEVANT. PS. Just because MIT can "attribute" an outcome (difference in pay/status) to a particular cause (discrimination) does not mean their argument is strong. A claim involving the existence/influence of an unobservable (discrimination) requires evidence.
In this case, I think statistical inference may provide some of that evidence.

And?

--Jim James H. Steiger, Professor Department of Psychology University of British Columbia Vancouver, B.C., Canada V6T 1Z4
Re: On inappropriate hypothesis testing. Was: MIT Sexism statistical bunk
Gene, You have made extended comments about the IWF report "Confession without Guilt?" (at http://www.iwf.org/news/mitfinal.pdf) about women biologists at MIT. Some background information: The IWF report is the second in a series criticizing the MIT report on the Status of Women. The original MIT report may be downloaded from http://mindit.netmind.com/proxy/http://web.mit.edu/fnl/ An earlier IWF report by Judith Kleinfeld revealed many of the serious shortcomings of the MIT report. Kleinfeld's report is at http://www.uaf.edu/northern/mitstudy/#note9back I would urge all students to download and read all three of these papers, so they can get a better feel for who, MIT or its critics, is more objective and scientific in their treatment of a very touchy subject.

- Gene, although you apparently intended them as honest criticism, your comments revealed a deep, fundamental confusion about statistical inference. I expect any trained statistician would recognize the error in your argument immediately, but some students may have been misled. So let me try to alleviate some of the confusion your posts may have generated, and try to help you see the error of your ways. And, might I suggest, strongly, that you show this post to your ECOS611 class? This might make an excellent discussion piece for them. Snip

I'll share Dr. Steiger's comments with my statistics class. Dr. Steiger asks the readers of this newsgroup to read the whole set of documents on the MIT gender-bias issue. I must confess to not having done so. I read his IWF report, co-authored with Dr. Hausman, only after the Boston Globe described his conclusions under the heading "MIT bias claims debunked." My post was in reference to the claims made in his IWF article, not to the other documents that preceded it. His article can be downloaded at: http://www.iwf.org/news/mitfinal.pdf

Dr. Steiger accuses me of improperly using statistical tests to make inferences to a larger population. I didn't do that. Drs.
Steiger and Hausman in their IWF report claim to have found "striking," "compelling," and "dramatic" differences in productivity between senior male and female Biology Faculty at MIT. I read their report and didn't see much evidence for gender differences at all. Most of the apparent gender differences in their graphs disappeared when the data were plotted on a logarithmic scale: http://www.es.umb.edu/edg/ECOS611/iwflnfigs.pdf

Dr. Steiger's post states, "There were HUGE differences in the citation rates of senior men and women. The mean number of citations was, as I recall, roughly 7000 for the men and 1400 for the senior women." The actual means were 7032 for the men and 1539 for the women (with sample sizes of 6 and 5 respectively). The geometric means were 4800 and 1400. A Mann-Whitney U test indicates that 12.6% of the permutations of these 11 data would produce differences in citation number as extreme or more extreme than those reported. Do these 11 data offer compelling or dramatic evidence for gender differences in productivity? Not to my way of thinking.

Was I making inferences to a larger population? I didn't intend to. I was just trying to assess Steiger and Hausman's claim of HUGE gender-based differences in productivity.

Dr. Steiger's post recommends that I should eschew using any formal statistical tests on the IWF report data. What is the alternative? The approach used in the IWF report is to point out an individual datum or a few data points, and to make claims about "striking," "compelling" and "dramatic" differences between the sexes. I regard Dr. Steiger's evaluation of these data as being highly subjective. I have no idea what objective criteria he used to reach his conclusions of "compelling," "striking" and "dramatic" differences between male and female faculty.

Eugene D. Gallagher
ECOS

P.S. Here are answers to your questions:

1.
Suppose the IWF report had gathered data on ALL female scientists at MIT, compared them to a group of males matched for seniority, and had shown huge differences in performance. Would you still be doing a significance test?

Of course not. I posted a note called "Florida votes and statistical errors," on 11/30/00 on this newsgroup on this very topic. DejaNews is dead, but you can find this post on Google.

2. Suppose Mary has 1051 citations in the last 10 years, and has earned 4 million in grants, and Fred, in the same department, and about the same age and seniority, has 12000+ citations and has earned 23 million in grants, and Fred finds out that Mary is making the same salary he is. Should Fred consult you about doing a statistical "significance" test before asking for a raise?

No. What does this hypothetical have to do with your claim that "dramatic" gender differences exist in the productivity of MIT biology faculty?

3. Suppose you go to a nursing school, and find 5 female faculty with 7,000 citations apiece, and,