Re: Reliability and baseball

Bob Wheeler Wed, 13 Sep 2000 08:20:30 -0700
There seems to be a lot of this sort of junk
statistics in the popular press recently. The
problem is, I think, that it gets picked up as
justification for legislation or legal action. I
doubt if there is much that can be done about it
though.

It might be interesting to ask students to
critique it, or is that asking too much?

Alan Zaslavsky wrote:
> 
> The following article may be of interest to some of you who are trying to
> get across the notion of reliability, particularly those who are teaching
> H.S. or young college students who have recently gone through high-stakes
> achievement/competency testing programs.  You can also download directly
> from the New York Times web site at
> 
>        http://www.nytimes.com/2000/09/13/national/13LESS.html
> 
> 
> New York Times, September 13, 2000
> 
> LESSONS
> How Tests Can Drop The Ball
> 
> By RICHARD ROTHSTEIN
> 
> MIKE PIAZZA, batting .332, could win this year's Most Valuable Player
> award. He has been good every year, with a .330 career average, twice a
> runner-up for m.v.p. and a member of each All-Star team since his
> rookie season.
> 
> The Mets reward Piazza for this high achievement, at the rate of $13
> million a year.
> 
> But what if the team decided to pay him based not on overall
> performance but on how he hit during one arbitrarily chosen week? How
> well do one week's at-bats describe the ability of a true .330 hitter?
> 
> Not very. Last week Piazza batted only .200.  But in the second week of
> August he batted .538. If you picked a random week this season,
> you would have only a 7-in-10 chance of choosing one in which he hit
> .250 or higher.
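A rough way to see how noisy a single week is: treat each at-bat as an independent trial with a fixed hit probability of .330 and ask how often a week's average reaches .250. The 20 at-bats per week is an assumption made for illustration, not a figure from the article, and `prob_week_at_least` is a hypothetical helper name. The article's 7-in-10 figure comes from Piazza's actual weeks, so this idealized binomial sketch need not reproduce it.

```python
from math import ceil, comb

def prob_week_at_least(threshold, true_avg=0.330, at_bats=20):
    """P(weekly average >= threshold) under a simple binomial model."""
    # Smallest hit count whose average meets the threshold
    # (the tiny offset guards against floating-point round-up).
    min_hits = ceil(threshold * at_bats - 1e-9)
    return sum(
        comb(at_bats, k) * true_avg**k * (1 - true_avg) ** (at_bats - k)
        for k in range(min_hits, at_bats + 1)
    )

# Assumed 20 at-bats in the week; change at_bats to see the effect.
print(round(prob_week_at_least(0.250), 3))
```

Fewer at-bats in the week would make the weekly average swing even more widely around .330.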
> 
> Are standardized-test scores, on which many schools rely heavily to
> make promotion or graduation decisions, more indicative of true ability
> than a ballplayer's weekly average?
> 
> Not really. David Rogosa, a professor of educational statistics at
> Stanford University, has calculated the "accuracy" of tests used in
> California to abolish social promotion. (New York uses similar tests.)
> 
> Consider, Dr. Rogosa says, a fourth-grade student whose "true" reading
> score is exactly at grade level (the 50th percentile). The chances are
> better than even (58 percent) that this student will score either above
> the 55th percentile or below the 45th on any one test.
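The 58 percent figure can be approximated with a textbook measurement model: observed score = true score + normal error, with observed scores scaled to variance 1 so that the error variance is 1 minus the reliability. The reliability of 0.95 used below is an assumed value, not one given in the article, and `prob_outside_band` is a hypothetical name. Dr. Rogosa worked from the publisher's own reliability data, so this is only a sketch of the idea.

```python
from statistics import NormalDist

def prob_outside_band(reliability, low=0.45, high=0.55):
    """P(observed percentile falls outside [low, high]) for a student
    whose true score is exactly at the median, under normal errors."""
    std_normal = NormalDist()
    # Observed scores have variance 1, so error variance is 1 - reliability.
    error_sd = (1 - reliability) ** 0.5
    observed = NormalDist(mu=0.0, sigma=error_sd)
    inside = (observed.cdf(std_normal.inv_cdf(high))
              - observed.cdf(std_normal.inv_cdf(low)))
    return 1 - inside

# Assumed reliability of 0.95 (illustrative only).
print(round(prob_outside_band(0.95), 2))
```

Under these assumptions the result lands in the same neighborhood as the article's figure, which suggests how a test with seemingly high reliability can still miss a narrow percentile band more often than not.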
> 
> Results for students at other levels of true performance are also
> surprisingly inconsistent. So if students are held back, required to
> attend summer school or denied diplomas largely because of a single
> test, many will be punished unfairly.
> 
> About half of fourth-grade students held back for scores below the 30th
> percentile on a typical reading test will actually have "true" scores
> above that point. On any particular test, nearly 7 percent of students
> with true scores at the 40th percentile will likely fail, scoring below
> the 30th percentile.
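The same kind of model gives a feel for the chance that a student with a true score at the 40th percentile lands below a 30th-percentile cutoff. The reliabilities tried below are assumed values and `prob_fail` is a hypothetical helper. The answer is quite sensitive to the reliability chosen, so this simplified calculation should not be expected to match the article's "nearly 7 percent" exactly.

```python
from statistics import NormalDist

def prob_fail(true_pct, cutoff_pct, reliability):
    """P(observed score < cutoff) for a student at a given true percentile,
    assuming normal true scores and normal errors."""
    std_normal = NormalDist()
    # True scores have variance equal to the reliability; observed scores
    # (true + error) have variance 1.
    true_score = reliability ** 0.5 * std_normal.inv_cdf(true_pct)
    error_sd = (1 - reliability) ** 0.5
    cutoff = std_normal.inv_cdf(cutoff_pct)
    return NormalDist(mu=true_score, sigma=error_sd).cdf(cutoff)

# Assumed reliabilities, for illustration only.
for r in (0.90, 0.95, 0.97):
    print(r, round(prob_fail(0.40, 0.30, r), 3))
```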
> 
> Are Americans prepared to require large numbers of students to repeat a
> grade when they deserve promotion?
> 
> Professor Rogosa's analysis is straightforward. He has simply converted
> technical reliability information from test publishers (Harcourt
> Educational Measurement, in this case) to more understandable
> "accuracy" guides.
> 
> Test publishers calculate reliability by analyzing thousands of student
> tests to estimate chances that students who answer some questions
> correctly will also answer others correctly. Because some students at
> any performance level will miss questions that most students at that
> level get right, test makers can estimate the reliability of each
> question and of an entire test.
> 
> Typically, districts and states use tests marketed as having high
> reliability. Yet few policy makers understand that seemingly high
> reliability assures only rough accuracy: for example, that true 80th
> percentile students will almost always have higher scores than true
> 20th percentile students.
> 
> But when test results are used for high-stakes purposes like promotion
> or graduation decisions, there should be a different concern: How well
> do they identify students who are truly below a cutoff point like the
> 30th percentile? As Dr. Rogosa has shown, the administering of a single
> test may do a poor job of this.
> 
> Surprisingly, there has not yet been a wave of lawsuits by parents of
> children penalized largely because of a single test score. As more
> parents learn about tests' actual accuracy, litigation regarding
> high-stakes decisions is bound to follow. Districts and states will
> then have to abandon an unfair reliance on single tests to evaluate
> students.
> 
> When Mike Piazza comes to bat, he may face a pitcher who fools him more
> easily than most pitchers do, or fools him more easily on that day.
> Piazza may not have slept well the night before, the lights may bother
> him, or he may be preoccupied by a problem at home. On average, over a
> full season, the distractions do not matter much, and the Mets benefit
> from his overall ability.
> 
> Likewise, when a student takes a test, performance is affected by
> random events. He may have fought with his sister that morning.  A test
> item may stimulate daydreams not suggested by items in similar tests,
> or by the same test on a different day. Despite a teacher's warning to
> eat a good breakfast, he may not have done so.
> 
> If students took tests over and over, average accuracy would improve,
> just as Mike Piazza's full-season batting average more accurately
> reflects his hitting prowess. But school is not baseball; if students
> took tests every day, there would be no time left for learning.
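The gain from repeated testing can be put in numbers with the Spearman-Brown formula, which gives the reliability of the average of n parallel tests, together with the fact that the error standard deviation of an average shrinks by the square root of n. The function names and the starting reliability of 0.95 are assumptions for illustration, and the formula presumes the repeated tests are equally reliable with independent errors.

```python
def spearman_brown(reliability, n_tests):
    """Reliability of the average of n_tests parallel tests."""
    return n_tests * reliability / (1 + (n_tests - 1) * reliability)

def error_sd_of_average(reliability, n_tests):
    """Error SD of the averaged score, in units where a single
    test's observed scores have variance 1."""
    return ((1 - reliability) / n_tests) ** 0.5

# Assumed single-test reliability of 0.95 (illustrative only).
for n in (1, 2, 4):
    print(n, round(spearman_brown(0.95, n), 4),
          round(error_sd_of_average(0.95, n), 4))
```

Each additional test buys less than the one before it, which fits the article's point that testing every day is not a practical remedy.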
> 
> So to make high-stakes decisions, like whether students should be
> promoted or attend summer school, giving great importance to a single
> test is not only bad policy but extraordinarily unfair. Courts are
> unlikely to permit it much longer.
> 
> Copyright 2000 The New York Times Company
> 
> =================================================================
> Instructions for joining and leaving this list and remarks about
> the problem of INAPPROPRIATE MESSAGES are available at
>                   http://jse.stat.ncsu.edu/
> =================================================================

-- 
Bob Wheeler --- (Reply to: [EMAIL PROTECTED])
        ECHIP, Inc.

