Re: Standardizing evaluation scores

2002-01-10 Thread Herman Rubin

In article [EMAIL PROTECTED],
Dennis Roberts [EMAIL PROTECTED] wrote:
sorry for late reply

ranking is the LEAST useful thing you can do ... so, i would never START 
with simple ranks
any sort of an absolute kind of scale ... imperfect as it is ... would 
generally be better ...

You can say that again!

one can always convert more detailed scale values INTO ranks at the end if 
necessary BUT, you cannot go the reverse route

This cannot be overemphasized.  We see much of this; how valid
are the current IQ scales, where the values are obtained by
converting the raw scores to a normal distribution?  The same is
done in other tests of this type; we need to teach in our
beginning courses not to transform unless one has a REALLY good
reason to do so, and obtaining normality is not one.
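To make the objection concrete, here is a minimal sketch (plain Python, with hypothetical raw scores) of the rank-to-normal transform such scales use; the point is that the output depends only on the ranks, not on the raw spacing:

```python
from statistics import NormalDist

raw = [12, 47, 48, 50, 95]  # hypothetical raw test scores
n = len(raw)

# Rank-based normalization: map each score's rank to a normal
# quantile, then rescale to an IQ-style mean of 100 and SD of 15.
order = sorted(range(n), key=lambda i: raw[i])
iq = [0.0] * n
for r, i in enumerate(order, start=1):
    z = NormalDist().inv_cdf(r / (n + 1))
    iq[i] = 100 + 15 * z

# The transform sees only ranks: the tiny raw gap 47 vs 48 and the
# huge gap 50 vs 95 come out with comparable normalized spacing.
```

Whatever information the raw-score differences carried is gone after this step, which is exactly the objection raised above.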

-- 
This address is for information only.  I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN 47907-1399
[EMAIL PROTECTED] Phone: (765)494-6054   FAX: (765)494-0558


=
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at
  http://jse.stat.ncsu.edu/
=



Re: Standardizing evaluation scores

2002-01-07 Thread Dennis Roberts

sorry for late reply

ranking is the LEAST useful thing you can do ... so, i would never START 
with simple ranks
any sort of an absolute kind of scale ... imperfect as it is ... would 
generally be better ...

one can always convert more detailed scale values INTO ranks at the end if 
necessary BUT, you cannot go the reverse route

say we have 10 people measured on variable X ... and we end up with no ties 
... so, we get ranks of 1 to 10 ... but, these values give NO idea 
whatsoever as to the differences amongst the 10

if i had a 3 person senior high school class with cumulative gpas of 4.00, 
3.97, and 2.38 ... the ranks would be 1, 2, and 3 ... but clearly, there is 
a huge difference between either of the top 2 and the bottom ... but, ranks 
give no clue to this at all

so, my message is ... DON'T START WITH RANKS
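Dennis's GPA example can be checked in a few lines of Python (standard library only; the data are just his three hypothetical GPAs):

```python
gpas = [4.00, 3.97, 2.38]  # the three-person senior class above

# Assign rank 1 to the highest GPA, rank 2 to the next, and so on.
order = sorted(range(len(gpas)), key=lambda i: -gpas[i])
ranks = [0] * len(gpas)
for r, i in enumerate(order, start=1):
    ranks[i] = r

print(ranks)  # [1, 2, 3]
# The 0.03 gap between 1st and 2nd and the 1.59 gap between 2nd
# and 3rd both collapse to a rank difference of exactly 1.
```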

At 02:11 AM 12/19/01 +, Doug Federman wrote:
I have a dilemma which I haven't found a good solution for.  I work with
students who rotate with different preceptors on a monthly basis.  A
student will have at least 12 evaluations over a year's time.  A
preceptor usually will evaluate several students over the same year.
Unfortunately, the preceptors rarely agree on the grades.  One preceptor
is biased towards the middle of the 1-9 Likert scale and another may be
biased towards the upper end.  Rarely, does a given preceptor use the 1-9
range completely.  I suspect that a 6 from an easy grader is equivalent
to a 3 from a tough grader.

I have considered using ranks to give a better evaluation for a given
student, but I have a serious constraint.  At the end of each year, I
must submit to another body their evaluation on the original 1-9 scale,
which is lost when using ranks.

Any suggestions?

--
It has often been remarked that an educated man has probably forgotten
most of the facts he acquired in school and university. Education is what
survives when what has been learned has been forgotten.
- B.F. Skinner New Scientist, 31 May 1964, p. 484





_
dennis roberts, educational psychology, penn state university
208 cedar, AC 8148632401, mailto:[EMAIL PROTECTED]
http://roberts.ed.psu.edu/users/droberts/drober~1.htm






Re: Standardizing evaluation scores

2001-12-24 Thread Doug Federman

Thanks for your responses.  I have already considered most of these 
issues, but reached no clear solution.  I have tried educating the 
evaluators.  High graders remain high graders and some use the entire 
scale.  I'll just have to try a few methods out and see what works.  I do 
have external objective knowledge evaluations from standardized testing 
and will see how they relate to the scores.

--
It has often been remarked that an educated man has probably forgotten 
most of the facts he acquired in school and university. Education is what 
survives when what has been learned has been forgotten.  
- B.F. Skinner New Scientist, 31 May 1964, p. 484





Re: Standardizing evaluation scores

2001-12-21 Thread Stan Brown

Glen Barnett [EMAIL PROTECTED] wrote in sci.stat.edu:
Stan Brown wrote:
 But is it worth it? Don't the easy graders and tough graders
 pretty much cancel each other out anyway?

Not if some students only get hard graders and some only get easy
graders.

Right you are -- I read the OP's article as saying every student was 
evaluated by every preceptor, but when I look back again I see that 
I misread it.

-- 
Stan Brown, Oak Road Systems, Cortland County, New York, USA
  http://oakroadsystems.com/
My theory was a perfectly good one. The facts were misleading.
   -- /The Lady Vanishes/ (1938)





Re: Standardizing evaluation scores

2001-12-20 Thread Jay Warner

A classic problem of 'norming' or 'standardizing' the scale and the
preceptors.  Can you find a couple students who fall near the bottom and
tops of the scale?  Preferably ones whose final rankings are not 'permanent
record'?

Then you would have each preceptor use these two students as 'baseline'
indicators of what a 2 means, and what an 8 means.  Then have each person do
the regular ranking of students, using these as your indicators.

It might be possible for the attendant group of preceptors to agree on the
ranking of a pair of students, in each specialty or area.  Then use these
for ranking within that specialty.

Failing this kind of development of mutual agreement, you might be able to
describe a 2 or 3 rating, and a 7 or 8 rating, in such a way that
generalized agreement would be obtained, and each grade would be set in
comparison to this descriptive scale.  This is essentially what the Baldrige
Criteria do for industrial, educational, and health care operations.

Of course, if it's grades we are discussing, it is entirely likely that
virtually nobody gets grades in certain ranges, such as the equivalent of C
or below on an A-F scale.  If Harvard can graduate over half a class as Cum
Laude, the rest of us can skew grades anywhere we like.

Jay

Doug Federman wrote:

 I have a dilemma which I haven't found a good solution for.  I work with
 students who rotate with different preceptors on a monthly basis.  A
 student will have at least 12 evaluations over a year's time.  A
 preceptor usually will evaluate several students over the same year.
 Unfortunately, the preceptors rarely agree on the grades.  One preceptor
 is biased towards the middle of the 1-9 Likert scale and another may be
 biased towards the upper end.  Rarely, does a given preceptor use the 1-9
 range completely.  I suspect that a 6 from an easy grader is equivalent
 to a 3 from a tough grader.

 I have considered using ranks to give a better evaluation for a given
 student, but I have a serious constraint.  At the end of each year, I
 must submit to another body their evaluation on the original 1-9 scale,
 which is lost when using ranks.

 Any suggestions?

 --
 It has often been remarked that an educated man has probably forgotten
 most of the facts he acquired in school and university. Education is what
 survives when what has been learned has been forgotten.
 - B.F. Skinner New Scientist, 31 May 1964, p. 484


--
Jay Warner
Principal Scientist
Warner Consulting, Inc.
 North Green Bay Road
Racine, WI 53404-1216
USA

Ph: (262) 634-9100
FAX: (262) 681-1133
email: [EMAIL PROTECTED]
web: http://www.a2q.com

The A2Q Method (tm) -- What do you want to improve today?









Re: Standardizing evaluation scores

2001-12-19 Thread Neil W. Henry

A naive solution seems reasonable if I am willing to assume that students are
randomly assigned to the preceptors for evaluation. If so, I'd expect the
average rating given by each judge to be the same. So,  force the judges'
means to be equal to the overall mean by dividing each individual score
appropriately. Then calculate the students' averages. Of course the results
are no longer integers 1 to 9, but that's where you, the filter to that
other body, will have the Procrustean Responsibility!

 If there is no balance, if for instance some student is rated 12 times by
the same judge while others are rated by 12 different judges, there probably
is no good model for the process. The literature on rating the strength of
sports teams in a league with an unbalanced schedule might give some hints on
how to proceed.
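A minimal sketch of the adjustment described above, with hypothetical ratings (the multiplicative version, forcing each judge's mean onto the grand mean by dividing, as suggested):

```python
# Hypothetical ratings: each judge's 1-9 scores for their students.
scores = {
    "easy":  [7, 8, 6, 9, 7],
    "tough": [3, 4, 2, 5, 3],
}

all_ratings = [x for vals in scores.values() for x in vals]
grand_mean = sum(all_ratings) / len(all_ratings)

# Divide each score by (judge mean / grand mean), so that every
# judge's adjusted mean equals the grand mean.
adjusted = {}
for judge, vals in scores.items():
    judge_mean = sum(vals) / len(vals)
    adjusted[judge] = [x * grand_mean / judge_mean for x in vals]
```

The adjusted values are no longer integers 1 to 9, which is exactly the Procrustean step mentioned above when reporting back to the other body.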


Doug Federman wrote:

 I have a dilemma which I haven't found a good solution for.  I work with
 students who rotate with different preceptors on a monthly basis.  A
 student will have at least 12 evaluations over a year's time.  A
 preceptor usually will evaluate several students over the same year.
 Unfortunately, the preceptors rarely agree on the grades.  One preceptor
 is biased towards the middle of the 1-9 likert scale and another may be
 biased towards the upper end.  Rarely, does a given preceptor use the 1-9
 range completely.  I suspect that a 6 from an easy grader is equivalent
 to a 3 from a tough grader.

 I have considered using ranks to give a better evaluation for a given
 student, but I have a serious constraint.  At the end of each year, I
 must submit to another body their evaluation on the original 1-9 scale,
 which is lost when using ranks.

 Any suggestions?

 --
 It has often been remarked that an educated man has probably forgotten
 most of the facts he acquired in school and university. Education is what
 survives when what has been learned has been forgotten.
 - B.F. Skinner New Scientist, 31 May 1964, p. 484

--

Neil W. Henry
Department of Sociology and Anthropology
Department of Statistical Sciences and Operations Research
Box 843083
Virginia Commonwealth University
Richmond VA 23284-3083

(804) 828-1301 x124    FAX: 828-8785
http://www.people.vcu.edu/~nhenry







Re: Standardizing evaluation scores

2001-12-19 Thread Stan Brown

Doug Federman [EMAIL PROTECTED] wrote in sci.stat.edu:
I have a dilemma which I haven't found a good solution for.  I work with 
students who rotate with different preceptors on a monthly basis.  A 
student will have at least 12 evaluations over a year's time.  A 
preceptor usually will evaluate several students over the same year.  
Unfortunately, the preceptors rarely agree on the grades.  One preceptor 
is biased towards the middle of the 1-9 Likert scale and another may be 
biased towards the upper end.  Rarely, does a given preceptor use the 1-9 
range completely.  I suspect that a 6 from an easy grader is equivalent 
to a 3 from a tough grader. 

First, it is rare that _any_ survey gets a significant number of 
responses at either end. People tend to think, "Hmm, 1 to 9. Well, 1 
would be perfect and 9 would be valueless [or vice versa]. Nobody's 
perfect, so I'll write down a 2."

I have considered using ranks to give a better evaluation for a given 
student, but I have a serious constraint.  At the end of each year, I 
must submit to another body their evaluation on the original 1-9 scale, 
which is lost when using ranks.

Any suggestions?

You could make a case for almost any jiggering of the numbers -- and 
a case against it too. Since the 12 preceptors are grading the same 
group of people, you could justify various forms of normalizing.

But is it worth it? Don't the easy graders and tough graders 
pretty much cancel each other out anyway?

-- 
Stan Brown, Oak Road Systems, Cortland County, New York, USA
  http://oakroadsystems.com
My reply address is correct as is. The courtesy of providing a correct
reply address is more important to me than time spent deleting spam.





Re: Standardizing evaluation scores

2001-12-19 Thread Glen Barnett

Stan Brown wrote:
 But is it worth it? Don't the easy graders and tough graders
 pretty much cancel each other out anyway?

Not if some students only get hard graders and some only get easy
graders.

If all students get all graders for an equal amount of time, it probably
won't matter at all.

Glen





Re: Standardizing evaluation scores

2001-12-19 Thread Jim Snow


Glen Barnett [EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]...
 Stan Brown wrote:
  But is it worth it? Don't the easy graders and tough graders
  pretty much cancel each other out anyway?

 Not if some students only get hard graders and some only get easy
 graders.

 If all students got all graders an equal amount of time it probably
 won't matter at all.

 Glen

    If some graders use the whole scale and others only use part of the
scale or concentrate grades near the centre, then using raw scores means you
are giving the full-scale graders more weight in the overall ranking of
students.  If this is undesirable, grades could be scaled to a common mean
and equal mean deviation.  (Standard deviation would give increased weight
to extremes of the scale.)
In all these adjustments, we lose transparency of the process, and this must
be weighed against the gains.  I suspect that only sharp contrasts between
the behaviour of the graders, and/or different students having different
sets of graders, would justify this; it may well be better dealt with by
instructing the graders appropriately, after pointing out that it is
desirable for all graders to have equal weight in assessment.
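A sketch of the scaling described here, in Python with made-up grades: shift and stretch each grader's scores to a common mean and a common mean (absolute) deviation, so each grader carries equal weight:

```python
def rescale(vals, target_mean, target_mad):
    """Rescale scores to a common mean and mean absolute deviation."""
    m = sum(vals) / len(vals)
    mad = sum(abs(x - m) for x in vals) / len(vals)
    return [target_mean + (x - m) * target_mad / mad for x in vals]

full_range = [1, 3, 5, 7, 9]  # grader who uses the whole scale
centred    = [4, 5, 5, 5, 6]  # grader who clusters near the middle

a = rescale(full_range, 5.0, 1.0)  # compressed toward the mean
b = rescale(centred, 5.0, 1.0)     # stretched away from the mean
# After rescaling, both sets have the same mean and the same mean
# deviation, so neither grader dominates the pooled ranking.
```

As noted, the rescaled grades are no longer on the transparent 1-9 scale, which is the cost being weighed against the gain.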



