One of the main reasons I decided to use an Item Response Theory (IRT) framework is that the testing platform, once fully operational, will not give students questions that are either too easy or too difficult for them, thus reducing anxiety and boredom for low- and high-ability students, respectively. In other words, high-ability students will be challenged with more difficult questions, and low-ability students will receive questions that are challenging but matched to their ability. Every score is reported on the same scale even though students do not all receive the same questions, because IRT places item difficulty and student ability on a common scale. This is the beautiful thing! That is the concept of adaptive, or tailored, testing that is being implemented in the Python Programming: Procedural Online Test (http://www.adaptiveassessmentservices.com).
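The core loop of an adaptive test can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the platform's actual algorithm: it assumes a two-parameter logistic (2PL) IRT model, a small hypothetical item bank of (discrimination, difficulty) pairs, maximum-information item selection, and an expected a posteriori (EAP) ability estimate with a standard normal prior. Because item difficulties and the ability estimate live on one scale, students who see different items still get comparable scores.

```python
import math

# HYPOTHETICAL item bank for illustration: (discrimination a, difficulty b).
ITEM_BANK = [(1.0, -2.0), (1.2, -1.0), (0.8, 0.0), (1.5, 0.5), (1.1, 1.5), (1.3, 2.5)]


def p_correct(theta, a, b):
    """2PL probability that a student with ability theta answers the item correctly."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * P * (1 - P)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


def next_item(theta, administered):
    """Index of the unused item that is most informative at the current estimate."""
    candidates = [i for i in range(len(ITEM_BANK)) if i not in administered]
    return max(candidates, key=lambda i: item_information(theta, *ITEM_BANK[i]))


def estimate_ability(responses):
    """EAP estimate of theta from (item_index, correct) pairs on a grid, N(0, 1) prior."""
    grid = [g / 10.0 for g in range(-40, 41)]  # theta values from -4.0 to 4.0
    weights = []
    for theta in grid:
        w = math.exp(-0.5 * theta * theta)  # unnormalized standard normal prior
        for i, correct in responses:
            p = p_correct(theta, *ITEM_BANK[i])
            w *= p if correct else (1.0 - p)  # multiply in the likelihood of each answer
        weights.append(w)
    total = sum(weights)
    return sum(t * w for t, w in zip(grid, weights)) / total


# Simulated session: each answer updates the estimate, and the next item
# is the one most informative near that new estimate.
responses = []
theta = 0.0
for answer in [True, True, False]:
    i = next_item(theta, {j for j, _ in responses})
    responses.append((i, answer))
    theta = estimate_ability(responses)
    print(f"item {i} (b={ITEM_BANK[i][1]:+.1f}) correct={answer} -> theta={theta:+.2f}")
```

Maximum-information selection is one common choice; operational systems typically add content balancing and exposure control on top of it.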
After reading the comment about 50% being optimal for measurement theory, I have to say that about 90 years ago this was the best practice for maximizing item and test variance, which in turn maximized the spread of scores. It is primarily a World War I and II convention from the development of selection tests, i.e., the Army Alpha and Beta, which were used to place conscripts in appropriate combat roles. Those two tests are the predecessors of the SAT administered by the Educational Testing Service (ETS), the organization that most of the war psychologists who developed Alpha and Beta joined after World War II. Having helped select recruits who then received money after the war to attend college under the GI Bill, these measurement specialists (psychometricians) did the same thing at ETS with the SAT, screening the same cohort for placement in colleges and universities around America. These psychologists had a strong influence on what constituted good practice in standardized testing, and the 50% practice became well entrenched. IRT came on the scene in the early 1950s as an alternative to classical test theory (CTT), and it has some great theoretical and practical advantages over the previous approach of selecting items with a difficulty (proportion correct) of .50, the value that maximizes item variance. However, the computing technology needed to implement the theory was not available then. It wasn't until the advent of the PC in the late 70s and early 80s that psychometricians like me were motivated to begin implementing IRT; once again, the armed services were at the forefront of that development in the late 70s. It will take another decade or so to break the hold that classical test theory has on measurement, so expect students' test anxiety to remain high in the interim. But as more and more people realize the benefits of IRT over CTT, especially computer adaptive testing, the question of what guidance should be used to administer and score tests will no longer be an issue.
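The classical rationale for the 50% rule comes down to simple arithmetic. An item scored 0/1 and answered correctly by a proportion p of examinees has variance p(1 - p), which is largest at p = .50, where it equals .25. A quick Python check over a grid of p values (the grid itself is just an illustrative choice):

```python
def item_variance(p):
    """Variance of a 0/1-scored item answered correctly by a proportion p of examinees."""
    return p * (1.0 - p)


# Proportions correct from .05 to .95 in steps of .05.
proportions = [k / 20 for k in range(1, 20)]
best = max(proportions, key=item_variance)
print(f"item variance peaks at p = {best:.2f} (variance {item_variance(best):.2f})")
# prints: item variance peaks at p = 0.50 (variance 0.25)
```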
>From: Chuck Allison <[EMAIL PROTECTED]>
>Reply-To: Chuck Allison <[EMAIL PROTECTED]>
>To: Laura Creighton <[EMAIL PROTECTED]>
>CC: edu-sig@python.org, Scott David Daniels <[EMAIL PROTECTED]>
>Subject: Re: [Edu-sig] Python Programming: Procedural Online Test
>Date: Mon, 5 Dec 2005 00:52:50 -0700
>
>Hello Laura,
>
>That's better than the Abstract Algebra class I took as an
>undergraduate. The highest score on Test 1 was 19%. I got 6%! I retook
>the class from another teacher and topped the class. Liked the subject
>so much I took the second semester just for fun. Testing and teaching
>strategies make a tremendous difference.
>
>Sunday, December 4, 2005, 11:50:22 PM, you wrote:
>
>LC> In a message of Sun, 04 Dec 2005 11:32:27 PST, Scott David Daniels writes:
> >>I wrote:
> >> >> ... keeping people at 80% correct is great rule-of-thumb goal ...
> >>
> >>To elaborate on the statement above a bit, we did drill-and-practice
> >>teaching (and had students loving it). The value of the 80% is for
> >>maximal learning. Something like 50% is the best for measurement theory
> >>(but discourages the student drastically). In graduate school I had
> >>one instructor who tried to target his tests to get 50% as the average
> >>mark. It was incredibly discouraging for most of the students (I
> >>eventually came to be OK with it, but it took half the course).
>
>LC> <snip>
>
>LC> 'Discouraging' misses the mark. The University of Toronto has professors
>LC> who like to test to 50% as well. And it causes suicides among undergraduates
>LC> who are first exposed to this, unless there is adequate preparation. This
>LC> is incredibly _dangerous_ stuff.
>
>LC> Laura
>
> >>--Scott David Daniels
> >>[EMAIL PROTECTED]
> >>
> >>_______________________________________________
> >>Edu-sig mailing list
> >>Edu-sig@python.org
> >>http://mail.python.org/mailman/listinfo/edu-sig
>
>--
>Best regards,
> Chuck