Concerning "The Camel has Two Humps", on 24 Jun 2007, at 8:02 pm, Jens Bennedsen wrote:
Michael Caspersen and I have replicated the study with the result "no correlation" - see http://db.grinnell.edu/sigcse/iticse2007/Program/viewAcceptedProposal.asp?sessionType=paper&sessionNumber=51
It would be very pleasant if Dehnadi and Bornat had a result; it would not be surprising if they didn't. I have no opinion about whether they are right or wrong, but it is clear that their work should be replicated. It is also clear that Caspersen, Bennedsen, and Larsen *DIDN'T* replicate Dehnadi and Bornat's study, and their paper has nothing useful to tell us about anything much. Frankly, I was deeply dismayed that ITiCSE 2007 saw fit to accept "Mental models and Programming Aptitude", and that's the politest way I've been able to express myself about it.

D&B claim that measurement M1 applied to population P1 predicts result R1 after treatment T1 well. C,B,&L claim that measurement M1 applied to population P2 does not predict result R2 after treatment T2, and that measurement M2 applied to population P2 does not predict result R3 after treatment T2.

* The populations are dramatically and materially different.

P1 is 61 students with NO PRIOR PROGRAMMING EXPERIENCE. P2 is 55 students with no prior programming experience AND 87 students with prior programming experience. The C,B,&L paper fails to separate out the 55 possibly relevant students from the 87 irrelevant students (irrelevant to the goal of replication, that is). For example, table 1 presents only pooled pass/fail results. If I am reading the paper correctly, figure 2 does present results for all and only the relevant students, but the predictor variable is not the predictor in D&B and the response variable is not the response variable in D&B either, so figure 2 doesn't really count as replication either. From the numbers in the paper, it is IMPOSSIBLE to tell whether D&B's claim is supported for the relevant population (the students with no prior programming experience) or not.

* The treatments are dramatically and materially different.

T1 is a 12-week course designed for people with no prior programming experience, which is such that there is about a 50% failure rate. The Camel paper claims that 30%-60% is typical.
It appears to be focused on fundamental concepts of programming. T2 is a 7-week course in which a majority (61%) of students have prior experience, which is such that there is about a 4% failure rate. Let me repeat that: FOUR PERCENT. The aim is for students to know about "the role of conceptual modelling". The highest the failure rate for the relevant students could possibly be is 6/55, or about 11%. The difference in failure rates is overwhelming. Clearly, SOMETHING drastically different is happening here. If only the students with prior experience were separated from the ones without in the reporting, we might have some idea what. At least the following possibilities exist:

- The Aarhus course is not teaching the same thing as the Middlesex one, in which case it is unsurprising and uninformative that the ability to predict whether students are good at the Middlesex task should not transfer to the Aarhus task.

- The Aarhus teachers are some of the very best CS teachers in the world, far far better than the people getting "30%-60%" rates. This may well be true.

- The Aarhus students are some of the very best CS students in the world. This may well be true too. As we saw above, they are certainly different from the Middlesex ones, because for a clear majority of them this is NOT their first course. From the D&B paper, it seems likely that the Middlesex/Barnet students were, um, not the pick of the crop. That may be a factor too.

- The Aarhus examination is much easier than the Middlesex examination. At this University, exam papers go into the library where anyone may inspect them, so I expect that both D&B and C,B,&L can and should make their examinations available for researchers to compare.

At any rate, never mind about measurement M: if Aarhus have a way to teach CS1 that results in a 4% failure rate, I don't *CARE* about anything else, I want to know how to do *THAT*. That is a far FAR more important result than replicating D&B or refuting them.
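The 6/55 bound above is simple arithmetic, and worth making explicit: with roughly 4% of the 142 students failing, there are at most about 6 failures to distribute, even in the worst case where every one of them is a no-prior-experience student. A quick back-of-the-envelope sketch, using only the figures quoted in this post (not new data from the paper):

```python
import math

# Figures as quoted above: P2 = 55 novices + 87 with prior experience,
# and an overall failure rate of "about 4%".
total_students = 55 + 87                 # 142
overall_failure_rate = 0.04
failures = math.ceil(total_students * overall_failure_rate)   # at most ~6 students

# Worst case: every failure falls in the 55-student novice group.
worst_case = failures / 55
print(failures, round(worst_case, 3))    # 6 failures -> about 11%
```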
There's another difference. D&B reported results based on 61 subjects, of whom 8% handed in a blank sheet. (A further 30 subjects refused to take part on grounds not related to the study.) But C,B,&L report a 50% non-response rate (section 4.3). That is a huge non-response rate for a study like this. We are not told how many in the non-response group passed or how many failed; we could have been, and we should have been.

* Now we come to the statistics.

With such a small failure rate (4%) it would have been extraordinarily difficult to demonstrate any ability of M1 to predict R2. In fact, once the small failure rate had been discovered, there wasn't really any point in proceeding further; they had failed to replicate D&B's setup closely enough, and there's an end to it.

Section 4.4 describes a discrete ordinal variable C with 6 levels and a discrete ordinal variable G with 5 levels. These are not the variables that D&B used; the Camel paper makes no claim about C and G as such. Section 5.2 reports a Pearson correlation (which is appropriate for continuous variables with a Gaussian distribution) between C and G. From a statistical point of view, this is more than a little dubious, and I would not expect the resulting number to mean anything. I would at least expect the 6x5 table to be displayed so that readers could compute meaningful statistics for themselves.

Section 5.2 also describes a count variable C with values 0..12 and a discretised time variable G with range 10..40 minutes. It is rather confusing that these variables have the same names as the ones in section 4.4, because section 4.4's G and section 5.2's G differ not just in number of levels but in what kind of measurement is involved. Figure 2 plots these variables. It appears that G is right-censored; 4 students hit a deadline without completing the task, and so presumably failed.
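Both statistical worries here (a Pearson coefficient on ordinal levels, and a right-censored time variable) are easy to illustrate. The sketch below is mine, not from either paper: the C/G levels and the completion times are invented for illustration, and only the 40-minute cap echoes the paper's description of G. A rank-based statistic such as Spearman's rho uses only the ordering, which suits ordinal data; and clamping times at a deadline visibly distorts a naive Pearson coefficient:

```python
def pearson(x, y):
    """Pearson product-moment correlation: assumes interval-scale data."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def ranks(v):
    """Ranks 1..n, with tied values sharing their average rank."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    r = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the ranks.
    Needs only ordinal information, so it fits a 6-level C and a
    5-level G better than Pearson does."""
    return pearson(ranks(x), ranks(y))

# 1) Invented ordinal scores with a monotone but non-linear relation:
c = [0, 1, 2, 3, 4, 5]          # 6-level consistency-style score
g = [0, 0, 1, 1, 3, 4]          # 5-level grade-style score
# pearson(c, g) ~ 0.943, spearman(c, g) ~ 0.971 on these numbers

# 2) Invented completion times; two students exceed the 40-minute
#    deadline, so their true times are never observed (right-censoring):
true_minutes = [12, 18, 25, 33, 45, 52]
scores       = [12, 11,  9,  7,  4,  2]
observed     = [min(t, 40) for t in true_minutes]   # recorded as 40 at the deadline

r_true = pearson(true_minutes, scores)   # ~ -0.999 on these numbers
r_obs  = pearson(observed, scores)       # ~ -0.968: censoring attenuates it
```

The 6x5 table the paper should have printed would let readers compute exactly this kind of alternative statistic for themselves; a censoring-aware analysis (e.g. survival methods) would go further still, but even this toy example shows why the reported coefficient is hard to interpret.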
If this interpretation is right, then the failure rate for no-prior-programming-experience students was 4/55 = about 7%, radically different from the Middlesex/Barnet 50%. It's *almost* sensible to compute a correlation here, although I for one would not be happy to do so without taking the right-censoring into account. In any case, figure 2 and the associated correlation coefficient have no real relevance to an assessment of D&B's claims, because D&B don't *make* any explicit or implicit claims about the time it takes someone to complete the exam.

* Questioning the validity of questioning the validity

Section 6.2 of C,B,&L says that they interviewed "the 14 students who were inconsistent but did pass the final exam." This is a bit worrying, because table 1 said there were 16 such students, not 14. "Our harsh conclusion" is wholly unwarranted, because C,B,&L did not interview the students who were *consistent* and passed; they merely assert, without any evidence at all, that those students too were guessing once up front and were merely lucky in their guess. But it is possible that the 'consistent' students acted like someone solving a crossword puzzle: you might think you know a word, but you don't actually write it down until you have checked it against several of the clues that cross it. Maybe the 'consistent' students were looking ahead to check their guess before writing any answers down. Of course it is possible that they *were* just lucky, but because they weren't asked, we shall never know.

It would be useful if D&B could repeat their study and conduct post-test interviews with students in all groups to find out what strategy they were following. It might be, for example, that the really predictive thing is "look-ahead" -vs- "go back and fix up" -vs- "dive in and never admit a fault".

All in all, I conclude that D&B's claim has yet to be seriously challenged.
----------------------------------------------------------------------
PPIG Discuss List (discuss@ppig.org)
Discuss admin: http://limitlessmail.net/mailman/listinfo/discuss
Announce admin: http://limitlessmail.net/mailman/listinfo/announce
PPIG Discuss archive: http://www.mail-archive.com/discuss%40ppig.org/