Bug Hunting 101 - Finding "The" Alpha Bug

I've been told that "The" alpha bug has been around for quite some time and no one has been able to find or fix it. I've also been told that looking for this bug has driven a few developers to drink; well, "drink more" is probably a better description. Anyhow, since I could use a drink, I'm going to give it a shot.

Since I don't have the skill to fix it myself, my goal is simply to figure out when "The" alpha bug entered the tree. If I can just figure out the `when', hopefully someone a lot smarter than me can figure out the `what' of the problem.

Basically, I'm going to turn loose a half dozen alpha systems compiling various versions of OpenBSD until I find where the bug stops occurring. As far as I can tell, the bug smells like a race condition of some sort, and if my wild guess is correct, it will be difficult to reproduce consistently. With some (but not all) race conditions, you can increase the chance of triggering them by increasing the load. Since I want the race condition to occur, what is the best way to stress the systems while also doing make build?

http://www.holm.cc/stress/
http://www.openbsd.org/cgi-bin/cvsweb/ports/sysutils/stress/

I simply don't know and I'm only guessing, but the prime suspects for where the race might live seem to be physical memory management, PAL/interrupt handling, or even the scheduler. Are there better ways to stress the system? Are there better ways to increase the odds of a race occurring?

Since I needed to find a starting point, I went searching and reading through the archives of misc@, tech@, alpha@ and bugs@, and even the NetBSD archives, in hopes of finding a "patient zero" where the bug was first reported. I found something interesting, namely a bug reported more than once that looks very similar to "The" alpha bug. The primary difference is you get "cpu_switch_queuescan" rather than "cpu_switch" in the trace output.

2003-10-01 21:40:00
http://marc.theaimsgroup.com/?l=openbsd-alpha&m=106504464724168&w=2

2003-08-03 12:00:14
http://marc.theaimsgroup.com/?l=openbsd-alpha&m=105999853009839&w=2

There is also another report that is vague, but since it is missing the needed trace information, there's no way to tell if it's related.

2003-05-13 22:13:50
http://marc.theaimsgroup.com/?l=openbsd-bugs&m=105286536018393&w=2

From other bug reports in the archive I know 3.8, 3.7 and 3.6 are all affected by "The" alpha bug. If my hunch is correct and the bugs linked above are related to "The" alpha bug, then I should start the compile-a-thon at OpenBSD v3.3 and work backwards. If you've got a better idea, please let me know.

Kind Regards,
jcr