Bug Hunting 101 - Finding "The" Alpha Bug

I've been told that "The" alpha bug has been around for quite some time
and no one has been able to find or fix it. I've also been told looking
for this bug has driven a few developers to drink, well, probably "drink
more" is a better description. Anyhow, since I could use a drink, I'm
going to give it a shot.

Since I don't have the skill to fix it myself, my goal is simply to
figure out when "The" alpha bug entered the tree. If I can just figure
out the `when' hopefully someone a lot smarter than me can figure out
the `what' of the problem. Basically I'm going to turn loose a half
dozen alpha systems compiling various versions of OpenBSD until I find
where the bug stops occurring.

As far as I can tell, the bug smells like a race condition of some sort
and if my wild guess is correct, it will be difficult to reproduce
consistently. With some (but not all) race conditions, you can increase
the chance of triggering them by increasing loads. Since I want the race
condition to occur, what is the best way to stress the systems while
also doing make build?

http://www.holm.cc/stress/
http://www.openbsd.org/cgi-bin/cvsweb/ports/sysutils/stress/

I simply don't know, and I'm only guessing, but the prime suspects for
where the race might live seem to be physical memory management,
PAL/interrupt handling or even the scheduler.

Are there better ways to stress the system?
Are there better ways to increase the odds of a race occurring?
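
For what it's worth, here is a minimal sketch of one way to pile load on
top of a build, in Python. It only starts busy-loop processes, so at best
it leans on the scheduler; it does nothing for memory or interrupt
pressure, which the stress port linked above covers far better. The names
run_under_load and workers are made up for this example.

```python
import subprocess
import sys

# Body of each load process: spin forever until it is terminated.
BURN = "while True: pass"


def run_under_load(cmd, workers=4):
    """Run cmd while `workers` busy-loop processes compete for the CPU.

    Returns the exit status of cmd. The burners are always cleaned up,
    even if cmd cannot be started.
    """
    burners = [subprocess.Popen([sys.executable, "-c", BURN])
               for _ in range(workers)]
    try:
        return subprocess.call(cmd)
    finally:
        for b in burners:
            b.terminate()
        for b in burners:
            b.wait()
```

Something like run_under_load(["make", "build"], workers=4) from the top
of the source tree would be the intended use; the worker count is a
guess and would need tuning per machine.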

Since I needed to find a starting point, I went searching and reading
through the archives of misc@, tech@, alpha@ and bugs@, and even the
NetBSD archives, in hopes of finding a "patient zero" where the bug was
first reported. I found something interesting, namely a (more than once)
reported bug that looks very similar to "The" alpha bug. The primary
difference is you get "cpu_switch_queuescan" rather than "cpu_switch" in
the trace output.

2003-10-01 21:40:00
http://marc.theaimsgroup.com/?l=openbsd-alpha&m=106504464724168&w=2

2003-08-03 12:00:14
http://marc.theaimsgroup.com/?l=openbsd-alpha&m=105999853009839&w=2

There is also another, vaguer report, but since it is missing the
needed trace information, there's no way to tell if it's related.

2003-05-13 22:13:50
http://marc.theaimsgroup.com/?l=openbsd-bugs&m=105286536018393&w=2

From other bug reports in the archive I know 3.8, 3.7 and 3.6 are all
affected by "The" alpha bug. If my hunch is correct and the bugs linked
above are related to "The" alpha bug, then I should start the
compile-a-thon at OpenBSD v3.3 and work backwards.
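
Because the bug does not trigger on every build, one clean build of a
release proves very little. The sketch below, in Python, shows the
arithmetic for how many clean runs it takes before calling a release
"good", plus a binary search over releases. Note the search is a
substitute for the step-back-one-release-at-a-time plan above, not the
same thing, and it assumes the bug never disappears again once it is in
the tree. The names runs_needed, first_bad and the crashes callback are
hypothetical, and the per-run hit rate is something I would have to
estimate from a release known to be affected.

```python
import math


def runs_needed(p_hit, confidence=0.95):
    """Clean runs needed before trusting that a release is unaffected.

    If an affected release crashes with probability p_hit per run, then
    n clean runs in a row happen by luck with probability (1 - p_hit)**n.
    Returns the smallest n for which that is at most 1 - confidence.
    """
    if not 0.0 < p_hit < 1.0:
        raise ValueError("p_hit must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - p_hit))


def first_bad(releases, crashes, runs):
    """Binary search for the earliest release that shows the bug.

    releases: list ordered oldest to newest; the newest is assumed bad.
    crashes:  hypothetical callback, crashes(release) -> True if one
              build run of that release hit the bug.
    runs:     how many times to try a release before calling it good.
    """
    def is_bad(release):
        # any() stops at the first crash, so bad releases fail fast.
        return any(crashes(release) for _ in range(runs))

    lo, hi = 0, len(releases) - 1  # invariant: releases[hi] is bad
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(releases[mid]):
            hi = mid          # bug is at mid or earlier
        else:
            lo = mid + 1      # bug entered after mid
    return releases[lo]
```

As an illustration with a made-up hit rate: if an affected release
crashed on roughly one build in five, runs_needed(0.2) says 14 clean
builds in a row are needed before the chance of a false "good" drops
to 5% or less.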

If you've got a better idea, please let me know.

Kind Regards,
jcr
