Scott Plumlee wrote:
The original poster wrote:
Hello,

My OpenBSD 3.9-stable box is quite unstable. I don't have physical access to
my box, so I can't debug it directly.
I've recompiled a GENERIC kernel with DEBUG support and set ddb.panic to 0
in sysctl.conf so that it reboots automatically. But no kernel dump is
made after a kernel panic. I searched on the web without finding a solution.

Every time, the kernel panic is different. I tried -current (and also 3.8). The result is nearly the same: no more kernel panics, but the system freezes while still responding to ping.

You totally lost me on that one. Something panicked, something else didn't.

However, "system freeze but still responds to ping" can also be a memory exhaustion issue -- all RAM+swap got used, and all tasks end up getting deadlocked waiting for additional RAM to become available.


As I said before in another mail, this is NOT due to a hardware failure.
Many identical machines work perfectly. The only difference is the revision of
the BIOS (vcore updated and Pstate disabled). I want to find the source of
the bug so I can correct it if possible.

I'm still awfully new to *nix, but isn't saying that "it's not hardware just because other boxes like this don't fail" the same as "my car can't be out of gas because other cars of the same model are still driving by me"?

pretty darned close.

I can understand if you mean that it's not due to an unsupported piece of hardware, in which case I would think the kernel panic would be the same, but how do you know it's not bad <insert your choice of memory, disk, cables, processor, heatsink, fan, etc etc here>?

Anyone who hasn't seen a broken piece of HW that works fine with X but not Y is new to the game. Anyone who trusts a HW diagnostic to "give" them the answer is really, really new to the game.

By themselves, diagnostics are like a screwdriver: in the hands of a knowledgeable person, very useful. In the hands of an idiot, dangerous. Without a brain engaged in their use and analysis of the results, they are just an inert object.


The OP already answered his own question (and has been told this by others).
The machine has a buggy BIOS.
One version works, another doesn't.

Why do you think there is more than one revision? Because bugs were found. Odds are, those bugs were NOT found on OpenBSD, they were probably found running Windows, maybe Linux. OpenBSD *may* expose those bugs more clearly...but odds are, if you use that same buggy BIOS with another OS, you may learn to regret it.

Would it be possible to "fix" OpenBSD to work around this bug? Maybe. Completely pointless and self-defeating, however. Fix it for the buggy BIOS, and you have probably broken it for the "correct" BIOS...and now you have a chunk of code usable on precisely one variant of one bad computer. The code will not be properly maintained, and will probably do more harm than good some day in the future, if not immediately. Sometimes buggy hardware has to be worked around, because no fix is available or possible from the manufacturer and there is a clear benefit to adding "special case" code. When a proper fix IS available from the vendor, it is usually preferable to use it rather than work around the problem.

Hey, if this problem turns out to expose a true logic bug in OpenBSD, go ahead, find it, show us, and get credit for the fix. But if "every time the panic is different", it sounds like things are Just Plain Broke on the system. If a BIOS upgrade fixes it, it sounds like the hardware wasn't set up properly, and the manufacturer figured that out and FIXED THE PROBLEM.

Nick.
