Scott Plumlee wrote:
o?= wrote:
Hello,
My OpenBSD 3.9-stable box is quite unstable. I don't have physical access to
my box, so I can't debug it directly.
I've recompiled a GENERIC kernel with DEBUG support and set ddb.panic to 0
in sysctl.conf so that it reboots automatically. But no kernel dump is
made after a kernel panic. I searched the web without finding a
solution.
Every time, the kernel panic is different. I tried -current (and
also 3.8). The result is nearly the same: no more kernel panics, but
the system freezes while still responding to ping.
You totally lost me on that one. Something panicked, something else didn't.
However, "system freeze but still responds to ping" can also be a memory
exhaustion issue -- all RAM+swap got used, and all tasks end up getting
deadlocked waiting for additional RAM to become available.
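One way to test the memory-exhaustion theory on a box you can't reach physically is to leave a small logger running and read the log after the next freeze. What follows is only a rough sketch, assuming Python 3.7 or later is installed on the machine; the function names, the log path, and the choice of `vmstat` as the sampling command are illustrative assumptions, not anything from the original report. Each sample is forced to disk right away so that the last entries have a chance of surviving a hang.

```python
import os
import subprocess
import time


def run_command(cmd):
    """Run cmd (a list of strings) and return its stdout, or "" on failure."""
    try:
        result = subprocess.run(cmd, capture_output=True, text=True, timeout=10)
        return result.stdout
    except (OSError, subprocess.SubprocessError):
        return ""


def log_samples(path, sampler, count, interval):
    """Append `count` timestamped samples to `path`, `interval` seconds apart.

    `sampler` is any zero-argument callable returning text, for example
    lambda: run_command(["vmstat"])  (assumed command, adjust to taste).
    """
    with open(path, "a") as log:
        for i in range(count):
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            log.write(f"--- {stamp} ---\n")
            log.write(sampler())
            # Push the sample to disk now; after a freeze there is no later.
            log.flush()
            os.fsync(log.fileno())
            if i + 1 < count:
                time.sleep(interval)
```

Something like `log_samples("/var/log/memwatch.log", lambda: run_command(["vmstat"]), 1440, 60)` would take one sample a minute for a day. If free memory and swap shrink steadily in the entries leading up to a freeze, that points toward exhaustion; if they look healthy right up to the end, look elsewhere.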
As I said before in another mail, this is NOT due to a hardware failure.
Many IDENTICAL machines work perfectly. The only difference is the revision of
the BIOS (vcore updated and Pstate disabled). I want to find the source of
the bug and correct it if I can.
I'm still awfully new to *nix, but isn't saying that "it's not hardware
just because other boxes like this don't fail" the same as "my car can't
be out of gas because other cars of the same model are still driving by
me"?
pretty darned close.
I can understand if you mean that it's not due to an unsupported piece
of hardware, in which case I would think the kernel panic would be the
same, but how do you know it's not bad <insert your choice of memory,
disk, cables, processor, heatsink, fan, etc etc here>?
Anyone who hasn't seen a broken piece of HW that works fine with X but
not Y is new to the game. Anyone who trusts a HW diagnostic to "give"
them the answer is really, really new to the game.
By themselves, diagnostics are like a screwdriver: in the hands of a
knowledgeable person, very useful. In the hands of an idiot, dangerous.
Without a brain engaged in their use and analysis of the results, they
are just an inert object.
The OP already answered his own question (and has been told this by others).
The machine has a buggy BIOS.
One version works, another doesn't.
Why do you think there is more than one revision? Because bugs were
found. Odds are, those bugs were NOT found on OpenBSD, they were
probably found running Windows, maybe Linux. OpenBSD *may* expose those
bugs more clearly...but odds are, if you use that same buggy BIOS with
another OS, you may learn to regret it.
Would it be possible to "fix" OpenBSD to work around this bug? Maybe.
Completely pointless and self-defeating, however. Fix it for the buggy
BIOS, you probably broke it for the "correct" BIOS....and now you have a
chunk of code usable on precisely one variant of one bad computer. The
code will not be properly maintained, and will probably do more harm than
good some day in the future, if not immediately. Sometimes buggy
hardware has to be worked around, because no fix is available or
possible from the manufacturer and there is a clear benefit to adding
"special case" code. When a proper fix IS available from the vendor, it
is usually preferable to use it rather than work around the problem.
Hey, if this problem turns out to expose a true logic bug in OpenBSD, go
ahead, find it, show us, and get credit for the fix. But if "every time
the panic is different", it sounds like things are Just Plain Broke on
the system. If a BIOS upgrade fixes it, it sounds like the hardware wasn't
set up properly, and the manufacturer figured that out and FIXED THE
PROBLEM.
Nick.