Ted Unangst wrote:
> On 11/26/07, Daniel Ouellet <[EMAIL PROTECTED]> wrote:
>> My understanding is that all drives use an abstraction layer
>> between the kernel and the drivers themselves.
>> Now, I don't know if there is a difference between drivers for a
>> single-processor kernel and a multi-core kernel. So, the first
>> question is whether there is a difference or not. This is for the
>> SAS drivers in a Sun server.
> in a sense, that is true. on the i386 platform, up and smp kernels
> use different interrupt code, which may result in some differences.
OK, that's a start. I am now doing more tests, and I can crash/reboot
the box at will by increasing the write speed required of the drive.
The way I do this may not be very scientific, but I think it is still
valid.
I can copy huge files, multiple GB, with no problem if I limit the
transfer speed. To do this I use
scp -l 100000 /tmp/test [EMAIL PROTECTED]:/var/test
as an example, with nothing else running on the remote box.
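
For reference, the scp -l limit is given in Kbit/s, so -l 100000 caps the
copy at about 100 Mbit/s, or roughly 12.5 MB/s. A minimal sketch of that
conversion, handy when comparing the limit with disk throughput figures
(the function name kbit_to_kbytes is made up for illustration):

```shell
# Hypothetical helper: convert an scp -l limit (Kbit/s) to KB/s.
# There are 8 bits per byte, so divide by 8 (integer arithmetic).
kbit_to_kbytes() {
    echo $(( $1 / 8 ))
}

# Example, using the limit from the scp command above:
# kbit_to_kbytes 100000
```

Raising the -l value in steps and noting the last value that survives
would give a rough figure for the write rate at which the crash starts.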
I am also checking the output of
systat vmstat
to see if anything shows up prior to the crash. There is nothing I can
point to yet, but I am still digging.
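
Since the on-screen display is lost when the box reboots, one option is
to log samples to a file and flush after each one. This is only a sketch
under assumptions: the helper name log_samples and the log path are made
up, and it assumes vmstat -i is available to print per-device interrupt
counts. The last samples may still be lost if the crash hits before the
flush completes.

```shell
# Hypothetical helper: run a command COUNT times, INTERVAL seconds apart,
# appending timestamped output to LOGFILE and calling sync after each run.
log_samples() {
    logfile=$1; count=$2; interval=$3
    shift 3
    i=0
    while [ "$i" -lt "$count" ]; do
        {
            date
            "$@"
        } >> "$logfile" 2>&1
        sync
        i=$(( i + 1 ))
        sleep "$interval"
    done
}

# Example (path is an assumption): one sample per second for ten minutes
# log_samples /root/intr.log 600 1 vmstat -i
```

Writing the log to a different disk, or over the network, would avoid
adding load to the drive under test.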
I presume this is most likely directly related to how fast the box is
required to write to disk, or maybe it is the level of interrupts that
need to be processed that causes the crash/reboot.
I guess this is very stupid and most likely not possible, but to test
this interrupt possibility, is there a way to change the interrupt
limits, or to make the MP kernel use the interrupt code from the
single-processor one? Or maybe to increase the limit, if there is any,
on the interrupt level to isolate this further?
I am getting closer and closer, but suggestions would be welcome.
This bug annoys me so much that I would love to find it and see if I can
patch it too. But I am still trying to isolate the part it might be in.
The interrupt part is a logical candidate and may well be where it is.
Are there any preset limits there that I could test?
It sure looks to me like an overflow or something, as it crashes as soon
as I increase the write speed to the drives, or the interrupt level I
guess.
Maybe to find out more, I could change the write block size, if that's
even possible, to generate fewer interrupts and see the effect on the
crash.
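
One way to try the block-size idea locally, without the network in the
picture, might be dd with different bs values. This is a sketch only:
the helper name write_test and the target path are made up, and because
writes to a file go through the filesystem and buffer cache, the bs
given to dd may not map one-to-one onto the I/O sizes the controller
actually sees.

```shell
# Hypothetical helper: write COUNT blocks of BS bytes of zeros to FILE.
write_test() {
    file=$1; bs=$2; count=$3
    dd if=/dev/zero of="$file" bs="$bs" count="$count" 2>/dev/null
}

# Example (path is an assumption): the same 1 GB total, two block sizes
# write_test /var/ddtest.img 4096 262144
# write_test /var/ddtest.img 65536 16384
```

If the crash follows the total write rate rather than the block size,
both runs should behave the same way.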
Any ideas are welcome. I have been testing and trying for a long time and
I always get a bit closer, but not to the exact point yet anyway.
Best,
Daniel