Ted Unangst wrote:
> On 11/26/07, Daniel Ouellet <[EMAIL PROTECTED]> wrote:
>> My understanding is that all drivers use an abstraction layer
>> between the kernel and the driver code itself.
>>
>> Now, I don't know if there is a difference between drivers for a
>> single-processor kernel and a multi-core kernel. So, my first question
>> is whether there is a difference or not. This is for the SAS drivers
>> in Sun servers.
>
> in a sense, that is true.  on the i386 platform, up and smp kernels
> use different interrupt code, which may result in some differences.

OK, that's a start. I am now doing more tests, and I am able to crash/reboot the box at will when I increase the write speed required of the drive.

The way I do this may not be very scientific, but I think it is still valid.

I can copy huge files, I mean multiple GB, with no problem if I limit the transfer speed. To do this I use, for example:

scp -l 100000 /tmp/test [EMAIL PROTECTED]:/var/test

and I do not have anything else running on the remote box.
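As a rough sanity check on what that cap means: scp's -l option is documented in Kbit/s, so 100000 works out to about 12.5 MB/s, or about 12.8 MB/s if the local scp counts a Kbit as 1024 bits (an assumption worth checking in the scp source). A minimal Python sketch of the arithmetic; the function names are hypothetical helpers, not part of scp or any other tool:

```python
def scp_limit_to_bytes_per_sec(limit_kbit: int, kbit: int = 1000) -> float:
    """Convert an scp -l value (Kbit/s) to bytes per second.

    kbit is 1000 or 1024 depending on how the scp build interprets "Kbit";
    which one applies is an assumption to verify locally.
    """
    return limit_kbit * kbit / 8


def transfer_seconds(file_bytes: int, limit_kbit: int, kbit: int = 1000) -> float:
    """Lower bound on copy time at the given cap (ignores protocol overhead)."""
    return file_bytes / scp_limit_to_bytes_per_sec(limit_kbit, kbit)


if __name__ == "__main__":
    # Compare both interpretations of "Kbit" for the -l 100000 used above.
    for kbit in (1000, 1024):
        rate = scp_limit_to_bytes_per_sec(100000, kbit)
        secs = transfer_seconds(4 * 10**9, 100000, kbit)
        print(f"Kbit={kbit}: {rate / 1e6:.1f} MB/s, 4 GB file takes at least {secs:.0f} s")
```

Knowing the byte rate at which copies stay stable gives a concrete number to raise step by step until the crash appears.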

I am also watching the output of

systat vmstat

to see if anything shows up prior to the crash, but there is nothing I can point to yet. Still digging.

From this I presume that the crash/reboot is most likely directly related to how fast data has to be written to disk, or perhaps to the number of interrupts that need to be processed.

I guess this is very stupid and most likely not possible, but to test the interrupt theory, is there a way to change the interrupt limits, or to make the MP kernel use the interrupt code from the single-processor kernel? Or maybe raise the limit on the interrupt level, if there is one, to isolate this further?

I am getting closer and closer, but suggestions would be welcome.

This bug annoys me so much that I would love to find it and see if I can patch it too. But I am still trying to isolate the part of the code it might be in. The interrupt code is a logical candidate and may well be where it is.

Are there any preset limits there that I could test?

It sure looks to me like an overflow or something similar, because as soon as I increase the write speed to the drives, or I guess the interrupt level, it crashes.

To confirm that, or find out more, I could change the write block size, if that's even possible, to generate fewer interrupts and see how that affects the crash.
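To vary the write rate and the block size independently of scp and the network, something like the following Python sketch could be run locally on the box. It is a hypothetical helper (the name paced_write and its parameters are made up for illustration) that writes zeros in fixed-size blocks and sleeps to hold an average rate. It only controls the size of each write() call; how that maps to disk I/O and interrupts depends on the buffer cache and the driver, so it tests the theory only indirectly.

```python
import os
import time


def paced_write(path: str, total_bytes: int, block_size: int,
                rate_bytes_per_sec: float) -> int:
    """Hypothetical test helper: write total_bytes of zeros to path in
    block_size chunks, sleeping so the average rate stays at or below
    rate_bytes_per_sec. Returns the number of write() calls issued."""
    block = b"\0" * block_size
    written = 0
    calls = 0
    start = time.monotonic()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        while written < total_bytes:
            # Last chunk may be shorter than block_size.
            chunk = block[: min(block_size, total_bytes - written)]
            written += os.write(fd, chunk)
            calls += 1
            # Sleep until the wall clock catches up with the target rate.
            expected = written / rate_bytes_per_sec
            elapsed = time.monotonic() - start
            if expected > elapsed:
                time.sleep(expected - elapsed)
        os.fsync(fd)  # flush to the disk, not just the buffer cache
    finally:
        os.close(fd)
    return calls
```

For example, paced_write("/var/test/paced.bin", 2 * 1024**3, 64 * 1024, 12.5e6) would write 2 GB in 64 KB blocks at about 12.5 MB/s (the path is only an example). Repeating it with a higher rate, or with smaller or larger blocks at the same rate, could show whether the crash follows the byte rate or the number of writes.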

Any ideas are welcome. I have been testing and trying for a long time, and I always get a bit closer, but not to the exact point yet.

Best,

Daniel
