I have tried, without success, to get the latest Linux RAID patches to work
with the onboard HPT366 controller of the ABit BE6-2 (BIOS version 1.21).
There has been some discussion of this on this mailing list (and on the
linux-kernel mailing list), so I hope this report helps someone and answers
some of my questions.

My first attempt followed the instructions on the HighPoint web page
(http://www.highpoint-tech.com/), which, if I'm not mistaken, call for the
following:

  linux 2.2.14 (straight from http://www.kernel.org)
  pre-patch-2.2.15-17
  ide-2.2.15-17.20000405.patch
  raid-2.3.15-A0 (from http://people.redhat.com/mingo/raid-patches)

This produced a working RAID-0 array of two IDE disks (Western Digital
WD102AA) that I could read from and write to. However, copying a file from
one directory to another on the RAID-0 array locked up the system and took
the filesystem with it. For example:

  hda -> md0 = transfer OK
  md0 -> hda = transfer OK
  md0 -> md0 = lockup and corrupted filesystem

My second attempt used a combination suggested in an earlier message on
this list:

  linux-2.2.14
  ide.2.2.14.20000111.patch
  raid-2.2.14-B1

This produced a RAID-0 array of the same two disks (WD102AA) without the
problem above, and under light load everything works well. However, as soon
as more than a few programs (e.g. cp, grep) access the disks at the same
time, I get an error like:

  hdg: timeout waiting for DMA

followed by a total lockup after what seems like a random interval. If I
run bonnie (any version), the lockup is guaranteed during the rewrite stage.

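The failing load can be approximated with a short script. Below is a minimal
Python sketch, offered as an assumption rather than a known reproducer: it
writes one random source file into a directory on the array, then runs
several copies of it at once to other files in the same directory (the
md0 -> md0 case) and checks each copy byte for byte. The function names,
worker count and file size are hypothetical choices. Point it only at a
scratch array, since on an affected system it may corrupt the filesystem.

```python
import filecmp
import os
import shutil
from concurrent.futures import ThreadPoolExecutor


def make_source_file(directory, size_bytes, name="stress-source.bin"):
    """Write a file of random bytes into `directory` and return its path."""
    path = os.path.join(directory, name)
    with open(path, "wb") as f:
        remaining = size_bytes
        while remaining > 0:
            chunk = min(remaining, 1 << 20)  # write in 1 MiB pieces
            f.write(os.urandom(chunk))
            remaining -= chunk
        f.flush()
        os.fsync(f.fileno())  # push the source file out to disk
    return path


def copy_and_verify(src, dst):
    """Copy src to dst on the same filesystem, fsync dst, compare contents."""
    shutil.copyfile(src, dst)
    fd = os.open(dst, os.O_RDONLY)
    try:
        os.fsync(fd)  # force the copy's data to be written to disk
    finally:
        os.close(fd)
    # shallow=False compares file contents, not just os.stat() signatures.
    # Reads may still be served from the page cache.
    return filecmp.cmp(src, dst, shallow=False)


def stress_copy(target_dir, workers=4, copies=8, size_bytes=64 << 20):
    """Run `copies` same-directory copies, up to `workers` at a time."""
    os.makedirs(target_dir, exist_ok=True)
    src = make_source_file(target_dir, size_bytes)
    dsts = [os.path.join(target_dir, f"stress-copy-{i}.bin") for i in range(copies)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda dst: copy_and_verify(src, dst), dsts))
    return results
```

On the array this would be called as something like
`stress_copy("/mnt/md0/stress", workers=6)`, where /mnt/md0 is a
hypothetical mount point; a False in the returned list means a copy did not
match its source. Threads are enough here because the copies spend their
time waiting on I/O rather than running Python code.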
I have tried every possible combination (Master/Slave, Master/Master, etc.)
on the HPT366 controller, but the same problems occur. For example:

  HPT366 1st port o--------o HDD Master o--------o HDD Slave
  HPT366 2nd port o--------o HDD Master o--------o HDD Slave

  HPT366 1st port o--------o HDD Master
  HPT366 2nd port o--------o HDD Master

I have tried this on Red Hat 6.1 and 6.2 and on Mandrake 7.0-2, and the
problem exists on all of them (not surprising, since they are similar).

The hdparm settings in effect when the errors occur are:

  multcount     16
  I/O support   32-bit
  unmaskirq     off
  using DMA     on
  keepsettings  off
  nowerr        off
  readonly      off
  readahead     8

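For anyone trying to reproduce this setup, the settings above can be turned
into an hdparm command line. This Python sketch only builds the argument
list and does not run hdparm; the dictionary and helper names are
hypothetical, and the option letters (-m, -c, -u, -d, -k, -n, -r, -a) follow
the hdparm(8) man page. Setting using_dma to 0 gives the DMA-off
configuration from the tests described below.

```python
# Settings from the list above; key names are modelled on hdparm's output.
REPORTED_SETTINGS = {
    "multcount": 16,
    "io_support": 1,    # hdparm -c: 0 = 16-bit, 1 = 32-bit, 3 = 32-bit with sync
    "unmaskirq": 0,
    "using_dma": 1,
    "keepsettings": 0,
    "nowerr": 0,
    "readonly": 0,
    "readahead": 8,
}

# hdparm option letter for each setting (see hdparm(8)).
HDPARM_FLAGS = {
    "multcount": "-m",
    "io_support": "-c",
    "unmaskirq": "-u",
    "using_dma": "-d",
    "keepsettings": "-k",
    "nowerr": "-n",
    "readonly": "-r",
    "readahead": "-a",
}


def hdparm_command(device, settings):
    """Build (but do not run) an hdparm argument list applying `settings`."""
    unknown = set(settings) - set(HDPARM_FLAGS)
    if unknown:
        raise ValueError(f"no hdparm flag known for: {sorted(unknown)}")
    args = ["hdparm"]
    for key, flag in HDPARM_FLAGS.items():  # fixed order, so output is stable
        if key in settings:
            args.append(f"{flag}{settings[key]}")
    args.append(device)
    return args
```

For example, `" ".join(hdparm_command("/dev/hdg", REPORTED_SETTINGS))` gives
`hdparm -m16 -c1 -u0 -d1 -k0 -n0 -r0 -a8 /dev/hdg`. Applying it needs root,
and the man page warns against changing -n from its default.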
I have run all the tests on each disk separately (not in a RAID array), and
there were no problems.
I have also tested the RAID array under the standard IDE driver, and there
were no problems.
I have run all the tests on the RAID array with DMA turned off on all
drives, and there were no problems, but it was sloooow (as expected).

So the problem seems to lie in the interaction between the RAID code and
the HPT366 driver when DMA is in use. Is this a fair conclusion?

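The elimination argument above can be written out explicitly. This Python
sketch encodes the four tested configurations as sets of factors and looks
for the smallest combinations present in every failing run and in no passing
run. Two assumptions are built in: the "standard IDE driver" run is treated
as having DMA enabled (the report does not say), and drives, cabling and
distribution are treated as constant. With only four observations this shows
the conclusion is consistent with the tests, not that it is proven.

```python
from itertools import combinations

# (factors present, failure observed) for each configuration tested above.
# Assumption: the "standard IDE driver" run had DMA enabled.
OBSERVATIONS = [
    ({"hpt366", "dma"}, False),          # each disk alone on the HPT366, DMA on
    ({"raid", "dma"}, False),            # RAID-0 on the standard IDE driver
    ({"raid", "hpt366"}, False),         # RAID-0 on the HPT366, DMA off
    ({"raid", "hpt366", "dma"}, True),   # RAID-0 on the HPT366, DMA on
]


def minimal_failure_sets(observations):
    """Smallest factor sets found in every failing run and in no passing run."""
    factors = sorted(set().union(*(present for present, _ in observations)))
    found = []
    for size in range(1, len(factors) + 1):
        for combo in combinations(factors, size):
            candidate = set(combo)
            in_all_failures = all(
                candidate <= present for present, failed in observations if failed
            )
            in_any_pass = any(
                candidate <= present for present, failed in observations if not failed
            )
            # Skip supersets of a set already found, so results stay minimal.
            if in_all_failures and not in_any_pass and not any(
                set(prev) <= candidate for prev in found
            ):
                found.append(combo)
    return found
```

`minimal_failure_sets(OBSERVATIONS)` returns `[('dma', 'hpt366', 'raid')]`:
under these assumptions no single factor or pair explains the failure, which
matches the conclusion that it needs RAID, the HPT366 driver and DMA
together.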
Does anyone have any ideas?
Is there another combination of kernel, IDE and RAID patches that is known
to work for someone else?
Should I use different drives? Are the WD102AA drives known to be buggy?
Should I update the HPT366 BIOS to the newer version? Would that make a
difference, or does Linux not use the HPT366 BIOS at all?
Should I use a better-known controller? Is the HPT366 known to be buggy in
its use of DMA?
Is there an organised effort to fix this problem? If so, how can I help?

Thanks in advance,
Adrian Head