Hi

i'm having trouble with my software raid: it keeps corrupting itself. i mean,
the filesystem is corrupt, i run "e2fsck /dev/md0" and say yes to everything,
the filesystem seems ok again, i work on it for a while, and then it's corrupt again...


the raid ran fine for the last few months on rh7.3 and 8.0. then i upgraded
to rh9, and that's when the trouble started.

the upgrade went relatively smoothly, except that anaconda couldn't figure out
the right boot drive to install grub on. it suggested /dev/hda, but it should
have been /dev/sda. i don't know whether it actually touched hda (i did an
nfs install booted from floppy, which is probably what confused anaconda: it
didn't see that scsi has higher boot priority than ide).
i then fixed grub manually so it boots from sda.
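from memory it was something like this in the grub shell (the (hd0,0) for the
/boot partition is a guess, so take these as an example rather than the exact
commands):

grub> device (hd0) /dev/sda
grub> root (hd0,0)
grub> setup (hd0)

i.e. map hd0 to the scsi disk, point grub at the /boot partition, and write
the loader to the mbr of sda.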

during the upgrade i had to disable the raid (take it out of fstab), because
anaconda couldn't mount it and would abort the upgrade otherwise.
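the fstab line i took out was something like this (mount point, fs type and
the numbers at the end are from memory, so don't take them literally):

/dev/md0        /raid           ext2    defaults        1 2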

after the upgrade i installed all available rh9 updates.

cat /proc/version
Linux version 2.4.20-18.8 ([EMAIL PROTECTED]) (gcc version 3.2 20020903 (Red Hat Linux 8.0 3.2-7)) #1 Thu May 29 08:57:39 EDT 2003

the system is a 1GHz Athlon on an Asus A7V (i think). the system/boot drive
is a scsi disk connected to an ncr controller; the raid is on 4 ide disks
connected to the onboard via and promise controllers.

cat /etc/raidtab:
raiddev /dev/md0
    raid-level                5
    nr-raid-disks             4
    nr-spare-disks            0
    chunk-size                32
    persistent-superblock     1
    parity-algorithm          left-symmetric

    device                    /dev/hda1
    raid-disk                 0
    device                    /dev/hdc1
    raid-disk                 1
    device                    /dev/hde1
    raid-disk                 2
    device                    /dev/hdg1
    raid-disk                 3

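for completeness, the running array can also be cross-checked against the
raidtab with something like this (assuming the mdadm package is installed; i
think rh9 ships it alongside raidtools):

mdadm --detail /dev/md0

that prints the array state as seen in the superblocks (active/failed/spare
devices, chunk size, etc.), which should match the raidtab above.
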
when i brought the raid back up, it ran in degraded mode: /dev/hde1 had been
kicked out of the array. i checked the drive with "hdparm -t -T" and
"badblocks /dev/hde", and it seemed fine, so i did "raidhotadd /dev/md0
/dev/hde1".

after waiting for reconstruction:

cat /proc/mdstat
Personalities : [raid5]
read_ahead 1024 sectors
md0 : active raid5 hdg1[3] hde1[2] hdc1[1] hda1[0]
      480238656 blocks level 5, 32k chunk, algorithm 2 [4/4] [UUUU]

unused devices: <none>

then i ran "e2fsck /dev/md0", which took a long time and produced tons of
errors. i ended up with tons of stuff in lost+found, and most of the other
files were corrupt as well (i could see bits of different files mixed
together) :(

of course i have no backup, but the data isn't that valuable; it would just
take time to re-encode all my cds again...

i tried to sort the files out, but then the raid became corrupt again. after
another fsck i tried once more, and again it became corrupt.
that's where i am now.

i could start from scratch: ditch everything and recreate the raid...

then i noticed this:
top -bn1
 14:00:02  up 1 day, 11:45,  4 users,  load average: 1.09, 1.11, 1.09
90 processes: 88 sleeping, 2 running, 0 zombie, 0 stopped
CPU states:   3.0% user   2.8% system  94.0% nice   0.0% iowait   0.0% idle
Mem:   514652k av,  506864k used,    7788k free,       0k shrd,  218388k buff
                    383288k actv,       0k in_d,   10192k in_c
Swap: 1049592k av,   22824k used, 1026768k free                  240800k cached

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
[..]
    9 root     18446744073709551615 -20     0    0     0 SW<   0.0  0.0   0:00   0 mdrecoveryd
[..]
 5694 root     18446744073709551615 -20     0    0     0 SW<   0.0  0.0   1:00   0 raid5d
17801 root       9   0  1144  984   904 S     0.0  0.1   0:00   0 sshd
[..]

note the priority of mdrecoveryd and raid5d! that huge number is 2^64-1, so
it looks like a -1 being displayed as an unsigned 64-bit value. (the sshd line
is just there to show the normal priority of all the other processes.)
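to cross-check what top is showing, the same values can be pulled with ps
(standard procps format specifiers, i hope i have the column names right):

ps -eo pid,ni,pri,stat,comm | egrep 'mdrecoveryd|raid5d|sshd'

that should print the nice and priority columns for the two raid threads next
to a normal process like sshd.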

what's wrong here?
does anybody successfully run a raid5 on kernel 2.4.20-18.8?
do you need more information?
how should i proceed?

thanks for reading so far :)

bye
 dworz

