Dear RAID users,
After struggling with the kernel (built with the debug option) and tons of
/var/log/messages lines, I have solved the "over 1T problem". It was
simply caused by an integer overflow.
In drivers/block/raid5.c:
> static inline unsigned long
> raid5_compute_sector (int r_sector, unsigned int raid_disks, unsigned int data_disks,
In other lines of the source file, unsigned long variables are passed
to raid5_compute_sector(). The type of the first argument should be
unsigned long:
> raid5_compute_sector (unsigned long r_sector, unsigned int raid_disks, unsigned int data_disks,
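To illustrate why the signed type matters, here is a minimal sketch. It is not the kernel code: the helper names are hypothetical, fixed-width types stand in for the 32-bit int and unsigned long of an i386 build, and I assume 1 KiB blocks with 512-byte sectors (two sectors per block). Converting an out-of-range unsigned value to a signed int is implementation-defined in C; on gcc it wraps to a negative number.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 1 KiB blocks and 512-byte sectors, so one block is two sectors. */
#define SECTORS_PER_BLOCK 2u

/* Convert a block number to a sector number in 32-bit unsigned arithmetic. */
static uint32_t block_to_sector(uint32_t block)
{
    assert(block <= UINT32_MAX / SECTORS_PER_BLOCK); /* no 32-bit wraparound */
    return block * SECTORS_PER_BLOCK;
}

/* Returns 1 if the sector number no longer fits in a signed 32-bit int. */
static int overflows_signed(uint32_t sector)
{
    return sector > (uint32_t)INT32_MAX;
}

/* Old prototype style: the sector arrives as a signed 32-bit int. */
static int32_t chunk_number_signed(int32_t r_sector, int32_t sectors_per_chunk)
{
    return r_sector / sectors_per_chunk;
}

/* Fixed prototype style: the sector stays unsigned all the way through. */
static uint32_t chunk_number_unsigned(uint32_t r_sector, uint32_t sectors_per_chunk)
{
    return r_sector / sectors_per_chunk;
}
```

Under these assumptions, block 1073741824 (= 1024^3) is the first block whose sector number exceeds INT32_MAX, which is consistent with the errors starting just past that block. With the 32k chunk size from my raidtab (64 sectors per chunk), chunk_number_signed() yields a negative chunk number for such a sector, while chunk_number_unsigned() stays in range.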
After this modification, mkraid finished successfully (though it still
took 20 hours).
> % df /raid
> Filesystem 1k-blocks Used Available Use% Mounted on
> /dev/md0 1105845360 20 1105845340 0% /raid
So far so good. I had difficulty running mke2fs properly, but
that is another story.
Best regards,
Seishi
>>> on Thu, 20 Jan 2000 17:42:59 +0900
>>> [EMAIL PROTECTED](TAKAMURA Seishi) said:
>
> Dear RAID experts,
>
> I have just joined this ML today, and have a problem with a RAID5 system
> that I am installing now.
>
> About ten hours after starting "mkraid /dev/md0", HDD access stops and no
> further disk operation (such as "mke2fs /dev/md0") works. I tried once
> again and got exactly the same error starting from the same block
> number (1073743253, according to /var/log/messages). The block number
> repeated cyclically.
>
> I suspect that some malfunction occurs when the block number exceeds
> 1024^3 (= 1073741824)...
>
> My system configuration, /etc/raidtab, source modification, part of
> /var/log/messages, and /proc/mdstat are attached below.
>
> Suggestions or pointers are highly appreciated.
>
> Best regards,
> Seishi Takamura
>
> Seishi Takamura, Dr.Eng.
> NTT Cyber Space Laboratories
> Y922A 1-1 Hikarino-Oka, Yokosuka, Kanagawa, 239-0847 Japan
> Tel: +81-468-59-2371, Fax: +81-468-59-2829
> E-mail: [EMAIL PROTECTED]
>
>
> (system configuration)
> RedHat 6.1 (Japanese version)
> kernel 2.2.14 + RAID patch(raid0145-19990824-2.2.11)
> raidtools 19990824-0.90
> CPU Pentium III 600MHz + 512MB memory
> 6 GB EIDE HDD (root and /boot), CD-ROM drive
> 3 SCSI Cards (Adaptec AHA2940U2W)
> 24 SCSI HDD Drives (Seagate ST150176LW Barracuda 50.1GB)
>
> Each SCSI card has eight HDDs connected (properly terminated, of
> course).
>
> (/etc/raidtab)
> raiddev /dev/md0
> raid-level 5
> nr-raid-disks 24
> nr-spare-disks 0
> chunk-size 32
> persistent-superblock 1
> parity-algorithm left-symmetric
> device /dev/sda1
> raid-disk 0
> ...
> device /dev/sdx1
> raid-disk 23
>
> (Modification)
> In raidtools-0.90/md-int.h and /usr/src/linux/include/linux/raid/md_p.h,
> I changed from
> #define MD_SB_DISKS_WORDS 384
> to
> #define MD_SB_DISKS_WORDS 800
> to enable up to 25 disks.
>
>
> (initial /proc/mdstat immediately after invoking mkraid)
> Personalities : [linear] [raid0] [raid1] [raid5]
> read_ahead 1024 sectors
> md0 : active raid5 sdx1[23] sdw1[22] sdv1[21] sdu1[20] sdt1[19] sds1[18] sdr1[17] sdq1[16] sdp1[15] sdo1[14] sdn1[13] sdm1[12] sdl1[11] sdk1[10] sdj1[9] sdi1[8] sdh1[7] sdg1[6] sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1] sda1[0] 1123474560 blocks level 5, 32k chunk, algorithm 2 [24/24] [UUUUUUUUUUUUUUUUUUUUUUUU] resync=0% finish=735.7min
> unused devices: <none>
>
> (/var/log/messages)
> Jan 18 00:01:37 localhost kernel: ect
> Jan 18 00:01:37 localhost kernel: compute_blocknr: map not correct
> Jan 18 00:01:37 localhost last message repeated 112 times
> Jan 18 00:01:37 localhost kernel: compute_blocknr: mapect
> Jan 18 00:01:37 localhost kernel: compute_blocknr: map not correct
> Jan 18 00:01:37 localhost last message repeated 454 times
> Jan 18 00:01:37 localhost kernel: e I/O error for block 1073743253
> Jan 18 00:01:37 localhost kernel: raid5: md0: unrecoverable I/O error for block 1073743285
> Jan 18 00:01:37 localhost kernel: raid5: md0: unrecoverable I/O error for block 1073743317
> Jan 18 00:01:37 localhost kernel: raid5: md0: unrecoverable I/O error for block 1073743349
> Jan 18 00:01:37 localhost kernel: raid5: md0: unrecoverable I/O error for block
> ...
>
> (/proc/mdstat after the error)
> Personalities : [linear] [raid0] [raid1] [raid5]
> read_ahead 1024 sectors
> md0 : active raid5 sdx1[23](F) sdw1[22](F) sdv1[21](F) sdu1[20](F) sdt1[19](F) sds1[18](F) sdr1[17](F) sdq1[16](F) sdp1[15](F) sdo1[14](F) sdn1[13](F) sdm1[12](F) sdl1[11](F) sdk1[10](F) sdj1[9](F) sdi1[8] sdh1[7] sdg1[6] sdf1[5] sde1[4] sdd1[3] sdc1[2](F) sdb1[1](F) sda1[0](F) 1123474560 blocks level 5, 32k chunk, algorithm 2 [24/6] [___UUUUUU_______________]
> unused devices: <none>
Seishi Takamura, Dr.Eng.
NTT Cyber Space Laboratories
Y922A 1-1 Hikarino-Oka, Yokosuka, Kanagawa, 239-0847 Japan
Tel: +81-468-59-2371, Fax: +81-468-59-2829
E-mail: [EMAIL PROTECTED]