Re: over 1T problem?

TAKAMURA Seishi Fri, 28 Jan 2000 04:38:57 -0800
Dear RAID users,

After struggling with a kernel built with the debug option and tons of
/var/log/messages lines, I have solved the "over 1T problem".  It was
simply caused by an integer overflow.

In drivers/block/raid5.c, the function is declared as:
> static inline unsigned long 
> raid5_compute_sector (int r_sector, unsigned int raid_disks, unsigned int data_disks,

In other lines of the source file, unsigned long variables are passed
to raid5_compute_sector().  The type of the first argument should be
unsigned long:

> raid5_compute_sector (unsigned long r_sector, unsigned int raid_disks, unsigned int data_disks,

After this modification, mkraid finished successfully (although it
still took 20 hours).

> % df /raid
> Filesystem           1k-blocks      Used Available Use% Mounted on
> /dev/md0             1105845360        20 1105845340   0% /raid

So far so good.  I had difficulty getting mke2fs to run properly, but
that is another story.

Best regards,
     Seishi

>>> on Thu, 20 Jan 2000 17:42:59 +0900
>>> [EMAIL PROTECTED](TAKAMURA Seishi) said:
> 
> Dear RAID experts,
> 
> I have just joined this ML today, and have a problem with a RAID5
> system that I am installing now.
> 
> About ten hours after starting "mkraid /dev/md0", HDD access stops and
> no further disk operation (such as "mke2fs /dev/md0") works.  I tried
> once again and got exactly the same error starting from the same block
> number (1073743253, according to /var/log/messages).  The block number
> repeated cyclically.
> 
> I suspect that some malfunction occurs when the block number exceeds
> 1024^3 (= 1073741824)...
> 
> My system configuration, /etc/raidtab, source modification,
> /var/log/messages(part) and /proc/mdstat are attached below.
> 
> Suggestions or pointers are highly appreciated.
> 
> Best regards,
>      Seishi Takamura
> 
> Seishi Takamura, Dr.Eng.
> NTT Cyber Space Laboratories
> Y922A 1-1 Hikarino-Oka, Yokosuka, Kanagawa, 239-0847 Japan
> Tel: +81-468-59-2371, Fax: +81-468-59-2829
> E-mail: [EMAIL PROTECTED]
> 
> 
> (system configuration)
>   RedHat 6.1 (Japanese version)
>   kernel 2.2.14 + RAID patch(raid0145-19990824-2.2.11)
>   raidtools 19990824-0.90
>   CPU Pentium III 600MHz + 512MB memory
>   6 GB EIDE HDD (root and /boot), CD-ROM drive
>   3 SCSI Cards (Adaptec AHA2940U2W)
>   24 SCSI HDD Drives (Seagate ST150176LW Barracuda 50.1GB)
> 
>   Each SCSI card has eight HDD's connected (properly terminated, of
>   course).
> 
> (/etc/raidtab)
>   raiddev /dev/md0
>         raid-level      5
>         nr-raid-disks   24
>         nr-spare-disks  0
>         chunk-size      32
>         persistent-superblock 1
>         parity-algorithm        left-symmetric
>         device          /dev/sda1
>         raid-disk       0
>       ...
>         device          /dev/sdx1
>         raid-disk       23
> 
> (Modification)
>   In raidtools-0.90/md-int.h and /usr/src/linux/include/linux/raid/md_p.h,
>   I changed from
> #define MD_SB_DISKS_WORDS              384
>   to
> #define MD_SB_DISKS_WORDS              800
>   to enable up to 25 disks.
> 
> 
> (initial /proc/mdstat immediately after invoking mkraid)
> Personalities : [linear] [raid0] [raid1] [raid5] 
> read_ahead 1024 sectors
> md0 : active raid5 sdx1[23] sdw1[22] sdv1[21] sdu1[20] sdt1[19] sds1[18] sdr1[17] sdq1[16] sdp1[15] sdo1[14] sdn1[13] sdm1[12] sdl1[11] sdk1[10] sdj1[9] sdi1[8] sdh1[7] sdg1[6] sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1] sda1[0] 1123474560 blocks level 5, 32k chunk, algorithm 2 [24/24] [UUUUUUUUUUUUUUUUUUUUUUUU] resync=0% finish=735.7min
> unused devices: <none>
> 
> (/var/log/messages)
> Jan 18 00:01:37 localhost kernel: ect 
> Jan 18 00:01:37 localhost kernel: compute_blocknr: map not correct 
> Jan 18 00:01:37 localhost last message repeated 112 times
> Jan 18 00:01:37 localhost kernel: compute_blocknr: mapect 
> Jan 18 00:01:37 localhost kernel: compute_blocknr: map not correct 
> Jan 18 00:01:37 localhost last message repeated 454 times
> Jan 18 00:01:37 localhost kernel: e I/O error for block 1073743253 
> Jan 18 00:01:37 localhost kernel: raid5: md0: unrecoverable I/O error for block 1073743285 
> Jan 18 00:01:37 localhost kernel: raid5: md0: unrecoverable I/O error for block 1073743317 
> Jan 18 00:01:37 localhost kernel: raid5: md0: unrecoverable I/O error for block 1073743349 
> Jan 18 00:01:37 localhost kernel: raid5: md0: unrecoverable I/O error for block 
> ...
> 
> (/proc/mdstat after the error)
> Personalities : [linear] [raid0] [raid1] [raid5] 
> read_ahead 1024 sectors
> md0 : active raid5 sdx1[23](F) sdw1[22](F) sdv1[21](F) sdu1[20](F) sdt1[19](F) sds1[18](F) sdr1[17](F) sdq1[16](F) sdp1[15](F) sdo1[14](F) sdn1[13](F) sdm1[12](F) sdl1[11](F) sdk1[10](F) sdj1[9](F) sdi1[8] sdh1[7] sdg1[6] sdf1[5] sde1[4] sdd1[3] sdc1[2](F) sdb1[1](F) sda1[0](F) 1123474560 blocks level 5, 32k chunk, algorithm 2 [24/6] [___UUUUUU_______________]
> unused devices: <none>

Seishi Takamura, Dr.Eng.
NTT Cyber Space Laboratories
Y922A 1-1 Hikarino-Oka, Yokosuka, Kanagawa, 239-0847 Japan
Tel: +81-468-59-2371, Fax: +81-468-59-2829
E-mail: [EMAIL PROTECTED]