Re: Raid 4 idea!

2006-04-08 Thread JaniD++
Hello,

...

  What do you think, Neil?

 I don't know what Neil thinks, but I have never liked the performance
implications of RAID-4, could you say a few words about why 4 rather  than
5? My one test with RAID-4 showed the parity drive as a huge bottleneck, and
seeing that practice followed theory, I gave up on it.

I think the performance depends on the specific workload.
In my case, level 4 is better than level 5.
My system is a download server.
This workload does a lot of reads and only a few writes.
I use four 12-unit raid4 arrays, and one raid0 array built from the 4 raid4 arrays.

Why?
Let me see:
1. It is easy to start without the parity disk.
1/a Without the parity disk it is faster than raid5/raid4, so I can load the
server with data quickly and easily, and after the upload is done I can
generate the 4x1 parity information relatively quickly in one pass!
1/b If more than one disk fails, it is a little easier to recover than raid5.
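As a rough sketch of point 1 (device names and member counts here are only
illustrative, not my real config), a raid4 can be created with the parity
slot (normally the last member in md raid4) left missing, and the parity
member added later:

mdadm --create /dev/md0 --level=4 --raid-devices=12 /dev/sd[a-k]1 missing
# load the data while the array runs degraded, like a plain stripe, then:
mdadm /dev/md0 --add /dev/sdl1
# md now rebuilds only that one member, i.e. generates the parity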

2. As I said, my load is mostly reads with a little writing.
On raid5, if somebody wants to write only one bit to the array, all the drives
may need to read, and two disks need to write afterwards.
Those writes force the read processes to wait too long.
But on raid4, all the drives may need to read, and only one of the data
drives needs to write (plus the parity drive)!
This is a little bit faster for me...

3. And this is the most important:
My system has 2 bottlenecks with about 1000 downloading users at one time:
a, the drives' seek time
b, the IO bandwidth.
I can balance between these 2 bottlenecks with the readahead
settings.
On raid5 the blockdev readahead _reads the parity too_, wasting the
bandwidth, the cache of the drives and the cache in memory, but it can seek on
N drives.
On raid4 all of the readahead is useful data, but I can only use N-1 drives for
seeking.
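For reference, that balancing is done with blockdev; a sketch (the values are
only examples, not my real settings):

blockdev --getra /dev/md0          # readahead is reported in 512-byte sectors
blockdev --setra 8192 /dev/md0     # bigger readahead: more bandwidth, fewer seeks served
blockdev --setra 256 /dev/nb0      # smaller readahead on a component: more seeks, less wasted bandwidth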

4. In case of very high download traffic that also needs an upload, I can
disable the parity and speed up the write process, and after the load falls
back to normal I can recreate the parity again.
This is a trade-off between performance and redundancy.
It is a little dangerous, but it is my choice, and this kind of freedom is why
Linux is so beautiful! :-)
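A sketch of point 4 with mdadm (the device name is illustrative; with a
write-intent bitmap and --re-add the rebuild may be partial, without one it is
a full parity rebuild):

mdadm /dev/md0 --fail /dev/sdl1 --remove /dev/sdl1   # drop the parity disk under load
mdadm /dev/md0 --add /dev/sdl1                       # later, put it back and rebuild the parity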

5. With Neil's patch I can use the bitmap too. ;-)

6. The parity drive becomes the bottleneck precisely because it offloads the
other drives.
On the other hand, if I plan to upgrade the system, I only need to buy a faster
parity device! :-)

7/a In an extreme case, I can move the parity out of the box using NBD.
The NBD server can be faster and/or can store all four parity drives in a
more cost-effective way.

7/b Optionally I can set up the NBD server again and silently (slowly)
reconstruct the parity using plain raid1, and I can use a USB mobile rack to
move the live parity from the loop device to the new HDD in the rack; then I
only need to stop the system to replace the bad disk with the freshly
synced parity drive.
(I do not use hot-swap at the moment.)
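A sketch of the NBD side of 7/a-7/b (host name, port and path are made up for
the example):

# on the parity box: export a file-backed device
nbd-server 2001 /store/parity-node1.img
# on the disk node: attach it and use it as the raid4 parity member
nbd-client paritybox 2001 /dev/nb0
mdadm /dev/md0 --add /dev/nb0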

 And in return I'll point out that this makes recovery very expensive, read
everything while reconstructing, then read everything all over again when
making a new parity drive after repair.

For my idea?
Yes, this is right.
But!
If one drive fails, recreating the parity disk takes close to the same time as
a reconstruction, except that it gets faster and faster as the degraded raid4
array gets closer to a clean raid0 (raid4 without parity)!

And with one failed drive (per array, so 4x1 in total) my system can keep
running at top performance until I replace the old drive with a new one.

The final parity recreation on raid4:
I can only point to mdadm's default raid5 creation mechanism, the phantom
spare drive!
Neil said this is faster than a normal raid5 creation, and he is right!
With this mechanism only 1 disk is writing, and all the others are only reading!

Cheers,
Janos





-- 
bill davidsen [EMAIL PROTECTED]
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979



(X)FS corruption on 2 SATA disk RAID 1

2006-03-29 Thread JaniD++
Hello, list,

I think this is generally a hardware error, but it looks like a software
problem too.
At this point there is no dirty data in memory!

Cheers,
Janos

[EMAIL PROTECTED] /]# cmp -b /dev/sda1 /dev/sdb1
/dev/sda1 /dev/sdb1 differ: byte 68881481729, line 308395510 is 301 M-A  74

[EMAIL PROTECTED] /]# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid5] [multipath] [faulty]
md10 : active raid1 sdb1[1] sda1[0]
  136729088 blocks [2/2] [UU]
  bitmap: 0/131 pages [0KB], 512KB chunk

unused devices: none

[EMAIL PROTECTED] /]# mount
192.168.0.1://NFS/ROOT-BASE/ on / type nfs
(rw,hard,rsize=8192,wsize=8192,timeo=
5,retrans=0,actimeo=1)
none on /proc type proc (rw,noexec,nosuid,nodev)
none on /dev/pts type devpts (rw,gid=5,mode=620)
none on /dev/shm type tmpfs (rw)
none on /sys type sysfs (rw)
/dev/ram0 on /mnt/fast type ext2 (rw)
none on /dev/cpuset type cpuset (rw)
/dev/md10 on /mnt/1 type xfs (ro)
[EMAIL PROTECTED] /]#
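To localize the mismatch before deciding which half to trust, one can checksum
a small window around the differing offset on both members; a sketch (the
offset is the byte cmp reported above, the 1 MiB window is arbitrary):

dd if=/dev/sda1 bs=512 skip=$((68881481729/512)) count=2048 2>/dev/null | md5sum
dd if=/dev/sdb1 bs=512 skip=$((68881481729/512)) count=2048 2>/dev/null | md5sum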

cut from log:

Mar 29 08:14:45 dy-xeon-1 kernel: scsi1 : ata_piix
Mar 29 08:14:45 dy-xeon-1 kernel:   Vendor: ATA   Model: WDC WD2000JD-19H  Rev: 08.0
Mar 29 08:14:45 dy-xeon-1 kernel:   Type:   Direct-Access   ANSI SCSI revision: 05
Mar 29 08:14:45 dy-xeon-1 kernel:   Vendor: ATA   Model: WDC WD2000JD-19H  Rev: 08.0
Mar 29 08:14:45 dy-xeon-1 kernel:   Type:   Direct-Access   ANSI SCSI revision: 05
Mar 29 08:14:45 dy-xeon-1 kernel: SCSI device sda: 390721968 512-byte hdwr sectors (200050 MB)
Mar 29 08:14:45 dy-xeon-1 kernel: SCSI device sda: drive cache: write back
Mar 29 08:14:45 dy-xeon-1 kernel: SCSI device sda: 390721968 512-byte hdwr sectors (200050 MB)
Mar 29 08:14:45 dy-xeon-1 kernel: SCSI device sda: drive cache: write back
Mar 29 08:14:45 dy-xeon-1 kernel:  sda: sda1 sda2
Mar 29 08:14:45 dy-xeon-1 kernel: sd 0:0:0:0: Attached scsi disk sda
Mar 29 08:14:45 dy-xeon-1 kernel: SCSI device sdb: 390721968 512-byte hdwr sectors (200050 MB)
Mar 29 08:14:45 dy-xeon-1 kernel: SCSI device sdb: drive cache: write back
Mar 29 08:14:45 dy-xeon-1 kernel: SCSI device sdb: 390721968 512-byte hdwr sectors (200050 MB)
Mar 29 08:14:45 dy-xeon-1 kernel: SCSI device sdb: drive cache: write back
Mar 29 08:14:45 dy-xeon-1 kernel:  sdb: sdb1 sdb2
Mar 29 08:14:45 dy-xeon-1 kernel: sd 1:0:0:0: Attached scsi disk sdb
Mar 29 08:14:45 dy-xeon-1 kernel: sd 0:0:0:0: Attached scsi generic sg0 type 0
Mar 29 08:14:45 dy-xeon-1 kernel: sd 1:0:0:0: Attached scsi generic sg1 type 0

Smart logs:
sda:
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   200   200   051    Pre-fail  Always   -           0
  3 Spin_Up_Time            0x0007   130   124   021    Pre-fail  Always   -           6025
  4 Start_Stop_Count        0x0032   100   100   040    Old_age   Always   -           97
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always   -           0
  7 Seek_Error_Rate         0x000b   200   200   051    Pre-fail  Always   -           0
  9 Power_On_Hours          0x0032   089   089   000    Old_age   Always   -           8047
 10 Spin_Retry_Count        0x0013   100   253   051    Pre-fail  Always   -           0
 11 Calibration_Retry_Count 0x0013   100   253   051    Pre-fail  Always   -           0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always   -           97
194 Temperature_Celsius     0x0022   120   111   000    Old_age   Always   -           30
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always   -           0
197 Current_Pending_Sector  0x0012   200   200   000    Old_age   Always   -           0
198 Offline_Uncorrectable   0x0012   200   200   000    Old_age   Always   -           0
199 UDMA_CRC_Error_Count    0x000a   200   253   000    Old_age   Always   -           0
200 Multi_Zone_Error_Rate   0x0009   200   200   051    Pre-fail  Offline  -           0

SMART Error Log Version: 1
No Errors Logged

sdb:
SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   200   200   051    Pre-fail  Always   -           0
  3 Spin_Up_Time            0x0007   127   120   021    Pre-fail  Always   -           6175
  4 Start_Stop_Count        0x0032   100   100   040    Old_age   Always   -           94
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always   -           0
  7 Seek_Error_Rate         0x000b   200   200   051    Pre-fail  Always   -           0
  9 Power_On_Hours          0x0032   089   089   000    Old_age   Always   -           8065
 10 Spin_Retry_Count        0x0013   100   253   051    Pre-fail  Always   -           0
 11 Calibration_Retry_Count 0x0013   100   253   051    Pre-fail  Always   -           0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always   -           94
194 Temperature_Celsius     0x0022   117   109   000    Old_age   Always   -           33

Re: Help Please! mdadm hangs when using nbd or gnbd

2006-02-23 Thread JaniD++

- Original Message - 
From: Brian Kelly [EMAIL PROTECTED]
To: linux-raid@vger.kernel.org
Sent: Thursday, February 23, 2006 1:25 AM
Subject: Help Please! mdadm hangs when using nbd or gnbd


 Hail to the Great Linux RAID Gurus!  I humbly seek any assistance you
 can offer.

 I am building a couple of 20 TB logical volumes from six storage nodes
 each offering two 8TB raw storage devices built with Broadcom RAIDCore
 BC4852 SATA cards.  Each storage node (called leadstor1-6) needs to
 publish its two raw devices with iSCSI, nbd or gnbd over a gigabit
 network which the head node (leadstor) combines into a RAID 5 volume
 using mdadm.

 My problem is that when using nbd or gnbd the original build of the
 array on the head node quickly halts, as if a deadlock has occurred.  I
 have this problem with RAID 1 and RAID 5 configurations regardless of
 the size of the storage node published devices.  Here's a demonstration
 with two 4 TB drives being mirrored using nbd:

 *** Begin Demonstration ***

 [EMAIL PROTECTED] nbd-2.8.3]# uname -a
 Linux leadstor.unidata.ucar.edu 2.6.15-1.1831_FC4smp #1 SMP Tue Feb 7
 13:51:52 EST 2006 x86_64 x86_64 x86_64 GNU/Linux

   I start by preparing the system for nbd and md devices

 [EMAIL PROTECTED] ~]# modprobe nbd
 [EMAIL PROTECTED] ~]# cd /dev
 [EMAIL PROTECTED] dev]# ./MAKEDEV nb
 [EMAIL PROTECTED] dev]# ./MAKEDEV md

   I then mount two 4TB volumes from leadstor5 and leadstor6

 [EMAIL PROTECTED] dev]# cd /opt/nbd-2.8.3
 [EMAIL PROTECTED] nbd-2.8.3]# ./nbd-client leadstor5 2002 /dev/nb5
 Negotiation: ..size = 3899484160KB
 bs=1024, sz=3899484160
 [EMAIL PROTECTED] nbd-2.8.3]# ./nbd-client leadstor6 2002 /dev/nb6
 Negotiation: ..size = 3899484160KB
 bs=1024, sz=3899484160

   I confirm the volumes are mounted properly

 [EMAIL PROTECTED] nbd-2.8.3]# fdisk -l /dev/nb5

 Disk /dev/nb5: 3993.0 GB, 3993071779840 bytes
 255 heads, 63 sectors/track, 485463 cylinders
 Units = cylinders of 16065 * 512 = 8225280 bytes

 Disk /dev/nb5 doesn't contain a valid partition table
 [EMAIL PROTECTED] nbd-2.8.3]# fdisk -l /dev/nb6

 Disk /dev/nb6: 3993.0 GB, 3993071779840 bytes
 255 heads, 63 sectors/track, 485463 cylinders
 Units = cylinders of 16065 * 512 = 8225280 bytes

 Disk /dev/nb6 doesn't contain a valid partition table

   I prepare the drives to be used in mdadm

 [EMAIL PROTECTED] nbd-2.8.3]# mdadm -V
 mdadm - v1.12.0 - 14 June 2005
 [EMAIL PROTECTED] nbd-2.8.3]# mdadm --zero-superblock /dev/nb5
 [EMAIL PROTECTED] nbd-2.8.3]# mdadm --zero-superblock /dev/nb6

   I create a device to mirror the two volumes

 [EMAIL PROTECTED] nbd-2.8.3]# mdadm --create /dev/md2 -l 1 -n 2 /dev/nb5
 /dev/nb6
 mdadm: array /dev/md2 started.

   And watch the progress in /proc/mdstat

 [EMAIL PROTECTED] nbd-2.8.3]# date
 Wed Feb 22 16:18:55 MST 2006
 [EMAIL PROTECTED] nbd-2.8.3]# cat /proc/mdstat
 Personalities : [raid1]
 md2 : active raid1 nbd6[1] nbd5[0]
   3899484096 blocks [2/2] [UU]
   []  resync =  0.0% (1408/3899484096)
 finish=389948.2min speed=156K/sec

 md1 : active raid1 sdb3[1] sda3[0]
   78188288 blocks [2/2] [UU]

 md0 : active raid1 sdb1[1] sda1[0]
   128384 blocks [2/2] [UU]

 unused devices: none

   But no more has been done a minute later

 [EMAIL PROTECTED] nbd-2.8.3]# date
 Wed Feb 22 16:19:49 MST 2006
 [EMAIL PROTECTED] nbd-2.8.3]# cat /proc/mdstat
 Personalities : [raid1]
 md2 : active raid1 nbd6[1] nbd5[0]
   3899484096 blocks [2/2] [UU]
   []  resync =  0.0% (1408/3899484096)
 finish=2599655.1min speed=23K/sec

 md1 : active raid1 sdb3[1] sda3[0]
   78188288 blocks [2/2] [UU]

 md0 : active raid1 sdb1[1] sda1[0]
   128384 blocks [2/2] [UU]

 unused devices: none

   And later still, no more of the resync has been done

 [EMAIL PROTECTED] nbd-2.8.3]# date
 Wed Feb 22 16:20:38 MST 2006
 [EMAIL PROTECTED] nbd-2.8.3]# cat /proc/mdstat
 Personalities : [raid1]
 md2 : active raid1 nbd6[1] nbd5[0]
   3899484096 blocks [2/2] [UU]
   []  resync =  0.0% (1408/3899484096)
 finish=4679379.2min speed=13K/sec

 md1 : active raid1 sdb3[1] sda3[0]
   78188288 blocks [2/2] [UU]

 md0 : active raid1 sdb1[1] sda1[0]
   128384 blocks [2/2] [UU]

 unused devices: none

   At this point, the resync is stuck and the system is idle.  I have
 left it overnight, but it progresses no further.  100% of the time this
 test will stop at 1408 on the rebuild.  With other configurations, the
 number will change (for example, it was 1280 for a 6 column RAID 5), but
 always halt at the same spot.

   Nothing is logged in the system files

 [EMAIL PROTECTED] nbd-2.8.3]# tail -15 /var/log/messages
 Feb 22 15:48:35 leadstor kernel: parport: PnPBIOS parport detected.
 Feb 22 15:48:35 leadstor kernel: parport0: PC-style at 0x378, irq 7
[PCSPP]
 Feb 22 15:48:35 leadstor kernel: lp0: using parport0 (interrupt-driven).
 Feb 22 15:48:35 leadstor kernel: lp0: console ready
 Feb 22 15:48:37 leadstor 

Re: raid 4, and bitmap.

2006-02-04 Thread JaniD++
 Ahh, I almost forgot!
 mdadm sometimes reports cannot allocate memory, and on the next try it
 segfaults, when I try -G --bitmap=internal on 2TB arrays!
 And after the segfault, the whole raid stops...

 Cheers,
 Janos

I think I found the bug, and it's me. :-)

Today it happened again, and I see that I had mistyped the word internal as
intarnal.
mdadm accepted that, tried to make the bitmap on NFS again, and crashed the
raid.

I think it is necessary to test these better:

- the bitmap file's filesystem
- the filename itself.
(I mean: do not allow the current directory, and make internal and none
separate options.)

Cheers,
Janos



Re: Raid 4 resize, raid0 limit question

2006-02-04 Thread JaniD++

--cut--

   I plan to resize (grow) one raid4 array.
  
   1. stop the array.
   2. resize the partition on all disks to fit the maximum size.
 
  The approach is currently not supported.  It would need a change to
  mdadm to find the old superblock and relocate it to the new end of the
  partition.
 
  The only currently 'supported' way is to remove devices one at a time,
  resize them, and add them back in as new devices, waiting for the
  resync.

 Good news! :-)
 This takes about 1 week for me... :-(
 I should recreate


 
  NeilBrown

Neil!

What do you think about adding 2 files to proc or sys, as 2 margins for the
raid sync?
The default values would be 0 and the sector count (or the KiB size) of the
array.

The user could set them before the sync starts, or while the sync is running,
and when the sync is done the default values would be restored automatically.
The sync would only move between the two values.

This is easy to write - I think -, not too dangerous, and sometimes (or
often) very practical.

This would often help me, including this time, for the raid4 resize from 2TB
to 3.6TB.
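Something like this is what I mean (the file names and the numbers are made
up, purely to illustrate the idea):

# resync only the region between the two margins, given in sectors
echo 3907029168 > /sys/block/md0/md/sync_min
echo 7032678000 > /sys/block/md0/md/sync_max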


 
 
  
   After this restart(assemble) the array is possiple?
   I mean, how can the kernel find the superblock fits on the half of the
 new
   partitions?
   I need to recreate the array instead of using -G option?
   Can i force raid to resync only the new area?
  
   The raid0 in 2.6.16-rc1 supports 4x 3.6TB soure devices? :-)
 
  ... maybe?
  I think it does, but I cannot promise anything.

 Anyway, I will test it on the weekend, and I don't need to grow the FS on it
 too.

 How can I safely test that it (and NBD beyond 2TB) works well, without data loss?

Does anybody know a good tool to test the 13.4TB raid0 array, which has a live
and valuable 8TB fs inside, without data loss?

I need to test the raid0 and NBD before resizing the FS to fit the array.

I can only think of dd with the skip=NN option, but at this point I don't trust
dd enough. :-)
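A sketch of the kind of dd test I mean (the offset is only an example; the
write lands beyond the end of the existing 8TB filesystem, so the numbers must
be triple-checked against the real fs size before running anything like this):

dd if=/dev/urandom of=/tmp/pattern bs=1M count=16
dd if=/tmp/pattern of=/dev/md31 bs=1M seek=$((9*1024*1024))   # write 16MB at the 9 TiB mark
dd if=/dev/md31 of=/tmp/readback bs=1M skip=$((9*1024*1024)) count=16
cmp /tmp/pattern /tmp/readback                                # must match if raid0+NBD work up there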

Thanks,
Janos



 Thanks,
 Janos

 
  NeilBrown
 
 
  
   Thanks,
   Janos
  




Re: raid 4, and bitmap.

2006-02-02 Thread JaniD++

- Original Message - 
From: Neil Brown [EMAIL PROTECTED]
To: JaniD++ [EMAIL PROTECTED]
Cc: linux-raid@vger.kernel.org
Sent: Friday, February 03, 2006 1:09 AM
Subject: Re: raid 4, and bitmap.


 On Friday February 3, [EMAIL PROTECTED] wrote:
  Hello, list, Neil,
 
  I try to add bitmaps to raid4, and mdadm is done this fine.
  In the /proc/mdstat shows this, and it is really works well.
 
  But on reboot, the kernel drops the bitmap(, and resync the entire array
if
  it is unclean). :(
 
  It is still uncomplete now? (2.6.16-rc1)


 It is an 'internal' bitmap, or is the bitmap in a file?

It is internal.
An external bitmap does not work for me, because at boot time only NFS
is reachable, and that causes a crash.

(note: this is raid4, not raid5!)


 If the bitmap is in a file, you need to be sure that the file is
 provided by mdadm when the array is assembled - using in-kernel
 autodetect won't work.

 If it is an internal bitmap it should work.
 Are there any kernel messages during boot that might be interesting?

There is something; I will find it in one minute! :-)


 NeilBrown



Re: Raid 4 resize, raid0 limit question

2006-02-02 Thread JaniD++

- Original Message - 
From: Neil Brown [EMAIL PROTECTED]
To: JaniD++ [EMAIL PROTECTED]
Cc: linux-raid@vger.kernel.org
Sent: Friday, February 03, 2006 1:12 AM
Subject: Re: Raid 4 resize, raid0 limit question


 On Friday February 3, [EMAIL PROTECTED] wrote:
  Hello, list,
 
  I plan to resize (grow) one raid4 array.
 
  1. stop the array.
  2. resize the partition on all disks to fit the maximum size.

 The approach is currently not supported.  It would need a change to
 mdadm to find the old superblock and relocate it to the new end of the
 partition.

 The only currently 'supported' way is to remove devices one at a time,
 resize them, and add them back in as new devices, waiting for the
 resync.

Good news! :-)
This takes about 1 week for me... :-(
I should recreate



 NeilBrown


 
  After this restart(assemble) the array is possiple?
  I mean, how can the kernel find the superblock fits on the half of the
new
  partitions?
  I need to recreate the array instead of using -G option?
  Can i force raid to resync only the new area?
 
  The raid0 in 2.6.16-rc1 supports 4x 3.6TB soure devices? :-)

 ... maybe?
 I think it does, but I cannot promise anything.

Anyway, I will test it on the weekend, and I don't need to grow the FS on it
too.

How can I safely test that it (and NBD beyond 2TB) works well, without data loss?

Thanks,
Janos


 NeilBrown


 
  Thanks,
  Janos
 



Fw: raid 4, and bitmap.

2006-02-02 Thread JaniD++

- Original Message - 
From: JaniD++ [EMAIL PROTECTED]
To: Neil Brown [EMAIL PROTECTED]
Sent: Friday, February 03, 2006 1:20 AM
Subject: Re: raid 4, and bitmap.



 - Original Message - 
 From: Neil Brown [EMAIL PROTECTED]
 To: JaniD++ [EMAIL PROTECTED]
 Cc: linux-raid@vger.kernel.org
 Sent: Friday, February 03, 2006 1:09 AM
 Subject: Re: raid 4, and bitmap.


  On Friday February 3, [EMAIL PROTECTED] wrote:
   Hello, list, Neil,
  
   I try to add bitmaps to raid4, and mdadm is done this fine.
   In the /proc/mdstat shows this, and it is really works well.
  
   But on reboot, the kernel drops the bitmap(, and resync the entire
array
 if
   it is unclean). :(
  
   It is still uncomplete now? (2.6.16-rc1)
 
 
  It is an 'internal' bitmap, or is the bitmap in a file?
 
  If the bitmap is in a file, you need to me sure that the file is
  provided by mdadm when the array is assembled - using in-kernel
  autodetect won't work.
 
  If it is an internal bitmap it should work.
  Are there any kernel messages during boot that might be interesting?

 Sorry, I did not log this, and I don't want to restart the sync and the system
 for this.
 Anyway, I will add the bitmap back, and at the next crash we will see...

 I think the message was something like:
 bitmap is only support in raid1.
 bitmap is removed.

 But I am not so sure. :(

 Cheers,
 Janos

 
  NeilBrown




Re: where is the spare drive? :-)

2006-01-12 Thread JaniD++
- Original Message - 
From: Neil Brown [EMAIL PROTECTED]
To: JaniD++ [EMAIL PROTECTED]
Cc: linux-raid@vger.kernel.org
Sent: Thursday, January 12, 2006 4:07 AM
Subject: Re: where is the spare drive? :-)


 On Monday January 2, [EMAIL PROTECTED] wrote:
 
  5. The question
 
  Why shows sdh2 as spare?
  The MD array size is correct.
  And i really can see, the all drive is reading, and sdh2 is *ONLY*
writing.
 

  man mdadm

 Towards the end of the CREATE MODE section:

When creating a RAID5 array, mdadm will automatically create a degraded
array with an extra spare drive.  This is because building the spare
into a degraded array is in general faster than resyncing the parity on
a non-degraded, but not clean, array.  This feature can be over-ridden
with the --force option.


 I hope this clarifies the situation.

 NeilBrown

Ahh, this escaped my attention.
The mdadm man page (and functionality) is quite large.

I think this is important, because it can let some people overwrite their own
data.
I think it is necessary to place a note in the man page to warn people
about this exception.
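For reference, overriding that behaviour looks roughly like this (a sketch;
the device list is only an example):

# force a normal, non-degraded create: all members active, full parity resync
mdadm --create /dev/md0 --level=5 --raid-devices=12 --force /dev/sd[a-l]2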

Anyway, this is a good idea! :-)

Thanks for pointing this out to me.

Cheers,
Janos





built in readahead? - chunk size question

2006-01-12 Thread JaniD++
Hello, list,

I have found one interesting issue.
I use 4 disk nodes with NBD, and the concentrator distributes the load equally,
thanks to the 32KB-chunksize RAID0 inside it.

At the moment I am working on a system upgrade, and I found one interesting
issue, and possibly one bottleneck in the system.

The concentrator shows this with  iostat -d -k -x 10:
(I have marked the interesting parts with [ ])

Device:  rrqm/s wrqm/s    r/s  w/s   rsec/s wsec/s     rkB/s  wkB/s avgrq-sz avgqu-sz    await svctm  %util
nbd0      54.15   0.00  45.85 0.00  6169.83   0.00   3084.92   0.00   134.55     1.43    31.11  7.04  32.27   --node-1
nbd1      58.24   0.00  44.06 0.00  6205.79   0.00 [3102.90]   0.00   140.86   516.74 11490.79 22.70 100.00   --node-2
nbd2      55.84   0.00  44.76 0.00  6159.44   0.00   3079.72   0.00   137.62     1.51    33.73  6.88  30.77
nbd3      55.34   0.00  45.05 0.00  6169.03   0.00   3084.52   0.00   136.92     1.07    23.79  5.72  25.77
md31       0.00   0.00 401.70 0.10 24607.39   1.00  12303.70   0.50    61.25     0.00     0.00  0.00   0.00

The old node-1 shows this:
Device:  rrqm/s wrqm/s     r/s  w/s   rsec/s wsec/s     rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
hda      140.26   0.80    9.19 3.50  1195.60  34.37    597.80 17.18    96.94     0.20 15.43 11.81 14.99
hdc      133.37   0.00    8.89 3.30  1138.06  26.37    569.03 13.19    95.54     0.17 13.85 11.15 13.59
hde      142.76   1.40   13.99 3.90  1253.95  42.36    626.97 21.18    72.49     0.29 16.31 10.00 17.88
hdi      136.56   0.20   13.19 3.10  1197.20  26.37    598.60 13.19    75.14     0.33 20.12 12.82 20.88
hdk      134.07   0.30   13.89 3.40  1183.62  29.57    591.81 14.79    70.20     0.28 16.30 10.87 18.78
hdm      137.46   0.20   13.39 3.80  1205.99  31.97    603.00 15.98    72.05     0.38 21.98 12.67 21.78
hdo      125.07   0.10   11.69 3.20  1093.31  26.37    546.65 13.19    75.22     0.32 21.54 14.23 21.18
hdq      131.37   1.20   12.49 3.70  1150.85  39.16    575.42 19.58    73.53     0.30 18.77 12.04 19.48
hds      130.97   1.40   13.59 4.10  1155.64  43.96    577.82 21.98    67.84     0.57 32.37 14.80 26.17
sda      148.55   1.30   10.09 3.70  1269.13  39.96    634.57 19.98    94.96     0.30 21.81 14.86 20.48
sdb      131.07   0.10    9.69 3.30  1125.27  27.17    562.64 13.59    88.74     0.18 13.92 11.31 14.69
md0        0.00   0.00 1611.49 5.29 12891.91  42.36 [6445.95] 21.18     8.00     0.00  0.00  0.00  0.00

The new node #2 shows this:
Device:  rrqm/s wrqm/s      r/s  w/s    rsec/s wsec/s      rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
hda     1377.02   0.00    15.88 0.20  11143.26   1.60    5571.63  0.80   692.92     0.39 24.47 18.76 30.17
hdb     1406.79   0.00     8.59 0.20  11323.08   1.60    5661.54  0.80  1288.18     0.28 32.16 31.48 27.67
hde     1430.77   0.00     8.19 0.20  11511.69   1.60    5755.84  0.80  1372.00     0.27 32.74 29.17 24.48
hdf     1384.42   0.00     6.99 0.20  11130.47   1.60    5565.23  0.80  1547.67     0.40 56.94 54.86 39.46
sda     1489.11   0.00    15.08 0.20  12033.57   1.60    6016.78  0.80   787.40     0.36 23.33 14.38 21.98
sdb     1392.11   0.00    14.39 0.20  11251.95   1.60    5625.97  0.80   771.56     0.39 26.78 16.16 23.58
sdc     1468.33   3.00    14.29 0.40  11860.94  27.17    5930.47 13.59   809.52     0.37 25.24 14.97 21.98
sdd     1498.30   1.50    14.99 0.30  12106.29  14.39    6053.15  7.19   792.99     0.40 26.21 15.82 24.18
sde     1446.55   0.00    13.79 0.20  11683.52   1.60    5841.76  0.80   835.49     0.37 26.36 16.14 22.58
sdf     1510.59   0.00    13.19 0.20  12191.01   1.60    6095.50  0.80   910.81     0.39 28.96 17.39 23.28
sdg     1421.18   0.00    14.69 0.20  11486.91   1.60    5743.46  0.80   771.81     0.35 23.83 15.23 22.68
sdh        4.50   4.50     0.30 0.50     38.36  39.96      19.18 19.98    98.00     0.00  1.25  1.25  0.10
md1        0.00   0.00 15960.54 4.80 127684.32  38.36 [63842.16] 19.18     8.00     0.00  0.00  0.00  0.00

Node-1 (and 3, 4) each have one raid5 with a 32K chunk size.
The new node-2 currently has raid4 with a 1024K chunk size.

NBD serves only 1KB blocks. (Ethernet network)

Currently, for a clean test, the readahead on all nodes is set to 0 on all
devices, including md[0-1]!
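To double-check that, a quick loop like this can be used (a sketch; the device
list is shortened):

for d in /dev/md0 /dev/md1 /dev/nb0 /dev/hda /dev/sda; do
    echo -n "$d: "; blockdev --getra $d   # readahead in 512-byte sectors, 0 = none
done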

The question is this:
How can 3.1MB/s of requests on the concentrator generate 6.4MB/s of reads on
node-1 and 63.8MB/s on node-2, with all readahead set to 0?

Does raid4/5 have hardcoded readahead?
Or, if the nbd-server fetches one KB, does the raid (or another part of the OS)
read the entire chunk?

Thanks,
Janos



Re: raid5 read performance

2006-01-10 Thread JaniD++

- Original Message - 
From: Raz Ben-Jehuda(caro) [EMAIL PROTECTED]
To: JaniD++ [EMAIL PROTECTED]
Cc: Linux RAID Mailing List linux-raid@vger.kernel.org
Sent: Tuesday, January 10, 2006 12:25 AM
Subject: Re: raid5 read performance


 1. it is not good to use so many disks in one raid. this means that in
 degraded mode
 10 disks would be needed to reconstruct one slice of data.
 2. i did not understand what is raid purpose.

Yes, I know that.
In my system, this was the best choice.

I have 4 disk nodes, with 4x12 Maxtor 200GB drives (exactly 10xIDE+2xSATA each).
The disk nodes serve nbd.
The concentrator joins the nodes with sw-raid0.

The system is, generally, free web storage.

 3. 10 MB/s is very slow. what sort of disks do u have ?

4x(2xSATA+10xIDE) Maxtor 200GB

The system sometimes has 500-800-1000 downloaders at the same time.
Under this load, the per-node traffic is only 10MB/s. (~100Mbit/s)

At first I thought it was a sync/async IO problem.
Now I think the bottleneck on the nodes is the PCI-32 bus with 8 HDDs.
:(

 4. what is the raid stripe size ?

Currently all raid layers have 32KB chunks.

Cheers,
Janos


 On 1/4/06, JaniD++ [EMAIL PROTECTED] wrote:
 
  - Original Message -
  From: Raz Ben-Jehuda(caro) [EMAIL PROTECTED]
  To: JaniD++ [EMAIL PROTECTED]
  Cc: Linux RAID Mailing List linux-raid@vger.kernel.org
  Sent: Wednesday, January 04, 2006 2:49 PM
  Subject: Re: raid5 read performance
 
 
   1. do you want the code ?
 
  Yes.
  If it is difficult to set.
  I use 4 big raid5 array (4 disk node), and the performace is not too
good.
  My standalone disk can do ~50MB/s, but 11 disk in one raid array does
only
  ~150Mbit/s.
  (With linear read using dd)
  At this time i think this is my systems pci-bus bottleneck.
  But on normal use, and random seeks, i am happy, if one disk-node can do
  10MB/s ! :-(
 
  Thats why i am guessing this...
 
   2. I managed to gain linear perfromance with raid5.
   it seems that both raid 5 and raid 0 are caching read a head
buffers.
   raid 5 cached small amount of read a head while raid0 did not.
 
  Aham.
  But...
  I dont understand...
  You wrote that, the RAID5 is slower than RAID0.
  The read a head buffering/caching is bad for performance?
 
  Cheers,
  Janos
 
 
  
  
   On 1/4/06, JaniD++ [EMAIL PROTECTED] wrote:
   
- Original Message -
From: Raz Ben-Jehuda(caro) [EMAIL PROTECTED]
To: Mark Hahn [EMAIL PROTECTED]
Cc: Linux RAID Mailing List linux-raid@vger.kernel.org
Sent: Wednesday, January 04, 2006 9:14 AM
Subject: Re: raid5 read performance
   
   
 I guess i was not clear enough.

 i am using raid5 over 3 maxtor disks. the chunk size is 1MB.
 i mesured the io coming from one disk alone when I READ
 from it with 1MB buffers , and i know that it is ~32MB/s.

 I created raid0 over two disks and my throughput grown to
 64 MB/s.

 Doing the same thing with raid5 ended in 32 MB/s.

 I am using async io since i do not want to wait for several disks
 when i send an IO. By sending a buffer which is striped aligned
 i am supposed to have one to one relation between a disk and an
 io.

 iostat show that all of the three disks work but not fully.
   
Hello,
   
How do you set sync/async io?
Please, let me know! :-)
   
Thanks,
Janos
   
   
   
  
  
   --
   Raz
 
 


 --
 Raz



Re: raid5 read performance

2006-01-10 Thread JaniD++
- Original Message - 
From: Raz Ben-Jehuda(caro) [EMAIL PROTECTED]
To: JaniD++ [EMAIL PROTECTED]
Cc: linux-raid@vger.kernel.org
Sent: Tuesday, January 10, 2006 9:05 PM
Subject: Re: raid5 read performance


 NBD for network block device ?

Yes. :-)

 why do u use it ?

I need only one big block device.
In the beginning I tried almost every tool for transporting the block devices
to the concentrator, and the best choice (for speed and stability) looked like
RedHat's GNBD.
But GNBD has the same problem as NBD, the old deadlock problem on
heavy writes.
The only difference is that GNBD hits it more rarely than NBD.
A couple of months ago, Herbert Xu fixed the NBD deadlock problem (with
my help :-), and now the fixed NBD is the best choice!

Do you have a better idea? :-)
Please let me know!

 what type of elevator do you use ?

Elevator?
What do you mean exactly?
My system's current performance is thanks to good readahead settings on the
block devices. (in all layers, including nbd)

Cheers,
Janos



 On 1/10/06, JaniD++ [EMAIL PROTECTED] wrote:
 
  - Original Message -
  From: Raz Ben-Jehuda(caro) [EMAIL PROTECTED]
  To: JaniD++ [EMAIL PROTECTED]
  Cc: Linux RAID Mailing List linux-raid@vger.kernel.org
  Sent: Tuesday, January 10, 2006 12:25 AM
  Subject: Re: raid5 read performance
 
 
   1. it is not good to use so many disks in one raid. this means that in
   degraded mode
   10 disks would be needed to reconstruct one slice of data.
   2. i did not understand what is raid purpose.
 
  Yes, i know that.
  In my system, this was the best choise.
 
  I have 4 disk node inside 4x12 Maxtor 200GB (exactly 10xIDE+2xSATA).
  The disk nodes sevres nbd.
  The concentrator joins the nodes with sw-raid0
 
  The system is a generally free web storage.
 
   3. 10 MB/s is very slow. what sort of disks do u have ?
 
  4x(2xSATA+10xIDE) Maxtor 200GB
 
  The system sometimes have 500-800-1000 downloaders at same time.
  In this load, the per node traffic is only 10MB/s. (~100Mbit/s)
 
  First i think the sync/async IO problem.
  At this time i think the bottleneck on the nodes is the PCI-32 with 8
HDD.
  :(
 
   4. what is the raid stripe size ?
 
  Currently all raid layers have 32KB chunks.
 
  Cheers,
  Janos
 
  
   On 1/4/06, JaniD++ [EMAIL PROTECTED] wrote:
   
- Original Message -
From: Raz Ben-Jehuda(caro) [EMAIL PROTECTED]
To: JaniD++ [EMAIL PROTECTED]
Cc: Linux RAID Mailing List linux-raid@vger.kernel.org
Sent: Wednesday, January 04, 2006 2:49 PM
Subject: Re: raid5 read performance
   
   
 1. do you want the code ?
   
Yes.
If it is difficult to set.
I use 4 big raid5 array (4 disk node), and the performace is not too
  good.
My standalone disk can do ~50MB/s, but 11 disk in one raid array
does
  only
~150Mbit/s.
(With linear read using dd)
At this time i think this is my systems pci-bus bottleneck.
But on normal use, and random seeks, i am happy, if one disk-node
can do
10MB/s ! :-(
   
Thats why i am guessing this...
   
 2. I managed to gain linear perfromance with raid5.
 it seems that both raid 5 and raid 0 are caching read a head
  buffers.
 raid 5 cached small amount of read a head while raid0 did not.
   
Aham.
But...
I dont understand...
You wrote that, the RAID5 is slower than RAID0.
The read a head buffering/caching is bad for performance?
   
Cheers,
Janos
   
   


 On 1/4/06, JaniD++ [EMAIL PROTECTED] wrote:
 
  - Original Message -
  From: Raz Ben-Jehuda(caro) [EMAIL PROTECTED]
  To: Mark Hahn [EMAIL PROTECTED]
  Cc: Linux RAID Mailing List linux-raid@vger.kernel.org
  Sent: Wednesday, January 04, 2006 9:14 AM
  Subject: Re: raid5 read performance
 
 
   I guess i was not clear enough.
  
   i am using raid5 over 3 maxtor disks. the chunk size is 1MB.
   i mesured the io coming from one disk alone when I READ
   from it with 1MB buffers , and i know that it is ~32MB/s.
  
   I created raid0 over two disks and my throughput grown to
   64 MB/s.
  
   Doing the same thing with raid5 ended in 32 MB/s.
  
   I am using async io since i do not want to wait for several
disks
   when i send an IO. By sending a buffer which is striped
aligned
   i am supposed to have one to one relation between a disk and
an
   io.
  
   iostat show that all of the three disks work but not fully.
 
  Hello,
 
  How do you set sync/async io?
  Please, let me know! :-)
 
  Thanks,
  Janos
 
 
 


 --
 Raz
   
   
  
  
   --
   Raz
 
 


 --
 Raz


Re: where is the spare drive? :-)

2006-01-05 Thread JaniD++

- Original Message - 
From: Marc [EMAIL PROTECTED]
To: JaniD++ [EMAIL PROTECTED]; linux-raid@vger.kernel.org
Sent: Thursday, January 05, 2006 7:16 AM
Subject: Re: where is the spare drive? :-)


 On Mon, 2 Jan 2006 00:26:58 +0100, JaniD++ wrote
  Hello, list,
 
  I found something interesting when i try to create a brand new array
  on brand new drives
 

 snip

  5. The question
 
  Why shows sdh2 as spare?
  The MD array size is correct.
  And i really can see, the all drive is reading, and sdh2 is *ONLY*
writing.
 

 I'm not 100% sure, but from a post by Neil a while ago on the list, the spare
 device is a temporary construct created during the resync operation. Once the
 resync is complete it should disappear.

 You could try searching the list archives for the post - choice of
keywords is
 up to you ;)

Thanks, but I have found the bug already. ;-)

If I create a new raid5, it should only do a parity resync, not a spare
rebuild!

This happens only if I use mdadm.
With raidtools it works fine.

My problem now is the bitmap. :(
Only mdadm supports this...

Cheers,
Janos


 Regards,
 Marc




raid-reconf question

2005-12-29 Thread JaniD++
Hello, list,

I am trying to test the raidreconf utility on the spare drives in my disk nodes.
(I want to convert the raid0 chunk size from 32K to 1M.)

Why is this happening?

[EMAIL PROTECTED] raid-converter]# cat /proc/mdstat
Personalities : [linear] [raid0] [raid1] [raid5] [multipath] [faulty]
md20 : active raid0 nbd7[3] nbd6[2] nbd5[1] nbd4[0]
  39101696 blocks 32k chunks

unused devices: none
[EMAIL PROTECTED] raid-converter]# mdadm -D /dev/md20
/dev/md20:
Version : 00.90.03
  Creation Time : Thu Dec 29 02:02:45 2005
 Raid Level : raid0
 Array Size : 39101696 (37.29 GiB 40.04 GB)
   Raid Devices : 4
  Total Devices : 4
Preferred Minor : 20
Persistence : Superblock is persistent

Update Time : Thu Dec 29 02:02:45 2005
  State : clean
 Active Devices : 4
Working Devices : 4
 Failed Devices : 0
  Spare Devices : 0

 Chunk Size : 32K

   UUID : 6865dd35:58a41e0c:b0c1a78a:2aa6d02d
 Events : 0.1

    Number   Major   Minor   RaidDevice State
       0      43       4         0      active sync   /dev/nb4
       1      43       5         1      active sync   /dev/nb5
       2      43       6         2      active sync   /dev/nb6
       3      43       7         3      active sync   /dev/nb7
[EMAIL PROTECTED] raid-converter]# raidstop --configfile=raidtab.old /dev/md20
[EMAIL PROTECTED] raid-converter]# ./raidreconf -o raidtab.old -n
raidtab.new -m /dev/md20
Working with device /dev/md20
Parsing raidtab.old
Parsing raidtab.new
Your on-disk array MUST be clean first.
reconfiguration failed
[EMAIL PROTECTED] raid-converter]#

Does anybody have an idea?

Thanks,
Janos

File: raidtab.old
raiddev /dev/md20
raid-level  0
nr-raid-disks   4
chunk-size  32
persistent-superblock 1
device  /dev/nb4
raid-disk   0
device  /dev/nb5
raid-disk   1
device  /dev/nb6
raid-disk   2
device  /dev/nb7
raid-disk   3

File: raidtab.new
raiddev /dev/md20
raid-level  0
nr-raid-disks   4
chunk-size  1024
persistent-superblock 1
device  /dev/nb4
raid-disk   0
device  /dev/nb5
raid-disk   1
device  /dev/nb6
raid-disk   2
device  /dev/nb7
raid-disk   3



Re: RAID5 resync question BUGREPORT!

2005-12-22 Thread JaniD++
- Original Message - 
From: Neil Brown [EMAIL PROTECTED]
To: JaniD++ [EMAIL PROTECTED]
Cc: linux-raid@vger.kernel.org
Sent: Thursday, December 22, 2005 5:46 AM
Subject: Re: RAID5 resync question BUGREPORT!


 On Monday December 19, [EMAIL PROTECTED] wrote:
  - Original Message - 
  From: Neil Brown [EMAIL PROTECTED]
  To: JaniD++ [EMAIL PROTECTED]
  Cc: linux-raid@vger.kernel.org
  Sent: Monday, December 19, 2005 1:57 AM
  Subject: Re: RAID5 resync question BUGREPORT!
  
   How big is your array?
 
   Raid Level : raid5
   Array Size : 1953583360 (1863.08 GiB 2000.47 GB)
  Device Size : 195358336 (186.31 GiB 200.05 GB)
 
 
   The default bitmap-chunk-size when the bitmap is in a file is 4K, this
   makes a very large bitmap on a large array.

 Hmmm The bitmap chunks are in the device space rather than the array
 space. So 4K chunks in 186GiB is 48million chunks, so 48million bits.
 8*4096 bits per page, so 1490 pages, which is a lot, and maybe a
 waste, but you should be able to allocate 4.5Meg...

 But there is a table which holds pointers to these pages.
 4 bytes per pointer (8 on a 64bit machine) so 6K or 12K for the table.
 Allocating anything bigger than 4K can be a problem, so that is
 presumably the limit you hit.

 The max the table size should be is 4K, which is 1024 pages (on a
 32bit machine), which is 33 million bits.  So we shouldn't allow more
 than 33million (33554432 actually) chunks.
 On you array, that would be 5.8K, so 8K chunks should be ok, unless
 you have a 64bit machine, then 16K chunks.
 Still that is wasting a lot of space.

My system is currently running on i386, 32-bit.
I can see that the 2TB array usually hits some limits. :-)
My first idea was the physical size of the variables. (e.g. int: 32768,
double: 65535, etc...)
Did you check that? :-)


 
  Yes, and if i can see correctly, it makes overflow.
 
   Try a larger bitmap-chunk size e.g.
  
  mdadm -G --bitmap-chunk=256 --bitmap=/raid.bm /dev/md0
 
  I think it is still uncompleted!
 
  [EMAIL PROTECTED] /]# mdadm -G --bitmap-chunk=256 --bitmap=/raid.bm /dev/md0
  mdadm: Warning - bitmaps created on this kernel are not portable
between different architectured.  Consider upgrading the Linux kernel.
  Segmentation fault

 Oh dear There should have been an 'oops' message in the kernel
 logs.  Can you post it.

Yes, you are right!

If I think correctly, the problem is the live bitmap file on NFS. :-)
(I am a really good tester! :-D)


Dec 19 10:58:37 st-0001 kernel: md0: bitmap file is out of date (0 
82198273) -- forcing full recovery
Dec 19 10:58:37 st-0001 kernel: md0: bitmap file is out of date, doing full
recovery
Dec 19 10:58:37 st-0001 kernel: Unable to handle kernel NULL pointer
dereference at virtual address 0078
Dec 19 10:58:38 st-0001 kernel:  printing eip:
Dec 19 10:58:38 st-0001 kernel: c0213524
Dec 19 10:58:38 st-0001 kernel: *pde = 
Dec 19 10:58:38 st-0001 kernel: Oops:  [#1]
Dec 19 10:58:38 st-0001 kernel: SMP
Dec 19 10:58:38 st-0001 kernel: Modules linked in: netconsole
Dec 19 10:58:38 st-0001 kernel: CPU:0
Dec 19 10:58:38 st-0001 kernel: EIP:0060:[c0213524]Not tainted VLI
Dec 19 10:58:38 st-0001 kernel: EFLAGS: 00010292   (2.6.14.2-NBDFIX)
Dec 19 10:58:38 st-0001 kernel: EIP is at nfs_flush_incompatible+0xf/0x8d
Dec 19 10:58:38 st-0001
Dec 19 10:58:38 st-0001 kernel: eax:    ebx: 0f00   ecx:
   edx: 0282
Dec 19 10:58:38 st-0001 kernel: esi: 0001   edi: c1fcaf40   ebp:
f7dc7500   esp: e2281d7c
Dec 19 10:58:38 st-0001 kernel: ds: 007b   es: 007b   ss: 0068
Dec 19 10:58:38 st-0001 kernel: Process mdadm (pid: 30771,
threadinfo=e228 task=f6f28540)
Dec 19 10:58:38 st-0001 kernel: Stack:  0282 c014fd3f c1fcaf40
0060 0f00 0001 c1fcaf40
Dec 19 10:58:38 st-0001 kernel:f7dc7500 c04607e1  c1fcaf40
 1000 c1fcaf40 0f00
Dec 19 10:58:38 st-0001 kernel:c1fcaf40 ffaa6000  c04619a7
f7dc7500 c1fcaf40 0001 
Dec 19 10:58:38 st-0001 kernel: Call Trace:
Dec 19 10:58:38 st-0001 kernel:  [c014fd3f] page_address+0x8e/0x94
Dec 19 10:58:38 st-0001 kernel:  [c04607e1] write_page+0x5b/0x15d
Dec 19 10:58:38 st-0001 kernel:  [c04619a7]
bitmap_init_from_disk+0x3eb/0x4df
Dec 19 10:58:38 st-0001 kernel:  [c0462b79] bitmap_create+0x1dc/0x2d3
Dec 19 10:58:38 st-0001 kernel:  [c045d579] set_bitmap_file+0x68/0x19f
Dec 19 10:58:38 st-0001 kernel:  [c045e0f6] md_ioctl+0x456/0x678
Dec 19 10:58:38 st-0001 kernel:  [c04f7640]
rpcauth_lookup_credcache+0xe3/0x1cb
Dec 19 10:58:38 st-0001 kernel:  [c04f7781] rpcauth_lookupcred+0x59/0x95
Dec 19 10:58:38 st-0001 kernel:  [c020c240]
nfs_file_set_open_context+0x29/0x4b
Dec 19 10:58:38 st-0001 kernel:  [c03656e8] blkdev_driver_ioctl+0x6b/0x80
Dec 19 10:58:38 st-0001 kernel:  [c0365824] blkdev_ioctl+0x127/0x19e
Dec 19 10:58:38 st-0001 kernel:  [c016a2fb] block_ioctl+0x2b/0x2f
Dec 19 10:58:38 st-0001 kernel:  [c01745ed] do_ioctl+0x2d/0x81
Dec 19 10

Re: RAID0 performance question

2005-12-20 Thread JaniD++

- Original Message - 
From: Neil Brown [EMAIL PROTECTED]
To: JaniD++ [EMAIL PROTECTED]
Cc: Al Boldi [EMAIL PROTECTED]; linux-raid@vger.kernel.org
Sent: Wednesday, December 21, 2005 2:40 AM
Subject: Re: RAID0 performance question


 On Sunday December 18, [EMAIL PROTECTED] wrote:
 
  The raid (md) device why dont have scheduler in sysfs?
  And if it have scheduler, where can i tune it?

 raid0 doesn't do any scheduling.
 All it does is take requests from the filesystem, decide which device
 they should go do (possibly splitting them if needed) and forwarding
 them on to the device.  That is all.

  The raid0 can handle multiple requests at one time?

 Yes.  But raid0 doesn't exactly 'handle' requests.  It 'directs'
 requests for other devices to 'handle'.

 
  For me, the performance bottleneck is cleanly about RAID0 layer used
exactly
  as concentrator to join the 4x2TB to 1x8TB.
  But it is only a software, and i cant beleave it is unfixable, or
  tunable.

 There is really nothing to tune apart from chunksize.

 You can tune the way the filesystem/vm accesses the device by setting
 readahead (readahead on component devices of a raid0 has exactly 0
 effect).

First, I want to apologize for the "Neil is not interested" remark in a
previous mail...
:-(
I have already tried all the available options, including readahead in every
layer (results in earlier mails), and the chunk size.
But with these settings I cannot work around this.
And the result is incomprehensible to me!
The raid0 performance is not equal to one component, nor to the sum of all
components, nor to the slowest component!


 You can tune the underlying devices by choosing a scheduler (for a
 disk drive) or a packet size (for over-the-network devices) or
 whatever.

The NBD devices have a scheduler, and it is already tuned for really top
performance, and for the components it is really great! :-)
(I have planned to switch NBD to 4KB packets, but this is hard, because
my NICs do not support jumbo packets...)
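If the NICs supported it, switching to jumbo frames would only be an MTU
change on both ends (shown only as an illustration, since my cards cannot do
it):

ifconfig eth0 mtu 9000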


 But there is nothing to tune in raid0.


 Also, rather than doing measurements on the block devices (/dev/mdX)
 do measurements on a filesystem created on that device.
 I have often found that the filesystem goes faster than the block
 device.

I use XFS, and the two performances are almost equal; it depends on the kind of
load, but in the most common case they are almost equal.

Thanks,
Janos



 NeilBrown



Re: RAID5 resync question BUGREPORT!

2005-12-19 Thread JaniD++
- Original Message - 
From: Neil Brown [EMAIL PROTECTED]
To: JaniD++ [EMAIL PROTECTED]
Cc: linux-raid@vger.kernel.org
Sent: Monday, December 19, 2005 1:57 AM
Subject: Re: RAID5 resync question BUGREPORT!


 On Thursday November 17, [EMAIL PROTECTED] wrote:
  Hello,
 
  Now i trying the patch
 
  [EMAIL PROTECTED] root]# mdadm -G --bitmap=/raid.bm /dev/md0
  mdadm: Warning - bitmaps created on this kernel are not portable
between different architectured.  Consider upgrading the Linux kernel.
  mdadm: Cannot set bitmap file for /dev/md0: Cannot allocate memory

 How big is your array?

 Raid Level : raid5
 Array Size : 1953583360 (1863.08 GiB 2000.47 GB)
Device Size : 195358336 (186.31 GiB 200.05 GB)


 The default bitmap-chunk-size when the bitmap is in a file is 4K, this
 makes a very large bitmap on a large array.

Yes, and if I see it correctly, it overflows.

 Try a larger bitmap-chunk size e.g.

mdadm -G --bitmap-chunk=256 --bitmap=/raid.bm /dev/md0

I think it is still incomplete!

[EMAIL PROTECTED] /]# mdadm -G --bitmap-chunk=256 --bitmap=/raid.bm /dev/md0
mdadm: Warning - bitmaps created on this kernel are not portable
  between different architectured.  Consider upgrading the Linux kernel.
Segmentation fault
[EMAIL PROTECTED] /]#

And the raid layer stopped.
(The nbd-server stops serving, and cat /proc/mdstat hangs too.
I tried to sync, and then
echo b > /proc/sysrq-trigger
After the reboot, everything was back to normal.)

This generates a 96000-byte /raid.bm.

(Anyway, I think the --bitmap-chunk option should be calculated
automatically.)
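A sketch of the kind of automatic calculation I mean (the per-device size is
taken from the mdadm -D output quoted above, the ~33 million chunk limit from
Neil's explanation in his follow-up; the rest is only illustration):

# pick the smallest power-of-two --bitmap-chunk (KiB) that keeps the
# per-device chunk count under the limit
DEV_KIB=195358336
LIMIT=33554432
CHUNK=4
while [ $((DEV_KIB / CHUNK)) -gt $LIMIT ]; do
    CHUNK=$((CHUNK * 2))
done
echo "--bitmap-chunk=$CHUNK"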


  [EMAIL PROTECTED] root]# mdadm -X /dev/md0

 This usage is only appropriate for arrays with internal bitmaps (I
 should get mdadm to check that..).

Is there a way to check external bitmaps?

 
  And now what? :-)

 Either create an 'internal' bitmap, or choose a --bitmap-chunk size
 that is larger.

Earlier you said the space for the internal bitmap is only 64K.
My first bitmap file was ~4MB, and with the --bitmap-chunk=256 option it is
still 96000 bytes.

I don't think it will fit... :-)

I am afraid of overwriting existing data.

Cheers,
Janos



 Thanks for the report.

 NeilBrown



Re: RAID0 performance question

2005-12-17 Thread JaniD++

- Original Message - 
From: Al Boldi [EMAIL PROTECTED]
To: JaniD++ [EMAIL PROTECTED]
Cc: linux-raid@vger.kernel.org
Sent: Friday, December 02, 2005 8:53 PM
Subject: Re: RAID0 performance question


 JaniD++ wrote:
   But the cat /dev/md31 /dev/null (RAID0, the sum of 4 nodes)
   only makes ~450-490 Mbit/s, and i dont know why
  
   Somebody have an idea? :-)
 
  Try increasing the read-ahead setting on /dev/md31 using
  'blockdev'. network block devices are likely to have latency
  issues and would benefit from large read-ahead.

 Also try larger chunk-size ~4mb.
 
  But i don't know exactly what to try.
  increase or decrease the chunksize?
  In the top layer raid (md31,raid0) or in the middle layer raids (md1-4,
  raid1) or both?
 

 What I found is that raid over nbd is highly max-chunksize dependent, due
to
 nbd running over TCP.  But increasing chunksize does not necessarily mean
 better system utilization.  Much depends on your application request size.

 Tuning performance to maximize cat/dd /dev/md# throughput may only be
 suitable for a synthetic indication of overall performance in system
 comparisons.

Yes, you are right!
I already know that. ;-)

But the bottleneck effect is visible with dd/cat too.  (And I am a little bit
lazy. :-)

Now I am testing the system on my spare drives, with a bigger chunk size
(=4096K on the RAID0 and on all RAID1s), and the slowness is still here. :(
The problem is _exactly_ the same as before.
I think it is unnecessary to try a smaller chunk size, because 32k is already
small for a 2, 5 or 8MB readahead.

The problem is somewhere else... :-/

I have got one (or more) question for the raid list!

Why doesn't the raid (md) device have a scheduler in sysfs?
And if it has a scheduler, where can I tune it?
Can raid0 handle multiple requests at one time?

For me, the performance bottleneck is clearly in the RAID0 layer, used exactly
as a concentrator to join the 4x2TB into 1x8TB.
But it is only software, and I can't believe it is unfixable or untunable.
;-)

Cheers,
Janos


 If your aim is to increase system utilization, then look for a good
benchmark
 specific to your application requirements which would mimic a realistic
 load.

 --
 Al




Re: RAID5 resync question BUGREPORT!

2005-12-08 Thread JaniD++
Hello, Neil,

[EMAIL PROTECTED] mdadm-2.2]# mdadm --grow /dev/md0 --bitmap=internal
mdadm: Warning - bitmaps created on this kernel are not portable
  between different architectured.  Consider upgrading the Linux kernel.

Dec  8 23:59:45 st-0001 kernel: md0: bitmap file is out of date (0 
81015178) -- forcing full recovery
Dec  8 23:59:45 st-0001 kernel: md0: bitmap file is out of date, doing full
recovery
Dec  8 23:59:46 st-0001 kernel: md0: bitmap initialized from disk: read
12/12 pages, set 381560 bits, status: 0
Dec  8 23:59:46 st-0001 kernel: created bitmap (187 pages) for device md0

And the system crashed:
no ping reply, no netconsole error logging, no panic, no reboot.

Thanks,
Janos


- Original Message - 
From: Neil Brown [EMAIL PROTECTED]
To: JaniD++ [EMAIL PROTECTED]
Cc: linux-raid@vger.kernel.org
Sent: Tuesday, December 06, 2005 2:05 AM
Subject: Re: RAID5 resync question


 On Tuesday December 6, [EMAIL PROTECTED] wrote:
 
  - Original Message - 
  From: Neil Brown [EMAIL PROTECTED]
  To: JaniD++ [EMAIL PROTECTED]
  Cc: linux-raid@vger.kernel.org
  Sent: Tuesday, December 06, 2005 1:32 AM
  Subject: Re: RAID5 resync question
 
 
   On Tuesday December 6, [EMAIL PROTECTED] wrote:
Hello, list,
   
   
Is there a way to force the raid to skip this type of resync?
  
   Why would you want to?
   The array is 'unclean', presumably due to a system crash.  The parity
   isn't certain to be correct so your data isn't safe against a device
   failure.  You *want* this resync.
 
  Thanks for the warning.
  Yes, you have right, the system is crashed.
 
  I know, it is some chance to leave some incorrect parity information on
the
  array, but may be corrected by next write.

 Or it may not be corrected by the next write.  The parity-update
 algorithm assumes that the parity is correct.


  On my system is very little dirty data, thanks to vm configuration and
  *very* often flushes.
  The risk is low, but the time what takes the resync is bigger problem.
:-(
 
  If i can, i want to break this resync.
  And same on the fresh NEW raid5 array
 
  (One possible way:
  in this time rebuild the array with --force-skip-resync option or
  something similar...)

 If you have mdadm 2.2. then you can recreate the array with
 '--assume-clean', and all your data should still be intact.  But if
 you get corruption one day, don't complain about it - it's your
 choice.

 
  
   If you are using 2.6.14 to later you can try turning on the
   write-intent bitmap (mdadm --grow /dev/md0 --bitmap=internal).
   That may impact write performance a bit (reports on how much would be
   appreciated) but will make this resync-after-crash much faster.
 
  Hmm.
  What does this exactly?

 Divides the array into approximately 200,000 sections (all a power of
 2 in size) and keeps track (in a bitmap) of which sections might have
 inconsistent parity.  if you crash, it only syncs sections recorded in
 the bitmap.

  Changes the existing array's structure?

 In a forwards/backwards compatible way (makes use of some otherwise
 un-used space).

  Need to resync? :-D

 You really should let your array sync this time.  Once it is synced,
 add the bitmap.  Then next time you have a crash, the cost will be
 much smaller.

  Safe with existing data?

 Yes.

 
  What do you think about full external log?

 Too much overhead without specialised hardware.

  To use some checkpoints in ext file or device to resync an array?
  And the better handling of half-synced array?

 I don't know what these mean.

 NeilBrown



Re: RAID5 resync question BUGREPORT!

2005-12-08 Thread JaniD++
Hi,

After i get this on one of my disk node, imediately send this letter, and go
to the hosting company, to see, is any message on the screen.
But unfortunately nothing what i found.
simple freeze.
no message, no ping, no num lock!

The full message of  the node next reboot is here:
http://download.netcenter.hu/bughunt/20051209/boot.log

Next step: I tried to restart the whole system. (The concentrator hung too,
caused by losing the st-0001 node.)
Part of the concentrator's next reboot message is here:
http://download.netcenter.hu/bughunt/20051209/dy-boot.log

Next step: I stopped everything to avoid more data loss,
and tried to remove the possible bitmap from md0 on node-1 (st-0001).

The messages are here:
http://download.netcenter.hu/bughunt/20051209/mdadm.log

At this point I cannot remove the broken bitmap, only deactivate its use.
But on the next reboot the node will try to use it again. :(

I have tried to change the array to use an external bitmap, but mdadm
failed to create that too.
The external bitmap file is here (6 MB!):
http://download.netcenter.hu/bughunt/20051209/md0.bitmap

The error message is the same as for internal bitmap creation.
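
Roughly the kind of commands involved (the paths are only examples; the exact
commands and their errors are in the mdadm.log above):

mdadm --grow /dev/md0 --bitmap=none                  # drop the broken internal bitmap
mdadm --grow /dev/md0 --bitmap=/var/tmp/md0.bitmap   # try an external bitmap file instead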

I don't know exactly what caused the fs damage, but here is my list of
possible causes (sorted):
1. mdadm (wrong bitmap size)
2. the kernel (wrong resync on startup)
3. half-written data caused by the first crash.

One question:
Is creating the bitmap on a working array safe and race-free?
(I mean a race between the bitmap creation and bitmap updates.)

In the end my data loss was really minimal. :-)

Cheers,
Janos


- Original Message - 
From: Neil Brown [EMAIL PROTECTED]
To: JaniD++ [EMAIL PROTECTED]
Cc: linux-raid@vger.kernel.org
Sent: Friday, December 09, 2005 12:43 AM
Subject: Re: RAID5 resync question BUGREPORT!


 On Friday December 9, [EMAIL PROTECTED] wrote:
  Hello, Neil,
 
  [EMAIL PROTECTED] mdadm-2.2]# mdadm --grow /dev/md0 --bitmap=internal
  mdadm: Warning - bitmaps created on this kernel are not portable
between different architectured.  Consider upgrading the Linux kernel.
 
  Dec  8 23:59:45 st-0001 kernel: md0: bitmap file is out of date (0 < 81015178) -- forcing full recovery
  Dec  8 23:59:45 st-0001 kernel: md0: bitmap file is out of date, doing full recovery
  Dec  8 23:59:46 st-0001 kernel: md0: bitmap initialized from disk: read 12/12 pages, set 381560 bits, status: 0
  Dec  8 23:59:46 st-0001 kernel: created bitmap (187 pages) for device md0
 
 
  And the system crashed.
  No ping reply, no netconsole error logging, no panic, and no reboot.

 Hmmm, that's unfortunate :-(

 Exactly what kernel were you running?

 NeilBrown



Re: RAID5 resync question

2005-12-06 Thread JaniD++

  I know there is some chance that incorrect parity information is left on
  the array, but it may be corrected by the next write.

 Or it may not be corrected by the next write.  The parity-update
 algorithm assumes that the parity is correct.

Hmm.
If it works with a parity-update algorithm instead of a parity-rewrite
algorithm, you are right.
But it works block-based, and if the entire block is written, the parity
turns out correct, or not? :-)
What is the block size?
Is it equal to the chunk size?
Thanks for the warning again!
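
(Just as an example of how one could check the chunk size on the array
itself; the device name is the one used in this thread:)

mdadm --detail /dev/md0 | grep -i chunk    # or look at /proc/mdstat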

  (One possible way:
  recreate the array at that point with a --force-skip-resync option or
  something similar...)

 If you have mdadm 2.2, then you can recreate the array with
 '--assume-clean', and all your data should still be intact.  But if
 you get corruption one day, don't complain about it - it's your
 choice.

Ahh, that's what I want. :-)
(But reading the rest of this letter, it looks unnecessary in this case...)


  What does this do exactly?

 Divides the array into approximately 200,000 sections (all a power of
 2 in size) and keeps track (in a bitmap) of which sections might have
 inconsistent parity.  If you crash, it only syncs sections recorded in
 the bitmap.

  Does it change the existing array's structure?

 In a forwards/backwards compatible way (it makes use of some otherwise
 unused space).

Which unused space?
In the RAID superblock?
At the end of the drives or the end of the array?
Does it leave the RAID structure unchanged except for the superblocks?


  Need to resync? :-D

 You really should let your array sync this time.  Once it is synced,
 add the bitmap.  Then next time you have a crash, the cost will be
 much smaller.

This looks like a really good idea!
With this bitmap, the force-skip-resync option is really unnecessary.


  What about using some checkpoints in an external file or device to resync an array?
  And better handling of a half-synced array?

 I don't know what these mean.

(A little background:
I have written a little stat program, using the /sys/block/#/stat files, to
find performance bottlenecks.
In the stat files I can see whether a device is reading or writing, and the
time needed for each.)
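
(Something in that spirit, just as a sketch of the idea:)

# print every block device's stat counters, prefixed with the file name
grep "" /sys/block/*/stat
# or refresh the same view once a second
watch -n1 'grep "" /sys/block/*/stat'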

One time while my array was actually rebuilding one disk (in parallel with the
normal workload), I saw that the new drive in the array *only* writes.
What I mean by better handling of a half-synced array is this:
if a read request comes to a partially synced array, and the read falls in the
already-synced part, it only needs to read from the *new* device, instead of
reading all the others to calculate the data from parity.

On a working system this could speed up the rebuild process a little and take
some load off the system.
Or am I on the wrong track? :-)

Cheers,
Janos


 NeilBrown



Re: RAID5 resync question

2005-12-06 Thread JaniD++
 
  One time while my array was actually rebuilding one disk (in parallel with
  the normal workload), I saw that the new drive in the array *only* writes.
  What I mean by better handling of a half-synced array is this:
  if a read request comes to a partially synced array, and the read falls in
  the already-synced part, it only needs to read from the *new* device,
  instead of reading all the others to calculate the data from parity.
 
  On a working system this could speed up the rebuild process a little and
  take some load off the system.
  Or am I on the wrong track? :-)

 Yes, it would probably be possible to get it to read from the
 recovering drive once that section had been recovered.  I'll put it on
 my todo list.

If I can add some ideas to the world's greatest RAID software, it is my
pleasure! :-)

But, Neil!
There is still something I cannot understand.

(As a preliminary note: I have never read the raid5 code.
I cannot program in C or C++, I can only read it a little.)

I cannot clearly understand what you said about the parity updating!

If the array is clean, the parity blocks only need to be written (or not?).
Why does the RAID code use read-modify-write?
I think it is unnecessary to read these blocks!
Recalculating the parity block in memory is much faster than
read-modify-write.

Why is the parity space a continuous area? (If it is...)
I think it only needs to be block-based, made of a lot of independent blocks.
This could speed up the resync, make it easy to always use checkpoints, and
more...

And if the parity data is damaged (by a system crash or something) and this is
impossible to detect, the next write to that block will turn the parity
correct again.
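
(A toy illustration of why the read-modify-write update depends on the old
parity being correct; the numbers are made up and this is not the md code:)

# toy bytes only, just the XOR algebra of one stripe with two data blocks
old_data=0x5a; new_data=0x3c; other_data=0x0f
old_parity=$(( old_data ^ other_data ))        # parity before the write
rmw=$(( old_parity ^ old_data ^ new_data ))    # read-modify-write update
rcw=$(( new_data ^ other_data ))               # full recompute from the data
echo $rmw $rcw                                 # both print 51 (0x33)
# but if old_parity were already wrong, rmw would stay wrong; only the
# full recompute from the data blocks fixes it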

Cheers,
Janos




 NeilBrown



Re: RAID0 performance question

2005-11-30 Thread JaniD++
Hello,

 But cat /dev/md31 > /dev/null (RAID0, the sum of 4 nodes) only
 makes ~450-490 Mbit/s, and I don't know why...

 Does somebody have an idea? :-)
   
Try increasing the read-ahead setting on /dev/md31 using 'blockdev'.
Network block devices are likely to have latency issues and would
benefit from a large read-ahead.
  
   Also try a larger chunk size, ~4 MB.
 
  Ahh.
  This is what I can't do. :-(
  I don't know how to back up 8 TB! ;-)

 Maybe you could use your mirror!?

I have one idea! :-)

I can use the spare drives in the disk nodes! :-)

But I don't know exactly what to try.
Increase or decrease the chunk size?
In the top-layer RAID (md31, RAID0), in the middle-layer RAIDs (md1-4, RAID1),
or both?

Can somebody help me find the source of the performance problem?
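
(Meanwhile, roughly how I read the read-ahead suggestion; the values are only
examples, not recommendations:)

blockdev --getra /dev/md31           # current read-ahead, in 512-byte sectors
blockdev --setra 8192 /dev/nb0       # e.g. 4 MiB on each nbd device
blockdev --setra 16384 /dev/md31     # e.g. 8 MiB on the top-level RAID0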

Thanks,
Janos



 --
 Al



Re: RAID0 performance question

2005-11-26 Thread JaniD++
Hello, Raz,

I think this is not a CPU usage problem. :-)
The system is divided into 4 cpusets, and each cpuset uses only one disk node.
(CPU0-nb0, CPU1-nb1, ...)

This top output was taken while running cat /dev/md31 > /dev/null (the RAID0):

Thanks,
Janos

 17:16:01  up 14:19,  4 users,  load average: 7.74, 5.03, 4.20
305 processes: 301 sleeping, 4 running, 0 zombie, 0 stopped
CPU0 states:  33.1% user  47.0% system   0.0% nice   0.0% iowait  18.0% idle
CPU1 states:  21.0% user  52.0% system   0.0% nice   6.0% iowait  19.0% idle
CPU2 states:   2.0% user  74.0% system   0.0% nice   3.0% iowait  18.0% idle
CPU3 states:  10.0% user  57.0% system   0.0% nice   5.0% iowait  26.0% idle
Mem:  4149412k av, 3961084k used,  188328k free,   0k shrd,  557032k buff
                   911068k active, 2881680k inactive
Swap:       0k av,       0k used,       0k free             2779388k cached

  PID USER PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
 2410 root   0 -19  1584  10836 S   48.3  0.0  21:57   3 nbd-client
16191 root  25   0  4832  820   664 R    48.3  0.0   3:04   0 grep
 2408 root   0 -19  1588  11236 S   47.3  0.0  24:05   2 nbd-client
 2406 root   0 -19  1584  10836 S   40.8  0.0  22:56   1 nbd-client
18126 root  18   0  5780 1604   508 D    38.0  0.0   0:12   1 dd
 2404 root   0 -19  1588  11236 S   36.2  0.0  22:56   0 nbd-client
  294 root  15   0 00 0 SW7.4  0.0   3:22   1 kswapd0
 2284 root  16   0 13500 5376  3040 S 7.4  0.1   8:53   2 httpd
18307 root  16   0  6320 2232  1432 S 4.6  0.0   0:00   2 sendmail
16789 root  16   0  5472 1552   952 R 3.7  0.0   0:03   3 top
 2431 root  10  -5 00 0 SW   2.7  0.0   7:32   2 md2_raid1
29076 root  17   0  4776  772   680 S 2.7  0.0   1:09   3 xfs_fsr
 6955 root  15   0  1588  10836 S 2.7  0.0   0:56   2 nbd-client

- Original Message - 
From: Raz Ben-Jehuda(caro) [EMAIL PROTECTED]
To: JaniD++ [EMAIL PROTECTED]
Cc: linux-raid@vger.kernel.org
Sent: Saturday, November 26, 2005 4:56 PM
Subject: Re: RAID0 performance question


 Look at the CPU consumption.

 On 11/26/05, JaniD++ [EMAIL PROTECTED] wrote:
  Hello list,
 
  I have been searching for the bottleneck of my system, and found something
  that I can't clearly understand.
 
  I use NBD with 4 disk nodes. (The raidtab is at the bottom of this mail.)
 
  cat /dev/nb# > /dev/null makes ~350 Mbit/s on each node.
  cat on /dev/nb0 + nb1 + nb2 + nb3 in parallel at the same time makes
  ~780-800 Mbit/s; I think this is my network bottleneck.

  But cat /dev/md31 > /dev/null (RAID0, the sum of the 4 nodes) only makes
  ~450-490 Mbit/s, and I don't know why...
 
  Does somebody have an idea? :-)
 
  (The nb31, 30, 29, 28 devices are only the possible mirrors.)
 
  Thanks
  Janos
 
  raiddev /dev/md1
  raid-level  1
  nr-raid-disks   2
  chunk-size  32
  persistent-superblock 1
  device  /dev/nb0
  raid-disk   0
  device  /dev/nb31
  raid-disk   1
  failed-disk /dev/nb31
 
  raiddev /dev/md2
  raid-level  1
  nr-raid-disks   2
  chunk-size  32
  persistent-superblock 1
  device  /dev/nb1
  raid-disk   0
  device  /dev/nb30
  raid-disk   1
  failed-disk /dev/nb30
 
  raiddev /dev/md3
  raid-level  1
  nr-raid-disks   2
  chunk-size  32
  persistent-superblock 1
  device  /dev/nb2
  raid-disk   0
  device  /dev/nb29
  raid-disk   1
  failed-disk /dev/nb29
 
  raiddev /dev/md4
  raid-level  1
  nr-raid-disks   2
  chunk-size  32
  persistent-superblock 1
  device  /dev/nb3
  raid-disk   0
  device  /dev/nb28
  raid-disk   1
  failed-disk /dev/nb28
 
  raiddev /dev/md31
  raid-level  0
  nr-raid-disks   4
  chunk-size  32
  persistent-superblock 1
  device  /dev/md1
  raid-disk   0
  device  /dev/md2
  raid-disk   1
  device  /dev/md3
  raid-disk   2
  device  /dev/md4
  raid-disk   3
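 
  (For readers without raidtools, a rough mdadm equivalent of the raidtab
  above; only a sketch, with 'missing' standing in for the mirror halves
  marked failed-disk:)
 
  mdadm --create /dev/md1  --level=1 --raid-devices=2 /dev/nb0 missing
  mdadm --create /dev/md2  --level=1 --raid-devices=2 /dev/nb1 missing
  mdadm --create /dev/md3  --level=1 --raid-devices=2 /dev/nb2 missing
  mdadm --create /dev/md4  --level=1 --raid-devices=2 /dev/nb3 missing
  mdadm --create /dev/md31 --level=0 --raid-devices=4 --chunk=32 \
        /dev/md1 /dev/md2 /dev/md3 /dev/md4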
 
 
 


 --
 Raz
