Re: Machine hanging on synchronize cache on shutdown 2.6.22-rc4-git[45678]

2007-06-18 Thread Tejun Heo
Hello,

Mikael Pettersson wrote:
> On Sat, 16 Jun 2007 15:52:33 +0400, Brad Campbell wrote:
>> I've got a box here based on current Debian Stable.
>> It's got 15 Maxtor SATA drives in it on 4 Promise TX4 controllers.
>>
>> Using kernel 2.6.21.x it shuts down, but of course with a huge "clack" as 15 
>> drives all do emergency 
>> head parks simultaneously. I thought I'd upgrade to 2.6.22-rc to get around 
>> this but the machine 
>> just hangs up hard apparently trying to sync cache on a drive.
>>
>> I've run this process manually, so I know it is being performed properly.
>>
>> Prior to shutdown, all nfsd processes are stopped, filesystems unmounted and 
>> md arrays stopped.
>> /proc/mdstat shows
>> [EMAIL PROTECTED]:~# cat /proc/mdstat
>> Personalities : [raid6] [raid5] [raid4]
>> unused devices: 
>> [EMAIL PROTECTED]:~#
>>
>> Here is the final hangup.
>>
>> http://www.fnarfbargle.com/CIMG1029.JPG
> 
> Something sent a command to the disk on ata15 after the PHY had been
> offlined and the interface had been put in SLUMBER state (SStatus 614).
> Consequently the command timed out. Libata tried a soft reset, and then
> a hard reset, after which the machine hung.

Hmm... weird.  Maybe device initiated power management (DIPM) is active?

> I don't think sata_promise is the guilty party here. Looks like some
> layer above sata_promise got confused about the state of the interface.

But locking up hard after hardreset is a problem of sata_promise, no?

Thanks.

-- 
tejun
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [linux-lvm] 2.6.22-rc4 XFS fails after hibernate/resume

2007-06-18 Thread David Greaves

David Greaves wrote:

David Robinson wrote:

David Greaves wrote:

This isn't a regression.

I was seeing these problems on 2.6.21 (but 22 was in -rc so I waited 
to try it).
I tried 2.6.22-rc4 (with Tejun's patches) to see if it had improved - 
no.


Note this is a different (desktop) machine to that involved my recent 
bugs.


The machine will work for days (continually powered up) without a 
problem and then exhibits a filesystem failure within minutes of a 
resume.





OK, that gave me an idea.

Freeze the filesystem
md5sum the lvm
hibernate
resume
md5sum the lvm



So the lvm and below looks OK...

I'll see how it behaves now the filesystem has been frozen/thawed over 
the hibernate...
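(For anyone wanting to repeat the check, the sequence was essentially the
following; the LV path is just a placeholder for whatever sits under /scratch
here:)

xfs_freeze -f /scratch
md5sum /dev/vg0/scratch > /tmp/pre-hibernate.md5
# hibernate + resume ...
md5sum -c /tmp/pre-hibernate.md5
xfs_freeze -u /scratch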



And it appears to behave well. (A few hours compile/clean cycling kernel builds 
on that filesystem were OK).



Historically I've done:
sync
echo platform > /sys/power/disk
echo disk > /sys/power/state
# resume

and had filesystem corruption (only on this machine, my other hibernating xfs 
machines don't have this problem)


So doing:
xfs_freeze -f /scratch
sync
echo platform > /sys/power/disk
echo disk > /sys/power/state
# resume
xfs_freeze -u /scratch

Works (for now - more usage testing tonight)

David
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: XFS Tunables for High Speed Linux SW RAID5 Systems?

2007-06-18 Thread Justin Piszcz

Dave,

Questions inline and below.

On Mon, 18 Jun 2007, David Chinner wrote:


On Fri, Jun 15, 2007 at 04:36:07PM -0400, Justin Piszcz wrote:

Hi,

I was wondering if the XFS folks can recommend any optimizations for high
speed disk arrays using RAID5?


[sysctls snipped]

None of those options will make much difference to performance.
mkfs parameters are the big ticket item here



There are also the vm.dirty_* tunables under /proc/sys/vm.


That changes benchmark times by starting writeback earlier, but
doesn't affect actual writeback speed.


I was wondering what are some things to tune for speed?  I've already
tuned the MD layer but is there anything with XFS I can also tune?

echo "Setting read-ahead to 64MB for /dev/md3"
blockdev --setra 65536 /dev/md3
This proved to give the fastest performance; I have always used 4GB, and more 
recently 8GB, of memory in the machine.


http://www.rhic.bnl.gov/hepix/talks/041019pm/schoen.pdf

See page 13.



Why so large? That's likely to cause readahead thrashing problems
under low memory


echo "Setting stripe_cache_size to 16MB for /dev/md3"
echo 16384 > /sys/block/md3/md/stripe_cache_size

(I also set max_sectors_kb to 128K (the chunk size) and disabled NCQ.)


Why do that? You want XFS to issue large I/Os and the block layer
to split them across all the disks. i.e. you are preventing full
stripe writes from occurring by doing that.

I use a 128k chunk size; what should I use for max_sectors_kb? I read that 
128KB was optimal.
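For reference, a quick way to sanity-check the current values against the chunk
size (just a sketch; the sd[c-l] glob and /dev/md3 are the devices used in this
particular setup):

for d in /sys/block/sd[c-l]; do
  echo "$d: max_hw=$(cat $d/queue/max_hw_sectors_kb)KB current=$(cat $d/queue/max_sectors_kb)KB"
done
mdadm --detail /dev/md3 | grep -i 'chunk size'   # confirm the chunk size before matching max_sectors_kb to it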


Can you please comment on all of the optimizations below?

#!/bin/bash

# source profile
. /etc/profile

echo "Optimizing RAID Arrays..."


# This step must come first.
# See: http://www.3ware.com/KB/article.aspx?id=11050

echo "Setting max_sectors_kb to chunk size of RAID5 arrays..."
for i in sdc sdd sde sdf sdg sdh sdi sdj sdk sdl
do
  echo "Setting /dev/$i to 128K..."
  echo 128 > /sys/block/"$i"/queue/max_sectors_kb
done

echo "Setting read-ahead to 64MB for /dev/md3"
blockdev --setra 65536 /dev/md3

echo "Setting stripe_cache_size to 16MB for /dev/md3"
echo 16384 > /sys/block/md3/md/stripe_cache_size
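# Rough rule of thumb (worth keeping in mind): the stripe cache costs about
# stripe_cache_size x 4KB x number of member disks of RAM, so 16384 entries
# on a 10-disk array is on the order of 640MB.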

# If you use more than the default 64KB chunk with RAID5,
# this feature is broken, so you need to limit it to 30MB/s.
# Neil has a patch; not sure when it will be merged.
echo "Setting minimum and maximum resync speed to 30MB/s..."
echo 30000 > /sys/block/md0/md/sync_speed_min
echo 30000 > /sys/block/md0/md/sync_speed_max
echo 30000 > /sys/block/md1/md/sync_speed_min
echo 30000 > /sys/block/md1/md/sync_speed_max
echo 30000 > /sys/block/md2/md/sync_speed_min
echo 30000 > /sys/block/md2/md/sync_speed_max
echo 30000 > /sys/block/md3/md/sync_speed_min
echo 30000 > /sys/block/md3/md/sync_speed_max

# Disable NCQ.
echo "Disabling NCQ..."
for i in sdc sdd sde sdf sdg sdh sdi sdj sdk sdl
do
  echo "Disabling NCQ on $i"
  echo 1 > /sys/block/"$i"/device/queue_depth
done




-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Machine hanging on synchronize cache on shutdown 2.6.22-rc4-git[45678]

2007-06-18 Thread Mikael Pettersson
On Mon, 18 Jun 2007 16:09:49 +0900, Tejun Heo wrote:
> Mikael Pettersson wrote:
> > On Sat, 16 Jun 2007 15:52:33 +0400, Brad Campbell wrote:
> >> I've got a box here based on current Debian Stable.
> >> It's got 15 Maxtor SATA drives in it on 4 Promise TX4 controllers.
> >>
> >> Using kernel 2.6.21.x it shuts down, but of course with a huge "clack" as 
> >> 15 drives all do emergency 
> >> head parks simultaneously. I thought I'd upgrade to 2.6.22-rc to get 
> >> around this but the machine 
> >> just hangs up hard apparently trying to sync cache on a drive.
> >>
> >> I've run this process manually, so I know it is being performed properly.
> >>
> >> Prior to shutdown, all nfsd processes are stopped, filesystems unmounted 
> >> and md arrays stopped.
> >> /proc/mdstat shows
> >> [EMAIL PROTECTED]:~# cat /proc/mdstat
> >> Personalities : [raid6] [raid5] [raid4]
> >> unused devices: 
> >> [EMAIL PROTECTED]:~#
> >>
> >> Here is the final hangup.
> >>
> >> http://www.fnarfbargle.com/CIMG1029.JPG
> > 
> > Something sent a command to the disk on ata15 after the PHY had been
> > offlined and the interface had been put in SLUMBER state (SStatus 614).
> > Consequently the command timed out. Libata tried a soft reset, and then
> > a hard reset, after which the machine hung.
> 
> Hmm... weird.  Maybe device initiated power management (DIPM) is active?
> 
> > I don't think sata_promise is the guilty party here. Looks like some
> > layer above sata_promise got confused about the state of the interface.
> 
> But locking up hard after hardreset is a problem of sata_promise, no?

Maybe, maybe not. The original report doesn't specify where/how
the machine hung.

Brad: can you enable sysrq and check if the kernel responds to
sysrq when it appears to hang, and if so, where it's executing?

sata_promise just passes sata_std_hardreset to ata_do_eh.
I've certainly seen EH hardresets work before, so I'm assuming
that something in this particular situation (PHY offlined,
kernel close to shutting down) breaks things.

FWIW, I'm seeing scsi layer accesses (cache flushes) after things
like rmmod sata_promise. They error out and don't seem to cause
any harm, but the fact that they occur at all makes me nervous.

/Mikael
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Machine hanging on synchronize cache on shutdown 2.6.22-rc4-git[45678]

2007-06-18 Thread Tejun Heo
Mikael Pettersson wrote:
> FWIW, I'm seeing scsi layer accesses (cache flushes) after things
> like rmmod sata_promise. They error out and don't seem to cause
> any harm, but the fact that they occur at all makes me nervous.

That's okay.  On rmmod, the low-level (ATA) device goes away first, just as
in a hot unplug, so sd gets notified *after* the device is gone.  sd still
tries to clean up and issues commands, but they are properly rejected by the
SCSI midlayer because the device is already marked offline, so there is
nothing to worry about there.

-- 
tejun
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Machine hanging on synchronize cache on shutdown 2.6.22-rc4-git[45678]

2007-06-18 Thread Brad Campbell

Mikael Pettersson wrote:


I don't think sata_promise is the guilty party here. Looks like some
layer above sata_promise got confused about the state of the interface.

But locking up hard after hardreset is a problem of sata_promise, no?


Maybe, maybe not. The original report doesn't specify where/how
the machine hung.


It hangs in the process of trying to power it off. Unmount everything and halt 
the machine.

I've tried halt with and without the -h.

With the -h you can hear the drives spin down, then it tries to spin them up 
again and hangs.

Without the -h it just hangs hard where you see in the photo.


Brad: can you enable sysrq and check if the kernel responds to
sysrq when it appears to hang, and if so, where it's executing?


All my kernels have sysrq enabled. Once the hard reset is displayed on the 
screen everything locks.


sata_promise just passes sata_std_hardreset to ata_do_eh.
I've certainly seen EH hardresets work before, so I'm assuming
that something in this particular situation (PHY offlined,
kernel close to shutting down) breaks things.


That is my thought.  I thought that on a .22-rc kernel, if I used halt -h and it spun the disks 
down, the kernel would detect that and not try to flush their caches.  Or have I read something 
incorrectly?



FWIW, I'm seeing scsi layer accesses (cache flushes) after things
like rmmod sata_promise. They error out and don't seem to cause
any harm, but the fact that they occur at all makes me nervous.


Brad
--
"Human beings, who are almost unique in having the ability
to learn from the experience of others, are also remarkable
for their apparent disinclination to do so." -- Douglas Adams
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: XFS Tunables for High Speed Linux SW RAID5 Systems?

2007-06-18 Thread David Greaves

David Chinner wrote:

On Fri, Jun 15, 2007 at 04:36:07PM -0400, Justin Piszcz wrote:

Hi,

I was wondering if the XFS folks can recommend any optimizations for high 
speed disk arrays using RAID5?


[sysctls snipped]

None of those options will make much difference to performance.
mkfs parameters are the big ticket item here


Is there anywhere you can point to that expands on this?

Is there anything raid specific that would be worth including in the Wiki?

David

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: Software based SATA RAID-5 expandable arrays?

2007-06-18 Thread Daniel Korstad
 
Last I checked, expanding drives (reshaping the RAID) in a RAID set within Windows 
is not supported.
 
Significant size is relative I guess, but 4-8 terabytes will not be a problem 
in either OS.
 
I run a RAID 6 (Windows does not support this either, last I checked).  I 
started out with 5 drives and have reshaped it to ten drives now.  I have a few 
250G drives (the old originals) and many 500G drives (added and replacement drives) 
in the set.  Once all the old 250G drives die off and have been replaced with 500G 
drives, I will grow the RAID to the size of its new smallest disk, 500G.  Grow and 
reshape are slightly different operations; both are supported by Linux mdadm, and I 
have tested both with success.
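For example, adding a disk and reshaping onto it looks roughly like this with mdadm 
(a sketch only; /dev/md0, /dev/sdX1 and the device count are placeholders, and you 
want a current backup before any reshape):

mdadm --add /dev/md0 /dev/sdX1              # the new disk goes in as a spare first
mdadm --grow /dev/md0 --raid-devices=6      # then reshape the array across it
# when the reshape finishes, grow the filesystem, e.g. xfs_growfs /mountpoint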
 
I too use my set for media and it is not in use 90% of the time.
 
I put this line in my /etc/rc.local to put the drives to sleep after a 
specified number of minutes of inactivity:
hdparm -S 241 /dev/sd*
The values for the -S switch are not intuitive; read the man page.  The value I 
use (241) puts them into standby (spindown) after 30 minutes.  My OS is on EIDE and my 
RAID set is all SATA, hence the splat (wildcard) for all SATA drives. 
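For reference, the -S mapping works out like this (per the hdparm(8) man page; the 
device names are just examples):

hdparm -S 241 /dev/sd[b-k]   # 241..251 = 1..11 units of 30 minutes, so 241 = 30 min, 242 = 60 min
hdparm -S 120 /dev/sdb       # 1..240 = multiples of 5 seconds, so 120 = 10 minutes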
 
I have been running this for a year now with my RAID set.  It works great, and I 
have had no problems with mdadm waiting on drives to spin up when I access them.
 
The one caveat: be prepared to wait a few moments if they are all in spindown 
state before you can access your data.  For me, with ten drives, it is always 
less than a minute, usually 30 seconds or so.
 
For a filesystem, I use XFS for my large media files.
 
Dan.




- Inline Message Follows -
To: linux-raid@vger.kernel.org
From: greenjelly
Subject: Software based SATA RAID-5 expandable arrays?


I am researching my options to build a media NAS server.  Sorry for the long
message, but I wanted to provide as much detail as possible about my problem,
for the best solution.  I have bolded sections for people who don't have the
time to read all of this.

Option 1: Expand my current dream machine!
I could buy a RAID-5 hardware card for my current system (Vista Ultimate 64
with an Extreme 6800 and 2GB of 1066MHz RAM).  The Adaptec RAID controller
(model "3805"; you can search NewEgg for the information) will cost me nearly
$500 (and consumes 23W) and supports 8 drives (I have 6).  This controller
contains an 800MHz processor with a large cache of memory.  It will support
expandable RAID-5 arrays!  I would also buy a 750W+ PSU (for the additional
safety and security).  The drives in this machine would be placed in
shock-absorbing (noise-reducing) 4-drives-in-3-bays containers with fans (I
have 2 of these), and I will be removing an IDE-based Pioneer DVD burner
(1 of 3) because of its flaky performance, given the Intel P965 chipset's lack
of native IDE support and thus the motherboard's JMicron SATA-to-IDE bridge.
I've already installed 4 drives in this machine (on the native motherboard
SATA controller) only to have a fan fail on me within days of the
installation.  One of the drives went bad (which may or may not have to do
with the heat).  There is only 5mm between these drives, and I would now
replace both fans with higher-RPM ball-bearing fans for added reliability
(more noise).  I would also need to find freeware SMART monitoring software,
which at this time I cannot find for Vista, to warn me of increased
temperatures due to fan failure, increased environmental heat, etc.  The only
option is commercial SMART monitoring software (which may not work with the
Adaptec RAID adapter).

Option 2: Build a server.
I have a copy of Windows 2003 Server, but I have yet to find out whether it
supports natively expandable software RAID-5 arrays.  I could also use Linux,
which I have very little experience with but have always wanted to use and
learn. 

For either option, I would still need to buy a new power supply for my
current Vista machine (for added reliability).  The current PSU is 550W, and
with a power-hungry Radeon, 3 DVD drives and an X-Fi sound card... my nerves
are getting frayed. 

I would buy a cheap motherboard, processor and 1GB or less of RAM.  Lastly,
I would want a VERY large case.  I have an NVidia 7300 PCI card that was
replaced with an X1950GT in my home theater PC so that I can play back
HD/Blu-ray DVDs.

The server option may cost a bit more than the $500 for the Adaptec RAID
controller.  This will only work if Linux or Windows 2003 supports my
requirements.  My Linux OS would be installed on a 40mb IDE drive (not
part of the array).  

What I need is to be able to start with a 6-drive RAID-5 array and then, as
my demand for space increases in the future, plug in more drives and
incorporate them into the array without needing to back up the data.
Basically I need the software to add the drive(s) to the array, then rebuild
the array to incorporate the new drives while preserving the data on the
original array.

QUESTIONS
Since this is a media server, and would only be used to serve Movies and
Video to my two machines It woul

Re: [linux-lvm] 2.6.22-rc4 XFS fails after hibernate/resume

2007-06-18 Thread David Chinner
On Mon, Jun 18, 2007 at 08:49:34AM +0100, David Greaves wrote:
> David Greaves wrote:
> >OK, that gave me an idea.
> >
> >Freeze the filesystem
> >md5sum the lvm
> >hibernate
> >resume
> >md5sum the lvm
> 
> >So the lvm and below looks OK...
> >
> >I'll see how it behaves now the filesystem has been frozen/thawed over 
> >the hibernate...
> 
> 
> And it appears to behave well. (A few hours compile/clean cycling kernel 
> builds on that filesystem were OK).
> 
> 
> Historically I've done:
> sync
> echo platform > /sys/power/disk
> echo disk > /sys/power/state
> # resume
> 
> and had filesystem corruption (only on this machine, my other hibernating 
> xfs machines don't have this problem)
> 
> So doing:
> xfs_freeze -f /scratch
> sync
> echo platform > /sys/power/disk
> echo disk > /sys/power/state
> # resume
> xfs_freeze -u /scratch
>
> Works (for now - more usage testing tonight)

Verrry interesting.

What you were seeing was an XFS shutdown occurring because the free space
btree was corrupted. IOWs, the process of suspend/resume has resulted
in either bad data being written to disk, the correct data not being
written to disk or the cached block being corrupted in memory.

If you run xfs_check on the filesystem after it has shut down after a resume,
can you tell us if it reports on-disk corruption? Note: do not run xfs_repair
to check this - it does not check the free space btrees; instead it simply
rebuilds them from scratch. If xfs_check reports an error, then run xfs_repair
to fix it up.
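Concretely, that would be something like the following (a sketch; the device
path is a placeholder and the filesystem must be unmounted first):

umount /scratch
xfs_check /dev/mapper/vg0-scratch    # report-only; silent if the filesystem is clean
xfs_repair /dev/mapper/vg0-scratch   # only if xfs_check reported problems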

FWIW, I'm on record stating that "sync" is not sufficient to quiesce an XFS
filesystem for a suspend/resume to work safely and have argued that the only
safe thing to do is freeze the filesystem before suspend and thaw it after
resume. This is why I originally asked you to test that with the other problem
that you reported. Up until this point in time, there's been no evidence to
prove either side of the argument.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


resync to last 27h - usually 3. what's this?

2007-06-18 Thread Dexter Filmore
Booted today, got this in dmesg:

[   44.884915] md: bind
[   44.885150] md: bind
[   44.885352] md: bind
[   44.885552] md: bind
[   44.885601] md: kicking non-fresh sdd1 from array!
[   44.885637] md: unbind
[   44.885671] md: export_rdev(sdd1)
[   44.900824] raid5: device sdc1 operational as raid disk 1
[   44.900860] raid5: device sdb1 operational as raid disk 3
[   44.900895] raid5: device sda1 operational as raid disk 2
[   44.901207] raid5: allocated 4203kB for md0
[   44.901241] raid5: raid level 5 set md0 active with 3 out of 4 devices, 
algorithm 2
[   44.901284] RAID5 conf printout:
[   44.901317]  --- rd:4 wd:3
[   44.901349]  disk 1, o:1, dev:sdc1
[   44.901381]  disk 2, o:1, dev:sda1
[   44.901414]  disk 3, o:1, dev:sdb1

Checked the disk, which seemed fine (not the first time Linux has kicked a disk for no 
apparent reason), and re-added it with mdadm, which triggered a resync.
Having a look at it now, I get:

$ cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdd[4] sdc1[1] sdb1[3] sda1[2]
  732563712 blocks level 5, 32k chunk, algorithm 2 [4/3] [_UUU]
  [=>...]  recovery =  8.1% (19867520/244187904) 
finish=1661.6min speed=2248K/sec

1661 minutes is *way* too long.  It's a 4x250GiB SATA array and usually takes 3 
hours to resync or check, for that matter.

So, what's this? 


-- 
-BEGIN GEEK CODE BLOCK-
Version: 3.12
GCS d--(+)@ s-:+ a- C UL++ P+>++ L+++> E-- W++ N o? K-
w--(---) !O M+ V- PS+ PE Y++ PGP t++(---)@ 5 X+(++) R+(++) tv--(+)@ 
b++(+++) DI+++ D- G++ e* h>++ r* y?
--END GEEK CODE BLOCK--

http://www.stop1984.com
http://www.againsttcpa.com
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: resync to last 27h - usually 3. what's this?

2007-06-18 Thread David Greaves

Dexter Filmore wrote:
1661 minutes is *way* too long. it's a 4x250GiB sATA array and usually takes 3 
hours to resync or check, for that matter.


So, what's this? 

kernel, mdadm versions?

I seem to recall a long-fixed ETA calculation bug some time back...

David
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


suggestions for inexpensive linux jbod?

2007-06-18 Thread Mike
I'm creating a larger backup server that uses bacula (this
software works well). The way I'm going about this, I need
lots of space in the filesystem where temporary files are
stored. I have been looking at the Norco (link at the bottom),
but there seem to be some grumblings that the adapter card
does not play well with Linux.

Has anyone used this device, or does anyone have another suggestion? I'm
looking at something that will present lots of disk to the
Linux box (Fedora Core 5, kernel 2.6.20) that I will put
under md/RAID and LVM. I want to have between 1.5TB and 3.0TB
of usable space after all the RAID'ing.

Mike

http://www.newegg.com/Product/Product.aspx?Item=N82E16816133001

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: resync to last 27h - usually 3. what's this?

2007-06-18 Thread Dexter Filmore
On Monday 18 June 2007 17:22:06 David Greaves wrote:
> Dexter Filmore wrote:
> > 1661 minutes is *way* too long. it's a 4x250GiB sATA array and usually
> > takes 3 hours to resync or check, for that matter.
> >
> > So, what's this?
>
> kernel, mdadm verisons?
>
> I seem to recall a long fixed ETA calculation bug some time back...
>

2.6.21.1, mdadm 2.5.3.
First time I've synced since upgrading from 2.6.17.

Definitely no calc bug; only 4% progress in one hour.


-- 
-BEGIN GEEK CODE BLOCK-
Version: 3.12
GCS d--(+)@ s-:+ a- C UL++ P+>++ L+++> E-- W++ N o? K-
w--(---) !O M+ V- PS+ PE Y++ PGP t++(---)@ 5 X+(++) R+(++) tv--(+)@ 
b++(+++) DI+++ D- G++ e* h>++ r* y?
--END GEEK CODE BLOCK--

http://www.stop1984.com
http://www.againsttcpa.com
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: suggestions for inexpensive linux jbod?

2007-06-18 Thread Justin Piszcz



On Mon, 18 Jun 2007, Mike wrote:


I'm creating a larger backup server that uses bacula (this
software works well). The way I'm going about this I need
lots of space in the filesystem where temporary files are
stored. I have been looking at the Norco (link at the bottom),
but there seem to be some grumblings that the adapter card
does not play well with linux.

Has anyone used this device or have another suggestion? I'm
looking at something that will present lots of disk to the
linux box (fedore core 5, kernel 2.6.20) that I will put
under md/RAID and LVM. I want to have between 1.5TB and 3.0TB
of usable space after all the RAID'ing.

Mike

http://www.newegg.com/Product/Product.aspx?Item=N82E16816133001

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html



Get 3 Hitachi 1TB drives and use SW RAID5 on an Intel 965 motherboard OR 
use PCI-e cards that use the Silicon Image chipset.


04:00.0 RAID bus controller: Silicon Image, Inc. SiI 3132 Serial ATA Raid 
II Controller (rev 01)


Justin.
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: resync to last 27h - usually 3. what's this?

2007-06-18 Thread Justin Piszcz



On Mon, 18 Jun 2007, Dexter Filmore wrote:


On Monday 18 June 2007 17:22:06 David Greaves wrote:

Dexter Filmore wrote:

1661 minutes is *way* too long. it's a 4x250GiB sATA array and usually
takes 3 hours to resync or check, for that matter.

So, what's this?


kernel, mdadm verisons?

I seem to recall a long fixed ETA calculation bug some time back...



2.6.21.1, mdadm 2.5.3.
First time I sync since upgrade from 2.6.17.

Definetly no calc bug, only 4% progress in one hour.


--
-BEGIN GEEK CODE BLOCK-
Version: 3.12
GCS d--(+)@ s-:+ a- C UL++ P+>++ L+++> E-- W++ N o? K-
w--(---) !O M+ V- PS+ PE Y++ PGP t++(---)@ 5 X+(++) R+(++) tv--(+)@
b++(+++) DI+++ D- G++ e* h>++ r* y?
--END GEEK CODE BLOCK--

http://www.stop1984.com
http://www.againsttcpa.com
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html



Hi.

What is your stripe size?

What if you set the resync speed limits higher? By default md will use 1MB/s or so if you do 
not force it to use more than the idle I/O on the box:


echo "Setting minimum and maximum resync speed to 30MB/s..."
echo 30000 > /sys/block/md0/md/sync_speed_min
echo 30000 > /sys/block/md0/md/sync_speed_max
echo 30000 > /sys/block/md1/md/sync_speed_min
echo 30000 > /sys/block/md1/md/sync_speed_max
echo 30000 > /sys/block/md2/md/sync_speed_min
echo 30000 > /sys/block/md2/md/sync_speed_max
echo 30000 > /sys/block/md3/md/sync_speed_min
echo 30000 > /sys/block/md3/md/sync_speed_max
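The same limits can also be set globally for all md arrays via sysctl (a sketch; 
values are in KB/s):

sysctl -w dev.raid.speed_limit_min=30000
sysctl -w dev.raid.speed_limit_max=200000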


-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: limits on raid

2007-06-18 Thread Brendan Conoboy

[EMAIL PROTECTED] wrote:
in my case it takes 2+ days to resync the array before I can do any 
performance testing with it. for some reason it's only doing the rebuild 
at ~5M/sec (even though I've increased the min and max rebuild speeds 
and a dd to the array seems to be ~44M/sec, even during the rebuild)


With performance like that, it sounds like you're saturating a bus 
somewhere along the line.  If you're using scsi, for instance, it's very 
easy for a long chain of drives to overwhelm a channel.  You might also 
want to consider some other RAID layouts like 1+0 or 5+0 depending upon 
your space vs. reliability needs.


--
Brendan Conoboy / Red Hat, Inc. / [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


stop sync?

2007-06-18 Thread Dexter Filmore
How do I stop a running sync? I just realized that I --add-ed an entire disk 
instead of the partition.
What do I do anyway? Remove the disk, set it as faulty?
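(For the record, the sequence I have in mind is roughly the following; device names 
are placeholders:)

mdadm /dev/md0 --fail /dev/sdd --remove /dev/sdd   # mark the whole-disk member faulty and pull it out
mdadm /dev/md0 --add /dev/sdd1                     # then add the intended partition instead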

-- 
-BEGIN GEEK CODE BLOCK-
Version: 3.12
GCS d--(+)@ s-:+ a- C UL++ P+>++ L+++> E-- W++ N o? K-
w--(---) !O M+ V- PS+ PE Y++ PGP t++(---)@ 5 X+(++) R+(++) tv--(+)@ 
b++(+++) DI+++ D- G++ e* h>++ r* y?
--END GEEK CODE BLOCK--

http://www.stop1984.com
http://www.againsttcpa.com
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: limits on raid

2007-06-18 Thread david

On Mon, 18 Jun 2007, Brendan Conoboy wrote:


[EMAIL PROTECTED] wrote:

 in my case it takes 2+ days to resync the array before I can do any
 performance testing with it. for some reason it's only doing the rebuild
 at ~5M/sec (even though I've increased the min and max rebuild speeds and
 a dd to the array seems to be ~44M/sec, even during the rebuild)


With performance like that, it sounds like you're saturating a bus somewhere 
along the line.  If you're using scsi, for instance, it's very easy for a 
long chain of drives to overwhelm a channel.  You might also want to consider 
some other RAID layouts like 1+0 or 5+0 depending upon your space vs. 
reliability needs.


I plan to test the different configurations.

However, if I was saturating the bus with the reconstruct, how can I fire 
off a dd if=/dev/zero of=/mnt/test and get ~45M/sec while only slowing the 
reconstruct to ~4M/sec?

I'm putting 10x as much data through the bus at that point, which would seem 
to prove that it's not the bus that's saturated.


David Lang
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: limits on raid

2007-06-18 Thread Brendan Conoboy

[EMAIL PROTECTED] wrote:

I plan to test the different configurations.

however, if I was saturating the bus with the reconstruct how can I fire 
off a dd if=/dev/zero of=/mnt/test and get ~45M/sec whild only slowing 
the reconstruct to ~4M/sec?


I'm putting 10x as much data through the bus at that point, it would 
seem to proove that it's not the bus that's saturated.


I am unconvinced.  If you take ~1MB/s for each active drive and add in SCSI 
overhead, 45M/sec seems reasonable.  Have you looked at a running iostat 
while all this is going on?  Try it out: add up the kB/s from each drive 
and see how close you are to your maximum theoretical I/O.
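For example, something like this sums the per-device throughput from the second 
iostat sample (a sketch; assumes the sysstat iostat and that the array members 
are all sd* devices):

iostat -dk 5 2 | awk '/^Device/ { n++ } n==2 && /^sd/ { kb += $3 + $4 }
                      END { printf "%.0f kB/s aggregate\n", kb }'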


Also, how's your CPU utilization?

--
Brendan Conoboy / Red Hat, Inc. / [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: limits on raid

2007-06-18 Thread david

On Mon, 18 Jun 2007, Lennart Sorensen wrote:


On Mon, Jun 18, 2007 at 10:28:38AM -0700, [EMAIL PROTECTED] wrote:

I plan to test the different configurations.

however, if I was saturating the bus with the reconstruct how can I fire
off a dd if=/dev/zero of=/mnt/test and get ~45M/sec whild only slowing the
reconstruct to ~4M/sec?

I'm putting 10x as much data through the bus at that point, it would seem
to proove that it's not the bus that's saturated.


dd 45MB/s from the raid sounds reasonable.

If you have 45 drives, doing a resync of raid5 or radi6 should probably
involve reading all the disks, and writing new parity data to one drive.
So if you are writing 5MB/s, then you are reading 44*5MB/s from the
other drives, which is 220MB/s.  If your resync drops to 4MB/s when
doing dd, then you have 44*4MB/s which is 176MB/s or 44MB/s less read
capacity, which surprisingly seems to match the dd speed you are
getting.  Seems like you are indeed very much saturating a bus
somewhere.  The numbers certainly agree with that theory.

What kind of setup is the drives connected to?


simple ultra-wide SCSI to a single controller.

I didn't realize that the rate reported by /proc/mdstat was just the write 
speed taking place; I thought it was the total data rate (reads + writes). 
The next time this message gets changed, it would be a good thing to clarify 
this.


David Lang
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: limits on raid

2007-06-18 Thread david

On Mon, 18 Jun 2007, Brendan Conoboy wrote:


[EMAIL PROTECTED] wrote:

 I plan to test the different configurations.

 however, if I was saturating the bus with the reconstruct how can I fire
 off a dd if=/dev/zero of=/mnt/test and get ~45M/sec whild only slowing the
 reconstruct to ~4M/sec?

 I'm putting 10x as much data through the bus at that point, it would seem
 to proove that it's not the bus that's saturated.


I am unconvinced.  If you take ~1MB/s for each active drive, add in SCSI 
overhead, 45M/sec seems reasonable.  Have you look at a running iostat while 
all this is going on?  Try it out- add up the kb/s from each drive and see 
how close you are to your maximum theoretical IO.


I didn't try iostat, but I did look at vmstat, and there the numbers look even 
worse: the bo column is ~500 for the resync by itself, but with the dd 
it's ~50,000. When I get access to the box again I'll try iostat to get 
more details.



Also, how's your CPU utilization?


~30% of one cpu for the raid 6 thread, ~5% of one cpu for the resync 
thread


David Lang
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: limits on raid

2007-06-18 Thread Lennart Sorensen
On Mon, Jun 18, 2007 at 11:12:45AM -0700, [EMAIL PROTECTED] wrote:
> simple ultra-wide SCSI to a single controller.

Hmm, isn't ultra-wide limited to 40MB/s?  Is it Ultra320 wide?  That
could do a lot more, and 220MB/s sounds plausible for 320 SCSI.

> I didn't realize that the rate reported by /proc/mdstat was the write 
> speed that was takeing place, I thought it was the total data rate (reads 
> + writes). the next time this message gets changed it would be a good 
> thing to clarify this.

Well, I suppose it could make sense to show the rate of rebuild, which you can
then compare against the total size of the raid, or you can have the rate of
write, which you then compare against the size of the drive being
synced.  Certainly I would expect much higher speeds if it was the
overall raid size, while the numbers seem pretty reasonable as a write
speed.  4MB/s would take forever if it was the overall raid resync
speed.  I usually see SATA raid1 resync at 50 to 60MB/s or so, which
matches the read and write speeds of the drives in the raid.

--
Len Sorensen
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: limits on raid

2007-06-18 Thread Lennart Sorensen
On Mon, Jun 18, 2007 at 10:28:38AM -0700, [EMAIL PROTECTED] wrote:
> I plan to test the different configurations.
> 
> however, if I was saturating the bus with the reconstruct how can I fire 
> off a dd if=/dev/zero of=/mnt/test and get ~45M/sec whild only slowing the 
> reconstruct to ~4M/sec?
> 
> I'm putting 10x as much data through the bus at that point, it would seem 
> to proove that it's not the bus that's saturated.

dd 45MB/s from the raid sounds reasonable.

If you have 45 drives, doing a resync of raid5 or raid6 should probably
involve reading all the disks, and writing new parity data to one drive.
So if you are writing 5MB/s, then you are reading 44*5MB/s from the
other drives, which is 220MB/s.  If your resync drops to 4MB/s when
doing dd, then you have 44*4MB/s which is 176MB/s or 44MB/s less read
capacity, which surprisingly seems to match the dd speed you are
getting.  Seems like you are indeed very much saturating a bus
somewhere.  The numbers certainly agree with that theory.
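Spelled out with those numbers (a back-of-the-envelope check only):

echo "resync alone:  $((44 * 5)) MB/s of reads"               # 220 MB/s
echo "during the dd: $((44 * 4 + 45)) MB/s of total traffic"  # 176 MB/s resync reads + ~45 MB/s dd

Both work out to roughly the same ~220MB/s ceiling, which is what a saturated
bus looks like.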

What kind of setup is the drives connected to?

--
Len Sorensen
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: limits on raid

2007-06-18 Thread david

On Mon, 18 Jun 2007, Lennart Sorensen wrote:


On Mon, Jun 18, 2007 at 11:12:45AM -0700, [EMAIL PROTECTED] wrote:

simple ultra-wide SCSI to a single controller.


Hmm, isn't ultra-wide limited to 40MB/s?  Is it Ultra320 wide?  That
could do a lot more, and 220MB/s sounds plausable for 320 scsi.


yes, sorry, ultra 320 wide.


I didn't realize that the rate reported by /proc/mdstat was the write
speed that was takeing place, I thought it was the total data rate (reads
+ writes). the next time this message gets changed it would be a good
thing to clarify this.


Well I suppose itcould make sense to show rate of rebuild which you can
then compare against the total size of tha raid, or you can have rate of
write, which you then compare against the size of the drive being
synced.  Certainly I would expect much higer speeds if it was the
overall raid size, while the numbers seem pretty reasonable as a write
speed.  4MB/s would take for ever if it was the overall raid resync
speed.  I usually see SATA raid1 resync at 50 to 60MB/s or so, which
matches the read and write speeds of the drives in the raid.


As I read it right now, what happens is the worst of the options: it shows 
the total size of the array as the amount of work that needs to be done, 
but then shows only the write speed as the rate of progress being made 
through the job.


total rebuild time was estimated at ~3200 min

David Lang
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: limits on raid

2007-06-18 Thread Brendan Conoboy

[EMAIL PROTECTED] wrote:

yes, sorry, ultra 320 wide.


Exactly how many channels and drives?

--
Brendan Conoboy / Red Hat, Inc. / [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [linux-lvm] 2.6.22-rc4 XFS fails after hibernate/resume

2007-06-18 Thread David Greaves

OK, just a quick ack.

When I resumed tonight (having done a freeze/thaw over the suspend), some libata 
errors were thrown up during the resume and there was an eventual hard hang. Maybe I 
spoke too soon?


I'm going to have to do some more testing...

David Chinner wrote:

On Mon, Jun 18, 2007 at 08:49:34AM +0100, David Greaves wrote:

David Greaves wrote:
So doing:
xfs_freeze -f /scratch
sync
echo platform > /sys/power/disk
echo disk > /sys/power/state
# resume
xfs_freeze -u /scratch

Works (for now - more usage testing tonight)


Verrry interesting.

Good :)



What you were seeing was an XFS shutdown occurring because the free space
btree was corrupted. IOWs, the process of suspend/resume has resulted
in either bad data being written to disk, the correct data not being
written to disk or the cached block being corrupted in memory.

That's the kind of thing I was suspecting, yes.


If you run xfs_check on the filesystem after it has shut down after a resume,
can you tell us if it reports on-disk corruption? Note: do not run xfs_repair
to check this - it does not check the free space btrees; instead it simply
rebuilds them from scratch. If xfs_check reports an error, then run xfs_repair
to fix it up.

OK, I can try this tonight...


FWIW, I'm on record stating that "sync" is not sufficient to quiesce an XFS
filesystem for a suspend/resume to work safely and have argued that the only
safe thing to do is freeze the filesystem before suspend and thaw it after
resume. This is why I originally asked you to test that with the other problem
that you reported. Up until this point in time, there's been no evidence to
prove either side of the argument..

Cheers,

Dave.


-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: limits on raid

2007-06-18 Thread david

On Mon, 18 Jun 2007, Brendan Conoboy wrote:


[EMAIL PROTECTED] wrote:

 yes, sorry, ultra 320 wide.


Exactly how many channels and drives?


one channel, 2 OS drives plus the 45 drives in the array.

Yes, I realize that there will be bottlenecks with this; the large capacity 
is to handle longer history (it's going to be a 30TB circular buffer being 
fed by a pair of OC-12 links).

It appears that my big mistake was not understanding what /proc/mdstat is 
telling me.


David Lang
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH git-md-accel 0/2] raid5 refactor, and pr_debug cleanup

2007-06-18 Thread Dan Williams
Neil,

The following two patches are the respin of the changes you suggested to
"raid5: coding style cleanup / refactor".  I have added them to the
git-md-accel tree for a 2.6.23-rc1 pull.  The full, rebased, raid
acceleration patchset will be sent for another round of review once I
address Andrew's concerns about the commit messages.

Dan Williams (2):
  raid5: refactor handle_stripe5 and handle_stripe6
  raid5: replace custom debug print with standard pr_debug
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH git-md-accel 1/2] raid5: refactor handle_stripe5 and handle_stripe6

2007-06-18 Thread Dan Williams
handle_stripe5 and handle_stripe6 have very deep logic paths handling the
various states of a stripe_head.  By introducing the 'stripe_head_state'
and 'r6_state' objects, large portions of the logic can be moved to
sub-routines.

'struct stripe_head_state' consumes all of the automatic variables that 
previously
stood alone in handle_stripe5,6.  'struct r6_state' contains the handle_stripe6
specific variables like p_failed and q_failed.

One of the nice side effects of the 'stripe_head_state' change is that it
allows for further reductions in code duplication between raid5 and raid6.
The following new routines are shared between raid5 and raid6:

handle_completed_write_requests
handle_requests_to_failed_array
handle_stripe_expansion

Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
---

 drivers/md/raid5.c | 1484 +---
 include/linux/raid/raid5.h |   16 
 2 files changed, 733 insertions(+), 767 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 4f51dfa..68834d2 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -1326,6 +1326,604 @@ static int stripe_to_pdidx(sector_t stripe, 
raid5_conf_t *conf, int disks)
return pd_idx;
 }
 
+static void
+handle_requests_to_failed_array(raid5_conf_t *conf, struct stripe_head *sh,
+   struct stripe_head_state *s, int disks,
+   struct bio **return_bi)
+{
+   int i;
+   for (i = disks; i--;) {
+   struct bio *bi;
+   int bitmap_end = 0;
+
+   if (test_bit(R5_ReadError, &sh->dev[i].flags)) {
+   mdk_rdev_t *rdev;
+   rcu_read_lock();
+   rdev = rcu_dereference(conf->disks[i].rdev);
+   if (rdev && test_bit(In_sync, &rdev->flags))
+   /* multiple read failures in one stripe */
+   md_error(conf->mddev, rdev);
+   rcu_read_unlock();
+   }
+   spin_lock_irq(&conf->device_lock);
+   /* fail all writes first */
+   bi = sh->dev[i].towrite;
+   sh->dev[i].towrite = NULL;
+   if (bi) {
+   s->to_write--;
+   bitmap_end = 1;
+   }
+
+   if (test_and_clear_bit(R5_Overlap, &sh->dev[i].flags))
+   wake_up(&conf->wait_for_overlap);
+
+   while (bi && bi->bi_sector <
+   sh->dev[i].sector + STRIPE_SECTORS) {
+   struct bio *nextbi = r5_next_bio(bi, sh->dev[i].sector);
+   clear_bit(BIO_UPTODATE, &bi->bi_flags);
+   if (--bi->bi_phys_segments == 0) {
+   md_write_end(conf->mddev);
+   bi->bi_next = *return_bi;
+   *return_bi = bi;
+   }
+   bi = nextbi;
+   }
+   /* and fail all 'written' */
+   bi = sh->dev[i].written;
+   sh->dev[i].written = NULL;
+   if (bi) bitmap_end = 1;
+   while (bi && bi->bi_sector <
+  sh->dev[i].sector + STRIPE_SECTORS) {
+   struct bio *bi2 = r5_next_bio(bi, sh->dev[i].sector);
+   clear_bit(BIO_UPTODATE, &bi->bi_flags);
+   if (--bi->bi_phys_segments == 0) {
+   md_write_end(conf->mddev);
+   bi->bi_next = *return_bi;
+   *return_bi = bi;
+   }
+   bi = bi2;
+   }
+
+   /* fail any reads if this device is non-operational */
+   if (!test_bit(R5_Insync, &sh->dev[i].flags) ||
+   test_bit(R5_ReadError, &sh->dev[i].flags)) {
+   bi = sh->dev[i].toread;
+   sh->dev[i].toread = NULL;
+   if (test_and_clear_bit(R5_Overlap, &sh->dev[i].flags))
+   wake_up(&conf->wait_for_overlap);
+   if (bi) s->to_read--;
+   while (bi && bi->bi_sector <
+  sh->dev[i].sector + STRIPE_SECTORS) {
+   struct bio *nextbi =
+   r5_next_bio(bi, sh->dev[i].sector);
+   clear_bit(BIO_UPTODATE, &bi->bi_flags);
+   if (--bi->bi_phys_segments == 0) {
+   bi->bi_next = *return_bi;
+   *return_bi = bi;
+   }
+   bi = nextbi;
+   }
+   }
+   spin_unlock_irq(&conf->device_lock);
+   if (bitmap_end)
+

[PATCH git-md-accel 2/2] raid5: replace custom debug print with standard pr_debug

2007-06-18 Thread Dan Williams
Replaces PRINTK with pr_debug, and kills the RAID5_DEBUG definition in
favor of the global DEBUG definition.  To get local debug messages just add
'#define DEBUG' to the top of the file.

Signed-off-by: Dan Williams <[EMAIL PROTECTED]>
---

 drivers/md/raid5.c |  116 ++--
 1 files changed, 58 insertions(+), 58 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 68834d2..fa562e7 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -80,7 +80,6 @@
 /*
  * The following can be used to debug the driver
  */
-#define RAID5_DEBUG0
 #define RAID5_PARANOIA 1
 #if RAID5_PARANOIA && defined(CONFIG_SMP)
 # define CHECK_DEVLOCK() assert_spin_locked(&conf->device_lock)
@@ -88,8 +87,7 @@
 # define CHECK_DEVLOCK()
 #endif
 
-#define PRINTK(x...) ((void)(RAID5_DEBUG && printk(x)))
-#if RAID5_DEBUG
+#ifdef DEBUG
 #define inline
 #define __inline__
 #endif
@@ -152,7 +150,8 @@ static void release_stripe(struct stripe_head *sh)
 
 static inline void remove_hash(struct stripe_head *sh)
 {
-   PRINTK("remove_hash(), stripe %llu\n", (unsigned long long)sh->sector);
+   pr_debug("remove_hash(), stripe %llu\n",
+   (unsigned long long)sh->sector);
 
hlist_del_init(&sh->hash);
 }
@@ -161,7 +160,8 @@ static inline void insert_hash(raid5_conf_t *conf, struct 
stripe_head *sh)
 {
struct hlist_head *hp = stripe_hash(conf, sh->sector);
 
-   PRINTK("insert_hash(), stripe %llu\n", (unsigned long long)sh->sector);
+   pr_debug("insert_hash(), stripe %llu\n",
+   (unsigned long long)sh->sector);
 
CHECK_DEVLOCK();
hlist_add_head(&sh->hash, hp);
@@ -226,7 +226,7 @@ static void init_stripe(struct stripe_head *sh, sector_t 
sector, int pd_idx, int
BUG_ON(test_bit(STRIPE_HANDLE, &sh->state));

CHECK_DEVLOCK();
-   PRINTK("init_stripe called, stripe %llu\n", 
+   pr_debug("init_stripe called, stripe %llu\n",
(unsigned long long)sh->sector);
 
remove_hash(sh);
@@ -260,11 +260,11 @@ static struct stripe_head *__find_stripe(raid5_conf_t 
*conf, sector_t sector, in
struct hlist_node *hn;
 
CHECK_DEVLOCK();
-   PRINTK("__find_stripe, sector %llu\n", (unsigned long long)sector);
+   pr_debug("__find_stripe, sector %llu\n", (unsigned long long)sector);
hlist_for_each_entry(sh, hn, stripe_hash(conf, sector), hash)
if (sh->sector == sector && sh->disks == disks)
return sh;
-   PRINTK("__stripe %llu not in cache\n", (unsigned long long)sector);
+   pr_debug("__stripe %llu not in cache\n", (unsigned long long)sector);
return NULL;
 }
 
@@ -276,7 +276,7 @@ static struct stripe_head *get_active_stripe(raid5_conf_t 
*conf, sector_t sector
 {
struct stripe_head *sh;
 
-   PRINTK("get_stripe, sector %llu\n", (unsigned long long)sector);
+   pr_debug("get_stripe, sector %llu\n", (unsigned long long)sector);
 
spin_lock_irq(&conf->device_lock);
 
@@ -537,8 +537,8 @@ static int raid5_end_read_request(struct bio * bi, unsigned 
int bytes_done,
if (bi == &sh->dev[i].req)
break;
 
-   PRINTK("end_read_request %llu/%d, count: %d, uptodate %d.\n", 
-   (unsigned long long)sh->sector, i, atomic_read(&sh->count), 
+   pr_debug("end_read_request %llu/%d, count: %d, uptodate %d.\n",
+   (unsigned long long)sh->sector, i, atomic_read(&sh->count),
uptodate);
if (i == disks) {
BUG();
@@ -613,7 +613,7 @@ static int raid5_end_write_request (struct bio *bi, 
unsigned int bytes_done,
if (bi == &sh->dev[i].req)
break;
 
-   PRINTK("end_write_request %llu/%d, count %d, uptodate: %d.\n", 
+   pr_debug("end_write_request %llu/%d, count %d, uptodate: %d.\n",
(unsigned long long)sh->sector, i, atomic_read(&sh->count),
uptodate);
if (i == disks) {
@@ -658,7 +658,7 @@ static void error(mddev_t *mddev, mdk_rdev_t *rdev)
 {
char b[BDEVNAME_SIZE];
raid5_conf_t *conf = (raid5_conf_t *) mddev->private;
-   PRINTK("raid5: error called\n");
+   pr_debug("raid5: error called\n");
 
if (!test_bit(Faulty, &rdev->flags)) {
set_bit(MD_CHANGE_DEVS, &mddev->flags);
@@ -929,7 +929,7 @@ static void compute_block(struct stripe_head *sh, int 
dd_idx)
int i, count, disks = sh->disks;
void *ptr[MAX_XOR_BLOCKS], *dest, *p;
 
-   PRINTK("compute_block, stripe %llu, idx %d\n", 
+   pr_debug("compute_block, stripe %llu, idx %d\n",
(unsigned long long)sh->sector, dd_idx);
 
dest = page_address(sh->dev[dd_idx].page);
@@ -960,7 +960,7 @@ static void compute_parity5(struct stripe_head *sh, int 
method)
void *ptr[MAX_XOR_BLOCKS], *dest;
struct bio *chosen;
 
-   PRINTK("compute_parity5, strip

Re: Software based SATA RAID-5 expandable arrays?

2007-06-18 Thread Dexter Filmore
Why dontcha just cut all the "look how big my ePenis is" chatter and tell us 
what you wanna do?
Nobody gives a rat if your ultra1337 sound card needs a 10 megawatt power 
supply.


-- 
-BEGIN GEEK CODE BLOCK-
Version: 3.12
GCS d--(+)@ s-:+ a- C UL++ P+>++ L+++> E-- W++ N o? K-
w--(---) !O M+ V- PS+ PE Y++ PGP t++(---)@ 5 X+(++) R+(++) tv--(+)@ 
b++(+++) DI+++ D- G++ e* h>++ r* y?
--END GEEK CODE BLOCK--

http://www.stop1984.com
http://www.againsttcpa.com
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: limits on raid

2007-06-18 Thread Wakko Warner
[EMAIL PROTECTED] wrote:
> On Mon, 18 Jun 2007, Brendan Conoboy wrote:
> 
> >[EMAIL PROTECTED] wrote:
> >> yes, sorry, ultra 320 wide.
> >
> >Exactly how many channels and drives?
> 
> one channel, 2 OS drives plus the 45 drives in the array.

Given that the drives only have 4 ID bits, how can you have 47 drives on 1
cable?  You'd need a minimum of 3 channels for 47 drives.  Do you have some
sort of external box that holds X number of drives and only uses a single
ID?

-- 
 Lab tests show that use of micro$oft causes cancer in lab animals
 Got Gas???
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: limits on raid

2007-06-18 Thread david

On Mon, 18 Jun 2007, Wakko Warner wrote:


Subject: Re: limits on raid

[EMAIL PROTECTED] wrote:

On Mon, 18 Jun 2007, Brendan Conoboy wrote:


[EMAIL PROTECTED] wrote:

yes, sorry, ultra 320 wide.


Exactly how many channels and drives?


one channel, 2 OS drives plus the 45 drives in the array.


Given that the drives only have 4 ID bits, how can you have 47 drives on 1
cable?  You'd need a minimum of 3 channels for 47 drives.  Do you have some
sort of external box that holds X number of drives and only uses a single
ID?


Yes, I'm using Promise drive shelves; I have them configured to export 
the 15 drives as 15 LUNs on a single ID.

I'm going to be using this as a huge circular buffer that will just be 
overwritten eventually 99% of the time, but once in a while I will need to 
go back into the buffer and extract and process the data.


David Lang
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: limits on raid

2007-06-18 Thread Brendan Conoboy

[EMAIL PROTECTED] wrote:
yes, I'm useing promise drive shelves, I have them configured to export 
the 15 drives as 15 LUNs on a single ID.


Well, that would account for it.  Your bus is very, very saturated.  If 
all your drives are active, you can't get more than ~7MB/s per disk 
under perfect conditions.
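Roughly (a back-of-the-envelope figure, ignoring protocol overhead):

echo "$((320 / 45)) MB/s per disk"   # one U320 bus shared by the 45 array members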


--
Brendan Conoboy / Red Hat, Inc. / [EMAIL PROTECTED]
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH git-md-accel 1/2] raid5: refactor handle_stripe5 and handle_stripe6

2007-06-18 Thread Dan Williams

On 6/18/07, Dan Williams <[EMAIL PROTECTED]> wrote:
...

+static void handle_stripe_expansion(raid5_conf_t *conf, struct stripe_head *sh,
+   struct r6_state *r6s)
+{
+   int i;
+
+   /* We have read all the blocks in this stripe and now we need to
+* copy some of them into a target stripe for expand.
+*/
+   clear_bit(STRIPE_EXPAND_SOURCE, &sh->state);
+   for (i = 0; i < sh->disks; i++)
+   if (i != sh->pd_idx && (r6s && i != r6s->qd_idx)) {
+   int dd_idx, pd_idx, j;
+   struct stripe_head *sh2;
+
+   sector_t bn = compute_blocknr(sh, i);
+   sector_t s = raid5_compute_sector(bn, conf->raid_disks,
+   conf->raid_disks-1, &dd_idx,
+   &pd_idx, conf);

this bug made it through the regression test:

'conf->raid_disks-1' should be 'conf->raid_disks - conf->max_degraded'

--
Dan
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: suggestions for inexpensive linux jbod?

2007-06-18 Thread Mike
In article <[EMAIL PROTECTED]>, Justin Piszcz wrote:
> 
> 
> On Mon, 18 Jun 2007, Mike wrote:
> 
>> I'm creating a larger backup server that uses bacula (this
>> software works well). The way I'm going about this I need
>> lots of space in the filesystem where temporary files are
>> stored. I have been looking at the Norco (link at the bottom),
>> but there seem to be some grumblings that the adapter card
>> does not play well with linux.
>>
>> Has anyone used this device or have another suggestion? I'm
>> looking at something that will present lots of disk to the
>> linux box (fedore core 5, kernel 2.6.20) that I will put
>> under md/RAID and LVM. I want to have between 1.5TB and 3.0TB
>> of usable space after all the RAID'ing.
>>
>> Mike
>>
>> http://www.newegg.com/Product/Product.aspx?Item=N82E16816133001
> 
> Get 3 Hitachi 1TB drives and use SW RAID5 on an Intel 965 motherboard OR 
> use PCI-e cards that use the Silicon Image chipset.
> 
> 04:00.0 RAID bus controller: Silicon Image, Inc. SiI 3132 Serial ATA Raid 
> II Controller (rev 01)
> 
> Justin.

Any idea if the Hitachi 1 TB drives will work in a Dell PowerEdge 800? I
have two 80GB drives in the box right now that I would like to replace
with four of the 1 TB drives.

Mike

-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: suggestions for inexpensive linux jbod?

2007-06-18 Thread linux
> Get 3 Hitachi 1TB drives and use SW RAID5 on an Intel 965 motherboard OR 
> use PCI-e cards that use the Silicon Image chipset.
> 
> 04:00.0 RAID bus controller: Silicon Image, Inc. SiI 3132 Serial ATA Raid 
> II Controller (rev 01)

What he said.  After some thinking, I loaded up a machine with 3x
SiI3132 and figured I'll expand by port multiplier.  Port multipliers
are documented in the standard, not vendor-defined, and are currently
made only by, guess who, Silicon Image.  I haven't used any yet, but
apparently they're working, and you know that both the manufacturer and
the linux-ide developers have tested them thoroughly with SiI controllers.

03:00.0 Mass storage controller: Silicon Image, Inc. SiI 3132 Serial ATA Raid 
II Controller (rev 01)
04:00.0 Mass storage controller: Silicon Image, Inc. SiI 3132 Serial ATA Raid 
II Controller (rev 01)
05:00.0 Mass storage controller: Silicon Image, Inc. SiI 3132 Serial ATA Raid 
II Controller (rev 01)

md4 : active raid10 sdf3[4] sde3[3] sdd3[2] sdc3[1] sdb3[0] sda3[5]
  131837184 blocks 256K chunks 2 near-copies [6/6] [UU]
  bitmap: 2/126 pages [8KB], 512KB chunk

md5 : active raid5 sdf4[5] sde4[4] sdd4[3] sdc4[2] sdb4[1] sda4[0]
  1719155200 blocks level 5, 64k chunk, algorithm 2 [6/6] [UU]
  bitmap: 27/164 pages [108KB], 1024KB chunk

md0 : active raid1 sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1] sda1[0]
  979840 blocks [6/6] [UU]
  bitmap: 0/120 pages [0KB], 4KB chunk

md0 is the boot partition, md4 is root, and md5 is the main backup data.
Note the way the drives are arranged in the RAID-10; mirrored pairs are
split across different SATA controllers.
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html