Re: Machine hanging on synchronize cache on shutdown 2.6.22-rc4-git[45678]
Hello,

Mikael Pettersson wrote:
> On Sat, 16 Jun 2007 15:52:33 +0400, Brad Campbell wrote:
>> I've got a box here based on current Debian Stable.
>> It's got 15 Maxtor SATA drives in it on 4 Promise TX4 controllers.
>>
>> Using kernel 2.6.21.x it shuts down, but of course with a huge "clack"
>> as 15 drives all do emergency head parks simultaneously. I thought I'd
>> upgrade to 2.6.22-rc to get around this, but the machine just hangs up
>> hard, apparently trying to sync the cache on a drive.
>>
>> I've run this process manually, so I know it is being performed properly.
>>
>> Prior to shutdown, all nfsd processes are stopped, filesystems unmounted
>> and md arrays stopped. /proc/mdstat shows
>>
>> [EMAIL PROTECTED]:~# cat /proc/mdstat
>> Personalities : [raid6] [raid5] [raid4]
>> unused devices:
>> [EMAIL PROTECTED]:~#
>>
>> Here is the final hangup.
>>
>> http://www.fnarfbargle.com/CIMG1029.JPG
>
> Something sent a command to the disk on ata15 after the PHY had been
> offlined and the interface had been put in SLUMBER state (SStatus 614).
> Consequently the command timed out. Libata tried a soft reset, and then
> a hard reset, after which the machine hung.

Hmm... weird. Maybe device initiated power saving (DIPS) is active?

> I don't think sata_promise is the guilty party here. Looks like some
> layer above sata_promise got confused about the state of the interface.

But locking up hard after hardreset is a problem of sata_promise, no?

Thanks.

--
tejun
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [linux-lvm] 2.6.22-rc4 XFS fails after hibernate/resume
David Greaves wrote:
> David Robinson wrote:
>> David Greaves wrote:
>>> This isn't a regression. I was seeing these problems on 2.6.21 (but 22
>>> was in -rc so I waited to try it). I tried 2.6.22-rc4 (with Tejun's
>>> patches) to see if it had improved - no.
>>> Note this is a different (desktop) machine to that involved in my recent
>>> bugs. The machine will work for days (continually powered up) without a
>>> problem and then exhibits a filesystem failure within minutes of a resume.
>
> OK, that gave me an idea.
>
> Freeze the filesystem
> md5sum the lvm
> hibernate
> resume
> md5sum the lvm
>
> So the lvm and below looks OK...
>
> I'll see how it behaves now the filesystem has been frozen/thawed over
> the hibernate...

And it appears to behave well. (A few hours of compile/clean cycling kernel builds on that filesystem were OK.)

Historically I've done:
  sync
  echo platform > /sys/power/disk
  echo disk > /sys/power/state
  # resume
and had filesystem corruption (only on this machine; my other hibernating xfs machines don't have this problem).

So doing:
  xfs_freeze -f /scratch
  sync
  echo platform > /sys/power/disk
  echo disk > /sys/power/state
  # resume
  xfs_freeze -u /scratch
works (for now - more usage testing tonight).

David
Re: XFS Tunables for High Speed Linux SW RAID5 Systems?
Dave,

Questions inline and below.

On Mon, 18 Jun 2007, David Chinner wrote:
> On Fri, Jun 15, 2007 at 04:36:07PM -0400, Justin Piszcz wrote:
>> Hi,
>> I was wondering if the XFS folks can recommend any optimizations for
>> high speed disk arrays using RAID5?
> [sysctls snipped]
> None of those options will make much difference to performance. mkfs
> parameters are the big ticket item here. There is also the vm/dirty
> tunable in /proc. That changes benchmark times by starting writeback
> earlier, but doesn't affect actual writeback speed.

I was wondering what are some things to tune for speed? I've already tuned the MD layer, but is there anything with XFS I can also tune?

  echo "Setting read-ahead to 64MB for /dev/md3"
  blockdev --setra 65536 /dev/md3

This proved to give the fastest performance. I have always used 4GB, then recently 8GB, of memory in the machine.

http://www.rhic.bnl.gov/hepix/talks/041019pm/schoen.pdf
See page 13.

> Why so large? That's likely to cause readahead thrashing problems under
> low memory.

  echo "Setting stripe_cache_size to 16MB for /dev/md3"
  echo 16384 > /sys/block/md3/md/stripe_cache_size

(also set max_sectors_kb to 128K (chunk size) and disable NCQ)

> Why do that? You want XFS to issue large I/Os and the block layer to
> split them across all the disks. i.e. you are preventing full stripe
> writes from occurring by doing that.

I use a 128k stripe; what should I use for max_sectors_kb? I read that 128kb was optimal.

Can you please comment on all of the optimizations below?

  #!/bin/bash
  # source profile
  . /etc/profile

  echo "Optimizing RAID Arrays..."

  # This step must come first.
  # See: http://www.3ware.com/KB/article.aspx?id=11050
  echo "Setting max_sectors_kb to chunk size of RAID5 arrays..."
  for i in sdc sdd sde sdf sdg sdh sdi sdj sdk sdl
  do
    echo "Setting /dev/$i to 128K..."
    echo 128 > /sys/block/"$i"/queue/max_sectors_kb
  done

  echo "Setting read-ahead to 64MB for /dev/md3"
  blockdev --setra 65536 /dev/md3

  echo "Setting stripe_cache_size to 16MB for /dev/md3"
  echo 16384 > /sys/block/md3/md/stripe_cache_size

  # If you use more than the default 64kb stripe with raid5
  # this feature is broken, so you need to limit it to 30MB/s.
  # Neil has a patch; not sure when it will be merged.
  echo "Setting minimum and maximum resync speed to 30MB/s..."
  echo 30000 > /sys/block/md0/md/sync_speed_min
  echo 30000 > /sys/block/md0/md/sync_speed_max
  echo 30000 > /sys/block/md1/md/sync_speed_min
  echo 30000 > /sys/block/md1/md/sync_speed_max
  echo 30000 > /sys/block/md2/md/sync_speed_min
  echo 30000 > /sys/block/md2/md/sync_speed_max
  echo 30000 > /sys/block/md3/md/sync_speed_min
  echo 30000 > /sys/block/md3/md/sync_speed_max

  # Disable NCQ.
  echo "Disabling NCQ..."
  for i in sdc sdd sde sdf sdg sdh sdi sdj sdk sdl
  do
    echo "Disabling NCQ on $i"
    echo 1 > /sys/block/"$i"/device/queue_depth
  done
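One detail worth double-checking in the script above: blockdev --setra counts 512-byte sectors, and raid5's stripe_cache_size counts 4 KiB pages per member device, so the byte figures in the echo labels don't line up exactly with the values written. A quick sketch of the conversions (illustrative only; the per-device-page accounting for the stripe cache is my assumption about how md sizes it):

```python
# blockdev --setra takes a count of 512-byte sectors;
# md's stripe_cache_size is counted in 4 KiB pages per member device.
SECTOR = 512
PAGE = 4096

def setra_to_mib(sectors):
    """Convert a blockdev --setra value to MiB of readahead."""
    return sectors * SECTOR / 2**20

def stripe_cache_mib_per_device(entries):
    """Convert a stripe_cache_size value to MiB per member disk
    (assumes one page per device per cache entry)."""
    return entries * PAGE / 2**20

print(setra_to_mib(65536))                 # 32.0 MiB, not 64MB as labeled
print(stripe_cache_mib_per_device(16384))  # 64.0 MiB per member disk
```

So --setra 65536 is 32 MiB of readahead, and stripe_cache_size=16384 is considerably more than the 16MB the label suggests once multiplied across a ten-disk array.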
Re: Machine hanging on synchronize cache on shutdown 2.6.22-rc4-git[45678]
On Mon, 18 Jun 2007 16:09:49 +0900, Tejun Heo wrote:
> Mikael Pettersson wrote:
> > On Sat, 16 Jun 2007 15:52:33 +0400, Brad Campbell wrote:
> >> I've got a box here based on current Debian Stable.
> >> It's got 15 Maxtor SATA drives in it on 4 Promise TX4 controllers.
> >>
> >> Using kernel 2.6.21.x it shuts down, but of course with a huge "clack"
> >> as 15 drives all do emergency head parks simultaneously. I thought I'd
> >> upgrade to 2.6.22-rc to get around this but the machine just hangs up
> >> hard apparently trying to sync cache on a drive.
> >>
> >> I've run this process manually, so I know it is being performed properly.
> >>
> >> Prior to shutdown, all nfsd processes are stopped, filesystems unmounted
> >> and md arrays stopped. /proc/mdstat shows
> >>
> >> [EMAIL PROTECTED]:~# cat /proc/mdstat
> >> Personalities : [raid6] [raid5] [raid4]
> >> unused devices:
> >> [EMAIL PROTECTED]:~#
> >>
> >> Here is the final hangup.
> >>
> >> http://www.fnarfbargle.com/CIMG1029.JPG
> >
> > Something sent a command to the disk on ata15 after the PHY had been
> > offlined and the interface had been put in SLUMBER state (SStatus 614).
> > Consequently the command timed out. Libata tried a soft reset, and then
> > a hard reset, after which the machine hung.
>
> Hmm... weird. Maybe device initiated power saving (DIPS) is active?
>
> > I don't think sata_promise is the guilty party here. Looks like some
> > layer above sata_promise got confused about the state of the interface.
>
> But locking up hard after hardreset is a problem of sata_promise, no?

Maybe, maybe not. The original report doesn't specify where/how the machine hung.

Brad: can you enable sysrq and check if the kernel responds to sysrq when it appears to hang, and if so, where it's executing?

sata_promise just passes sata_std_hardreset to ata_do_eh. I've certainly seen EH hardresets work before, so I'm assuming that something in this particular situation (PHY offlined, kernel close to shutting down) breaks things.

FWIW, I'm seeing scsi layer accesses (cache flushes) after things like rmmod sata_promise. They error out and don't seem to cause any harm, but the fact that they occur at all makes me nervous.

/Mikael
Re: Machine hanging on synchronize cache on shutdown 2.6.22-rc4-git[45678]
Mikael Pettersson wrote:
> FWIW, I'm seeing scsi layer accesses (cache flushes) after things
> like rmmod sata_promise. They error out and don't seem to cause
> any harm, but the fact that they occur at all makes me nervous.

That's okay. On rmmod, the low-level device (ATA) goes away first, just as in hot unplug, so sd gets notified *after* the device is gone. sd still tries to clean up and issues the commands, which are properly rejected by the SCSI midlayer as the device is already marked offline, so nothing to worry about there.

--
tejun
Re: Machine hanging on synchronize cache on shutdown 2.6.22-rc4-git[45678]
Mikael Pettersson wrote:
>>> I don't think sata_promise is the guilty party here. Looks like some
>>> layer above sata_promise got confused about the state of the interface.
>>
>> But locking up hard after hardreset is a problem of sata_promise, no?
>
> Maybe, maybe not. The original report doesn't specify where/how the
> machine hung.

It hangs in the process of trying to power it off. Unmount everything and halt the machine. I've tried halt with and without the -h. With the -h you can hear the drives spin down, then it tries to spin them up again and hangs. Without the -h it just hangs hard where you see in the photo.

> Brad: can you enable sysrq and check if the kernel responds to sysrq
> when it appears to hang, and if so, where it's executing?

All my kernels have sysrq enabled. Once the hard reset is displayed on the screen, everything locks.

> sata_promise just passes sata_std_hardreset to ata_do_eh. I've certainly
> seen EH hardresets work before, so I'm assuming that something in this
> particular situation (PHY offlined, kernel close to shutting down)
> breaks things.

That is my thought. I thought on a .22-rc kernel, if I used halt -h and it spun the disks down, the kernel would detect that and not try to flush the caches on them - or have I read something incorrectly?

> FWIW, I'm seeing scsi layer accesses (cache flushes) after things
> like rmmod sata_promise. They error out and don't seem to cause
> any harm, but the fact that they occur at all makes me nervous.

Brad
--
"Human beings, who are almost unique in having the ability to learn from the experience of others, are also remarkable for their apparent disinclination to do so." -- Douglas Adams
Re: XFS Tunables for High Speed Linux SW RAID5 Systems?
David Chinner wrote:
> On Fri, Jun 15, 2007 at 04:36:07PM -0400, Justin Piszcz wrote:
>> Hi,
>> I was wondering if the XFS folks can recommend any optimizations for
>> high speed disk arrays using RAID5?
> [sysctls snipped]
> None of those options will make much difference to performance. mkfs
> parameters are the big ticket item here.

Is there anywhere you can point to that expands on this?

Is there anything raid-specific that would be worth including in the Wiki?

David
RE: Software based SATA RAID-5 expandable arrays?
Last I checked, expanding drives (reshaping the RAID) in a raid set within Windows is not supported. Significant size is relative I guess, but 4-8 terabytes will not be a problem in either OS.

I run a RAID 6 (Windows does not support this either, last I checked). I started out with 5 drives and have reshaped it to ten drives now. I have a few 250G (old original drives) and many 500G drives (added and replacement drives) in the set. Once all the old 250G die off and I replace them with 500G drives, I will grow the RAID to the size of its new smallest disk, 500G. Grow and reshape are slightly different; both are supported in Linux mdadm. I have tested both with success.

I too use my set for media and it is not in use 90% of the time. I put this line in my /etc/rc.local to put the drives to sleep after a specified number of minutes of inactivity:

  hdparm -S 241 /dev/sd*

The values for the -S switch are not intuitive; read the man page. The value I use (241) puts them into standby (spindown) after 30min. My OS is on EIDE and my RAID set is all SATA, hence the splat for all SATA drives.

I have been running this for a year now with my RAID set. It works great and I have had no problems with mdadm waiting on drives to spin up when I access them. The one caveat: be prepared to wait a few moments if they are all in spindown state before you can access your data. For me with ten drives, it is always less than a minute, usually 30sec or so.

For a filesystem, I use XFS for my large media files.

Dan.

----- Inline Message Follows -----

To: linux-raid@vger.kernel.org
From: greenjelly
Subject: Software based SATA RAID-5 expandable arrays?

I am researching my options to build a Media NAS server. Sorry for the long message, but I wanted to provide as many details as possible about my problem, for the best solution. I have bolded sections so as to save people who don't have the time to read all of this.

Option 1: Expand my current dream machine!

I could buy a RAID-5 hardware card for my current system (Vista Ultimate 64 with an Extreme 6800 and 1066MHz 2 gig RAM). The Adaptec RAID controller (model "3805", you can search NewEgg for the information) will cost me near $500 (consumes 23W) and supports 8 drives (I have 6). This controller contains an 800MHz processor with a large cache of memory. It will support an expandable RAID-5 array! I would also buy a 750W+ PSU (for the additional safety and security).

The drives in this machine would be placed in shock-absorbing (noise reduction) 3-slot 4-drive bay containers with fans (I have 2 of these), and I will be removing an IDE-based Pioneer DVD burner (1 of 3) because of its flaky performance, given the P965 Intel chipset's lack of native IDE support and thus the motherboard's Micron SATA-to-IDE device. I've already installed 4 drives in this machine (on the native MB SATA controller) only to find a fan fail on me within days of the installation. One of the drives went bad (may or may not have to do with the heat). There are 5mm between these drives, and I would now replace both fans with higher-RPM ball-bearing fans for added reliability (more noise). I would also need to find freeware SMART monitoring software, which at this time I cannot find for Vista, to warn me of increased temps due to failure of a fan, increased environmental heat, etc. The only option is commercial SMART monitoring software (which may not work with the Adaptec RAID adapter).

Option 2: Build a server.

I have a copy of Windows 2003 Server, which I have yet to find out if it supports native software expandable RAID-5 arrays. I can also use Linux (which I have very little experience with) but have always wanted to use and learn. To do either of the last two options, I would still need to buy a new power supply for my current Vista machine (for added reliability). The current PSU is 550W, and with a power-hungry Radeon, 3 DVD drives and an X-Fi sound card... my nerves are getting frayed.

I would buy a cheap motherboard, processor and 1 gig or less of RAM. Lastly I would want a VERY large case. I have a 7300 NVidia PCI card that was replaced with a X1950GT on my Home Theater PC so that I may play back HD/Blu-ray DVDs. The server option may cost a bit more than the $500 for the Adaptec RAID controller. This will only work if Linux or Windows 2003 supports my much-needed requirements. My Linux OS will be installed on a 40mb IDE drive (not part of the array).

The options I seek are to be able to start with a 6-drive RAID-5 array, then as my demand for more space increases in the future I want to be able to plug in more drives and incorporate them into the array without the need to back up the data. Basically I need the software to add the drive/drives to the array, then rebuild the array incorporating the new drives while preserving the data on the original array.

QUESTIONS

Since this is a media server, and would only be used to serve Movies and Video to my two machines It woul
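Dan's hdparm -S value above looks arbitrary, but per the hdparm man page the encoding is: values 1-240 count in units of 5 seconds (5 seconds up to 20 minutes), and 241-251 count in units of 30 minutes (30 minutes up to 5.5 hours). A small helper sketching the mapping (the function name is mine, purely illustrative):

```python
def hdparm_standby_value(minutes):
    """Return the hdparm -S value for a standby (spindown) timeout,
    per the encoding documented in the hdparm man page."""
    if minutes <= 20:
        # values 1..240 count in 5-second units (up to 20 minutes)
        return (minutes * 60) // 5
    if minutes % 30 == 0 and minutes <= 330:
        # values 241..251 count in 30-minute units (30 min to 5.5 h)
        return 240 + minutes // 30
    raise ValueError("timeout not representable by hdparm -S")

print(hdparm_standby_value(30))  # 241, the value Dan uses for 30 minutes
```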
Re: [linux-lvm] 2.6.22-rc4 XFS fails after hibernate/resume
On Mon, Jun 18, 2007 at 08:49:34AM +0100, David Greaves wrote:
> David Greaves wrote:
> > OK, that gave me an idea.
> >
> > Freeze the filesystem
> > md5sum the lvm
> > hibernate
> > resume
> > md5sum the lvm
> >
> > So the lvm and below looks OK...
> >
> > I'll see how it behaves now the filesystem has been frozen/thawed over
> > the hibernate...
>
> And it appears to behave well. (A few hours compile/clean cycling kernel
> builds on that filesystem were OK).
>
> Historically I've done:
>   sync
>   echo platform > /sys/power/disk
>   echo disk > /sys/power/state
>   # resume
> and had filesystem corruption (only on this machine, my other hibernating
> xfs machines don't have this problem)
>
> So doing:
>   xfs_freeze -f /scratch
>   sync
>   echo platform > /sys/power/disk
>   echo disk > /sys/power/state
>   # resume
>   xfs_freeze -u /scratch
> Works (for now - more usage testing tonight)

Verrry interesting.

What you were seeing was an XFS shutdown occurring because the free space btree was corrupted. IOWs, the process of suspend/resume has resulted in either bad data being written to disk, the correct data not being written to disk, or the cached block being corrupted in memory.

If you run xfs_check on the filesystem after it has shut down after a resume, can you tell us if it reports on-disk corruption? Note: do not run xfs_repair to check this - it does not check the free space btrees; instead it simply rebuilds them from scratch. If xfs_check reports an error, then run xfs_repair to fix it up.

FWIW, I'm on record stating that "sync" is not sufficient to quiesce an XFS filesystem for a suspend/resume to work safely, and have argued that the only safe thing to do is freeze the filesystem before suspend and thaw it after resume. This is why I originally asked you to test that with the other problem that you reported. Up until this point in time, there's been no evidence to prove either side of the argument.

Cheers,

Dave.
--
Dave Chinner
Principal Engineer
SGI Australian Software Group
resync to last 27h - usually 3. what's this?
Booted today, got this in dmesg:

[   44.884915] md: bind
[   44.885150] md: bind
[   44.885352] md: bind
[   44.885552] md: bind
[   44.885601] md: kicking non-fresh sdd1 from array!
[   44.885637] md: unbind
[   44.885671] md: export_rdev(sdd1)
[   44.900824] raid5: device sdc1 operational as raid disk 1
[   44.900860] raid5: device sdb1 operational as raid disk 3
[   44.900895] raid5: device sda1 operational as raid disk 2
[   44.901207] raid5: allocated 4203kB for md0
[   44.901241] raid5: raid level 5 set md0 active with 3 out of 4 devices, algorithm 2
[   44.901284] RAID5 conf printout:
[   44.901317]  --- rd:4 wd:3
[   44.901349]  disk 1, o:1, dev:sdc1
[   44.901381]  disk 2, o:1, dev:sda1
[   44.901414]  disk 3, o:1, dev:sdb1

Checked the disk, seemed fine (not the first time linux kicked a disk for no apparent reason), re-added it with mdadm, which triggered a resync. Now having a look at it I get:

$ cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdd[4] sdc1[1] sdb1[3] sda1[2]
      732563712 blocks level 5, 32k chunk, algorithm 2 [4/3] [_UUU]
      [=>...]  recovery =  8.1% (19867520/244187904) finish=1661.6min speed=2248K/sec

1661 minutes is *way* too long. It's a 4x250GiB SATA array and usually takes 3 hours to resync or check, for that matter.

So, what's this?

--
-BEGIN GEEK CODE BLOCK-
Version: 3.12
GCS d--(+)@ s-:+ a- C UL++ P+>++ L+++> E-- W++ N o? K- w--(---) !O M+ V- PS+ PE Y++ PGP t++(---)@ 5 X+(++) R+(++) tv--(+)@ b++(+++) DI+++ D- G++ e* h>++ r* y?
--END GEEK CODE BLOCK--
http://www.stop1984.com http://www.againsttcpa.com
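A quick check of the figures in that mdstat output (a sketch; numbers taken straight from the recovery line above) shows the ETA is just arithmetic on the reported rate, so the anomaly is the 2248 K/sec rate itself, not the estimate:

```python
# Figures from the /proc/mdstat output above (units: 1K blocks, K/sec).
total_blocks = 244_187_904   # per-device blocks to recover
done_blocks = 19_867_520     # blocks already done (8.1%)
speed_kps = 2248             # current recovery rate

remaining_min = (total_blocks - done_blocks) / speed_kps / 60
print(f"{remaining_min:.0f} minutes")  # ~1663, consistent with finish=1661.6min

# The usual ~3-hour resync implies the array normally recovers at roughly:
usual_kps = total_blocks / (3 * 3600)
print(f"{usual_kps:.0f} K/sec")        # ~22600, i.e. ~22 MB/s vs. 2.2 MB/s here
```

So the array really is resyncing at a tenth of its usual rate; the question is why the rate dropped, not whether the kernel miscalculated the finish time.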
Re: resync to last 27h - usually 3. what's this?
Dexter Filmore wrote:
> 1661 minutes is *way* too long. It's a 4x250GiB SATA array and usually
> takes 3 hours to resync or check, for that matter.
>
> So, what's this?

kernel, mdadm versions?

I seem to recall a long-fixed ETA calculation bug some time back...

David
suggestions for inexpensive linux jbod?
I'm creating a larger backup server that uses bacula (this software works well). The way I'm going about this, I need lots of space in the filesystem where temporary files are stored.

I have been looking at the Norco (link at the bottom), but there seem to be some grumblings that the adapter card does not play well with linux. Has anyone used this device, or have another suggestion? I'm looking for something that will present lots of disk to the linux box (Fedora Core 5, kernel 2.6.20) that I will put under md/RAID and LVM. I want to have between 1.5TB and 3.0TB of usable space after all the RAID'ing.

Mike

http://www.newegg.com/Product/Product.aspx?Item=N82E16816133001
Re: resync to last 27h - usually 3. what's this?
On Monday 18 June 2007 17:22:06 David Greaves wrote:
> Dexter Filmore wrote:
> > 1661 minutes is *way* too long. It's a 4x250GiB SATA array and usually
> > takes 3 hours to resync or check, for that matter.
> >
> > So, what's this?
>
> kernel, mdadm versions?
>
> I seem to recall a long-fixed ETA calculation bug some time back...

2.6.21.1, mdadm 2.5.3. First time I've synced since the upgrade from 2.6.17.
Definitely no calc bug - only 4% progress in one hour.
Re: suggestions for inexpensive linux jbod?
On Mon, 18 Jun 2007, Mike wrote:
> I'm creating a larger backup server that uses bacula (this software works
> well). The way I'm going about this, I need lots of space in the
> filesystem where temporary files are stored.
>
> I have been looking at the Norco (link at the bottom), but there seem to
> be some grumblings that the adapter card does not play well with linux.
> Has anyone used this device, or have another suggestion? I'm looking for
> something that will present lots of disk to the linux box (Fedora Core 5,
> kernel 2.6.20) that I will put under md/RAID and LVM. I want to have
> between 1.5TB and 3.0TB of usable space after all the RAID'ing.
>
> Mike
>
> http://www.newegg.com/Product/Product.aspx?Item=N82E16816133001

Get 3 Hitachi 1TB drives and use SW RAID5 on an Intel 965 motherboard, OR use PCI-e cards that use the Silicon Image chipset.

04:00.0 RAID bus controller: Silicon Image, Inc. SiI 3132 Serial ATA Raid II Controller (rev 01)

Justin.
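For sizing, software RAID5 gives (n-1) drives' worth of usable capacity, so Justin's three 1TB drives land at the low end of Mike's 1.5-3.0TB target, and a fourth drive would reach the top of it. A trivial sketch (function name is mine, just to show the arithmetic):

```python
def raid5_usable_tb(n_drives, drive_tb):
    """Usable capacity of an n-drive RAID5: one drive's worth goes to parity."""
    return (n_drives - 1) * drive_tb

print(raid5_usable_tb(3, 1.0))  # 2.0 TB usable from three 1TB drives
print(raid5_usable_tb(4, 1.0))  # 3.0 TB usable from four
```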
Re: resync to last 27h - usually 3. what's this?
On Mon, 18 Jun 2007, Dexter Filmore wrote:
> On Monday 18 June 2007 17:22:06 David Greaves wrote:
> > Dexter Filmore wrote:
> > > 1661 minutes is *way* too long. It's a 4x250GiB SATA array and usually
> > > takes 3 hours to resync or check, for that matter.
> > >
> > > So, what's this?
> >
> > kernel, mdadm versions?
> >
> > I seem to recall a long-fixed ETA calculation bug some time back...
>
> 2.6.21.1, mdadm 2.5.3. First time I've synced since the upgrade from
> 2.6.17. Definitely no calc bug - only 4% progress in one hour.

Hi.

What is your stripe size? What if you set this higher? By default it will use 1MB/s or so if you do not FORCE it to use more than the idle I/O on the box.

  echo "Setting minimum and maximum resync speed to 30MB/s..."
  echo 30000 > /sys/block/md0/md/sync_speed_min
  echo 30000 > /sys/block/md0/md/sync_speed_max
  echo 30000 > /sys/block/md1/md/sync_speed_min
  echo 30000 > /sys/block/md1/md/sync_speed_max
  echo 30000 > /sys/block/md2/md/sync_speed_min
  echo 30000 > /sys/block/md2/md/sync_speed_max
  echo 30000 > /sys/block/md3/md/sync_speed_min
  echo 30000 > /sys/block/md3/md/sync_speed_max
Re: limits on raid
[EMAIL PROTECTED] wrote:
> in my case it takes 2+ days to resync the array before I can do any
> performance testing with it. for some reason it's only doing the rebuild
> at ~5M/sec (even though I've increased the min and max rebuild speeds and
> a dd to the array seems to be ~44M/sec, even during the rebuild)

With performance like that, it sounds like you're saturating a bus somewhere along the line. If you're using SCSI, for instance, it's very easy for a long chain of drives to overwhelm a channel. You might also want to consider some other RAID layouts like 1+0 or 5+0 depending upon your space vs. reliability needs.

--
Brendan Conoboy / Red Hat, Inc. / [EMAIL PROTECTED]
stop sync?
How do I stop a running sync? I just figured out that I --add-ed an entire disk instead of the partition.

What do I do anyway - remove the disk, set it as faulty?
Re: limits on raid
On Mon, 18 Jun 2007, Brendan Conoboy wrote:
> [EMAIL PROTECTED] wrote:
>> in my case it takes 2+ days to resync the array before I can do any
>> performance testing with it. for some reason it's only doing the rebuild
>> at ~5M/sec (even though I've increased the min and max rebuild speeds and
>> a dd to the array seems to be ~44M/sec, even during the rebuild)
>
> With performance like that, it sounds like you're saturating a bus
> somewhere along the line. If you're using scsi, for instance, it's very
> easy for a long chain of drives to overwhelm a channel. You might also
> want to consider some other RAID layouts like 1+0 or 5+0 depending upon
> your space vs. reliability needs.

I plan to test the different configurations.

However, if I was saturating the bus with the reconstruct, how can I fire off a dd if=/dev/zero of=/mnt/test and get ~45M/sec while only slowing the reconstruct to ~4M/sec?

I'm putting 10x as much data through the bus at that point; it would seem to prove that it's not the bus that's saturated.

David Lang
Re: limits on raid
[EMAIL PROTECTED] wrote:
> I plan to test the different configurations.
>
> However, if I was saturating the bus with the reconstruct, how can I fire
> off a dd if=/dev/zero of=/mnt/test and get ~45M/sec while only slowing
> the reconstruct to ~4M/sec?
>
> I'm putting 10x as much data through the bus at that point; it would seem
> to prove that it's not the bus that's saturated.

I am unconvinced. If you take ~1MB/s for each active drive and add in SCSI overhead, 45M/sec seems reasonable. Have you looked at a running iostat while all this is going on? Try it out - add up the kb/s from each drive and see how close you are to your maximum theoretical IO. Also, how's your CPU utilization?

--
Brendan Conoboy / Red Hat, Inc. / [EMAIL PROTECTED]
Re: limits on raid
On Mon, 18 Jun 2007, Lennart Sorensen wrote:
> On Mon, Jun 18, 2007 at 10:28:38AM -0700, [EMAIL PROTECTED] wrote:
>> I plan to test the different configurations.
>>
>> However, if I was saturating the bus with the reconstruct, how can I
>> fire off a dd if=/dev/zero of=/mnt/test and get ~45M/sec while only
>> slowing the reconstruct to ~4M/sec?
>>
>> I'm putting 10x as much data through the bus at that point; it would
>> seem to prove that it's not the bus that's saturated.
>
> dd at 45MB/s from the raid sounds reasonable. If you have 45 drives,
> doing a resync of raid5 or raid6 should probably involve reading all the
> disks, and writing new parity data to one drive. So if you are writing
> 5MB/s, then you are reading 44*5MB/s from the other drives, which is
> 220MB/s. If your resync drops to 4MB/s when doing dd, then you have
> 44*4MB/s, which is 176MB/s, or 44MB/s less read capacity, which
> surprisingly seems to match the dd speed you are getting. Seems like you
> are indeed very much saturating a bus somewhere. The numbers certainly
> agree with that theory.
>
> What kind of setup are the drives connected to?

Simple ultra-wide SCSI to a single controller.

I didn't realize that the rate reported by /proc/mdstat was the write speed that was taking place; I thought it was the total data rate (reads + writes). The next time this message gets changed, it would be a good thing to clarify this.

David Lang
Re: limits on raid
On Mon, 18 Jun 2007, Brendan Conoboy wrote:
> [EMAIL PROTECTED] wrote:
>> I plan to test the different configurations.
>>
>> However, if I was saturating the bus with the reconstruct, how can I
>> fire off a dd if=/dev/zero of=/mnt/test and get ~45M/sec while only
>> slowing the reconstruct to ~4M/sec?
>>
>> I'm putting 10x as much data through the bus at that point; it would
>> seem to prove that it's not the bus that's saturated.
>
> I am unconvinced. If you take ~1MB/s for each active drive and add in
> SCSI overhead, 45M/sec seems reasonable. Have you looked at a running
> iostat while all this is going on? Try it out - add up the kb/s from
> each drive and see how close you are to your maximum theoretical IO.

I didn't try iostat. I did look at vmstat, and there the numbers look even worse: the bo column is ~500 for the resync by itself, but with the dd it's ~50,000. When I get access to the box again I'll try iostat to get more details.

> Also, how's your CPU utilization?

~30% of one CPU for the raid6 thread, ~5% of one CPU for the resync thread.

David Lang
Re: limits on raid
On Mon, Jun 18, 2007 at 11:12:45AM -0700, [EMAIL PROTECTED] wrote: > simple ultra-wide SCSI to a single controller. Hmm, isn't ultra-wide limited to 40MB/s? Is it Ultra320 wide? That could do a lot more, and 220MB/s sounds plausible for 320 SCSI. > I didn't realize that the rate reported by /proc/mdstat was the write > speed that was taking place, I thought it was the total data rate (reads > + writes). the next time this message gets changed it would be a good > thing to clarify this. Well, I suppose it could make sense to show the rate of rebuild, which you can then compare against the total size of the raid, or you can show the rate of write, which you then compare against the size of the drive being synced. Certainly I would expect much higher speeds if it was the overall raid size, while the numbers seem pretty reasonable as a write speed. 4MB/s would take forever if it was the overall raid resync speed. I usually see SATA raid1 resync at 50 to 60MB/s or so, which matches the read and write speeds of the drives in the raid. -- Len Sorensen
Re: limits on raid
On Mon, Jun 18, 2007 at 10:28:38AM -0700, [EMAIL PROTECTED] wrote: > I plan to test the different configurations. > > however, if I was saturating the bus with the reconstruct how can I fire > off a dd if=/dev/zero of=/mnt/test and get ~45M/sec while only slowing the > reconstruct to ~4M/sec? > > I'm putting 10x as much data through the bus at that point, it would seem > to prove that it's not the bus that's saturated. dd 45MB/s from the raid sounds reasonable. If you have 45 drives, doing a resync of raid5 or raid6 should probably involve reading all the disks, and writing new parity data to one drive. So if you are writing 5MB/s, then you are reading 44*5MB/s from the other drives, which is 220MB/s. If your resync drops to 4MB/s when doing dd, then you have 44*4MB/s, which is 176MB/s, or 44MB/s less read capacity, which surprisingly seems to match the dd speed you are getting. Seems like you are indeed very much saturating a bus somewhere. The numbers certainly agree with that theory. What kind of setup are the drives connected to? -- Len Sorensen
Re: limits on raid
On Mon, 18 Jun 2007, Lennart Sorensen wrote: On Mon, Jun 18, 2007 at 11:12:45AM -0700, [EMAIL PROTECTED] wrote: simple ultra-wide SCSI to a single controller. Hmm, isn't ultra-wide limited to 40MB/s? Is it Ultra320 wide? That could do a lot more, and 220MB/s sounds plausible for 320 SCSI. yes, sorry, ultra 320 wide. I didn't realize that the rate reported by /proc/mdstat was the write speed that was taking place, I thought it was the total data rate (reads + writes). the next time this message gets changed it would be a good thing to clarify this. Well, I suppose it could make sense to show the rate of rebuild, which you can then compare against the total size of the raid, or you can show the rate of write, which you then compare against the size of the drive being synced. Certainly I would expect much higher speeds if it was the overall raid size, while the numbers seem pretty reasonable as a write speed. 4MB/s would take forever if it was the overall raid resync speed. I usually see SATA raid1 resync at 50 to 60MB/s or so, which matches the read and write speeds of the drives in the raid. As I read it right now, what happens is the worst of the two options: you show the total size of the array as the amount of work that needs to be done, but then show only the write speed as the rate of progress being made through the job. Total rebuild time was estimated at ~3200 min. David Lang
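Reading the /proc/mdstat speed as a per-drive write rate, the quoted ~3200 minute estimate is consistent with roughly 750GB member drives. That drive size is a guess for illustration; the thread never states it:

```python
def rebuild_minutes(drive_size_mb, write_mbs):
    """Time to rewrite one member drive at the mdstat-reported rate."""
    return drive_size_mb / write_mbs / 60.0

# A 750 GB drive resynced at 4 MB/s -> about 3125 minutes,
# in the same ballpark as the ~3200 min the kernel estimated.
eta = rebuild_minutes(750 * 1000, 4)
```

If the 4MB/s were instead progress through the whole ~30TB array, the estimate would have been over 100 days, which is the quickest way to tell which interpretation mdstat is using.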
Re: limits on raid
[EMAIL PROTECTED] wrote: yes, sorry, ultra 320 wide. Exactly how many channels and drives? -- Brendan Conoboy / Red Hat, Inc. / [EMAIL PROTECTED]
Re: [linux-lvm] 2.6.22-rc4 XFS fails after hibernate/resume
OK, just a quick ack. When I resumed tonight (having done a freeze/thaw over the suspend) some libata errors showed up during the resume and there was an eventual hard hang. Maybe I spoke too soon? I'm going to have to do some more testing... David Chinner wrote: On Mon, Jun 18, 2007 at 08:49:34AM +0100, David Greaves wrote: David Greaves wrote: So doing: xfs_freeze -f /scratch sync echo platform > /sys/power/disk echo disk > /sys/power/state # resume xfs_freeze -u /scratch Works (for now - more usage testing tonight) Verrry interesting. Good :) What you were seeing was an XFS shutdown occurring because the free space btree was corrupted. IOWs, the process of suspend/resume has resulted in either bad data being written to disk, the correct data not being written to disk, or the cached block being corrupted in memory. That's the kind of thing I was suspecting, yes. If you run xfs_check on the filesystem after it has shut down after a resume, can you tell us if it reports on-disk corruption? Note: do not run xfs_repair to check this - it does not check the free space btrees; instead it simply rebuilds them from scratch. If xfs_check reports an error, then run xfs_repair to fix it up. OK, I can try this tonight... FWIW, I'm on record stating that "sync" is not sufficient to quiesce an XFS filesystem for a suspend/resume to work safely and have argued that the only safe thing to do is freeze the filesystem before suspend and thaw it after resume. This is why I originally asked you to test that with the other problem that you reported. Up until this point in time, there's been no evidence to prove either side of the argument. Cheers, Dave.
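The freeze/suspend/thaw sequence from the thread can be wrapped so the thaw always runs, even if the suspend write fails partway. A sketch only: the `run` callable is injected (so the sequence can be exercised without root), and a real version would need error handling around each command:

```python
def hibernate_with_freeze(mountpoint, run):
    """Freeze an XFS filesystem around a suspend-to-disk cycle.

    `run` executes one shell command string; the thaw sits in a
    finally block so the filesystem is unfrozen even when the
    suspend itself raises.
    """
    run("xfs_freeze -f " + mountpoint)
    try:
        run("sync")
        run("echo platform > /sys/power/disk")
        run("echo disk > /sys/power/state")  # returns after resume
    finally:
        run("xfs_freeze -u " + mountpoint)

# Example: record the command sequence instead of executing it.
log = []
hibernate_with_freeze("/scratch", log.append)
```

Leaving the filesystem frozen after a failed suspend would hang any process that writes to it, which is why the unconditional thaw matters.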
Re: limits on raid
On Mon, 18 Jun 2007, Brendan Conoboy wrote: [EMAIL PROTECTED] wrote: yes, sorry, ultra 320 wide. Exactly how many channels and drives? One channel, 2 OS drives plus the 45 drives in the array. Yes, I realize that there will be bottlenecks with this; the large capacity is to handle longer history (it's going to be a 30TB circular buffer being fed by a pair of OC-12 links). It appears that my big mistake was not understanding what /proc/mdstat is telling me. David Lang
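For scale: a 30TB buffer fed by two OC-12s holds only a couple of days of history if both links run flat out. A back-of-envelope sketch, assuming full SONET line rate with no protocol overhead:

```python
OC12_MBIT = 622.08  # SONET OC-12 line rate, megabits/s

def buffer_retention_days(buffer_tb, n_links):
    """How long a circular buffer lasts if every link runs at line rate."""
    ingest_mb_per_s = n_links * OC12_MBIT / 8.0   # ~77.8 MB/s per link
    seconds = buffer_tb * 1e6 / ingest_mb_per_s   # 1 TB = 1e6 MB
    return seconds / 86400.0

# 30 TB fed by two saturated OC-12s -> a bit over 2 days of history.
days = buffer_retention_days(30, 2)
```

It also shows the write side of the workload: ~155 MB/s of sustained ingest is itself close to half the Ultra320 bus before any resync traffic is added.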
[PATCH git-md-accel 0/2] raid5 refactor, and pr_debug cleanup
Neil, The following two patches are the respin of the changes you suggested to "raid5: coding style cleanup / refactor". I have added them to the git-md-accel tree for a 2.6.23-rc1 pull. The full, rebased raid acceleration patchset will be sent for another round of review once I address Andrew's concerns about the commit messages. Dan Williams (2): raid5: refactor handle_stripe5 and handle_stripe6 raid5: replace custom debug print with standard pr_debug
[PATCH git-md-accel 1/2] raid5: refactor handle_stripe5 and handle_stripe6
handle_stripe5 and handle_stripe6 have very deep logic paths handling the various states of a stripe_head. By introducing the 'stripe_head_state' and 'r6_state' objects, large portions of the logic can be moved to sub-routines. 'struct stripe_head_state' consumes all of the automatic variables that previously stood alone in handle_stripe5,6. 'struct r6_state' contains the handle_stripe6 specific variables like p_failed and q_failed. One of the nice side effects of the 'stripe_head_state' change is that it allows for further reductions in code duplication between raid5 and raid6. The following new routines are shared between raid5 and raid6: handle_completed_write_requests handle_requests_to_failed_array handle_stripe_expansion Signed-off-by: Dan Williams <[EMAIL PROTECTED]> --- drivers/md/raid5.c | 1484 +--- include/linux/raid/raid5.h | 16 2 files changed, 733 insertions(+), 767 deletions(-) diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index 4f51dfa..68834d2 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -1326,6 +1326,604 @@ static int stripe_to_pdidx(sector_t stripe, raid5_conf_t *conf, int disks) return pd_idx; } +static void +handle_requests_to_failed_array(raid5_conf_t *conf, struct stripe_head *sh, + struct stripe_head_state *s, int disks, + struct bio **return_bi) +{ + int i; + for (i = disks; i--;) { + struct bio *bi; + int bitmap_end = 0; + + if (test_bit(R5_ReadError, &sh->dev[i].flags)) { + mdk_rdev_t *rdev; + rcu_read_lock(); + rdev = rcu_dereference(conf->disks[i].rdev); + if (rdev && test_bit(In_sync, &rdev->flags)) + /* multiple read failures in one stripe */ + md_error(conf->mddev, rdev); + rcu_read_unlock(); + } + spin_lock_irq(&conf->device_lock); + /* fail all writes first */ + bi = sh->dev[i].towrite; + sh->dev[i].towrite = NULL; + if (bi) { + s->to_write--; + bitmap_end = 1; + } + + if (test_and_clear_bit(R5_Overlap, &sh->dev[i].flags)) + wake_up(&conf->wait_for_overlap); + + while (bi && bi->bi_sector < + sh->dev[i].sector 
+ STRIPE_SECTORS) { + struct bio *nextbi = r5_next_bio(bi, sh->dev[i].sector); + clear_bit(BIO_UPTODATE, &bi->bi_flags); + if (--bi->bi_phys_segments == 0) { + md_write_end(conf->mddev); + bi->bi_next = *return_bi; + *return_bi = bi; + } + bi = nextbi; + } + /* and fail all 'written' */ + bi = sh->dev[i].written; + sh->dev[i].written = NULL; + if (bi) bitmap_end = 1; + while (bi && bi->bi_sector < + sh->dev[i].sector + STRIPE_SECTORS) { + struct bio *bi2 = r5_next_bio(bi, sh->dev[i].sector); + clear_bit(BIO_UPTODATE, &bi->bi_flags); + if (--bi->bi_phys_segments == 0) { + md_write_end(conf->mddev); + bi->bi_next = *return_bi; + *return_bi = bi; + } + bi = bi2; + } + + /* fail any reads if this device is non-operational */ + if (!test_bit(R5_Insync, &sh->dev[i].flags) || + test_bit(R5_ReadError, &sh->dev[i].flags)) { + bi = sh->dev[i].toread; + sh->dev[i].toread = NULL; + if (test_and_clear_bit(R5_Overlap, &sh->dev[i].flags)) + wake_up(&conf->wait_for_overlap); + if (bi) s->to_read--; + while (bi && bi->bi_sector < + sh->dev[i].sector + STRIPE_SECTORS) { + struct bio *nextbi = + r5_next_bio(bi, sh->dev[i].sector); + clear_bit(BIO_UPTODATE, &bi->bi_flags); + if (--bi->bi_phys_segments == 0) { + bi->bi_next = *return_bi; + *return_bi = bi; + } + bi = nextbi; + } + } + spin_unlock_irq(&conf->device_lock); + if (bitmap_end) +
[PATCH git-md-accel 2/2] raid5: replace custom debug print with standard pr_debug
Replaces PRINTK with pr_debug, and kills the RAID5_DEBUG definition in favor of the global DEBUG definition. To get local debug messages just add '#define DEBUG' to the top of the file. Signed-off-by: Dan Williams <[EMAIL PROTECTED]> --- drivers/md/raid5.c | 116 ++-- 1 files changed, 58 insertions(+), 58 deletions(-) diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c index 68834d2..fa562e7 100644 --- a/drivers/md/raid5.c +++ b/drivers/md/raid5.c @@ -80,7 +80,6 @@ /* * The following can be used to debug the driver */ -#define RAID5_DEBUG0 #define RAID5_PARANOIA 1 #if RAID5_PARANOIA && defined(CONFIG_SMP) # define CHECK_DEVLOCK() assert_spin_locked(&conf->device_lock) @@ -88,8 +87,7 @@ # define CHECK_DEVLOCK() #endif -#define PRINTK(x...) ((void)(RAID5_DEBUG && printk(x))) -#if RAID5_DEBUG +#ifdef DEBUG #define inline #define __inline__ #endif @@ -152,7 +150,8 @@ static void release_stripe(struct stripe_head *sh) static inline void remove_hash(struct stripe_head *sh) { - PRINTK("remove_hash(), stripe %llu\n", (unsigned long long)sh->sector); + pr_debug("remove_hash(), stripe %llu\n", + (unsigned long long)sh->sector); hlist_del_init(&sh->hash); } @@ -161,7 +160,8 @@ static inline void insert_hash(raid5_conf_t *conf, struct stripe_head *sh) { struct hlist_head *hp = stripe_hash(conf, sh->sector); - PRINTK("insert_hash(), stripe %llu\n", (unsigned long long)sh->sector); + pr_debug("insert_hash(), stripe %llu\n", + (unsigned long long)sh->sector); CHECK_DEVLOCK(); hlist_add_head(&sh->hash, hp); @@ -226,7 +226,7 @@ static void init_stripe(struct stripe_head *sh, sector_t sector, int pd_idx, int BUG_ON(test_bit(STRIPE_HANDLE, &sh->state)); CHECK_DEVLOCK(); - PRINTK("init_stripe called, stripe %llu\n", + pr_debug("init_stripe called, stripe %llu\n", (unsigned long long)sh->sector); remove_hash(sh); @@ -260,11 +260,11 @@ static struct stripe_head *__find_stripe(raid5_conf_t *conf, sector_t sector, in struct hlist_node *hn; CHECK_DEVLOCK(); - PRINTK("__find_stripe, sector 
%llu\n", (unsigned long long)sector); + pr_debug("__find_stripe, sector %llu\n", (unsigned long long)sector); hlist_for_each_entry(sh, hn, stripe_hash(conf, sector), hash) if (sh->sector == sector && sh->disks == disks) return sh; - PRINTK("__stripe %llu not in cache\n", (unsigned long long)sector); + pr_debug("__stripe %llu not in cache\n", (unsigned long long)sector); return NULL; } @@ -276,7 +276,7 @@ static struct stripe_head *get_active_stripe(raid5_conf_t *conf, sector_t sector { struct stripe_head *sh; - PRINTK("get_stripe, sector %llu\n", (unsigned long long)sector); + pr_debug("get_stripe, sector %llu\n", (unsigned long long)sector); spin_lock_irq(&conf->device_lock); @@ -537,8 +537,8 @@ static int raid5_end_read_request(struct bio * bi, unsigned int bytes_done, if (bi == &sh->dev[i].req) break; - PRINTK("end_read_request %llu/%d, count: %d, uptodate %d.\n", - (unsigned long long)sh->sector, i, atomic_read(&sh->count), + pr_debug("end_read_request %llu/%d, count: %d, uptodate %d.\n", + (unsigned long long)sh->sector, i, atomic_read(&sh->count), uptodate); if (i == disks) { BUG(); @@ -613,7 +613,7 @@ static int raid5_end_write_request (struct bio *bi, unsigned int bytes_done, if (bi == &sh->dev[i].req) break; - PRINTK("end_write_request %llu/%d, count %d, uptodate: %d.\n", + pr_debug("end_write_request %llu/%d, count %d, uptodate: %d.\n", (unsigned long long)sh->sector, i, atomic_read(&sh->count), uptodate); if (i == disks) { @@ -658,7 +658,7 @@ static void error(mddev_t *mddev, mdk_rdev_t *rdev) { char b[BDEVNAME_SIZE]; raid5_conf_t *conf = (raid5_conf_t *) mddev->private; - PRINTK("raid5: error called\n"); + pr_debug("raid5: error called\n"); if (!test_bit(Faulty, &rdev->flags)) { set_bit(MD_CHANGE_DEVS, &mddev->flags); @@ -929,7 +929,7 @@ static void compute_block(struct stripe_head *sh, int dd_idx) int i, count, disks = sh->disks; void *ptr[MAX_XOR_BLOCKS], *dest, *p; - PRINTK("compute_block, stripe %llu, idx %d\n", + pr_debug("compute_block, stripe 
%llu, idx %d\n", (unsigned long long)sh->sector, dd_idx); dest = page_address(sh->dev[dd_idx].page); @@ -960,7 +960,7 @@ static void compute_parity5(struct stripe_head *sh, int method) void *ptr[MAX_XOR_BLOCKS], *dest; struct bio *chosen; - PRINTK("compute_parity5, strip
Re: Software based SATA RAID-5 expandable arrays?
Why dontcha just cut all the "look how big my ePenis is" chatter and tell us what you wanna do? Nobody gives a rat if your ultra1337 sound cards needs a 10 megawatt power supply. -- -BEGIN GEEK CODE BLOCK- Version: 3.12 GCS d--(+)@ s-:+ a- C UL++ P+>++ L+++> E-- W++ N o? K- w--(---) !O M+ V- PS+ PE Y++ PGP t++(---)@ 5 X+(++) R+(++) tv--(+)@ b++(+++) DI+++ D- G++ e* h>++ r* y? --END GEEK CODE BLOCK-- http://www.stop1984.com http://www.againsttcpa.com
Re: limits on raid
[EMAIL PROTECTED] wrote: > On Mon, 18 Jun 2007, Brendan Conoboy wrote: > > >[EMAIL PROTECTED] wrote: > >> yes, sorry, ultra 320 wide. > > > >Exactly how many channels and drives? > > one channel, 2 OS drives plus the 45 drives in the array. Given that the drives only have 4 ID bits, how can you have 47 drives on 1 cable? You'd need a minimum of 3 channels for 47 drives. Do you have some sort of external box that holds X number of drives and only uses a single ID? -- Lab tests show that use of micro$oft causes cancer in lab animals Got Gas???
Re: limits on raid
On Mon, 18 Jun 2007, Wakko Warner wrote: [EMAIL PROTECTED] wrote: On Mon, 18 Jun 2007, Brendan Conoboy wrote: [EMAIL PROTECTED] wrote: yes, sorry, ultra 320 wide. Exactly how many channels and drives? one channel, 2 OS drives plus the 45 drives in the array. Given that the drives only have 4 ID bits, how can you have 47 drives on 1 cable? You'd need a minimum of 3 channels for 47 drives. Do you have some sort of external box that holds X number of drives and only uses a single ID? Yes, I'm using Promise drive shelves; I have them configured to export the 15 drives as 15 LUNs on a single ID. I'm going to be using this as a huge circular buffer that, 99% of the time, will just be overwritten eventually; once in a while I will need to go back into the buffer and extract and process the data. David Lang
Re: limits on raid
[EMAIL PROTECTED] wrote: yes, I'm using Promise drive shelves, I have them configured to export the 15 drives as 15 LUNs on a single ID. Well, that would account for it. Your bus is very, very saturated. If all your drives are active, you can't get more than ~7MB/s per disk under perfect conditions. -- Brendan Conoboy / Red Hat, Inc. / [EMAIL PROTECTED]
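Brendan's ~7MB/s figure is just the nominal bus rate divided among the drives sharing it. A sketch of the best case, ignoring SCSI protocol overhead:

```python
def per_disk_mbs(bus_mbs, active_disks):
    """Best-case per-disk throughput when one bus is shared equally."""
    return bus_mbs / active_disks

# One Ultra320 segment shared by the 45 array drives plus 2 OS drives:
share = per_disk_mbs(320, 47)   # ~6.8 MB/s per disk, the ~7 MB/s quoted
```

Real Ultra320 rarely delivers its full nominal rate, so the practical per-disk figure would be somewhat lower still.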
Re: [PATCH git-md-accel 1/2] raid5: refactor handle_stripe5 and handle_stripe6
On 6/18/07, Dan Williams <[EMAIL PROTECTED]> wrote: ... +static void handle_stripe_expansion(raid5_conf_t *conf, struct stripe_head *sh, + struct r6_state *r6s) +{ + int i; + + /* We have read all the blocks in this stripe and now we need to +* copy some of them into a target stripe for expand. +*/ + clear_bit(STRIPE_EXPAND_SOURCE, &sh->state); + for (i = 0; i < sh->disks; i++) + if (i != sh->pd_idx && (r6s && i != r6s->qd_idx)) { + int dd_idx, pd_idx, j; + struct stripe_head *sh2; + + sector_t bn = compute_blocknr(sh, i); + sector_t s = raid5_compute_sector(bn, conf->raid_disks, + conf->raid_disks-1, &dd_idx, + &pd_idx, conf); this bug made it through the regression test: 'conf->raid_disks-1' should be 'conf->raid_disks - conf->max_degraded' -- Dan
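The fix matters because the two levels reserve different numbers of parity devices per stripe, so hard-coding raid_disks-1 silently assumes raid5. A sketch of the relationship (names are illustrative, not the kernel's):

```python
MAX_DEGRADED = {"raid5": 1, "raid6": 2}  # parity devices per stripe

def data_disks(level, raid_disks):
    """Data devices per stripe: total members minus parity devices.

    Using raid_disks - 1 for raid6, as the unpatched code did,
    over-counts the data disks by one, so sector-to-stripe mapping
    lands on the wrong stripe during expansion."""
    return raid_disks - MAX_DEGRADED[level]

five = data_disks("raid5", 6)   # 5 data disks
six = data_disks("raid6", 6)    # 4 data disks, not 5
```

This is exactly the kind of duplication-induced bug the shared helper routines in patch 1/2 are meant to flush out.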
Re: suggestions for inexpensive linux jbod?
In article <[EMAIL PROTECTED]>, Justin Piszcz wrote: > > > On Mon, 18 Jun 2007, Mike wrote: > >> I'm creating a larger backup server that uses bacula (this >> software works well). The way I'm going about this I need >> lots of space in the filesystem where temporary files are >> stored. I have been looking at the Norco (link at the bottom), >> but there seem to be some grumblings that the adapter card >> does not play well with linux. >> >> Has anyone used this device or have another suggestion? I'm >> looking at something that will present lots of disk to the >> linux box (Fedora Core 5, kernel 2.6.20) that I will put >> under md/RAID and LVM. I want to have between 1.5TB and 3.0TB >> of usable space after all the RAID'ing. >> >> Mike >> >> http://www.newegg.com/Product/Product.aspx?Item=N82E16816133001 > > Get 3 Hitachi 1TB drives and use SW RAID5 on an Intel 965 motherboard OR > use PCI-e cards that use the Silicon Image chipset. > > 04:00.0 RAID bus controller: Silicon Image, Inc. SiI 3132 Serial ATA Raid > II Controller (rev 01) > > Justin. Any idea if the Hitachi 1TB drives will work in a Dell PowerEdge 800? I have two 80GB drives in the box right now that I would like to replace with four of the 1TB drives. Mike
Re: suggestions for inexpensive linux jbod?
> Get 3 Hitachi 1TB drives and use SW RAID5 on an Intel 965 motherboard OR > use PCI-e cards that use the Silicon Image chipset. > > 04:00.0 RAID bus controller: Silicon Image, Inc. SiI 3132 Serial ATA Raid > II Controller (rev 01) What he said. After some thinking, I loaded up a machine with 3x SiI3132 and figured I'll expand by port multiplier. Port multipliers are documented in the standard, not vendor-defined, and are currently made only by, guess who, Silicon Image. I haven't used any yet, but apparently they're working, and you know that both the manufacturer and the linux-ide developers have tested them thoroughly with SiI controllers. 03:00.0 Mass storage controller: Silicon Image, Inc. SiI 3132 Serial ATA Raid II Controller (rev 01) 04:00.0 Mass storage controller: Silicon Image, Inc. SiI 3132 Serial ATA Raid II Controller (rev 01) 05:00.0 Mass storage controller: Silicon Image, Inc. SiI 3132 Serial ATA Raid II Controller (rev 01) md4 : active raid10 sdf3[4] sde3[3] sdd3[2] sdc3[1] sdb3[0] sda3[5] 131837184 blocks 256K chunks 2 near-copies [6/6] [UU] bitmap: 2/126 pages [8KB], 512KB chunk md5 : active raid5 sdf4[5] sde4[4] sdd4[3] sdc4[2] sdb4[1] sda4[0] 1719155200 blocks level 5, 64k chunk, algorithm 2 [6/6] [UU] bitmap: 27/164 pages [108KB], 1024KB chunk md0 : active raid1 sdf1[5] sde1[4] sdd1[3] sdc1[2] sdb1[1] sda1[0] 979840 blocks [6/6] [UU] bitmap: 0/120 pages [0KB], 4KB chunk md0 is the boot partition, md4 is root, and md5 is the main backup data. Note the way the drives are arranged on the RAID-10; mirrored pairs are split across different SATA controllers.
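The pairing trick in md4 - each near-copy mirror spanning two different controllers - amounts to round-robining drives across controllers when listing them to mdadm. A sketch of the ordering logic; the device and controller assignments are illustrative:

```python
def interleave_for_raid10(controllers):
    """Order drives so each consecutive near-copy pair of a near-2
    raid10 lands on two different controllers.

    `controllers` maps controller id -> list of its drives; taking
    one drive from each controller in turn keeps adjacent devices
    on different controllers (given at least two controllers)."""
    order = []
    pools = [list(drives) for drives in controllers.values()]
    while any(pools):
        for pool in pools:
            if pool:
                order.append(pool.pop(0))
    return order

# Three 2-port SiI3132 cards, two drives each:
layout = interleave_for_raid10({
    0: ["sda", "sdb"], 1: ["sdc", "sdd"], 2: ["sde", "sdf"],
})
# -> ['sda', 'sdc', 'sde', 'sdb', 'sdd', 'sdf']: pairs (sda,sdc),
#    (sde,sdb), (sdd,sdf) each span two controllers.
```

With that ordering, losing any one controller (or a whole port-multiplier chain behind it) degrades every mirror but breaks none of them.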