I'm again having problems with the disks getting kicked out of the
array :-o

First of all, the old WD Green 2TB disk that was marked as failed is also
causing problems in the Netgear ReadyNAS. I will check whether it is
still under warranty and try to get a replacement.

But the other issue scares me a bit ;-)

Here's what I've done so far:

Yesterday I set up md1 with the four new WD Black 2TB disks:
~$ mdadm -C /dev/md1 -c 128 -n4 -l5 /dev/sd[abcd]
~$ mdadm --readwrite /dev/md1
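
(For what it's worth, I believe the resync progress can be watched with
something like
~$ cat /proc/mdstat
~$ mdadm --detail /dev/md1
so I can tell when the build has finished.)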

Then I created md0 as a linear array on top of md1:
~$ mdadm -C /dev/md0 --force -n1 -l linear /dev/md1

On md0 I created the XFS filesystem:
~$ mkfs.xfs -d agcount=7,su=131072,sw=3 /dev/md0
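
(If I understood the stripe alignment correctly, su=131072 is the 128 KiB
chunk size in bytes, and sw=3 is the number of data disks in the
four-disk RAID5, i.e. 4 - 1 = 3, so one full stripe is 3 * 128 KiB =
384 KiB.)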

Then I copied everything from the old md9 RAID5 (the Samsung 1.5TB
drives) to md0.

Today I shut the server down and moved the motherboard, the OS HDD, the
Samsung 1.5TB drives from the old md9 array, and the MythTV recording
HDD into the Norco case.
Everything went well. I mounted the expander on the case wall and
secured the cables so they stay in place.

Then I booted up again and created md2 with the four Samsung 1.5TB disks:
~$ mdadm -C /dev/md2 -c 128 -n4 -l5 /dev/sd[efgh]
~$ mdadm --readwrite /dev/md2

After this I expanded the linear array:
~$ mdadm --grow /dev/md0 --add /dev/md2

and the filesystem:
~$ xfs_growfs /mnt/media-raid
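
(I assume the result can be checked with something like
~$ mdadm --detail /dev/md0
~$ xfs_info /mnt/media-raid
~$ df -h /mnt/media-raid
which should show both md1 and md2 as members of md0 and the new
filesystem size.)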

All this went well too.

But this evening I got 10 emails from mdadm. I've again pastebinned them
because I didn't want to paste them into this text:
http://pastebin.com/raw.php?i=ftpmfSpv


I wanted to reassemble the array:
~$ sudo mdadm -A /dev/md1 /dev/sd[abcd]
mdadm: cannot open device /dev/sda: Device or resource busy
mdadm: /dev/sda has no superblock - assembly aborted
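
My guess is that something still claims the disks (maybe a
half-assembled array), so perhaps I would have to stop it first and
then retry, something like
~$ sudo mdadm --stop /dev/md1
~$ sudo mdadm -A /dev/md1 /dev/sd[abcd]
but I'm not sure whether that is the right approach.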

Here's the output of blkid:
http://pastebin.com/raw.php?i=5AK0Eia1


> I forgot /var/log/dmesg only contains boot info.  Entries since boot
> are only available via the dmesg command.
> 
> ~$ dmesg|sendmail s...@hardwarefreak.com
> 
> should email your current dmesg output directly to me with no
> copy/paste required, assuming exim or postfix is installed.  If not
> you can use pastebin again.  I prefer it in email so I can quote
> interesting parts directly, properly.

I'm not sure whether the dmesg output helps with solving this problem
too. Unfortunately I couldn't email it, so I created a pastebin:
http://pastebin.com/raw.php?i=2pNf9wGe


> > I removed the 2 TB disks from the NAS and mounted them in the Norco
> > and connected to the server vio lsi and expander. On these WD
> > drives I created the raid5 (md1) and on top of that the linear
> > array (md0). Upon creation of md1 the fourth disk (sdd) was added
> > as a spare which I had to add manually by setting 
> > 
> > mdadm --readwrite /dev/md1
> 
> That's my fault.  Sorry.  I forgot to have you use "--force" when
> creating the RAID5s.  I overlooked this because I NEVER use md parity
> arrays, nor any parity arrays.  Reason for the spare:
> 
> "When creating a RAID5 array, mdadm will automatically create a
> degraded array with an extra spare drive. This is because building
> the spare into a degraded array is in general faster than resyncing
> the parity on a non-degraded, but not clean, array. This feature can
> be overridden with the --force option."

Thanks for the explanation and the hint. I will use --force from now
on :-)
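
So if I got it right, the creation command would then look like this:
~$ mdadm -C /dev/md1 --force -c 128 -n4 -l5 /dev/sd[abcd]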


> > While it was syncing the disks I copied the files from md9 to md0.
> > During this process sdb was set as faulty.
> 
> Probably too much IO load with the array sync + file copy.  Regardless
> of what anyone says, wait for md arrays to finish building/syncing
> before trying to put anything on top, whether another md layer,
> filesystem, or files.

I didn't read this before doing all the steps above. Maybe it would
have saved me some headaches...
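
Next time I will try to wait for the sync to finish first; I guess
something like
~$ sudo mdadm --wait /dev/md1
should block until the resync is done.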


> >>> That's why I'm already thinking of buying new disks.
> >>
> >> Well let's look at this more closely.  The disks may not be bad.
> >> How old are they?
> 
> You didn't answer.  How old are the 2TB and 1.5TB drives?  What does
> SMART say about /dev/sdb?

Here are the dates I bought the disks:

04.10.2009: 1x Samsung HD154UI
17.02.2010: 3x Samsung HD154UI

12.12.2010: 1x Western Digital Caviar Green 2TB
17.03.2011: 1x Western Digital Caviar Green 2TB
11.08.2011: 2x Western Digital Caviar Green 2TB
01.10.2011: 2x Western Digital Caviar Green 2TB

To be honest, I can't remember why I bought six of the WDs. But I have
sold at least one of them, and the fifth must have disappeared somehow ;-)

I have now stopped md0 and md2 and removed the Samsung and the WD Green
drives again. If you want me to post their details too, I will add them
again. But for now, here is the output of hdparm for the four remaining
drives:
http://pastebin.com/raw.php?i=xcD3mLUA
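
Regarding SMART: if it helps, I can also post the smartctl output for
sdb, e.g.
~$ sudo smartctl -a /dev/sdb
(assuming smartmontools is installed).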


Maybe the problem is now related to the case, since it's sdb again?
Or maybe the drive is already broken because I didn't cool the drives
while copying the files and rebuilding the spare.


> > Yes, sorry, it's absolutely fine. I was just curious because you wrote
> > "when the array fills up it gets slower". So I thought that when I add
> > four new disks I'll get free space added, the linear array won't be
> > as full as before, and so it could regain its previous speed again.
> 
> This is generally true and there are multiple reasons for it.  To
> explain them fully would occupy many chapters in a book, and I'm sure
> someone has already written on this subject.
> 
> In your case, using XFS atop a linear array, each time you add a new
> striped array underneath and grow XFS, access to space in the new
> striped array will generally be faster than into the sections of the
> filesystem that reside on the previous striped array(s) which are
> full, or near full.
> 
> One of the reasons is metadata lookup--where is the file I need to
> get? If a phone book has 10 entries it's very quick to look up any one
> entry.  What if it has 10 million entries?  Takes a bit longer.  I
> need to write a new 100GB file, where can I write it?  Oh, there's
> not a 100GB chunk of free space to hold the file.  Show me the table
> of empty spaces and their sizes.  Calculate the best combination of
> those spaces to split the file across.  The spaces are far apart on
> the device (array).  We go to each one and write a small piece of the
> file.
> 
> An hour later we want to read that file.  Where is the file?  Oh, it's
> here, and here, and here, and here and...  So we go here, read a
> chunk, go there read a chunk...
> 
> Those are just a couple of the reasons you slow down as your filesystem
> ages.  This is true of both arrays and single disks.  SSDs have no
> such limitations as the time to go from here to there retrieving file
> fragments is zero as there are no moving parts.
> 
> > But really not important for my case!
> > Just curiosity ;-)
> 
> I hope that was enough to satisfy your curiosity. :)  Plenty of people
> have written about it if you care to Google.

Thank you for the explanation. It's especially hard to get into a new
topic when one doesn't know what to ask Google :-)
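
Once the array is working again, I guess I could also look at how
fragmented the free space is, e.g. with
~$ sudo xfs_db -r -c freesp /dev/md0
if I understood correctly that this prints a histogram of the free
extents.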


> >>> No really. The adventure of enlarging my media server would have
> >>> ended in total frustration!
> >>
> >> There's still time for frustration--you're not done quite yet.  lol
> > 
> > Yes but now I'm in semi known territory ;-)
> 
> Heheh.  Yeah, at least you're starting to get a little solid footing
> under you.  I first started working with hardware RAID about 15 years
> ago when single drive throughput peaked at 15MB/s and you were lucky
> to get 115MB/s out of a 20 drive array due to the controllers being
> slow, and due to the PCI bus peaking at 115MB/s after protocol
> overhead when you used 2 or 4 controllers.  Now single drives do that
> rate routinely.

I fear the solid footing is already becoming loose :-o


Cheers
Ramon

