Recovering nested raid (was LSI MegaRAID SAS 9240-4i hangs system at boot)

2012-06-18 Thread Ramon Hofer
On Mon, 18 Jun 2012 00:46:55 +0200
Ramon Hofer ramonho...@bluewin.ch wrote:

 I'm again having problems with the disks getting kicked out of the
 array :-o

I've already asked this before on the debian list and got an answer.
But I'm not sure if I should do this.

Here's a link to my old problem:
http://lists.debian.org/debian-user/2012/04/msg01290.html

The answer from Daniel Koch (thx again) was:

 - Zero all the superblocks on all the disks.
 ~$ mdadm --zero-superblock /dev/sd{b..d}
 
 - Recreate the array with the --assume-clean option.
 ~$ mdadm --create --verbose /dev/md0 --auto=yes --assume-clean \
 --level=5 --raid-devices=3 /dev/sdb /dev/sdc /dev/sdd
 
 - Mark it possibly dirty with:
 ~$ mdadm --assemble /dev/md0 --update=resync
 
 - Let it resync
 
 - Mount it and see if it is restored

I'm not sure if this is the correct way here too because I have a
nested raid.

If yes then this should work for me now:

~$ mdadm --zero-superblock /dev/sd[abcd]
~$ mdadm --zero-superblock /dev/sd[efgh]

~$ mdadm --create --verbose /dev/md1 --auto=yes --assume-clean \
--level=5 --raid-devices=4 /dev/sd[abcd]
~$ mdadm --create --verbose /dev/md2 --auto=yes --assume-clean \
--level=5 --raid-devices=4 /dev/sd[efgh]

~$ mdadm --assemble /dev/md1 --update=resync
~$ mdadm --assemble /dev/md2 --update=resync

Now md0 should have its members back and I can start it again:
~$ mdadm -A /dev/md0 /dev/md[12]
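A minimal dry-run sketch of the per-leg re-create step, assuming (as in the original creation commands later in this thread, `mdadm -C /dev/mdN -c 128 -n4 -l5`) a 128 KiB chunk, RAID5 and four members. It only prints commands. Note that the plan above leaves out `-c 128`; with `--assume-clean`, chunk size, level, device count and member order must all match the original creation (and metadata/data-offset defaults can differ between mdadm versions), otherwise the data is read with the wrong layout. The member order should come from `mdadm --examine` on each disk before anything is overwritten, not from the current /dev/sdX letters. The function name is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: print (never run) the commands that would re-create
# one RAID5 leg of the nested array in place.
# Assumes the leg was originally built with: mdadm -C /dev/mdN -c 128 -n4 -l5 <disks>
# Pass the member disks in their ORIGINAL slot order (check mdadm --examine).
print_recreate_cmds() {
    local md=$1; shift
    local devs="" d
    for d in "$@"; do
        devs+=" /dev/$d"
    done
    echo "mdadm --stop /dev/$md"
    # --chunk is in KiB, so 128 matches the original -c 128.
    echo "mdadm --create /dev/$md --assume-clean --level=5 --chunk=128 --raid-devices=$#$devs"
}

# Example: both legs, then the linear md0 on top of them.
print_recreate_cmds md1 sda sdb sdc sdd
print_recreate_cmds md2 sde sdf sdg sdh
echo "mdadm --assemble /dev/md0 /dev/md1 /dev/md2"
```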

And if I'm very lucky this time I still have my data on the array :-)


Before I try this, I wanted to ask you whether it could help.
Maybe I should ask in the linux raid mailing list too?


Cheers
Ramon


-- 
To UNSUBSCRIBE, email to debian-user-requ...@lists.debian.org 
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: 
http://lists.debian.org/20120618122803.3df6f666@hoferr-x61s.hofer.rummelring



Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-06-17 Thread Ramon Hofer
I'm again having problems with the disks getting kicked out of the
array :-o

First of all, the old WD Green 2TB disk that was marked failed also
causes problems in the Netgear ReadyNAS. I will see if it is still under
warranty and try to get a new one.

But the other issue scares me a bit ;-)

Here's what I've done so far:

Yesterday I had setup md1 with the four new WD black 2TB disks
~$ mdadm -C /dev/md1 -c 128 -n4 -l5 /dev/sd[abcd]
~$ mdadm --readwrite /dev/md1

I created md0 with md1 as a linear array
~$ mdadm -C /dev/md0 --force -n1 -l linear /dev/md1

On md0 I created the xfs filesystem
~$ mkfs.xfs -d agcount=7,su=131072,sw=3 /dev/md0

Then I copied everything from the old md9 raid5 with the Samsung 1.5TB
to md0.

Today I shut the server down and mounted the mobo, the OS HDD, the Samsung
1.5 TB drives from the old md9 and the MythTV recording HDD in the
Norco.
Everything went well. I mounted the expander to the case wall and fixed
the cables to stay in place.

Then I booted up again and created md2 with the four Samsung 1.5TB disks
~$ mdadm -C /dev/md2 -c 128 -n4 -l5 /dev/sd[efgh]
~$ mdadm --readwrite /dev/md2

After this I expanded the linear array
~$ mdadm --grow /dev/md0 --add /dev/md2

and the filesystem
~$ xfs_growfs /mnt/media-raid

All this went well too.

But this evening I got 10 emails from mdadm. I've again pastebinned
them because I didn't want to add them to this text:
http://pastebin.com/raw.php?i=ftpmfSpv


I wanted to reassemble the array
~$ sudo mdadm -A /dev/md1 /dev/sd[abcd]
mdadm: cannot open device /dev/sda: Device or resource busy
mdadm: /dev/sda has no superblock - assembly aborted

Here's the output of blkid:
http://pastebin.com/raw.php?i=5AK0Eia1


 I forgot /var/log/dmesg only contains boot info.  Entries since boot
 are only available via the dmesg command.
 
 ~$ dmesg|sendmail s...@hardwarefreak.com
 
 should email your current dmesg output directly to me with no
 copy/paste required, assuming exim or postfix is installed.  If not
 you can use paste bin again.  I prefer it in email so I can quote
 interesting parts directly, properly.

I'm not sure if the dmesg you asked for helps solve this problem too.
Unfortunately I couldn't email it, so I created a pastebin:
http://pastebin.com/raw.php?i=2pNf9wGe


  I removed the 2 TB disks from the NAS and mounted them in the Norco
  and connected them to the server via the LSI and expander. On these WD
  drives I created the raid5 (md1) and on top of that the linear
  array (md0). Upon creation of md1 the fourth disk (sdd) was added
  as a spare which I had to add manually by setting 
  
  mdadm --readwrite /dev/md1
 
 That's my fault.  Sorry.  I forgot to have you use --force when
 creating the RAID5s.  I overlooked this because I NEVER use md parity
 arrays, nor any parity arrays.  Reason for the spare:
 
 When creating a RAID5 array, mdadm will automatically create a
 degraded array with an extra spare drive. This is because building
 the spare into a degraded array is in general faster than resyncing
 the parity on a non-degraded, but not clean, array. This feature can
 be overridden with the --force option.

Thanks for the explanation and the hint. I will use --force from now
on :-)


  While it was syncing the disks I copied the files from md9 to md0.
  During this process sdb was set as faulty.
 
 Probably too much IO load with the array sync + file copy.  Regardless
 of what anyone says, wait for md arrays to finish building/syncing
 before trying to put anything on top, whether another md layer,
 filesystem, or files.
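A small sketch of this "wait before stacking anything on top" rule: a hypothetical `md_busy` function that reports whether an array in an mdstat-format file (by default /proc/mdstat) still shows resync, recovery or reshape activity. mdadm can itself block until such activity ends with `mdadm --wait /dev/mdN`, which is the simpler choice in a script; the parser below only makes the check explicit and can be pointed at a saved copy of mdstat.

```bash
#!/bin/bash
# Hypothetical helper: succeed (exit 0) if the named md array is still
# resyncing, recovering or reshaping according to an mdstat-format file.
md_busy() {
    local dev=$1 mdstat=${2:-/proc/mdstat}
    awk -v dev="$dev" '
        $1 == dev    { inblk = 1; next }   # header line of the wanted array
        /^md[0-9]+ / { inblk = 0 }         # header of some other array
        /^[ \t]*$/   { inblk = 0 }         # blank line ends the block
        inblk && /resync|recovery|reshape/ { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$mdstat"
}

# Usage sketch: block until md1 is idle before running mkfs or copying files.
#   while md_busy md1; do sleep 60; done
```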

I didn't read this before doing all the stuff above. Maybe it would
have saved me from some headaches...


  That's why I'm already thinking of buying new disks.
 
  Well lets look at this more closely.  The disks may not be bad.
  How old are they?
 
 You didn't answer.  How old are the 2TB and 1.5TB drives?  What does
 SMART say about /dev/sdb?

Here are the dates I bought the disks:

04.10.2009: 1x Samsung HD154UI
17.02.2010: 3x Samsung HD154UI

12.12.2010: 1x Western Digital Caviar Green 2TB
17.03.2011: 1x Western Digital Caviar Green 2TB
11.08.2011: 2x Western Digital Caviar Green 2TB
01.10.2011: 2x Western Digital Caviar Green 2TB

To be honest I can't remember why I bought 6 of the WDs. But I have sold
at least one of them. The fifth must have disappeared somehow ;-)

I have now stopped md0 and md2 and removed the Samsung and the WD green
drives again. If you want me to post their details too, I will add
them again. But for now I have here the output of hdparm for the four
drives:
http://pastebin.com/raw.php?i=xcD3mLUA


Maybe the problem now is related to the case because it's again sdb?
Or maybe it's already broken because I didn't cool them while copying
the files and rebuilding the spare drive.


  Yes sorry it's absolutely fine. I was just curious because you wrote
  when the array fills up it gets slower. So I thought when I add
  four new disks I'll get free space added and the linear array won't
  be filled anymore as much 

Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-06-16 Thread Ramon Hofer
On Fri, 15 Jun 2012 16:40:56 -0500
Stan Hoeppner s...@hardwarefreak.com wrote:

 Well lets look at this more closely.  The disks may not be bad.  How
 old are they?  Send me your dmesg output:

Sorry I forgot to write this last time: I bought the WD20EARS
between 14 Dec 2010 and 01 Oct 2011.

Maybe it was also caused by inappropriate cooling.
I'm copying the data from the old raid md9 to the new linear array
while having the old disks in the old case, still directly attached to
the mobo.
The new raid disks are already in the Norco case. They're attached via
the expander to the LSI, which is in the Asus mobo mounted in the old
case.
The expander and all of the disks are powered from the same PSU which
powers the mobo etc.

I had to do this because the sata cables are too short to mount the
mobo in the Norco. Unfortunately I can't connect the fans from the
Norco because these wires are too short as well. But I thought that
just for copying things over, with only these four disks in the Norco,
it would be ok :-?

Do you think this could cause the problem?


Cheers
Ramon


-- 
Archive: 
http://lists.debian.org/20120616130453.1d66cdfa@hoferr-x61s.hofer.rummelring



Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-06-15 Thread Stan Hoeppner
On 6/14/2012 9:45 AM, Ramon Hofer wrote:
 On Thu, 14 Jun 2012 08:38:27 -0500
 Stan Hoeppner s...@hardwarefreak.com wrote:
 
 Couldn't hurt.  And while you're at it, mount with inode64 in your
 fstab immediately after you create the XFS.  You were running with
 inode32, which sticks all the inodes at the front of AG0 causing lots
 of seeks.  Inode64 puts file/dir inodes in the AG where the file gets
 written.  In short, inode64 is more efficient for most workloads.  And
 this is also why getting the agcount correct is so critical with
 tiered linear/striped parity setups such as this.

 When you recreate the XFS use 'agcount=6'.  That's the smallest you
 can go with 2TB disks.  A force will be required since you already
 have an XFS on the device.
 
 Sorry I haven't much time now. I'm invited to a BBQ and already
 hungry :-)
 
 I just wanted to create the filesystem and start to copy the files.
 
 So I tried and got this warning:
 
 ~$ sudo mkfs.xfs -f -d agcount=6,su=131072,sw=3 /dev/md0
 Warning: AG size is a multiple of stripe width.  This can cause
 performance problems by aligning all AGs on the same disk.  To avoid
 this, run mkfs with an AG size that is one stripe unit smaller, for
 example 244189120.

Grr.  This is another reason it is preferable to create the XFS atop the
linear array with both RAIDs already present, from the beginning, which
would allow the proper 11 AGs, and proper placement of them.

 Should I take this seriously?

This is a valid warning and relates to metadata performance, which is
important for everyday use.  So yeah, you should take it seriously.  So
what you should do now is, instead of making another attempt and
manually setting 7 AGs, just leave out that parm and let mkfs pick the
agcount/agsize on its own.  It will likely choose 7, but it may choose
more.  The fewer the better with 3 slow disks in this RAID5.  mkfs.xfs
doesn't take spindle speed into account, which is why I usually set
parms manually, to best fit the storage hardware.
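The two numbers in this exchange can be reproduced with a little arithmetic. Below is a sketch with two hypothetical helpers: `min_agcount` uses the XFS limit of just under 1 TiB per allocation group to get the smallest usable agcount, and `adjust_agsize` applies the fix the mkfs.xfs warning suggests (shrink the AG by one stripe unit when it is an exact multiple of the stripe width). The inputs are assumptions: a 4x2TB RAID5 with three data disks of 2,000,398,934,016 bytes each (ignoring md metadata), and 4 KiB filesystem blocks, so su=131072 is 32 blocks and sw=3 gives a 96-block stripe width. Under those assumptions the suggested 244189120 corresponds to an original AG size of 244189152 blocks, an exact multiple of 96.

```bash
#!/bin/bash
# Hypothetical helpers for XFS allocation group sizing.

# An XFS AG is capped at (just under) 1 TiB, so a device needs at least
# ceil(size / 1 TiB) AGs.
min_agcount() {
    local bytes=$1 max_ag=$((1 << 40))
    echo $(( (bytes + max_ag - 1) / max_ag ))
}

# If agsize (in filesystem blocks) is an exact multiple of the stripe width,
# every AG starts on the same disk; shrink it by one stripe unit.
adjust_agsize() {
    local agsize=$1 sunit=$2 swidth=$3
    if (( agsize % swidth == 0 )); then
        echo $(( agsize - sunit ))
    else
        echo "$agsize"
    fi
}

min_agcount 6001196802048        # prints 6
adjust_agsize 244189152 32 96    # prints 244189120
```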

 Btw: Should I mount every xfs filesystem (also the one for the mythtv
 recordings) with inode64?

Yes.  Especially with XFS atop a linear array.  The inode64 allocator
spreads directory and file metadata, and files relatively evenly across
all AGs, providing better locality between files and their metadata.
This improves performance for most workloads.
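Since every XFS filesystem here should be mounted with inode64, a quick audit of /etc/fstab can catch one that was missed. This is a sketch with a hypothetical function name; it only looks at the options column of lines whose type field is `xfs`. An entry of the kind discussed would look like `/dev/md0 /mnt/media-raid xfs defaults,inode64 0 0`, using the /mnt/media-raid mount point that appears elsewhere in this thread.

```bash
#!/bin/bash
# Hypothetical helper: list XFS mount points in an fstab-format file whose
# options do not include inode64.
# Usage: xfs_missing_inode64 [fstab_path]   (defaults to /etc/fstab)
xfs_missing_inode64() {
    local fstab=${1:-/etc/fstab}
    # Field 3 is the filesystem type, field 4 the comma-separated options.
    # Wrapping the options in commas makes ",inode64," a whole-option match.
    awk '$1 !~ /^#/ && $3 == "xfs" && ("," $4 ",") !~ /,inode64,/ { print $2 }' "$fstab"
}
```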

Inode32, the default allocator, puts all directory and file metadata in
AG0, so you end up with a hotspot, causing excessive disk seeking on the
first RAID5 (which is where AG0 is) in the linear array.

Inode64 will be the XFS default in the not too distant future.  It would
have been so already, but there are still some key applications in
production, namely some enterprise backup applications, that don't
understand 64bit inode numbers.  This is the only reason inode32 is
still the default.  Note that with 32bit Linux kernels you are limited
to inode32.  So make sure you're running an x64 kernel, which IIRC, you are.

Note that for any XFS filesystem greater than 16TB, you must use the
inode64 allocator as inode32 is limited to 16TB (and again you need an
x64 kernel). In your case you will be continuously expanding your XFS as
you add more 4 drive arrays in the future.  Once you add your 4x1.5TB
drives you'll be at about 10.5TB.  When you add 4x3TB drives your XFS will hit
19.5TB.  It's best to already be using inode64 when you go over the 16TB
limit to avoid problems.
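The capacity figures above follow from RAID5 losing one disk per leg to parity. A sketch with hypothetical helpers, assuming decimal GB from the drive labels, ignoring md and XFS overhead, and assuming the third leg is four 3TB drives (three data disks), which is what reproduces the 19.5TB figure. The 16TB inode32 ceiling is treated as 16000 decimal GB here, which is itself an assumption.

```bash
#!/bin/bash
# Hypothetical helpers; all sizes are decimal GB from the drive labels,
# ignoring md and XFS overhead.

# Usable capacity of an N-disk RAID5: one disk's worth goes to parity.
raid5_gb() {
    local disks=$1 size_gb=$2
    echo $(( (disks - 1) * size_gb ))
}

# Total of a linear array built from RAID5 legs, each given as DISKSxSIZE_GB.
linear_total_gb() {
    local total=0 leg
    for leg in "$@"; do
        total=$(( total + $(raid5_gb "${leg%x*}" "${leg#*x}") ))
    done
    echo "$total"
}

# Succeed if the total passes the 16TB (16000 GB) inode32 ceiling quoted above.
needs_inode64() {
    (( $(linear_total_gb "$@") > 16000 ))
}

linear_total_gb 4x2000 4x1500          # prints 10500
linear_total_gb 4x2000 4x1500 4x3000   # prints 19500
```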

 This is not true for the smaller ext4 filesystems I use for the os and
 the home dir I suppose?

No, the inode64 mount option is unique to XFS.  It simply tells the XFS
kernel driver to use the inode64 code path instead of the inode32 code path
for a given XFS filesystem, in essence passing a 0 or 1 to an XFS
variable.  You can mount multiple XFS filesystems on one machine, some
with inode32 and others with inode64.  See:  'man mount'  XFS is waay
down at the bottom.  Note that it's possible, but not advisable, to
change the inodeXX mount option after the filesystem has some age on
it.  Pick the right one from the start and stick with it.  This is
usually inode64.  There are some unique workload cases where a highly
tweaked inode32 filesystem 16TB has a performance advantage, but your
workloads aren't such cases.

And make sure you're using linux-image-3.2.0-0.bpo.2-amd64 so you have
all the latest XFS features and fixes, mainly the delayed logging code
turned on by default.

-- 
Stan


-- 
Archive: http://lists.debian.org/4fdaf519.7020...@hardwarefreak.com



Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-06-15 Thread Stan Hoeppner
On 6/14/2012 8:02 AM, Ramon Hofer wrote:

 AF drives are Advanced Format drives with more than 512 bytes per
 sector right?

Correct.  Advanced Format is the industry-wide name chosen for drives
that have 4096B physical sectors, but present 512B sectors at the
interface level, doing translation internally, transparently.

 I don't trust anybody ;-)

Good for you! :)

 Here's what I was referring to:
 http://www.mythtv.org/docs/mythtv-HOWTO-3.html

 JFS is the absolute best at
 deletion, so you may want to try it if XFS gives you problems.

Interesting.  Let's see:

~$ time dd if=/dev/zero of=myth-test bs=8192 count=512000
512000+0 records in
512000+0 records out
4194304000 bytes (4.2 GB) copied, 50.1455 s, 83.6 MB/s

real    0m50.167s
user    0m1.560s
sys     0m43.915s

-rw-r--r--  1 root root 4.0G Jun 15 04:52 myth-test

~$ echo 3 > /proc/sys/vm/drop_caches
~$ time rm myth-test; sync

real    0m0.027s
user    0m0.000s
sys     0m0.004s

XFS and the kernel block layer required 4ms to perform the 4GB file
delete.  The disk access required 23ms.  What does this say about the
JFS claim?  I simply don't get the "if XFS gives you problems" bit.  The
author was obviously nothing close to a filesystem expert.


 
 I additionally found a forum post from four years ago where someone
 states that xfs has problems with interrupted power supply:
 http://www.linuxquestions.org/questions/linux-general-1/xfs-or-jfs-685745/#post3352854

"I found a forum post from 4 years ago"

Myths, lies, and fairy tales.  There was an XFS bug related to power
fail that was fixed over a year before this forum post was made.  Note
that nobody in that thread posts anything from the authoritative source,
as I do here:

http://www.xfs.org/index.php/XFS_FAQ#Q:_Why_do_I_see_binary_NULLS_in_some_files_after_recovery_when_I_unplugged_the_power.3F

 I only advise XFS if you have any means to guarantee uninterrupted
 power supply. It's not the most resistant fs when it comes to power
 outages.

I advise using a computer only if you have a UPS, no matter what
filesystem you use.  It's incredible that this guy would make such a
statement, instead of promoting the use of UPS devices.  Abrupt power
loss, or worse, voltage bumping which often accompanies brownout
conditions, is not good for any computer equipment, especially PSUs and
mechanical hard drives, regardless of what filesystem one uses.

The only data lost due to power failure is inflight write data.  The
vast majority of that is going to be due to Linux buffer cache.  No
matter what FS you use, if you're writing, especially a large file, when
power dies the write has failed and you've lost that file.  EXT3 was a
bit more resilient to power loss because of a bug, not a design goal.
 The same bug caused horrible performance with some workloads because of
the excessive hard coded syncs.

 I usually don't have blackouts. At least as long that the PC turn off.
 But I don't have a UPS.

Get one.  Best investment you'll ever make computer-wise.  For your
Norco, we'll assume all 20 bays are filled for sizing purposes.   One of
these should be large enough to run your server and your desktop:

http://www.apc.com/products/resource/include/techspec_index.cfm?base_sku=BR900G-GR&total_watts=200
http://www.apc.com/products/resource/include/techspec_index.cfm?base_sku=BR900GI&total_watts=200

(Sorry if I misguessed your native language as German instead of French
or Italian)  I listed both units as I don't know which power plug
configuration you need.

If these UPS seem expensive, consider the fact that they may continue
working for 20+ years.  I bought my home office APC SU1400RMNET used in
2003 for US $250 ($1000+ new) after it had been in corporate service for
3 years on lease.  It's at least 12 years old and I've been running it
for 9 years continuously.  I've replaced the batteries ($80) twice,
about every 4 years.  Buying this unit used, at a steal of a price, is
one of the best investments I ever made.  I expect it to last at least
another 8 years, if not more.


 I will get better performance if I have the correct parameters.

Yes.

 
 2.  If device is a single level md striped array, AGs=16, unless the
 device size is >16TB.  In that case AGs=device_size/1TB.
 
 A single level md striped array is any linux raid containing disks.
 Like my raid5.

I use single level simply to differentiate from a nested array, which
is multi-level.

 In contrast would be my linear raid containing one or more raids?

This is called a nested array.  The term comes from nested loop in
programming.

 Ok, the chunk (=stripe) 

chunk = strip, not stripe

Chunk and strip are two words for the same thing.  Linux md uses the
term chunk.  LSI and other hardware vendors use the term strip.
They describe the amount of data written to an individual array disk
during a striped write operation.  Stripe is equal to all of the
chunks/strips added together.
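The chunk/strip/stripe relationship is also what the su and sw values passed to mkfs.xfs in this thread encode: su is one chunk in bytes and sw is the number of data-bearing spindles, so a full stripe is su * sw. A sketch with a hypothetical function name, covering only the simple layouts mentioned here (RAID5 gives up one disk to parity, RAID6 two, RAID10 half to mirrors):

```bash
#!/bin/bash
# Hypothetical helper: print mkfs.xfs "-d" stripe options for an md array.
# Args: chunk size in KiB, total member disks, RAID level (0, 5, 6 or 10).
xfs_stripe_opts() {
    local chunk_kib=$1 disks=$2 level=$3 data
    case $level in
        0)  data=$disks ;;
        5)  data=$((disks - 1)) ;;   # one disk's worth of parity
        6)  data=$((disks - 2)) ;;   # two disks' worth of parity
        10) data=$((disks / 2)) ;;   # half the disks are mirrors
        *)  echo "unsupported RAID level: $level" >&2; return 1 ;;
    esac
    # su = one chunk (strip) in bytes; the full stripe is su * sw
    echo "su=$((chunk_kib * 1024)),sw=$data"
}

xfs_stripe_opts 128 4 5     # prints su=131072,sw=3
```

For the 4 disk RAID5 with 128 KiB chunks used in this thread, that matches the `su=131072,sw=3` given to mkfs.xfs.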

E.g.  A 16 disk RAID10 has 8 stripe spindles (8 are mirrors).  Each
spindle has a 

Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-06-15 Thread Ramon Hofer
On Thu, 14 Jun 2012 08:38:27 -0500
Stan Hoeppner s...@hardwarefreak.com wrote:

 On 6/14/2012 4:51 AM, Ramon Hofer wrote:
 
  These commands don't match the pastebin.  The pastebin shows you
  creating a 4 disk RAID5 as /dev/md0.
  
  Really :-?
 
 That kind of (wrong) analysis is one of the many outcomes of severe
 lack of sleep, too much to do, and not enough time. ;)  Having 3
 response/reply chains going for the same project doesn't help either.
 We share fault on that one:  you sent 3 emails before I replied to the
 first.  I replied to all 3 in succession instead of consolidating all
 3 into one response.  Normally I'd do that.  Here I simply didn't
 have the time.  So in the future, with me or anyone else, please keep
 it to one response/reply. :)  Cuts down on the confusion and overlap
 of thoughts.

Ok I will.
I'm still learning the code of conduct for mailing lists ;-)

Then I'll start right now.


First of all I tried to set up the raid5 with the WD 20EARS and didn't
have much luck. They led to fail events when mdadm built the array.
They worked in my Netgear NV+, but with very low r/w rates (5 MB/s),
which I now assume is because of the disks.

That's why I'm already thinking of buying new disks. 

I have found these drives at my local dealer (the prices are in Swiss
Francs).

2 TB:
- Seagate Barracuda 2TB, 7200rpm, 64MB, 2TB, SATA-3 (129.-)
- Seagate ST2000DL004/HD204UI, 5400rpm, 32MB, 2TB, SATA-II (129.-)

3 TB:
- Seagate Barracuda 3TB, 7200rpm, 64MB, 3TB, SATA-3 (179.-)

I think the Seagate Barracuda 3TB are the best value for money and I
didn't find any problems that could prevent me from using them as raid
drives.

Btw. When I tried to set up the WD20EARS mdstat told me that the
syncing would take about 6 hours. Hopefully the Barracudas have at
least the same rate. Then the process would be finished in maybe less
than 9 hours. This seems to be acceptable for my case.


 Also, please note that with 2TB drives, the throughput will decrease
 dramatically as you fill the disks.  If you're copying over 3-4TB of
 files, a write rate of 20-30MB/s at the end of the copy process should
 be expected, as you're now writing to the far inner tracks, which have
 1/8th or so the diameter of the outer tracks.  Areal density * track
 (cylinder) length * spindle RPM = data rate.  The areal density and
 RPM are constants.
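The data-rate relation above can be written out as follows, reading "areal density" as the linear bit density along a track ($\rho_\ell$, bits per unit length), which is what actually multiplies the track length $2\pi r$; $\mathrm{RPM}/60$ is revolutions per second and $R$ is the data rate in bits per second. With density and spindle speed held constant, throughput scales with track radius, which is why the inner tracks are slower. This is an idealized sketch: real drives group tracks into zones, so the curve is stepped rather than smooth.

```latex
R \;=\; \rho_\ell \times 2\pi r \times \frac{\mathrm{RPM}}{60}
\qquad\Longrightarrow\qquad
\frac{R_{\mathrm{inner}}}{R_{\mathrm{outer}}} \;=\; \frac{r_{\mathrm{inner}}}{r_{\mathrm{outer}}}
```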

So if I see low rates in the future I can add a new raid5 and get
higher throughput again because the linear raid would write first to
the new array?


  Now I only have to setup the details correctly.
  Like the agcount...
 
 Like I said, it may not make a huge difference, at least when the XFS
 is new, fresh.  But as it ages (write/delete/write) over time, the
 wonky agcount could hurt performance badly.  You balked at that
 20MB/s rate which is actually normal.  With XFS parms incorrect, a
 year from now you could be seeing max 50MB/s and min 5MB/s.  Yeah,
 ouch.

Another reason to set it up properly now :-)


  You really were an incredible help!
 
 When I'm not such a zombie that I misread stuff, yeah, maybe a little
 help. ;)

No really. The adventure of enlarging my media server would have ended
in total frustration!



-- 
Archive: 
http://lists.debian.org/20120615213116.0631a666@hoferr-x61s.hofer.rummelring



Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-06-15 Thread Ramon Hofer
On Fri, 15 Jun 2012 16:40:56 -0500
Stan Hoeppner s...@hardwarefreak.com wrote:

 On 6/15/2012 8:36 AM, Ramon Hofer wrote:
 
  First of all I tried to set the raid5 with the WD 20EARS and didn't
  have much luck. They led to fail events when mdadm builds the array.
  They worked in my Netgear NV+ with very low r/w rates 5 MB/s
  (which I now assume is because of the disks.)
 
 Ok, I'm confused.  You had stated you currently have 4x2TB disks and
 4x1.5TB disks.  WD20EARS are 2TB disks.  You said you'd already
 created a RAID5 and added to a linear array, then copied a bunch of
 files from the 1.5TB array, these 1.5TB disks presumably in the
 Netgear.  Is this correct?  Is the md RAID5 inside the linear array
 still working?  Which disks is it made of?

Ok, sorry for the confusion.
The four 2 TB WD green were in the Netgear NAS and the four 1.5 TB
Samsung are in the old raid5 (md9).
I removed the 2 TB disks from the NAS and mounted them in the Norco and
connected them to the server via the LSI and expander. On these WD drives I
created the raid5 (md1) and on top of that the linear array (md0).
Upon creation of md1 the fourth disk (sdd) was added as a spare which I
had to add manually by setting 

mdadm --readwrite /dev/md1

While it was syncing the disks I copied the files from md9 to md0.
During this process sdb was set as faulty.


  That's why I'm already thinking of buying new disks.
 
 Well lets look at this more closely.  The disks may not be bad.  How
 old are they?  Send me your dmesg output:
 
 ~$ cp /var/log/dmesg /tmp/dmesg.txt
 
 then email dmesg.txt to me.

I've uploaded dmesg to pastebin hope this is ok.

http://pastebin.com/raw.php?i=dek1wca4


 The WD Black 2TB 7.2k is tested for desktop RAID use (Linux md) and
 has a 5 year warranty, costs $210.  The Seagate Barracuda TX 2TB 7.2k
 is also tested for desktop RAID use, has a 3 year warranty, and costs
 $210.
 
 My advice:  spend more per drive for less capacity and get a 3/5 times
 longer warranty, and a little peace of mind that the drives are
 designed/tested for RAID use and will last at least 5 years, or be
 replaced at no cost for up to 5 years.

Great advice!
I'll go for the WD Black 2TB. I found them for CHF 199.-


  So if I see low rates in the future I can add a new raid5 and get
  higher throughput again because the linear raid would write first
  to the new array?
 
 I'm not sure what you're asking here.  Adding a new 4 disk RAID5 to
 the linear array doesn't make anything inherently faster.  It simply
 adds capacity.  Your read/write speed on a per file basis will be
 about the same in the new space as in the old.  I explained all of
 this before you made the decision to go with the linear route instead
 of using md reshaping to expand.  You said you understood and that it
 was fine as your performance requirements are low.

Yes sorry it's absolutely fine. I was just curious because you wrote
when the array fills up it gets slower. So I thought when I add four
new disks I'll get free space added and the linear array won't be
filled anymore as much as before and so it could regain its previous
speed again.

But really not important for my case!
Just curiosity ;-)



 
  No really. The adventure of enlarging my media server would have
  ended in total frustration!
 
 There's still time for frustration--you're not done quite yet.  lol

Yes but now I'm in semi known territory ;-)


Cheers
Ramon


-- 
Archive: 
http://lists.debian.org/20120616041540.0532a794@hoferr-x61s.hofer.rummelring



Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-06-14 Thread Stan Hoeppner
On 6/13/2012 2:22 PM, Ramon Hofer wrote:
 On Tue, 12 Jun 2012 17:30:43 -0500
 Stan Hoeppner s...@hardwarefreak.com wrote:

This chain is so long I'm going to liberally snip lots of stuff already
covered.  Hope that's ok.

 Note I stated call.  You're likely to get more/better
 information/assistance speaking to a live person.
 
 I didn't have enough confidence in my oral English :-(

Understood.  Didn't realize that could be an issue.  Apologies for my
'cultural insensitivity'. ;)

 This is incorrect advice, as it occurs with the LSI BIOS both enabled
 and disabled.  Apparently you didn't convey this in your email.
 
 I will write it to them again.
 But to be honest I think I'll leave the Supermicro and use it for my
 Desktop.

If you're happy with an Asus+LSI server and SuperMicro PC, and it all
works the way you want, I'd not bother with further troubleshooting either.

 Building md arrays from partitions on disks is a means to an end.  Do
 you have an end that requires these means?  If not, don't use
 partitions.  The biggest reason to NOT use partitions is misalignment
 on advanced format drives.  The partitioning utilities shipped with
 Squeeze, AFAIK, don't do automatic alignment on AF drives.
 
 Ok, I was just confused because most of the tutorials (or at least most of
 the ones I found) use partitions over the whole disk...

Most of the md tutorials were written long before AF drives became
widespread, which has been a relatively recent phenomenon, the last 2
years or so.

It seems md atop partitions is recommended by two classes of users:

1.  Ultra cheap bastards who buy drive of the week.
2.  Those who want to boot from disks in an md array

I'd rather not fully explain this due to space.  If you reread your
tutorials and other ones, you'll start to understand.

 If you misalign the partitions, RAID5/6 performance will drop by a
 factor of 4, or more, during RMW operations, i.e. modifying a file or
 directory metadata.  The latter case is where you really take the
 performance hit as metadata is modified so frequently.  Creating md
 arrays from bare AF disks avoids partition misalignment.
 
 So if I can make things simpler I'm happy :-)

Simpler is not always better, but it is most of the time.
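The partition-misalignment problem comes down to whether a partition starts on a 4096-byte boundary of an Advanced Format drive. A minimal sketch with a hypothetical helper, taking the start sector in 512-byte units (on Linux that value can be read from /sys/block/sdX/sdX1/start):

```bash
#!/bin/bash
# Hypothetical helper: succeed if a partition starting at the given 512-byte
# sector begins on a 4096-byte (Advanced Format physical sector) boundary.
is_4k_aligned() {
    local start_sector=$1
    (( start_sector * 512 % 4096 == 0 ))
}

# The old DOS default start of sector 63 is misaligned; 2048 (1 MiB) is aligned.
for s in 63 2048; do
    if is_4k_aligned "$s"; then echo "$s aligned"; else echo "$s misaligned"; fi
done
```

Bare-disk md members sidestep the question entirely, since there is no partition start to get wrong.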

The only caveat to using md on bare drives is that all members should
ideally be of identical size.  If they're not, md takes the sector count
of the smallest drive and uses that number of sectors on all the others.
 If you try to add a drive later whose sector count is less, it won't
work.  Drive of the week buyer applies here. ;)
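The smallest-member rule can be illustrated with sector counts as reported by `blockdev --getsz /dev/sdX` (512-byte sectors). A sketch with hypothetical helpers and made-up sizes; md also reserves room for its own metadata, so the real usable size per member is slightly smaller than this number.

```bash
#!/bin/bash
# Hypothetical helpers. Sizes are 512-byte sector counts, as printed by
# "blockdev --getsz /dev/sdX".

# md sizes every member to the smallest one: print that sector count.
md_member_sectors() {
    local min=$1 s
    for s in "$@"; do
        if (( s < min )); then min=$s; fi
    done
    echo "$min"
}

# Succeed if a new disk (first argument) is at least as large as the
# smallest existing member (remaining arguments).
disk_fits() {
    local new=$1; shift
    (( new >= $(md_member_sectors "$@") ))
}

# Made-up example: one member is slightly smaller than the other three.
md_member_sectors 3907029168 3907029168 3907029168 3907027055   # prints 3907027055
```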

More savvy users don't add drives to and reshape their arrays.  They add
an entire new array, add it to an existing umbrella linear array, then
grow their XFS filesystem over it.  There is zero downtime or degraded
access to current data with this method.  Reshaping runs for a day or
more and data access, especially writes, is horribly slow during the
process.
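The add-an-array-then-grow procedure can be sketched with the same commands that appear elsewhere in this thread (`mdadm --grow /dev/md0 --add`, `xfs_growfs`), plus `--force` at creation and an `mdadm --wait` so nothing is stacked on a leg that is still syncing. This dry-run sketch only prints the steps; the function name, the 128 KiB chunk and the device names are assumptions to adapt.

```bash
#!/bin/bash
# Hypothetical dry-run helper: print (never run) the steps to add a new RAID5
# leg to the linear /dev/md0 and grow the XFS on top of it.
# Args: new md name, XFS mount point, member disks.
print_grow_cmds() {
    local new_md=$1 mnt=$2; shift 2
    local devs="" d
    for d in "$@"; do
        devs+=" /dev/$d"
    done
    # --force avoids the degraded-plus-spare initial build; 128 KiB chunk assumed.
    echo "mdadm --create /dev/$new_md --force --level=5 --chunk=128 --raid-devices=$#$devs"
    # Do not stack anything on the new leg until its initial sync is done.
    echo "mdadm --wait /dev/$new_md"
    echo "mdadm --grow /dev/md0 --add /dev/$new_md"
    echo "xfs_growfs $mnt"
}

print_grow_cmds md2 /mnt/media-raid sde sdf sdg sdh
```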

Misguided souls who measure their array performance exclusively with
single stream 'dd' reads instead of real workload will balk at this
approach.  They're also the crowd that promotes using md over
partitions.  ;)

 You're right.
 I just had the impression that you'd suggested that I'd use the hw raid
 capability of the lsi at the beginning of this conversation.

I did.  And if you could, you should.  And you did HW RAID with the SM
board, but the Debian kernel locks up.  With the Asus board you can't
seem to get into the HBA BIOS to configure HW RAID.  So it's really not an
option now.  The main reason for it is automatic rebuild on failure.
But since you don't have dedicated spare drives that advantage goes out
the window.  So md RAID is fine.

 I must have read outdated wikis (mostly from the mythtv project).

Trust NASA more than MythTV users?  From:
http://www.nas.nasa.gov/hecc/resources/columbia.html

Storage
Online: DataDirect Networks® and LSI® RAID, 800 TB (raw)
...
Local SGI XFS

That 800TB is carved up into a handful of multi-hundred TB XFS
filesystems.  It's mostly used for scratch space during sim runs.  They
have a multi-petabyte CXFS filesystem for site wide archival storage.
NASA is but one of many sites with multi-hundred TB XFS filesystems
spanning hundreds of disk drives.

IBM unofficially abandoned JFS on Linux, which is why it hasn't seen a
feature release since 2004.  Enhanced JFS, called JFS2, is proprietary,
and is only available on IBM pSeries servers.

MythTV users running JFS are simply unaware of these facts, and use JFS
because it still works for them, and that's great.  Choice and freedom
are good things.  But if they're stating it's better than XFS they're
hitting the crack pipe too often. ;)

 Translated: Since kernel version 2.6 it's an official part of the
 kernel.
 
 Maybe I misunderstood this sentence in what the writer meant or maybe
 it's even wrong what they wrote in the first place :-?

What they wrote is correct.  JFS has been in Linux mainline since the
release 

Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-06-14 Thread Stan Hoeppner
On 6/13/2012 1:04 PM, Ramon Hofer wrote:
 On Tue, 12 Jun 2012 18:35:57 -0500
 Stan Hoeppner s...@hardwarefreak.com wrote:
 
 Hmm, probably I have to create a raid5 with the four empty 2 TB
 disks attached to the LSI. Then:

 ~$ mdadm -C /dev/md0 -n1 -l linear /dev/md1

 WTF?
 
 I also had to add --force to create the array with one raid5.
 
 
 Now I copy the content from the old raid5 with the four 1.5 TB
 disks to the new linear md0.

 Shuffleboard...  You didn't previously make clear that not all 8 disks
 were freely available to build your stack from the ground up.  The
 instructions I gave you assumed that all 8 drives were clean.  Now
 you're attempting to modify the precise instructions I gave you and
 play shuffleboard with your data and disks, attempting to migrate on
 the fly.
 
 Sorry, I wrote this too in another thread :-(
 I like taking some risk :-)
 
 But since the old raid is only read, not written, I didn't fear losing
 the data.

No, no, no risk of losing the data.  The problem is with the resulting
XFS AG layout you got from this procedure, as I mentioned in the other
reply.  Everything will still work, assuming mdadm will allow you to add
another array after having to use --force to create the linear.  I've
never tried creating a linear array with one device before.  If that
works, and the xfs grow works, you should be ok, just with less performance.

 
 This may not have a good outcome.  I guess you feel that you
 understand this stuff and are confident in your ability at this point
 to effect the outcome you desire.  If things break badly, I'll try to
 assist, but I make no promises WRT outcomes nor guarantee my
 availability.
 
 I wanted to finally do something ;-)

I was being a bit dramatic there, frustration showing I guess. ;)  Like
I said, should be ok, if mdadm doesn't puke adding the other array in.

-- 
Stan


Archive: http://lists.debian.org/4fd9a2b7.6010...@hardwarefreak.com



Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-06-14 Thread Ramon Hofer
On Thu, 14 Jun 2012 03:29:25 -0500
Stan Hoeppner s...@hardwarefreak.com wrote:

 On 6/13/2012 2:22 PM, Ramon Hofer wrote:
  On Tue, 12 Jun 2012 17:30:43 -0500
  Stan Hoeppner s...@hardwarefreak.com wrote:
 
 This chain is so long I'm going to liberally snip lots of stuff
 already covered.  Hope that's ok.

Sure. Your mail still blew my mind :-)


  This is incorrect advice, as it occurs with the LSI BIOS both
  enabled and disabled.  Apparently you didn't convey this in your
  email.
  
  I will write it to them again.
  But to be honest I think I'll leave the Supermicro and use it for my
  desktop.
 
 If you're happy with an Asus+LSI server and SuperMicro PC, and it all
 works the way you want, I'd not bother with further troubleshooting
 either.

Well, the only differences are:

1. I can't enter the LSI BIOS to set up hw raid, which I don't need
anyway. So no problem.

2. I can't see the network activity LEDs on the front of the case,
which is a gadget I don't really need. If there are problems I can check
the LAN activity LEDs on the mobo. So no problem either.


  Building md arrays from partitions on disks is a means to an end.
  Do you have an end that requires these means?  If not, don't use
  partitions.  The biggest reason to NOT use partitions is
  misalignment on advanced format drives.  The partitioning
  utilities shipped with Squeeze, AFAIK, don't do automatic
  alignment on AF drives.
  
  Ok, I was just confused because most of the tutorials (or at least
  most of the ones I found) use partitions over the whole disk...
 
 Most of the md tutorials were written long before AF drives became
 widespread, which has been a relatively recent phenomenon, the last 2
 years or so.

AF drives are Advanced Format drives with more than 512 bytes per
sector, right?


  I must have read outdated wikis (mostly from the mythtv project).
 
 Trust NASA more than MythTV users?  From:
 http://www.nas.nasa.gov/hecc/resources/columbia.html

I don't trust anybody ;-)


 Storage
 Online: DataDirect Networks® and LSI® RAID, 800 TB (raw)
 ...
 Local SGI XFS
 
 That 800TB is carved up into a handful of multi-hundred TB XFS
 filesystems.  It's mostly used for scratch space during sim runs.
 They have a multi-petabyte CXFS filesystem for site wide archival
 storage. NASA is but one of many sites with multi-hundred TB XFS
 filesystems spanning hundreds of disk drives.
 
 IBM unofficially abandoned JFS on Linux, which is why it hasn't seen a
 feature release since 2004.  Enhanced JFS, called JFS2, is
 proprietary, and is only available on IBM pSeries servers.
 
 MythTV users running JFS are simply unaware of these facts, and use
 JFS because it still works for them, and that's great.  Choice and
 freedom are good things.  But if they're stating it's better than XFS
 they're hitting the crack pipe too often. ;)

Here's what I was referring to:
http://www.mythtv.org/docs/mythtv-HOWTO-3.html

Filesystems

MythTV creates large files, many in excess of 4GB. You must use a 64 or
128 bit filesystem. These will allow you to create large files.
Filesystems known to have problems with large files are FAT (all
versions), and ReiserFS (versions 3 and 4).

Because MythTV creates very large files, a filesystem that does well at
deleting them is important. Numerous benchmarks show that XFS and JFS
do very well at this task. You are strongly encouraged to consider one
of these for your MythTV filesystem. JFS is the absolute best at
deletion, so you may want to try it if XFS gives you problems. MythTV
incorporates a slow delete feature, which progressively shrinks the
file rather than attempting to delete it all at once, so if you're more
comfortable with a filesystem such as ext3 (whose delete performance
for large files isn't that good) you may use it rather than one of the
known-good high-performance file systems. There are other ramifications
to using XFS and JFS - neither offer the opportunity to shrink a
filesystem; they may only be expanded.

NOTE: You must not use ReiserFS v3 for your recordings. You will get
corrupted recordings if you do.

Because of the size of the MythTV files, it may be useful to plan for
future expansion right from the beginning. If your case and power
supply have the capacity for additional hard drives, read through the
Advanced Partition Formatting sections for some pointers.


So they say the two are about the same. But this page, or at least this
paragraph, must be several years old and apparently hasn't been updated.

I also found a forum post from four years ago where someone
states that xfs has problems with power interruptions:
http://www.linuxquestions.org/questions/linux-general-1/xfs-or-jfs-685745/#post3352854

I only advise XFS if you have any means to guarantee uninterrupted
power supply. It's not the most resistant fs when it comes to power
outages.

I usually don't have blackouts, at least none long enough to turn the PC
off. But I don't have a UPS.


  Ok if I read it right it divides the array 

Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-06-14 Thread Ramon Hofer
On Thu, 14 Jun 2012 08:38:27 -0500
Stan Hoeppner s...@hardwarefreak.com wrote:

 Couldn't hurt.  And while you're at it, mount with inode64 in your
 fstab immediately after you create the XFS.  You were running with
 inode32, which sticks all the inodes at the front of AG0 causing lots
 of seeks.  Inode64 puts file/dir inodes in the AG where the file gets
 written.  In short, inode64 is more efficient for most workloads.  And
 this is also why getting the agcount correct is so critical with
 tiered linear/striped parity setups such as this.
 
 When you recreate the XFS use 'agcount=6'.  That's the smallest you
 can go with 2TB disks.  A force will be required since you already
 have an XFS on the device.
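A minimal sketch of the fstab side of the inode64 advice, assuming the XFS lives on /dev/md0 and is mounted at /srv/media (both the mount point and the helper name xfs_fstab_line are made up for illustration). It only prints the entry; appending it to /etc/fstab stays a separate, deliberate step.

```shell
#!/bin/sh
# Hypothetical helper: print an /etc/fstab line that mounts an XFS
# filesystem with the inode64 option. Fields: device, mount point,
# type, options, dump, fsck pass (0: XFS is repaired with xfs_repair,
# not checked by fsck at boot).
xfs_fstab_line() {
    printf '%s\t%s\txfs\tdefaults,inode64\t0\t0\n' "$1" "$2"
}

# /srv/media is an assumed mount point, adjust to taste.
xfs_fstab_line /dev/md0 /srv/media
```

To actually use it, something like `xfs_fstab_line /dev/md0 /srv/media | sudo tee -a /etc/fstab` followed by `sudo mount /srv/media` would do; check the resulting line in /etc/fstab before rebooting.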

Sorry, I don't have much time now. I'm invited to a BBQ and already
hungry :-)

I just wanted to create the filesystem and start to copy the files.

So I tried and got this warning:

~$ sudo mkfs.xfs -f -d agcount=6,su=131072,sw=3 /dev/md0
Warning: AG size is a multiple of stripe width.  This can cause
performance problems by aligning all AGs on the same disk.  To avoid
this, run mkfs with an AG size that is one stripe unit smaller, for
example 244189120.

Should I take this seriously?
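The arithmetic behind this warning can be checked by hand. A minimal sketch, assuming 4 KiB filesystem blocks and that mkfs.xfs prints the suggested AG size in filesystem blocks: su=131072 bytes is 32 blocks, so with sw=3 the stripe width is 96 blocks. Working backwards from the suggestion, the AG size chosen for agcount=6 would have been 244189152 blocks, exactly 2543637 stripe widths; that figure is inferred, not taken from the mkfs.xfs output. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch: reproduce the "AG size is a multiple of stripe width" check.
# Assumes 4096-byte filesystem blocks; AG sizes are in blocks,
# su is in bytes (as in su=131072), sw is the number of data disks.
BLOCK=4096

# Stripe unit converted from bytes to filesystem blocks.
su_blocks() {
    echo $(( $1 / $BLOCK ))
}

# Prints "warn" when the AG size is a whole number of stripe widths
# (every AG would start on the same disk), "ok" otherwise.
ag_check() {
    width=$(( $(su_blocks "$2") * $3 ))
    if [ $(( $1 % width )) -eq 0 ]; then
        echo warn
    else
        echo ok
    fi
}

# AG size one stripe unit smaller, which is what the warning suggests.
ag_suggest() {
    echo $(( $1 - $(su_blocks "$2") ))
}

# 244189152 blocks is the inferred AG size for agcount=6 on this array.
ag_check 244189152 131072 3        # warn
ag_suggest 244189152 131072        # 244189120
ag_check 244189120 131072 3        # ok
echo "agsize in bytes: $(( 244189120 * $BLOCK ))"
```

If the suggestion is taken, agsize replaces agcount on the command line, for example `mkfs.xfs -f -d agsize=1000198635520,su=131072,sw=3 /dev/md0` (agsize in bytes, i.e. 244189120 blocks of 4096 bytes). This is an untested sketch; check the mkfs.xfs man page before running it.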


Btw: should I mount every xfs filesystem (including the one for the
mythtv recordings) with inode64?
I suppose this doesn't apply to the smaller ext4 filesystems I use for
the os and the home dir?


Cheers
Ramon


Archive: 
http://lists.debian.org/20120614164501.5c36b9f4@hoferr-x61s.hofer.rummelring



Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-06-13 Thread Ramon Hofer
On Tue, 12 Jun 2012 18:35:57 -0500
Stan Hoeppner s...@hardwarefreak.com wrote:

  Hmm, probably I have to create a raid5 with the four empty 2 TB
  disks attached to the LSI. Then:
  
  ~$ mdadm -C /dev/md0 -n1 -l linear /dev/md1
 
 WTF?

I also had to add --force to create the array with one raid5.


  Now I copy the content from the old raid5 with the four 1.5 TB
  disks to the new linear md0.
 
 Shuffleboard...  You didn't previously make clear that not all 8 disks
 were freely available to build your stack from the ground up.  The
 instructions I gave you assumed that all 8 drives were clean.  Now
 you're attempting to modify the precise instructions I gave you and
 play shuffleboard with your data and disks, attempting to migrate on
 the fly.

Sorry, I wrote this too in another thread :-(
I like taking some risk :-)

But since the old raid was only being read, not written, I wasn't afraid
of losing the data.


 This may not have a good outcome.  I guess you feel that you
 understand this stuff and are confident in your ability at this point
 to effect the outcome you desire.  If things break badly, I'll try to
 assist, but I make no promises WRT outcomes nor guarantee my
 availability.

I wanted to finally do something ;-)


Cheers
Ramon


Archive: 
http://lists.debian.org/20120613200433.3fc9ce78@hoferr-x61s.hofer.rummelring



Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-06-13 Thread Ramon Hofer
On Tue, 12 Jun 2012 17:30:43 -0500
Stan Hoeppner s...@hardwarefreak.com wrote:

 On 6/12/2012 8:40 AM, Ramon Hofer wrote:
  On Sun, 10 Jun 2012 17:30:08 -0500
  Stan Hoeppner s...@hardwarefreak.com wrote:
 
  Try the Wheezy installer.  Try OpenSuSE.  Try Fedora.  If any of
  these work without lockup we know the problem is Debian 6.
  However...
  
  I didn't do this because the LSI worked with the Asus mobo and
  Debian squeeze, and because I couldn't install either OpenSuSE or Fedora.
  But I will give it another try...
 
 Your problem may involve more than just the two variables.  The
 problem may be mobo+LSI+distro_kernel, not just mobo+LSI.  This is
 why I suggested trying to install other distros.

Aha, this is true - didn't think about this...


  Please call LSI support before you attempt any additional
  BIOS/firmware updates.
 
 Note I stated call.  You're likely to get more/better
 information/assistance speaking to a live person.

I wasn't confident enough in my spoken English :-(


  It sounds like the issue is related to the bootstrap, so either to
  resolve the issue you will have to free up the option ROM space or
  limit the number of devices during POST.
 
 This is incorrect advice, as it occurs with the LSI BIOS both enabled
 and disabled.  Apparently you didn't convey this in your email.

I will write it to them again.
But to be honest I think I'll leave the Supermicro and use it for my
desktop.


(...) 

  Nono, I was aware that I can have several RAID arrays.
  My initial plan was to use four disks with the same size and have
  several RAID5 devices. 
 
 This is what you should do.  I usually recommend RAID10 for many
 reasons, but I'm guessing you need more than half of your raw storage
 space.  RAID10 eats 1/2 of your disks for redundancy.  It also has the
 best performance by far, and the lowest rebuild times by far.  RAID5
 eats 1 disk for redundancy, RAID6 eats 2.  Both are very slow compared
 to RAID10, and both have long rebuild times which increase severely as
 the number of drives in the array increases.  The drive rebuild time
 for RAID10 is the same whether your array has 4 disks or 40 disks.

Yes, I think raid5 is sufficient for me. I need neither extreme
performance nor extreme safety. I just hope that the raid5 setup will
be safe enough :-)


 If you're more concerned with double drive failure during rebuild (not
 RESHAPE as you stated) than usable space, make 4 drive RAID10 arrays
 or 4 drive RAID6s, again, without partitions, using the command
 examples I provided as a guide.

Well, this is just multimedia data stored on this server, so if I lose
it, it won't kill me :-)


  Is there some documentation why partitions aren't good to use?
  I'd like to learn more :-)
 
 Building md arrays from partitions on disks is a means to an end.  Do
 you have an end that requires these means?  If not, don't use
 partitions.  The biggest reason to NOT use partitions is misalignment
 on advanced format drives.  The partitioning utilities shipped with
 Squeeze, AFAIK, don't do automatic alignment on AF drives.

Ok, I was just confused because most of the tutorials (or at least most
of the ones I found) use partitions over the whole disk...


 If you misalign the partitions, RAID5/6 performance will drop by a
 factor of 4, or more, during RMW operations, i.e. modifying a file or
 directory metadata.  The latter case is where you really take the
 performance hit as metadata is modified so frequently.  Creating md
 arrays from bare AF disks avoids partition misalignment.

So if I can make things simpler I'm happy :-)


  Does it work as well with hw RAID devices from the LSI card?
 
 Your LSI card is an HBA with full RAID functions.  It is not however a
 full blown RAID card--its ASIC is much lower performance and it has no
 cache memory.  For RAID1/10 it's probably a toss up at low disk counts
 (4-8).  At higher disk counts, or with parity RAID, md will be faster.
 But given your target workloads you'll likely not notice a difference.

You're right.
I just had the impression that at the beginning of this conversation
you suggested I use the hw raid capability of the LSI.


  Then make a write aligned XFS filesystem on this linear device:
 
  ~$ mkfs.xfs -d agcount=11,su=131072,sw=3 /dev/md2
  
  Are there similar options for jfs?
 
 Dunno.  Never used as XFS is superior in every way.  JFS hasn't seen a
 feature release since 2004.  It's been in bug fix only mode for 8
 years now.  XFS has a development team of about 30 people working at
 all the major Linux distros, SGI, and IBM, yes, IBM.  It has seen
 constant development since its initial release on IRIX in 1994 and
 port to Linux in the early 2000s.

I must have read outdated wikis (mostly from the mythtv project).


  Especially because I read in wikipedia that xfs is
  integrated in the kernel and to use jfs one has to install
  additional packages.
 
 You must have misread something.  The JFS driver was still in mainline
 

Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-06-12 Thread Ramon Hofer
On Sun, 10 Jun 2012 17:30:08 -0500
Stan Hoeppner s...@hardwarefreak.com wrote:

 On 6/10/2012 9:00 AM, Ramon Hofer wrote:
  A situation update: Mounted the mobo with the CPU and RAM, attached
  the PSU, the OS SATA disk, the LSI and expander as well as the
  graphics card. There are no disks attached to the expander because
  I put them back into the old NAS and am backing up the data from the
  1.5 TB disks to it.
  
  Then I installed Debian Squeeze AMD64 without problems. I don't have
  the over-current error messages anymore :-)
  But it still hangs at the same time as before.
 
 Try the Wheezy installer.  Try OpenSuSE.  Try Fedora.  If any of these
 work without lockup we know the problem is Debian 6.  However...

I didn't do this because the LSI worked with the Asus mobo and
Debian squeeze, and because I couldn't install either OpenSuSE or Fedora.
But I will give it another try...


 Please call LSI support before you attempt any additional
 BIOS/firmware updates.

I mailed them and got this answer:

Unfortunately, the system board has not been qualified on the hardware
compatibility list for the LSI MegaRAID 9240 series controllers. There
could be any number of reasons for this, either it has not yet been
tested or did not pass testing, but the issue is likely an
incompatibility.

It sounds like the issue is related to the bootstrap, so either to
resolve the issue you will have to free up the option ROM space or
limit the number of devices during POST.

This is what you've already told me.
If I understand it right you already told me to try both: free up the
option ROM and limit the number of devices, right?


(...)

  Thanks again very much.
  The air flow / cooling argument is very convincing. I haven't
  thought about that.
 
 Airflow is 80% of the reason the SAS and SATA specifications were
 created.

You've convinced me: I will mount the expander properly to the case :-)


  It was the P7P55D premium.
  
  The only two problems I have with this board are that I'd have to
  find the right BIOS settings to enable the LSI online setting
  program (or what is it called exactly?) where one can set up the
  disks as JBOD / HW RAID.
 
 I already told you how to do this with the C7P67.  Read the P7P55D
 manual, BIOS section.  There will be a similar parameter to load the
 BIOS ROMs of add in cards.

Ok, thanks!


  Sorry I don't understand what you mean by don't put partitions on
  your mdraid devices before creating the array.
  Is it wrong to partition the disks and then do mdadm --create
  --verbose /dev/md0 --auto=yes --level=6
  --raid-devices=4 /dev/sda1.1 /dev/sdb1.1 /dev/sdc1.1 /dev/sdd1.1?
  
  Should I first create an empty array with mdadm --create
  --verbose /dev/md0 --auto=yes --level=6 --raid-devices=0
  
  And then add the partitions?
 
 Don't partition the drives before creating your md array.  Don't
 create partitions on it afterward.  Do not use any partitions at
 all.  They are not needed.  Create the array from the bare drive
 device names.  After the array is created format it with your
 preferred filesystem, such as:
 
 ~$ mkfs.xfs /dev/md0

Ok understood. RAID arrays containing partitions are bad.


  Hmm, that's a very hard decision.
  You probably understand that I don't want to buy 20 3 TB drives
  now. And still I want to be able to add some 3 TB drives in the
  future. But at
 
 Most novices make the mistake of assuming they can only have one md
 RAID device on the system, and if they add disks in the future they
 need to stick them into that same md device.  This is absolutely not
 true, and it's not a smart thing to do, especially if it's a parity
 array that requires a reshape, which takes dozens of hours.
 Instead...

Nono, I was aware that I can have several RAID arrays.
My initial plan was to use four disks with the same size and have
several RAID5 devices. But Camaleón from the debian list told me not to
use such big disks (500 GB) because reshaping takes too long and
another failure during reshaping will kill the data. So she proposed to
use 500 GB partitions and RAID6 with them.

Is there some documentation why partitions aren't good to use?
I'd like to learn more :-)


  the moment I have four Samsung HD154UI (1.5 TB) and four WD20EARS (2
  TB).
 
 You create two 4 drive md RAID5 arrays, one composed of the four
 identical 1.5TB drives and the other composed of the four identical
 2TB drives.  Then concatenate the two arrays together into an md
 --linear array, similar to this:
 
 ~$ mdadm -C /dev/md1 -c 128 -n4 -l5 /dev/sd[abcd]  # 2.0TB drives

May I ask what the -c 128 option means? The mdadm man page says that -c
is to specify the config file?


 ~$ mdadm -C /dev/md2 -c 128 -n4 -l5 /dev/sd[efgh]  # 1.5TB drives
 ~$ mdadm -C /dev/md0 -n2 -l linear /dev/md[12]
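On the -c question: in mdadm's create mode, -c is the short form of --chunk (chunk size in KiB); it only means --config outside the create/build/grow modes. A minimal sketch, assuming the 4-drive RAID5 member arrays above, of how -c 128 maps onto the su/sw values used with mkfs.xfs elsewhere in this thread. Nothing is executed; the long-option spelling of the command is only printed.

```shell
#!/bin/sh
# Sketch: relate "mdadm -C ... -c 128 -n4 -l5" to XFS stripe geometry.
# Assumes a 4-drive RAID5 member array; commands are printed, not run.
chunk_kib=128                 # -c 128, i.e. --chunk=128 (KiB) in create mode
ndisks=4                      # -n4, i.e. --raid-devices=4
su=$(( chunk_kib * 1024 ))    # XFS stripe unit in bytes = md chunk size
sw=$(( ndisks - 1 ))          # RAID5: data drives per stripe = n - 1

echo "mdadm --create /dev/md1 --chunk=$chunk_kib --raid-devices=$ndisks --level=5 /dev/sd[abcd]"
echo "XFS geometry: su=$su,sw=$sw"
```

The second line prints su=131072,sw=3, the same values as in the mkfs.xfs commands used in this thread.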

This is very interesting. I didn't know that this was possible :-o
Does it work as well with hw RAID devices from the LSI card?
Since you tell me that RAIDs with partitions aren't wise I'm thinking
about 

Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-06-12 Thread Jeremy T. Bouse
On 06/12/2012 09:40 AM, Ramon Hofer wrote:

Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-06-12 Thread Ramon Hofer
On Sun, 10 Jun 2012 17:30:08 -0500
Stan Hoeppner s...@hardwarefreak.com wrote:

(...)

 You create two 4 drive md RAID5 arrays, one composed of the four
 identical 1.5TB drives and the other composed of the four identical
 2TB drives.  Then concatenate the two arrays together into an md
 --linear array, similar to this:
 
 ~$ mdadm -C /dev/md1 -c 128 -n4 -l5 /dev/sd[abcd]  # 2.0TB drives
 ~$ mdadm -C /dev/md2 -c 128 -n4 -l5 /dev/sd[efgh]  # 1.5TB drives
 ~$ mdadm -C /dev/md0 -n2 -l linear /dev/md[12]

Sorry, I have another question about this procedure:

Can I attach the raid5 disks from the old server, which were connected
via sata, to the LSI, and will mdadm still recognize them? Will the
disks' uuids be the same?

And when I have added the old raid5 which contains the data can I add
this to the linear array and still have the data or will it be lost?

Hmm, probably I have to create a raid5 with the four empty 2 TB disks
attached to the LSI. Then:

~$ mdadm -C /dev/md0 -n1 -l linear /dev/md1

Now I copy the content from the old raid5 with the four 1.5 TB disks to
the new linear md0.


Cheers
Ramon


Archive: 
http://lists.debian.org/20120612190518.6d6fb1d5@hoferr-x61s.hofer.rummelring



Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-06-12 Thread Stan Hoeppner
On 6/12/2012 8:40 AM, Ramon Hofer wrote:
 On Sun, 10 Jun 2012 17:30:08 -0500
 Stan Hoeppner s...@hardwarefreak.com wrote:

 Try the Wheezy installer.  Try OpenSuSE.  Try Fedora.  If any of these
 work without lockup we know the problem is Debian 6.  However...
 
 I didn't do this because the LSI worked with the Asus mobo and
 Debian squeeze, and because I couldn't install either OpenSuSE or Fedora.
 But I will give it another try...

Your problem may involve more than just the two variables.  The problem
may be mobo+LSI+distro_kernel, not just mobo+LSI.  This is why I
suggested trying to install other distros.

 Please call LSI support before you attempt any additional
 BIOS/firmware updates.

Note I stated call.  You're likely to get more/better
information/assistance speaking to a live person.

 It sounds like the issue is related to the bootstrap, so either to
 resolve the issue you will have to free up the option ROM space or
 limit the number of devices during POST.

This is incorrect advice, as it occurs with the LSI BIOS both enabled
and disabled.  Apparently you didn't convey this in your email.

 This is what you've already told me.
 If I understand it right you already told me to try both: free up the
 option ROM and limit the number of devices, right?

No, this person is not talented.  You only have one HBA with BIOS to
load.  There should be plenty of free memory in the ROM pool area.  This
is the case with any mobo.  The LSI ROM is big, but not _that_ big as to
eat up all available space.  Please don't ask me to explain how option
(i.e. add in card) ROMs are mapped into system memory.  That information
is easily found on Wikipedia and in other places.  My point here is that
the problem isn't related to insufficient space for mapping ROMs.

 You've convinced me: I will mount the expander properly to the case :-)

There are many SAS expanders that can only be mounted to the chassis,
such as this one:

http://www.hellotrade.com/astek-corporation/serial-attached-scsi-expanders-sas-expander-add-in-card.html

 Ok understood. RAID arrays containing partitions are bad.

Not necessarily.  It depends on the system.  In your system they'd serve
no purpose, and would simply complicate your storage stack.

 Nono, I was aware that I can have several RAID arrays.
 My initial plan was to use four disks with the same size and have
 several RAID5 devices. 

This is what you should do.  I usually recommend RAID10 for many
reasons, but I'm guessing you need more than half of your raw storage
space.  RAID10 eats 1/2 of your disks for redundancy.  It also has the
best performance by far, and the lowest rebuild times by far.  RAID5
eats 1 disk for redundancy, RAID6 eats 2.  Both are very slow compared
to RAID10, and both have long rebuild times which increase severely as
the number of drives in the array increases.  The drive rebuild time for
RAID10 is the same whether your array has 4 disks or 40 disks.
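To put numbers on the redundancy cost for one set of four 2TB drives, here is a rough sketch, assuming identical drives and md RAID10's default two-copy layout (the function name data_disks is made up). Sizes are nominal TB and ignore md metadata.

```shell
#!/bin/sh
# Sketch: drives' worth of usable space for n identical drives at the
# RAID levels discussed here. RAID10 assumes md's default 2-copy layout.
data_disks() {
    case $1 in
        5)  echo $(( $2 - 1 )) ;;   # one drive's worth of parity
        6)  echo $(( $2 - 2 )) ;;   # two drives' worth of parity
        10) echo $(( $2 / 2 )) ;;   # every block stored twice
        *)  echo "unsupported level: $1" >&2; return 1 ;;
    esac
}

for level in 5 6 10; do
    echo "RAID$level, 4 x 2TB: $(( $(data_disks "$level" 4) * 2 ))TB usable"
done
```

With only four drives, RAID6 and RAID10 end up with the same 4TB, so choosing between them comes down to the performance and rebuild-time differences described above.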

 But Camaleón from the debian list told me not to
 use such big disks (500 GB) because reshaping takes too long and
 another failure during reshaping will kill the data. So she proposed to
 use 500 GB partitions and RAID6 with them.

I didn't read the post you refer to, but I'm guessing you misunderstood
what Camaleón stated, as such a thing is simply silly.  Running multiple
md arrays on the same set of disks is also silly, and can be detrimental
to performance.  For a deeper explanation of this see my recent posts to
the Linux-RAID list.

If you're more concerned with double drive failure during rebuild (not
RESHAPE as you stated) than usable space, make 4 drive RAID10 arrays or
4 drive RAID6s, again, without partitions, using the command examples I
provided as a guide.

 Is there some documentation why partitions aren't good to use?
 I'd like to learn more :-)

Building md arrays from partitions on disks is a means to an end.  Do
you have an end that requires these means?  If not, don't use
partitions.  The biggest reason to NOT use partitions is misalignment on
advanced format drives.  The partitioning utilities shipped with
Squeeze, AFAIK, don't do automatic alignment on AF drives.

If you misalign the partitions, RAID5/6 performance will drop by a
factor of 4, or more, during RMW operations, i.e. modifying a file or
directory metadata.  The latter case is where you really take the
performance hit as metadata is modified so frequently.  Creating md
arrays from bare AF disks avoids partition misalignment.

There have been dozens, maybe hundreds, of articles and blog posts
covering this issue, so I won't elaborate further.
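A quick way to see the misalignment problem is to look at where a partition starts. Here is a minimal sketch, assuming start sectors in 512-byte units (as in fdisk -lu output or /sys/block/<disk>/<partition>/start) and 4096-byte physical sectors on the AF drives. is_4k_aligned is a made-up helper name. Sector 63 was the traditional DOS partitioning default, sector 2048 is the 1 MiB boundary newer tools use, and a bare drive used as an md member starts at 0.

```shell
#!/bin/sh
# Sketch: check whether a partition start (in 512-byte sectors) falls on
# a 4096-byte physical sector boundary, as Advanced Format drives need.
is_4k_aligned() {
    if [ $(( $1 * 512 % 4096 )) -eq 0 ]; then
        echo aligned
    else
        echo misaligned
    fi
}

for start in 0 63 2048; do
    echo "start sector $start: $(is_4k_aligned "$start")"
done
# For a real disk, e.g.: is_4k_aligned "$(cat /sys/block/sda/sda1/start)"
```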

 the moment I have four Samsung HD154UI (1.5 TB) and four WD20EARS (2
 TB).

 You create two 4 drive md RAID5 arrays, one composed of the four
 identical 1.5TB drives and the other composed of the four identical
 2TB drives.  Then concatenate the two arrays together into an md
 --linear array, similar to this:

 ~$ mdadm -C /dev/md1 -c 128 -n4 -l5 /dev/sd[abcd]  # 2.0TB drives
 
 May I ask what the 

Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-06-12 Thread Stan Hoeppner
On 6/12/2012 12:05 PM, Ramon Hofer wrote:
 On Sun, 10 Jun 2012 17:30:08 -0500
 Stan Hoeppner s...@hardwarefreak.com wrote:
 
 (...)
 
 You create two 4 drive md RAID5 arrays, one composed of the four
 identical 1.5TB drives and the other composed of the four identical
 2TB drives.  Then concatenate the two arrays together into an md
 --linear array, similar to this:

 ~$ mdadm -C /dev/md1 -c 128 -n4 -l5 /dev/sd[abcd]  # 2.0TB drives
 ~$ mdadm -C /dev/md2 -c 128 -n4 -l5 /dev/sd[efgh]  # 1.5TB drives
 ~$ mdadm -C /dev/md0 -n2 -l linear /dev/md[12]
 
 Sorry, I have another question about this procedure:
 
 Can I attach the raid5 disks from the old server, which were connected
 via sata, to the LSI, and will mdadm still recognize them? Will the
 disks' uuids be the same?

Assuming you create it again with the same device order and parameters,
yes, it should work.  You _need_ to ask for assistance with this on the
linux-raid list.  I've never done it.  People there have, and can
explain it much better.  It is not a simple process for anyone who has
not done it, and is fraught with pitfalls.

 And once I have added the old RAID5 that contains the data, can I add
 it to the linear array and keep the data, or will it be lost?

No, you cannot add it to the linear array without wiping it first.
Beyond that, if the drive count is not 4, or not RAID level 5, or the
other parameters are different, then you can't use it in the linear
array.  As I already mentioned, all the RAID parameters but for disk
size must be the same.
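Before concatenating md1 and md2, you can compare the relevant fields of `mdadm --detail` to confirm both arrays share the same level, member count and chunk size. This is a sketch that assumes the output uses mdadm's usual `Field : value` layout. `md_geometry` and `same_geometry` are hypothetical helper names.

```shell
# Hypothetical helpers: compare RAID geometry of two md arrays.

# md_geometry: read `mdadm --detail` text on stdin and print
# "<level> <raid-devices> <chunk>", e.g. "raid5 4 128K".
md_geometry() {
    awk -F' : ' '
        /^ *Raid Level :/   { level = $2 }
        /^ *Raid Devices :/ { devs  = $2 }
        /^ *Chunk Size :/   { chunk = $2 }
        END { print level, devs, chunk }
    '
}

# same_geometry MD_A MD_B: needs root and mdadm; disk size is ignored.
same_geometry() {
    a=$(mdadm --detail "$1" | md_geometry)
    b=$(mdadm --detail "$2" | md_geometry)
    if [ "$a" = "$b" ]; then
        echo "match: $a"
    else
        echo "differ: $1=[$a] $2=[$b]"
        return 1
    fi
}
```

Usage: `same_geometry /dev/md1 /dev/md2`.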

 Hmm, I probably have to create a RAID5 with the four empty 2 TB disks
 attached to the LSI. Then:
 
 ~$ mdadm -C /dev/md0 -n1 -l linear /dev/md1

WTF?

 Now I copy the content from the old raid5 with the four 1.5 TB disks to
 the new linear md0.

Shuffleboard...  You didn't previously make clear that not all 8 disks
were freely available to build your stack from the ground up.  The
instructions I gave you assumed that all 8 drives were clean.  Now
you're attempting to modify the precise instructions I gave you and play
shuffleboard with your data and disks, attempting to migrate on the fly.

This may not have a good outcome.  I guess you feel that you understand
this stuff and are confident in your ability at this point to effect the
outcome you desire.  If things break badly, I'll try to assist, but I
make no promises WRT outcomes nor guarantee my availability.

-- 
Stan


Archive: http://lists.debian.org/4fd7d25d.80...@hardwarefreak.com



Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-06-10 Thread Ramon Hofer
A situation update: Mounted the mobo with the CPU and RAM, attached the
PSU, the OS SATA disk, the LSI and expander as well as the graphics
card. There are no disks attached to the expander because I put them
again into the old NAS and backing up the data from the 1.5 TB disks to
it.

Then I installed Debian Squeeze AMD64 without problems. I don't get
the over-current error messages anymore :-)
But it still hangs at the same point as before.

I removed the LSI and installed the bpo kernel. I mounted the LSI again
and it stops again at the same place.

I tried the BIOS settings you described earlier. They didn't help either.

Next I wanted to update the BIOS, so I created a FreeDOS USB stick and
put the BIOS update files onto it. I got to the DOS prompt and ran the
command to install the BIOS (ami.bat ROM.FILE). The prompt was blocked
for some time (about 5-10 mins or even more). Then a message was shown
saying that the file couldn't be found.
The whole directory I had put the BIOS update file into was empty or
even deleted completely (I can't remember anymore).

I'll try again later; maybe the Supermicro doesn't like my FreeDOS USB
stick. This time I'll use the Windows program Supermicro proposed [1]
to create the USB stick.

If this doesn't help I'll contact LSI and if they want me to update the
BIOS I will ask my dealer again to do it. Probably they will have the
same problems and will have to send the mobo to Supermicro which will
take a month until I have it back :-/


[1]
http://www.softpedia.com/get/System/Boot-Manager-Disk/BootFlashDOS.shtml


On Fri, 08 Jun 2012 18:38:24 -0500
Stan Hoeppner s...@hardwarefreak.com wrote:

(...)

 I always do this when I build the system so I don't have to mess with
 it when I need to install more HBAs/cards later.  It's a 10 minute
 operation for me so it's better done up front.  In your case I
 understand the desire to wait until necessary.  However, the better
 airflow alone makes it worth doing, especially given that the
 heatsink on the 9240 needs good airflow.  If it runs hot it might act
 goofy, such as slow data transfer speeds, lockups, etc.

Thanks again very much.
The air flow / cooling argument is very convincing. I hadn't thought
about that.

I'll probably have a month available to mount the expander until
the mobo is back ;-)


  Yes, and the fact that I didn't have any problems with the Asus
  board. I could use LSI RAID1 to install Debian (I couldn't boot,
  probably because the option ROM of the Asus board was
  disabled). I could also use the JBOD drives to set up a Linux RAID.
  But I didn't mention before that the throughput was very low (100
  MB/s at the beginning, and after some secs/min it went down to ~5
  MB/s) when I copied recordings from a directly attached WD Green 2
  TB SATA disk to the Linux RAID5 containing 4 JBOD drives attached
  to the expander and the LSI.
  
  I hope this was a problem I caused and not the hardware :-/
 
 Too early to tell.  You were probably copying through a Gnome/KDE
 desktop.  Other stuff could have slowed it down, or it could have
 been something to do with the Green drive.  They are not known for
 high performance, and people have had lots of problems with them.

Probably the Green drives.
I don't have a desktop environment installed on the server. The copy was
done using `rsync -Pha`.
But it could also be because I've split the RAM from the running server
to have some for the new server. That's why the old Asus server now has
only 2 GB RAM, and on the Supermicro I mounted the other 2 GB RAM stick
(but when the disks are set up I'd like to put some more in).
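To tell whether the slowdown comes from the source Green drive or from the target array, you can measure each device on its own instead of through rsync. This is a minimal sketch using GNU dd. `write_speed` is a hypothetical name, and the directories in the usage line are placeholders. Writing zeros measures raw sequential writes only, which is not the same workload as copying recordings.

```shell
# Hypothetical helper: rough sequential write throughput of a filesystem.
# write_speed DIR [MIB]: write MIB mebibytes (default 1024) of zeros to a
# scratch file in DIR, print dd's summary line, then delete the file.
write_speed() {
    dir=$1
    mib=${2:-1024}
    f="$dir/.write_speed.$$"
    # conv=fdatasync (GNU dd) flushes the data to disk before dd reports
    # the rate, so the page cache does not inflate the number.
    dd if=/dev/zero of="$f" bs=1M count="$mib" conv=fdatasync 2>&1 | tail -n 1
    rm -f "$f"
}
```

For example, run `write_speed /mnt/raid5 2048` and then `write_speed /mnt/green 2048`, substituting the real mount points. A large gap between the two results points at one side of the copy.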


(...)

  Exactly, and the Asus doesn't. So if you'd told me to get another
  mobo, this would have been an option I'd have liked :-)
  
  Another option I was thinking of was using the Asus board for the
  new server and the Supermicro for my new desktop, and not the
  other way around as I had planned.
 
 That's entirely up to you.  I have no advice here.
 
 Which Asus board is it again?

It's the P7P55D Premium.

The only two problems I have with this board are that I'd have to find
the right BIOS settings to enable the LSI setup program (or what is it
called exactly?) where one can set up the disks as JBOD / HW
RAID.

And that it doesn't have any chassis LAN LED connectors :-o
But this is absolutely not important...


(...)

  Btw, I saw that the JBOD devices that Debian sees are e.g.
  /dev/sda1, /dev/sdb1. When I partition them I get something
  like /dev/sda1.1, /dev/sda1.2, /dev/sdb1.1, /dev/sdb1.2 (I don't
  remember exactly if it's only a number after the point, because I
  think there was a prefix of one or two characters before the
  number after the point).
 
 I'd have to see more of your system setup.  This may be normal
 depending on how/when your mobo SATA controller devices are
 enumerated.
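Kernel names such as sda or sdb depend on the order in which controllers are probed, so they can shift when an HBA is added. udev maintains links in /dev/disk/by-id that are built from the drive model and serial number, with a `-partN` suffix for partitions. These links give a stable way to see which physical disk sits behind each kernel name. The sketch below is small; `list_by_id` is a hypothetical helper, and `lsblk` gives a similar overview.

```shell
# Hypothetical helper: map stable /dev/disk/by-id names to kernel names.
# list_by_id [DIR]: print "<by-id name> -> <kernel device>" for each
# symlink in DIR (default /dev/disk/by-id).
list_by_id() {
    dir=${1:-/dev/disk/by-id}
    for link in "$dir"/*; do
        # Skip the literal glob if the directory is empty.
        [ -L "$link" ] || continue
        echo "$(basename "$link") -> $(basename "$(readlink -f "$link")")"
    done
}
```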

Probably yes. I was just confused because it was not consistent with
how Debian names the normal drives and 

Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-06-07 Thread Stan Hoeppner
On 6/6/2012 9:36 AM, Ramon Hofer wrote:
 It's me again.
 
 After several unsuccessful tries to update the BIOS I brought it back to
 my dealer to let him do it.
 He now says that the mainboard is broken and I get my money back.

Interesting...

 Now my question is should I go for the same mainboard again or what do
 you recommend?

What did I previously recommend in this regard?  I don't recall
recommending you replace the mainboard/CPU.

 I suppose the LSI problem was due to the broken mainboard but the

Maybe, maybe not.  Too early to tell.

 dealer also said that LSI doesn't list the C7P67 as a compatible
 board.

He's simply giving himself an excuse to not help you with your problem.
Ask him how many mobo+GPU combos he's sold that are listed as
compatible.  The answer will be none, because nobody does that kind of
testing.  Then ask how many don't work together.  His answer will
likely be none.  This quashes his compatibility list excuse instantly.

You never did call LSI.  When you finally do, as I suggested long ago,
they'll also tell you it's not listed.  But they will then tell you it
doesn't matter, and that the two boards should work fine together.

There are over 10,000 different motherboards on the market.  Nobody
tests against them all, not even close to 1/4th.  And in the case of
LSI, they don't test against any board that is not marketed as server
or workstation, as that is their target market.  Your board is
neither.  The fact that the C7P67 isn't listed has nothing to do with
whether it's compatible with the 9240.  The fact it isn't listed simply
means that they chose not to test it because it's a desktop board.

 What I want to connect to the mainboard is:
 
 2x PCIe x8 for the LSI and the expander

Again, you don't need a PCIe slot for the expander.  If you're not
mechanically gifted and are unable to drill holes and screw it to your
chassis using standoffs, which is the preferred mounting type, simply
wrap it up in a non-conductive material, such as bubble wrap, tape it
closed with a few wraps of electrical tape, and lay it where there's a
relatively empty space on the floor of the chassis, such as behind the
drive cage.  This is ugly but will work, and free up a PCIe x4/x8 slot.
I wish you lived near me, as I'd come over and install the expander
correctly, on the chassis floor, wall, top panel, PSU housing, or drive
cage, in less than 30 minutes.  And it would look like it was installed
at the case factory.

 1x PCIe x1 for the graphics card
 1-2x PCIe x1 for TeVii sat card(s)
 1-2x PCI for PVR-500 analogue TV card(s)

I'd get another C7P67.  There's no reason the LSI shouldn't work with
it.  If it doesn't work off the bat with the replacement C7P67 then it's
certain we have a problem with the Debian kernel driver, or that the
wrong one is being loaded.  You've not tried mpt2sas yet, only
megaraid_sas, which Debian loads automatically.
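Before and after experimenting with a different driver, confirm which one the kernel actually bound to the HBA. `lspci -k` prints a `Kernel driver in use:` line under each device, and the sketch below filters that output. `hba_driver` is a hypothetical helper, and the match pattern is an assumption about how lspci describes this card. Whether mpt2sas claims the 9240 at all depends on the PCI ID the card's firmware presents, which this check does not decide.

```shell
# Hypothetical helper: report the kernel driver bound to matching devices.
# hba_driver PATTERN: read `lspci -k` output on stdin and, for each device
# whose description matches the awk regex PATTERN, print
# "<slot> <driver-in-use>".
hba_driver() {
    awk -v pat="$1" '
        # Unindented lines start a new device entry.
        /^[^ \t]/ { slot = ""; if ($0 ~ pat) slot = $1; next }
        # Indented detail lines belong to the current device.
        slot != "" && /Kernel driver in use:/ {
            sub(/.*Kernel driver in use: */, "")
            print slot, $0
        }
    '
}
```

Usage: `lspci -k | hba_driver 'LSI|MegaRAID'`. If nothing is printed for the card, no driver is bound to it.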

 It would be nice if it had a connector for the lan chassis LEDs :-)

LAN chassis?  I thought you had a 24 bay rack chassis?  The drive cage
LEDs should be powered directly from the SAS/SATA pins on the back of
the drive, through the backplane.  If not, then you've got a really
cheap 24 bay case. :(

-- 
Stan


Archive: http://lists.debian.org/4fd05941.9080...@hardwarefreak.com



Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-06-06 Thread Ramon Hofer
It's me again.

After several unsuccessful tries to update the BIOS I brought it back to
my dealer to let him do it.
He now says that the mainboard is broken and I get my money back.

Now my question is should I go for the same mainboard again or what do
you recommend?
I suppose the LSI problem was due to the broken mainboard, but the
dealer also said that LSI doesn't list the C7P67 as a compatible
board.

What I want to connect to the mainboard is:

2x PCIe x8 for the LSI and the expander
1x PCIe x1 for the graphics card
1-2x PCIe x1 for TeVii sat card(s)
1-2x PCI for PVR-500 analogue TV card(s)

It would be nice if it had a connector for the lan chassis LEDs :-)


Best regards
Ramon


On Wed, 30 May 2012 17:38:01 -0500
Stan Hoeppner s...@hardwarefreak.com wrote:

 On 5/30/2012 4:52 PM, Ramon Hofer wrote:
  On Tue, 29 May 2012 20:49:32 -0500
  Stan Hoeppner s...@hardwarefreak.com wrote:
  
  On 5/29/2012 7:09 AM, Ramon Hofer wrote:
  On Sun, 20 May 2012 21:37:19 -0500
  Stan Hoeppner s...@hardwarefreak.com wrote:
 
  (...)
 
  Does the mobo BIOS show the disk device?  If not, does the 9240
  BIOS show the disk device, RAID level, and its size?
 
  What we need to figure out is whether this is a BIOS problem at
  this point or a Debian installer kernel driver problem.
 
  I have finally found some time to work on the problem:
 
  I set up a raid1 in the hba bios. I couldn't install onto it with
  the supermicro mb.
 
  Then I mounted the lsi hba into my old server with an Asus mb
  (can't remember which one it is, must have to check it at
  home...). It (almost) works like a charm.
  The only issue is that I can't enter the hba BIOS when it's
  mounted in the Asus mb. But when I put it back into the
  Supermicro mb I can access it again. Very strange!
 
  This behavior isn't strange.  Just about every mobo BIOS has an
  option to ignore or load option ROMs.  On your SuperMicro board
  this is controlled by the setting AddOn ROM Display Mode under
  the Boot Feature menu.  Your ASUS board likely has a similar
  feature that is currently disabled, preventing the LSI option ROM
  from being loaded.
  
  Very interesting! I didn't know that.
  The values I can choose for the AddOn ROM Display Mode are
  Keep current and Force Bios. I have chosen the Force Bios
  option. And I have disable the two options you describe below.
  In the supermicro the hba's init screen isn't displayed at all now.
  On the other hand in the asus I saw the init screen when the
  attached discs are listed I just can't enter the configuration
  program with ctrl+h although the message to press these keys is
  shown.
  
  I'm now able to boot into the 2.6.32-5 kernel.
  It takes quite a while until the megasas module is loaded (I
  suppose): the over-current messages are shown for a while (~2 mins)
  and then it boots normally until the login prompt.
  When I leave it alone I get the message:
  
  INFO: task scsi_scan_0:341 blocked for more than 120 seconds.
  "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
  message.
  
  After booting the first time this evening I installed the bpo 3.2
  kernel.
  When I try to reboot the stable kernel the system hangs after the
  message Will now restart.
  
  After a while the above message about the blocked task appears
  again.
  
  The bpo kernel 3.2 seems to fail. The two over current-messages are
  shown and then this message:
  http://pastebin.com/raw.php?i=XqVunR9e
  
  
  When I load the stable kernel it stops for a while again after the
  over-current message then finally gets to the login prompt. After a
  while I got this message:
  http://pastebin.com/raw.php?i=w409KaFN
  
  
  But apart from that I could install Debian onto the raid1. Then I
  set
 
  This was on the ASUS board correct?  Were you able to boot the
  RAID1 device after install?  If so this indeed would be strange as
  you should not be able to boot from the HBA if its ROM isn't
  loaded.
  
  No I wasn't able to boot the kernel installed to the RAID1. Grub was
  loaded but only because I've installed it to the disk directly
  attached to the MB's SATA controller.
  But when choosing the RAID1 kernel it stopped (can't remember the
  message anymore). I thought I haven't set the boot option for the
  raid1 in the hba bios properly.
  
  
  the bios to use the disks as jbods and installed Debian again to a
  drive directly attached to the mb sata controller.
  With the original squeeze kernel the disks attached to the hba
  weren't visible. But after updating to the bpo kernel I can fdisk
  them separately and put them into a raid5 (in the end I want to
  apply the 500G partition method Camaleón suggested).
 
  This experience with the ASUS board leads me to wonder if disabling
  the option ROM and INT19 on the SM board would allow everything to
  function properly.  Try that before you take the board to the
  dealer for flashing.  Assuming you've deleted any BIOS configured
  RAID devices in the HBA BIOS 

Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-05-30 Thread Camaleón
On Tue, 29 May 2012 14:09:27 +0200, Ramon Hofer wrote:

 Did you already flash the C7P67 BIOS to the latest version?  I can't
 recall.
 
 I have tried to do that but it was quite strange. I created a freedos
 usb stick with unetbootin and copied the files for the update from
 supermicro into the stick. I did exactly what the readmes told me. But
 when I did it the first time there was no output of the flash process
 and the directory where the supermicro files were located on the stick
 was empty. When I tried to do the procedure again it complains that I
 have to first install version 1.

BIOS revisions are not cumulative (or so it was when computers were 
computers, in the true sense of the word, back in the good old days...); 
that is, every version patches some aspects of the BIOS firmware code and 
you need to apply whichever revision corrects your specific 
problem. It can be normal that a BIOS revision requires a previous 
version to be installed first, but you'd better ask Supermicro tech 
support to confirm this point, because every manufacturer has 
developed its own tricks.
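The required upgrade path depends on which revision is installed now, so record the current firmware version first. On Linux you can read it without root from /sys/class/dmi/id; `dmidecode -s bios-version` shows the same when run as root. This is a sketch, and `bios_info` is a hypothetical helper.

```shell
# Hypothetical helper: print the running system firmware's identity.
# bios_info [DMI_DIR]: DMI_DIR defaults to /sys/class/dmi/id; the argument
# only exists so the function can be run against a mock directory.
bios_info() {
    d=${1:-/sys/class/dmi/id}
    for f in bios_vendor bios_version bios_date; do
        if [ -r "$d/$f" ]; then
            printf '%s: %s\n' "$f" "$(cat "$d/$f")"
        else
            printf '%s: unknown\n' "$f"
        fi
    done
}
```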

 I will now bring it to my dealer who can do the BIOS update for me.
 
 And I will write to Supermicro if they are aware of the issue.

Yes, contact them and they'll tell you how to proceed.

Greetings,

-- 
Camaleón


Archive: http://lists.debian.org/jq5ckb$l92$9...@dough.gmane.org



Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-05-30 Thread Ramon Hofer
On Tue, 29 May 2012 20:49:32 -0500
Stan Hoeppner s...@hardwarefreak.com wrote:

 On 5/29/2012 7:09 AM, Ramon Hofer wrote:
  On Sun, 20 May 2012 21:37:19 -0500
  Stan Hoeppner s...@hardwarefreak.com wrote:
  
  (...)
  
  Does the mobo BIOS show the disk device?  If not, does the 9240
  BIOS show the disk device, RAID level, and its size?
 
  What we need to figure out is whether this is a BIOS problem at
  this point or a Debian installer kernel driver problem.
  
  I have finally found some time to work on the problem:
  
  I set up a raid1 in the hba bios. I couldn't install onto it with
  the supermicro mb.
  
  Then I mounted the lsi hba into my old server with an Asus mb (can't
  remember which one it is, must have to check it at home...). It
  (almost) works like a charm.
  The only issue is that I can't enter the hba BIOS when it's mounted
  in the Asus mb. But when I put it back into the Supermicro mb I can
  access it again. Very strange!
 
 This behavior isn't strange.  Just about every mobo BIOS has an option
 to ignore or load option ROMs.  On your SuperMicro board this is
 controlled by the setting AddOn ROM Display Mode under the Boot
 Feature menu.  Your ASUS board likely has a similar feature that is
 currently disabled, preventing the LSI option ROM from being loaded.

Very interesting! I didn't know that.
The values I can choose for "AddOn ROM Display Mode" are
"Keep current" and "Force Bios". I have chosen the "Force Bios" option.
And I have disabled the two options you describe below.
In the Supermicro the HBA's init screen isn't displayed at all now.
On the other hand, in the Asus I saw the init screen where the attached
disks are listed; I just can't enter the configuration program with
Ctrl+H although the message to press these keys is shown.

I'm now able to boot into the 2.6.32-5 kernel.
It takes quite a while until the megasas module is loaded (I suppose):
the over-current messages are shown for a while (~2 mins) and then it
boots normally until the login prompt.
When I leave it alone I get the message:

INFO: task scsi_scan_0:341 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
message.
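The hung-task warning only reports that a task, here the SCSI scan thread, has been stuck in uninterruptible sleep for longer than the timeout. Writing 0 to the sysctl silences the message but does not fix the stall. The small filter below collects every such warning from the kernel log; `hung_tasks` is a hypothetical name.

```shell
# Hypothetical helper: extract hung-task warnings from kernel log text.
# hung_tasks: read e.g. `dmesg` output on stdin and print
# "<task> <pid> <seconds>" for each "INFO: task ... blocked" line.
hung_tasks() {
    sed -n 's/.*INFO: task \(.*\):\([0-9][0-9]*\) blocked for more than \([0-9][0-9]*\) seconds.*/\1 \2 \3/p'
}
```

Usage: `dmesg | hung_tasks`. Use `cat /proc/sys/kernel/hung_task_timeout_secs` to see the current threshold.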

After booting the first time this evening I installed the bpo 3.2
kernel.
When I try to reboot the stable kernel the system hangs after the
message "Will now restart".

After a while the above message about the blocked task appears again.

The bpo kernel 3.2 seems to fail. The two over-current messages are
shown and then this message:
http://pastebin.com/raw.php?i=XqVunR9e


When I boot the stable kernel it stops for a while again after the
over-current messages, then finally gets to the login prompt. After a
while I got this message:
http://pastebin.com/raw.php?i=w409KaFN


  But apart from that I could install Debian onto the raid1. Then I
  set
 
 This was on the ASUS board correct?  Were you able to boot the RAID1
 device after install?  If so this indeed would be strange as you
 should not be able to boot from the HBA if its ROM isn't loaded.

No, I wasn't able to boot the kernel installed on the RAID1. GRUB was
loaded, but only because I had installed it to the disk directly attached
to the MB's SATA controller.
But when I chose the RAID1 kernel it stopped (I can't remember the
message anymore). I thought I hadn't set the boot option for the RAID1
in the HBA BIOS properly.


  the bios to use the disks as jbods and installed Debian again to a
  drive directly attached to the mb sata controller.
  With the original squeeze kernel the disks attached to the hba
  weren't visible. But after updating to the bpo kernel I can fdisk
  them separately and put them into a raid5 (in the end I want to apply
  the 500G partition method Camaleón suggested).
 
 This experience with the ASUS board leads me to wonder if disabling
 the option ROM and INT19 on the SM board would allow everything to
 function properly.  Try that before you take the board to the dealer
 for flashing.  Assuming you've deleted any BIOS configured RAID
 devices in the HBA BIOS already and all drives are configured for
 JBOD mode, drop the HBA back into the SM board, go into the SM BIOS,
 set PCI Slot X Option ROM to DISABLED where X is the number of
 the PCIe slot in which the LSI HBA is inserted.  Set Interrupt 19
 Capture to DISABLED.  Save settings and reboot.
 
 You should now see the same behavior as on the ASUS, including the HBA
 BIOS not showing up during the boot process.  Which I'm thinking is
 the key to it working on the ASUS as the ROM code is never resident.
 Thus it is not causing problems with kernel driver, which is
 apparently assuming the 9240 series ROM will not be resident.

Maybe I wasn't clear about that. The HBA BIOS seems to be loaded in the
Asus as well, but I just can't enter its settings with Ctrl+H.

Does all of this tell us anything :-?


 This loading of the option ROM code is what some would consider the
 difference between HBA RAID mode and HBA JBOD mode.

Well then it 

Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-05-30 Thread Stan Hoeppner
On 5/30/2012 4:52 PM, Ramon Hofer wrote:
 On Tue, 29 May 2012 20:49:32 -0500
 Stan Hoeppner s...@hardwarefreak.com wrote:
 
 On 5/29/2012 7:09 AM, Ramon Hofer wrote:
 On Sun, 20 May 2012 21:37:19 -0500
 Stan Hoeppner s...@hardwarefreak.com wrote:

 (...)

 Does the mobo BIOS show the disk device?  If not, does the 9240
 BIOS show the disk device, RAID level, and its size?

 What we need to figure out is whether this is a BIOS problem at
 this point or a Debian installer kernel driver problem.

 I have finally found some time to work on the problem:

 I set up a raid1 in the hba bios. I couldn't install onto it with
 the supermicro mb.

 Then I mounted the lsi hba into my old server with an Asus mb (can't
 remember which one it is, must have to check it at home...). It
 (almost) works like a charm.
 The only issue is that I can't enter the hba BIOS when it's mounted
 in the Asus mb. But when I put it back into the Supermicro mb I can
 access it again. Very strange!

 This behavior isn't strange.  Just about every mobo BIOS has an option
 to ignore or load option ROMs.  On your SuperMicro board this is
 controlled by the setting AddOn ROM Display Mode under the Boot
 Feature menu.  Your ASUS board likely has a similar feature that is
 currently disabled, preventing the LSI option ROM from being loaded.
 
 Very interesting! I didn't know that.
 The values I can choose for the AddOn ROM Display Mode are
 Keep current and Force Bios. I have chosen the Force Bios option.
 And I have disabled the two options you describe below.
 In the supermicro the hba's init screen isn't displayed at all now.
 On the other hand in the asus I saw the init screen when the attached
 discs are listed I just can't enter the configuration program with
 ctrl+h although the message to press these keys is shown.
 
 I'm now able to boot into the 2.6.32-5 kernel.
 It takes quite a while until the megasas module is loaded (I suppose):
 the over-current messages are shown for a while (~2 mins) and then it
 boots normally until the login prompt.
 When I leave it alone I get the message:
 
 INFO: task scsi_scan_0:341 blocked for more than 120 seconds.
 "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
 message.
 
 After booting the first time this evening I installed the bpo 3.2
 kernel.
 When I try to reboot the stable kernel the system hangs after the
 message Will now restart.
 
 After a while the above message about the blocked task appears again.
 
 The bpo kernel 3.2 seems to fail. The two over current-messages are
 shown and then this message:
 http://pastebin.com/raw.php?i=XqVunR9e
 
 
 When I load the stable kernel it stops for a while again after the
 over-current message then finally gets to the login prompt. After a
 while I got this message:
 http://pastebin.com/raw.php?i=w409KaFN
 
 
 But apart from that I could install Debian onto the raid1. Then I
 set

 This was on the ASUS board correct?  Were you able to boot the RAID1
 device after install?  If so this indeed would be strange as you
 should not be able to boot from the HBA if its ROM isn't loaded.
 
 No I wasn't able to boot the kernel installed to the RAID1. Grub was
 loaded but only because I've installed it to the disk directly attached
 to the MB's SATA controller.
 But when choosing the RAID1 kernel it stopped (can't remember the
 message anymore). I thought I haven't set the boot option for the raid1
 in the hba bios properly.
 
 
 the bios to use the disks as jbods and installed Debian again to a
 drive directly attached to the mb sata controller.
 With the original squeeze kernel the disks attached to the hba
 weren't visible. But after updating to the bpo kernel I can fdisk
 them separately and put them into a raid5 (in the end I want to apply
 the 500G partition method Camaleón suggested).

 This experience with the ASUS board leads me to wonder if disabling
 the option ROM and INT19 on the SM board would allow everything to
 function properly.  Try that before you take the board to the dealer
 for flashing.  Assuming you've deleted any BIOS configured RAID
 devices in the HBA BIOS already and all drives are configured for
 JBOD mode, drop the HBA back into the SM board, go into the SM BIOS,
 set PCI Slot X Option ROM to DISABLED where X is the number of
 the PCIe slot in which the LSI HBA is inserted.  Set Interrupt 19
 Capture to DISABLED.  Save settings and reboot.

 You should now see the same behavior as on the ASUS, including the HBA
 BIOS not showing up during the boot process.  Which I'm thinking is
 the key to it working on the ASUS as the ROM code is never resident.
 Thus it is not causing problems with kernel driver, which is
 apparently assuming the 9240 series ROM will not be resident.
 
 Maybe I wasn't clear about that. The hba BIOS seems to be loaded in the
 asus as well but I just can't enter its setting with ctrl+h.
 
 Does all of this tell us anything :-?
 
 
 This loading of the option ROM code is what some would consider the
 

Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-05-29 Thread Ramon Hofer
On Sun, 20 May 2012 21:37:19 -0500
Stan Hoeppner s...@hardwarefreak.com wrote:

(...)

 Does the mobo BIOS show the disk device?  If not, does the 9240 BIOS
 show the disk device, RAID level, and its size?
 
 What we need to figure out is whether this is a BIOS problem at this
 point or a Debian installer kernel driver problem.

I have finally found some time to work on the problem:

I set up a RAID1 in the HBA BIOS. I couldn't install onto it with the
Supermicro mb.

Then I mounted the LSI HBA into my old server with an Asus mb (I can't
remember which one it is, I'll have to check at home...). It (almost)
works like a charm.
The only issue is that I can't enter the HBA BIOS when it's mounted in
the Asus mb. But when I put it back into the Supermicro mb I can access
it again. Very strange!
But apart from that I could install Debian onto the RAID1. Then I set
the BIOS to use the disks as JBODs and installed Debian again to a drive
directly attached to the mb SATA controller.
With the original Squeeze kernel the disks attached to the HBA weren't
visible. But after updating to the bpo kernel I can fdisk them
separately and put them into a RAID5 (in the end I want to apply the 500G
partition method Camaleón suggested).


 Did you already flash the C7P67 BIOS to the latest version?  I can't
 recall.

I have tried to do that but it was quite strange.
I created a FreeDOS USB stick with UNetbootin and copied the files for
the update from Supermicro onto the stick. I did exactly what the
readmes told me. But when I did it the first time there was no output
from the flash process, and the directory where the Supermicro files were
located on the stick was empty.
When I tried the procedure again it complained that I have to install
version 1 first.

I will now bring it to my dealer who can do the BIOS update for me.

And I will write to Supermicro if they are aware of the issue.


Best regards
Ramon


Archive: http://lists.debian.org/20120529140927.10dde651@nb-10114



Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-05-29 Thread Stan Hoeppner
On 5/29/2012 7:09 AM, Ramon Hofer wrote:
 On Sun, 20 May 2012 21:37:19 -0500
 Stan Hoeppner s...@hardwarefreak.com wrote:
 
 (...)
 
 Does the mobo BIOS show the disk device?  If not, does the 9240 BIOS
 show the disk device, RAID level, and its size?

 What we need to figure out is whether this is a BIOS problem at this
 point or a Debian installer kernel driver problem.
 
 I have finally found some time to work on the problem:
 
 I set up a raid1 in the hba bios. I couldn't install onto it with the
 supermicro mb.
 
 Then I mounted the lsi hba into my old server with an Asus mb (can't
 remember which one it is, must have to check it at home...). It (almost)
 works like a charm.
 The only issue is that I can't enter the hba BIOS when it's mounted in
 the Asus mb. But when I put it back into the Supermicro mb I can access
 it again. Very strange!

This behavior isn't strange.  Just about every mobo BIOS has an option
to ignore or load option ROMs.  On your SuperMicro board this is
controlled by the setting AddOn ROM Display Mode under the Boot
Feature menu.  Your ASUS board likely has a similar feature that is
currently disabled, preventing the LSI option ROM from being loaded.

 But apart from that I could install Debian onto the raid1. Then I set

This was on the ASUS board correct?  Were you able to boot the RAID1
device after install?  If so this indeed would be strange as you should
not be able to boot from the HBA if its ROM isn't loaded.

 the bios to use the disks as jbods and installed Debian again to a drive
 directly attached to the mb sata controller.
 With the original squeeze kernel the disks attached to the hba weren't
 visible. But after updating to the bpo kernel I can fdisk them
 separately and put them into a raid5 (in the end I want to apply the 500G
 partition method Camaleón suggested).

This experience with the ASUS board leads me to wonder if disabling the
option ROM and INT19 on the SM board would allow everything to function
properly.  Try that before you take the board to the dealer for
flashing.  Assuming you've deleted any BIOS configured RAID devices in
the HBA BIOS already and all drives are configured for JBOD mode, drop
the HBA back into the SM board, go into the SM BIOS, set "PCI Slot X
Option ROM" to DISABLED, where X is the number of the PCIe slot in
which the LSI HBA is inserted.  Set "Interrupt 19 Capture" to
DISABLED.  Save settings and reboot.

You should now see the same behavior as on the ASUS, including the HBA
BIOS not showing up during the boot process, which I'm thinking is the
key to it working on the ASUS, as the ROM code is never resident.  Thus
it is not causing problems with the kernel driver, which is apparently
assuming the 9240 series ROM will not be resident.

This loading of the option ROM code is what some would consider the
difference between HBA RAID mode and HBA JBOD mode.
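One way to confirm from Linux, after changing the option ROM settings, that the kernel has actually bound a driver to the HBA is `lspci -k`, which prints a "Kernel driver in use:" line under each PCI device. The helper below, `hba_driver`, is a hypothetical name for illustration, not an existing tool. It assumes the usual `lspci -k` layout (an unindented device line followed by indented detail lines) and simply treats any device line mentioning LSI or MegaRAID as the HBA.

```shell
#!/bin/sh
# hba_driver (hypothetical helper): read `lspci -k` output on stdin and print
# the kernel driver bound to any device whose line mentions LSI or MegaRAID.
# Assumes the usual layout: an unindented device line followed by indented
# detail lines such as "Kernel driver in use: megaraid_sas".
hba_driver() {
    awk '
    /^[^ \t]/ { in_lsi = ($0 ~ /LSI|MegaRAID/) }   # start of a new device entry
    in_lsi && /Kernel driver in use:/ {
        sub(/.*Kernel driver in use:[ \t]*/, "")   # keep only the driver name
        print
    }'
}

# Example: lspci -k | hba_driver
```

Empty output would suggest that no driver is attached to the card; for this controller family the expected name is `megaraid_sas`, whose log lines carry the `megasas` prefix seen elsewhere in this thread.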

 Did you already flash the C7P67 BIOS to the latest version?  I can't
 recall.
 
 I have tried to do that but it was quite strange.
 I created a FreeDOS USB stick with UNetbootin and copied the Supermicro
 update files onto the stick. I did exactly what the
 readmes told me. But the first time there was no output
 from the flash process, and the directory on the stick where the
 Supermicro files were located was then empty.
 When I tried the procedure again, it complained that I had to
 install version 1 first.

Unfortunately, flashing a mobo BIOS is still not always an uneventful or
routine process, even in 2012.

 I will now bring it to my dealer who can do the BIOS update for me.
 
 And I will write to Supermicro if they are aware of the issue.

Try what I mention above before doing either of these things.

Good luck.

-- 
Stan


-- 
To UNSUBSCRIBE, email to debian-user-requ...@lists.debian.org 
with a subject of unsubscribe. Trouble? Contact listmas...@lists.debian.org
Archive: http://lists.debian.org/4fc57cac.6020...@hardwarefreak.com



Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-05-21 Thread Ramon Hofer
On Sun, 20 May 2012 23:35:58 -0300, Henrique de Moraes Holschuh wrote:

 On Sun, 20 May 2012, Ramon Hofer wrote:
 On Sat, 19 May 2012 13:06:40 -0300, Henrique de Moraes Holschuh wrote:
  On Sat, 19 May 2012, Ramon Hofer wrote:
  On Sat, 19 May 2012 04:19:33 -0500, Stan Hoeppner wrote:
   On 5/19/2012 2:52 AM, Ramon Hofer wrote:
   On Fri, 18 May 2012 17:57:56 -0500, Stan Hoeppner wrote:
   On 5/18/2012 9:39 AM, Shane Johnson wrote:
   After that I would look to see if
   something isn't shorting out a USB port.
  
   Yes, USB is the cause of the over-current errors, which is
   plainly evident in his screen shot.  But we don't yet know if
   this USB problem is what's hanging the system.  Further
   troubleshooting is required.
   
   The strange thing is as I mentioned in another post is that on
   the mb usb port 8 there's nothing attached and I haven't found
   where port 7 is :-?
   
   I wouldn't worry about the USB errors at this point.  Unless there
   is some larger issue with insufficient power on the motherboard
   causing the USB current error, it's likely unrelated to the
   storage hardware issue.
   Fix it first, then worry about the USB errors.  Given you have no
   device plugged into those ports, it could be a phantom error.
  
  Yes I hope you're right about the phantom error :-) Especially
  because I can't find port 7. There's no label on the mb PCB or in its
  documentation.
  
  It might well mean one of the power planes is oversubscribed, and
  THAT can cause anything up to and including damage to hard disks,
  data corruption, and crashes.
 
 Thanks for the suggestion, Henrique!
 The PSU is a 750 W so I think it should be enough for now.
 
 Yes, it is probably enough.  You have to do a lot to overpower a *good*
 750W PSU (a crappy one, OTOH...).
 
 You should still do all testing with the minimal hardware setup.  From
 experience, you also need to be able to test using no keyboard or a
 different keyboard (and mouse)... USB is supposed to be safe from this
 crap as it can detect overcurrent, but since it IS detecting overcurrent
 in your case (be it a faulty alarm or not)...

The PSU is a Thermaltake. I have two PSUs with less power. Maybe I should 
try it with one of them?

I will try this evening with an old PS/2 keyboard. But it would surprise me
if this were the source of the problem, because the USB transmitter for the
keyboard/mouse is used in another computer without problems and the
over-current messages always refer to ports 7 and 8. Using a
different USB port makes no difference...


Best regards


-- 
Archive: http://lists.debian.org/jpcnn1$6a2$1...@dough.gmane.org



Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-05-21 Thread Ramon Hofer
On Sun, 20 May 2012 21:59:53 -0500, Stan Hoeppner wrote:

 On 5/20/2012 1:13 PM, Ramon Hofer wrote:
 
 I was able to set a RAID1 in the WebBIOS and set the bootable option.
 But I'm not sure if the setting was accepted. Even though when I set
 the bootable option again the WebBIOS tells me the option is already
 set - so it should be ok?
 
 Unfortunately the Debian installer doesn't list the RAID1 storage
 device :-?
 
 Are you using the very latest Squeeze installer ISO?

I'm using the Netinst from UNetbootin. I can try another one this evening.


 It's possible the driver in 2.6.32-5 used in the original Squeeze
 installer doesn't work with the 9240.  Support for the 9240 was added in
 2.6.32-29:
 
 http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=604083
 
 Something else to try:
 
 If the disks that were attached to the mobo SATA ports are still intact
 with Squeeze installed, boot the system with those attached to the mobo
 SATA but with the 9240 and expander removed from the system.
 
 Once booted, upgrade the kernel:
 
 $ aptitude -t squeeze-backports install linux-image-3.2.0-0.bpo.2-amd64
 
 Shutdown, install the 9240 only, power up and see if it boots without
 hanging.  If it does, power down, plug in the expander, cables, drives,
 etc, power up and see if Debian sees the RAID1 virtual disk, and the
 JBOD drives, if any are present.
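Whether an installed Squeeze kernel is new enough for the 9240 can be checked against the Debian package version rather than `uname -r`, since, as far as I know, every Squeeze kernel reports the same `2.6.32-5-amd64` ABI string. The sketch below compares versions with GNU `sort -V` (an assumption for portability; `dpkg --compare-versions` is the stricter Debian-native check). The helper names are hypothetical, and the `2.6.32-29` threshold is simply the one quoted above.

```shell
#!/bin/sh
# version_ge A B (hypothetical helper): succeed when version A >= version B.
# Relies on GNU sort -V; dpkg --compare-versions is the Debian-native option.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# has_9240_support VERSION (hypothetical helper): succeed when a Squeeze
# kernel package version is at least 2.6.32-29, the revision quoted above.
has_9240_support() {
    version_ge "$1" 2.6.32-29
}

# Example on Squeeze:
#   v=$(dpkg-query -W -f='${Version}' linux-image-2.6.32-5-amd64)
#   has_9240_support "$v" && echo "package is 2.6.32-29 or newer"
```

For the backports kernel the question does not arise, since a 3.2 package is newer than any 2.6.32 revision.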

I have already done this. I installed Squeeze from the Netinst ISO
with the LSI card and expander attached.
Then, after the install, when I couldn't boot, I removed the LSI card
(leaving the expander in its PCIe slot but not connected to the LSI card),
installed the bpo kernel, put the LSI card back in, and it still hangs at
boot.


Best regards


-- 
Archive: http://lists.debian.org/jpcnv6$6a2$2...@dough.gmane.org



Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-05-21 Thread Ramon Hofer
On Sun, 20 May 2012 21:37:19 -0500, Stan Hoeppner wrote:

 On 5/20/2012 1:13 PM, Ramon Hofer wrote:
 
 There were no problems upgrading the fw :-)
 
 Unfortunately it didn't solve the problem.
 
 Grrr.
 
 3.  Go into the mobo BIOS and set and test these options:

 Quiet Boot:                                DISABLED
 Interrupt 19 Capture:                      DISABLED
 --save/reboot/test--
 PCI Express Port:                          ENABLED
 PEG Force Gen1:                            ENABLED
 Detect Non-Compliance Device:              ENABLED
 --save/reboot/test--
 XHCI Hand-off:                             ENABLED
 Active State Power Management:             ENABLED
 PCIe (PCI Express) Max Read Request Size:  4096
 --save/reboot/test--
 
 None of this worked.
 
 Grrr.
 
 
 If none of this works, disable both on board SATA controllers:

 Serial-ATA Controller 0:  DISABLED
 Serial-ATA Controller 1:  DISABLED

 and connect all drives to the 9240, and re-enable:

 Interrupt 19 Capture:     ENABLED

 This will allow booting from the 9240.  In the 9240 webBIOS, create a
 RAID1 array device of two disks, make it bootable, save and initialize
 the array.  Reboot into the Squeeze install disk and install onto the
 RAID1 device.  The initialization should continue transparently in the
 background while you're installing Debian.  When finished reboot to
 see if the boot hang persists.
 
 I was able to set a RAID1 in the WebBIOS and set the bootable option.
 But I'm not sure if the setting was accepted. Even though when I set
 the bootable option again the WebBIOS tells me the option is already
 set - so it should be ok?
 
 Unfortunately the Debian installer doesn't list the RAID1 storage
 device :-?
 
 G.
 
 Does the mobo BIOS show the disk device?  If not, does the 9240 BIOS
 show the disk device, RAID level, and its size?

The LSI BIOS shows the RAID1 array with the correct size. But I couldn't
see the disks in the mb BIOS. I haven't really looked for them yet, so
I will check again this evening...


 What we need to figure out is whether this is a BIOS problem at this
 point or a Debian installer kernel driver problem.

This sounds like a plan :-)


 Hopefully you won't need to do all of these things as it will be very
 time consuming.  I'm attempting to provide you a thorough
 troubleshooting guide that covers most/all the possible/likely causes
 of the hang.
 
 Thank you very much for your help so far :-)
 
 Sorry it hasn't helped you make forward progress.

Still, you help me by having good ideas. I would have run out of
ideas already...


 Did you already flash the C7P67 BIOS to the latest version?  I can't
 recall.

No, I haven't touched the mb firmware.
I can do that this evening as well.


Best regards


-- 
Archive: http://lists.debian.org/jpcoav$6a2$3...@dough.gmane.org



Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-05-21 Thread Ramon Hofer
On Sun, 20 May 2012 21:27:21 -0500
Stan Hoeppner s...@hardwarefreak.com wrote:

 On 5/20/2012 9:24 AM, Ramon Hofer wrote:
  On Sat, 19 May 2012 11:31:51 -0500, Stan Hoeppner wrote:
  
  On 5/19/2012 5:33 AM, Ramon Hofer wrote:
 
  Yes, I'm really thankful for the recommendation. And somehow I
  hoped you could jump in and help me :-)
 
  I'm actively working on it, have been for a couple of hours on and
  off. I'm reading your responses as I go before responding so I
  hopefully don't recommend something you've already tried.  I'm
  still researching. In the mean time, if you can, go ahead and
  flash the 9240 with the latest firmware, precisely following the
  instructions.
  
  Should I first flash the new firmware and then test what you
  describe below?
 
 Flash the firmware, then try to boot the system from the drives
 attached to the mobo SATA port, as you have been.  If the system locks
 as it did before, this will tell us the firmware update didn't solve
 the problem.  Given that the shipped FW was from 2010, I have high
 hopes the new FW will fix this problem.  I'm surprised your card
 shipped with a FW that old.  From what company did you purchase the
 9240-4i?  I'm wondering if it may have been sitting on a shelf for a
 while.

I purchased the card from http://www.techmania.ch/.
When I placed the order I asked them if I could come and pick it up
directly, and they told me that they don't have their own warehouse but
order the card directly from LSI.


  I am not very sure if I do the flashing right. Here's what I do:
  
  1. Read the firmware readme file [1]
  
  Installation:
  =
  Use MegaCLI to flash the SAS controllers.  MegaCLI can be
  downloaded from the support and download section of www.lsi.com.
 
  Command syntax:  MegaCli -adpfwflash -f imr_fw.rom -a0
  
  So I download the MegaCLI from [2] and read the MegaCLI readme [3]:
  
  Installation Commands: 
  ===
  1. Copy MegaCli.exe to a folder.
  2. Run MegaCli from the Command Prompt.  Use -h option to
  see help menu.
  
  I create a FreeDOS USB stick with UNetbootin, copy MegaCli.exe and
  imr_fw.rom [4] into a folder on the USB stick, boot it, and run
  the above command to flash the card?
 
 Yep.
 
  
  [1] http://www.lsi.com/downloads/Public/MegaRAID%20Common%20Files/20.10.1-0077_SAS_2008_FW_Image_APP-2.120.244-1482.txt
  
  [2] http://www.lsi.com/downloads/Public/MegaRAID%20Common%20Files/8.00.40_Dos_Megacli.zip
  
  [3] http://www.lsi.com/downloads/Public/MegaRAID%20Common%20Files/README_FOR_8.00.40_Dos_Megacli.zip.txt
  
  [4] http://www.lsi.com/downloads/Public/MegaRAID%20Common%20Files/20.10.1-0077_SAS_2008_FW_Image_APP-2.120.244-1482.zip
  
  
  (...)
  
  But I didn't know if it's ok to ask you by name.
 
  I've been doing a reply-to-all with each reply, hoping you'd
  follow suit.  This list is very busy thus a reply-all ensures I
  won't miss your posts.
  
  I'm using Pan to read the newsgroup, and it has no reply-to-all button.
  But there's a mail-to field which I'm now testing :-)
 
 Ahh, ok.  I didn't realize some people read mailing lists via news
 groups.
 
 When I reply-to-all, where does the copy end up that is sent to
 ramonho...@bluewin.ch?  Surely you read your email in an MUA such as
 ThunderBird or similar.  You can reply-to-all from there.

This message was sent with Claws Mail. Hope it works now :-)


  Please feel free to address me by name and/or contact me directly
  off list.  I recommended this storage controller/expander solution
  to you and it's not working yet.  I'm not going to leave you
  twisting in the wind.  That's not how I roll. ;)  Besides, look at
  my RHS domain.  I have a reputation to uphold. :)
  
  That's very kind!
  I know that everyone here helps for free, so I don't want to ask
  anyone to spend their time on me. But of course I'm really thankful
  for every single minute you spend helping me :-)
 
 I'll be with you until it's fixed, working, or until we identify the
 root cause.  It's always possible that the HBA is defective in some
 way. If neither the firmware update nor other measures fix the
 problem, you may need to send the card back for replacement.

I really appreciate that. Thanks a lot!


 BTW, after you flash the FW, power off the machine and remove the
 Intel Expander from its PCIe slot.  Disconnect the 8087 cable from
 the 9240. Then power up and see if the system boots from the mobo
 connected drives.  This will isolate the 9240 from the downstream SAS
 expander and drives.

I had the same idea earlier, so I did most of the tests without the
expander card...


Best regards
Ramon


-- 
Archive: http://lists.debian.org/20120521143835.125d9c15@nb-10114



Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-05-21 Thread Henrique de Moraes Holschuh
On Mon, 21 May 2012, Ramon Hofer wrote:
 On Sun, 20 May 2012 23:35:58 -0300, Henrique de Moraes Holschuh wrote:
  Thanks for the suggestion, Henrique!
  The PSU is a 750 W so I think it should be enough for now.
  
  Yes, it is probably enough.  You have to do a lot to overpower a *good*
  750W PSU (a crappy one, OTOH...).
  
  You should still do all testing with the minimal hardware setup.  From
  experience, you also need to be able to test using no keyboard or a
  different keyboard (and mouse)... USB is supposed to be safe from this
  crap as it can detect overcurrent, but since it IS detecting overcurrent
  in your case (be it a faulty alarm or not)...
 
 The PSU is a Thermaltake. I have two PSUs with less power. Maybe I should 
 try it with one of them?

Well, it is worth a try, Thermaltake are usually good PSUs, but still...

 I will try this evening with an old PS/2 keyboard. But it would surprise me

Please make sure not to hotplug PS/2 devices (mice/keyboards); they're
cold-plug only.  Some motherboards and devices tolerate hotplugging, but it
is not safe to do so unless the documentation of both devices explicitly
says that hotplugging is supported.

 if this were the source of the problem, because the USB transmitter for the
 keyboard/mouse is used in another computer without problems and the
 over-current messages always refer to ports 7 and 8. Using a
 different USB port makes no difference...

Yes, it's unlikely.  But you have already exhausted all the likely reasons
anyway...

-- 
  One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie. -- The Silicon Valley Tarot
  Henrique Holschuh


-- 
Archive: http://lists.debian.org/20120521152410.ga16...@khazad-dum.debian.net



Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-05-21 Thread Stan Hoeppner
On 5/21/2012 2:00 PM, Ramon Hofer wrote:

 From my /var/log/installer/syslog I think it uses 2.6.32-5-amd64.
 I have attached the three log files again (maybe you didn't see them
 when I posted them to the list).
 There are two syslogs (including the one from /var/log) and
 the /var/log/installer/hardware-summary.
 
 I have added the command I'd use to print the file in the first
 line.

This isn't going to make any difference as it locks up with the 3.2
backport kernel, which has a much newer LSI driver.  So don't waste any
more time on the Debian installer.

 Maybe you can see something in there...

Nothing but the megasas errors and the 120 second timeouts.  There's no
smoking gun present in the logs.
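For anyone repeating this check on their own logs, a sketch that counts the two kinds of lines mentioned here is below. `scan_hang_log` is a hypothetical helper; it assumes the driver's lines contain the string `megasas` and that the hung-task warnings use the kernel's usual "blocked for more than N seconds" wording.

```shell
#!/bin/sh
# scan_hang_log (hypothetical helper): read a syslog/kern.log on stdin and
# count megasas (megaraid_sas driver) lines and kernel hung-task warnings.
scan_hang_log() {
    awk '
    /megasas/                               { megasas++ }
    /blocked for more than [0-9]+ seconds/  { hung++ }
    END { printf "megasas=%d hung_tasks=%d\n", megasas, hung }'
}

# Example: scan_hang_log < /var/log/kern.log
```

It only counts matches; the point is a quick before/after comparison when swapping kernels or firmware, not a diagnosis.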

 It's possible the driver in 2.6.32-5 used in the original Squeeze
 installer doesn't work with the 9240.  Support for the 9240 was
 added in 2.6.32-29:

 http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=604083

 Something else to try:

 If the disks that were attached to the mobo SATA ports are still
 intact with Squeeze installed, boot the system with those attached
 to the mobo SATA but with the 9240 and expander removed from the
 system.

 Once booted, upgrade the kernel:

 $ aptitude -t squeeze-backports install
 linux-image-3.2.0-0.bpo.2-amd64

 Shutdown, install the 9240 only, power up and see if it boots
 without hanging.  If it does, power down, plug in the expander,
 cables, drives, etc, power up and see if Debian sees the RAID1
 virtual disk, and the JBOD drives, if any are present.

 I have already done this. I installed Squeeze from the Netinst
 ISO with the LSI card and expander attached.
 Then, after the install, when I couldn't boot, I removed the LSI card
 (leaving the expander in its PCIe slot but not connected to the
 LSI card), installed the bpo kernel, put the LSI card back in, and
 it still hangs at boot.

 Hmm, this is very strange.  I've never seen anywhere near this much
 trouble installing an LSI HBA with Linux before.
 
 I haven't had much luck with Linux and hardware ;-)

That's odd given you have top shelf mobo and SAS HBA, SuperMicro and
LSI.  The problem seems to be in the kernel at this point, though I'm
unable to find anything thus far via Google that provides a fix...

BTW, I noticed you stopped copying the Debian list.  All of this
exchange needs to be archived for others, so I'm adding the list back to
the CC.

-- 
Stan


-- 
Archive: http://lists.debian.org/4fbac22e.1010...@hardwarefreak.com



Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-05-20 Thread Stan Hoeppner
On 5/19/2012 11:05 AM, Henrique de Moraes Holschuh wrote:
 On Sat, 19 May 2012, Ramon Hofer wrote:
 And after a while there are more messages which I don't understand. I
 have taken a picture:
 http://666kb.com/i/c3wf606sc1qkcvgoc.jpg

 It shows that udev is having serious trouble handling one of the USB
 devices.

 Yes but only when the lsi card is attached. When it's removed the 
 
 Get a better PSU, and if that doesn't work, either junk the motherboard, or
 give up on adding any cards that require a bit more power.

This is absolutely horrible advice.  Any moderate horsepower PCIe x16
GPU card from nVidia or AMD is going to draw 4-10 times the current of
these SAS boards.  Too much PCIe power draw isn't the issue here, unless
the mobo is possibly defective.  I doubt this is the case.  It's most
likely a firmware bug in the HBA or the system BIOS, or a driver bug in
2.6.32, or a combination of these.  We should know after Ramon runs
through the task list I provided earlier.

-- 
Stan


-- 
Archive: http://lists.debian.org/4fb8a1b3.9020...@hardwarefreak.com



Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-05-20 Thread Henrique de Moraes Holschuh
On Sun, 20 May 2012, Stan Hoeppner wrote:
 On 5/19/2012 11:05 AM, Henrique de Moraes Holschuh wrote:
  On Sat, 19 May 2012, Ramon Hofer wrote:
  And after a while there are more messages which I don't understand. I
  have taken a picture:
  http://666kb.com/i/c3wf606sc1qkcvgoc.jpg
 
  It shows that udev is having serious trouble handling one of the USB
  devices.
 
  Yes but only when the lsi card is attached. When it's removed the 
  
  Get a better PSU, and if that doesn't work, either junk the motherboard, or
  give up on adding any cards that require a bit more power.
 
 This is absolutely horrible advice.  Any moderate horsepower PCIe x16

Well, yes. But mostly because I didn't add the proper caveat: first check
whether you can supply extra power using MOLEX connectors.  I apologise for
that one.

I *have* been through oversubscribed power rails due to el-cheap-o PSUs and
onboard (motherboard) voltage regulators before, as well as due to
undersized PSUs (in servers), and I've also been through overload scenarios
caused by bad memory modules, and a bad keyboard (which had developed low
resistance paths akin to very small short-circuits).  The system goes
slightly insane, all sort of weird defects show up, INCLUDING tripping the
overcurrent detector on the root USB hub due to +5V floating too much, etc.

 these SAS boards.  Too much PCIe power draw isn't the issue here, unless
 the mobo is possibly defective.  I doubt this is the case.  It's most

Or the PSU can't supply enough power to whichever rail the onboard VRs are
using to supply the PCIe slots and the chipset (might not be the 3.3/5V
ones, some boards prefer to do it using the 12V rail and a DC-DC VR).

 likely a firmware bug in the HBA or the system BIOS, or a driver bug in
 2.6.32, or a combination of these.  We should know after Ramon runs
 through the task list I provided earlier.

AFAIK, the only kernel bug that could cause overcurrent misdetects is a
problem with interrupt sharing, which should not be possible on a modern board
where everything PCIe uses MSI/MSI-X (the Linux USB core is still incapable
of using MSI/MSI-X, at least up to kernel 3.2)... or memory corruption,
which is less deterministic.
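Whether the HBA and the USB controllers actually share an interrupt line can be read from `/proc/interrupts`: devices sharing an IO-APIC line are listed together on one row, while MSI/MSI-X vectors show a `PCI-MSI` type. The hypothetical `irq_check` below classifies each row naming `megasas`. It assumes 3.x-era naming (`ehci_hcd`/`xhci_hcd` for USB hosts, `PCI-MSI-edge` for MSI), which can differ between kernel versions.

```shell
#!/bin/sh
# irq_check (hypothetical helper): read /proc/interrupts on stdin and, for each
# row naming megasas, print the IRQ number and one of:
#   shared-with-usb  the same line also lists a USB host controller driver
#   msi              the row's type mentions MSI (a dedicated vector)
#   legacy           a non-MSI line not shared with a USB host
irq_check() {
    awk '/megasas/ {
        irq = $1; sub(":", "", irq)               # "16:" -> "16"
        if ($0 ~ /(ehci|xhci|uhci|ohci)_hcd/) print irq, "shared-with-usb"
        else if ($0 ~ /MSI/)                   print irq, "msi"
        else                                   print irq, "legacy"
    }'
}

# Example: irq_check < /proc/interrupts
```

An `msi` result would match Henrique's expectation for a modern board and make interrupt sharing an unlikely cause.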

Firmware bugs in SMM code can cause just about anything, but it seems
unlikely they'd mess with the overcurrent alarm report bits in the USB
chipset because of a disk controller.

-- 
  One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie. -- The Silicon Valley Tarot
  Henrique Holschuh


-- 
Archive: http://lists.debian.org/20120520115339.ga26...@khazad-dum.debian.net



Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-05-20 Thread Ramon Hofer
On Sat, 19 May 2012 13:41:33 +, Camaleón wrote:

(...)

 I have tried to check if I can see something in the mb BIOS to see if
 it can tell me anything about the connected hardware. But I didn't find
 anything in the PCI settings.
 
 Mmm, have you tried setting a RAID level instead of using JBOD? It's just
 for testing... although this can only be done at a very early stage, when
 the disks are completely empty with no data on them, because I'm afraid
 changing this will destroy whatever they contain.

I will play around this evening a bit.
Luckily my last attempt with the Supermicro HBAs already wiped the disks,
so I have some disks to play with ;-)


 Are you reaching the GRUB2 menu? If yes, you can select recovery
 mode/ single-user mode.
 
 Ah ok. Yes, I have tried that with both kernels in recovery mode but
 without luck.
 There are a lot more messages, and the last two of them are the
 over-current messages :-o
 
 If you could upload an image with the screen you get, it would be great
 :-)

Here you go:
http://666kb.com/i/c3yd21ff71u88d4x8.jpg

But I could only capture the last part, once it stopped adding new lines.
Everything before that scrolled by too fast :-(


Best regards
Ramon


-- 
Archive: http://lists.debian.org/jpan47$87h$1...@dough.gmane.org



Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-05-20 Thread Ramon Hofer
On Sat, 19 May 2012 13:06:40 -0300, Henrique de Moraes Holschuh wrote:

 On Sat, 19 May 2012, Ramon Hofer wrote:
 On Sat, 19 May 2012 04:19:33 -0500, Stan Hoeppner wrote:
  On 5/19/2012 2:52 AM, Ramon Hofer wrote:
  On Fri, 18 May 2012 17:57:56 -0500, Stan Hoeppner wrote:
  
  On 5/18/2012 9:39 AM, Shane Johnson wrote:
 
  After that I would look to see if
  something isn't shorting out a USB port.
 
  Yes, USB is the cause of the over-current errors, which is plainly
  evident in his screen shot.  But we don't yet know if this USB
  problem is what's hanging the system.  Further troubleshooting is
  required.
  
  The strange thing is as I mentioned in another post is that on the
  mb usb port 8 there's nothing attached and I haven't found where
  port 7 is :-?
  
  I wouldn't worry about the USB errors at this point.  Unless there is
  some larger issue with insufficient power on the motherboard causing
  the USB current error, it's likely unrelated to the storage hardware
  issue.
   Fix it first, then worry about the USB errors.  Given you have no
  device plugged into those ports, it could be a phantom error.
 
 Yes I hope you're right about the phantom error :-) Especially because I
 can't find port 7. There's no label on the mb PCB or in its documentation.
 
 It might well mean one of the power planes is oversubscribed, and THAT
 can cause anything up to and including damage to hard disks, data
 corruption, and crashes.

Thanks for the suggestion, Henrique!
The PSU is a 750 W so I think it should be enough for now.


Best regards
Ramon


-- 
Archive: http://lists.debian.org/jpan8g$87h$2...@dough.gmane.org



Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-05-20 Thread Camaleón
On Sun, 20 May 2012 12:12:55 +, Ramon Hofer wrote:

 On Sat, 19 May 2012 13:41:33 +, Camaleón wrote:
 
 Mmm, have you tried setting a RAID level instead of using JBOD? It's just
 for testing... although this can only be done at a very early stage,
 when the disks are completely empty with no data on them, because I'm
 afraid changing this will destroy whatever they contain.
 
 I will play around this evening a bit. Luckily my last attempt with the
 Supermicro HBAs already wiped the disks, so I have some disks to play
 with ;-)

Good :-)

Also, consider installing on the motherboard only the devices strictly
required for it to work (i.e., processor+heatsink, memory and a couple of
hard disks to test mdraid).
 
 If you could upload an image with the screen you get, it would be great
 :-)
 
 Here you go:
 http://666kb.com/i/c3yd21ff71u88d4x8.jpg

Thanks!
 
 But I could only capture the last part, once it stopped adding new lines.
 Everything before that scrolled by too fast :-(

Okay... I can't recall whether you have already considered/tried disabling
the USB host controller in your BIOS.

Anyway, from the above messages it seems there are two USB hosts detected
(2-1 with 6 ports and 4-1 with 8 ports), and the latter is the one
reporting the over-current condition. OTOH, this is also weird because,
according to the motherboard specifications¹, there should be 2x USB 3.0
ports and 12x USB 2.0 ports. Something doesn't match here.

¹http://www.supermicro.com/products/motherboard/Core/P67/C7P67.cfm
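The addresses and port counts compared here come from kernel hub-driver messages of the form "hub 4-1:1.0: 8 ports detected" and "hub 4-1:1.0: over-current condition on port 7". A hypothetical filter like `usb_summary` below can pull both out of `dmesg` so they can be lined up against the board's specification. The exact message wording is assumed and may vary between kernel versions.

```shell
#!/bin/sh
# usb_summary (hypothetical helper): read dmesg output on stdin and report
#   "hub <addr> ports <n>"         for "N port(s) detected" messages
#   "overcurrent <addr> port <n>"  for "over-current condition on port N"
# Message wording is assumed from the kernel USB hub driver and may vary.
usb_summary() {
    awk '
    /hub [0-9.-]+:[0-9.]+: / {
        line = $0
        sub(/.*hub /, "", line)            # drop any timestamp prefix
        n = split(line, f, " ")            # f[1] is e.g. "4-1:1.0:"
        hub = f[1]; sub(/:.*/, "", hub)    # keep just the address, e.g. "4-1"
        if (line ~ /ports? detected/)
            print "hub", hub, "ports", f[2]
        else if (line ~ /over-current condition on port/)
            print "overcurrent", hub, "port", f[n]
    }'
}

# Example: dmesg | usb_summary
```

Summing the port counts reported for every hub gives a number that can be set directly against the board's advertised USB ports.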

Greetings,

-- 
Camaleón


-- 
Archive: http://lists.debian.org/jpar6c$u2v$1...@dough.gmane.org



Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-05-20 Thread Ramon Hofer
On Sat, 19 May 2012 11:31:51 -0500, Stan Hoeppner wrote:

 On 5/19/2012 5:33 AM, Ramon Hofer wrote:
 
 Yes, I'm really thankful for the recommendation. And somehow I hoped
 you could jump in and help me :-)
 
 I'm actively working on it, have been for a couple of hours on and off.
  I'm reading your responses as I go before responding so I hopefully
 don't recommend something you've already tried.  I'm still researching.
  In the mean time, if you can, go ahead and flash the 9240 with the
 latest firmware, precisely following the instructions.

Should I first flash the new firmware and then test what you describe 
below?

I am not very sure if I do the flashing right. Here's what I do:

1. Read the firmware readme file [1]

 Installation:
 =
 Use MegaCLI to flash the SAS controllers.  MegaCLI can be downloaded
 from the support and download section of www.lsi.com.
 
 Command syntax:  MegaCli -adpfwflash -f imr_fw.rom -a0

So I download the MegaCLI from [2] and read the MegaCLI readme [3]:

 Installation Commands: 
 ===
 1. Copy MegaCli.exe to a folder.
 2. Run MegaCli from the Command Prompt.  Use -h option to see help menu.

I create a FreeDOS USB stick with UNetbootin, copy MegaCli.exe and
imr_fw.rom [4] into a folder on the USB stick, boot it, and run the above
command to flash the card?


[1] http://www.lsi.com/downloads/Public/MegaRAID%20Common%20Files/20.10.1-0077_SAS_2008_FW_Image_APP-2.120.244-1482.txt

[2] http://www.lsi.com/downloads/Public/MegaRAID%20Common%20Files/8.00.40_Dos_Megacli.zip

[3] http://www.lsi.com/downloads/Public/MegaRAID%20Common%20Files/README_FOR_8.00.40_Dos_Megacli.zip.txt

[4] http://www.lsi.com/downloads/Public/MegaRAID%20Common%20Files/20.10.1-0077_SAS_2008_FW_Image_APP-2.120.244-1482.zip
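After flashing, it's worth confirming that the controller really runs the new package before retesting. The Linux build of MegaCLI (often installed as `MegaCli64`; the path varies) accepts `-AdpAllInfo -a0`, which prints adapter details including the firmware package. The sketch below assumes that line is labelled `FW Package Build` and compares it with `20.10.1-0077`, the version in the firmware file names above. `fw_package_build` and `is_expected_fw` are hypothetical helpers.

```shell
#!/bin/sh
# fw_package_build (hypothetical helper): read MegaCLI "-AdpAllInfo" output on
# stdin and print the value of the "FW Package Build" field (label assumed).
fw_package_build() {
    awk -F':' '/FW Package Build/ {
        v = $2
        gsub(/^[ \t]+|[ \t]+$/, "", v)   # trim surrounding whitespace
        print v
        exit
    }'
}

# is_expected_fw (hypothetical helper): succeed when the reported build
# matches the package that was flashed (20.10.1-0077 per the file names).
is_expected_fw() {
    [ "$(fw_package_build)" = "20.10.1-0077" ]
}

# Example: MegaCli64 -AdpAllInfo -a0 | is_expected_fw && echo "new FW active"
```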


(...)

 But I didn't know if it's ok to ask you by name.
 
 I've been doing a reply-to-all with each reply, hoping you'd follow
 suit.  This list is very busy thus a reply-all ensures I won't miss your
 posts.

I'm using Pan to read the newsgroup, and it has no reply-to-all button. But
there's a mail-to field which I'm now testing :-)


 Please feel free to address me by name and/or contact me directly off
 list.  I recommended this storage controller/expander solution to you
 and it's not working yet.  I'm not going to leave you twisting in the
 wind.  That's not how I roll. ;)  Besides, look at my RHS domain.  I
 have a reputation to uphold. :)

That's very kind!
I know that everyone here helps for free, so I don't want to ask anyone to
spend their time on me. But of course I'm really thankful for every single
minute you spend helping me :-)


Best regards
Ramon


-- 
Archive: http://lists.debian.org/jpaurh$vas$1...@dough.gmane.org



Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-05-20 Thread Ramon Hofer
On Sat, 19 May 2012 11:31:51 -0500, Stan Hoeppner wrote:

 On 5/19/2012 5:33 AM, Ramon Hofer wrote:
 
 Yes, I'm really thankful for the recommendation. And somehow I hoped
 you could jump in and help me :-)
 
 I'm actively working on it, have been for a couple of hours on and off.
  I'm reading your responses as I go before responding so I hopefully
 don't recommend something you've already tried.  I'm still researching.
  In the mean time, if you can, go ahead and flash the 9240 with the
 latest firmware, precisely following the instructions.

There were no problems upgrading the fw :-)

Unfortunately it didn't solve the problem.

 Also try the following:
 
 1.  Power the Intel expander with a PSU 4 pin Molex connector instead of
 using a PCIe slot.  Molex are the large standard plugs, usually white,
 used to connect hard drives for the past 25 years--two black wires, one
 red, one yellow.  With the chassis laying on your desk and the side/top
 cover panel removed, lay the anti-static bag the expander shipped in on
 top of the drive cage frame or PSU, then lay the expander card on its
 back on top of the bag--heat sink facing the ceiling.  Make sure it
 doesn't fall off and ground out to the metal chassis or mobo, etc.  This
 will eliminate a possible PCIe power bug in the mobo.

Did that, but again no improvement.
The over-current messages are still present and the boot process still 
doesn't finish properly.

But the over-current message is always present, even with only the 
mainboard, RAM, CPU and graphics card.

Btw this is a PCIe x1 ATI FireMV 2260 card. With it, both PCIe x16 
slots are available for the LSI and Intel cards.


 2.  With the expander powered directly from the PSU, try the 9240 in
 each x16 slot until one works (I'm assuming you know that you must power
 down the system before inserting/removing cards or you'll very likely
 permanently damage the cards and/or mobo).  If no success here...

No success with the HBA in either of the two slots. I have also tried 
plugging the graphics card into another slot.
And the expander was completely removed for these tests, with no SAS 
cable connected to the LSI card.


 3.  Go into the mobo BIOS and set and test these options:
 
 Quiet Boot:                     DISABLED
 Interrupt 19 Capture:           DISABLED
 --save/reboot/test--
 PCI Express Port:               ENABLED
 PEG Force Gen1:                 ENABLED
 Detect Non-Compliance Device:   ENABLED
 --save/reboot/test--
 XHCI Hand-off:                  ENABLED
 Active State Power Management:  ENABLED
 PCIe (PCI Express) Max Read Request Size:  4096
 --save/reboot/test--

None of this worked.


 If none of this works, disable both on board SATA controllers:
 
 Serial-ATA Controller 0:  DISABLED
 Serial-ATA Controller 1:  DISABLED
 
 and connect all drives to the 9240, and re-enable Interrupt 19 Capture:
 ENABLED
 
 This will allow booting from the 9240.  In the 9240 webBIOS, create a
 RAID1 array device of two disks, make it bootable, save and initialize
 the array.  Reboot into the Squeeze install disk and install onto the
 RAID1 device.  The initialization should continue transparently in the
 background while you're installing Debian.  When finished reboot to see
 if the boot hang persists.

I was able to set up a RAID1 in the WebBIOS and set the bootable option, 
but I'm not sure if the setting was accepted. When I try to set the 
bootable option again, the WebBIOS tells me it is already set, so it 
should be ok?

Unfortunately the Debian installer doesn't list the RAID1 storage 
device :-?


 Hopefully you won't need to do all of these things as it will be very
 time consuming.  I'm attempting to provide you a thorough
 troubleshooting guide that covers most/all the possible/likely causes of
 the hang.

Thank you very much for your help so far :-)


Best regards
Ramon


-- 
Archive: http://lists.debian.org/jpbc81$moh$1...@dough.gmane.org



Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-05-20 Thread Stan Hoeppner
On 5/20/2012 9:24 AM, Ramon Hofer wrote:
 On Sat, 19 May 2012 11:31:51 -0500, Stan Hoeppner wrote:
 
 On 5/19/2012 5:33 AM, Ramon Hofer wrote:

 Yes, I'm really thankful for the recommendation. And somehow I hoped
 you could jump in and help me :-)

 I'm actively working on it, have been for a couple of hours on and off.
  I'm reading your responses as I go before responding so I hopefully
 don't recommend something you've already tried.  I'm still researching.
  In the mean time, if you can, go ahead and flash the 9240 with the
 latest firmware, precisely following the instructions.
 
 Should I first flash the new firmware and then test what you describe 
 below?

Flash the firmware, then try to boot the system from the drives
attached to the mobo SATA ports, as you have been.  If the system hangs
as it did before, this will tell us the firmware update didn't solve the
problem.  Given that the shipped FW was from 2010, I have high hopes the
new FW will fix this problem.  I'm surprised your card shipped with a FW
that old.  From what company did you purchase the 9240-4i?  I'm
wondering if it may have been sitting on a shelf for a while.

 I am not very sure if I do the flashing right. Here's what I do:
 
 1. Read the firmware readme file [1]
 
 Installation:
 =
 Use MegaCLI to flash the SAS controllers.  MegaCLI can be downloaded
 from the support and download section of www.lsi.com.

 Command syntax:  MegaCli -adpfwflash -f imr_fw.rom -a0
 
 So I download the MegaCLI from [2] and read the MegaCLI readme [3]:
 
 Installation Commands: 
 ===
 1.   Copy MegaCli.exe to a folder.
 2.   Run MegaCli from the Command Prompt.  Use -h option to see help 
 menu.
 
 I create a FreeDOS USB stick with UNetbootin, copy MegaCli.exe and the 
 imr_fw.rom [4] into a folder on the USB stick, boot it and run the above 
 command to flash the card?

Yep.

 
 [1] http://www.lsi.com/downloads/Public/MegaRAID%20Common%20Files/20.10.1-0077_SAS_2008_FW_Image_APP-2.120.244-1482.txt
 
 [2] http://www.lsi.com/downloads/Public/MegaRAID%20Common%20Files/8.00.40_Dos_Megacli.zip
 
 [3] http://www.lsi.com/downloads/Public/MegaRAID%20Common%20Files/README_FOR_8.00.40_Dos_Megacli.zip.txt
 
 [4] http://www.lsi.com/downloads/Public/MegaRAID%20Common%20Files/20.10.1-0077_SAS_2008_FW_Image_APP-2.120.244-1482.zip
 
 
 (...)
 
 But I didn't know if it's ok to ask you by name.

 I've been doing a reply-to-all with each reply, hoping you'd follow
 suit.  This list is very busy thus a reply-all ensures I won't miss your
 posts.
 
 I'm using Pan to read the newsgroup, and it has no reply-to-all button. But 
 there's a mail-to field which I'm now testing :-)

Ahh, ok.  I didn't realize some people read mailing lists via newsgroups.

When I reply-to-all, where does the copy end up that is sent to
ramonho...@bluewin.ch?  Surely you read your email in an MUA such as
Thunderbird or similar.  You can reply-to-all from there.

 Please feel free to address me by name and/or contact me directly off
 list.  I recommended this storage controller/expander solution to you
 and it's not working yet.  I'm not going to leave you twisting in the
 wind.  That's not how I roll. ;)  Besides, look at my RHS domain.  I
 have a reputation to uphold. :)
 
 That's very kind!
 I know that everyone here does this for free, so I don't want to ask 
 anyone to spend his/her time on me. But of course I'm really thankful 
 for every single minute you spend helping me :-)

I'll be with you until it's fixed, working, or until we identify the
root cause.  It's always possible that the HBA is defective in some way.
If neither the firmware update nor other measures fix the problem, you
may need to send the card back for replacement.

BTW, after you flash the FW, power off the machine and remove the Intel
expander from its PCIe slot.  Disconnect the SFF-8087 cable from the 9240.
Then power up and see if the system boots from the mobo-connected
drives.  This will isolate the 9240 from the downstream SAS expander and
drives.

-- 
Stan


-- 
Archive: http://lists.debian.org/4fb9a809.2050...@hardwarefreak.com



Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-05-20 Thread Henrique de Moraes Holschuh
On Sun, 20 May 2012, Ramon Hofer wrote:
 On Sat, 19 May 2012 13:06:40 -0300, Henrique de Moraes Holschuh wrote:
  On Sat, 19 May 2012, Ramon Hofer wrote:
  On Sat, 19 May 2012 04:19:33 -0500, Stan Hoeppner wrote:
   On 5/19/2012 2:52 AM, Ramon Hofer wrote:
   On Fri, 18 May 2012 17:57:56 -0500, Stan Hoeppner wrote:
   On 5/18/2012 9:39 AM, Shane Johnson wrote:
   After that I would look to see if
   something isn't shorting out a USB port.
  
   Yes, USB is the cause of the over-current errors, which is plainly
   evident in his screen shot.  But we don't yet know if this USB
   problem is what's hanging the system.  Further troubleshooting is
   required.
   
   The strange thing is as I mentioned in another post is that on the
   mb usb port 8 there's nothing attached and I haven't found where
   port 7 is :-?
   
   I wouldn't worry about the USB errors at this point.  Unless there is
   some larger issue with insufficient power on the motherboard causing
   the USB current error, it's likely unrelated to the storage hardware
   issue.  Fix it first, then worry about the USB errors.  Given you have no
   device plugged into those ports, it could be a phantom error.
  
  Yes I hope you're right with the phantom error :-) Especially because I
  can't find port 7. No label on the mb PCB nor in its documentation.
  
  It might well mean one of the power planes is oversubscribed, and THAT
  can cause anything up to and including damage to hard disks, data
  corruption, and crashes.
 
 Thanks for the suggestion, Henrique!
 The PSU is a 750 W so I think it should be enough for now.

Yes, it is probably enough.  You have to do a lot to overpower a *good* 750W
PSU (a crappy one, OTOH...).

You should still do all testing with the minimal hardware setup.  From
experience, you also need to be able to test using no keyboard or a
different keyboard (and mouse)... USB is supposed to be safe from this crap
as it can detect overcurrent, but since it IS detecting overcurrent in your
case (be it a faulty alarm or not)...

-- 
  One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie. -- The Silicon Valley Tarot
  Henrique Holschuh


-- 
Archive: http://lists.debian.org/20120521023558.gb3...@khazad-dum.debian.net



Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-05-20 Thread Stan Hoeppner
On 5/20/2012 1:13 PM, Ramon Hofer wrote:

 There were no problems upgrading the fw :-)
 
 Unfortunately it didn't solve the problem.

Grrr.

 3.  Go into the mobo BIOS and set and test these options:

 Quiet Boot:                     DISABLED
 Interrupt 19 Capture:           DISABLED
 --save/reboot/test--
 PCI Express Port:               ENABLED
 PEG Force Gen1:                 ENABLED
 Detect Non-Compliance Device:   ENABLED
 --save/reboot/test--
 XHCI Hand-off:                  ENABLED
 Active State Power Management:  ENABLED
 PCIe (PCI Express) Max Read Request Size:  4096
 --save/reboot/test--
 
 None of this worked.

Grrr.

 
 If none of this works, disable both on board SATA controllers:

 Serial-ATA Controller 0: DISABLED
 Serial-ATA Controller 1: DISABLED

 and connect all drives to the 9240, and re-enable Interrupt 19 Capture:
 ENABLED

 This will allow booting from the 9240.  In the 9240 webBIOS, create a
 RAID1 array device of two disks, make it bootable, save and initialize
 the array.  Reboot into the Squeeze install disk and install onto the
 RAID1 device.  The initialization should continue transparently in the
 background while you're installing Debian.  When finished reboot to see
 if the boot hang persists.
 
 I was able to set up a RAID1 in the WebBIOS and set the bootable option, 
 but I'm not sure if the setting was accepted. When I try to set the 
 bootable option again, the WebBIOS tells me it is already set, so it 
 should be ok?
 
 Unfortunately the Debian installer doesn't list the RAID1 storage 
 device :-?

G.

Does the mobo BIOS show the disk device?  If not, does the 9240 BIOS
show the disk device, RAID level, and its size?

At this point, what we need to figure out is whether this is a BIOS
problem or a Debian installer kernel driver problem.
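One way to split BIOS problems from driver problems is to look from the installer's own shell (Alt+F2 in debian-installer) at whether the megaraid_sas module got loaded and which whole disks the kernel sees. A rough sketch; the helper names are hypothetical, and the paths default to the live /proc files but can point at saved copies:

```shell
#!/bin/sh
# Sketch: is the megaraid_sas driver loaded, and which whole disks exist?
# driver_loaded and list_disks are hypothetical helper names.
driver_loaded() (
    modules=${1:-/proc/modules}
    # /proc/modules lists one module per line, name first.
    grep -q '^megaraid_sas ' "$modules"
)

list_disks() (
    partitions=${1:-/proc/partitions}
    # /proc/partitions columns: major minor #blocks name.
    # Skip the header, print whole sd disks (sda, sdb, ...) but not sda1 etc.
    awk 'NR > 2 && $4 ~ /^sd[a-z]+$/ { print $4, $3 }' "$partitions"
)

# Typical use from the installer shell:
#   driver_loaded && echo "megaraid_sas loaded" || echo "megaraid_sas NOT loaded"
#   list_disks
```

If megaraid_sas is loaded but no extra sdX disk shows up for the RAID1 virtual disk, the driver side looks more suspicious; if the 9240 BIOS shows the array but the module isn't loaded at all, the installer kernel may simply not know the card.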

 Hopefully you won't need to do all of these things as it will be very
 time consuming.  I'm attempting to provide you a thorough
 troubleshooting guide that covers most/all the possible/likely causes of
 the hang.
 
 Thank you very much for your help so far :-)

Sorry it hasn't helped you make forward progress.

Did you already flash the C7P67 BIOS to the latest version?  I can't recall.

-- 
Stan


-- 
Archive: http://lists.debian.org/4fb9aa5f.8010...@hardwarefreak.com



Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-05-20 Thread Stan Hoeppner
On 5/20/2012 1:13 PM, Ramon Hofer wrote:

 I was able to set up a RAID1 in the WebBIOS and set the bootable option, 
 but I'm not sure if the setting was accepted. When I try to set the 
 bootable option again, the WebBIOS tells me it is already set, so it 
 should be ok?
 
 Unfortunately the Debian installer doesn't list the RAID1 storage 
 device :-?

Are you using the very latest Squeeze installer ISO?

It's possible the driver in 2.6.32-5 used in the original Squeeze
installer doesn't work with the 9240.  Support for the 9240 was added in
2.6.32-29:

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=604083
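To check whether an installed Squeeze kernel is new enough, keep in mind that `uname -r` prints the ABI name (2.6.32-5-amd64), not the Debian package version that the 2.6.32-29 cutoff refers to. A sketch that compares package versions; version_ge is a hypothetical helper, and the `sort -V` fallback only approximates dpkg's ordering (it agrees for simple versions like these):

```shell
#!/bin/sh
# Sketch: is Debian package version $1 >= $2?
version_ge() (
    if command -v dpkg >/dev/null 2>&1; then
        # dpkg's own ordering is authoritative on a Debian system.
        dpkg --compare-versions "$1" ge "$2"
    else
        # Fallback: GNU version sort; $2 sorts first iff $1 >= $2.
        [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
    fi
)

# Typical use on the installed system:
#   v=$(dpkg-query -W -f='${Version}' linux-image-2.6.32-5-amd64)
#   version_ge "$v" 2.6.32-29 && echo "new enough" || echo "too old"
```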

Something else to try:

If the disks that were attached to the mobo SATA ports are still intact
with Squeeze installed, boot the system with those attached to the mobo
SATA but with the 9240 and expander removed from the system.

Once booted, upgrade the kernel:

$ aptitude -t squeeze-backports install linux-image-3.2.0-0.bpo.2-amd64

Shutdown, install the 9240 only, power up and see if it boots without
hanging.  If it does, power down, plug in the expander, cables, drives,
etc, power up and see if Debian sees the RAID1 virtual disk, and the
JBOD drives, if any are present.
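The `aptitude -t squeeze-backports` command above only works once squeeze-backports is listed in APT's sources. A small sketch that appends the source line only if it isn't there yet; the 2012-era backports.debian.org URL and the helper name add_backports are assumptions, so adjust them to your mirror:

```shell
#!/bin/sh
# Sketch: add squeeze-backports to an APT sources file, idempotently.
# The mirror URL below is an assumption (the 2012-era backports archive).
BPO_LINE='deb http://backports.debian.org/debian-backports squeeze-backports main'

add_backports() (
    list=${1:-/etc/apt/sources.list}
    # Append only if an identical whole line is not already present.
    grep -qxF "$BPO_LINE" "$list" 2>/dev/null || printf '%s\n' "$BPO_LINE" >> "$list"
)

# Typical use (as root):
#   add_backports && aptitude update
#   aptitude -t squeeze-backports install linux-image-3.2.0-0.bpo.2-amd64
```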

-- 
Stan


-- 
Archive: http://lists.debian.org/4fb9afa9.3090...@hardwarefreak.com



Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-05-19 Thread Ramon Hofer
On Fri, 18 May 2012 17:57:56 -0500, Stan Hoeppner wrote:

 On 5/18/2012 9:39 AM, Shane Johnson wrote:

 After that I would look to see if
 something isn't shorting out a USB port.
 
 Yes, USB is the cause of the over-current errors, which is plainly
 evident in his screen shot.  But we don't yet know if this USB problem
 is what's hanging the system.  Further troubleshooting is required.

The strange thing, as I mentioned in another post, is that nothing is 
attached to USB port 8 on the mainboard, and I haven't found where port 7 is :-?


-- 
Archive: http://lists.debian.org/jp7jgc$cje$1...@dough.gmane.org



Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-05-19 Thread Camaleón
On Fri, 18 May 2012 21:55:36 +, Ramon Hofer wrote:

 On Fri, 18 May 2012 20:18:46 +, Camaleón wrote:
 
 Are you running Squeeze?
 
 Yes, sorry forgot to mention.
 
 I installed squeeze amd64 yesterday on a raid1 (just to try). Today when
 the card was here I put it in and couldn't boot. Then I installed
 squeeze with the card present without problems but booting afterwards
 didn't work again.
 Without the card installed bpo amd64 kernel but couldn't boot again.

You have to be extremely precise when describing the situation, because 
there are missing pieces in the above stanza and in the steps you 
followed :-)

Okay, let's start over. 

You installed the LSI card in one of the motherboard slots, configured 
the BIOS to use a JBOD disk layout and then booted the installation CD 
for Squeeze, right? 

The installation process went smoothly (you selected an mdadm 
configuration for the disks and then formatted them with no problems); 
when the installer finished and the system first rebooted, you selected 
the newly installed system from GRUB2's menu and then the boot process 
halted, displaying the mentioned messages on the screen, right?

 And you installed the system with no glitches and then it hangs?
 
 Without the LSI card there are no problems (except the over-current
 message which is also present with only the mb and a disk). Installation
 works ok with and without card.

So you think the system stalls because of the RAID card, even though you 
get the same output messages at boot and there's no additional evidence 
of a problem related to the hard disks or the controller.

Mmm... weird it is, my young padawan :-) that's for sure, but it could be 
something between your Supermicro motherboard's BIOS and the RAID 
controller. Check if there's a BIOS update for your motherboard (but just 
check, don't install!) and if so, ask Supermicro technical support about 
the exact problems it corrects, and tell them you are using an LSI RAID 
card and are having problems booting your system with it.
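Before asking Supermicro, it helps to know exactly which BIOS is on the board. A sketch that reads it from a running Linux system via the kernel's DMI sysfs files (root can get the same field from `dmidecode -s bios-version`); bios_info is a made-up helper name, and the directory argument exists only so the function can be tried on a copy:

```shell
#!/bin/sh
# Sketch: print board and BIOS identification from the kernel's DMI data.
bios_info() (
    dmi=${1:-/sys/class/dmi/id}
    for f in board_vendor board_name bios_version bios_date; do
        # A missing or unreadable field is printed as "?".
        printf '%s: %s\n' "$f" "$(cat "$dmi/$f" 2>/dev/null || echo '?')"
    done
)

# Typical use:
#   bios_info
```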

 What's the point for listing the USB devices? :-?
 
 Because I thought I should mention the over-current message and it's
 related to usb.
 But I think it's a completely different thing. And I don't even know
 where port 7 is but port 8 is definitely empty :-?

Yes, I agree. It seems an unrelated problem that you can try to solve 
once you correct the booting issue if the error still persists.

 Something wrong with udevd when listing an usb?? device or hub.
 
 Ok, unfortunately I have no clue what this means. But this message isn't
 there without card but it's pci-e?

Ah, that's a very interesting discovery, man. To me it could mean the 
motherboard is not correctly detecting the card, hence a BIOS issue.
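To test the idea that the motherboard isn't detecting the card, check whether the card shows up on the PCI bus at all. A sketch filtering `lspci -nn` output for LSI's PCI vendor ID 1000; that the SAS2008-based 9240-4i appears as 1000:0073 is an assumption to verify against your own output, and lsi_devices is a hypothetical helper:

```shell
#!/bin/sh
# Sketch: from `lspci -nn` output on stdin, print only LSI (vendor 1000) devices.
lsi_devices() {
    grep -E '\[1000:[0-9a-f]{4}\]'
}

# Typical use:
#   lspci -nn | lsi_devices || echo "no LSI device enumerated"
```

If the card is listed, the BIOS did enumerate it and the trouble more likely starts later, during driver initialisation.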

 Those messages are coming from the kernel side but I can't guess the
 source that trigger them.
 
 How can I find out what they mean? It seems as if many different
 problems lead to such messages?

I would focus first on solving the core of the problem.

 Mmm... the strange here is that there is no clear indication about the
 nature of the problem, that is, what's preventing your system from
 booting. Can you at least get into the single-user mode?
 
 I can't get to any login. Or is there a way to get into single-user
 mode? If you mean recovery mode: no luck either :-(

Are you reaching the GRUB2 menu? If yes, you can select recovery mode/
single-user mode.

Greetings,

-- 
Camaleón


-- 
Archive: http://lists.debian.org/jp7o10$27j$8...@dough.gmane.org



Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-05-19 Thread Stan Hoeppner
On 5/19/2012 2:52 AM, Ramon Hofer wrote:
 On Fri, 18 May 2012 17:57:56 -0500, Stan Hoeppner wrote:
 
 On 5/18/2012 9:39 AM, Shane Johnson wrote:

 After that I would look to see if
 something isn't shorting out a USB port.

 Yes, USB is the cause of the over-current errors, which is plainly
 evident in his screen shot.  But we don't yet know if this USB problem
 is what's hanging the system.  Further troubleshooting is required.
 
 The strange thing is as I mentioned in another post is that on the mb usb 
 port 8 there's nothing attached and I haven't found where port 7 is :-?

I wouldn't worry about the USB errors at this point.  Unless there is
some larger issue with insufficient power on the motherboard causing the
USB current error, it's likely unrelated to the storage hardware issue.
Fix the storage issue first, then worry about the USB errors.  Given you have no
device plugged into those ports, it could be a phantom error.

-- 
Stan



-- 
Archive: http://lists.debian.org/4fb765a5.1020...@hardwarefreak.com



Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-05-19 Thread Ramon Hofer
On Fri, 18 May 2012 18:28:05 -0500, Stan Hoeppner wrote:

 On 5/18/2012 4:55 PM, Ramon Hofer wrote:
 
 I installed squeeze amd64 yesterday on a raid1 (just to try).
 
 You need to explain this in detail:  installed on raid1
 
 Installed onto what raid1?  Does this mean you created an mdadm raid1
 pair during the Squeeze installation process, and installed to that?  To
 what SAS/SATA controller are these two disks attached?  Please provide
 as much detail as possible about this controller chip and if it is on
 the motherboard.  If so, please provide the motherboard brand/model.

Sorry, I'll try to give you some more details. But to be honest I'm just 
an interested consumer ;-)
What I mean is that I probably just don't know how to get the 
information. For example, I can't get to the syslog when the system 
doesn't boot. But I hope with your help I can learn how to get to the 
information :-)

I installed Squeeze AMD64 Netinstall to a RAID1 with the disks directly 
attached to the mainboard. During installation I partitioned the disks, 
set the partitions to be used for RAID, created md RAIDs, and then chose 
the md RAIDs to be mounted as /boot, swap, /, /var, /usr, /tmp and 
/home.
This was just done out of curiosity.
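On such an mdadm RAID1 install, a quick health check is /proc/mdstat, where a "_" in the status brackets (for example [U_]) marks a missing member. A sketch that prints degraded arrays; degraded_arrays is a hypothetical helper and assumes the usual mdstat layout, with the array line followed by a status line:

```shell
#!/bin/sh
# Sketch: list md arrays whose member status (e.g. [UU]) contains "_".
degraded_arrays() (
    mdstat=${1:-/proc/mdstat}
    awk '
        /^md[0-9]+ :/ { dev = $1 }      # remember the current array name
        dev != "" && /\[[U_]+\]/ {      # its status line ends in [UU], [U_], ...
            if ($NF ~ /_/) print dev
            dev = ""
        }
    ' "$mdstat"
)

# Typical use:
#   degraded_arrays
```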

Now the same system partitions are directly on one of the disks, which 
is still attached directly to the mainboard.

The mainboard is a Supermicro C7P67 with an onboard Marvell 88SE91xx 
adapter.


 Then I installed squeeze
 with the card present without problems but booting afterwards didn't
 work again.
 
 Detail, detail detail!  To what did you install Squeeze?  Which disks,
 attached to which controller?  We *NEED* these details to assist you.

The system was installed to a disk directly attached to the mainboard. I 
thought it might be a good idea anyway to use the SATA ports on the 
mainboard for the OS disk.


 Without the card installed bpo amd64 kernel but couldn't boot again.
 
 If you installed to disks attached to the expander/9240 and then yanked
 the card, of course it wouldn't boot.  Again, this is why we need
 *details*.  ALWAYS supply the details!

No, sorry for the misunderstanding.

Even if I only have the OS disks (attached to the mainboard), the LSI 
card and the expander (both mounted in PCIe x16 slots on the mainboard), 
the system hangs after the first three messages (megasas: INIT adapter 
done and the two over-current messages).

When I remove only the LSI card, I see the over-current messages and 
the system boots just fine.

Likewise, when I also remove the expander, I see the over-current 
messages and the system boots fine.


 Without the LSI card there are no problems (except the over-current
 message which is also present with only the mb and a disk).
 Installation works ok with and without card.
 
 Ok, so the USB over-current error has nothing to do with the hang during
 boot.

Yes, this is what I think as well, but I didn't want to keep quiet about 
it.

 Nevertheless I think the module for the card should be loaded but
 then it somehow hangs.
 
 Only full dmesg output will tell us this.

Yes. Unfortunately I don't know how to get the output when I can't log in.

Oh ok, now I have removed the card again and found some interesting logs.

/var/log/syslog:
http://pastebin.com/raw.php?i=00rN1X8s

/var/log/installer/syslog:
http://pastebin.com/raw.php?i=sDmjbeey

/var/log/installer/hardware-summary:
http://pastebin.com/raw.php?i=V8fX4F0W
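The same idea works for collecting just the relevant lines: if the hang happens after rsyslog has started, the failed boot's kernel messages may still be in /var/log/syslog once the machine boots in the working configuration. A sketch; the patterns are guesses based on the messages quoted in this thread, and relevant_lines is a made-up name:

```shell
#!/bin/sh
# Sketch: pull megasas/megaraid, hung-task and USB over-current lines from a log.
relevant_lines() (
    log=${1:-/var/log/syslog}
    grep -E 'megasas|megaraid|blocked for more than|over-current' "$log"
)

# Typical use:
#   relevant_lines > hang.txt          # then paste hang.txt to pastebin
#   relevant_lines /var/log/syslog.1   # rotated log from an earlier day
```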


 Ok. But I have no clue either how to find this out. Maybe you could
 point into the right direction :-)
 
 Again, do not flash the HBA firmware at this point.  Provide the details
 I requested and we'll move forward from there.  It may very well be that
 the RAID firmware is causing the boot problem and you need the straight
 JBOD firmware, but lets get all the other details first so we can
 determine that instead of making wild guesses.
 
 BTW, did you disable all boot related options in the 9240 BIOS and
 force it to JBOD mode?  Did you read the instructions in their entirety
 before mounting the HBA into the machine?  This isn't a $20 SATA card
 you simply slap in and go.  It's an SAS RAID controller.  More
 care/learning is required.

To be honest, I have never worked with anything other than the usual 
consumer products.
So I don't understand most of the terms. But I will work harder and read 
up on how to disable these options.
What I saw is that it sets the disks connected to the expander to JBOD 
mode.
And I disabled the card's BIOS completely, but with no luck.


I hope this helps a bit but please be gentle with a hobbyist :-)


Best regards
Ramon


-- 
Archive: http://lists.debian.org/jp7pi1$iik$1...@dough.gmane.org



Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-05-19 Thread Ramon Hofer
On Fri, 18 May 2012 17:47:54 -0500, Stan Hoeppner wrote:

 On 5/18/2012 9:23 AM, Ramon Hofer wrote:
 Hi all
 
 I finally got my LSI 9240-4i and the Intel SAS expander.
 
 Unfortunately it prevents the system from booting. I only got this
 message on the screen:
 
 megasas: INIT adapter done
 hub 4-1:1.0 over-current condition on port 7
 hub 4-1:1.0 over-current condition on port 8
 
 These over-current errors are reported by USB, not megasas.  Unplug all
 of your USB devices until you get everything else running.

Even when I unplug the chassis USB connector and only have the onboard 
USB ports of the mainboard, with nothing connected to them, the message 
remains.

This is device 1 on bus 4, right? So it should be ID 1d6b:0002 Linux 
Foundation 2.0 root hub?

 I also got the over-current messages when the LSI card is removed.
 Here's the output of lsusb:
 
 Bus 004 Device 003: ID 046d:c517 Logitech, Inc. LX710 Cordless Desktop Laser
 Bus 004 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
 Bus 004 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 001 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 003 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
 Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 
 Again, this is because the over-current issue has nothing to with the
 HBA, but the USB subsystem.
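For the record, the name in the over-current message can be decoded. In Linux USB naming, "4-1" is the device on bus 4 plugged into port 1 of that bus's root hub, and ":1.0" is configuration 1, interface 0, so ports 7 and 8 belong to hub 4-1 itself. On this board that is likely the Intel Integrated Rate Matching Hub (8087:0024) rather than the root hub. A sketch that extracts device and port from such lines; decode_overcurrent is a hypothetical helper:

```shell
#!/bin/sh
# Sketch: from kernel log lines on stdin, print "device port" for each
# "hub X-Y:C.I[:] over-current condition on port N" message.
decode_overcurrent() {
    sed -nE 's/.*hub ([0-9]+-[0-9.]+):[0-9]+\.[0-9]+:? over-current condition on port ([0-9]+).*/\1 \2/p'
}

# To see which hub a device name refers to, e.g. 4-1:
#   cat /sys/bus/usb/devices/4-1/idVendor /sys/bus/usb/devices/4-1/idProduct
```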

Yes this might have nothing to do with the problem. But I still wanted to 
mention it because I didn't know if it's related or not. Or if I should 
worry about it.

Mainboards sometimes say strange things :-)
On my HTPC I always get a CPU fan error message, probably because I have 
a big passive cooler and use the chassis fans to cool it.
And that hasn't caused any problems so far either.


 Nevertheless I think the module for the card should be loaded but then
 it somehow hangs.
 
 You're assuming it's the HBA/module hanging the system.  I see no
 evidence of that so far.

I came to that conclusion because when the card is installed, the system 
stops during booting.
When the card is removed, the system boots.
There's this over-current problem that could be causing something.
And maybe the PCIe slots have something to do with it. But I have 
plugged the LSI card into both PCIe x16 slots on the mainboard, and both 
times the system didn't boot.
And the expander only uses the slot to draw its power.

And I tried switching the LSI BIOS off.

These are the things I tried to isolate the problem, but unfortunately I 
don't have any other ideas.
I will now thoroughly study the LSI documentation...


 And after a while there are more messages which I don't understand. I
 have taken a picture:
 http://666kb.com/i/c3wf606sc1qkcvgoc.jpg
 
 It shows that udev is having serious trouble handling one of the USB
 devices.

Yes, but only when the LSI card is attached. When it's removed, the 
messages don't appear. And I don't even have anything connected to the USB 
ports. Really confusing...
I thought I had the same messages with the Supermicro AOC-SASLP-MV8 
cards :-?
But when I switched to the bpo amd64 kernel it _seemed_ ok.

This is why I hoped it would be the same with the megaraid module.

Btw, just left of the Ext. LED connector there's the CR1 LED, which blinks 
constantly (from the moment the system is powered on) with a 1 s on / 1 s 
off period. I couldn't find the meaning of this LED in the LSI documents. 
But to be honest I haven't read through the 500-page manual yet, which I 
will do now :-)


 Then there are lots of messages like this:
 
 INFO: task modprobe:123 blocked for more than 120 seconds. echo 0...
 disables this message
 
 Instead of modprobe:123 also modprobe:124, 125, 126, 127, 135, 137 and
 kworker/u:1:164, 165 are listed.
 
 Posting log snippets like this is totally useless.  Please post your
 entire dmesg output to pastebin and provide the link.
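Alongside the full dmesg, a short summary of which tasks the hung-task detector flagged can make the pastebin easier to read. A sketch that counts the "INFO: task NAME:PID blocked for more than" lines by task name; blocked_tasks is a made-up helper, and many blocked modprobe processes only hint that module loading is stuck somewhere, without proving which module:

```shell
#!/bin/sh
# Sketch: stdin is dmesg or syslog text; stdout is "count name", most frequent first.
blocked_tasks() {
    # Greedy (.+) keeps names containing ":" intact, e.g. kworker/u:1.
    sed -nE 's/.*INFO: task (.+):[0-9]+ blocked for more than.*/\1/p' |
        sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}

# Typical use:
#   blocked_tasks < /var/log/syslog
```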

It didn't occur to me yesterday that I could use the files under 
/var/log. I only missed being able to type dmesg in a terminal when the 
error occurs.

But I have posted some logs in my previous post. I hope these help more.

 I can enter the BIOS of the card just fine. It detect the disks and by
 defaults sets jbod option for them. This is fine because I want to use
 linux RAID.
 
 Sure, because the card and expander are working properly.

Yes, now I only have to convince the OS to accept this :-)


 Could this be the same problem:
 http://www.spinics.net/lists/raid/msg30359.html
 Should I try a firmware upgrade?
 
 Your hang problem seems unrelated to the HBA.  Exhaust all other
 possibilities before attempting a firmware upgrade.  If there is some
 other system level problem, it could botch the FW upgrade and brick the
 card, leaving you in a far worse situation than you are now.
 
 Post your FW version here.  It's likely pretty recent already.

The FW version is 2.70.04-0862.

I have a little confusion with the versioning 
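Part of the confusion may be that two numbering schemes are involved: LSI's file name 20.10.1-0077_SAS_2008_FW_Image_APP-2.120.244-1482 pairs a package build (20.10.1-0077) with a firmware version (2.120.244-1482). Compared as plain strings, 2.70.04-0862 looks newer than 2.120.244-1482; compared field by field as numbers (70 < 120), it is older. A sketch of the difference using GNU `sort -V`; reading LSI's versions this way is an assumption, and both helper names are hypothetical:

```shell
#!/bin/sh
# Sketch: field-by-field numeric ordering versus plain string ordering.
newest_version() {
    # GNU version sort compares digit runs numerically.
    printf '%s\n' "$@" | sort -V | tail -n 1
}

newest_string() {
    # Plain byte-wise comparison, shown only for contrast.
    printf '%s\n' "$@" | LC_ALL=C sort | tail -n 1
}

# Typical use:
#   newest_version 2.70.04-0862 2.120.244-1482
#   newest_string  2.70.04-0862 2.120.244-1482
```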

Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-05-19 Thread Ramon Hofer
On Sat, 19 May 2012 04:19:33 -0500, Stan Hoeppner wrote:

 On 5/19/2012 2:52 AM, Ramon Hofer wrote:
 On Fri, 18 May 2012 17:57:56 -0500, Stan Hoeppner wrote:
 
 On 5/18/2012 9:39 AM, Shane Johnson wrote:

 After that I would look to see if
 something isn't shorting out a USB port.

 Yes, USB is the cause of the over-current errors, which is plainly
 evident in his screen shot.  But we don't yet know if this USB problem
 is what's hanging the system.  Further troubleshooting is required.
 
 The strange thing is as I mentioned in another post is that on the mb
 usb port 8 there's nothing attached and I haven't found where port 7 is
 :-?
 
 I wouldn't worry about the USB errors at this point.  Unless there is
 some larger issue with insufficient power on the motherboard causing the
 USB current error, it's likely unrelated to the storage hardware issue.
 Fix it first, then worry about the USB errors.  Given you have no
 device plugged into those ports, it could be a phantom error.

Yes I hope you're right with the phantom error :-)
Especially because I can't find port 7. There's no label on the mb PCB 
nor in its documentation.


Best regards
Ramon


-- 
Archive: http://lists.debian.org/jp7t1k$iik$3...@dough.gmane.org



Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-05-19 Thread Ramon Hofer
On Sat, 19 May 2012 09:09:52 +, Camaleón wrote:

 On Fri, 18 May 2012 21:55:36 +, Ramon Hofer wrote:
 
 On Fri, 18 May 2012 20:18:46 +, Camaleón wrote:
  
 Are you running Squeeze?
 
 Yes, sorry forgot to mention.
 
 I installed squeeze amd64 yesterday on a raid1 (just to try). Today
 when the card was here I put it in and couldn't boot. Then I installed
 squeeze with the card present without problems but booting afterwards
 didn't work again.
 Without the card installed bpo amd64 kernel but couldn't boot again.
 
 You have to be extremely precise while describing the situation because
 there are missing pieces in the above stanza and the whole steps you
 followed :-)

Ok, sorry for that! I'll try to improve :-)


 Okay, let's start over.
 
 You installed the lsi card in one of the motherboard slots, configured
 the BIOS to use a JBOD disk layout and then booted the installation CD for
 Squeeze, right?

Yes, but I didn't set the LSI BIOS to use the disks as JBOD; it did that 
automatically.

In the card's BIOS I saw that virtual drives can be set up. But since I 
want to use the disks as JBOD, I don't think I have to set up virtual drives.
The Controller Property pages are very hard to understand,
so I kept the factory defaults.


 The installation process went smoothly (you selected an mdadm
 configuration for the disks and then formatted them with no problems);
 when the installer finished and the system first rebooted, you selected
 the newly installed system from GRUB2's menu and then the boot
 process halted, displaying the mentioned messages on the screen, right?

Exactly.
I only saw the three messages (megasas: INIT adapter done and the two
over-current lines) for a while. Then the screen filled with the timeout
and udev messages.


 And you installed the system with no glitches and then it hangs?
 
 Without the LSI card there are no problems (except the over-current
 message which is also present with only the mb and a disk).
 Installation works ok with and without card.
 
 So you think the system stalls because of the raid card despite you get
 the same output messages at boot and there's no additional evidence of a
 problem related to the hard disks or the controller.

I always get the two over-current lines.
The timeout and udev errors don't appear when the card is removed.


 Mmm... weird it is, my young padawan :-) that's for sure but it can be
 something coming from your Supermicro motherboard's BIOS and the raid
 controller. Check if there's a BIOS update for your motherboard (but
 just check, don't install!) and if so, ask Supermicro technical support
 about the exact problems it corrects and tell them you are using a LSI
 raid card and you're having problems to boot your system from it.

Thanks Master Camaleón :-D
The mb BIOS version is 2.10.1206, but I couldn't find the current 
version. They only list R 2.0,
and the readmes in the firmware zip don't tell me more.

I will email Supermicro to ask them.


 What's the point for listing the USB devices? :-?
 
 Because I thought I should mention the over-current message and it's
 related to usb.
 But I think it's a completely different thing. And I don't even know
 where port 7 is but port 8 is definitely empty :-?
 
 Yes, I agree. It seems an unrelated problem that you can try to solve
 once you correct the booting issue if the error still persists.

Will do that :-)


 Something wrong with udevd when listing an usb?? device or hub.
 
 Ok, unfortunately I have no clue what this means. But this message
 isn't there without card but it's pci-e?
 
 Ah, that's a very interesting discovery, man. To me it can mean the
 motherboard is not correctly detecting the card, hence a BIOS issue.

Ah yes, maybe it thinks it's a USB device?
I checked whether the mb BIOS tells me anything about the connected 
hardware, but I didn't find anything in the PCI settings.


(...)

 Mmm... the strange here is that there is no clear indication about the
 nature of the problem, that is, what's preventing your system from
 booting. Can you at least get into the single-user mode?
 
 I can't get to any login. Or is there a way to get into single-user
 mode? If you mean recovery mode: no luck either :-(
 
 Are you reaching the GRUB2 menu? If yes, you can select recovery mode/
 single-user mode.

Ah OK. Yes, I have tried recovery mode with both kernels, but 
without luck.
There are a lot more messages, and the last two of them are the 
over-current messages :-o


Best regards
Ramon


-- 
Archive: http://lists.debian.org/jp800h$sd8$1...@dough.gmane.org



Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-05-19 Thread Ramon Hofer
On Sat, 19 May 2012 10:33:06 +, Ramon Hofer wrote:

 On Fri, 18 May 2012 17:47:54 -0500, Stan Hoeppner wrote:
 
 Post your FW version here.  It's likely pretty recent already.
 
 The FW version is 2.70.04-0862.
 
 I have a little confusion with the versioning from LSI. On their
 homepage [1] they list the firmware name 4.6 - 10M09 P24 as the newest.
 The filename of this file is
 20.10.1-0077_SAS_2008_FW_Image_APP-2.120.244-1482.zip. The starting
 number 20.10.1-0777 is the newest version according to the readme. The
 filename ends with 2.120.244-1482 which seems more in the format the
 version listed in my cards BIOS.
  
 [1] http://www.lsi.com/downloads/Public/MegaRAID%20Common%20Files/20.10.1-0077_SAS_2008_FW_Image_APP-2.120.244-1482.zip

When I start the system, the card shows its version before it starts 
detecting the disks.

It's 4.14.00 and the date it shows is 29.1.2010.

So it seems a bit old?
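LSI's version strings are easier to compare mechanically than by eye. The Python sketch below assumes the download keeps the `<package>_SAS_2008_FW_Image_APP-<firmware>.zip` naming of the file linked above, and that LSI version components order numerically, one component at a time; `parse_package_filename` and `parse_version` are hypothetical helpers, not LSI tools. The boot-screen value 4.14.00 is left out because it is not clear which component it belongs to.

```python
import re

# Assumed naming pattern, taken from the single file linked in this thread.
FILENAME_RE = re.compile(
    r"^(?P<package>[\d.]+-\d+)_SAS_2008_FW_Image_APP-(?P<firmware>[\d.]+-\d+)\.zip$"
)


def parse_version(text):
    """Turn '2.70.04-0862' into (2, 70, 4, 862) so it compares numerically."""
    return tuple(int(part) for part in re.split(r"[.-]", text))


def parse_package_filename(filename):
    """Split an LSI download name into (package version, firmware version)."""
    match = FILENAME_RE.match(filename)
    if match is None:
        raise ValueError(f"unrecognised firmware filename: {filename}")
    return match.group("package"), match.group("firmware")


package, firmware = parse_package_filename(
    "20.10.1-0077_SAS_2008_FW_Image_APP-2.120.244-1482.zip"
)
installed = "2.70.04-0862"  # FW version the card reported earlier in the thread

print("package:", package)
print("firmware in package:", firmware)
print("installed is older:", parse_version(installed) < parse_version(firmware))
```

Compared as plain strings, `"2.70.04-0862" < "2.120.244-1482"` is False, because the character `7` sorts after `1`; splitting the versions into integers avoids that trap.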


Best regards
Ramon


-- 
Archive: http://lists.debian.org/jp8080$sd8$2...@dough.gmane.org



Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-05-19 Thread Camaleón
On Sat, 19 May 2012 11:26:09 +, Ramon Hofer wrote:

 On Sat, 19 May 2012 09:09:52 +, Camaleón wrote:

 You installed the lsi card in one of the motherboard slots, configured
 the BIOS to use a JBOD disk layout and then booted the installation CD
 for Squeeze, right?
 
 Yes, but I didn't set the LSI BIOS to use the cards as jbod it did it
 automatically.

I guess that's the default.

 In the cards BIOS I saw that virtual drives can be setup. But since I
 want to use them as jbod I don't think I have to set virtual drives. The
 Controller Property pages are very hard to understand. So I tried with
 the factory default.

It's important that you first read up on and get an overall understanding of 
the capabilities (and possibilities) of your card. A raid card is almost 
a small computer by itself (it has the logic and the wiring to act as 
such); it's a very complex piece of hardware.

Moreover, the raid card has to establish a perfect dialog with your 
motherboard and the rest of the system components (OS, hard disks...), 
and every single link in this chain can fail (a BIOS problem, a firmware 
glitch) or make the computer behave weirdly. 

 So you think the system stalls because of the raid card despite you get
 the same output messages at boot and there's no additional evidence of
 a problem related to the hard disks or the controller.
 
 I only get the two over current lines always. The timeout and udev
 errors don't appear when the card is removed.

Okay.

 Mmm... weird it is, my young padawan :-) that's for sure but it can be
 something coming from your Supermicro motherboard's BIOS and the raid
 controller. Check if there's a BIOS update for your motherboard (but
 just check, don't install!) and if so, ask Supermicro technical support
 about the exact problems it corrects and tell them you are using a LSI
 raid card and you're having problems to boot your system from it.
 
 Thanks Master Camaleón :-D
 The mb BIOS version is 2.10.1206. But I couldn't find the current
 version. They only write R 2.0.
 And the readmes in the firmware zip don't tell me more.
 
 I will email Supermicro to ask them.

Yes, do it ASAP. Look, Supermicro is somewhat special in this regard. 
They make top-quality motherboards which allow special configurations and 
setups, and thus they work very closely with the other hardware 
manufacturers (memory modules, HBA providers...). Should there be any 
specific problem with your raid card and any of their boards, they'll tell 
you the steps to follow.

 Something wrong with udevd when listing an usb?? device or hub.
 
 Ok, unfortunately I have no clue what this means. But this message
 isn't there without card but it's pci-e?
 
 Ah, that's a very interesting discovery, man. To me it can mean the
 motherboard is not correctly detecting the card, hence a BIOS issue.
 
 Ah yes, maybe it thinks it's a usb device? 

Yes, sort of. It could be that the motherboard is having problems 
addressing some of the resources provided by your raid card.

 I have tried to check if I can see something in the mb BIOS to see if
 it can tell me anything about the connected hardware. But I didn't find
 anything in the PCI settings.

Mmm, have you tried setting a RAID level instead of using JBOD? It's just for 
testing... although this can only be done at a very early stage, when the 
disks are completely empty with no data on them, because I'm afraid 
changing this will destroy whatever they contain.

 Are you reaching the GRUB2 menu? If yes, you can select recovery mode/
 single-user mode.
 
 Ah ok. Yes I have tried that with both kernels in recovery mode but
 without luck.
 There are alot more messages with the last two of them the over-current
 messages :-o

If you could upload an image with the screen you get, it would be 
great :-)

Greetings,

-- 
Camaleón


-- 
Archive: http://lists.debian.org/jp87ud$27j$1...@dough.gmane.org



Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-05-19 Thread Henrique de Moraes Holschuh
On Sat, 19 May 2012, Ramon Hofer wrote:
  And after a while there are more messages which I don't understand. I
  have taken a picture:
  http://666kb.com/i/c3wf606sc1qkcvgoc.jpg
  
  It shows that udev is having serious trouble handling one of the USB
  devices.
 
 Yes but only when the lsi card is attached. When it's removed the 

Get a better PSU, and if that doesn't work, either junk the motherboard, or
give up on adding any cards that require a bit more power.

-- 
  One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie. -- The Silicon Valley Tarot
  Henrique Holschuh


-- 
Archive: http://lists.debian.org/20120519160510.ga22...@khazad-dum.debian.net



Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-05-19 Thread Henrique de Moraes Holschuh
On Sat, 19 May 2012, Ramon Hofer wrote:
 On Sat, 19 May 2012 04:19:33 -0500, Stan Hoeppner wrote:
  On 5/19/2012 2:52 AM, Ramon Hofer wrote:
  On Fri, 18 May 2012 17:57:56 -0500, Stan Hoeppner wrote:
  
  On 5/18/2012 9:39 AM, Shane Johnson wrote:
 
  After that I would look to see if
  something isn't shorting out a USB port.
 
  Yes, USB is the cause of the over-current errors, which is plainly
  evident in his screen shot.  But we don't yet know if this USB problem
  is what's hanging the system.  Further troubleshooting is required.
  
  The strange thing is as I mentioned in another post is that on the mb
  usb port 8 there's nothing attached and I haven't found where port 7 is
  :-?
  
  I wouldn't worry about the USB errors at this point.  Unless there is
  some larger issue with insufficient power on the motherboard causing the
  USB current error, it's likely unrelated to the storage hardware issue.
   Fix it first, then worry about the USB errors.  Given you have no
  device plugged into those ports, it could be a phantom error.
 
 Yes I hope you're right with the phantom error :-)
 Especially because I can't find port 7. No label on the mb pcb nor in 
 it's documentation.

It might well mean one of the power planes is oversubscribed, and THAT
can cause anything up to and including damage to hard disks, data
corruption, and crashes.

-- 
  One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie. -- The Silicon Valley Tarot
  Henrique Holschuh


-- 
Archive: http://lists.debian.org/20120519160640.gb22...@khazad-dum.debian.net



Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-05-19 Thread Stan Hoeppner
On 5/19/2012 5:33 AM, Ramon Hofer wrote:

 Yes, I'm really thankful for the recommendation.
 And somehow I hoped you could jump in and help me :-)

I'm actively working on it, and have been for a couple of hours on and off.
I'm reading your responses as I go before responding, so I hopefully
don't recommend something you've already tried.  I'm still researching.
In the meantime, if you can, go ahead and flash the 9240 with the
latest firmware, precisely following the instructions.

Also try the following:

1.  Power the Intel expander with a PSU 4 pin Molex connector instead of
using a PCIe slot.  Molex are the large standard plugs, usually white,
used to connect hard drives for the past 25 years--two black wires, one
red, one yellow.  With the chassis lying on your desk and the side/top
cover panel removed, lay the anti-static bag the expander shipped in on
top of the drive cage frame or PSU, then lay the expander card on its
back on top of the bag--heat sink facing the ceiling.  Make sure it
doesn't fall off and ground out to the metal chassis or mobo, etc.  This
will eliminate a possible PCIe power bug in the mobo.

2.  With the expander powered directly from the PSU, try the 9240 in
each x16 slot until one works (I'm assuming you know that you must power
down the system before inserting/removing cards or you'll very likely
permanently damage the cards and/or mobo).  If no success here...

3.  Go into the mobo BIOS and set and test these options:

Quiet Boot: DISABLED
Interrupt 19 Capture:   DISABLED
--save/reboot/test--
PCI Express Port:   ENABLED
PEG Force Gen1: ENABLED
Detect Non-Compliance Device:   ENABLED
--save/reboot/test--
XHCI Hand-off:  ENABLED
Active State Power Management:  ENABLED
PCIe (PCI Express) Max Read Request Size:   4096
--save/reboot/test--
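The staged BIOS plan above can be written down as data, so that the full set of settings in force at each reboot is explicit. This Python sketch assumes settings from earlier rounds stay in place when the next round is applied; `ROUNDS`, `cumulative_configs` and `run_plan` are illustrative names, and `boots_ok` is only a placeholder for the manual save/reboot/test step.

```python
# Stan's BIOS rounds as data. Assumption: settings from earlier rounds are
# kept when the next round is applied.
ROUNDS = [
    {"Quiet Boot": "DISABLED",
     "Interrupt 19 Capture": "DISABLED"},
    {"PCI Express Port": "ENABLED",
     "PEG Force Gen1": "ENABLED",
     "Detect Non-Compliance Device": "ENABLED"},
    {"XHCI Hand-off": "ENABLED",
     "Active State Power Management": "ENABLED",
     "PCIe (PCI Express) Max Read Request Size": "4096"},
]


def cumulative_configs(rounds):
    """Yield (round number, full settings to have in place before rebooting)."""
    current = {}
    for number, changes in enumerate(rounds, start=1):
        current.update(changes)
        yield number, dict(current)  # copy, so later rounds don't alter it


def run_plan(rounds, boots_ok):
    """Return the first round after which the system boots, or None.

    boots_ok is a placeholder for the manual save/reboot/test step.
    """
    for number, config in cumulative_configs(rounds):
        if boots_ok(config):
            return number
    return None


for number, config in cumulative_configs(ROUNDS):
    print(f"--- round {number}: save/reboot/test ---")
    for name, value in config.items():
        print(f"{name}: {value}")
```

Printing the cumulative list makes it easy to note exactly which combination was in place when the hang did (or did not) occur.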

If none of this works, disable both on board SATA controllers:

Serial-ATA Controller 0:   DISABLED
Serial-ATA Controller 1:   DISABLED

and connect all drives to the 9240, and re-enable
Interrupt 19 Capture:   ENABLED

This will allow booting from the 9240.  In the 9240 webBIOS, create a
RAID1 array device of two disks, make it bootable, save and initialize
the array.  Reboot into the Squeeze install disk and install onto the
RAID1 device.  The initialization should continue transparently in the
background while you're installing Debian.  When finished reboot to see
if the boot hang persists.

Hopefully you won't need to do all of these things as it will be very
time consuming.  I'm attempting to provide you a thorough
troubleshooting guide that covers most/all the possible/likely causes of
the hang.

 But I didn't know if it's ok to ask you by name.

I've been doing a reply-to-all with each reply, hoping you'd follow
suit.  This list is very busy thus a reply-all ensures I won't miss your
posts.

Please feel free to address me by name and/or contact me directly off
list.  I recommended this storage controller/expander solution to you
and it's not working yet.  I'm not going to leave you twisting in the
wind.  That's not how I roll. ;)  Besides, look at my RHS domain.  I
have a reputation to uphold. :)

-- 
Stan



-- 
Archive: http://lists.debian.org/4fb7caf7.8060...@hardwarefreak.com



LSI MegaRAID SAS 9240-4i hangs system at boot

2012-05-18 Thread Ramon Hofer
Hi all

I finally got my LSI 9240-4i and the Intel SAS expander.

Unfortunately it prevents the system from booting. I only got this 
message on the screen:

megasas: INIT adapter done
hub 4-1:1.0 over-current condition on port 7
hub 4-1:1.0 over-current condition on port 8

I also got the over-current messages when the LSI card is removed. Here's 
the output of lsusb:

Bus 004 Device 003: ID 046d:c517 Logitech, Inc. LX710 Cordless Desktop 
Laser
Bus 004 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
Bus 004 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 003 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

Nevertheless, I think the module for the card gets loaded, but then it 
somehow hangs.

And after a while there are more messages which I don't understand. I 
have taken a picture:
http://666kb.com/i/c3wf606sc1qkcvgoc.jpg

Then there are lots of messages like this:

INFO: task modprobe:123 blocked for more than 120 seconds.
echo 0... disables this message

Instead of modprobe:123 also modprobe:124, 125, 126, 127, 135, 137 and 
kworker/u:1:164, 165 are listed.

I can enter the BIOS of the card just fine. It detects the disks and by 
default sets the JBOD option for them. This is fine because I want to use 
Linux RAID.

Could this be the same problem:
http://www.spinics.net/lists/raid/msg30359.html
Should I try a firmware upgrade?

This card was recommended to me by the list:
http://lists.debian.org/debian-user/2012/05/msg00104.html

I hope I can get some hints here :-)


Best regards
Ramon



-- 
Archive: http://lists.debian.org/jp5m1n$dee$1...@dough.gmane.org



Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-05-18 Thread Shane Johnson
Over current problems from what I have seen are hardware problems - I would
make sure the intel expander doesn't need an external power source and if it
does, that it is functioning properly.  After that I would look to see if
something isn't shorting out a USB port.

Shane


On Fri, May 18, 2012 at 8:23 AM, Ramon Hofer ramonho...@bluewin.ch wrote:

 (...)




-- 
Shane D. Johnson
IT Administrator
Rasmussen Equipment


Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-05-18 Thread Ramon Hofer
On Fri, 18 May 2012 08:39:57 -0600, Shane Johnson wrote:

 On Fri, May 18, 2012 at 8:23 AM, Ramon Hofer ramonho...@bluewin.ch
 wrote:
 
 (...)
 
 Over current problems from what I have seen are hardware problems - I
 would make sure the intel expander doesn't need a external power source
 and if it does that it is functioning properly.  After that I would look
 to see if something isn't shorting out a USB port.
 

Thanks for your answer. But the over-current message is present as well 
without any cards. It's also there if I only have the bare mainboard and 
a disk in use.

However this doesn't bother me much for now because it doesn't seem to be 
the source of my problem.

But I'd like to know if someone has experience with the LSI card and if a 
firmware upgrade would be a good idea?
I don't want to break anything.


Best regards
Ramon


-- 
Archive: http://lists.debian.org/jp609u$40o$1...@dough.gmane.org



Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-05-18 Thread Camaleón
On Fri, 18 May 2012 14:23:51 +, Ramon Hofer wrote:

 I finally got my LSI 9240-4i and the Intel SAS expander.
 
 Unfortunately it prevents the system from booting. I only got this
 message on the screen:
 
 megasas: INIT adapter done
 hub 4-1:1.0 over-current condition on port 7 
 hub 4-1:1.0 over-current condition on port 8

That's bad, but don't panic, these things happen ;-(

Are you running Squeeze?

 I also got the over-current messages when the LSI card is removed.

And you installed the system with no glitches and then it hangs?

 Here's the output of lsusb:

(...)

What's the point of listing the USB devices? :-?

 Nevertheless I think the module for the card should be loaded but then
 it somehow hangs.
 
 And after a while there are more messages which I don't understand. I
 have taken a picture:
 http://666kb.com/i/c3wf606sc1qkcvgoc.jpg

Something's wrong with udevd when listing a usb?? device or hub.

 Then there are lots of messages like this:
 
 INFO: task modprobe:123 blocked for more than 120 seconds.
 echo 0... disables this message
 
 Instead of modprobe:123 also modprobe:124, 125, 126, 127, 135, 137 and
 kworker/u:1:164, 165 are listed.

Those messages are coming from the kernel side, but I can't tell what 
triggers them.

 I can enter the BIOS of the card just fine. It detect the disks and by
 defaults sets jbod option for them. This is fine because I want to use
 linux RAID.

Mmm... the strange thing here is that there is no clear indication of the 
nature of the problem, that is, what's preventing your system from 
booting. Can you at least get into single-user mode?

 May this problem be the same:
 http://www.spinics.net/lists/raid/msg30359.html 
 Should I try a firmware upgrade?

(...)

Wait, wait, wait... that looks like a completely different scenario 
(different driver, mpt2sas; different raid card; encryption in place; 
different error...). And while updating the firmware is usually good, you'd 
better first make sure what it is you want to correct (we still don't know) 
and which firmware version solves the problem.

Greetings,

-- 
Camaleón


-- 
Archive: http://lists.debian.org/jp6ar6$jg9$1...@dough.gmane.org



Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-05-18 Thread Ramon Hofer
On Fri, 18 May 2012 20:18:46 +, Camaleón wrote:

 On Fri, 18 May 2012 14:23:51 +, Ramon Hofer wrote:
 
 I finally got my LSI 9240-4i and the Intel SAS expander.
 
 Unfortunately it prevents the system from booting. I only got this
 message on the screen:
 
 megasas: INIT adapter done
 hub 4-1:1.0 over-current condition on port 7
 hub 4-1:1.0 over-current condition on port 8
 
 How bad, but don't panic, these things happen ;-(
 
 Are you running Squeeze?

Yes, sorry, I forgot to mention it.

I installed Squeeze amd64 yesterday on a raid1 (just to try). Today, when 
the card arrived, I put it in and couldn't boot. Then I installed Squeeze 
with the card present without problems, but booting afterwards failed 
again.
Without the card, I installed the bpo amd64 kernel, but couldn't boot again.



 I also got the over-current messages when the LSI card is removed.
 
 And you installed the system with no glitches and then it hangs?

Without the LSI card there are no problems (except the over-current 
message, which is also present with only the mb and a disk).
Installation works OK with and without the card.


 Here's the output of lsusb:
 
 (...)
 
 What's the point for listing the USB devices? :-?

Because I thought I should mention the over-current message, and it's 
related to USB.
But I think it's a completely different issue. And I don't even know 
where port 7 is, but port 8 is definitely empty :-?
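The kernel names USB devices as `<bus>-<port>[.<port>...]:<config>.<interface>`, so `hub 4-1:1.0` is interface 0 of the hub plugged into port 1 of bus 4's root hub, and ports 7 and 8 are downstream ports of that hub. The Python sketch below pulls those fields out and lists what lsusb reported on bus 4; the regular expressions are assumptions fitted to the lines quoted in this thread (the optional colon also accepts a `hub 4-1:1.0: ...` form). On bus 4 the only hub besides the root hub is the Intel Rate Matching Hub, so it is presumably the one reporting the over-current; `lsusb -t` prints the actual tree if you want to confirm that.

```python
import re

# "<bus>-<port>[.<port>...]:<config>.<interface>", optionally followed by ":".
OVER_CURRENT_RE = re.compile(
    r"hub (?P<bus>\d+)-(?P<path>[\d.]+):(?P<config>\d+)\.(?P<iface>\d+):? "
    r"over-current condition on port (?P<port>\d+)"
)
LSUSB_RE = re.compile(
    r"Bus (?P<bus>\d+) Device (?P<dev>\d+): ID "
    r"(?P<id>[0-9a-f]{4}:[0-9a-f]{4}) (?P<desc>.*)"
)


def parse_over_current(line):
    """Return bus, hub port path and downstream port, or None if no match."""
    match = OVER_CURRENT_RE.search(line)
    if match is None:
        return None
    return {
        "bus": int(match.group("bus")),
        "hub_ports": [int(p) for p in match.group("path").split(".")],
        "port": int(match.group("port")),
    }


def devices_on_bus(lsusb_lines, bus):
    """List (device number, vendor:product ID, description) for one bus."""
    found = []
    for line in lsusb_lines:
        match = LSUSB_RE.match(line.strip())
        if match and int(match.group("bus")) == bus:
            found.append((int(match.group("dev")), match.group("id"),
                          match.group("desc")))
    return found


kernel_lines = [
    "hub 4-1:1.0 over-current condition on port 7",
    "hub 4-1:1.0 over-current condition on port 8",
]
# lsusb output from the first post, with the wrapped Logitech line rejoined.
lsusb_lines = [
    "Bus 004 Device 003: ID 046d:c517 Logitech, Inc. LX710 Cordless Desktop Laser",
    "Bus 004 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub",
    "Bus 004 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub",
    "Bus 001 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub",
    "Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub",
    "Bus 003 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub",
    "Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub",
]

for line in kernel_lines:
    event = parse_over_current(line)
    path = ".".join(str(p) for p in event["hub_ports"])
    print(f"bus {event['bus']}, hub at port path {path}, "
          f"over-current on downstream port {event['port']}")
for dev, usb_id, desc in devices_on_bus(lsusb_lines, 4):
    print(f"Bus 004 Device {dev:03d}: {usb_id} {desc}")
```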


 Nevertheless I think the module for the card should be loaded but then
 it somehow hangs.
 
 And after a while there are more messages which I don't understand. I
 have taken a picture:
 http://666kb.com/i/c3wf606sc1qkcvgoc.jpg
 
 Something wrong with udevd when listing an usb?? device or hub.

OK, unfortunately I have no clue what this means. But this message isn't 
there without the card, even though the card is PCIe?


 Then there are lots of messages like this:
 
 INFO: task modprobe:123 blocked for more than 120 seconds.
 echo 0... disables this message
 
 Instead of modprobe:123 also modprobe:124, 125, 126, 127, 135, 137 and
 kworker/u:1:164, 165 are listed.
 
 Those messages are coming from the kernel side but I can't guess the
 source that trigger them.

How can I find out what they mean? It seems as if many different problems 
can lead to such messages.

 I can enter the BIOS of the card just fine. It detect the disks and by
 defaults sets jbod option for them. This is fine because I want to use
 linux RAID.
 
 Mmm... the strange here is that there is no clear indication about the
 nature of the problem, that is, what's preventing your system from
 booting. Can you at least get into the single-user mode?

I can't get to any login. Or is there a way to get into single-user mode?
If you mean recovery mode: no luck either :-(


 May this problem be the same:
 http://www.spinics.net/lists/raid/msg30359.html
 Should I try a firmware upgrade?
 
 (...)
 
 Wait, wait, wait... that looks a completely different scenario
 (different driver -mt2sas-, different raid card, encryption in place,
 different error...). And while updating the firmware is usually good,
 you better first ensure what's what you want to correct (we still don't
 know) and what firmware version solves the problem.

OK. But I have no idea how to find that out either.
Maybe you could point me in the right direction :-)


Best regards
Ramon


-- 
Archive: http://lists.debian.org/jp6ggn$gm5$1...@dough.gmane.org



Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-05-18 Thread Stan Hoeppner
On 5/18/2012 9:23 AM, Ramon Hofer wrote:
 Hi all
 
 I finally got my LSI 9240-4i and the Intel SAS expander.
 
 Unfortunately it prevents the system from booting. I only got this 
 message on the screen:
 
 megasas: INIT adapter done
 hub 4-1:1.0 over-current condition on port 7
 hub 4-1:1.0 over-current condition on port 8

These over-current errors are reported by USB, not megasas.  Unplug all
of your USB devices until you get everything else running.
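The point that adjacent console lines can come from unrelated parts of the kernel can be illustrated by tagging each line with the component that printed it. The prefix table in this Python sketch is hypothetical and covers only the messages quoted in this thread: `megasas:` is the prefix used by the megaraid_sas driver, `hub ...` lines come from the USB hub driver, and `INFO: task ...` lines come from the kernel's hung-task detector.

```python
# Hypothetical prefix table covering only the messages quoted in this thread.
SOURCES = [
    ("megasas:", "megaraid_sas driver (LSI 9240-4i)"),
    ("hub ", "USB hub driver"),
    ("INFO: task ", "hung-task detector"),
]


def classify(line):
    """Return the kernel component that most likely printed a console line."""
    text = line.strip()
    for prefix, source in SOURCES:
        if text.startswith(prefix):
            return source
    return "unknown"


boot_screen = [
    "megasas: INIT adapter done",
    "hub 4-1:1.0 over-current condition on port 7",
    "hub 4-1:1.0 over-current condition on port 8",
]

for line in boot_screen:
    print(f"{classify(line):36} | {line}")
```

The three lines that stayed on screen come from two different sources, so the over-current lines on their own say nothing about the HBA.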

 I also got the over-current messages when the LSI card is removed. Here's 
 the output of lsusb:
 
 Bus 004 Device 003: ID 046d:c517 Logitech, Inc. LX710 Cordless Desktop 
 Laser
 Bus 004 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
 Bus 004 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 001 Device 002: ID 8087:0024 Intel Corp. Integrated Rate Matching Hub
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 003 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
 Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub

Again, this is because the over-current issue has nothing to do with the
HBA; it comes from the USB subsystem.

 Nevertheless I think the module for the card should be loaded but then it 
 somehow hangs.

You're assuming it's the HBA/module hanging the system.  I see no
evidence of that so far.

 And after a while there are more messages which I don't understand. I 
 have taken a picture:
 http://666kb.com/i/c3wf606sc1qkcvgoc.jpg

It shows that udev is having serious trouble handling one of the USB
devices.

 Then there are lots of messages like this:
 
 INFO: task modprobe:123 blocked for more than 120 seconds.
 echo 0... disables this message
 
 Instead of modprobe:123 also modprobe:124, 125, 126, 127, 135, 137 and 
 kworker/u:1:164, 165 are listed.

Posting log snippets like this is totally useless.  Please post your
entire dmesg output to pastebin and provide the link.
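Once the full dmesg is available, the repeated hung-task lines can be reduced to a list of stuck tasks. The Python sketch below assumes the message format quoted in this thread, `INFO: task <name>:<pid> blocked for more than <n> seconds.` (the 120-second threshold is the `kernel.hung_task_timeout_secs` sysctl); `hung_tasks` is a hypothetical helper, and the sample log is reconstructed from the first post, assuming both kworker PIDs belong to `kworker/u:1`. Knowing which tasks are stuck is a starting point, not a diagnosis.

```python
import re

# Task names may contain colons (e.g. "kworker/u:1"); the greedy name group
# makes the PID match the digits after the last colon.
HUNG_TASK_RE = re.compile(
    r"INFO: task (?P<name>.+):(?P<pid>\d+) blocked for more than (?P<secs>\d+) seconds"
)


def hung_tasks(lines):
    """Map each blocked task name to the PIDs reported for it, in log order."""
    tasks = {}
    for line in lines:
        match = HUNG_TASK_RE.search(line)
        if match:
            tasks.setdefault(match.group("name"), []).append(int(match.group("pid")))
    return tasks


# Reconstructed from the task names and PIDs listed in the first post.
log = [
    f"INFO: task modprobe:{pid} blocked for more than 120 seconds."
    for pid in (123, 124, 125, 126, 127, 135, 137)
] + [
    f"INFO: task kworker/u:1:{pid} blocked for more than 120 seconds."
    for pid in (164, 165)
]

for name, pids in hung_tasks(log).items():
    print(f"{name}: {len(pids)} blocked task(s), PIDs {pids}")
```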

 I can enter the BIOS of the card just fine. It detect the disks and by 
 defaults sets jbod option for them. This is fine because I want to use 
 linux RAID.

Sure, because the card and expander are working properly.

 May this problem be the same:
 http://www.spinics.net/lists/raid/msg30359.html
 Should I try a firmware upgrade?

Your hang problem seems unrelated to the HBA.  Exhaust all other
possibilities before attempting a firmware upgrade.  If there is some
other system level problem, it could botch the FW upgrade and brick the
card, leaving you in a far worse situation than you are now.

Post your FW version here.  It's likely pretty recent already.

 This card was recommended to me by the list:
 http://lists.debian.org/debian-user/2012/05/msg00104.html

Yes, I recommended it.  It's the best card available in its class.

 I hope I can get some hints here :-)

When troubleshooting potential hardware issues, always disconnect
everything you can to isolate the component you believe may have an
issue.  If that device still has a problem, work until you resolve that
problem.  Then add your other hardware back into the system one device
at a time until you run into the next problem.  Rinse, repeat, until all
problems are resolved.  Isolating components during testing is the key.
This is called process-of-elimination testing: eliminate everything
but the one device you're currently testing.
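The elimination procedure above can be written as a loop. The following minimal Python sketch assumes a hypothetical callback, system_works(parts), which stands for "boot with only these parts installed and report whether the machine came up". In practice, that step is you, a screwdriver and a reboot. The function name and all component names are placeholders.

```python
from typing import Callable, Iterable, List, Optional


def find_first_failing(
    device_under_test: str,
    others: Iterable[str],
    system_works: Callable[[List[str]], bool],
) -> Optional[str]:
    """Add components back one at a time and return the first one whose
    addition breaks the system, or None if everything works.

    Raises RuntimeError if the device under test fails on its own, since
    that problem must be resolved before anything else is added back.
    """
    installed = [device_under_test]
    if not system_works(list(installed)):
        raise RuntimeError(f"{device_under_test} fails on its own; fix that first")
    for component in others:
        installed.append(component)
        # Pass a copy so the callback cannot alter our bookkeeping.
        if not system_works(list(installed)):
            return component
    return None
```

The loop stops at the first failure. Following the advice above, you would resolve that problem and then run the procedure again ("rinse, repeat") until nothing fails.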

-- 
Stan


Archive: http://lists.debian.org/4fb6d19a.6090...@hardwarefreak.com



Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-05-18 Thread Stan Hoeppner
On 5/18/2012 9:39 AM, Shane Johnson wrote:
 Over-current problems from what I have seen are hardware problems - I would
 make sure the Intel expander doesn't need an external power source and if it
 does that it is functioning properly.  

Ramon is well aware of the power configuration for the expander.  And it
has nothing to do with the over-current error, which is USB related.
The two lines are adjacent in dmesg but have nothing to do with one
another.  Anyone who has ever looked at dmesg output should know this.

 After that I would look to see if
 something isn't shorting out a USB port.

Yes, USB is the cause of the over-current errors, which is plainly
evident in his screen shot.  But we don't yet know if this USB problem
is what's hanging the system.  Further troubleshooting is required.

-- 
Stan


Archive: http://lists.debian.org/4fb6d3f4.7070...@hardwarefreak.com



Re: LSI MegaRAID SAS 9240-4i hangs system at boot

2012-05-18 Thread Stan Hoeppner
On 5/18/2012 4:55 PM, Ramon Hofer wrote:

 I installed squeeze amd64 yesterday on a raid1 (just to try). 

You need to explain this in detail:  installed on raid1

Installed onto what raid1?  Does this mean you created an mdadm raid1
pair during the Squeeze installation process, and installed to that?  To
what SAS/SATA controller are these two disks attached?  Please provide
as much detail as possible about this controller chip and if it is on
the motherboard.  If so, please provide the motherboard brand/model.

 Today when 
 the card was here I put it in and couldn't boot. 

Please be technical in your descriptions and provide as much detail as
possible.  The above statement sounds like something from a person who
has never touched a PC before.  Providing detail is what solves
problems.  Lack of detail is what causes problems to linger on until
people take hammers to hardware.  I assume you prefer the former. :)

 Then I installed squeeze 
 with the card present without problems but booting afterwards didn't work 
 again.

Detail, detail, detail!  To what did you install Squeeze?  Which disks,
attached to which controller?  We *NEED* these details to assist you.

 Without the card installed bpo amd64 kernel but couldn't boot again.

If you installed to disks attached to the expander/9240 and then yanked
the card, of course it wouldn't boot.  Again, this is why we need
*details*.  ALWAYS supply the details!

 Without the LSI card there are no problems (except the over-current 
 message which is also present with only the mb and a disk).
 Installation works ok with and without card.

Ok, so the USB over-current error has nothing to do with the hang during
boot.

 Nevertheless I think the module for the card should be loaded but then
 it somehow hangs.

Only full dmesg output will tell us this.
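The full dmesg output can show whether the HBA driver logged anything before the first hung-task warning. The following is a minimal Python sketch. The search strings "megaraid_sas" and "megasas" are assumptions about how the 9240-4i's driver labels its messages, so check them against the actual log. The function name is hypothetical. A driver line just before the hang is a hint worth posting, not proof that the module caused it.

```python
from typing import Optional, Tuple

# ASSUMPTION: substrings that identify messages from the 9240-4i's Linux
# driver (megaraid_sas). Verify them against your own dmesg output.
DRIVER_PATTERNS: Tuple[str, ...] = ("megaraid_sas", "megasas")


def last_driver_line_before_hang(
    dmesg_text: str, patterns: Tuple[str, ...] = DRIVER_PATTERNS
) -> Tuple[Optional[str], bool]:
    """Scan a saved dmesg log in order.

    Returns (line, hang_seen): the last line matching one of `patterns`
    before the first hung-task warning (None if there was none), and
    whether a hung-task warning appeared at all.
    """
    last_driver_line: Optional[str] = None
    for line in dmesg_text.splitlines():
        # The kernel's hung-task warning contains this phrase.
        if "blocked for more than" in line:
            return last_driver_line, True
        if any(p in line for p in patterns):
            last_driver_line = line
    return last_driver_line, False
```

Typical use is to pass it the contents of the saved log, e.g. open("dmesg.txt").read().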

 Ok. But I have no clue either how to find this out.
 Maybe you could point into the right direction :-)

Again, do not flash the HBA firmware at this point.  Provide the details
I requested and we'll move forward from there.  It may very well be that
the RAID firmware is causing the boot problem and you need the straight
JBOD firmware, but let's get all the other details first so we can
determine that instead of making wild guesses.

BTW, did you disable all boot-related options in the 9240 BIOS and
force it to JBOD mode?  Did you read the instructions in their entirety
before mounting the HBA into the machine?  This isn't a $20 SATA card
you simply slap in and go.  It's an SAS RAID controller.  More
care/learning is required.

-- 
Stan


Archive: http://lists.debian.org/4fb6db05.9060...@hardwarefreak.com