On 2017-03-01 at 22:42, Daniel Frey wrote:

> I'm not sure how the sg? -> sd? mapping is supposed to work. I find it
> odd that there seem to be two nodes reported for each sd? entry.
> However, this could be the way the controller driver reports it to the
> kernel...
> 
>> 07:01.0 SCSI storage controller: LSI Logic / Symbios Logic 53c1030
>> PCI-X Fusion-MPT Dual Ultra320 SCSI (rev 08)
>> 0a:0e.0 RAID bus controller: Adaptec AAC-RAID
>>
> 
> Well, if you are using a hw raid card in jbod mode the controller will
> generally not report that info. You'd have to install the controller's
> cli management tools and use that. You'd have to figure out which
> controller your drives are attached to.
> 
> Adaptec uses sys-block/arcconf
> LSI uses sys-block/megacli
> 3ware uses sys-block/tw_cli

Yes, thanks.
arcconf doesn't do much here ... I tried some commands, but the
controller doesn't return any info.

Maybe it's not the disks themselves that are dying but the controller
getting flaky ... it's quite old already, and lately I had issues at
warm boot that were only solved by removing power completely.

See these lines in dmesg:

[74403.796012] aacraid: Host adapter abort request (1,0,0,0)
[74403.804011] aacraid: Host adapter abort request (1,0,1,0)
[74403.804033] aacraid: Host adapter reset request. SCSI hang ?
[74403.804040] AAC: Host adapter BLINK LED 0x7
[74403.804056] AAC0: adapter kernel panic'd 7.
[74509.788015] aacraid: Host adapter abort request (1,0,0,0)
[74511.804015] aacraid: Host adapter abort request (1,0,1,0)
[74511.804041] aacraid: Host adapter reset request. SCSI hang ?
[74511.804044] AAC: Host adapter BLINK LED 0x7
[74511.804068] AAC0: adapter kernel panic'd 7.

And sdi throws errors:

[31529.901711] md/raid:md3: read error corrected (8 sectors at 11190152
on sdi1)
[31529.901713] md/raid:md3: read error corrected (8 sectors at 11190160
on sdi1)
[31529.901715] md/raid:md3: read error corrected (8 sectors at 11190168
on sdi1)
[31529.901717] md/raid:md3: read error corrected (8 sectors at 11190176
on sdi1)
[31529.901718] md/raid:md3: read error corrected (8 sectors at 11190184
on sdi1)
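
To see whether the corrected read errors really concentrate on one
member, the messages can be tallied per device. This is only a rough
sketch: the function name tally_read_errors is my own, and the pattern
assumes the md message format exactly as it appears in the lines above.

```python
import re
from collections import Counter

# Matches md messages of the form seen in the dmesg excerpt, e.g.
# "md/raid:md3: read error corrected (8 sectors at 11190152 on sdi1)".
# \s+ before "on" also tolerates a line wrap at that position.
READ_ERROR = re.compile(
    r"md/raid:(?P<array>md\d+): read error corrected "
    r"\((?P<count>\d+) sectors at (?P<sector>\d+)\s+on (?P<dev>\w+)\)"
)


def tally_read_errors(log_text):
    """Return a Counter mapping (array, device) to corrected sectors."""
    totals = Counter()
    for match in READ_ERROR.finditer(log_text):
        key = (match.group("array"), match.group("dev"))
        totals[key] += int(match.group("count"))
    return totals
```

Feeding it the full dmesg text and printing totals.most_common() would
show whether anything besides sdi1 turns up.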

I wonder if one or more disks produce some kind of electrical "noise"
on the SATA bus and confuse the controller somehow.

This is why I would like to remove sdi ... and the reason why I want to
spot that specific hdd.
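
One thing that might help to tie sdi to a physical drive is the set of
persistent names under /dev/disk/by-id. The sketch below lists the
links that resolve to a given kernel name. The helper ids_for_device
and its parameters are made up for illustration, and it is an
assumption that the controller passes through anything useful there;
behind a RAID card the names may only describe the logical device.

```python
import os


def ids_for_device(dev_name, by_id_dir="/dev/disk/by-id", dev_dir="/dev"):
    """Return the by-id link names that resolve to <dev_dir>/<dev_name>."""
    target = os.path.realpath(os.path.join(dev_dir, dev_name))
    names = []
    for name in sorted(os.listdir(by_id_dir)):
        link = os.path.join(by_id_dir, name)
        # Each entry is a symlink to a kernel device node such as ../../sdi
        if os.path.realpath(link) == target:
            names.append(name)
    return names
```

Calling ids_for_device("sdi") on the box would return whatever names
udev created for that disk, if any.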

In the past I used the trick of stressing that specific disk with dd or
something similar (reading the whole disk, for example) and having
someone spot the disk by looking at the LEDs on the drive cages ;-)

That may be the faster way in this case.
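
The read-everything trick is roughly what
"dd if=/dev/sdi of=/dev/null bs=1M" does. A minimal Python sketch of
the same idea follows; the function name read_through and its
parameters are my own, and it only reads, it never writes to the
device.

```python
def read_through(path, block_size=1024 * 1024, max_bytes=None):
    """Read path sequentially, discard the data, return bytes read."""
    total = 0
    # buffering=0 opens the file unbuffered so each read hits the OS directly
    with open(path, "rb", buffering=0) as dev:
        while max_bytes is None or total < max_bytes:
            want = block_size
            if max_bytes is not None:
                want = min(want, max_bytes - total)
            chunk = dev.read(want)
            if not chunk:
                break  # end of device or file
            total += len(chunk)
    return total
```

Running read_through("/dev/sdi") as root should keep that drive's
activity LED busy for as long as the read lasts. Data already in the
page cache may be served from memory, so a short repeated read might
not light up the LED at all.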

> The management tools for the other cards should provide this sort of
> functionality.
> 
> If you had used the raid card to create an array the management cli
> tools will show that a specific port is dead and you can query it for
> the serial number.
> 
> This doesn't help you with the sg mapping. The problem for you now will
> be figuring out why sg_map is reporting the way it is.

The disks were originally configured via StorMan under SLES10 or so;
the server ran SLES back then and I moved it to Gentoo later on.

I could boot into SLES to have StorMan again, but that leads to the
boot failure mentioned above, so I want to avoid it for now.

Something is wrong with this box and I have to find out whether it's
the disk(s) or the controller. All this while I am >600km away from the
server.

Thanks, Stefan

