On 4/12/15 1:25 PM, [email protected] wrote:
> Michael Tiernan wrote:
>> Normally, what will happen is that the kickstart process will wipe and
>> rebuild on the drive in Slot1 since it is the first drive. This is not
>> the desired outcome.
>>
>> What I want to do is confirm that the drive I'm focusing on is the
>> "correct" one in the physical hardware slot 0.
> Curious why the 'for whatever reason' has gone unremarked.
> What's the use case? (Frankly, could this happen to me and
> should I pay close attention?) One response seemed to assume you
> have data on other disks you want to preserve. Maybe you're
> trying to track a physical disk that contains the root
> filesystem? Fail the install if any disks are inop? Something
> else?
First off, a preface/reminder: we don't always get to choose the
entirety of the infrastructure we inherit. :(  (i.e., suspend the
"how it should be" logic and assume we pick one battle at a time.)

So, my use case, as screwy as it may seem, is this:

I've got a machine with more than one drive in it. Usually the number is
6, 8, or more now that we've got some new slot-rich Dells in the racks.

The machine is running along with the system on the disk in slot 0 and
the other 6, 8, etc. drives configured as individual RAID0 containers or
just as raw disks. Don't ask, just go with it. The principal rule: data
(on data drives) is sacred and should never be lost.

Now, something happens and "other person with permission" reboots the
system "because," and instead of it coming up, we find that the system
drive has gone bad. Sadly, this happens far more often than I'd like.
The result is a situation where I cannot determine the UUIDs of the
existing drives and divine where to build the new root.

Sometimes the system drive truly disappears; in other cases it just
begins to exhibit signs of total failure. Either way, I replace the
system drive with another drive (SATA) and then have to build a new
system on this drive.

What *sometimes* happens is that the new/replacement drive is itself bad
or failing. When that happens, instead of being counted as "Drive #1",
it is sometimes ignored entirely, and when the kickstart proceeds, the
*SECOND* drive in the system, the first data drive, aka "Drive #2", gets
wiped out and the system built on it.[2] This is not the desired result.

My preference would be to be able to ask the PCI cards/slots that handle
storage "Do you have a target in slot0?"

So far I *CAN* determine that, in some cases[1], if I pull the drive
from slot0, the drive in slot1 becomes the first drive, but I've not
done enough tests to confirm this fully.

I also have situations where the "Slot0" device is reached via direct
hardware "linkage" (PCI-HBA->SATADev), but there are also times when the
connection is less "plain" (PCI-HBA->RAIDContr->SATADev), so that the
"Zeroth drive" is a virtual drive. (Which I can live with, if I can
determine it.)

So far, I've found that the "/dev/disk/by-path" information is SOMEWHAT
informative at first but falls apart quickly when you try to parse it.

I can post examples, but it /seems/ that the response from inside the
HBA, beyond the PCI definitions of responses, is vague enough that you
can't reliably be sure of what you're getting unless you determine
things like firmware revs and try to keep track of them.

The one thing that I've run into is the question "If you report SCSI
target a.b.c.0, are you telling me that this applies SPECIFICALLY to the
first *possible* slot, better known as slot0?"

So far, I've not found definitions of this information that tell me for
sure.

[1] In the cases I've checked, if I pull the drive in slot 0 and build
on the drive in slot 1, the /dev/disk/by-path identification shows SCSI
target a.b.c.1, but I have not proven that this is *always* the case.
(I'm still testing.)

[2] As already pointed out, the "correction" to this problem is to
specify which device I want the system built on in the kickstart, using
the anaconda "--onpart=/dev/disk/by-path/pci...." path. However, I need
to construct this path on demand, since it changes between systems, and
it also *seems* to change with different revs of hardware/firmware! As
it is, the kickstart says /dev/sda, but if the device in slot0 goes
"away" then /dev/sda is what's in slot1, which isn't a good thing.

There are days when it feels like all this device "standardization"
gives us an opaque window onto the engine under the hood: we can't
touch it, and the response to "where are the spark plugs?" is as firm as
"over there."

In my very limited experience and view, I am of the opinion that I
should be able to ask the system:
"Do you have any devices that are handling storage targets?"
"If so, can you tell me the hard physical correlation between them and
how you see them?"
OR
"If I ask, can you tell me whether there's a drive/storage device
available in what shows up on the hardware as SlotN?"

It may be that there's no physical correlation and the drives are all
virtual devices hanging off the network, but I should *still* be able to
query them directly and get correlatable data about them.
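The closest thing Linux offers to that direct question today is sysfs:
every SCSI device the kernel knows about shows up as
/sys/class/scsi_device/H:C:T:L. A sketch of "is there anything at
target 0, LUN 0?", with the directory as a parameter so the logic can be
tried against a fake tree; whether target 0 really means physical slot 0
is, again, the unanswered question:

```shell
#!/bin/sh
# Sketch: ask a scsi_device-style directory whether any host has a
# device at target 0, LUN 0. Pass /sys/class/scsi_device on a live box;
# the parameter exists so the logic can be exercised on a fake tree.
has_target_zero() {
    dir=$1
    for d in "$dir"/*:*:0:0; do
        [ -e "$d" ] && return 0
    done
    return 1
}

# Example against a throwaway fake tree with only a target-1 device:
fake=$(mktemp -d)
mkdir -p "$fake/0:0:1:0"
if has_target_zero "$fake"; then
    echo "target 0 present"
else
    echo "no target 0"     # this branch fires for the fake tree above
fi
```

Even if the answer is "yes, H:C:0:0 exists," nothing in sysfs promises
that maps to the physical slot 0 bay, which is the guarantee I'm
actually after.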

Of course I may be just digging myself a hole in a religious fervor and
should give up.

-- 
  << MCT >> Michael C Tiernan. http://www.linkedin.com/in/mtiernan    
  Non Impediti Ratione Cogatationis
  Women and cats will do as they please, and men and dogs
   should relax and get used to the idea. -Robert A. Heinlein

_______________________________________________
Discuss mailing list
[email protected]
https://lists.lopsa.org/cgi-bin/mailman/listinfo/discuss
This list provided by the League of Professional System Administrators
 http://lopsa.org/
