On 02/05/2019 12:02 PM, Benjamin Block wrote:
On Mon, Feb 04, 2019 at 04:37:52PM +0000, Will, Chris wrote:
We have auto lun scan turned on for our SLES 11 SP4 hosts and it is on
by default for our SLES 12 SP3 hosts.  We are trying to insert a
Cirrus device which has the capability to discover LUNs and NPIV WWNs.
My very limited understanding of the process is that during the initial
boot (z/VM IPL) it first reports back no LUNs to the guest, logs into
the storage device, and then reports back to the guest the LUNs that
are masked for the NPIV WWPN.

Is it this?: https://www.cdsi.us.com/technology/
If so, it sounds as if the discovery on the appliance is done when initializing
the multi-step data migration. After that, I would assume it persistently knows
both ends and answers on behalf of its attached opposite ends (host, storage).

Actually it has quite a lot in common with SAN volume virtualizers using
"pass-through" mirrored "image" volumes. One of the main differences is
that it can be inserted into the data path without having to reconfigure
the host, because the host still gets to see something that looks like
the old storage one is migrating from.

Auto-LUN-Scan in Linux works roughly as follows:
  - in z/VM guests you need dedicated FCP devices that are on CHPIDs with
    NPIV enabled (I assume you have that already). Apart from that, z/VM
    does not play any role in this; it doesn't help or intercept
    anything.
  - during boot - when the zFCP driver is loaded, usually by the initial
    RAM-Disk - we (the driver) scan the FC-Network for available remote
    ports (ports that are in the same zone as the initiator ports on
    your side) and open them automatically
    - this happens completely transparent, regardless of whether NPIV
      and/or auto-LUN-scan is enabled or not
  - for each successfully opened remote port the Linux kernel SCSI code
    will issue LUN-scanning. If you have auto-LUN-scan enabled and use an
    NPIV-enabled FCP device the zFCP driver will allow the scan to
    happen - otherwise we intercept it (see the quick sysfs check after
    this list).
  - such a scan entails sending the SCSI command REPORT LUNS (support
    for this command is mandatory for every SCSI device type), and for
    all reported LUNs the Linux kernel will create SCSI devices
  - note that "opening" a LUN does not actually generate any traffic in
    the network; only the remote port open and REPORT LUNS generate
    traffic, and for all found LUNs Linux will also send some INQUIRY
    commands and such, but there is no "open LUN" command as such
  - Linux doesn't retry this LUN-scanning later without a reason (port
    recovery or such), so if the network stays quiet it doesn't
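
In case you want to verify the auto-LUN-scan preconditions on a running
guest, a quick sysfs check could look like this (just a sketch; the
paths are the usual zfcp and FC transport class locations, lszfcp comes
with s390-tools):

cat /sys/module/zfcp/parameters/allow_lun_scan   # "Y" = auto-LUN-scan allowed
grep . /sys/class/fc_host/host*/port_type        # "NPIV VPORT" = NPIV is active
lszfcp -D                                        # SCSI devices (LUNs) zfcp knows about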

Now, I have never worked with the devices you want to use, so it's
guesswork after this.

If I understand you correctly, the Cirrus device sits between your
initiator FC-Ports (on the Z side) and the storage-server, somewhere in
the network, and intercepts your traffic? So Linux will open the port on
this device, and it will see the initial traffic from Linux and forward
it to whatever storage-server it thinks is correct.

If it does not report back the correct list of LUNs for the initial
REPORT LUNS command, you have a problem (and I'd consider this device
broken). There are some fall-backs in the kernel for cases where the
storage device is buggy and/or doesn't properly support modern SCSI
standards, but that doesn't mean it'll help you.
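
If you want to see with your own eyes what the appliance returns for
REPORT LUNS, the sg3_utils package can send the command directly to an
already attached SCSI device. A sketch, assuming /dev/sg0 is a SCSI
generic device sitting behind the appliance:

sg_luns /dev/sg0     # decoded list of LUNs the target reports
sg_inq /dev/sg0      # standard INQUIRY data, to see what the LUN claims to be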

This does not seem to give the SLES 12 guests issues, but most of the
SLES 11 guests have issues.  Is there any way for the guest to do a
retry?  Otherwise it ends up in emergency repair mode (a setting we
have in fstab).

It's hard to give you proper advice without knowing in what state the
system halts, and what exactly happened during the whole process I
described above.

But in the absence of this:

Like Mark said, it might help to trigger a scan manually with the
script rescan-scsi-bus.sh, or you could try writing into the scan
attribute of the SCSI hosts in question (as root):

                                    +-- Device-Bus-ID of your FCP device
                                    |
                                    v
echo "- - -" > /sys/devices/css*/*/0.0.1900/host*/scsi_host/host*/scan

This should issue another rescan of all the attached remote ports, and
you don't need any extra tools. You might also be able to script that
and put it in your initial RAM-disk (although that might not be as easy
as it sounds; you'd have to find a proper trigger and time to issue the
script during the boot process). I don't know any way to activate
something like this "out-of-the-box" with SLES11 (or 12 for that
matter).
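
If you do script it, a minimal sketch (untested, run as root) that
rescans every SCSI host rather than one specific device-bus-ID could
look like this; on Z these hosts are typically all zfcp FCP devices:

for shost in /sys/class/scsi_host/host*; do
    echo "- - -" > "$shost/scan"    # rescan all channels, targets, and LUNs
done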

Appending the following kernel parameters at IPL might help you debug
further at which point things break, i.e. where some disk block device
the boot depends on is not yet configured or ready. This covers any
initrd processing, and auto LUN scan processing during and after the
initrd. Otherwise things can be quite silent by default even for
"error" cases:

linuxrc=trace scsi_mod.scsi_logging_level=4605 printk.time=1 ignore_loglevel

This provides a lot of output on the console.
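
If the system is already up (or you get a shell some other way), the
same SCSI logging level can also be set at runtime without rebooting,
for example:

echo 4605 > /proc/sys/dev/scsi/logging_level
# or: sysctl -w dev.scsi.logging_level=4605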

Optionally also add "shell=1" if you want to get a root shell at the
end of initrd processing, to have a look at what the device setup is at
this point in the boot process before it attempts to mount the root
file system.

linuxrc=trace and shell=1 are specific to SLES11 initrd [man 8 mkinitrd].
Kernel parameters:
https://www.ibm.com/support/knowledgecenter/linuxonibm/liaaf/lnz_r_s114.html
Device Drivers, Features, and Commands on SUSE Linux Enterprise Server
Chapter 3. Kernel and module parameters
Chapter 38. Booting Linux

With SLES12, the initrd is built by dracut and works differently:
https://www.ibm.com/support/knowledgecenter/linuxonibm/com.ibm.linux.z.lhdd/lhdd_c_ipl_kernparm.html
https://mirrors.edge.kernel.org/pub/linux/utils/boot/dracut/dracut.html#_description_7
https://mirrors.edge.kernel.org/pub/linux/utils/boot/dracut/dracut.html#debugging-dracut
https://mirrors.edge.kernel.org/pub/linux/utils/boot/dracut/dracut.html#dracutkerneldebug

For both SLES versions, the content of /etc/udev/rules.d/51-zfcp-*.rules
managed by "yast zfcp" and zfcp_{host|disk}_configure is also relevant.
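
For reference, those rules are what zfcp_host_configure and
zfcp_disk_configure (re)write; the device-bus-ID, WWPN, and LUN below
are only placeholders for your values:

zfcp_host_configure 0.0.1900 1
    # persistently set the FCP device online
zfcp_disk_configure 0.0.1900 0x500507630300c562 0x4001400000000000 1
    # persistently configure one LUN; only needed without auto-LUN-scan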

--
Mit freundlichen Gruessen / Kind regards
Steffen Maier

Linux on IBM Z Development

https://www.ibm.com/privacy/us/en/
IBM Deutschland Research & Development GmbH
Chairman of the Supervisory Board: Matthias Hartmann
Management: Dirk Wittkopp
Registered office: Boeblingen
Commercial register: Amtsgericht Stuttgart, HRB 243294

