------- Comment From ma...@de.ibm.com 2016-11-14 06:49 EDT -------

(In reply to comment #8)
> (In reply to comment #7)
> > (In reply to comment #2)
> > > (In reply to comment #1)
> cat 41-zfcp-lun-0.0.e100.rules
> # Generated by chzdev
> ACTION=="add", SUBSYSTEMS=="ccw", KERNELS=="0.0.e100", GOTO="start_zfcp_lun_0.0.e100"
> GOTO="end_zfcp_lun_0.0.e100"
>
> LABEL="start_zfcp_lun_0.0.e100"
> SUBSYSTEM=="fc_remote_ports", ATTR{port_name}=="0x5005076306135700", GOTO="cfg_fc_0.0.e100_0x5005076306135700"
> SUBSYSTEM=="scsi", ENV{DEVTYPE}=="scsi_device", KERNEL=="*:1074675712", KERNELS=="rport-*", ATTRS{fc_remote_ports/$id/port_name}=="0x5005076306135700", GOTO="cfg_scsi_0.0.e100_0x5005076306135700_0x4000400e00000000"
> SUBSYSTEM=="scsi", ENV{DEVTYPE}=="scsi_device", KERNEL=="*:1074741248", KERNELS=="rport-*", ATTRS{fc_remote_ports/$id/port_name}=="0x5005076306135700", GOTO="cfg_scsi_0.0.e100_0x5005076306135700_0x4000400f00000000"
> GOTO="end_zfcp_lun_0.0.e100"
>
> LABEL="cfg_fc_0.0.e100_0x5005076306135700"
> ATTR{[ccw/0.0.e100]0x5005076306135700/unit_add}="0x4000400e00000000"
> ATTR{[ccw/0.0.e100]0x5005076306135700/unit_add}="0x4000400f00000000"
> ATTR{[ccw/0.0.e100]0x5005076306135700/unit_add}="0x4000401200000000"
> ATTR{[ccw/0.0.e100]0x5005076306135700/unit_add}="0x4001400d00000000"
> ATTR{[ccw/0.0.e100]0x5005076306135700/unit_add}="0x4001401100000000"
> GOTO="end_zfcp_lun_0.0.e100"
>
> LABEL="cfg_scsi_0.0.e100_0x5005076306135700_0x4000400e00000000"
> ATTR{queue_depth}="32"
> GOTO="end_zfcp_lun_0.0.e100"
>
> LABEL="cfg_scsi_0.0.e100_0x5005076306135700_0x4000400f00000000"
> ATTR{queue_depth}="32"
> GOTO="end_zfcp_lun_0.0.e100"
>
> LABEL="end_zfcp_lun_0.0.e100"

I'm a bit surprised that only 0x4000400e00000000 and 0x4000400f00000000 seem to actually exist and thus have entries for the "cfg_scsi_*" labels. Maybe your other paths (0x4000401200000000, 0x4001400d00000000, 0x4001401100000000) are currently not (yet) available, for example due to missing LUN masking on the storage; lszdev does not show these either (although I would have expected it to show them at least as persistently configured but not actively configured...).
It's definitely OK to (pre)configure paths.

> > > PV Volume information:
> > > physical_volumes {
> > >   pv0 {
> > >     device = "/dev/sdb5"   # Hint only
> > >   pv1 {
> > >     device = "/dev/sda"    # Hint only
> >
> > This does not look very good: single-path SCSI disk devices mentioned
> > by LVM. With zfcp-attached SCSI disks, LVM must be on top of multipathing.
> > Could you please double check that your installation with LVM and multipathing
> > does the correct layering? If not, this would be an independent bug. See
> > also [1, slide 28 "Multipathing for Disks: LVM on Top"].

Ping: maybe this is part of the root cause for the sudden failure.

> > > Additional testing has been done with CKD volumes and we see the same
> > > behavior. Because of this behavior, I do not believe the problem is
> > > related to SAN disk or multipath. I think it is due to the system not
> > > being able to read the UUID on any PV in the VG other than the IPL disk.
> >
> > For any disk device type, the initrd must contain all information on how to
> > enable/activate all paths of the entire block device dependency tree
> > required to mount the root file system. An example of a dependency tree is
> > in [1, slide 37], and such an example is independent of any particular Linux
> > distribution.
> > I don't know how much automatic dependency tracking Ubuntu does for the
> > user, especially regarding additional z-specific device activation steps
> > ("setting online", as for DASD or zFCP). Potentially the user must take care
> > of the dependency tree himself and ensure the necessary information lands in
> > the initrd.
> >
> > Once the dependency tree of the root-fs has changed (such as adding a PV to
> > an LVM containing the root-fs, as in your case), you must re-create the
> > initrd with the following command before any reboot:
> > $ update-initramfs -u
>
> The "update-initramfs -u" command was never explicitly run after the system
> was built.
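The layering check discussed above can be sketched as a small script. This is only an illustration, not an official tool: it assumes multipath/device-mapper devices appear as /dev/mapper/* or /dev/dm-*, while single-path SCSI disks appear as /dev/sd*; on a live system one would feed it the output of "pvs --noheadings -o pv_name" instead of the hard-coded example devices.

```shell
#!/bin/sh
# Rough sanity check: warn if an LVM PV sits directly on a single-path
# SCSI disk (/dev/sd*) instead of a device-mapper/multipath device.
check_pv() {
    case "$1" in
        /dev/sd*)                echo "WARN: $1 is a plain SCSI device (no multipath?)" ;;
        /dev/mapper/*|/dev/dm-*) echo "OK: $1 is a device-mapper device" ;;
        *)                       echo "INFO: $1 (DASD or other device type)" ;;
    esac
}

# The two PVs from this bug report; both trigger the warning,
# which matches the concern raised above.
for pv in /dev/sdb5 /dev/sda; do
    check_pv "$pv"
done
```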
> The second PV volume was added to the VG on 10/26/2016. However, it was not
> until early November that the root FS was extended.
>
> Between 10/16/2016 and the date the root fs was extended, the second PV was
> always online and active in a VG and LV display after every reboot.

I don't understand how it could ever have worked without "update-initramfs -u" having been run after the addition of another PV to the root-fs dependencies. Maybe chzdev did some magic; what was its exact output when you made the actively added paths persistent with "chzdev zfcp-lun -e --online"?

> I have a note in my runlog with the following from 10/26/2016:
> >>> Rebooted the system and all is working. Both disks are there and
> >>> everything is online.
> lsscsi
> [0:0:0:1074675712] disk IBM 2107900 1.69 /dev/sdb <----- This would be 0x400E4000

Almost: it's 0x4000400e00000000 [it's a swapping of half-words; you can use "lsscsi -xx" to get the hex FCP LUN values].

> [0:0:0:1074741248] disk IBM 2107900 1.69 /dev/sdd <----- This would be 0x400F4000

And 0x4000400f00000000.

> [1:0:0:1074675712] disk IBM 2107900 1.69 /dev/sda
> [1:0:0:1074741248] disk IBM 2107900 1.69 /dev/sdc

> > In your case on reboot, it only activated 2 paths to FCP LUN
> > 0x4000400e00000000 (I cannot determine the target port WWPN(s) from the
> > below output because it does not convey this info) from two different FCP
> > devices 0.0.e300 and 0.0.e100.
>
> I see what you are saying, only 1074675712 (0x400E4000)
[0x4000400e00000000]
> is coming online at boot. 1074741248 (0x400f4000)
[0x4000400f00000000]
> does not come online at boot. The second device must be coming online after
> boot has completed and that is why lsscsi shows it online. And since the
> boot partition is on the first segment, the system can read initrd and
> start the boot.
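The half-word swap mentioned above can be reproduced in a few lines of shell. This is an illustrative sketch of how the 8-byte FCP LUN maps to the integer LUN that lsscsi prints (the four 16-bit half-words keep their internal byte order but their overall order is reversed); on a real system "lsscsi -xx" shows the hex values directly, so the function below is only for understanding the relationship.

```shell
#!/bin/sh
# Convert an 8-byte hex FCP LUN to the integer LUN shown by lsscsi:
# the four 16-bit half-words change order (w0 w1 w2 w3 -> w3 w2 w1 w0).
fcp_to_scsi_lun() {
    fcp=$(( $1 ))
    w0=$(( (fcp >> 48) & 0xFFFF ))
    w1=$(( (fcp >> 32) & 0xFFFF ))
    w2=$(( (fcp >> 16) & 0xFFFF ))
    w3=$((  fcp        & 0xFFFF ))
    echo $(( (w3 << 48) | (w2 << 32) | (w1 << 16) | w0 ))
}

fcp_to_scsi_lun 0x4000400e00000000   # -> 1074675712 (0x400E4000, sdb/sda above)
fcp_to_scsi_lun 0x4000400f00000000   # -> 1074741248 (0x400F4000, sdd/sdc above)
```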
While zipl does support some cases of device-mapper targets under certain circumstances for the "zipl target" (/boot/ with Ubuntu 16.04), it's still dangerous to have a multi-PV root-fs _and_ the zipl target being part of the root-fs, i.e. the zipl target not being its own mount point withOUT LVM. [1, slide 25 "Multipathing for Disks: Persistent Configuration"]

http://www.ibm.com/support/knowledgecenter/linuxonibm/com.ibm.linux.z.ludd/ludd_c_zipl_lboot.html
http://www.mail-archive.com/linux-390%40vm.marist.edu/msg62492.html
(root-fs on LVM in general: http://www.mail-archive.com/linux-390@vm.marist.edu/msg69553.html)

> But when it goes to mount root, it is not aware of the second segment.
> Do I have this right?

Yes. However, I still don't understand why it had worked before and now no longer does. What has changed in the meantime to break it? Have you used zfcp auto LUN scan before but now no longer? Or is the LVM on multipathing broken (see above)?

> If so, that brings me to the next question. If this is the case, do you
> have a procedure where I could bring up a rescue system, bring volumes
> 1074675712 (0x400E4000) & 1074741248 (0x400f4000) online, chroot, and then
> update the initrd with the second volume? Or do I need to rebuild the
> system from scratch?

I have no experience with rescuing a debian mkinitrd based initrd. Your console output seems to indicate that it drops you into a root shell at the end, after it gave up waiting for the root-fs dependencies. Maybe you can use this to manually add the missing paths using sysfs (maybe one also needs to manually do the necessary pvscan and vgchange invocations) and then have the initrd retry to mount the root-fs (this is possible with dracut-based initrds, and dracut was inspired by debian mkinitrd IIRC). The easiest rescue option is to attach the disk(s) to another running Linux from which you can access and modify the broken disk content (some commands need to be done in a chroot environment!).
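For the manual sysfs step suggested above, the attribute involved is the zfcp driver's unit_add file, i.e. the same attribute the chzdev-generated udev rules write to. A hedged sketch using the FCP device bus-ID, WWPN, and LUN from this report as example values; the actual echo commands can of course only be run in the broken system's emergency shell:

```shell
#!/bin/sh
# Build the sysfs attribute path used to attach an FCP LUN to a zfcp
# device (the unit_add attribute set by the chzdev-generated udev rules).
zfcp_unit_add_path() {
    # $1 = FCP device bus-ID, $2 = target port WWPN
    echo "/sys/bus/ccw/drivers/zfcp/$1/$2/unit_add"
}

# In the initrd emergency shell one would then run, per missing path:
#   echo 0x4000400f00000000 > $(zfcp_unit_add_path 0.0.e100 0x5005076306135700)
# followed by LVM activation:
#   pvscan; vgchange -ay
zfcp_unit_add_path 0.0.e100 0x5005076306135700
```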
Actually, we're now doing a lot of guessing and desperately need debug data from the broken system. Typically we need the output of dbginfo.sh (Ubuntu may prefer the output of sosreport). Since the system does not boot, that's a bit tricky, but maybe the method described in the previous paragraph works and you can run dbginfo.sh in a chroot of the broken root-fs; that would at least give us the persistent config on disk (though of course not the dynamic config).

> > REFERENCE
> >
> > [1] http://www-05.ibm.com/de/events/linux-on-z/pdf/day2/4_Steffen_Maier_zfcp-best-practices-2015.pdf

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1641078

Title:
  System cannot be booted up when root filesystem is on an LVM on two disks

Status in Ubuntu on IBM z Systems:
  New
Status in linux package in Ubuntu:
  New

Bug description:

---Problem Description---
LVMed root file system spanning multiple disks cannot be booted up

---uname output---
Linux ntc170 4.4.0-38-generic #57-Ubuntu SMP Tue Sep 6 15:47:15 UTC 2016 s390x s390x s390x GNU/Linux

---Patches Installed---
n/a

Machine Type = z13

---System Hang---
cannot boot up the system after shutdown or reboot

---Debugger---
A debugger is not configured

---Steps to Reproduce---
Created the root file system on an LVM that crosses two disks. After shutting down or rebooting the system, the system cannot come up.

Stack trace output: no
Oops output: no
System Dump Info: The system is not configured to capture a system dump.

Device driver error code:
Begin: Mounting root file system ... Begin: Running /scripts/local-top ... lvmetad is not active yet, using direct activation during sysinit
Couldn't find device with uuid 7PC3sg-i5Dc-iSqq-AvU1-XYv2-M90B-M0kO8V.

Attached sysctl -a output to the bug.

More detailed installation description:
The installation was on FCP SCSI SAN volumes, each with two active paths. Multipath was involved.
The system IPLed fine up to the point that we expanded the root filesystem to span volumes. At boot time, the system was unable to locate the second segment of the root filesystem. The error message indicated this was due to lvmetad not being active.

Error message:
Begin: Running /scripts/local-block ... lvmetad is not active yet, using direct activation during sysinit
Couldn't find device with uuid 7PC3sg-i5Dc-iSqq-AvU1-XYv2-M90B-M0kO8V
Failed to find logical volume "ub01-vg/root"

PV Volume information:
physical_volumes {

	pv0 {
		id = "L2qixM-SKkF-rQsp-ddao-gagl-LwKV-7Bw1Dz"
		device = "/dev/sdb5"	# Hint only
		status = ["ALLOCATABLE"]
		flags = []
		dev_size = 208713728	# 99.5225 Gigabytes
		pe_start = 2048
		pe_count = 25477	# 99.5195 Gigabytes
	}

	pv1 {
		id = "7PC3sg-i5Dc-iSqq-AvU1-XYv2-M90B-M0kO8V"
		device = "/dev/sda"	# Hint only
		status = ["ALLOCATABLE"]
		flags = []
		dev_size = 209715200	# 100 Gigabytes
		pe_start = 2048
		pe_count = 25599	# 99.9961 Gigabytes
	}
}

LV Volume Information:
logical_volumes {

	root {
		id = "qWuZeJ-Libv-DrEs-9b1a-p0QF-2Fj0-qgGsL8"
		status = ["READ", "WRITE", "VISIBLE"]
		flags = []
		creation_host = "ub01"
		creation_time = 1477515033	# 2016-10-26 16:50:33 -0400
		segment_count = 2

		segment1 {
			start_extent = 0
			extent_count = 921	# 3.59766 Gigabytes
			type = "striped"
			stripe_count = 1	# linear
			stripes = [
				"pv0", 0
			]
		}
		segment2 {
			start_extent = 921
			extent_count = 25344	# 99 Gigabytes
			type = "striped"
			stripe_count = 1	# linear
			stripes = [
				"pv1", 0
			]
		}
	}
}

Additional testing has been done with CKD volumes and we see the same behavior. Only the UUID of the first volume in the VG can be located at boot, and the same message:

lvmetad is not active yet, using direct activation during sysinit
Couldn't find device with uuid xxxxxxxxxxxxxxxxx

is displayed for CKD disks, just with a different UUID listed. If the root file system only has one segment on the first volume, CKD or SCSI, the system will IPL.
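As a quick cross-check of the metadata above: with LVM's default 4 MiB physical extent size, the "# ... Gigabytes" comments on the segments and PVs are consistent. A minimal sketch; the 4 MiB extent size is an assumption, since the excerpt does not show the VG's extent_size field.

```shell
#!/bin/sh
# Verify the size comments from the LVM metadata, assuming the default
# 4 MiB physical extent size.
extents_to_mib() { echo $(( $1 * 4 )); }

extents_to_mib 921     # segment1: 3684 MiB   = 3.59766 GiB
extents_to_mib 25344   # segment2: 101376 MiB = exactly 99 GiB
extents_to_mib 25477   # pv0:      101908 MiB ~= 99.52 GiB
```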
Because of this behavior, I do not believe the problem is related to SAN disk or multipath. I think it is due to the system not being able to read the UUID on any PV in the VG other than the IPL disk.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-z-systems/+bug/1641078/+subscriptions