Re: RHEL 6.9 servers getting Unknown program exception: 0028 SMP
Bill, Has the dust settled now on the z/VM Spectre fixes? We put on VM65396 as well on a couple of systems, then halted that until further fixes arrive. On 4/12/2018 7:26 AM, Bill Bitner wrote: Rick, if you have VM65396 on, but not VM65414, that would be my guess. VM65414 corrected a problem introduced by VM65396 where a guest could erroneously be given a program check 28. ___ Bill Bitner - z/VM Customer Focus and Care - 607-429-3286 bitn...@us.ibm.com "Making systems practical and profitable for customers through virtualization and its exploitation." - z/VM -- For LINUX-390 subscribe / signoff / archive access instructions, send email to lists...@vm.marist.edu with the message: INFO LINUX-390 or visit http://www.marist.edu/htbin/wlvindex?LINUX-390 -- For more information on Linux on System z, visit http://wiki.linuxvm.org/
Re: Gold On LUN
We used EDEVICEs for 15 years, tired of that. :-) Seriously, the large EDEVICEs with many zLinux OS disks per device were a bottleneck. We never moved '/var' off, so log backups (often for a bunch of systems at once) and Linux backups both had really high wait times on EDEVs; with 1 I/O at a time per EDEVICE, I guess no surprise. On top of that we had really bad SAN performance during backup windows for a VERY long time (6-8 months). Maybe our EDEVs were too large with too many servers on each, and certainly the servers could have been placed on them more randomly, but when the Linux SAs decided to up the OS disk allocation from 20G for SLES11 to 60G for SLES12, the thinking was we would need even larger EDEVs, causing even more contention. Now we're learning that NPIV would be an important thing to have for a pure LUN-based implementation, but alas, hard to implement down the road like this. If this proves to be too difficult we might go back to EDEV; we're just starting with this project. I don't know what all went into the decision not to do NPIV, but the limitation on logins may have been a factor. But we have certainly seen cases where someone picked up the wrong LUN and fortunately DIDN'T USE IT, but still, the other server couldn't come up because its LUN was stolen. Anyway, I'm the VM guy, and there are a lot of good tips here, but I'll need to see what our Linux guys want to do. Maybe the SLES12-SP3 clone fixer thing would do; sounds like it would. Alan, so don't share the PCHID b/c, say, 1803 could be attached to two servers at once? Problem is that we have a spare FICON to the SAN with very little on it, and we maybe could change it to NPIV and migrate gradually, but with two production LPARs we would have to share that new NPIV'd channel at least until we can free the other one to change to NPIV.
Another thing we haven't really figured out is COOP; similar situation there: we'll have a LUN that is a copy, but how do we get the Linux guest to happily use it? That was easier with EDEV b/c we could get the system up and fix the multipath config and whatever else they were doing to get SLES11 to work. On 9/9/2017 3:37 PM, Scott Rohling wrote: EDEVICEs are z/VM's way to provide that virtualization and hide the ugly details, but in this case z/VM is simply supplying a path (FCP subchannel) and it's up to the guest OS to do what it needs to do to connect to the storage using that path. So to me the question would be - why aren't EDEVICEs being used? Scott Rohling On Fri, Sep 8, 2017 at 1:45 PM, Willemina Konynenberg wrote: To me, all this seems to suggest some weakness in the virtualisation infrastructure, which seems odd for something as mature as z/VM. So then the follow-up question would be: is the host infrastructure being used properly here? Is there not some other (manageable) way to set things up such that all the ugly technical details of the underlying host/SAN infrastructure are completely hidden from the (clone) guests, and that the guests cannot accidentally end up accessing resources they shouldn't be allowed to access? This should be the responsibility of the host system, not of each and every single guest. To me, that seems a fairly basic requirement for any sensible virtual machine host infrastructure, so I would think that would already be possible in z/VM somehow. Willemina On 09/08/17 22:28, Robert J Brenneman wrote: Ancient history: http://www.redbooks.ibm.com/redpapers/pdfs/redp3871.pdf Without NPIV you're in that same boat. Even if you had NPIV you would still have to mount the new clone and fix the ramdisk so that it points to the new target device instead of the golden image. This is especially an issue for DS8000-type storage units that give every LUN a unique LUN number based on which internal LCU it's on and the order it gets created.
Storwize devices like SVC and V7000 do it differently: each LUN is numbered starting from 0 and counts up from there for each host, so the boot LUN is always LUN 0 for every clone and you don't have to worry about that part so much. The gist of your issue is that you need to:
- mount the new clone volume on a running Linux instance
- chroot into it so that your commands are 'inside' that cloned Linux environment
- fix the udev rules to point to the correct LUN number
- fix the grub kernel parameter to point to the correct LUN if needed
- fix the /etc/fstab records to point to the new LUN if needed
- ?? re-generate the initrd so that it does not contain references to the master image ?? (I'm not sure whether that last one is required on SLES 12)
On Fri, Sep 8, 2017 at 3:30 PM, Alan Altmark wrote: On Friday, 09/08/2017 at 04:46 GMT, Scott Rohling wrote: Completely agree with you .. I might make an exception if the only FCP use is for z/VM to supply EDEVICEs AND the PCHID is configured in the IOCDS as non-shared. Alan Altmark, Senior Managing z/VM and Linux Consultant
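[Editor's sketch] Jay's step list might look roughly like the following from a repair guest. This is only a sketch: every device path and WWID below is a made-up placeholder, and the mount/chroot/rebuild steps are shown as comments because they need a real attached clone and root privileges.

```shell
# Sketch: fix a cloned SLES root so it stops referencing the golden LUN.
# All device paths and WWIDs are invented placeholders.

# Helper: swap one WWID for another wherever it appears in a config file.
fix_wwid() {    # usage: fix_wwid <old-wwid> <new-wwid> <file>
    sed -i "s/$1/$2/g" "$3"
}

# 1. Attach and mount the clone from a running Linux guest (root required):
#      mount /dev/mapper/clone-root /mnt
#      mount --bind /dev /mnt/dev
#      mount --bind /proc /mnt/proc
#      mount --bind /sys /mnt/sys
# 2. Rewrite references to the golden LUN in the clone's config files:
#      fix_wwid 36005076aaaa0000 36005076bbbb0000 /mnt/etc/multipath.conf
#      fix_wwid 36005076aaaa0000 36005076bbbb0000 /mnt/etc/fstab
#    (also the udev rules and grub kernel parameters, if they name the LUN)
# 3. Rebuild the initrd inside the clone so the ramdisk no longer embeds
#    the golden image's device bindings, then refresh the bootloader:
#      chroot /mnt mkinitrd
#      chroot /mnt zipl       # bootloader update differs by setup
```

The `fix_wwid` helper is the only part that runs anywhere; the rest depends on the storage actually being attached.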
Re: Gold On LUN
Thanks Robert and others. We figured there would be a learning curve; I think we'll get it figured out, we just need to figure out everything, then how you do those things on SLES12. On 9/8/2017 3:28 PM, Robert J Brenneman wrote: Ancient history: http://www.redbooks.ibm.com/redpapers/pdfs/redp3871.pdf Without NPIV you're in that same boat. Even if you had NPIV you would still have to mount the new clone and fix the ramdisk so that it points to the new target device instead of the golden image. This is especially an issue for DS8000-type storage units that give every LUN a unique LUN number based on which internal LCU it's on and the order it gets created. Storwize devices like SVC and V7000 do it differently: each LUN is numbered starting from 0 and counts up from there for each host, so the boot LUN is always LUN 0 for every clone and you don't have to worry about that part so much. The gist of your issue is that you need to:
- mount the new clone volume on a running Linux instance
- chroot into it so that your commands are 'inside' that cloned Linux environment
- fix the udev rules to point to the correct LUN number
- fix the grub kernel parameter to point to the correct LUN if needed
- fix the /etc/fstab records to point to the new LUN if needed
- ?? re-generate the initrd so that it does not contain references to the master image ?? (I'm not sure whether that last one is required on SLES 12)
On Fri, Sep 8, 2017 at 3:30 PM, Alan Altmark wrote: On Friday, 09/08/2017 at 04:46 GMT, Scott Rohling wrote: Completely agree with you .. I might make an exception if the only FCP use is for z/VM to supply EDEVICEs AND the PCHID is configured in the IOCDS as non-shared.
Alan Altmark Senior Managing z/VM and Linux Consultant IBM Systems Lab Services IBM Z Delivery Practice ibm.com/systems/services/labservices office: 607.429.3323 mobile: 607.321.7556 alan_altm...@us.ibm.com IBM Endicott -- Jay Brenneman
Re: Gold On LUN
Bingo! No NPIV, so our only hope is fixing the clone with it mounted to the gold server or in recovery mode, but if "It's all tooling, no direct editing of any config files with SLES," then how do we fix this? On 9/8/2017 9:05 AM, Scott Rohling wrote: That depends on whether NPIV is enabled ... otherwise this guest could have the same access as the one it was cloned from.. Scott Rohling On Fri, Sep 8, 2017 at 6:41 AM, Steffen Maier wrote: Volume access control (LUN masking / host mapping) should prevent access to a golden volume. You'd still need to customize disk clones to make them work, but it should at least break early during boot and not access the golden image (especially not writable, and thus potentially destroying its golden property). On 09/07/2017 09:19 PM, Greg Preddy wrote: I think that is it, we have no clue where SLES12 puts the info about LUNs so don't know what to change where. Here's the official top-level documentation for zfcp configuration: https://www.ibm.com/support/knowledgecenter/linuxonibm/com.ibm.linux.z.lhdd/lhdd_t_fcp_wrk_on.html https://www.ibm.com/support/knowledgecenter/linuxonibm/com.ibm.linux.z.lhdd/lhdd_t_fcp_wrk_addu.html It's all tooling, no direct editing of any config files with SLES. An overview of the same, although not yet fully updated for SLES12, is in (slides 33, 35 for SLES): http://www-05.ibm.com/de/events/linux-on-z/pdf/day2/4_Steffen_Maier_zfcp-best-practices-2015.pdf But there's much more to customize in a golden disk image on the first use of a disk clone... Found some steps for cloning SLES11 that would most likely work if we were SLES11. http://www.redbooks.ibm.com/abstracts/sg248890.html?Open "The Virtualization Cookbook for IBM z Systems Volume 3: SUSE Linux Enterprise Server 12" However, the authors moved focus from golden disk image cloning towards different disk content provisioning techniques for technical reasons: "Chapter 7.
Configuring Linux for cloning Linux operating systems over time tend to have more and more unique identifiers; for example, with the introduction of systemd, a new machine ID was added. All of these identifiers must be re-created on the cloned system. However, the process to know all these identifiers and to re-create them requires in-depth knowledge of the golden image. Failure to update all of these identifiers could cause unforeseen trouble later, including the possibilities of data corruption or security issues. If you are unsure of all of the unique identifiers for your golden image, and you prefer not to follow the cloning process, refer to the automated installation procedures for KIWI imaging instead. Find information about these in the following chapters" The older book version for SLES11 might contain more information on cloning, but that's of course not necessarily fully applicable to SLES12. http://www.redbooks.ibm.com/abstracts/tips1060.html?Open NB: The book's own tooling/scripting contains the image clone customization details. On 9/7/2017 10:34 AM, Karl Kingston wrote: Check your FCP definitions on Linux. You may find they are still referencing your gold system. On Thu, 2017-09-07 at 11:31 -0400, Grzegorz Powiedziuk wrote: Hi, What do you mean it still mounts a gold LUN? You boot from a NEW LUN but the root filesystem ends up being mounted from the GOLD LUN? First of all I would make sure that the GOLD LUN after cloning is not accessible in the virtual machine anymore, just to make it simple. I can't remember how it is done in SLES, but in RHEL there is a bunch of stuff that refers to a specific LUN with a specific scsi_id, for example the multipath (/etc/multipath.conf) configuration. In there you usually bind the scsi_id (wwid) of a LUN with a friendly name (mpathX for example). That multipath configuration is also saved in the initrd. So if you boot from a clone, it will end up mounting the wrong volume. Are you using LVM?
2017-09-07 9:08 GMT-04:00 Greg Preddy : All, We're doing SLES 12 on 100% LUN, with gold copy on a single 60GB LUN. This is a new cloning approach for us so we're not sure how to make this work. Our Linux SA got the storage admin to replicate the LUN, but when we change the server to boot the copy, it still mounts the gold LUN. 99% sure we got the LOADDEV parms right. Does anyone have steps to clone a LUN-only SLES 12 system? -- Mit freundlichen Grüßen / Kind regards Steffen Maier Linux on z Systems Development IBM Deutschland Research & Development GmbH Vorsitzende des Aufsichtsrats: Martina Koederitz Geschaeftsfuehrung: Dirk Wittkopp Sitz der Gesellschaft: Boeblingen Registergericht: Amtsgericht Stuttgart, HRB 243294
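[Editor's sketch] On the cookbook's point about unique identifiers: a minimal illustration of scrubbing a few of the obvious ones on a mounted clone root. The list is deliberately partial (a real golden image may carry more identifiers, as the cookbook warns), and the paths assume a standard systemd/SSH layout.

```shell
# Clear a few per-instance identifiers on a cloned root filesystem.
# Illustrative only -- NOT an exhaustive list of identifiers.
scrub_ids() {   # usage: scrub_ids <mounted-clone-root>
    local root="$1"
    : > "$root/etc/machine-id"                  # systemd generates a fresh ID at boot
    rm -f "$root"/etc/ssh/ssh_host_*key*        # sshd re-creates host keys on start
    rm -f "$root"/etc/udev/rules.d/70-persistent-net.rules  # stale NIC bindings
}
```

Run it against the clone's mount point (e.g. `scrub_ids /mnt`) before the first boot of the copy.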
Re: Gold On LUN
I think that is it, we have no clue where SLES12 puts the info about LUNs so don't know what to change where. Found some steps for cloning SLES11 that would most likely work if we were SLES11. On 9/7/2017 10:34 AM, Karl Kingston wrote: Check your FCP definitions on Linux. You may find they are still referencing your gold system. On Thu, 2017-09-07 at 11:31 -0400, Grzegorz Powiedziuk wrote: Hi, What do you mean it still mounts a gold LUN? You boot from a NEW LUN but the root filesystem ends up being mounted from the GOLD LUN? First of all I would make sure that the GOLD LUN after cloning is not accessible in the virtual machine anymore, just to make it simple. I can't remember how it is done in SLES, but in RHEL there is a bunch of stuff that refers to a specific LUN with a specific scsi_id, for example the multipath (/etc/multipath.conf) configuration. In there you usually bind the scsi_id (wwid) of a LUN with a friendly name (mpathX for example). That multipath configuration is also saved in the initrd. So if you boot from a clone, it will end up mounting the wrong volume. Are you using LVM? 2017-09-07 9:08 GMT-04:00 Greg Preddy : All, We're doing SLES 12 on 100% LUN, with gold copy on a single 60GB LUN. This is a new cloning approach for us so we're not sure how to make this work. Our Linux SA got the storage admin to replicate the LUN, but when we change the server to boot the copy, it still mounts the gold LUN. 99% sure we got the LOADDEV parms right. Does anyone have steps to clone a LUN-only SLES 12 system?
Re: Gold On LUN
Yes, we use LVM except on /boot. Not clear what needs to be changed - /etc/multipath.conf on the new LUN? On 9/7/2017 10:31 AM, Grzegorz Powiedziuk wrote: Hi, What do you mean it still mounts a gold LUN? You boot from a NEW LUN but the root filesystem ends up being mounted from the GOLD LUN? First of all I would make sure that the GOLD LUN after cloning is not accessible in the virtual machine anymore, just to make it simple. I can't remember how it is done in SLES, but in RHEL there is a bunch of stuff that refers to a specific LUN with a specific scsi_id, for example the multipath (/etc/multipath.conf) configuration. In there you usually bind the scsi_id (wwid) of a LUN with a friendly name (mpathX for example). That multipath configuration is also saved in the initrd. So if you boot from a clone, it will end up mounting the wrong volume. Are you using LVM? 2017-09-07 9:08 GMT-04:00 Greg Preddy : All, We're doing SLES 12 on 100% LUN, with gold copy on a single 60GB LUN. This is a new cloning approach for us so we're not sure how to make this work. Our Linux SA got the storage admin to replicate the LUN, but when we change the server to boot the copy, it still mounts the gold LUN. 99% sure we got the LOADDEV parms right. Does anyone have steps to clone a LUN-only SLES 12 system?
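[Editor's note] The wwid-to-alias binding Grzegorz describes typically lives in a multipath.conf stanza like the one below; the WWID and alias are made-up placeholders, not values from anyone's system in this thread.

```
# /etc/multipath.conf fragment -- WWID and alias are placeholders
multipaths {
    multipath {
        wwid  36005076deadbeef0000000000000001
        alias mpatha
    }
}
```

Because a copy of this file is baked into the initrd, editing it on the cloned LUN isn't enough by itself; the initrd has to be rebuilt as well, as Grzegorz notes.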
Gold On LUN
All, We're doing SLES 12 on 100% LUN, with the gold copy on a single 60GB LUN. This is a new cloning approach for us, so we're not sure how to make this work. Our Linux SA got the storage admin to replicate the LUN, but when we change the server to boot the copy, it still mounts the gold LUN. 99% sure we got the LOADDEV parms right. Does anyone have steps to clone a LUN-only SLES 12 system?
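[Editor's note] For reference, the LOADDEV parms mentioned above are set per guest with CP commands along these lines; the WWPN and LUN values here are placeholders only.

```
CP SET LOADDEV PORTNAME 50050763 00C12345 LUN 40014000 00000000
CP QUERY LOADDEV
```

As the replies in this thread explain, even with a correct LOADDEV the guest can still end up on the gold image: without NPIV the FCP subchannel can reach the golden LUN, and the clone's own initrd and multipath configuration decide what actually gets mounted as root.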