[Linux-HA] Why does o2cb RA remove module ocfs2?
Hi!

I had a problem where O2CB stop fenced the node that was shut down: I had updated the kernel, and then rebooted. As part of shutdown, the cluster stack was stopped. In turn, the O2CB resource was stopped. Unfortunately this caused an error like (SLES11 SP3):

---
modprobe: FATAL: Could not load /lib/modules/3.0.101-0.8-xen/modules.dep: No such file or directory
o2cb(prm_O2CB)[19908]: ERROR: Unable to unload module: ocfs2
---

This in turn caused a node fence, which ruined the clean reboot. So why is the RA messing with the kernel module on stop?

Regards,
Ulrich

___
Linux-HA mailing list
Linux-HA@lists.linux-ha.org
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
Re: [Linux-HA] Why does o2cb RA remove module ocfs2?
On 2014-02-05T12:24:00, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote:

> I had a problem where O2CB stop fenced the node that was shut down: I had updated the kernel, and then rebooted. As part of shutdown, the cluster stack was stopped. In turn, the O2CB resource was stopped. Unfortunately this caused an error like (SLES11 SP3):
> ---
> modprobe: FATAL: Could not load /lib/modules/3.0.101-0.8-xen/modules.dep: No such file or directory
> o2cb(prm_O2CB)[19908]: ERROR: Unable to unload module: ocfs2
> ---
> This in turn caused a node fence, which ruined the clean reboot. So why is the RA messing with the kernel module on stop?

Because customers complained about the new module not being picked up if they upgraded ocfs2-kmp and restarted the cluster stack on a node. It's incredibly hard to please everyone, alas ...

The right way to update a cluster node is this one anyway:

1. Stop the cluster stack
2. Update/upgrade/reboot as needed
3. Restart the cluster stack

This would avoid this error too. So would keeping multiple kernel versions in parallel (which also helps if a kernel update no longer boots for some reason).

Removing the running kernel package is usually not a great idea; I prefer to remove it only after having successfully rebooted, because you *never* know if you may have to reload a module.

Regards,
Lars

--
Architect Storage/HA
SUSE LINUX Products GmbH, GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer, HRB 21284 (AG Nürnberg)
"Experience is the name everyone gives to their mistakes." -- Oscar Wilde
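The modprobe failure quoted above happens because the module directory of the still-running kernel had already been removed by the update. As a rough illustration only (not something the o2cb RA does, as far as this thread shows), a pre-check for that condition might look like the sketch below. The function names `modules_dep_path` and `can_use_modprobe` are hypothetical, and the `/lib/modules/<release>/modules.dep` layout is taken from the error message.

```python
import os


def modules_dep_path(release=None, modules_root="/lib/modules"):
    """Return the modules.dep path for a kernel release.

    If no release is given, the release of the running kernel is used.
    """
    if release is None:
        release = os.uname().release
    return os.path.join(modules_root, release, "modules.dep")


def can_use_modprobe(release=None, modules_root="/lib/modules"):
    """Return True if modules.dep still exists for the given kernel release.

    Assumption: a missing modules.dep means modprobe will fail as in the
    error quoted in this thread.
    """
    return os.path.isfile(modules_dep_path(release, modules_root))


if __name__ == "__main__":
    if can_use_modprobe():
        print("modules.dep present for the running kernel")
    else:
        print("modules.dep missing: module load/unload via modprobe may fail")
```

Running such a check before stopping the cluster stack would show whether the running kernel's package has already been removed.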
[Linux-HA] Antw: Re: Why does o2cb RA remove module ocfs2?
Lars Marowsky-Bree l...@suse.com wrote on 05.02.2014 at 12:36 in message 20140205113649.gn13...@suse.de:

> On 2014-02-05T12:24:00, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote:
>
>> I had a problem where O2CB stop fenced the node that was shut down: I had updated the kernel, and then rebooted. As part of shutdown, the cluster stack was stopped. In turn, the O2CB resource was stopped. Unfortunately this caused an error like (SLES11 SP3):
>> ---
>> modprobe: FATAL: Could not load /lib/modules/3.0.101-0.8-xen/modules.dep: No such file or directory
>> o2cb(prm_O2CB)[19908]: ERROR: Unable to unload module: ocfs2
>> ---
>> This in turn caused a node fence, which ruined the clean reboot. So why is the RA messing with the kernel module on stop?
>
> Because customers complained about the new module not being picked up if they upgrade ocfs2-kmp and restarted the cluster stack on a node. It's incredibly hard to please everyone, alas ...

I think the proper way would be this: Stop your OCFS2 resources, rmmod the module, [modprobe the module to re-insert the new version], start your OCFS2 resources. I guess a kernel update is more common than just an ocfs2-kmp update.

> The right way to update a cluster node is anyway this one:
>
> 1. Stop the cluster stack
> 2. Update/upgrade/reboot as needed
> 3. Restart the cluster stack
>
> This would avoid this error too. Or keeping multiple kernel versions in parallel (which also helps if a kernel update no longer boots for some reason).
>
> Removing the running kernel package is usually not a great idea; I prefer to remove them after having successfully rebooted only, because you *never* know if you may have to reload a module.

There's another way (as HP-UX learned to do): defer changes to the running kernel until shutdown/reboot.
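The manual sequence proposed in this message (stop the OCFS2 resources, rmmod, modprobe, start the resources) can be written down as a dry-run plan. This is only a sketch: it assumes the crm shell (`crm resource stop|start`) is used to manage resources, the function name `plan_module_reload` is hypothetical, and the resource name in the usage example is made up. The commands are only listed, not executed.

```python
def plan_module_reload(resources, module="ocfs2"):
    """Build the ordered command list for a manual module reload.

    resources: names of the cluster resources using the module, in the
    order they should be stopped. They are started again in reverse order.
    """
    steps = [["crm", "resource", "stop", name] for name in resources]
    steps.append(["rmmod", module])      # unload the old module version
    steps.append(["modprobe", module])   # load the newly installed version
    steps.extend(
        ["crm", "resource", "start", name] for name in reversed(resources)
    )
    return steps


if __name__ == "__main__":
    # "prm_ocfs2_fs" is a hypothetical resource name for illustration.
    for argv in plan_module_reload(["prm_ocfs2_fs"]):
        print(" ".join(argv))
```

Keeping the plan as data makes it easy to review before running anything on a live cluster node.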
Re: [Linux-HA] Antw: Re: Why does o2cb RA remove module ocfs2?
On 2014-02-05T15:06:47, Ulrich Windl ulrich.wi...@rz.uni-regensburg.de wrote:

> I guess the kernel update is more common than the just the ocfs2-kmp update

Well, some customers do apply updates in the recommended way, and thus don't encounter this ;-) In any case, since at this time the cluster services are already stopped, at least the service impact is minimal.

>> This would avoid this error too. Or keeping multiple kernel versions in parallel (which also helps if a kernel update no longer boots for some reason).
>>
>> Removing the running kernel package is usually not a great idea; I prefer to remove them after having successfully rebooted only, because you *never* know if you may have to reload a module.
>
> There's another way: (Like HP-UX learned to do it): Defer changes to the running kernel until shutdown/reboot.

True. Hence: activate multi-versions for the kernel in /etc/zypp/zypp.conf and only remove the old kernel after the reboot. I do that manually, but I do think we even have a script for that somewhere. I honestly don't remember where though; I like to keep several kernels around for testing anyway.

I think this is the default going forward, but as always: zypper gained this ability during the SLE 11 cycle, and we couldn't just change existing behaviour in a simple update, so it has to be manually activated.

Regards,
Lars
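To check whether kernel multiversion support is switched on in /etc/zypp/zypp.conf, something like the following sketch could be used. The thread does not name the setting; the option name `multiversion` in the `[main]` section and the value `provides:multiversion(kernel)` are assumptions based on the libzypp configuration file and should be checked against the comments in your own zypp.conf. The function name `kernel_multiversion_enabled` is hypothetical.

```python
import configparser


def kernel_multiversion_enabled(conf_text):
    """Return True if the zypp.conf text enables multiversion kernels.

    Assumption: the relevant entry is
        multiversion = provides:multiversion(kernel)
    in the [main] section. Commented-out lines are ignored.
    """
    # interpolation=None: treat '%' in values literally.
    # strict=False: tolerate duplicate options instead of raising.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    value = parser.get("main", "multiversion", fallback="")
    entries = [item.strip() for item in value.split(",") if item.strip()]
    return "provides:multiversion(kernel)" in entries


if __name__ == "__main__":
    with open("/etc/zypp/zypp.conf") as handle:
        print(kernel_multiversion_enabled(handle.read()))
```

The function takes the file contents as a string so it can be tried on a copy of the configuration without touching the system file.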