Hello all,

Has anyone experienced issues with Red Hat EL 5.6 using kernels 2.6.18-238, 2.6.18-238.1.1 and 2.6.18-238.5.1 booting in an ESX 3.5 virtual environment? We are running into a condition where VMs hang during the initial kernel boot process. I'm unable to correlate these hangs with any particular ESX-level event; the affected VMs run on different ESX hosts and even in different clusters.

The problems began with the upgrade to EL 5.6 and kernel 2.6.18-238.1.1.el5, and they persist in 2.6.18-238.5.1.el5 (we skipped -238.el5). More than 20 hosts of all different configurations have been affected so far, but always EL 5.6 VMs only. AS4 is not affected, and we don't have any EL6 VMs yet.

The failure is exactly the same every time. During the initial kernel start, it gets as far as:
PCI: Setting latency timer of device 0000:00:01.0 to 64
NET: Registered protocol family 2
IP route cache hash table entries: 32768 (order: 5, 131072 bytes)
TCP established hash table entries: 131072 (order: 8, 1048576 bytes)
TCP bind hash table entries: 65536 (order: 7, 524288 bytes)
TCP: Hash tables configured (established 131072 bind 65536)
TCP reno registered
Simple Boot Flag at 0x36 set to 0x80

The next line on all VMs that boot successfully is:

Using TSC for driving interrupts

However, VMs that hang during boot never reach the "Using TSC..." line. This leads me to believe that the problem is related to the OS electing to use TSC as the clocksource, and that this is somehow an unstable combination with ESX 3.5 and EL 5.6 VMs. The issue is sporadic, though, and I can't reproduce it on demand; all I can say is that when an EL 5.6 VM fails to boot, it always fails in the same place in the same way.

I've considered moving back to the clocksource=acpi_pm divider=10 kernel flags that were recommended for EL 5.3 and earlier, but I'm hesitant to do that since TSC is clearly a better-performing timekeeper. On physical hosts, even ones that use TSC, I never see a "Using TSC for driving interrupts" kernel message, so the behavior is subtly different, but I can't find anything on Google about this kernel message or event.

Has anyone encountered this? Is anyone able to shed light on the inner workings of TSC that might lead me to a solution (or at least help me file an intelligent Bugzilla)?

Thanks.

--
Jason McCormick
Unix Team Lead, Systems Group, IT
Software Engineering Institute, Carnegie Mellon Univ.
E: [email protected]

_______________________________________________
rhelv5-list mailing list
[email protected]
https://www.redhat.com/mailman/listinfo/rhelv5-list
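
For anyone who wants to compare captured console output from several VMs, the following is a minimal sketch (not something from the original report) of how the symptom described above could be checked automatically. It assumes the console text has already been saved somewhere as plain text; the function name `classify_boot_log` and the three result labels are hypothetical, and the only strings taken from the post are the two kernel messages.

```python
# Sketch: classify a captured boot console log by the symptom in this post.
# Assumption: the log is plain text with one kernel message per line.

LAST_SEEN = "Simple Boot Flag at"                 # last line printed before a hang
NEXT_EXPECTED = "Using TSC for driving interrupts"  # first line seen on a good boot


def classify_boot_log(text):
    """Return 'passed', 'stalled', or 'no-marker' for one console capture."""
    lines = text.splitlines()

    # Find the last occurrence of the "Simple Boot Flag" message.
    flag_idx = None
    for i, line in enumerate(lines):
        if LAST_SEEN in line:
            flag_idx = i

    if flag_idx is None:
        # The log never got as far as the point discussed in the post.
        return "no-marker"

    # A healthy boot prints the TSC message somewhere after the flag line.
    for line in lines[flag_idx + 1:]:
        if NEXT_EXPECTED in line:
            return "passed"

    # The flag line was printed but the TSC message never followed.
    return "stalled"


if __name__ == "__main__":
    hung = "TCP reno registered\nSimple Boot Flag at 0x36 set to 0x80\n"
    print(classify_boot_log(hung))
```

A capture that ends at the "Simple Boot Flag" line is reported as stalled, while one that continues to the TSC message is reported as passed. This only identifies which VMs show the symptom; it says nothing about the cause.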
