Re: [CentOS] Upgrading to Centos 4.7 on HP DL580G5 caused problems
nate wrote:
> Dr Les Oswald wrote:
>> Googling revealed many different scenarios with this boot error message,
>> some suggesting a memory error - Oh Joy, these two machines have 64GB
>> RAM each.
>
> Have you logged in to the iLO and checked the Integrated Management Log
> for errors? It does sound like a hardware issue.
>
> nate

Forgot to reply on this one, and I have had a query from a user since. It was fixed by simply reverting to the previous kernel. No further problems.

I conclude that it was caused by the kernel upgrade and was unlikely to be a hardware fault, as both nodes were affected immediately after the upgrade was applied.

Dr Les Oswald
HPC Specialist, IT Department
Cranfield University
http://www.cranfield.ac.uk/ccc

___
CentOS mailing list
CentOS@centos.org
http://lists.centos.org/mailman/listinfo/centos
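For anyone repeating the workaround: with GRUB legacy, reverting to the previous kernel usually means setting `default=` in grub.conf to the 0-based position of the older `title` stanza. The sketch below finds that position for a given kernel version. The function name and the sample grub.conf contents are hypothetical, written for illustration only; just the two kernel version strings come from this thread, and a real grub.conf may be laid out differently.

```python
import re


def grub_default_index(grub_conf_text, kernel_version):
    """Return the 0-based index of the first title stanza whose kernel
    line mentions kernel_version, or None if no stanza matches."""
    index = -1
    for raw in grub_conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if re.match(r"title\b", line):
            index += 1  # each title line starts a new boot entry
        elif index >= 0 and re.match(r"kernel\b", line) and kernel_version in line:
            return index
    return None


# Hypothetical grub.conf; the root device and paths are assumptions.
SAMPLE = """\
default=0
timeout=5
title CentOS (2.6.9-78.0.5.ELsmp)
        root (hd0,0)
        kernel /vmlinuz-2.6.9-78.0.5.ELsmp ro root=LABEL=/
        initrd /initrd-2.6.9-78.0.5.ELsmp.img
title CentOS (2.6.9-67.0.15.ELsmp)
        root (hd0,0)
        kernel /vmlinuz-2.6.9-67.0.15.ELsmp ro root=LABEL=/
        initrd /initrd-2.6.9-67.0.15.ELsmp.img
"""

if __name__ == "__main__":
    # Index to use in "default=" to boot the older kernel
    print(grub_default_index(SAMPLE, "2.6.9-67.0.15.ELsmp"))
```

In this made-up layout the older kernel is the second stanza, so `default=1` would select it. Check the stanza order in your own grub.conf before editing.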
Re: [CentOS] Upgrading to Centos 4.7 on HP DL580G5 caused problems
Dr Les Oswald wrote:
> Googling revealed many different scenarios with this boot error message,
> some suggesting a memory error - Oh Joy, these two machines have 64GB
> RAM each.

Have you logged in to the iLO and checked the Integrated Management Log for errors? It does sound like a hardware issue.

nate
[CentOS] Upgrading to Centos 4.7 on HP DL580G5 caused problems
As part of patching a cluster which has two DL580G5 login nodes (4x Intel 7300 DC CPUs) and 24 HP DL160G5 compute nodes (2x Intel 5272 DC CPUs), we encountered an issue that I would like to record.

I upgraded both DL580s to CentOS 4.7 via yum but only rebooted one initially. This node, previously bomb-proof, started to hang randomly with no obvious messages logged to help with diagnosis. In the dmesg output I found this sequence, never seen before:

    Uhhuh. NMI received for unknown reason 20.
    Dazed and confused, but trying to continue
    Do you have a strange power saving mode enabled?
    Uhhuh. NMI received for unknown reason 30.
    Dazed and confused, but trying to continue
    (repeated several times)

Googling revealed many different scenarios with this boot error message, some suggesting a memory error - Oh Joy, these two machines have 64GB RAM each.

I then changed grub.conf to boot the previous kernel, 2.6.9-67.0.15.ELsmp, instead of the updated 2.6.9-78.0.5.ELsmp. The boot-time error messages immediately went away and so far the systems are reliable.

Has anyone an explanation, or can anyone confirm that they have seen or overcome the above issue? I should mention that the DL160 compute nodes have not exhibited this behaviour at all.

Les Oswald
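To compare nodes or kernels, it can help to count how often each "unknown reason" code turns up in saved dmesg output. The following is a minimal sketch, assuming the messages look exactly like the ones quoted in this post. The function name is hypothetical, and the sample text is copied from the lines above, not from a fresh capture.

```python
import re
from collections import Counter

# Matches the code at the end of "NMI received for unknown reason NN."
# Assumption: the code is printed as hex digits; it is kept as an
# opaque string here and not interpreted.
NMI_RE = re.compile(r"NMI received for unknown reason ([0-9a-fA-F]+)")


def count_nmi_reasons(log_text):
    """Count occurrences of each unknown-NMI reason code in log_text."""
    return Counter(m.group(1).lower() for m in NMI_RE.finditer(log_text))


# Sample taken from the messages quoted in this post.
SAMPLE = """\
Uhhuh. NMI received for unknown reason 20.
Dazed and confused, but trying to continue
Do you have a strange power saving mode enabled?
Uhhuh. NMI received for unknown reason 30.
Dazed and confused, but trying to continue
"""

if __name__ == "__main__":
    for code, n in sorted(count_nmi_reasons(SAMPLE).items()):
        print(code, n)
```

To use it on a real node, save the dmesg output to a file and pass the file's contents to `count_nmi_reasons`. An empty result after booting the older kernel would match what was observed here.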