Hey Werner,

Cool. I'll document this in the code as well, and I'll go ahead and write a SEL
interpretation condition for the SMI timeout too.

Al

On Thu, 2011-04-07 at 07:06 -0700, Werner Fischer wrote:
> Hey Al,
>
> thanks for the beta1. I'll forward this to our customer and let you know
> as soon as I have feedback.
>
> btw: below are the details I got from Intel regarding the sensors.
> --------------------------------------------------------------------
> 1) SMI Timeout:
> The BMC supports an SMI timeout sensor (sensor type OEM (F3h), event
> type Discrete (03h)) that asserts if the SMI signal has been asserted
> for more than 90 seconds. A continuously asserted SMI signal is an
> indication that the BIOS cannot service the condition that caused the
> SMI. This is usually because that condition prevents the BIOS from
> running. When an SMI timeout occurs, the BMC asserts the SMI timeout
> sensor and logs a SEL event for that sensor. The BMC will also reset the
> system.
> The normal value is deasserted; system health status = OK
> When this sensor is asserted, the system health status = fatal.
>
> 2) IOH Therm Trip. This sensor indicates whether the IOH has reached
> overheating point (thermal trip point)
> The normal value is deasserted; system health status = OK
> When this sensor is asserted, the system health status = fatal.
>
> Both VRD Hot sensors have a fatal contribution to system health when in
> limit exceeded state.
> --------------------------------------------------------------------
>
> best regards,
> Werner
>
> On Wed, 2011-04-06 at 13:59 -0700, Albert Chu wrote:
> > Hey Werner,
> >
> > I got a beta release that should handle sensor #47.
> >
> > http://download.gluster.com/pub/freeipmi/qa-release/freeipmi-1.0.4.beta1.tar.gz
> >
> > Al
> >
> > On Wed, 2011-04-06 at 10:00 -0700, Albert Chu wrote:
> > > Hey Werner,
> > >
> > > On Tue, 2011-04-05 at 23:47 -0700, Werner Fischer wrote:
> > > > Hi Al,
> > > >
> > > > thank you for the beta.
> > > >
> > > > Sensors 55, 56, and 59 are now recognized:
> > > >
> > > > ID | Name | Type | State | Reading | Units | Event
> > > > [...]
> > > > 47 | SMI Timeout | OEM Reserved | N/A | N/A | N/A | 'OK'
> > > > [...]
> > > > 55 | P1 VRD Hot | Temperature | Nominal | N/A | N/A | 'OK'
> > > > 56 | P2 VRD Hot | Temperature | Nominal | N/A | N/A | 'OK'
> > > > [...]
> > > > 59 | IOH Therm Trip | Temperature | Nominal | N/A | N/A | 'OK'
> > > >
> > > > For sensor 47 the state is still "N/A".
> > > >
> > > > For the SMI timeout I assume that the unasserted state is the one which
> > > > should be nominal, as I have found a note for a similar Intel
> > > > motherboard: there Intel corrected an issue where SMI Timeout was
> > > > asserted, causing a critical event in their event log. See page 19 in
> > > > this PDF, point "5) Event Log may report SMI Timeout Assertion after
> > > > Server Power button is pressed"
> > > > http://download.intel.com/support/motherboards/server/mfsys25/sb/mfsys25_mfsys35_spec_update_feb11.pdf
> > >
> > > Ahh, I completely misread sensor 47. I thought it was an OEM event
> > > sensor, but it's not. It has a normal event; the sensor type is the
> > > only thing that is OEM. Assuming your guess about assert vs. unassert
> > > is correct (it's a reasonable guess to me), I can add this OEM support
> > > into FreeIPMI. I'll try and get you a beta sometime later today.
> > >
> > > Al
> > >
> > > > But I will ask Intel for more details on sensor 47 and sensor 59, as
> > > > you have requested, to be sure. I'll let you know on the list once I
> > > > have more details on that.
> > > >
> > > > Best regards,
> > > > Werner
> > > >
> > > >
> > > > PS: here is some more verbose output on these four sensors:
> > > >
> > > > Record ID: 47
> > > > ID String: SMI Timeout
> > > > Sensor Type: OEM Reserved (F3h)
> > > > Sensor Number: 6
> > > > IPMB Slave Address: 10h
> > > > Sensor Owner ID: 20h
> > > > Sensor Owner LUN: 0h
> > > > Channel Number: 0h
> > > > Entity ID: system board (7)
> > > > Entity Instance: 1
> > > > Entity Instance Type: Physical Entity
> > > > Event/Reading Type Code: 3h
> > > > Sensor State: N/A
> > > > Sensor Event: 'OK'
> > > >
> > > > Record ID: 55
> > > > ID String: P1 VRD Hot
> > > > Sensor Type: Temperature (1h)
> > > > Sensor Number: 102
> > > > IPMB Slave Address: 10h
> > > > Sensor Owner ID: 20h
> > > > Sensor Owner LUN: 0h
> > > > Channel Number: 0h
> > > > Entity ID: processor (3)
> > > > Entity Instance: 1
> > > > Entity Instance Type: Physical Entity
> > > > Event/Reading Type Code: 5h
> > > > Sensor State: Nominal
> > > > Sensor Event: 'OK'
> > > >
> > > > Record ID: 56
> > > > ID String: P2 VRD Hot
> > > > Sensor Type: Temperature (1h)
> > > > Sensor Number: 103
> > > > IPMB Slave Address: 10h
> > > > Sensor Owner ID: 20h
> > > > Sensor Owner LUN: 0h
> > > > Channel Number: 0h
> > > > Entity ID: processor (3)
> > > > Entity Instance: 2
> > > > Entity Instance Type: Physical Entity
> > > > Event/Reading Type Code: 5h
> > > > Sensor State: Nominal
> > > > Sensor Event: 'OK'
> > > >
> > > > Record ID: 59
> > > > ID String: IOH Therm Trip
> > > > Sensor Type: Temperature (1h)
> > > > Sensor Number: 106
> > > > IPMB Slave Address: 10h
> > > > Sensor Owner ID: 20h
> > > > Sensor Owner LUN: 0h
> > > > Channel Number: 0h
> > > > Entity ID: system board (7)
> > > > Entity Instance: 1
> > > > Entity Instance Type: Physical Entity
> > > > Event/Reading Type Code: 3h
> > > > Sensor State: Nominal
> > > > Sensor Event: 'OK'
> > > >
> > > > On Fri, 2011-04-01 at 15:32 -0700, Albert Chu wrote:
> > > > > Hey Werner, Ben,
> > > > >
> > > > > Here's a beta that should support those sensor interpretations. It's
> > > > > tough for me to test w/o your motherboard in front of me, PLMK if it
> > > > > works for you.
> > > > >
> > > > > http://download.gluster.com/pub/freeipmi/qa-release/freeipmi-1.0.4.beta0.tar.gz
> > > > >
> > > > > Al
> > > > >
> > > > > On Fri, 2011-04-01 at 03:51 -0700, Werner Fischer wrote:
> > > > > > Hi Al,
> > > > > > (sorry for sending it twice, I sent my first email in error only to
> > > > > > you, not the list)
> > > > > >
> > > > > > I've been on vacation for some weeks and am now back again.
> > > > > >
> > > > > > By "not detected" Benjamin meant that FreeIPMI returns a monitoring
> > > > > > status of "N/A" for those sensors (not "Nominal"). Unfortunately we
> > > > > > neglected to send the output of "ipmimonitoring --legacy-output
> > > > > > --interpret-oem-data --quiet-cache --sdr-cache-recreate" (which is
> > > > > > used by our Nagios plugin):
> > > > > >
> > > > > > Record ID | Sensor Name | Sensor Group | Monitoring Status | Sensor Units | Sensor Reading
> > > > > > [...]
> > > > > > 47 | SMI Timeout | OEM Reserved | N/A | N/A | 'OK'
> > > > > > [...]
> > > > > > 55 | P1 VRD Hot | Temperature | N/A | N/A | 'OK'
> > > > > > 56 | P2 VRD Hot | Temperature | N/A | N/A | 'OK'
> > > > > > [...]
> > > > > > 59 | IOH Therm Trip | Temperature | N/A | N/A | 'OK'
> > > > > >
> > > > > > Would it be possible for you to include information about those four
> > > > > > sensors in future versions of FreeIPMI, so that it reports a
> > > > > > monitoring status of "Nominal" when the sensor reading is 'OK' as
> > > > > > above?
> > > > > >
> > > > > > In case you need additional information from Intel about those
> > > > > > sensors, just let me know.
> > > > > >
> > > > > > Best regards and have a nice weekend,
> > > > > > thank you,
> > > > > > Werner
> > > > > >
> > > > > > On Wed, 2011-02-23 at 10:06 -0800, Albert Chu wrote:
> > > > > > > Hi Benjamin,
> > > > > > >
> > > > > > > What do you mean by "not detected"? It appears everything is
> > > > > > > fine according to the information you list below.
> > > > > > >
> > > > > > > Do you mean these sensors are not reporting actual temperatures?
> > > > > > > While these are indeed temperature sensors (identified by the
> > > > > > > motherboard as such), they do not appear to be sensors that
> > > > > > > report a temperature reading. They instead report an event
> > > > > > > bitmask. The key is the "Event/Reading Type Code" field of each
> > > > > > > sensor.
> > > > > > >
> > > > > > > Al
> > > > > > >
> > > > > > > On Tue, 2011-02-22 at 23:55 -0800, Benjamin Bayer wrote:
> > > > > > > > Hello,
> > > > > > > > we have an Intel SR1625 where some sensors are not detected
> > > > > > > > with FreeIPMI version 1.0.2.beta3.
> > > > > > > >
> > > > > > > > Thank You.
> > > > > > > >
> > > > > > > > Regards
> > > > > > > >
> > > > > > > > Benjamin Bayer
> > > > > > > >
> > > > > > > >
--
Albert Chu
[email protected]
Computer Scientist
High Performance Systems Division
Lawrence Livermore National Laboratory

_______________________________________________
Freeipmi-users mailing list
[email protected]
http://lists.gnu.org/mailman/listinfo/freeipmi-users
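
For readers of the archive, the interpretation rules Intel describes in this thread can be sketched roughly as follows. This is only an illustration in Python, not FreeIPMI's actual code or API: the function name, the constant names, and the table are hypothetical. It also assumes that Intel's "fatal" corresponds to a "Critical" monitoring state, and that an empty event bitmask (shown as 'OK' in the output above) counts as nominal. The bit positions follow the generic IPMI event/reading type codes: for 03h, offset 0 is "State Deasserted" and offset 1 is "State Asserted"; for 05h, offset 0 is "Limit Not Exceeded" and offset 1 is "Limit Exceeded".

```python
# Hypothetical sketch of the sensor interpretation discussed in this thread.
# None of these names come from FreeIPMI; they are for illustration only.

SENSOR_TYPE_TEMPERATURE = 0x01      # "Temperature (1h)" in the verbose output
SENSOR_TYPE_OEM_SMI_TIMEOUT = 0xF3  # "OEM Reserved (F3h)" in the verbose output

EVENT_TYPE_STATE = 0x03  # generic: bit 0 = deasserted, bit 1 = asserted
EVENT_TYPE_LIMIT = 0x05  # generic: bit 0 = not exceeded, bit 1 = exceeded

# (sensor type, event/reading type code) pairs covered by Intel's description.
KNOWN_SENSORS = {
    (SENSOR_TYPE_OEM_SMI_TIMEOUT, EVENT_TYPE_STATE),  # SMI Timeout
    (SENSOR_TYPE_TEMPERATURE, EVENT_TYPE_STATE),      # IOH Therm Trip
    (SENSOR_TYPE_TEMPERATURE, EVENT_TYPE_LIMIT),      # P1/P2 VRD Hot
}

ASSERTED_OR_EXCEEDED = 0x02  # bit for offset 1 in both generic event types


def interpret(sensor_type, event_reading_type, event_bitmask):
    """Return "Nominal", "Critical", or "N/A" for a discrete sensor."""
    if (sensor_type, event_reading_type) not in KNOWN_SENSORS:
        # No interpretation rule is known, so no state can be reported.
        return "N/A"
    if event_bitmask & ASSERTED_OR_EXCEEDED:
        # Assumption: Intel's "fatal" health status maps to "Critical".
        return "Critical"
    # Deasserted / limit not exceeded, or no event bits set at all.
    return "Nominal"
```

With such a table in place, sensor 47 (type F3h, event/reading type 3h) with no asserted bits would be reported as "Nominal" rather than "N/A", which is the behaviour requested earlier in the thread.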
