Re: ipmi0: incorrect critical max
On 22/03/23 03:45, Stephen Borrill wrote:
> On Sat, 18 Mar 2023, Lloyd Parkes wrote:
>> On 18/03/23 05:14, Stephen Borrill wrote:
>>> On an HP Microserver Gen10 Plus, I found that soon after booting, I get
>>> the following alert:
>>> ...
>>>                  Current  CritMax  WarnMax  WarnMin  CritMin  Unit
>>> [ipmi0]
>>>    11-LOM-CORE:   59.253    0.000  110.471                    degC
>>
>> Just out of interest, in the BIOS (RBSU) what is the Power Management /
>> Power Regulator set to? It will have settings such as "Dynamic Power
>> Savings Mode" and "OS Control Mode".
>
> I set it to Maximum I/O Performance (words may not match exactly, it is
> in a box waiting to be installed at a customer).

OK. When you don't set it to OS Controlled, the HPE RBSU chops power
management out of the ACPI in a way that makes Linux complain about
corrupt ACPI information.

I realise that you are looking at IPMI, not ACPI, but it does have that
HPE smell of ugly removal from your view because the RBSU is managing it.
That could just be coincidence of course.

Cheers,
Lloyd
Re: ipmi0: incorrect critical max
On Sat, 18 Mar 2023, Lloyd Parkes wrote:
> On 18/03/23 05:14, Stephen Borrill wrote:
>> On an HP Microserver Gen10 Plus, I found that soon after booting, I get
>> the following alert:
>> ...
>>                  Current  CritMax  WarnMax  WarnMin  CritMin  Unit
>> [ipmi0]
>>    11-LOM-CORE:   59.253    0.000  110.471                    degC
>
> Just out of interest, in the BIOS (RBSU) what is the Power Management /
> Power Regulator set to? It will have settings such as "Dynamic Power
> Savings Mode" and "OS Control Mode".

I set it to Maximum I/O Performance (words may not match exactly, it is
in a box waiting to be installed at a customer).

-- 
Stephen
Re: ipmi0: incorrect critical max
net...@precedence.co.uk (Stephen Borrill) writes:

>                  Current  CritMax  WarnMax  WarnMin  CritMin  Unit
> [ipmi0]
>    11-LOM-CORE:   59.253    0.000  110.471                    degC

> Seen on 9.3_STABLE, but also in 10 BETA.

> I suppose one simple fix would be to ensure that if CritMax is lower
> than WarnMax, it should be set to the value of WarnMax.

IPMI reports 3 upper and 3 lower limits (each as an unsigned byte) and a
bitmask to show which values are valid.

lower non-recoverable threshold -> configures CritMin
lower critical threshold        -> configures CritMin
lower non-critical threshold    -> configures WarnMin

Lower limits of 0 are ignored, because a reading cannot fall below them.

upper non-recoverable threshold -> configures CritMax
upper critical threshold        -> configures CritMax
upper non-critical threshold    -> configures WarnMax

Upper limits of 255 are ignored, because a reading cannot exceed them.

Apparently your system says that the upper critical or the upper
non-recoverable threshold exists but returns a value of zero.

The code could do some more sanity checking and then just skip the
invalid limits. Something like:

@@ -1582,6 +1684,16 @@ ipmi_get_sensor_limits(struct ipmi_softc
 		break;
 	}
 
+	if ((data[0] & 0x28) == 0x28 && data[6] < data[4])
+		data[0] ^= 0x20;
+	if ((data[0] & 0x18) == 0x18 && data[5] < data[4])
+		data[0] ^= 0x10;
+
+	if ((data[0] & 0x0a) == 0x0a && data[3] > data[1])
+		data[0] ^= 0x08;
+	if ((data[0] & 0x06) == 0x06 && data[2] > data[1])
+		data[0] ^= 0x04;
+
 	if (data[0] & 0x20 && data[6] != 0xff) {
 		*pcritmax = ipmi_convert_sensor(&data[6], psensor);
 		*props |= prop_critmax;

As an alternative you could also override the limit in /etc/envsys.conf.
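The sanity check described in this message can be tried outside the kernel with a small standalone sketch. It is not the driver code: the function name sanitize_thresholds and the THR_* macros are made up for illustration, and the bit assignments assume the readable-threshold mask is laid out as bit 0 = lower non-critical through bit 5 = upper non-recoverable, with the raw values in data[1] to data[6] in the same order. Under that assumption the lower-side checks pair bits 0x04 and 0x02 with 0x01, which differs from the masks quoted in the patch above, so check the IPMI specification before relying on either. Raw bytes are compared as unsigned values, as in the patch.

```c
#include <stdint.h>

/* Readable-threshold mask bits in data[0] (assumed layout, see text). */
#define THR_LOWER_NONCRIT 0x01 /* value in data[1] */
#define THR_LOWER_CRIT    0x02 /* value in data[2] */
#define THR_LOWER_NONREC  0x04 /* value in data[3] */
#define THR_UPPER_NONCRIT 0x08 /* value in data[4] */
#define THR_UPPER_CRIT    0x10 /* value in data[5] */
#define THR_UPPER_NONREC  0x20 /* value in data[6] */

/*
 * Hypothetical helper: drop any critical or non-recoverable threshold
 * that contradicts the non-critical threshold on the same side.
 * An upper critical limit below the upper warning limit, or a lower
 * critical limit above the lower warning limit, is treated as invalid
 * and its "readable" bit is cleared. Updates data[0] in place and
 * returns the new mask.
 */
uint8_t
sanitize_thresholds(uint8_t data[7])
{
	uint8_t mask = data[0];

	if (mask & THR_UPPER_NONCRIT) {
		if ((mask & THR_UPPER_NONREC) && data[6] < data[4])
			mask &= (uint8_t)~THR_UPPER_NONREC;
		if ((mask & THR_UPPER_CRIT) && data[5] < data[4])
			mask &= (uint8_t)~THR_UPPER_CRIT;
	}

	if (mask & THR_LOWER_NONCRIT) {
		if ((mask & THR_LOWER_NONREC) && data[3] > data[1])
			mask &= (uint8_t)~THR_LOWER_NONREC;
		if ((mask & THR_LOWER_CRIT) && data[2] > data[1])
			mask &= (uint8_t)~THR_LOWER_CRIT;
	}

	data[0] = mask;
	return mask;
}
```

With a made-up response in which the upper critical and upper non-critical thresholds are flagged readable but the critical value is zero, the helper clears only the critical bit, so a caller would end up with a WarnMax and no CritMax. This skips the bogus limit rather than raising CritMax to WarnMax as suggested in the quoted text; either policy avoids the immediate "critical over limit" event.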
Re: ipmi0: incorrect critical max
On 18/03/23 05:14, Stephen Borrill wrote:
> On an HP Microserver Gen10 Plus, I found that soon after booting, I get
> the following alert:
> ...
>                  Current  CritMax  WarnMax  WarnMin  CritMin  Unit
> [ipmi0]
>    11-LOM-CORE:   59.253    0.000  110.471                    degC

Just out of interest, in the BIOS (RBSU) what is the Power Management /
Power Regulator set to? It will have settings such as "Dynamic Power
Savings Mode" and "OS Control Mode".

Cheers,
Lloyd
Re: ipmi0: incorrect critical max
Stephen Borrill writes:

> On an HP Microserver Gen10 Plus, I found that soon after booting, I get
> the following alert:
>
> ipmi0: critical over limit on '11-LOM-CORE'
>
> If powerd is running (the default), it shuts the machine down (so
> basically as soon as it hits multi-user).
>
> envstat shows that CritMax is zero:
>
>                  Current  CritMax  WarnMax  WarnMin  CritMin  Unit
> [ipmi0]
>    11-LOM-CORE:   59.253    0.000  110.471                    degC
>
> Seen on 9.3_STABLE, but also in 10 BETA.
>
> I suppose one simple fix would be to ensure that if CritMax is lower
> than WarnMax, it should be set to the value of WarnMax.
>
> Any other things to look at? The machine won't be put into production for
> a few days, so it's good time to experiment
>
> I have put the latest BIOS on the machine

If that server has an independent out-of-band "system" in it, a BMC with
a command line interface or web interface, I would get into that and see
if it reports the sensors there, just to see if the ipmi driver pulls
them correctly. The BMC may not have a way to specify or tell you what
the Crit and Warn values are, but it would be worth looking around for
that too.

Failing any of that, I think you should be able to set what NetBSD thinks
the CritMax is in /etc/envsys.conf. See envsys.conf(5) for details.

I have an ASRock Rack board that doesn't report one of the sensors
correctly and/or the IPMI driver doesn't pull it correctly. It is a fixed
value that never changes, whatever the board is doing...

-- 
Brad Spencer - b...@anduin.eldar.org - KC8VKS - http://anduin.eldar.org
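For reference, an override of the kind suggested here might look roughly like the fragment below. This is only a sketch: the sensor index (sensor3) is hypothetical because the thread does not say which index 11-LOM-CORE has on this machine, and the 115C value is an illustrative figure chosen to sit above the reported WarnMax of 110.471 degC, not a vendor limit. The block layout and the critical-max property name are written from memory of envsys.conf(5), so verify both against the manual page on the installed release.

```
# /etc/envsys.conf (sketch; sensor index and value are assumptions)
ipmi0 {
	sensor3 { critical-max = 115C; }
}
```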
ipmi0: incorrect critical max
On an HP Microserver Gen10 Plus, I found that soon after booting, I get
the following alert:

ipmi0: critical over limit on '11-LOM-CORE'

If powerd is running (the default), it shuts the machine down (so
basically as soon as it hits multi-user).

envstat shows that CritMax is zero:

                 Current  CritMax  WarnMax  WarnMin  CritMin  Unit
[ipmi0]
   11-LOM-CORE:   59.253    0.000  110.471                    degC

Seen on 9.3_STABLE, but also in 10 BETA.

I suppose one simple fix would be to ensure that if CritMax is lower
than WarnMax, it should be set to the value of WarnMax.

Any other things to look at? The machine won't be put into production for
a few days, so it's a good time to experiment.

I have put the latest BIOS on the machine.

-- 
Stephen