Michal Bachorik - Sun Microsystems - Prague Czech Republic wrote:
>
> Hi Garrett, all,
>
> some answers below:
>>>>
>>>> Most importantly: is there any negative impact on FMA?
>>>>
>>> I do not understand the question - if you configure freeipmi so
>>> that it watches a remote machine and performs a shutdown when the
>>> CPU temperature reaches 60°C, while FMA is configured (I do not
>>> know if it is possible, just guessing) to switch off the machine
>>> when the CPU temperature reaches 60°C, and freeipmi is faster
>>> than FMA (so it reboots the machine sooner than FMA is able to
>>> perform the shutdown), then YES, it can have a negative impact.
>>>
>>> But in that sense, anything can have a negative impact on FMA - for
>>> me, it is the administrator's responsibility to configure the
>>> machine so that it works reliably. It is important that by
>>> default (after installation) freeipmi does NOTHING. It has to be
>>> configured to DO something.
>>
>> OK. But in the situation you describe, FMA's fault handling might
>> allow for a different handling -- e.g. turning fans up to full speed,
>> or throttling back a CPU or even disabling one or more cores (or the
>> whole CPU if multiple CPUs are present) -- which is better IMO than a
>> blind shut down.
>
> Shutting down a machine was just an example - I think IPMI could be
> capable of doing similar things (increasing fan speed, disabling a
> core - I am not 100% sure if in the current version, but IPMI is
> evolving as it is pretty much the standard for platform operations
> management in HW) or even just sending an SNMP alert or an email.

The issue here isn't IPMI, but rather what "freeipmi" is capable of
doing. If it is being used for "control", then there are potential
ways that the freeipmi tool and FMA can step on each other's toes.

>
> I believe that FMA is an important and useful feature, but there is
> one differentiator - IPMI can be used to watch the machine remotely
> and perform platform operations without having access to the
> operating system. This is important for HPC in situations where a
> large cluster (zounds of nodes) needs to be restarted, or just shut
> down, without touching the OS (for example, OS re-provisioning is
> going to be performed and we do not wait until the current OS goes
> down).

Ok.

> I just answered all 3 above questions in a separate email, sorry for
> scattering the info.

Yep, saw that. Thanks.

- Garrett

>
> Regards,
>
> Michal
>>
>> - Garrett
>>
>>>
>>> Regards,
>>>
>>> Michal
>>>> Nico