On Wed, 2021-01-27 at 16:32 +0000, Mark Brown wrote: > On Wed, Jan 27, 2021 at 03:34:46PM +0100, AngeloGioacchino Del Regno > wrote: > > Il 27/01/21 13:56, Matti Vaittinen ha scritto: > > > I can only speak for BD9576MUF - which has two limits for > > > monitored > > > entity (temperature/voltage). One limit being 'warning' limit (or > > > 'detection' as data-sheet says), the other being 'protection' > > > limit. > > I would tend to agree with you here, Matti. Also from what I > > understand, > > the wanted outcome is software handling a possibly temporary issue > > with > > you charging caps, external IC initialization using (expectedly) > > much > > more power than needed before stabilizing, and eventually handling > > other > > "real" issues for which there is a solution that may not even > > include > > disabling the regulator itself, but some other handling on the > > connected > > device driver. > > Note that the events the API currently has are expected to be for the > actual error conditions, not for the warning ones - indicating that > the > voltage is out of regulation for example.
I am unsure how to interpret this. What is the criteria of issue being an error/warning. When I was talking about warning I meant that the issue which is detected is unexpected and abnormal (error?) - but might still be recoverable (warning?). I understand the regulator framework must not signal same events for different purposes - but I don't really know what the current events are used for - I am grateful for any guidance! So, I would like to have an event which the driver can signal to consumers so that the consumer drivers can try to recover problem W/O shutting down the SoC. I try to explain what the BD9576 does and how I see it - I hope you can then tell me if I am misusing existing events/errors. Sorry for longish explanation... The patch I sent did just use existing flags as follows: REGULATOR_EVENT_UNDER_VOLTAGE: Sent when PMIC informs regulator output being under the target. Limit for this is configurable for each output - from roughly at 0.9*(set Vout) up to 0.99*(set Vout). The detection can also be disabled. I think that if this is enabled for system, then this should be regarded as "error condition" - but HW does not shut down. Furthermore, BD9576 does NOT have "protection" limit for output dropping. It does have a protection limit for regulator input dropping though - and PMIC will forcibly shut down if voltage input of a regulator drops below certain value. What action the system can take when output voltage drops depends on system and is not really up to the regulator driver to decide. But is this notification correct? (I thought so just judging the name). What I thought is that it would be nice to inform consumers about the detected under-voltage (as HW supports this). In addition to this notification I added a 'state flag' in driver so that if someone calls the regulator_get_error we do return REGULATOR_ERROR_UNDER_VOLTAGE. Are these correct flags? REGULATOR_EVENT_REGULATION_OUT: Current patch send this when PMIC informs regulator output being OVER the set voltage. PMIC supports similar limit configuration / feature disabling as with UVD. The over voltage limit can be set roughly from 1.01*(set Vout) to 1.1*(set Vout). PMIC has also protection limits for regulator outputs so that if output is roughly 1.2*(set Vout) the PMIC will shut down. Again, I think this is an error condition but what action system can/should take if output voltage is exceeding the limit is not known by me. I used same regulator_get_error logic and REGULATOR_ERROR_REGULATION_OUT is returned by regulator_get_error when condition is not cleared. Again - I would appreciate knowledge about if this is correct flag. REGULATOR_EVENT_OVER_TEMP: Sent when PMIC gives us "thermal warning" IRQ. The PMIC has thermal shut-down point 175 C when it will immediately cut the power from system to protect it from damaging. The thermal warning IRQ is firing at 30 C lower temperature and HW performs no other action here. AFAIR the temperature limit can not be adjusted. I think this is again an error condition - kind of last resort for the system to do something before PMIC shuts it down. I don't think this is intended to be any 'normal temperature regulation' feature. What system can do here (turn off some HW? Reduce clocks? maxx fans? Shut down graphichs block and send warning to display using TCON image overlay? Forcibly close the panorama windows to cut sunlight?) depends on system. Here I also added REGULATOR_ERROR_OVER_TEMP to be returned by regulator_get_error. Finally, for one of the outputs we have "over current warning" interrupt and over current protection. The limit for OCW is configurable (data-sheet speaks of milliVOLTS here so maybe shunt is used?) and it can also be disabled. Protection is again not configurable and will shut down the outputs immediately. I think it is again up to system designer to decide whether the OCW interrupt should be used to inform error condition. I used REGULATOR_EVENT_OVER_CURRENT and REGULATOR_ERROR_OVER_CURRENT respectively. > If you're supporting warning > notifications as well you'll want to add more events (or possibly > another flag to munge in with the existing events to indicate that > it's > a warning rather than an error). As I said, I need some help here to decide if these events are what existing notification flags are intended to be used for or if I should add new notification(s)/flags? I think that from user perspective it boils down to question if there is some different recovery actions to be taken? In the case of BD9576MUF I don't see any other meaningfull error events because, as I said, when the "protection" level is reached the IRQs can't really be handled (because power from all outputs is shut. Unless another processor which is not being powered by the BD9576 is controlling it - and I don't think this would be the case here). > > > Though, Mark: for example, on qcom labibb, there's "PBS" that is > > killing > > the regulators on short-circuit condition and as you see, handling > > that > > is a little trickier compared to the over-current one, where there > > is no > > such auto-magic-thing... > > .... I wouldn't know if it'd be a good idea to have a system like > > qcom's > > PBS everywhere. > > For the sake of protecting HW "paranoidly" though.. maybe :)) > > Well, if these things are kicking in the hardware is in serious > trouble > anyway so it's unclear what the system would be likely to do in > software, and also unclear how safe it is to rely on software to be > able > to take that action given that it let things get into such a bad > state > in the first place. Actually, bear with me but I am unsure why we have these notifications if we don't expect SW to be able to do anything? Wouldn't the panic print be all that is needed then? I think that setups which have dual limits (one for initiating potential SW recovery - other for HW to forcing protection) actually make sense. So does implementing notifiers / error statuses for events where SW recovery is potentially helpful. But whether the existing event notifications / error flags are correct for these is something I can't decide :) Here I ask guidance for Mark & others who know what is the idea behind existing error-flags/events. Best Regards Matti Vaittinen