Hi Corey, Thank you for the quick reply and the patches.
We had tests running for the entire weekend with a driver patched with my change. No more errors. So it works. But of course, your patch is a much more sane fix. I'll test it ASAP and will report the result. Bodo > -----Original Message----- > From: Corey Minyard [mailto:[email protected]] On Behalf Of Corey Minyard > Sent: Monday, March 24, 2014 5:10 PM > To: Strösser, Bodo > Cc: [email protected] > Subject: Re: ipmi_si: KCS: timeout waiting for IBF or OBF does not always work > > On 03/21/2014 04:02 PM, Strösser, Bodo wrote: > > Hi Corey, > > > > We are using servers, that have a BMC connected via KCS. > > We are running SuSE SLES11-SP2, kernel 3.0.101-0.7.17, but > > the problem I see seems to be relevant for Mainline also. > > > > For our BMC, ipmi_si.ko uses both, ipmi_thread() and > > smi_timeout() in parallel to call smi_event_handler(). > > But only smi_timeout() passes the elapsed time for timeout > > handling to smi_event_handler(), but not ipmi_thread(). > > > > After smi_timeout detected IDLE, it does not start the > > timer again. Instead, a call to sender() will restart the > > timer. Unfortunately, if a command to the BMC is not > > started via sender(), but generated internally (eg. from > > smi_event_handler() because of smi_info->req_events) and > > started via smi_info->handlers->start_transaction(), > > AFAICS the timer is not started. So, the timeouts in the > > KCS driver while waiting for IBF or OBF won't work in this > > situation. > > > > The BMCs in our servers seem to have a sporadic problem > > that leads to the error recovery being started because of > > an timeout while waiting for OBF. But in case the timer is > > stopped while the problem occurs, the driver loop waiting > > for OBF forever. > > I think I can see what is happening here, but I don't think your fix is > correct. With your fix the driver would never time out if the interface > wasn't working, because it's starting the timer in the thread, the timer > is what is used to time out operations, and the thread will run > constantly while the driver is operating. > > Try the attached patches instead, please. I've done some minimal testing. > > > > > P.S.: There might be another minor issue in ipmi_kcs_sm.c > > If check_obf() times out, it starts the error handling but does > > not reset kcs->obf_timeout. Thus, in state KCS_ERROR2 check_obf() > > is called with the timer already expired. So I'd like to > > suggest this small change: > > > > --- a/drivers/char/ipmi/ipmi_kcs_sm.c 2014-03-19 10:42:51.000000000 +0100 > > +++ b/drivers/char/ipmi/ipmi_kcs_sm.c 2014-03-19 10:48:33.000000000 +0100 > > @@ -251,8 +251,9 @@ static inline int check_obf(struct si_sm > > if (!GET_STATUS_OBF(status)) { > > kcs->obf_timeout -= time; > > if (kcs->obf_timeout < 0) { > > - start_error_recovery(kcs, "OBF not ready in time"); > > - return 1; > > + start_error_recovery(kcs, "OBF not ready in time"); > > + kcs->obf_timeout = OBF_RETRY_TIMEOUT; > > + return 1; > > } > > return 0; > > } > > > Indeed, you are right. I've attached another patch for that. > > -corey ------------------------------------------------------------------------------ Learn Graph Databases - Download FREE O'Reilly Book "Graph Databases" is the definitive new guide to graph databases and their applications. Written by three acclaimed leaders in the field, this first edition is now available. Download your free book today! http://p.sf.net/sfu/13534_NeoTech _______________________________________________ Openipmi-developer mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/openipmi-developer
