On 03/27/2014 12:47 AM, Prasanna Kumar wrote: > Hi, > > When going through the patch, noticed a small issue > The function smi_mod_timer hard codes time to (jiffies + > SI_TIMEOUT_JIFFIES) and does not make use of passed value "new_val" . >
You are right, I have fixed it. Thank you. -corey > > > regards, > Prasanna > > > On Wed, Mar 26, 2014 at 10:34 PM, Corey Minyard <[email protected] > <mailto:[email protected]>> wrote: > > On 03/26/2014 11:51 AM, Strösser, Bodo wrote: > > This time with the CC to the list. Sorry for the noise. > > > > > >> -----Original Message----- > >> From: Corey Minyard [mailto:[email protected] > <mailto:[email protected]>] On Behalf Of Corey Minyard > >> Sent: Wednesday, March 26, 2014 12:41 AM > >> To: Strösser, Bodo > >> Cc: [email protected] > <mailto:[email protected]> > >> Subject: Re: ipmi_si: KCS: timeout waiting for IBF or OBF does > not always work > >> > >> On 03/25/2014 05:32 PM, Strösser, Bodo wrote: > >>> Attached you'll find my modified patch to explain my > hypothesis better. > >>> It seems to work fine. Thus, my hypothesis seems to be right. > >> timer_running needs to be a bool, > > Yes, of course. int is what we had to use in the good old times ;-) > > Indeed, old habits die hard :). > > > > > > >> but beyond that, this is good. I did > >> think about this, but I thought timer_running() would be > enough. But > >> thinking about it some more, I think you are right. It's not > quite good > >> enough. > > > > Have attached a new version of the patch. This time using bool > and with your > > comments, that I've slightly extended. Have added my From and > Signed-off-by > > hoping that this is ok for you. > > Perfect, I have it queued for the next merge window. > > Thanks again, > > -corey > > > > > The patch is based on SuSE-SLES11-SP2 (3.0.101-xxx), but it > applies to > > mainline also, printing some offsets. > > > > Bodo > > > > > > > >> -corey > >> > >>> Bodo > >>> > >>>> -----Original Message----- > >>>> From: Strösser, Bodo > >>>> Sent: Tuesday, March 25, 2014 9:41 PM > >>>> To: '[email protected] <mailto:[email protected]>' > >>>> Cc: [email protected] > <mailto:[email protected]> > >>>> Subject: RE: ipmi_si: KCS: timeout waiting for IBF or OBF > does not always work > >>>> > >>>> Hi Corey, > >>>> > >>>> Sorry, the patch isn't good. The problem is, that the timeout > time now is > >>>> a multiple of what it should be (5 seconds). > >>>> > >>>> I guess, I know the reason for this: > >>>> I think, the timer is reset quite often while the driver > loops waiting for OBF > >>>> in KCS status 40. The reset can happen, as timer_pending() is > not the right test. > >>>> For example the timer isn't pending if smi_timeout already is > started, but has to > >>>> wait for the lock the is held by ipmi_thread(). > >>>> > >>>> Maybe we could add a flag to smi_info, that is set in > smi_mod_timer() and is reset > >>>> in smi_timeout() id smi_mod_timer() is not called. Then > ipmi_thread() could test > >>>> the flag instead of calling timer_pending(). But maybe you > have a better idea? > >>>> > >>>> Bodo > >>>> > >>>>> -----Original Message----- > >>>>> From: Corey Minyard [mailto:[email protected] > <mailto:[email protected]>] On Behalf Of Corey Minyard > >>>>> Sent: Monday, March 24, 2014 7:54 PM > >>>>> To: Strösser, Bodo > >>>>> Cc: [email protected] > <mailto:[email protected]> > >>>>> Subject: Re: ipmi_si: KCS: timeout waiting for IBF or OBF > does not always work > >>>>> > >>>>> On 03/24/2014 11:30 AM, Strösser, Bodo wrote: > >>>>>> Hi Corey, > >>>>>> > >>>>>> Thank you for the quick reply and the patches. > >>>>>> > >>>>>> We had tests running for the entire weekend with a driver > >>>>>> patched with my change. No more errors. So it works. > >>>>> It certainy fixes the issue you were seeing, but it would > cause some > >>>>> other issues that would possibly cause some error conditions > to not be > >>>>> handled correctly, and it would make the power management > people very > >>>>> unhappy with me :-0. > >>>>> > >>>>> Hopefully the patch I submitted will fix the issue without > those side > >>>>> effects. But your patch pointed me in the right direction, > so it wasn't > >>>>> without merit. > >>>>> > >>>>> Thanks again, > >>>>> > >>>>> -corey > >>>>> > >>>>>> But of course, your patch is a much more sane fix. I'll test > >>>>>> it ASAP and will report the result. > >>>>>> > >>>>>> Bodo > >>>>>> > >>>>>> > >>>>>>> -----Original Message----- > >>>>>>> From: Corey Minyard [mailto:[email protected] > <mailto:[email protected]>] On Behalf Of Corey Minyard > >>>>>>> Sent: Monday, March 24, 2014 5:10 PM > >>>>>>> To: Strösser, Bodo > >>>>>>> Cc: [email protected] > <mailto:[email protected]> > >>>>>>> Subject: Re: ipmi_si: KCS: timeout waiting for IBF or OBF > does not always work > >>>>>>> > >>>>>>> On 03/21/2014 04:02 PM, Strösser, Bodo wrote: > >>>>>>>> Hi Corey, > >>>>>>>> > >>>>>>>> We are using servers, that have a BMC connected via KCS. > >>>>>>>> We are running SuSE SLES11-SP2, kernel 3.0.101-0.7.17, but > >>>>>>>> the problem I see seems to be relevant for Mainline also. > >>>>>>>> > >>>>>>>> For our BMC, ipmi_si.ko uses both, ipmi_thread() and > >>>>>>>> smi_timeout() in parallel to call smi_event_handler(). > >>>>>>>> But only smi_timeout() passes the elapsed time for timeout > >>>>>>>> handling to smi_event_handler(), but not ipmi_thread(). > >>>>>>>> > >>>>>>>> After smi_timeout detected IDLE, it does not start the > >>>>>>>> timer again. Instead, a call to sender() will restart the > >>>>>>>> timer. Unfortunately, if a command to the BMC is not > >>>>>>>> started via sender(), but generated internally (eg. from > >>>>>>>> smi_event_handler() because of smi_info->req_events) and > >>>>>>>> started via smi_info->handlers->start_transaction(), > >>>>>>>> AFAICS the timer is not started. So, the timeouts in the > >>>>>>>> KCS driver while waiting for IBF or OBF won't work in this > >>>>>>>> situation. > >>>>>>>> > >>>>>>>> The BMCs in our servers seem to have a sporadic problem > >>>>>>>> that leads to the error recovery being started because of > >>>>>>>> an timeout while waiting for OBF. But in case the timer is > >>>>>>>> stopped while the problem occurs, the driver loop waiting > >>>>>>>> for OBF forever. > >>>>>>> I think I can see what is happening here, but I don't > think your fix is > >>>>>>> correct. With your fix the driver would never time out if > the interface > >>>>>>> wasn't working, because it's starting the timer in the > thread, the timer > >>>>>>> is what is used to time out operations, and the thread > will run > >>>>>>> constantly while the driver is operating. > >>>>>>> > >>>>>>> Try the attached patches instead, please. I've done some > minimal testing. > >>>>>>> > >>>>>>>> P.S.: There might be another minor issue in ipmi_kcs_sm.c > >>>>>>>> If check_obf() times out, it starts the error handling > but does > >>>>>>>> not reset kcs->obf_timeout. Thus, in state KCS_ERROR2 > check_obf() > >>>>>>>> is called with the timer already expired. So I'd like to > >>>>>>>> suggest this small change: > >>>>>>>> > >>>>>>>> --- a/drivers/char/ipmi/ipmi_kcs_sm.c 2014-03-19 > 10:42:51.000000000 +0100 > >>>>>>>> +++ b/drivers/char/ipmi/ipmi_kcs_sm.c 2014-03-19 > 10:48:33.000000000 +0100 > >>>>>>>> @@ -251,8 +251,9 @@ static inline int check_obf(struct si_sm > >>>>>>>> if (!GET_STATUS_OBF(status)) { > >>>>>>>> kcs->obf_timeout -= time; > >>>>>>>> if (kcs->obf_timeout < 0) { > >>>>>>>> - start_error_recovery(kcs, "OBF not > ready in time"); > >>>>>>>> - return 1; > >>>>>>>> + start_error_recovery(kcs, "OBF not > ready in time"); > >>>>>>>> + kcs->obf_timeout = OBF_RETRY_TIMEOUT; > >>>>>>>> + return 1; > >>>>>>>> } > >>>>>>>> return 0; > >>>>>>>> } > >>>>>>>> > >>>>>>> Indeed, you are right. I've attached another patch for that. > >>>>>>> > >>>>>>> -corey > > > > ------------------------------------------------------------------------------ > Learn Graph Databases - Download FREE O'Reilly Book > "Graph Databases" is the definitive new guide to graph databases > and their > applications. Written by three acclaimed leaders in the field, > this first edition is now available. Download your free book today! > http://p.sf.net/sfu/13534_NeoTech > _______________________________________________ > Openipmi-developer mailing list > [email protected] > <mailto:[email protected]> > https://lists.sourceforge.net/lists/listinfo/openipmi-developer > > ------------------------------------------------------------------------------ _______________________________________________ Openipmi-developer mailing list [email protected] https://lists.sourceforge.net/lists/listinfo/openipmi-developer
