On Tuesday 18 November 2014 06:56:11 Guenter Roeck wrote: > On 11/17/2014 12:35 AM, Pali Rohár wrote: > > On Thursday 23 October 2014 18:45:07 Guenter Roeck wrote: > >> On Thu, Oct 23, 2014 at 12:37:34PM +0200, Pali Rohár wrote: > >>> On Wednesday 22 October 2014 19:10:05 Guenter Roeck wrote: > >>>> On Wed, Oct 22, 2014 at 06:35:53PM +0200, Pali Rohár wrote: > >>>>> On Wednesday 22 October 2014 18:19:47 Guenter Roeck > > > > wrote: > >>>>>> On Wed, Oct 22, 2014 at 02:29:06PM +0200, Pali Rohár > > > > wrote: > >>>>>>> On Tuesday 21 October 2014 06:27:23 Guenter Roeck > > > > wrote: > >>>>>>>> On 10/20/2014 09:46 AM, Pali Rohár wrote: > >>>>>>>>> Ok, I will describe my problem. Guenter, maybe > >>>>>>>>> you can find another solution/fix for it. > >>>>>>>>> > >>>>>>>>> Calling i8k_get_temp(3) on my laptop without > >>>>>>>>> I8K_TEMPERATURE_BUG always returns value 193 > >>>>>>>>> (which is above I8K_MAX_TEMP). > >>>>>>>>> > >>>>>>>>> When I8K_TEMPERATURE_BUG is enabled (by default) > >>>>>>>>> then i8k_get_temp(3) returns value from prev[3] > >>>>>>>>> and store new value I8K_TEMPERATURE_BUG to > >>>>>>>>> prev[3]. Value in prev[3] is initialized to 0. > >>>>>>>>> > >>>>>>>>> What I want to achieve is: when i8k_get_temp() > >>>>>>>>> for particular sensor id always returns invalid > >>>>>>>>> value (> I8K_MAX_TEMP) then we should totally > >>>>>>>>> ignore sensor with that id and do not export it > >>>>>>>>> via hwmon. > >>>>>>>>> > >>>>>>>>> My solution is: initialize prev[id] to > >>>>>>>>> I8K_MAX_TEMP, so on invalid data first call to > >>>>>>>>> i8k_get_temp(id) returns I8K_MAX_TEMP. Then in > >>>>>>>>> i8k_init_hwmon check if value is < I8K_MAX_TEMP > >>>>>>>>> and if not ignore sensor id. > >>>>>>>>> > >>>>>>>>> Guenter, it is clear now? Are you ok that we > >>>>>>>>> should ignore sensor if always report value > >>>>>>>>> above I8K_MAX_TEMP? If you do not like my > >>>>>>>>> solution/patch for it, can you specify how > >>>>>>>>> other can it be fixed? > >>>>>>>> > >>>>>>>> I still don't see the point in initializing > >>>>>>>> prev[]. > >>>>>>> > >>>>>>> Now prev[] is initialized to 0. It means that first > >>>>>>> call i8k_get_temp() (with sensor id which return > >>>>>>> value > I8K_MAX_TEMP) returns 0. Second and other > >>>>>>> calls returns I8K_MAX_TEMP. > >>>>>>> > >>>>>>> So point is to return same value for first and other > >>>>>>> calls. > >>>>>> > >>>>>> Yes, I realized that after I sent my previous mail. > >>>>>> > >>>>>>>> Yes, I am ok with ignoring sensor values if the > >>>>>>>> reported temperature is above I8K_MAX_TEMP. I am > >>>>>>>> just not sure if we should check against > >>>>>>>> I8K_MAX_TEMP or against, say, 192. Reason is that > >>>>>>>> we do know that the sensor can erroneously return > >>>>>>>> 0x99 on some systems once in a while. We would > >>>>>>>> not want to ignore those sensors just because > >>>>>>>> they happen to report 0x99 during initialization. > >>>>>>>> > >>>>>>>> So maybe make it > >>>>>>>> > >>>>>>>> if (err >= 0 && err < 192) > >>>>>>>> > >>>>>>>> and add a note before the first if(), explaining > >>>>>>>> that higher values suggest that there is no > >>>>>>>> sensor attached. > >>>>>>>> > >>>>>>>> Thanks, > >>>>>>>> Guenter > >>>>>>> > >>>>>>> Right, now we need to decide which magic constant to > >>>>>>> use... > >>>>>>> > >>>>>>> And now I found another problem :-) > >>>>>>> > >>>>>>> On my laptop i8k_get_temp(3) not always return value > >>>>>>> 193. It is only when AMD graphics card is turned > >>>>>>> off. When card is on i8k_get_temp(3) returns same > >>>>>>> value as temperature hwmon part from radeon DRM > >>>>>>> driver. > >>>>>> > >>>>>> Can you turn the GPU on or off during runtime ? > >>>>>> That would make it really tricky to handle the > >>>>>> situation. > >>>>> > >>>>> Yes. New laptops with Nvidia Optimus or AMD PowerXpress > >>>>> or Enduro technology are designed to automatically turn > >>>>> off secondary GPU when is not in use. And > >>>>> nouveau/radeon drivers together with vga_switcheroo > >>>>> implements this dynamic power on/off. > >>>>> > >>>>>>> So it looks like that on my laptop i8k sensor with > >>>>>>> id 3 reports GPU temperature. > >>>>>>> > >>>>>>> When card is turned off radeon driver reports > >>>>>>> -EINVAL for temperature hwmon sysnode. > >>>>>>> > >>>>>>> So now I think i8k could not ignore sensor totally > >>>>>>> as it can be mapped to some HW which can be > >>>>>>> dynamically turned on/off (like my graphics card). > >>>>>>> > >>>>>>> So what do you think about reporting -EINVAL instead > >>>>>>> I8K_MAX_TEMP when dell SMM returns value above > >>>>>>> I8K_MAX_TEMP? > >>>>>> > >>>>>> -EINVAL is supposed to mean "Invalid Argument", so it > >>>>>> really has ia different semantics. We could use > >>>>>> -ENXIO, "No such device or address", which seems more > >>>>>> appropriate. > >>>>> > >>>>> I prefer to use -EINVAL because other driver (radeon) is > >>>>> using it and userspace "sensors" programs handle EINVAL > >>>>> and show "N/A" in output instead reporting some error > >>>>> about reading value. If nothing else consistency (with > >>>>> other drivers) is my argument. > >>>> > >>>> Ok, if sensors implements it that way then let's do it. > >>>> > >>>>>> Overall, I think the entire error handling is broken > >>>>>> and should be replaced. One option would be to > >>>>>> explicitly check for 0x99 and, if detected, go to > >>>>>> sleep for, say, 100ms and try again. If it still > >>>>>> fails, and for all other bad values, return -ENXIO. > >>>>>> Then the calling code can either return the error to > >>>>>> user space in the show function, or not install the > >>>>>> sensor in the probe function. > >>>>>> > >>>>>> Does that make sense ? > >>>>> > >>>>> Yes, replacing error handling with retry call (after > >>>>> some sleep) is better then current code (which returns > >>>>> previous value or returns I8K_MAX_TEMP). > >>>>> > >>>>> But your solution not install the sensor if probe fails > >>>>> on bad value does not solve problem that i8k.ko is > >>>>> loading at time when GPU card is turned off. > >>>> > >>>> Yes, the dynamics in that situation makes it a bit > >>>> difficult to handle the situation. > >>>> > >>>>> I think current check for installing sensor (err < 0) is > >>>>> enough and when invalid value (> I8K_MAX_TEMP) is > >>>>> returned just do not show this value for userspace in > >>>>> hwmon sysnode. > >>>> > >>>> Ok with me, and agreed. > >>>> > >>>> Thanks, > >>>> Guenter > >>> > >>> Ok, are you going to fix i8k_get_temp() function (with > >>> sleeping)? > >> > >> I had hoped you would find the time to do it ;-). > >> > >> Sure, I can do it, but I am kind of busy right now, so it > >> will take a while. > >> > >> Thanks, > >> Guenter > > > > Ok. Will you do that for 3.19 kernel? Meanwhile I can > > prepare patch for temperature labels. I looked into > > NBSVC.MDM and there is something related for type of > > temperature sensors... > > Highly unlikely. I don't have the time right now. > > Guenter
Ok. I will try to at least fix patch from first post in this thread, so i8k_get_temp() does not return totally invalid value to userspace (or at first call). -- Pali Rohár pali.ro...@gmail.com
signature.asc
Description: This is a digitally signed message part.