Re: [PATCH v3 3/4] hwmon: (lm90) add support to handle IRQ

2013-09-16 Thread Jean Delvare
Hi Wei,

Sorry for the late reply, catching up with the discussions from before
my vacation...

On Tue, 30 Jul 2013 16:18:35 +0800, Wei Ni wrote:
> On 07/29/2013 11:58 PM, Jean Delvare wrote:
> > On Mon, 29 Jul 2013 18:14:56 +0800, Wei Ni wrote:
> >> Yes, we had met this problems, to fix this issue, we enabled one-shot
> >> mode in the bottom half handler of nct interrupts to force a
> >> conversion/comparison. This effectively stops repeated nct interrupts.
> >> We will do following things in the IRQ thread:
> >> 1. stand by the nct1008. (set configure register bit 6)
> >> 2. update the limit value if needed.
> >> 3. write to one-shot resister.
> >> 4. give hardware necessary time to finish conversion
> >> 5. run the nct1008 (clear configure register bit 6)
> > 
> > Doh, this is so ugly :(
> > 
> > Why don't you configure the pin as THERM2 instead of ALERT then? I'd
> > expect this to make things easier.
> 
> If configure as THERM2, only the high temperature limits are relevant,
> so when the temperature reduced, it will not trigger interrupt, and we
> can't update the cooling state.

Ah, indeed, I had not noticed this restriction.

> Or do you mean that we can configure the pin to THERM2 in the irq_thread
> to avoid the repeated interrupt ? I tried it, but no help, the nct1008
> will not run the conversion/comparison immediately, so the status
> register will not be cleared.

No, I didn't mean to suggest anything like that.

> >> (...)
> >> These trip-temps are not critical temperature, we used these temps to
> >> update cooling states. For the critical-temp, we handle it like my
> >> mentioned in #1.
> > 
> > I understand. But even if these interrupts are only used for managing
> > cooling states, a misbehavior could still have annoying consequences,
> > such as causing the thermal shutdown to trigger when this could have
> > been avoided, or throttling to stay enabled even though the system has
> > cooled down enough.
> 
> I think our driver are trying best to avoid these troubles. As I know in
> our downstream codes, we didn't met these things.
> I think since the lm90 support interrupt mode, then the driver should
> have related interface to handle it, and it can call the callback
> function to do what the platform driver want.

Yes, fair enough. I do not object to it, I was only trying to
understand how you were using it.

-- 
Jean Delvare
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3 3/4] hwmon: (lm90) add support to handle IRQ

2013-09-16 Thread Jean Delvare
Hi Wei,

Sorry for the late reply, catching up with the discussions from before
my vacation...

On Tue, 30 Jul 2013 16:18:35 +0800, Wei Ni wrote:
 On 07/29/2013 11:58 PM, Jean Delvare wrote:
  On Mon, 29 Jul 2013 18:14:56 +0800, Wei Ni wrote:
  Yes, we had met this problems, to fix this issue, we enabled one-shot
  mode in the bottom half handler of nct interrupts to force a
  conversion/comparison. This effectively stops repeated nct interrupts.
  We will do following things in the IRQ thread:
  1. stand by the nct1008. (set configure register bit 6)
  2. update the limit value if needed.
  3. write to one-shot resister.
  4. give hardware necessary time to finish conversion
  5. run the nct1008 (clear configure register bit 6)
  
  Doh, this is so ugly :(
  
  Why don't you configure the pin as THERM2 instead of ALERT then? I'd
  expect this to make things easier.
 
 If configure as THERM2, only the high temperature limits are relevant,
 so when the temperature reduced, it will not trigger interrupt, and we
 can't update the cooling state.

Ah, indeed, I had not noticed this restriction.

 Or do you mean that we can configure the pin to THERM2 in the irq_thread
 to avoid the repeated interrupt ? I tried it, but no help, the nct1008
 will not run the conversion/comparison immediately, so the status
 register will not be cleared.

No, I didn't mean to suggest anything like that.

  (...)
  These trip-temps are not critical temperature, we used these temps to
  update cooling states. For the critical-temp, we handle it like my
  mentioned in #1.
  
  I understand. But even if these interrupts are only used for managing
  cooling states, a misbehavior could still have annoying consequences,
  such as causing the thermal shutdown to trigger when this could have
  been avoided, or throttling to stay enabled even though the system has
  cooled down enough.
 
 I think our driver are trying best to avoid these troubles. As I know in
 our downstream codes, we didn't met these things.
 I think since the lm90 support interrupt mode, then the driver should
 have related interface to handle it, and it can call the callback
 function to do what the platform driver want.

Yes, fair enough. I do not object to it, I was only trying to
understand how you were using it.

-- 
Jean Delvare
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3 3/4] hwmon: (lm90) add support to handle IRQ

2013-07-30 Thread Wei Ni
On 07/29/2013 11:58 PM, Jean Delvare wrote:
> Hi Wei,
> 
> On Mon, 29 Jul 2013 18:14:56 +0800, Wei Ni wrote:
>> On 07/27/2013 11:02 PM, Jean Delvare wrote:
>>> On Fri, 19 Jul 2013 14:41:54 +0800, Wei Ni wrote:
 2. The pin ALERT/THERM2 is configured as ALERT, and it is connected to
 the interrupt line.
>>>
>>> Why don't you use the SMBus alert mechanism then? It is already
>>> implemented and allows you to reuse the interrupt of the SMBus
>>> controller.
>>>
>>> Of course this is a question for the hardware designers, not you... If
>>> the board uses a separate interrupt pin for the NCT1008's ALERT output,
>>> then the driver has to handle that interrupt explicitly, as done in your
>>> patch.
>>>
>>> One thing which worries me though is this explanation in the NCT1008
>>> datasheet:
>>>
>>> "The ALERT interrupt latch is not reset by reading the
>>> status register. It resets when the ALERT output has been
>>> serviced by the master reading the device address, provided
>>> the error condition has gone away and the status register flag
>>> bits are reset."
>>>
>>> This suggests that connecting the ALERT output to a separate interrupt
>>> pin will not work, as the ALERT output will never be de-asserted even
>>> if the fault conditions are gone and the status register was read. But
>>> as you say this is how your system is supposed to work, can you explain
>>> how it can work?
>>
>> Yes, we had met this problems, to fix this issue, we enabled one-shot
>> mode in the bottom half handler of nct interrupts to force a
>> conversion/comparison. This effectively stops repeated nct interrupts.
>> We will do following things in the IRQ thread:
>> 1. stand by the nct1008. (set configure register bit 6)
>> 2. update the limit value if needed.
>> 3. write to one-shot resister.
>> 4. give hardware necessary time to finish conversion
>> 5. run the nct1008 (clear configure register bit 6)
> 
> Doh, this is so ugly :(
> 
> Why don't you configure the pin as THERM2 instead of ALERT then? I'd
> expect this to make things easier.

If configure as THERM2, only the high temperature limits are relevant,
so when the temperature reduced, it will not trigger interrupt, and we
can't update the cooling state.

Or do you mean that we can configure the pin to THERM2 in the irq_thread
to avoid the repeated interrupt ? I tried it, but no help, the nct1008
will not run the conversion/comparison immediately, so the status
register will not be cleared.

> 
>> (...)
>> These trip-temps are not critical temperature, we used these temps to
>> update cooling states. For the critical-temp, we handle it like my
>> mentioned in #1.
> 
> I understand. But even if these interrupts are only used for managing
> cooling states, a misbehavior could still have annoying consequences,
> such as causing the thermal shutdown to trigger when this could have
> been avoided, or throttling to stay enabled even though the system has
> cooled down enough.

I think our driver are trying best to avoid these troubles. As I know in
our downstream codes, we didn't met these things.
I think since the lm90 support interrupt mode, then the driver should
have related interface to handle it, and it can call the callback
function to do what the platform driver want.

Thanks.
Wei.

> 
>> Yes, it's not safe to rely on the software, so on our tegra114, we will
>> have soc thermal, which can handle these trip-temps on hardware.
> 
> OK, good :)
> 
>> (...)
>> Yes, absolutely agree, we can't add any private codes to the generic driver.
>> I had talked about it with Durgadoss in his "[PATCH] Thermal Framework
>> Enhancements" series, and in his v3 series, it introduced threshold
>> concept, which used to set limit value, and the "framework is notified
>> about this interrupt to take appropriate action". But this function
>> still didn't be completed yet.
>> I think the thermal fw can expose callback like thermal_zone->alert, and
>> in the irq_thread, we can notify the thermal fw to call this alert
>> callback function, then the platform code can do anything in this callback.
> 
> I didn't follow the discussions closely, but something like this would
> be needed, yes.
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3 3/4] hwmon: (lm90) add support to handle IRQ

2013-07-30 Thread Wei Ni
On 07/29/2013 11:58 PM, Jean Delvare wrote:
 Hi Wei,
 
 On Mon, 29 Jul 2013 18:14:56 +0800, Wei Ni wrote:
 On 07/27/2013 11:02 PM, Jean Delvare wrote:
 On Fri, 19 Jul 2013 14:41:54 +0800, Wei Ni wrote:
 2. The pin ALERT/THERM2 is configured as ALERT, and it is connected to
 the interrupt line.

 Why don't you use the SMBus alert mechanism then? It is already
 implemented and allows you to reuse the interrupt of the SMBus
 controller.

 Of course this is a question for the hardware designers, not you... If
 the board uses a separate interrupt pin for the NCT1008's ALERT output,
 then the driver has to handle that interrupt explicitly, as done in your
 patch.

 One thing which worries me though is this explanation in the NCT1008
 datasheet:

 The ALERT interrupt latch is not reset by reading the
 status register. It resets when the ALERT output has been
 serviced by the master reading the device address, provided
 the error condition has gone away and the status register flag
 bits are reset.

 This suggests that connecting the ALERT output to a separate interrupt
 pin will not work, as the ALERT output will never be de-asserted even
 if the fault conditions are gone and the status register was read. But
 as you say this is how your system is supposed to work, can you explain
 how it can work?

 Yes, we had met this problems, to fix this issue, we enabled one-shot
 mode in the bottom half handler of nct interrupts to force a
 conversion/comparison. This effectively stops repeated nct interrupts.
 We will do following things in the IRQ thread:
 1. stand by the nct1008. (set configure register bit 6)
 2. update the limit value if needed.
 3. write to one-shot resister.
 4. give hardware necessary time to finish conversion
 5. run the nct1008 (clear configure register bit 6)
 
 Doh, this is so ugly :(
 
 Why don't you configure the pin as THERM2 instead of ALERT then? I'd
 expect this to make things easier.

If configure as THERM2, only the high temperature limits are relevant,
so when the temperature reduced, it will not trigger interrupt, and we
can't update the cooling state.

Or do you mean that we can configure the pin to THERM2 in the irq_thread
to avoid the repeated interrupt ? I tried it, but no help, the nct1008
will not run the conversion/comparison immediately, so the status
register will not be cleared.

 
 (...)
 These trip-temps are not critical temperature, we used these temps to
 update cooling states. For the critical-temp, we handle it like my
 mentioned in #1.
 
 I understand. But even if these interrupts are only used for managing
 cooling states, a misbehavior could still have annoying consequences,
 such as causing the thermal shutdown to trigger when this could have
 been avoided, or throttling to stay enabled even though the system has
 cooled down enough.

I think our driver are trying best to avoid these troubles. As I know in
our downstream codes, we didn't met these things.
I think since the lm90 support interrupt mode, then the driver should
have related interface to handle it, and it can call the callback
function to do what the platform driver want.

Thanks.
Wei.

 
 Yes, it's not safe to rely on the software, so on our tegra114, we will
 have soc thermal, which can handle these trip-temps on hardware.
 
 OK, good :)
 
 (...)
 Yes, absolutely agree, we can't add any private codes to the generic driver.
 I had talked about it with Durgadoss in his [PATCH] Thermal Framework
 Enhancements series, and in his v3 series, it introduced threshold
 concept, which used to set limit value, and the framework is notified
 about this interrupt to take appropriate action. But this function
 still didn't be completed yet.
 I think the thermal fw can expose callback like thermal_zone-alert, and
 in the irq_thread, we can notify the thermal fw to call this alert
 callback function, then the platform code can do anything in this callback.
 
 I didn't follow the discussions closely, but something like this would
 be needed, yes.
 

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3 3/4] hwmon: (lm90) add support to handle IRQ

2013-07-29 Thread Jean Delvare
Hi Wei,

On Mon, 29 Jul 2013 18:14:56 +0800, Wei Ni wrote:
> On 07/27/2013 11:02 PM, Jean Delvare wrote:
> > On Fri, 19 Jul 2013 14:41:54 +0800, Wei Ni wrote:
> >> 2. The pin ALERT/THERM2 is configured as ALERT, and it is connected to
> >> the interrupt line.
> > 
> > Why don't you use the SMBus alert mechanism then? It is already
> > implemented and allows you to reuse the interrupt of the SMBus
> > controller.
> > 
> > Of course this is a question for the hardware designers, not you... If
> > the board uses a separate interrupt pin for the NCT1008's ALERT output,
> > then the driver has to handle that interrupt explicitly, as done in your
> > patch.
> > 
> > One thing which worries me though is this explanation in the NCT1008
> > datasheet:
> > 
> > "The ALERT interrupt latch is not reset by reading the
> > status register. It resets when the ALERT output has been
> > serviced by the master reading the device address, provided
> > the error condition has gone away and the status register flag
> > bits are reset."
> > 
> > This suggests that connecting the ALERT output to a separate interrupt
> > pin will not work, as the ALERT output will never be de-asserted even
> > if the fault conditions are gone and the status register was read. But
> > as you say this is how your system is supposed to work, can you explain
> > how it can work?
> 
> Yes, we had met this problems, to fix this issue, we enabled one-shot
> mode in the bottom half handler of nct interrupts to force a
> conversion/comparison. This effectively stops repeated nct interrupts.
> We will do following things in the IRQ thread:
> 1. stand by the nct1008. (set configure register bit 6)
> 2. update the limit value if needed.
> 3. write to one-shot resister.
> 4. give hardware necessary time to finish conversion
> 5. run the nct1008 (clear configure register bit 6)

Doh, this is so ugly :(

Why don't you configure the pin as THERM2 instead of ALERT then? I'd
expect this to make things easier.

> (...)
> These trip-temps are not critical temperature, we used these temps to
> update cooling states. For the critical-temp, we handle it like my
> mentioned in #1.

I understand. But even if these interrupts are only used for managing
cooling states, a misbehavior could still have annoying consequences,
such as causing the thermal shutdown to trigger when this could have
been avoided, or throttling to stay enabled even though the system has
cooled down enough.

> Yes, it's not safe to rely on the software, so on our tegra114, we will
> have soc thermal, which can handle these trip-temps on hardware.

OK, good :)

> (...)
> Yes, absolutely agree, we can't add any private codes to the generic driver.
> I had talked about it with Durgadoss in his "[PATCH] Thermal Framework
> Enhancements" series, and in his v3 series, it introduced threshold
> concept, which used to set limit value, and the "framework is notified
> about this interrupt to take appropriate action". But this function
> still didn't be completed yet.
> I think the thermal fw can expose callback like thermal_zone->alert, and
> in the irq_thread, we can notify the thermal fw to call this alert
> callback function, then the platform code can do anything in this callback.

I didn't follow the discussions closely, but something like this would
be needed, yes.

-- 
Jean Delvare
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3 3/4] hwmon: (lm90) add support to handle IRQ

2013-07-29 Thread Wei Ni
On 07/27/2013 11:02 PM, Jean Delvare wrote:
> Hi Wei,
> 
> Sorry for the late reply.
> 
> On Fri, 19 Jul 2013 14:41:54 +0800, Wei Ni wrote:
>> On 07/18/2013 11:58 PM, Jean Delvare wrote:
>>> First of all, how is the chip wired on your system? You are using an
>>> NCT1008, right? Which output of the chip is connected to your interrupt
>>> line, THERM or ALERT/THERM2? ALERT is normally used for SMBus Alert but
>>> I suppose it could be used for an interrupt too. THERM and THERM2 OTOH
>>> are comparator outputs, they stay low as long as the temperature are
>>> above the therm limits. Reading the status register won't bring them
>>> back up as I understand it, and contrary to the ALERT output, they
>>> can't be masked. Won't this result in an interrupt flood if using
>>> IRQF_TRIGGER_LOW? Have you tested your code already?
>>
>> Let me explain why I want to add the IRQ thread.
>> In our tegra30 platform, we use an NCT1008, and in our downstream tree,
>> it programmed as following:
>> 1. The pin THERM is connected to the PMIC, we will set the THERM limit
>> in the initialization, once the this limit is tripped, it will trigged
>> the PMIC, and the PMIC will shutdown the system immediately.
> 
> OK, this is what the chip is designed for, good.
> 
>> 2. The pin ALERT/THERM2 is configured as ALERT, and it is connected to
>> the interrupt line.
> 
> Why don't you use the SMBus alert mechanism then? It is already
> implemented and allows you to reuse the interrupt of the SMBus
> controller.
> 
> Of course this is a question for the hardware designers, not you... If
> the board uses a separate interrupt pin for the NCT1008's ALERT output,
> then the driver has to handle that interrupt explicitly, as done in your
> patch.
> 
> One thing which worries me though is this explanation in the NCT1008
> datasheet:
> 
> "The ALERT interrupt latch is not reset by reading the
> status register. It resets when the ALERT output has been
> serviced by the master reading the device address, provided
> the error condition has gone away and the status register flag
> bits are reset."
> 
> This suggests that connecting the ALERT output to a separate interrupt
> pin will not work, as the ALERT output will never be de-asserted even
> if the fault conditions are gone and the status register was read. But
> as you say this is how your system is supposed to work, can you explain
> how it can work?

Yes, we had met this problems, to fix this issue, we enabled one-shot
mode in the bottom half handler of nct interrupts to force a
conversion/comparison. This effectively stops repeated nct interrupts.
We will do following things in the IRQ thread:
1. stand by the nct1008. (set configure register bit 6)
2. update the limit value if needed.
3. write to one-shot resister.
4. give hardware necessary time to finish conversion
5. run the nct1008 (clear configure register bit 6)

> 
>> In the platform init, we will prepare some trip
>> temps, such as 20C, 40C,60C, 80C, and we will set 20C to the
>> remote-low-temp-limit, and set 40C to remote-high-temp-limit. When the
>> temperature exceed this rang, it will interrupt the system, then we will
>> update the low/high limit to next rang, for example, if the temp raise
>> to 50C, we will set 40C to low-limit, and set 60C to hight-limit, then
>> we will run the throttle functions, and update cooling state.
> 
> Hu ho, I have seen this kind of design in the past, and I must say I
> don't quite like it. Moving critical thermal management handling to the
> software makes me feel unsafe. System thermal safety should not rely on
> the OS IMHO. It is best handled by the hardware, or if that can't be
> done, by the BIOS. And these limit updates are tricky, they could fail
> and then you're in trouble. Imagine for example that another chip on
> the SMBus hugs the bus and delays or even plain prevents the change of
> limits...
> 
> But once again I guess you're not responsible for this.

These trip-temps are not critical temperature, we used these temps to
update cooling states. For the critical-temp, we handle it like my
mentioned in #1.
Yes, it's not safe to rely on the software, so on our tegra114, we will
have soc thermal, which can handle these trip-temps on hardware.

> 
> Another problem with this design, even if no such problem happens, is
> that the limits are user(-space)-writable in the lm90 driver, while
> with your design these limits are under full control of the platform
> management code. You definitely don't want the user to come and adjust
> the limits, this could result in overheating of the system. So, do you
> have a plan to optionally switch limits to read only in the lm90
> driver? If not, how do you intend to solve the problem?

Yes, I had consider it, but didn't have good solution yet, because the
lm90 is a generic driver, it's difficult to add a new feature to switch
to read only or not.

> 
> The more I think about it, the more I wonder if a custom thermal driver
> wouldn't be better for 

Re: [PATCH v3 3/4] hwmon: (lm90) add support to handle IRQ

2013-07-29 Thread Wei Ni
On 07/27/2013 11:02 PM, Jean Delvare wrote:
 Hi Wei,
 
 Sorry for the late reply.
 
 On Fri, 19 Jul 2013 14:41:54 +0800, Wei Ni wrote:
 On 07/18/2013 11:58 PM, Jean Delvare wrote:
 First of all, how is the chip wired on your system? You are using an
 NCT1008, right? Which output of the chip is connected to your interrupt
 line, THERM or ALERT/THERM2? ALERT is normally used for SMBus Alert but
 I suppose it could be used for an interrupt too. THERM and THERM2 OTOH
 are comparator outputs, they stay low as long as the temperature are
 above the therm limits. Reading the status register won't bring them
 back up as I understand it, and contrary to the ALERT output, they
 can't be masked. Won't this result in an interrupt flood if using
 IRQF_TRIGGER_LOW? Have you tested your code already?

 Let me explain why I want to add the IRQ thread.
 In our tegra30 platform, we use an NCT1008, and in our downstream tree,
 it programmed as following:
 1. The pin THERM is connected to the PMIC, we will set the THERM limit
 in the initialization, once the this limit is tripped, it will trigged
 the PMIC, and the PMIC will shutdown the system immediately.
 
 OK, this is what the chip is designed for, good.
 
 2. The pin ALERT/THERM2 is configured as ALERT, and it is connected to
 the interrupt line.
 
 Why don't you use the SMBus alert mechanism then? It is already
 implemented and allows you to reuse the interrupt of the SMBus
 controller.
 
 Of course this is a question for the hardware designers, not you... If
 the board uses a separate interrupt pin for the NCT1008's ALERT output,
 then the driver has to handle that interrupt explicitly, as done in your
 patch.
 
 One thing which worries me though is this explanation in the NCT1008
 datasheet:
 
 The ALERT interrupt latch is not reset by reading the
 status register. It resets when the ALERT output has been
 serviced by the master reading the device address, provided
 the error condition has gone away and the status register flag
 bits are reset.
 
 This suggests that connecting the ALERT output to a separate interrupt
 pin will not work, as the ALERT output will never be de-asserted even
 if the fault conditions are gone and the status register was read. But
 as you say this is how your system is supposed to work, can you explain
 how it can work?

Yes, we had met this problems, to fix this issue, we enabled one-shot
mode in the bottom half handler of nct interrupts to force a
conversion/comparison. This effectively stops repeated nct interrupts.
We will do following things in the IRQ thread:
1. stand by the nct1008. (set configure register bit 6)
2. update the limit value if needed.
3. write to one-shot resister.
4. give hardware necessary time to finish conversion
5. run the nct1008 (clear configure register bit 6)

 
 In the platform init, we will prepare some trip
 temps, such as 20C, 40C,60C, 80C, and we will set 20C to the
 remote-low-temp-limit, and set 40C to remote-high-temp-limit. When the
 temperature exceed this rang, it will interrupt the system, then we will
 update the low/high limit to next rang, for example, if the temp raise
 to 50C, we will set 40C to low-limit, and set 60C to hight-limit, then
 we will run the throttle functions, and update cooling state.
 
 Hu ho, I have seen this kind of design in the past, and I must say I
 don't quite like it. Moving critical thermal management handling to the
 software makes me feel unsafe. System thermal safety should not rely on
 the OS IMHO. It is best handled by the hardware, or if that can't be
 done, by the BIOS. And these limit updates are tricky, they could fail
 and then you're in trouble. Imagine for example that another chip on
 the SMBus hugs the bus and delays or even plain prevents the change of
 limits...
 
 But once again I guess you're not responsible for this.

These trip-temps are not critical temperature, we used these temps to
update cooling states. For the critical-temp, we handle it like my
mentioned in #1.
Yes, it's not safe to rely on the software, so on our tegra114, we will
have soc thermal, which can handle these trip-temps on hardware.

 
 Another problem with this design, even if no such problem happens, is
 that the limits are user(-space)-writable in the lm90 driver, while
 with your design these limits are under full control of the platform
 management code. You definitely don't want the user to come and adjust
 the limits, this could result in overheating of the system. So, do you
 have a plan to optionally switch limits to read only in the lm90
 driver? If not, how do you intend to solve the problem?

Yes, I had consider it, but didn't have good solution yet, because the
lm90 is a generic driver, it's difficult to add a new feature to switch
to read only or not.

 
 The more I think about it, the more I wonder if a custom thermal driver
 wouldn't be better for your case. Exposing a few read-only values to
 user-space through the thermal/hwmon bridge would IMHO make more sense

Re: [PATCH v3 3/4] hwmon: (lm90) add support to handle IRQ

2013-07-29 Thread Jean Delvare
Hi Wei,

On Mon, 29 Jul 2013 18:14:56 +0800, Wei Ni wrote:
 On 07/27/2013 11:02 PM, Jean Delvare wrote:
  On Fri, 19 Jul 2013 14:41:54 +0800, Wei Ni wrote:
  2. The pin ALERT/THERM2 is configured as ALERT, and it is connected to
  the interrupt line.
  
  Why don't you use the SMBus alert mechanism then? It is already
  implemented and allows you to reuse the interrupt of the SMBus
  controller.
  
  Of course this is a question for the hardware designers, not you... If
  the board uses a separate interrupt pin for the NCT1008's ALERT output,
  then the driver has to handle that interrupt explicitly, as done in your
  patch.
  
  One thing which worries me though is this explanation in the NCT1008
  datasheet:
  
  The ALERT interrupt latch is not reset by reading the
  status register. It resets when the ALERT output has been
  serviced by the master reading the device address, provided
  the error condition has gone away and the status register flag
  bits are reset.
  
  This suggests that connecting the ALERT output to a separate interrupt
  pin will not work, as the ALERT output will never be de-asserted even
  if the fault conditions are gone and the status register was read. But
  as you say this is how your system is supposed to work, can you explain
  how it can work?
 
 Yes, we had met this problems, to fix this issue, we enabled one-shot
 mode in the bottom half handler of nct interrupts to force a
 conversion/comparison. This effectively stops repeated nct interrupts.
 We will do following things in the IRQ thread:
 1. stand by the nct1008. (set configure register bit 6)
 2. update the limit value if needed.
 3. write to one-shot resister.
 4. give hardware necessary time to finish conversion
 5. run the nct1008 (clear configure register bit 6)

Doh, this is so ugly :(

Why don't you configure the pin as THERM2 instead of ALERT then? I'd
expect this to make things easier.

 (...)
 These trip-temps are not critical temperature, we used these temps to
 update cooling states. For the critical-temp, we handle it like my
 mentioned in #1.

I understand. But even if these interrupts are only used for managing
cooling states, a misbehavior could still have annoying consequences,
such as causing the thermal shutdown to trigger when this could have
been avoided, or throttling to stay enabled even though the system has
cooled down enough.

 Yes, it's not safe to rely on the software, so on our tegra114, we will
 have soc thermal, which can handle these trip-temps on hardware.

OK, good :)

 (...)
 Yes, absolutely agree, we can't add any private codes to the generic driver.
 I had talked about it with Durgadoss in his [PATCH] Thermal Framework
 Enhancements series, and in his v3 series, it introduced threshold
 concept, which used to set limit value, and the framework is notified
 about this interrupt to take appropriate action. But this function
 still didn't be completed yet.
 I think the thermal fw can expose callback like thermal_zone-alert, and
 in the irq_thread, we can notify the thermal fw to call this alert
 callback function, then the platform code can do anything in this callback.

I didn't follow the discussions closely, but something like this would
be needed, yes.

-- 
Jean Delvare
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3 3/4] hwmon: (lm90) add support to handle IRQ

2013-07-27 Thread Jean Delvare
Hi Wei,

Sorry for the late reply.

On Fri, 19 Jul 2013 14:41:54 +0800, Wei Ni wrote:
> On 07/18/2013 11:58 PM, Jean Delvare wrote:
> > First of all, how is the chip wired on your system? You are using an
> > NCT1008, right? Which output of the chip is connected to your interrupt
> > line, THERM or ALERT/THERM2? ALERT is normally used for SMBus Alert but
> > I suppose it could be used for an interrupt too. THERM and THERM2 OTOH
> > are comparator outputs, they stay low as long as the temperature are
> > above the therm limits. Reading the status register won't bring them
> > back up as I understand it, and contrary to the ALERT output, they
> > can't be masked. Won't this result in an interrupt flood if using
> > IRQF_TRIGGER_LOW? Have you tested your code already?
> 
> Let me explain why I want to add the IRQ thread.
> In our tegra30 platform, we use an NCT1008, and in our downstream tree,
> it programmed as following:
> 1. The pin THERM is connected to the PMIC, we will set the THERM limit
> in the initialization, once the this limit is tripped, it will trigged
> the PMIC, and the PMIC will shutdown the system immediately.

OK, this is what the chip is designed for, good.

> 2. The pin ALERT/THERM2 is configured as ALERT, and it is connected to
> the interrupt line.

Why don't you use the SMBus alert mechanism then? It is already
implemented and allows you to reuse the interrupt of the SMBus
controller.

Of course this is a question for the hardware designers, not you... If
the board uses a separate interrupt pin for the NCT1008's ALERT output,
then the driver has to handle that interrupt explicitly, as done in your
patch.

One thing which worries me though is this explanation in the NCT1008
datasheet:

"The ALERT interrupt latch is not reset by reading the
status register. It resets when the ALERT output has been
serviced by the master reading the device address, provided
the error condition has gone away and the status register flag
bits are reset."

This suggests that connecting the ALERT output to a separate interrupt
pin will not work, as the ALERT output will never be de-asserted even
if the fault conditions are gone and the status register was read. But
as you say this is how your system is supposed to work, can you explain
how it can work?

> In the platform init, we will prepare some trip
> temps, such as 20C, 40C,60C, 80C, and we will set 20C to the
> remote-low-temp-limit, and set 40C to remote-high-temp-limit. When the
> temperature exceed this rang, it will interrupt the system, then we will
> update the low/high limit to next rang, for example, if the temp raise
> to 50C, we will set 40C to low-limit, and set 60C to hight-limit, then
> we will run the throttle functions, and update cooling state.

Hu ho, I have seen this kind of design in the past, and I must say I
don't quite like it. Moving critical thermal management handling to the
software makes me feel unsafe. System thermal safety should not rely on
the OS IMHO. It is best handled by the hardware, or if that can't be
done, by the BIOS. And these limit updates are tricky, they could fail
and then you're in trouble. Imagine for example that another chip on
the SMBus hugs the bus and delays or even plain prevents the change of
limits...

But once again I guess you're not responsible for this.

Another problem with this design, even if no such problem happens, is
that the limits are user(-space)-writable in the lm90 driver, while
with your design these limits are under full control of the platform
management code. You definitely don't want the user to come and adjust
the limits, this could result in overheating of the system. So, do you
have a plan to optionally switch limits to read only in the lm90
driver? If not, how do you intend to solve the problem?

The more I think about it, the more I wonder if a custom thermal driver
wouldn't be better for your case. Exposing a few read-only values to
user-space through the thermal/hwmon bridge would IMHO make more sense
than cluttering the lm90 driver with conditionals to limit what it
exposes to user-space.

> We wish to upstream these codes, but as you know, it's difficult to use
> current lm90.c and thermal framework to implement it, and these codes
> related many other things, such as throttling cpufreq, other clock freq.
> So at first, I want to update the lm90.c, so that I can add it to the
> thermal fw and use interrupt handler easily. And at the same time I
> followed the thermal framework thread, there has new infrastructure
> posted, which is more easy to add lm90 to thermal fw.

Don't get me wrong, I'm very happy with your efforts to upstream your
code, this is very welcome. And I am also very happy that you split it
into small chunks which are easier to review. But I also need to know
the big picture to see where you're ultimately going.

> > (...)
> > For a real system, if the THERM output is connected to an interrupt line
> > (instead of directly to a fan controller) I would 

Re: [PATCH v3 3/4] hwmon: (lm90) add support to handle IRQ

2013-07-27 Thread Jean Delvare
Hi Wei,

Sorry for the late reply.

On Fri, 19 Jul 2013 14:41:54 +0800, Wei Ni wrote:
 On 07/18/2013 11:58 PM, Jean Delvare wrote:
  First of all, how is the chip wired on your system? You are using an
  NCT1008, right? Which output of the chip is connected to your interrupt
  line, THERM or ALERT/THERM2? ALERT is normally used for SMBus Alert but
  I suppose it could be used for an interrupt too. THERM and THERM2 OTOH
  are comparator outputs, they stay low as long as the temperature are
  above the therm limits. Reading the status register won't bring them
  back up as I understand it, and contrary to the ALERT output, they
  can't be masked. Won't this result in an interrupt flood if using
  IRQF_TRIGGER_LOW? Have you tested your code already?
 
 Let me explain why I want to add the IRQ thread.
 In our tegra30 platform, we use an NCT1008, and in our downstream tree,
 it programmed as following:
 1. The pin THERM is connected to the PMIC, we will set the THERM limit
 in the initialization, once the this limit is tripped, it will trigged
 the PMIC, and the PMIC will shutdown the system immediately.

OK, this is what the chip is designed for, good.

 2. The pin ALERT/THERM2 is configured as ALERT, and it is connected to
 the interrupt line.

Why don't you use the SMBus alert mechanism then? It is already
implemented and allows you to reuse the interrupt of the SMBus
controller.

Of course this is a question for the hardware designers, not you... If
the board uses a separate interrupt pin for the NCT1008's ALERT output,
then the driver has to handle that interrupt explicitly, as done in your
patch.

One thing which worries me though is this explanation in the NCT1008
datasheet:

The ALERT interrupt latch is not reset by reading the
status register. It resets when the ALERT output has been
serviced by the master reading the device address, provided
the error condition has gone away and the status register flag
bits are reset.

This suggests that connecting the ALERT output to a separate interrupt
pin will not work, as the ALERT output will never be de-asserted even
if the fault conditions are gone and the status register was read. But
as you say this is how your system is supposed to work, can you explain
how it can work?

 In the platform init, we will prepare some trip
 temps, such as 20C, 40C,60C, 80C, and we will set 20C to the
 remote-low-temp-limit, and set 40C to remote-high-temp-limit. When the
 temperature exceed this rang, it will interrupt the system, then we will
 update the low/high limit to next rang, for example, if the temp raise
 to 50C, we will set 40C to low-limit, and set 60C to hight-limit, then
 we will run the throttle functions, and update cooling state.

Hu ho, I have seen this kind of design in the past, and I must say I
don't quite like it. Moving critical thermal management handling to the
software makes me feel unsafe. System thermal safety should not rely on
the OS IMHO. It is best handled by the hardware, or if that can't be
done, by the BIOS. And these limit updates are tricky, they could fail
and then you're in trouble. Imagine for example that another chip on
the SMBus hugs the bus and delays or even plain prevents the change of
limits...

But once again I guess you're not responsible for this.

Another problem with this design, even if no such problem happens, is
that the limits are user(-space)-writable in the lm90 driver, while
with your design these limits are under full control of the platform
management code. You definitely don't want the user to come and adjust
the limits, this could result in overheating of the system. So, do you
have a plan to optionally switch limits to read only in the lm90
driver? If not, how do you intend to solve the problem?

The more I think about it, the more I wonder if a custom thermal driver
wouldn't be better for your case. Exposing a few read-only values to
user-space through the thermal/hwmon bridge would IMHO make more sense
than cluttering the lm90 driver with conditionals to limit what it
exposes to user-space.

 We wish to upstream these codes, but as you know, it's difficult to use
 current lm90.c and thermal framework to implement it, and these codes
 related many other things, such as throttling cpufreq, other clock freq.
 So at first, I want to update the lm90.c, so that I can add it to the
 thermal fw and use interrupt handler easily. And at the same time I
 followed the thermal framework thread, there has new infrastructure
 posted, which is more easy to add lm90 to thermal fw.

Don't get me wrong, I'm very happy with your efforts to upstream your
code, this is very welcome. And I am also very happy that you split it
into small chunks which are easier to review. But I also need to know
the big picture to see where you're ultimately going.

  (...)
  For a real system, if the THERM output is connected to an interrupt line
  (instead of directly to a fan controller) I would expect the platform
  to provide a callback to handle 

Re: [PATCH v3 3/4] hwmon: (lm90) add support to handle IRQ

2013-07-24 Thread Jean Delvare
Hi Wei,

On Wed, 24 Jul 2013 15:46:46 +0800, Wei Ni wrote:
> Do you have any more suggestions for this series, if no, I will prepare
> v4 patches.

Sorry, I have a couple more replies "in progress" but had friends at
home last week end so I did not have the time to finish and send them.
I'll try to do that today, thanks for your patience.

-- 
Jean Delvare
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3 3/4] hwmon: (lm90) add support to handle IRQ

2013-07-24 Thread Wei Ni
Hi, Jean

On 07/19/2013 02:41 PM, Wei Ni wrote:
> On 07/18/2013 11:58 PM, Jean Delvare wrote:
>> Hi Wei,
>>
>> On Fri, 12 Jul 2013 15:48:06 +0800, Wei Ni wrote:
>>> When the temperature exceed the limit range value,
>>> the driver can handle the interrupt.
>>>
>>> Signed-off-by: Wei Ni 
>>> ---
>>>  drivers/hwmon/lm90.c |   24 
>>>  1 file changed, 24 insertions(+)
>>>
>>> diff --git a/drivers/hwmon/lm90.c b/drivers/hwmon/lm90.c
>>> index c90037f..1cc3d19 100644
>>> --- a/drivers/hwmon/lm90.c
>>> +++ b/drivers/hwmon/lm90.c
>>> @@ -89,6 +89,7 @@
>>>  #include 
>>>  #include 
>>>  #include 
>>> +#include 
>>>  
>>>  /*
>>>   * Addresses to scan
>>> @@ -1460,6 +1461,17 @@ static bool lm90_is_tripped(struct i2c_client 
>>> *client)
>>> return true;
>>>  }
>>>  
>>> +static irqreturn_t lm90_irq_thread(int irq, void *dev_id)
>>> +{
>>> +   struct lm90_data *data = dev_id;
>>> +   struct i2c_client *client = to_i2c_client(data->hwmon_dev->parent);
>>
>> Why are you passing data as the dev_id in the first place, instead of
>> client? It's easier to get the data when you have the client
>> (i2c_get_clientdata) than the other way around.
> 
> Oh, I'm stupid :)
> I will pass the client.
> 
>>
>>> +
>>> +   if (lm90_is_tripped(client))
>>> +   return IRQ_HANDLED;
>>> +   else
>>> +   return IRQ_NONE;
>>> +}
>>> +
>>>  static int lm90_probe(struct i2c_client *client,
>>>   const struct i2c_device_id *id)
>>>  {
>>> @@ -1536,6 +1548,18 @@ static int lm90_probe(struct i2c_client *client,
>>> goto exit_remove_files;
>>> }
>>>  
>>> +   if (client->irq >= 0) {
>>
>> I though you had agreed to just check for (client->irq)?
> 
> Oh, yes, I forgot to change it, thanks, I will update it.
> 
>>
>>> +   dev_dbg(dev, "lm90 IRQ: %d\n", client->irq);
>>
>> The "lm90" is redundant, dev_dbg will use the chip name as a prefix.
> 
> Ok, I will remove it.
> 
>>
>>> +   err = devm_request_threaded_irq(dev, client->irq,
>>> +   NULL, lm90_irq_thread,
>>> +   IRQF_TRIGGER_LOW | IRQF_ONESHOT,
>>> +   "lm90", data);
>>> +   if (err < 0) {
>>> +   dev_err(dev, "cannot request interrupt\n");
>>
>> You should include the IRQ number in the error message, it is useful
>> for investigating the issue. Not everyone will have the debugging
>> message above enabled.
> 
> Yes, you are right, I will add the IRQ number.
> 
>>
>>> +   goto exit_remove_files;
>>> +   }
>>> +   }
>>> +
>>> return 0;
>>>  
>>>  exit_remove_files:
>>
>> That's it for the code. Now I'm not sure I understand what you are
>> trying to do and what this is all good for.
>>
>> First of all, how is the chip wired on your system? You are using an
>> NCT1008, right? Which output of the chip is connected to your interrupt
>> line, THERM or ALERT/THERM2? ALERT is normally used for SMBus Alert but
>> I suppose it could be used for an interrupt too. THERM and THERM2 OTOH
>> are comparator outputs, they stay low as long as the temperature are
>> above the therm limits. Reading the status register won't bring them
>> back up as I understand it, and contrary to the ALERT output, they
>> can't be masked. Won't this result in an interrupt flood if using
>> IRQF_TRIGGER_LOW? Have you tested your code already?
> 
> Let me explain why I want to add the IRQ thread.
> In our tegra30 platform, we use an NCT1008, and in our downstream tree,
> it programmed as following:
> 1. The pin THERM is connected to the PMIC, we will set the THERM limit
> in the initialization, once the this limit is tripped, it will trigged
> the PMIC, and the PMIC will shutdown the system immediately.
> 2. The pin ALERT/THERM2 is configured as ALERT, and it is connected to
> the interrupt line. In the platform init, we will prepare some trip
> temps, such as 20C, 40C,60C, 80C, and we will set 20C to the
> remote-low-temp-limit, and set 40C to remote-high-temp-limit. When the
> temperature exceed this rang, it will interrupt the system, then we will
> update the low/high limit to next rang, for example, if the temp raise
> to 50C, we will set 40C to low-limit, and set 60C to hight-limit, then
> we will run the throttle functions, and update cooling state.
> 
> We wish to upstream these codes, but as you know, it's difficult to use
> current lm90.c and thermal framework to implement it, and these codes
> related many other things, such as throttling cpufreq, other clock freq.
> So at first, I want to update the lm90.c, so that I can add it to the
> thermal fw and use interrupt handler easily. And at the same time I
> followed the thermal framework thread, there has new infrastructure
> posted, which is more easy to add lm90 to thermal fw.
> 
>>
>> Also I don't think just logging an error message is the right thing to
>> do in the case of overheating. The code to handle 

Re: [PATCH v3 3/4] hwmon: (lm90) add support to handle IRQ

2013-07-24 Thread Wei Ni
Hi, Jean

On 07/19/2013 02:41 PM, Wei Ni wrote:
 On 07/18/2013 11:58 PM, Jean Delvare wrote:
 Hi Wei,

 On Fri, 12 Jul 2013 15:48:06 +0800, Wei Ni wrote:
 When the temperature exceed the limit range value,
 the driver can handle the interrupt.

 Signed-off-by: Wei Ni w...@nvidia.com
 ---
  drivers/hwmon/lm90.c |   24 
  1 file changed, 24 insertions(+)

 diff --git a/drivers/hwmon/lm90.c b/drivers/hwmon/lm90.c
 index c90037f..1cc3d19 100644
 --- a/drivers/hwmon/lm90.c
 +++ b/drivers/hwmon/lm90.c
 @@ -89,6 +89,7 @@
  #include linux/err.h
  #include linux/mutex.h
  #include linux/sysfs.h
 +#include linux/interrupt.h
  
  /*
   * Addresses to scan
 @@ -1460,6 +1461,17 @@ static bool lm90_is_tripped(struct i2c_client 
 *client)
 return true;
  }
  
 +static irqreturn_t lm90_irq_thread(int irq, void *dev_id)
 +{
 +   struct lm90_data *data = dev_id;
 +   struct i2c_client *client = to_i2c_client(data-hwmon_dev-parent);

 Why are you passing data as the dev_id in the first place, instead of
 client? It's easier to get the data when you have the client
 (i2c_get_clientdata) than the other way around.
 
 Oh, I'm stupid :)
 I will pass the client.
 

 +
 +   if (lm90_is_tripped(client))
 +   return IRQ_HANDLED;
 +   else
 +   return IRQ_NONE;
 +}
 +
  static int lm90_probe(struct i2c_client *client,
   const struct i2c_device_id *id)
  {
 @@ -1536,6 +1548,18 @@ static int lm90_probe(struct i2c_client *client,
 goto exit_remove_files;
 }
  
 +   if (client-irq = 0) {

 I though you had agreed to just check for (client-irq)?
 
 Oh, yes, I forgot to change it, thanks, I will update it.
 

 +   dev_dbg(dev, lm90 IRQ: %d\n, client-irq);

 The lm90 is redundant, dev_dbg will use the chip name as a prefix.
 
 Ok, I will remove it.
 

 +   err = devm_request_threaded_irq(dev, client-irq,
 +   NULL, lm90_irq_thread,
 +   IRQF_TRIGGER_LOW | IRQF_ONESHOT,
 +   lm90, data);
 +   if (err  0) {
 +   dev_err(dev, cannot request interrupt\n);

 You should include the IRQ number in the error message, it is useful
 for investigating the issue. Not everyone will have the debugging
 message above enabled.
 
 Yes, you are right, I will add the IRQ number.
 

 +   goto exit_remove_files;
 +   }
 +   }
 +
 return 0;
  
  exit_remove_files:

 That's it for the code. Now I'm not sure I understand what you are
 trying to do and what this is all good for.

 First of all, how is the chip wired on your system? You are using an
 NCT1008, right? Which output of the chip is connected to your interrupt
 line, THERM or ALERT/THERM2? ALERT is normally used for SMBus Alert but
 I suppose it could be used for an interrupt too. THERM and THERM2 OTOH
 are comparator outputs, they stay low as long as the temperature are
 above the therm limits. Reading the status register won't bring them
 back up as I understand it, and contrary to the ALERT output, they
 can't be masked. Won't this result in an interrupt flood if using
 IRQF_TRIGGER_LOW? Have you tested your code already?
 
 Let me explain why I want to add the IRQ thread.
 In our tegra30 platform, we use an NCT1008, and in our downstream tree,
 it programmed as following:
 1. The pin THERM is connected to the PMIC, we will set the THERM limit
 in the initialization, once the this limit is tripped, it will trigged
 the PMIC, and the PMIC will shutdown the system immediately.
 2. The pin ALERT/THERM2 is configured as ALERT, and it is connected to
 the interrupt line. In the platform init, we will prepare some trip
 temps, such as 20C, 40C,60C, 80C, and we will set 20C to the
 remote-low-temp-limit, and set 40C to remote-high-temp-limit. When the
 temperature exceed this rang, it will interrupt the system, then we will
 update the low/high limit to next rang, for example, if the temp raise
 to 50C, we will set 40C to low-limit, and set 60C to hight-limit, then
 we will run the throttle functions, and update cooling state.
 
 We wish to upstream these codes, but as you know, it's difficult to use
 current lm90.c and thermal framework to implement it, and these codes
 related many other things, such as throttling cpufreq, other clock freq.
 So at first, I want to update the lm90.c, so that I can add it to the
 thermal fw and use interrupt handler easily. And at the same time I
 followed the thermal framework thread, there has new infrastructure
 posted, which is more easy to add lm90 to thermal fw.
 

 Also I don't think just logging an error message is the right thing to
 do in the case of overheating. The code to handle SMBus alerts is from
 me, and it does indeed just log the problems, but it was really only
 meant as a proof of concept when I first added support for SMBus Alert.
 Today we could do better than this, starting with issuing sysfs

Re: [PATCH v3 3/4] hwmon: (lm90) add support to handle IRQ

2013-07-24 Thread Jean Delvare
Hi Wei,

On Wed, 24 Jul 2013 15:46:46 +0800, Wei Ni wrote:
 Do you have any more suggestions for this series, if no, I will prepare
 v4 patches.

Sorry, I have a couple more replies in progress but had friends at
home last week end so I did not have the time to finish and send them.
I'll try to do that today, thanks for your patience.

-- 
Jean Delvare
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3 3/4] hwmon: (lm90) add support to handle IRQ

2013-07-19 Thread Wei Ni
On 07/18/2013 11:58 PM, Jean Delvare wrote:
> Hi Wei,
> 
> On Fri, 12 Jul 2013 15:48:06 +0800, Wei Ni wrote:
>> When the temperature exceed the limit range value,
>> the driver can handle the interrupt.
>>
>> Signed-off-by: Wei Ni 
>> ---
>>  drivers/hwmon/lm90.c |   24 
>>  1 file changed, 24 insertions(+)
>>
>> diff --git a/drivers/hwmon/lm90.c b/drivers/hwmon/lm90.c
>> index c90037f..1cc3d19 100644
>> --- a/drivers/hwmon/lm90.c
>> +++ b/drivers/hwmon/lm90.c
>> @@ -89,6 +89,7 @@
>>  #include 
>>  #include 
>>  #include 
>> +#include 
>>  
>>  /*
>>   * Addresses to scan
>> @@ -1460,6 +1461,17 @@ static bool lm90_is_tripped(struct i2c_client *client)
>>  return true;
>>  }
>>  
>> +static irqreturn_t lm90_irq_thread(int irq, void *dev_id)
>> +{
>> +struct lm90_data *data = dev_id;
>> +struct i2c_client *client = to_i2c_client(data->hwmon_dev->parent);
> 
> Why are you passing data as the dev_id in the first place, instead of
> client? It's easier to get the data when you have the client
> (i2c_get_clientdata) than the other way around.

Oh, I'm stupid :)
I will pass the client.

> 
>> +
>> +if (lm90_is_tripped(client))
>> +return IRQ_HANDLED;
>> +else
>> +return IRQ_NONE;
>> +}
>> +
>>  static int lm90_probe(struct i2c_client *client,
>>const struct i2c_device_id *id)
>>  {
>> @@ -1536,6 +1548,18 @@ static int lm90_probe(struct i2c_client *client,
>>  goto exit_remove_files;
>>  }
>>  
>> +if (client->irq >= 0) {
> 
> I though you had agreed to just check for (client->irq)?

Oh, yes, I forgot to change it, thanks, I will update it.

> 
>> +dev_dbg(dev, "lm90 IRQ: %d\n", client->irq);
> 
> The "lm90" is redundant, dev_dbg will use the chip name as a prefix.

Ok, I will remove it.

> 
>> +err = devm_request_threaded_irq(dev, client->irq,
>> +NULL, lm90_irq_thread,
>> +IRQF_TRIGGER_LOW | IRQF_ONESHOT,
>> +"lm90", data);
>> +if (err < 0) {
>> +dev_err(dev, "cannot request interrupt\n");
> 
> You should include the IRQ number in the error message, it is useful
> for investigating the issue. Not everyone will have the debugging
> message above enabled.

Yes, you are right, I will add the IRQ number.

> 
>> +goto exit_remove_files;
>> +}
>> +}
>> +
>>  return 0;
>>  
>>  exit_remove_files:
> 
> That's it for the code. Now I'm not sure I understand what you are
> trying to do and what this is all good for.
> 
> First of all, how is the chip wired on your system? You are using an
> NCT1008, right? Which output of the chip is connected to your interrupt
> line, THERM or ALERT/THERM2? ALERT is normally used for SMBus Alert but
> I suppose it could be used for an interrupt too. THERM and THERM2 OTOH
> are comparator outputs, they stay low as long as the temperature are
> above the therm limits. Reading the status register won't bring them
> back up as I understand it, and contrary to the ALERT output, they
> can't be masked. Won't this result in an interrupt flood if using
> IRQF_TRIGGER_LOW? Have you tested your code already?

Let me explain why I want to add the IRQ thread.
In our tegra30 platform, we use an NCT1008, and in our downstream tree,
it programmed as following:
1. The pin THERM is connected to the PMIC, we will set the THERM limit
in the initialization, once the this limit is tripped, it will trigged
the PMIC, and the PMIC will shutdown the system immediately.
2. The pin ALERT/THERM2 is configured as ALERT, and it is connected to
the interrupt line. In the platform init, we will prepare some trip
temps, such as 20C, 40C,60C, 80C, and we will set 20C to the
remote-low-temp-limit, and set 40C to remote-high-temp-limit. When the
temperature exceed this rang, it will interrupt the system, then we will
update the low/high limit to next rang, for example, if the temp raise
to 50C, we will set 40C to low-limit, and set 60C to hight-limit, then
we will run the throttle functions, and update cooling state.

We wish to upstream these codes, but as you know, it's difficult to use
current lm90.c and thermal framework to implement it, and these codes
related many other things, such as throttling cpufreq, other clock freq.
So at first, I want to update the lm90.c, so that I can add it to the
thermal fw and use interrupt handler easily. And at the same time I
followed the thermal framework thread, there has new infrastructure
posted, which is more easy to add lm90 to thermal fw.

> 
> Also I don't think just logging an error message is the right thing to
> do in the case of overheating. The code to handle SMBus alerts is from
> me, and it does indeed just log the problems, but it was really only
> meant as a proof of concept when I first added support for SMBus Alert.
> Today we could do 

Re: [PATCH v3 3/4] hwmon: (lm90) add support to handle IRQ

2013-07-19 Thread Wei Ni
On 07/18/2013 11:58 PM, Jean Delvare wrote:
 Hi Wei,
 
 On Fri, 12 Jul 2013 15:48:06 +0800, Wei Ni wrote:
 When the temperature exceed the limit range value,
 the driver can handle the interrupt.

 Signed-off-by: Wei Ni w...@nvidia.com
 ---
  drivers/hwmon/lm90.c |   24 
  1 file changed, 24 insertions(+)

 diff --git a/drivers/hwmon/lm90.c b/drivers/hwmon/lm90.c
 index c90037f..1cc3d19 100644
 --- a/drivers/hwmon/lm90.c
 +++ b/drivers/hwmon/lm90.c
 @@ -89,6 +89,7 @@
  #include linux/err.h
  #include linux/mutex.h
  #include linux/sysfs.h
 +#include linux/interrupt.h
  
  /*
   * Addresses to scan
 @@ -1460,6 +1461,17 @@ static bool lm90_is_tripped(struct i2c_client *client)
  return true;
  }
  
 +static irqreturn_t lm90_irq_thread(int irq, void *dev_id)
 +{
 +struct lm90_data *data = dev_id;
 +struct i2c_client *client = to_i2c_client(data-hwmon_dev-parent);
 
 Why are you passing data as the dev_id in the first place, instead of
 client? It's easier to get the data when you have the client
 (i2c_get_clientdata) than the other way around.

Oh, I'm stupid :)
I will pass the client.

 
 +
 +if (lm90_is_tripped(client))
 +return IRQ_HANDLED;
 +else
 +return IRQ_NONE;
 +}
 +
  static int lm90_probe(struct i2c_client *client,
const struct i2c_device_id *id)
  {
 @@ -1536,6 +1548,18 @@ static int lm90_probe(struct i2c_client *client,
  goto exit_remove_files;
  }
  
 +if (client-irq = 0) {
 
 I though you had agreed to just check for (client-irq)?

Oh, yes, I forgot to change it, thanks, I will update it.

 
 +dev_dbg(dev, lm90 IRQ: %d\n, client-irq);
 
 The lm90 is redundant, dev_dbg will use the chip name as a prefix.

Ok, I will remove it.

 
 +err = devm_request_threaded_irq(dev, client-irq,
 +NULL, lm90_irq_thread,
 +IRQF_TRIGGER_LOW | IRQF_ONESHOT,
 +lm90, data);
 +if (err  0) {
 +dev_err(dev, cannot request interrupt\n);
 
 You should include the IRQ number in the error message, it is useful
 for investigating the issue. Not everyone will have the debugging
 message above enabled.

Yes, you are right, I will add the IRQ number.

 
 +goto exit_remove_files;
 +}
 +}
 +
  return 0;
  
  exit_remove_files:
 
 That's it for the code. Now I'm not sure I understand what you are
 trying to do and what this is all good for.
 
 First of all, how is the chip wired on your system? You are using an
 NCT1008, right? Which output of the chip is connected to your interrupt
 line, THERM or ALERT/THERM2? ALERT is normally used for SMBus Alert but
 I suppose it could be used for an interrupt too. THERM and THERM2 OTOH
 are comparator outputs, they stay low as long as the temperature are
 above the therm limits. Reading the status register won't bring them
 back up as I understand it, and contrary to the ALERT output, they
 can't be masked. Won't this result in an interrupt flood if using
 IRQF_TRIGGER_LOW? Have you tested your code already?

Let me explain why I want to add the IRQ thread.
In our tegra30 platform, we use an NCT1008, and in our downstream tree,
it programmed as following:
1. The pin THERM is connected to the PMIC, we will set the THERM limit
in the initialization, once the this limit is tripped, it will trigged
the PMIC, and the PMIC will shutdown the system immediately.
2. The pin ALERT/THERM2 is configured as ALERT, and it is connected to
the interrupt line. In the platform init, we will prepare some trip
temps, such as 20C, 40C,60C, 80C, and we will set 20C to the
remote-low-temp-limit, and set 40C to remote-high-temp-limit. When the
temperature exceed this rang, it will interrupt the system, then we will
update the low/high limit to next rang, for example, if the temp raise
to 50C, we will set 40C to low-limit, and set 60C to hight-limit, then
we will run the throttle functions, and update cooling state.

We wish to upstream these codes, but as you know, it's difficult to use
current lm90.c and thermal framework to implement it, and these codes
related many other things, such as throttling cpufreq, other clock freq.
So at first, I want to update the lm90.c, so that I can add it to the
thermal fw and use interrupt handler easily. And at the same time I
followed the thermal framework thread, there has new infrastructure
posted, which is more easy to add lm90 to thermal fw.

 
 Also I don't think just logging an error message is the right thing to
 do in the case of overheating. The code to handle SMBus alerts is from
 me, and it does indeed just log the problems, but it was really only
 meant as a proof of concept when I first added support for SMBus Alert.
 Today we could do better than this, starting with issuing sysfs
 notifications to the relevant alarm files (several 

Re: [PATCH v3 3/4] hwmon: (lm90) add support to handle IRQ

2013-07-18 Thread Jean Delvare
Hi Wei,

On Fri, 12 Jul 2013 15:48:06 +0800, Wei Ni wrote:
> When the temperature exceed the limit range value,
> the driver can handle the interrupt.
> 
> Signed-off-by: Wei Ni 
> ---
>  drivers/hwmon/lm90.c |   24 
>  1 file changed, 24 insertions(+)
> 
> diff --git a/drivers/hwmon/lm90.c b/drivers/hwmon/lm90.c
> index c90037f..1cc3d19 100644
> --- a/drivers/hwmon/lm90.c
> +++ b/drivers/hwmon/lm90.c
> @@ -89,6 +89,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  /*
>   * Addresses to scan
> @@ -1460,6 +1461,17 @@ static bool lm90_is_tripped(struct i2c_client *client)
>   return true;
>  }
>  
> +static irqreturn_t lm90_irq_thread(int irq, void *dev_id)
> +{
> + struct lm90_data *data = dev_id;
> + struct i2c_client *client = to_i2c_client(data->hwmon_dev->parent);

Why are you passing data as the dev_id in the first place, instead of
client? It's easier to get the data when you have the client
(i2c_get_clientdata) than the other way around.

> +
> + if (lm90_is_tripped(client))
> + return IRQ_HANDLED;
> + else
> + return IRQ_NONE;
> +}
> +
>  static int lm90_probe(struct i2c_client *client,
> const struct i2c_device_id *id)
>  {
> @@ -1536,6 +1548,18 @@ static int lm90_probe(struct i2c_client *client,
>   goto exit_remove_files;
>   }
>  
> + if (client->irq >= 0) {

I though you had agreed to just check for (client->irq)?

> + dev_dbg(dev, "lm90 IRQ: %d\n", client->irq);

The "lm90" is redundant, dev_dbg will use the chip name as a prefix.

> + err = devm_request_threaded_irq(dev, client->irq,
> + NULL, lm90_irq_thread,
> + IRQF_TRIGGER_LOW | IRQF_ONESHOT,
> + "lm90", data);
> + if (err < 0) {
> + dev_err(dev, "cannot request interrupt\n");

You should include the IRQ number in the error message, it is useful
for investigating the issue. Not everyone will have the debugging
message above enabled.

> + goto exit_remove_files;
> + }
> + }
> +
>   return 0;
>  
>  exit_remove_files:

That's it for the code. Now I'm not sure I understand what you are
trying to do and what this is all good for.

First of all, how is the chip wired on your system? You are using an
NCT1008, right? Which output of the chip is connected to your interrupt
line, THERM or ALERT/THERM2? ALERT is normally used for SMBus Alert but
I suppose it could be used for an interrupt too. THERM and THERM2 OTOH
are comparator outputs, they stay low as long as the temperature are
above the therm limits. Reading the status register won't bring them
back up as I understand it, and contrary to the ALERT output, they
can't be masked. Won't this result in an interrupt flood if using
IRQF_TRIGGER_LOW? Have you tested your code already?

Also I don't think just logging an error message is the right thing to
do in the case of overheating. The code to handle SMBus alerts is from
me, and it does indeed just log the problems, but it was really only
meant as a proof of concept when I first added support for SMBus Alert.
Today we could do better than this, starting with issuing sysfs
notifications to the relevant alarm files (several other hwmon drivers
are doing that already.)

For a real system, if the THERM output is connected to an interrupt line
(instead of directly to a fan controller) I would expect the platform
to provide a callback to handle thermal events and take whatever
appropriate measure is needed (e.g. throttling.) Just logging the
problem won't save the system, by the time someone looks at the log it
might be too late.

-- 
Jean Delvare
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v3 3/4] hwmon: (lm90) add support to handle IRQ

2013-07-18 Thread Jean Delvare
Hi Wei,

On Fri, 12 Jul 2013 15:48:06 +0800, Wei Ni wrote:
 When the temperature exceed the limit range value,
 the driver can handle the interrupt.
 
 Signed-off-by: Wei Ni w...@nvidia.com
 ---
  drivers/hwmon/lm90.c |   24 
  1 file changed, 24 insertions(+)
 
 diff --git a/drivers/hwmon/lm90.c b/drivers/hwmon/lm90.c
 index c90037f..1cc3d19 100644
 --- a/drivers/hwmon/lm90.c
 +++ b/drivers/hwmon/lm90.c
 @@ -89,6 +89,7 @@
  #include linux/err.h
  #include linux/mutex.h
  #include linux/sysfs.h
 +#include linux/interrupt.h
  
  /*
   * Addresses to scan
 @@ -1460,6 +1461,17 @@ static bool lm90_is_tripped(struct i2c_client *client)
   return true;
  }
  
 +static irqreturn_t lm90_irq_thread(int irq, void *dev_id)
 +{
 + struct lm90_data *data = dev_id;
 + struct i2c_client *client = to_i2c_client(data-hwmon_dev-parent);

Why are you passing data as the dev_id in the first place, instead of
client? It's easier to get the data when you have the client
(i2c_get_clientdata) than the other way around.

 +
 + if (lm90_is_tripped(client))
 + return IRQ_HANDLED;
 + else
 + return IRQ_NONE;
 +}
 +
  static int lm90_probe(struct i2c_client *client,
 const struct i2c_device_id *id)
  {
 @@ -1536,6 +1548,18 @@ static int lm90_probe(struct i2c_client *client,
   goto exit_remove_files;
   }
  
 + if (client-irq = 0) {

I though you had agreed to just check for (client-irq)?

 + dev_dbg(dev, lm90 IRQ: %d\n, client-irq);

The lm90 is redundant, dev_dbg will use the chip name as a prefix.

 + err = devm_request_threaded_irq(dev, client-irq,
 + NULL, lm90_irq_thread,
 + IRQF_TRIGGER_LOW | IRQF_ONESHOT,
 + lm90, data);
 + if (err  0) {
 + dev_err(dev, cannot request interrupt\n);

You should include the IRQ number in the error message, it is useful
for investigating the issue. Not everyone will have the debugging
message above enabled.

 + goto exit_remove_files;
 + }
 + }
 +
   return 0;
  
  exit_remove_files:

That's it for the code. Now I'm not sure I understand what you are
trying to do and what this is all good for.

First of all, how is the chip wired on your system? You are using an
NCT1008, right? Which output of the chip is connected to your interrupt
line, THERM or ALERT/THERM2? ALERT is normally used for SMBus Alert but
I suppose it could be used for an interrupt too. THERM and THERM2 OTOH
are comparator outputs, they stay low as long as the temperature are
above the therm limits. Reading the status register won't bring them
back up as I understand it, and contrary to the ALERT output, they
can't be masked. Won't this result in an interrupt flood if using
IRQF_TRIGGER_LOW? Have you tested your code already?

Also I don't think just logging an error message is the right thing to
do in the case of overheating. The code to handle SMBus alerts is from
me, and it does indeed just log the problems, but it was really only
meant as a proof of concept when I first added support for SMBus Alert.
Today we could do better than this, starting with issuing sysfs
notifications to the relevant alarm files (several other hwmon drivers
are doing that already.)

For a real system, if the THERM output is connected to an interrupt line
(instead of directly to a fan controller) I would expect the platform
to provide a callback to handle thermal events and take whatever
appropriate measure is needed (e.g. throttling.) Just logging the
problem won't save the system, by the time someone looks at the log it
might be too late.

-- 
Jean Delvare
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v3 3/4] hwmon: (lm90) add support to handle IRQ

2013-07-12 Thread Wei Ni
When the temperature exceed the limit range value,
the driver can handle the interrupt.

Signed-off-by: Wei Ni 
---
 drivers/hwmon/lm90.c |   24 
 1 file changed, 24 insertions(+)

diff --git a/drivers/hwmon/lm90.c b/drivers/hwmon/lm90.c
index c90037f..1cc3d19 100644
--- a/drivers/hwmon/lm90.c
+++ b/drivers/hwmon/lm90.c
@@ -89,6 +89,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /*
  * Addresses to scan
@@ -1460,6 +1461,17 @@ static bool lm90_is_tripped(struct i2c_client *client)
return true;
 }
 
+static irqreturn_t lm90_irq_thread(int irq, void *dev_id)
+{
+   struct lm90_data *data = dev_id;
+   struct i2c_client *client = to_i2c_client(data->hwmon_dev->parent);
+
+   if (lm90_is_tripped(client))
+   return IRQ_HANDLED;
+   else
+   return IRQ_NONE;
+}
+
 static int lm90_probe(struct i2c_client *client,
  const struct i2c_device_id *id)
 {
@@ -1536,6 +1548,18 @@ static int lm90_probe(struct i2c_client *client,
goto exit_remove_files;
}
 
+   if (client->irq >= 0) {
+   dev_dbg(dev, "lm90 IRQ: %d\n", client->irq);
+   err = devm_request_threaded_irq(dev, client->irq,
+   NULL, lm90_irq_thread,
+   IRQF_TRIGGER_LOW | IRQF_ONESHOT,
+   "lm90", data);
+   if (err < 0) {
+   dev_err(dev, "cannot request interrupt\n");
+   goto exit_remove_files;
+   }
+   }
+
return 0;
 
 exit_remove_files:
-- 
1.7.9.5

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v3 3/4] hwmon: (lm90) add support to handle IRQ

2013-07-12 Thread Wei Ni
When the temperature exceed the limit range value,
the driver can handle the interrupt.

Signed-off-by: Wei Ni w...@nvidia.com
---
 drivers/hwmon/lm90.c |   24 
 1 file changed, 24 insertions(+)

diff --git a/drivers/hwmon/lm90.c b/drivers/hwmon/lm90.c
index c90037f..1cc3d19 100644
--- a/drivers/hwmon/lm90.c
+++ b/drivers/hwmon/lm90.c
@@ -89,6 +89,7 @@
 #include linux/err.h
 #include linux/mutex.h
 #include linux/sysfs.h
+#include linux/interrupt.h
 
 /*
  * Addresses to scan
@@ -1460,6 +1461,17 @@ static bool lm90_is_tripped(struct i2c_client *client)
return true;
 }
 
+static irqreturn_t lm90_irq_thread(int irq, void *dev_id)
+{
+   struct lm90_data *data = dev_id;
+   struct i2c_client *client = to_i2c_client(data-hwmon_dev-parent);
+
+   if (lm90_is_tripped(client))
+   return IRQ_HANDLED;
+   else
+   return IRQ_NONE;
+}
+
 static int lm90_probe(struct i2c_client *client,
  const struct i2c_device_id *id)
 {
@@ -1536,6 +1548,18 @@ static int lm90_probe(struct i2c_client *client,
goto exit_remove_files;
}
 
+   if (client-irq = 0) {
+   dev_dbg(dev, lm90 IRQ: %d\n, client-irq);
+   err = devm_request_threaded_irq(dev, client-irq,
+   NULL, lm90_irq_thread,
+   IRQF_TRIGGER_LOW | IRQF_ONESHOT,
+   lm90, data);
+   if (err  0) {
+   dev_err(dev, cannot request interrupt\n);
+   goto exit_remove_files;
+   }
+   }
+
return 0;
 
 exit_remove_files:
-- 
1.7.9.5

--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/