Re: [PATCH] iio: adc: at91: disable adc channel interrupt in timeout case

2019-03-03 Thread Jonathan Cameron
t; conversion. We currently don't fully understand what is going on, but we 
> > developed a workaround that is enhancing the timeout handling in 
> > at91_adc_read_raw() with a simple retry mechanism. (restarting with 
> > ADC_START)
> > 
> > * 4th issue: when touch operation is enabled (TSMODE) a delay time between 
> > a channel disable (AT91_ADC_CHDR) and channel enable (AT91_ADC_CHER) is 
> > needed otherwise the following conversion will fail.
> > 
> > This issue represents the second way how timeout situations can arise. We 
> > measured around 40 to 42 ADC Cycles (at different ADC Clocks 500khz, 1mhz, 
> > 2mhz, 3mhz, 5mhz) of delay needed between the channel disable and the 
> > channel enable. At higher adc clock frequency (10mhz, 20 mhz) the issue 
> > doesn't show up (most likely because the function calls are delaying 
> > already enough). We also observed that the need delay time increases 
> > further according to the transfer time settings (TRANSFER) by 0, 2, 4 or 8  
> > ADC cycles.
> > 
> > This Issue can be reproduced by compiling and running the error 
> > demonstrator (single channel) and including a simple error message in the 
> > kernel module in the timeout case.
> > 
> > $ arm-buildroot-linux-gnueabihf-gcc adc_demo_error.c -o adc_demo_error
> > 
> > This issue is least understood by us and I think that talking to the 
> > hardware people regarding this issue can be worthwhile. 
> > As a workaround we put at91_adc_read_raw() to sleep after the writing 
> > AT91_ADC_CHDR for 50 to 75 ADC Cycles. 
> > 
> > 
> > 
> > We will run another round of tests next week - and after that I would like 
> > to prepare a pull request concerning the 2nd Issue. For the 3rd and the 4th 
> > Issue I would like to hear your opinion if our modifications are sound 
> > before I will write an pull request.
> > 
> > Thanks for your time,
> > Best wishes,
> > Georg Ottinger
> > 
> > -
> > 
> > -Ursprüngliche Nachricht-
> > Von: Jonathan Cameron [mailto:jonathan.came...@huawei.com] 
> > Gesendet: Montag, 04. Februar 2019 10:46
> > An: Georg Ottinger 
> > Cc: Jonathan Cameron ; eugen.hris...@microchip.com; 
> > Stefan Etzlstorfer ; Hartmut Knaack 
> > ; Lars-Peter Clausen ; Peter 
> > Meerwald-Stadler ; Nicolas Ferre 
> > ; Alexandre Belloni 
> > ; Ludovic Desroches 
> > ; David S. Miller ; 
> > Ard Biesheuvel ; Kees Cook 
> > ; linux-...@vger.kernel.org; 
> > linux-arm-ker...@lists.infradead.org; linux-kernel@vger.kernel.org; Maxime 
> > Ripard 
> > Betreff: Re: [PATCH] iio: adc: at91: disable adc channel interrupt in 
> > timeout case
> > 
> > On Mon, 4 Feb 2019 07:17:07 +
> > Georg Ottinger  wrote:
> >   
> > > Actually this issue occurred to us with an concrete product, where we 
> > > experienced a system hang at -20 °C.
> > > It was triggered by a race condition between the Touch Trigger and the 
> > > Channel Trigger of the ADC. Once triggered we got in to the situation 
> > > where an ongoing Channel Conversion was lost (Timeout case).When we 
> > > queried a second channel than we got a system hang. Investigating this 
> > > issue we developed an error demonstrator - reading alternating two 
> > > channels as fast as possible (when Touch is enabled). This also provokes 
> > > this issue at room temperature.
> > > 
> > > For the error demonstrator use following commandline to compile:
> > > 
> > > $ arm-buildroot-linux-gnueabihf-gcc adc_demo_error.c -D2ND_CHANNEL -o 
> > > adc_demo_error2
> > > 
> > > -
> > > // adc_demo_error.c
> > > #include 
> > > #include 
> > > 
> > > #define VLEN 10
> > > 
> > > int main()
> > > {
> > >   int fd_adc,fd_adc2;
> > >   int ret;
> > >   char dummy[VLEN];
> > >   
> > >   fd_adc = open 
> > > ("/sys/devices/platform/ahb/ahb:apb/f8018000.adc/iio:device0/in_voltag
> > > e4_raw",O_RDONLY);
> > > #ifdef 2ND_CHANNEL
> > >   fd_adc2 = open 
> > > ("/sys/devices/platform/ahb/ahb:apb/f8018000.adc/iio:device0/in_voltag
> > > e5_raw",O_RDONLY);
> > > #endif
> > > 
> > >   while(1) {
> > > 
> > > lseek(fd_adc, 0, SEEK_SET);
> > > ret = read(fd_adc, dummy, VLEN);
> > > #ifdef 2ND_CHANNEL
> > > lseek(fd_adc2, 0, SEEK_SET);
> > >  

Re: [PATCH] iio: adc: at91: disable adc channel interrupt in timeout case

2019-02-22 Thread Ludovic Desroches
 higher adc clock frequency (10mhz, 20 mhz) the issue doesn't show 
> up (most likely because the function calls are delaying already enough). We 
> also observed that the need delay time increases further according to the 
> transfer time settings (TRANSFER) by 0, 2, 4 or 8  ADC cycles.
> 
> This Issue can be reproduced by compiling and running the error demonstrator 
> (single channel) and including a simple error message in the kernel module in 
> the timeout case.
> 
> $ arm-buildroot-linux-gnueabihf-gcc adc_demo_error.c -o adc_demo_error
> 
> This issue is least understood by us and I think that talking to the hardware 
> people regarding this issue can be worthwhile. 
> As a workaround we put at91_adc_read_raw() to sleep after the writing 
> AT91_ADC_CHDR for 50 to 75 ADC Cycles. 
> 
> 
> 
> We will run another round of tests next week - and after that I would like to 
> prepare a pull request concerning the 2nd Issue. For the 3rd and the 4th 
> Issue I would like to hear your opinion if our modifications are sound before 
> I will write an pull request.
> 
> Thanks for your time,
> Best wishes,
> Georg Ottinger
> 
> -
> 
> -Ursprüngliche Nachricht-
> Von: Jonathan Cameron [mailto:jonathan.came...@huawei.com] 
> Gesendet: Montag, 04. Februar 2019 10:46
> An: Georg Ottinger 
> Cc: Jonathan Cameron ; eugen.hris...@microchip.com; Stefan 
> Etzlstorfer ; Hartmut Knaack ; 
> Lars-Peter Clausen ; Peter Meerwald-Stadler 
> ; Nicolas Ferre ; Alexandre 
> Belloni ; Ludovic Desroches 
> ; David S. Miller ; Ard 
> Biesheuvel ; Kees Cook ; 
> linux-...@vger.kernel.org; linux-arm-ker...@lists.infradead.org; 
> linux-kernel@vger.kernel.org; Maxime Ripard 
> Betreff: Re: [PATCH] iio: adc: at91: disable adc channel interrupt in timeout 
> case
> 
> On Mon, 4 Feb 2019 07:17:07 +
> Georg Ottinger  wrote:
> 
> > Actually this issue occurred to us with an concrete product, where we 
> > experienced a system hang at -20 °C.
> > It was triggered by a race condition between the Touch Trigger and the 
> > Channel Trigger of the ADC. Once triggered we got in to the situation where 
> > an ongoing Channel Conversion was lost (Timeout case).When we queried a 
> > second channel than we got a system hang. Investigating this issue we 
> > developed an error demonstrator - reading alternating two channels as fast 
> > as possible (when Touch is enabled). This also provokes this issue at room 
> > temperature.
> > 
> > For the error demonstrator use following commandline to compile:
> > 
> > $ arm-buildroot-linux-gnueabihf-gcc adc_demo_error.c -D2ND_CHANNEL -o 
> > adc_demo_error2
> > 
> > -
> > // adc_demo_error.c
> > #include 
> > #include 
> > 
> > #define VLEN 10
> > 
> > int main()
> > {
> >   int fd_adc,fd_adc2;
> >   int ret;
> >   char dummy[VLEN];
> >   
> >   fd_adc = open 
> > ("/sys/devices/platform/ahb/ahb:apb/f8018000.adc/iio:device0/in_voltag
> > e4_raw",O_RDONLY);
> > #ifdef 2ND_CHANNEL
> >   fd_adc2 = open 
> > ("/sys/devices/platform/ahb/ahb:apb/f8018000.adc/iio:device0/in_voltag
> > e5_raw",O_RDONLY);
> > #endif
> > 
> >   while(1) {
> > 
> > lseek(fd_adc, 0, SEEK_SET);
> > ret = read(fd_adc, dummy, VLEN);
> > #ifdef 2ND_CHANNEL
> > lseek(fd_adc2, 0, SEEK_SET);
> > ret = read(fd_adc2, dummy, VLEN);
> > #endif
> > 
> >   }
> > }
> > 
> > 
> > 
> > 
> > Greeting, Georg
> Hi Georg,
> 
> Thanks for the detailed error report and reproducer.
> 
> What has me wondering now is where the race is that is triggering this?
> Could you talk us through it?  I don't know this driver that well so probably 
> much quicker for you to fill in the gaps rather than me try to figure out the 
> race path!
> 
> I have no problem with defense in depth (with appropriate comments) but I'd 
> like to close that down as well.  If it really is unsolvable I'd like at 
> least to have some clear comments in the code explaining why.
> 
> Thanks,
> 
> Jonathan
> 
> > 
> > -Ursprüngliche Nachricht-
> > Von: Jonathan Cameron [mailto:ji...@kernel.org]
> > Gesendet: Samstag, 02. Februar 2019 11:21
> > An: Georg Ottinger 
> > Cc: eugen.hris...@microchip.com; Stefan Etzlstorfer 
> > ; Hartmut Knaack ; 
> > Lars-Peter Clausen ; Peter Meerwald-Stadler 
> > ; Nicolas Ferre ; 
> > Alexandre Belloni ; Ludovic Desroches 
> > ; David S. Miller 
> > ; Ard Biesheuve

Re: [PATCH] iio: adc: at91: disable adc channel interrupt in timeout case

2019-02-22 Thread Ludovic Desroches
On Wed, Jan 30, 2019 at 02:42:02PM +0100, g.ottin...@abatec.at wrote:
> From: Georg Ottinger 
> 
> Having a brief look at at91_adc_read_raw() it is obvious that in the case
> of a timeout the setting of AT91_ADC_CHDR and AT91_ADC_IDR registers is
> omitted. If 2 different channels are queried we can end up with a
> situation where two interrupts are enabled, but only one interrupt is
> cleared in the interrupt handler. Resulting in a interrupt loop and a
> system hang.
> 
> Signed-off-by: Georg Ottinger 
Acked-by: Ludovic Desroches 

Thanks

> ---
>  drivers/iio/adc/at91_adc.c | 28 +---
>  1 file changed, 17 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/iio/adc/at91_adc.c b/drivers/iio/adc/at91_adc.c
> index 75d2f7358..596841a3c 100644
> --- a/drivers/iio/adc/at91_adc.c
> +++ b/drivers/iio/adc/at91_adc.c
> @@ -704,23 +704,29 @@ static int at91_adc_read_raw(struct iio_dev *idev,
>   ret = wait_event_interruptible_timeout(st->wq_data_avail,
>  st->done,
>  msecs_to_jiffies(1000));
> - if (ret == 0)
> - ret = -ETIMEDOUT;
> - if (ret < 0) {
> - mutex_unlock(>lock);
> - return ret;
> - }
> -
> - *val = st->last_value;
>  
> + /* Disable interrupts, regardless if adc conversion was
> +  * successful or not
> +  */
>   at91_adc_writel(st, AT91_ADC_CHDR,
>   AT91_ADC_CH(chan->channel));
>   at91_adc_writel(st, AT91_ADC_IDR, BIT(chan->channel));
>  
> - st->last_value = 0;
> - st->done = false;
> + if (ret > 0) {
> + /* a valid conversion took place */
> + *val = st->last_value;
> + st->last_value = 0;
> + st->done = false;
> + ret = IIO_VAL_INT;
> + } else if (ret == 0) {
> + /* conversion timeout */
> + dev_err(>dev, "ADC Channel %d timeout.\n",
> + chan->channel);
> + ret = -ETIMEDOUT;
> + }
> +
>   mutex_unlock(>lock);
> - return IIO_VAL_INT;
> + return ret;
>  
>   case IIO_CHAN_INFO_SCALE:
>   *val = st->vref_mv;
> -- 
> 2.17.1
> 
> 
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


Re: [PATCH] iio: adc: at91: disable adc channel interrupt in timeout case

2019-02-04 Thread Jonathan Cameron
On Mon, 4 Feb 2019 07:17:07 +
Georg Ottinger  wrote:

> Actually this issue occurred to us with an concrete product, where we 
> experienced a system hang at -20 °C.
> It was triggered by a race condition between the Touch Trigger and the 
> Channel Trigger of the ADC. Once triggered we got in to the situation where 
> an ongoing Channel Conversion was lost (Timeout case).When we queried a 
> second channel than we got a system hang. Investigating this issue we 
> developed an error demonstrator - reading alternating two channels as fast as 
> possible (when Touch is enabled). This also provokes this issue at room 
> temperature.
> 
> For the error demonstrator use following commandline to compile:
> 
> $ arm-buildroot-linux-gnueabihf-gcc adc_demo_error.c -D2ND_CHANNEL -o 
> adc_demo_error2
> 
> -
> // adc_demo_error.c
> #include 
> #include 
> 
> #define VLEN 10
> 
> int main()
> {
>   int fd_adc,fd_adc2;
>   int ret;
>   char dummy[VLEN];
>   
>   fd_adc = open 
> ("/sys/devices/platform/ahb/ahb:apb/f8018000.adc/iio:device0/in_voltage4_raw",O_RDONLY);
>   
> #ifdef 2ND_CHANNEL
>   fd_adc2 = open 
> ("/sys/devices/platform/ahb/ahb:apb/f8018000.adc/iio:device0/in_voltage5_raw",O_RDONLY);
> #endif
> 
>   while(1) {
> 
> lseek(fd_adc, 0, SEEK_SET);
> ret = read(fd_adc, dummy, VLEN);
> #ifdef 2ND_CHANNEL
> lseek(fd_adc2, 0, SEEK_SET);
> ret = read(fd_adc2, dummy, VLEN);
> #endif
> 
>   }
> }
> 
> 
> 
> 
> Greeting, Georg
Hi Georg,

Thanks for the detailed error report and reproducer.

What has me wondering now is where the race is that is triggering this?
Could you talk us through it?  I don't know this driver that well so
probably much quicker for you to fill in the gaps rather than me try
to figure out the race path!

I have no problem with defense in depth (with appropriate comments) but
I'd like to close that down as well.  If it really is unsolvable I'd
like at least to have some clear comments in the code explaining why.

Thanks,

Jonathan

> 
> -Ursprüngliche Nachricht-
> Von: Jonathan Cameron [mailto:ji...@kernel.org] 
> Gesendet: Samstag, 02. Februar 2019 11:21
> An: Georg Ottinger 
> Cc: eugen.hris...@microchip.com; Stefan Etzlstorfer 
> ; Hartmut Knaack ; Lars-Peter 
> Clausen ; Peter Meerwald-Stadler ; 
> Nicolas Ferre ; Alexandre Belloni 
> ; Ludovic Desroches 
> ; David S. Miller ; Ard 
> Biesheuvel ; Kees Cook ; 
> linux-...@vger.kernel.org; linux-arm-ker...@lists.infradead.org; 
> linux-kernel@vger.kernel.org; Maxime Ripard 
> Betreff: Re: [PATCH] iio: adc: at91: disable adc channel interrupt in timeout 
> case
> 
> On Wed, 30 Jan 2019 14:42:02 +0100
>  wrote:
> 
> > From: Georg Ottinger 
> > 
> > Having a brief look at at91_adc_read_raw() it is obvious that in the 
> > case of a timeout the setting of AT91_ADC_CHDR and AT91_ADC_IDR 
> > registers is omitted. If 2 different channels are queried we can end 
> > up with a situation where two interrupts are enabled, but only one 
> > interrupt is cleared in the interrupt handler. Resulting in a 
> > interrupt loop and a system hang.
> > 
> > Signed-off-by: Georg Ottinger   
> 
> Whilst I agree this looks like a correct change, I would like Maxime to take 
> a look as he is obviously much more familiar with the driver than I am.
> 
> I suspect you can only actually get there in the event of a hardware failure 
> as that isn't actually a timeout you should ever see.
> Could be wrong though!
> 
> Thanks,
> 
> Jonathan
> 
> > ---
> >  drivers/iio/adc/at91_adc.c | 28 +---
> >  1 file changed, 17 insertions(+), 11 deletions(-)
> > 
> > diff --git a/drivers/iio/adc/at91_adc.c b/drivers/iio/adc/at91_adc.c 
> > index 75d2f7358..596841a3c 100644
> > --- a/drivers/iio/adc/at91_adc.c
> > +++ b/drivers/iio/adc/at91_adc.c
> > @@ -704,23 +704,29 @@ static int at91_adc_read_raw(struct iio_dev *idev,
> > ret = wait_event_interruptible_timeout(st->wq_data_avail,
> >st->done,
> >msecs_to_jiffies(1000));
> > -   if (ret == 0)
> > -   ret = -ETIMEDOUT;
> > -   if (ret < 0) {
> > -   mutex_unlock(>lock);
> > -   return ret;
> > -   }
> > -
> > -   *val = st->last_value;
> >  
> > +   /* Disable interrupts, regardless if adc conversion was
> > +* successful or not
> > + 

Re: [PATCH] iio: adc: at91: disable adc channel interrupt in timeout case

2019-02-02 Thread Jonathan Cameron
On Wed, 30 Jan 2019 14:42:02 +0100
 wrote:

> From: Georg Ottinger 
> 
> Having a brief look at at91_adc_read_raw() it is obvious that in the case
> of a timeout the setting of AT91_ADC_CHDR and AT91_ADC_IDR registers is
> omitted. If 2 different channels are queried we can end up with a
> situation where two interrupts are enabled, but only one interrupt is
> cleared in the interrupt handler. Resulting in a interrupt loop and a
> system hang.
> 
> Signed-off-by: Georg Ottinger 

Whilst I agree this looks like a correct change, I would like Maxime
to take a look as he is obviously much more familiar with the driver
than I am.

I suspect you can only actually get there in the event of a hardware
failure as that isn't actually a timeout you should ever see.
Could be wrong though!

Thanks,

Jonathan

> ---
>  drivers/iio/adc/at91_adc.c | 28 +---
>  1 file changed, 17 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/iio/adc/at91_adc.c b/drivers/iio/adc/at91_adc.c
> index 75d2f7358..596841a3c 100644
> --- a/drivers/iio/adc/at91_adc.c
> +++ b/drivers/iio/adc/at91_adc.c
> @@ -704,23 +704,29 @@ static int at91_adc_read_raw(struct iio_dev *idev,
>   ret = wait_event_interruptible_timeout(st->wq_data_avail,
>  st->done,
>  msecs_to_jiffies(1000));
> - if (ret == 0)
> - ret = -ETIMEDOUT;
> - if (ret < 0) {
> - mutex_unlock(>lock);
> - return ret;
> - }
> -
> - *val = st->last_value;
>  
> + /* Disable interrupts, regardless if adc conversion was
> +  * successful or not
> +  */
>   at91_adc_writel(st, AT91_ADC_CHDR,
>   AT91_ADC_CH(chan->channel));
>   at91_adc_writel(st, AT91_ADC_IDR, BIT(chan->channel));
>  
> - st->last_value = 0;
> - st->done = false;
> + if (ret > 0) {
> + /* a valid conversion took place */
> + *val = st->last_value;
> + st->last_value = 0;
> + st->done = false;
> + ret = IIO_VAL_INT;
> + } else if (ret == 0) {
> + /* conversion timeout */
> + dev_err(>dev, "ADC Channel %d timeout.\n",
> + chan->channel);
> + ret = -ETIMEDOUT;
> + }
> +
>   mutex_unlock(>lock);
> - return IIO_VAL_INT;
> + return ret;
>  
>   case IIO_CHAN_INFO_SCALE:
>   *val = st->vref_mv;