Re: [PATCH] Produce a situation where L1 cache flush missing vs cpu power down hangs the system

2014-04-09 Thread Daniel Lezcano


Hi Sandeep,


On 04/09/2014 07:15 AM, Sandeep Tripathy wrote:

  Hi Daniel,
  - L1 D$ clean (clear the SCTLR.C bit, DCCISW, clear the SMP bit) is part
of the recommended sequence for individual core power down.


Yes, absolutely. It is what Lorenzo and I discussed.

The macro v7_exit_coherency_flush should be used for that.
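
To make the ordering concrete, here is a toy user-space model of the three steps named above (clear the C bit, clean and invalidate L1, clear the SMP bit). It is only a sketch under stated assumptions: it does not touch CP15, it is not the kernel macro, and every toy_* name is made up for illustration. It shows that a dirty line reaches memory only if the flush runs before the power down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NLINES 4   /* toy direct-mapped cache, one word per line */
#define NWORDS 16  /* toy main memory size in words */

/* Hypothetical model of one core's L1 D-cache state. */
struct toy_core {
    bool c_bit;              /* models SCTLR.C (data cache enable) */
    bool smp_bit;            /* models the SMP (coherency) bit */
    bool valid[NLINES];
    bool dirty[NLINES];
    unsigned addr[NLINES];
    uint32_t data[NLINES];
};

static uint32_t toy_mem[NWORDS];

/* Write one line back to memory if it is dirty, then invalidate it. */
static void toy_clean_inv_line(struct toy_core *c, unsigned i)
{
    if (c->valid[i] && c->dirty[i])
        toy_mem[c->addr[i]] = c->data[i];
    c->valid[i] = false;
    c->dirty[i] = false;
}

/* Store: allocates a dirty line when the cache is on, else writes memory. */
static void toy_store(struct toy_core *c, unsigned a, uint32_t v)
{
    unsigned i = a % NLINES;

    assert(a < NWORDS);
    if (!c->c_bit) {
        toy_mem[a] = v;
        return;
    }
    if (c->valid[i] && c->addr[i] != a)
        toy_clean_inv_line(c, i);   /* evict the previous occupant */
    c->valid[i] = true;
    c->dirty[i] = true;
    c->addr[i] = a;
    c->data[i] = v;
}

/* Ordering from the thread: clear C, clean+invalidate all lines, clear SMP. */
static void toy_exit_coherency_flush(struct toy_core *c)
{
    c->c_bit = false;
    for (unsigned i = 0; i < NLINES; i++)
        toy_clean_inv_line(c, i);
    c->smp_bit = false;
}

/* Power down: whatever is still held in the cache is lost. */
static void toy_power_down(struct toy_core *c)
{
    for (unsigned i = 0; i < NLINES; i++) {
        c->valid[i] = false;
        c->dirty[i] = false;
    }
}

/* Returns what memory holds at address a after one store and a power down. */
static uint32_t toy_run(bool flush_first, unsigned a, uint32_t v)
{
    struct toy_core c = { .c_bit = true, .smp_bit = true };

    assert(a < NWORDS);
    toy_mem[a] = 0;
    toy_store(&c, a, v);
    if (flush_first)
        toy_exit_coherency_flush(&c);
    toy_power_down(&c);
    return toy_mem[a];
}
```

Calling toy_run(true, 3, 42) returns 42 because the line is written back before power is removed, while toy_run(false, 3, 42) returns 0 because the dirty line is dropped. On real hardware the loss only matters if something reads that data later, which is one possible reason a missing flush does not fail on every idle entry.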


  - If a core is powered down with dirty lines in L1, the system
should hit an issue (abort) very easily. Maybe the first idle attempt
is itself enough to break things.




   Does any platform (exynos4?) work without doing an L1 D-cache
clean in the cpuidle individual core power down sequence?


Yes, the power down sequence is done without an L1 D-cache clean, and it
is very, very hard to make the board hang. The hang is so rare that I
can't say for sure whether it is related to the driver itself or to
something else.


If there is a way to spot the issue, I will be happy to test it.

Thanks
  -- Daniel



On 8 April 2014 17:07, Daniel Lezcano daniel.lezc...@linaro.org wrote:

On 04/08/2014 01:07 PM, Amit Kucheria wrote:

Hi Daniel,

Have you noticed this on any platform yet with this test?


I have noticed a very, very rare hang on the exynos4 board with this
test and the dual cpu support, but it is not reproducible enough to
check whether the cache flush fixes it (or whether it is related to the
cpuidle driver). I had a long discussion yesterday with Lorenzo, who
explained to me what could happen without flushing the cache and
exiting the SMP mode.

AFAICT:

* The tc2 flushes its cache.

* exynos5 still needs to disable cpu1 to enter the AFTR state (which
is broken today), so the cache is flushed in the hotplug code path. I
hope I can spot the issue with a quad core.

* omap4 flushes its cache.

* omap3 is not affected because it is a UP system.

* davinci, s3c64, imx5, imx6, ux500 and kirkwood do not support cpu
power down.

* calxeda hides that in the firmware, I believe.

* I don't know about tegra, but I assume it flushes and disables
the cache.

I hope that with this test we can spot the issue, if any, through
multiple runs on the boards, especially as new drivers are implemented.

--
http://www.linaro.org/ Linaro.org │ Open source software for ARM SoCs

Follow Linaro: http://www.facebook.com/pages/Linaro Facebook |
http://twitter.com/#!/linaroorg Twitter |
http://www.linaro.org/linaro-blog/ Blog

_
linaro-dev mailing list
linaro-dev@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-dev







Re: [PATCH] Produce a situation where L1 cache flush missing vs cpu power down hangs the system

2014-04-09 Thread Sandeep Tripathy
Hi Daniel,
  Just as a test, you could try removing the flush_cache_louis() call
from __cpu_suspend_save().
  If the driver uses cpu_suspend() in the idle path, almost all the
important data in the core's L1 is already clean.
  It can still fail, but only if the code that runs after that (after
cpu_suspend()) modifies some important cacheable data, because the
SCTLR.C bit is not cleared yet.

  It should simply fail if no cache clean is done before powering down,
and it should work only if v7_exit_coherency_flush() or similar is
done.

  Note: This is based on my observations on a quad-core A7. Please
correct me if my understanding is wrong.
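
The difference between "clean only" and "clear the C bit, then clean" can be shown with a toy user-space model. This is a sketch under stated assumptions, not kernel code: one memory word is shadowed by one write-back cache line, and all toy_* names are hypothetical. The first helper mimics a clean that leaves the cache enabled, followed by a late store; the second clears the modelled C bit before cleaning.

```c
#include <assert.h>   /* for callers that check the results */
#include <stdbool.h>
#include <stdint.h>

/* Toy model: one memory word shadowed by one write-back cache line. */
struct toy_l1 {
    bool c_bit;      /* models SCTLR.C (data cache enable) */
    bool dirty;      /* the line holds data newer than memory */
    uint32_t line;   /* cached copy */
    uint32_t mem;    /* backing memory */
};

/* Store: lands in the cache when it is enabled, else in memory. */
static void toy_store(struct toy_l1 *s, uint32_t v)
{
    if (s->c_bit) {
        s->line = v;
        s->dirty = true;
    } else {
        s->mem = v;
    }
}

/* Clean: write the line back if it is dirty; the cache stays as it was. */
static void toy_clean(struct toy_l1 *s)
{
    if (s->dirty) {
        s->mem = s->line;
        s->dirty = false;
    }
}

/* Power down: a dirty line is lost; return what survives in memory. */
static uint32_t toy_power_down(struct toy_l1 *s)
{
    s->dirty = false;
    return s->mem;
}

/* Clean with the cache still enabled, then a late store before power down. */
static uint32_t toy_clean_then_store(uint32_t early, uint32_t late)
{
    struct toy_l1 s = { .c_bit = true };

    toy_store(&s, early);
    toy_clean(&s);          /* the early data reaches memory */
    toy_store(&s, late);    /* C bit still set: the line is dirty again */
    return toy_power_down(&s);
}

/* Clear the C bit first, clean, then the same late store. */
static uint32_t toy_disable_clean_then_store(uint32_t early, uint32_t late)
{
    struct toy_l1 s = { .c_bit = true };

    toy_store(&s, early);
    s.c_bit = false;
    toy_clean(&s);
    toy_store(&s, late);    /* cache off: the store goes straight to memory */
    return toy_power_down(&s);
}
```

In this model toy_clean_then_store(1, 2) returns 1, so the late store is lost, while toy_disable_clean_then_store(1, 2) returns 2. That matches the reasoning above: a clean alone narrows the window, but only clearing the C bit before the clean closes it.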

Thanks
  Sandeep



Re: [PATCH] Produce a situation where L1 cache flush missing vs cpu power down hangs the system

2014-04-09 Thread Daniel Lezcano

On 04/09/2014 02:19 PM, Sandeep Tripathy wrote:


Hi Daniel,
   Just as a test, you could try removing the flush_cache_louis() call
from __cpu_suspend_save().
   If the driver uses cpu_suspend() in the idle path, almost all the
important data in the core's L1 is already clean.
   It can still fail, but only if the code that runs after that (after
cpu_suspend()) modifies some important cacheable data, because the
SCTLR.C bit is not cleared yet.
   It should simply fail if no cache clean is done before powering down,
and it should work only if v7_exit_coherency_flush() or similar is
done.
   Note: This is based on my observations on a quad-core A7. Please
correct me if my understanding is wrong.


Thanks Sandeep for the info.

I think Lorenzo spotted that the SCTLR.C bit must be cleared before
powering down the cpu, and that what cpu_suspend does is not enough
because some data could be fetched from the other cpu. The only way to
handle this properly is to always call v7_exit_coherency_flush before
powering down the cpu.



