On Tue, 6 May 2014 12:34:45 +0300
Siarhei Siamashka <siarhei.siamas...@gmail.com> wrote:

> On Sun, 4 May 2014 11:36:10 +0300
> Siarhei Siamashka <siarhei.siamas...@gmail.com> wrote:
> 
> > Hello,
> > 
> > Yesterday I was trying to debug what causes the XFCE desktop
> > background artefacts on my A10-Lime, which look like this:
> >     http://people.freedesktop.org/~siamashka/files/20140504/a10-l2-cache-fail-artefacts-in-xfce.png
> > 
> > I narrowed them down to ARM Cortex-A8 L2 cache failures, which
> > are reproducible when doing JPEG decoding:
> > 
> > $ djpeg -v       
> > libjpeg-turbo version 1.3.0 (build 20130811)
> > 
> > $ wget http://linux-sunxi.org/images/8/83/A10-LIME.jpg
> > 
> > $ djpeg A10-LIME.jpg | md5sum
> > 691497bd2e5d36976c1ea3150de89df6  -
> > 
> > $ djpeg A10-LIME.jpg | md5sum
> > 6a874af750f92e1e3c019f2df7edf3f7  -
> > 
> > $ djpeg A10-LIME.jpg | md5sum
> > 297b98ba10233cbbcea2566e1c4fd7c7  -
> > 
> > Please note that the md5sum of the decoded JPEG file is different for
> > each run.
> > 
> > There are other ways to reproduce it (the FFmpeg test suite can detect
> > this problem too), but the djpeg test is very simple and fast to do.
> > In case somebody does not have the djpeg tool from libjpeg-turbo in
> > their distro, here is a static djpeg binary for extra convenience:
> >     http://people.freedesktop.org/~siamashka/files/20140504/djpeg-static
> > It has been built using:
> >     http://people.freedesktop.org/~siamashka/files/20140504/build-static-djpeg.sh
> > 
> > On my collection of just three Allwinner A10 based devices, I get the
> > following results with the libjpeg-turbo djpeg test (and the default
> > CPU core voltage):
> >     A10-Lime    - fails at 1008MHz (960MHz is fine)
> >     Mele A2000  - fails at 1152MHz (1104MHz is fine)
> >     Cubieboard1 - fails at 1152MHz (1104MHz is fine)
> > 
> > Why is it likely related to the L2 cache? Because this problem goes
> > away if we disable the L2 cache by adding something like
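> >         mrc     p15, 0, r10, c1, c0, 1  @ read the Auxiliary Control Register
> >         bic     r10, r10, #(1 << 1)     @ clear bit 1 (L2EN, L2 cache enable)
> >         mcr     p15, 0, r10, c1, c0, 1  @ write it back: L2 cache disabled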
> > to the code around
> >     https://github.com/linux-sunxi/linux-sunxi/blob/sunxi-v3.4.86-r0/arch/arm/mm/proc-v7.S#L248
> > 
> > It is also interesting that sun4i and sun5i have different L2 cache
> > latency parameters configured there. I have tried increasing the
> > latencies in the L2 Cache Auxiliary Control Register, but these
> > changes did not seem to affect anything. It looks like the only
> > important factors are the CPU clock speed and the CPU core
> > voltage (increasing it to 1.45V from 1.4V also fixes the problem
> > on my A10-Lime).
> > 
> > Anyway, with a sample size of just 3 devices, one of them (33%)
> > appears to be unable to run stably at 1GHz with a 1.4V core voltage.
> > I wonder how common this problem is in general. Are there any other
> > Allwinner A10 devices failing the libjpeg-turbo djpeg test at 1GHz?
> > 
> > Also it would make sense to run reliability tests for all the cpufreq
> > operating points, because any frequency+voltage pair can be a weak link.
> 
> I have implemented an automated script for running the test at
> different operating points:
>     https://github.com/ssvb/cpuburn-arm/blob/master/cpufreq-ljt-stress-test
> 
> Only 1008MHz appears to be really problematic on my A10-Lime device. An
> example of running it:
> 
> lime ~ # ./cpufreq-ljt-stress-test
> CPU stress test, which is doing JPEG decoding by libjpeg-turbo
> at different cpufreq operating points.
> 
> Testing CPU 0
>  1488 MHz SKIPPED
>  1440 MHz SKIPPED
>  1392 MHz SKIPPED
>  1344 MHz SKIPPED
>  1296 MHz SKIPPED
>  1248 MHz SKIPPED
>  1200 MHz SKIPPED
>  1152 MHz SKIPPED
>  1104 MHz SKIPPED
>  1056 MHz SKIPPED
>  1008 MHz . FAILED
>   960 MHz ............................................................ OK
>   912 MHz ............................................................ OK
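
The operating-point sweep described in the quoted message can be
sketched in shell roughly as follows. This is a hypothetical outline,
not the actual cpufreq-ljt-stress-test script. It assumes the standard
Linux cpufreq sysfs files (scaling_available_frequencies, if the driver
exposes it, plus scaling_max_freq, scaling_governor and
scaling_setspeed), a kernel with the userspace governor, and a
caller-supplied STRESS_CMD that exits 0 when the CPU produced correct
results. Marking frequencies above scaling_max_freq as SKIPPED is this
sketch's own choice; the real script may decide that differently.

```shell
#!/bin/sh
# Hypothetical cpufreq sweep, NOT the real cpufreq-ljt-stress-test.
# CPUFREQ_DIR: cpufreq sysfs directory of the CPU under test.
# STRESS_CMD:  command that exits 0 if the CPU produced correct results
#              (assumed to be supplied by the caller).
CPUFREQ_DIR="${CPUFREQ_DIR:-/sys/devices/system/cpu/cpu0/cpufreq}"

sweep_opps() {
    : "${STRESS_CMD:?set STRESS_CMD to a stress test command}"
    max_khz=$(cat "$CPUFREQ_DIR/scaling_max_freq")
    # The userspace governor lets us pin the CPU to one exact frequency
    echo userspace > "$CPUFREQ_DIR/scaling_governor"
    # Walk the operating points from highest to lowest (values are in kHz)
    for khz in $(tr ' ' '\n' < "$CPUFREQ_DIR/scaling_available_frequencies" | grep . | sort -rn); do
        mhz=$((khz / 1000))
        if [ "$khz" -gt "$max_khz" ]; then
            # Assumption of this sketch: never exceed the policy maximum
            printf '%5d MHz SKIPPED\n' "$mhz"
            continue
        fi
        echo "$khz" > "$CPUFREQ_DIR/scaling_setspeed"
        if $STRESS_CMD; then
            printf '%5d MHz OK\n' "$mhz"
        else
            printf '%5d MHz FAILED\n' "$mhz"
        fi
    done
}
```

It has to run as root, and STRESS_CMD could be any command that decodes
a known image several times and compares the checksums. A real tool
should also restore the original governor when it is done.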

A follow-up to this (better late than never). Olliver Schinagl ran
the cpufreq-ljt-stress-test script on multiple A10-Lime devices today:

    http://irclog.whitequark.org/linux-sunxi/2014-07-03#9499336;

It appears that the test failed on his revA A10-Lime and passed on eight
other revC A10-Lime boards. Together with my revA A10-Lime, that makes a
perfect two-out-of-two failure rate for revA. All the other A10-based
devices (Olliver's eight revC A10-Limes, my Cubieboard1 and Mele A2000,
and also lioka's Hackberry) pass the test.

Now that we have finally collected the long-awaited statistics, it looks
pretty clear that something is wrong specifically with revision A of
the A10-Lime board. Revision A was a pre-production 'developer' batch
of the A10-Lime, so very few people should be affected (mine was donated
to me for free, so I can't really complain). Kudos to Koen Kooi for the
hint about a possible voltage drop on the power line connecting the
AXP209 PMIC and the A10 SoC, which seems to explain the problem:

    https://www.mail-archive.com/linux-sunxi@googlegroups.com/msg04689.html

If you have a revision A A10-Lime board, a workaround is to increase
the core voltage for the 1008MHz operating point, or simply to downclock
the CPU to 960MHz or even 912MHz.
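
For the downclocking workaround, lowering scaling_max_freq through the
standard Linux cpufreq sysfs interface is one option. A minimal sketch;
the cap_cpufreq_khz function and the SYSFS_CPU override are hypothetical
names used only for illustration. It must run as root, does not survive
a reboot, and does not touch the core voltage (that depends on the
board's DVFS configuration, which is not covered here).

```shell
#!/bin/sh
# Hypothetical helper: cap every CPU's cpufreq policy at a frequency in
# kHz (e.g. 960000 for 960MHz). SYSFS_CPU defaults to the standard Linux
# sysfs location and is only meant to be overridden for testing.
SYSFS_CPU="${SYSFS_CPU:-/sys/devices/system/cpu}"

cap_cpufreq_khz() {
    limit="$1"
    for policy in "$SYSFS_CPU"/cpu[0-9]*/cpufreq; do
        # Skip CPUs without a writable cpufreq policy
        [ -w "$policy/scaling_max_freq" ] || continue
        echo "$limit" > "$policy/scaling_max_freq"
        # Read the value back: the kernel may clamp it to a valid step
        echo "$(basename "$(dirname "$policy")"): max $(cat "$policy/scaling_max_freq") kHz"
    done
}
```

For example, `cap_cpufreq_khz 960000` keeps the A10 at or below 960MHz
until the next reboot.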

To be on the safe side, owners of revision B of the A10-Lime are
strongly encouraged to run this test too (and share the results).
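
For a quick check at the current CPU frequency only, the djpeg/md5sum
test from the first message can be wrapped in a small loop. A minimal
sketch, assuming djpeg and md5sum are in PATH; the check_repeatable
function and the DECODER variable are hypothetical names introduced here
so that the decoder command can be swapped out.

```shell
#!/bin/sh
# Decode the same JPEG several times and require every run to produce an
# identical md5sum. A mismatch means the output was corrupted somewhere
# between the CPU and memory. DECODER defaults to djpeg (an assumption).
DECODER="${DECODER:-djpeg}"

check_repeatable() {
    img="$1"
    runs="${2:-20}"
    # The first decode is the reference checksum
    ref=$($DECODER "$img" | md5sum | cut -d' ' -f1)
    i=2
    while [ "$i" -le "$runs" ]; do
        sum=$($DECODER "$img" | md5sum | cut -d' ' -f1)
        if [ "$sum" != "$ref" ]; then
            echo "FAILED on run $i: $sum != $ref"
            return 1
        fi
        i=$((i + 1))
    done
    echo "OK: $runs identical decodes"
    return 0
}
```

For example, `check_repeatable A10-LIME.jpg 50`. A failure is conclusive,
but a pass only means no corruption was observed in that many runs.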

But in any case, it appears that the 1GHz CPU clock frequency is fine
for the A10 SoC itself. Phew, the galaxy is safe.

-- 
Best regards,
Siarhei Siamashka

-- 
You received this message because you are subscribed to the Google Groups 
"linux-sunxi" group.
