On Wed, 15 Jan 2014 09:20:18 +0200
Siarhei Siamashka <siarhei.siamas...@gmail.com> wrote:

> On Tue, 17 Dec 2013 10:15:56 -0800 (PST)
> Димитър Гамишев <gamis...@gmail.com> wrote:
> 
> > On Monday, December 16, 2013 6:44:03 AM UTC+2, Siarhei Siamashka wrote:
> > > There is just one thing I'm really worried about. The 16-bit 
> > > memory interface is a major performance risk factor. I wonder 
> > > how LIME performs on memory intensive workloads (such as 
> > > graphics) when compared with, for example, Cubieboard. 
> 
> Now that I got an A10-OLinuXino-Lime device (thanks Tsvetan!), I
> could run some OpenGL ES benchmarks in X11 with the Mali r3p0 binary
> drivers. The Mele A2000 and Cubieboard1 devices are used for
> comparison because they have a 32-bit memory interface, but different
> memory clock speeds. The default memory timings configuration for
> LIME uses dram_cas=9 (set in dram_a10_olinuxino_l.c in u-boot), but
> I also tried the Cubieboard memory timings (dram_cas=6) as an extra
> test, just to see how this affects performance.
> 
> == The final score for glmark2-es2 2012.12 (test in 800x600 window) ==
> 
> LIME (CAS=9) - 480MHz dram clock, dram_bus_width=16, dram_cas=9
> LIME (CAS=6) - 480MHz dram clock, dram_bus_width=16, dram_cas=6
> Mele A2000   - 360MHz dram clock, dram_bus_width=32, dram_cas=6
> Cubieboard1  - 480MHz dram clock, dram_bus_width=32, dram_cas=6
> 
> In all cases, the ARM Cortex-A8 in the Allwinner A10 is clocked at
> 1008MHz (performance cpufreq governor) and the Mali400 MP1 is clocked
> at 320MHz. The desktop color depth is 32bpp.
> 
>              | 1280x720p50 | 1280x720p60 | 1920x1080p50 | 1920x1080p60
> -------------+-------------+-------------+--------------+--------------
> LIME (CAS=9) |      85     |      75     |     46 (**)  |    41 (**)
> LIME (CAS=6) |     100     |      91     |     56 (**)  |    48 (**)
> Mele A2000   |     151     |     148     |    140 (**)  |   136 (**)
> Cubieboard1  |     166     |     166     |    161 (*)   |   157 (*)
> -------------+-------------+-------------+--------------+--------------
>
>  (*) minor occasional glitches on screen
> (**) severe screen shaking effect is observed

And a new entry for this table (the details are at the end of my post):

               | 1280x720p50 | 1280x720p60 | 1920x1080p50 | 1920x1080p60
  -------------+-------------+-------------+--------------+--------------
  LIME (CAS=7) |     114     |     110     |      94      |      85

> Note that the window size is exactly the same in all tests. Only the
> screen resolution is different, and this only affects how much of the
> memory bandwidth is drained by maintaining the screen refresh.
> 
> With a 16-bit memory bus width, the 3D graphics performance degrades
> very quickly as the screen resolution and refresh rate increase.
> Using a 50Hz monitor refresh rate is more important than ever:
> it increases performance, and rendering perfectly tear-free 50Hz
> animation is also somewhat less demanding than 60Hz animation.
> 
> The performance of hardware accelerated video decoding using CedarX
> with a 1080p monitor is going to be really interesting too. And
> common sense dictates that it is very important not to waste memory
> bandwidth unnecessarily.
> 
> BTW, for 32-bit memory bus width, Mali performance does not seem to
> be affected that much by increases in the screen resolution and
> refresh rate. But software graphics rendering done on the CPU (or any
> other memory intensive activity) still takes a performance hit
> even with the 32-bit memory bus:
>     http://ssvb.github.io/2013/06/27/fullhd-x11-desktop-performance-of-the-allwinner-a10.html
> The X11 desktop performance on LIME is going to be challenging
> at high screen resolutions too, unless the desktop color depth
> is reduced to 16bpp.
> 
> The memory timings with dram_cas=9 also affect performance.
> While dram_cas=6 might be considered an unsafe choice for
> 480MHz, it would be really great if we could use some better
> safe/fast settings.
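
To put the quoted numbers in perspective: scanning out a 1920x1080
32bpp framebuffer at 60Hz needs 1920 * 1080 * 4 * 60 ~ 498 MB/s,
while a 16-bit DDR3 bus at 480MHz peaks at 480e6 * 2 (DDR) * 2 bytes
~ 1920 MB/s. So on LIME the screen refresh alone eats roughly a
quarter of the theoretical peak bandwidth (and an even bigger share
of the practically achievable bandwidth), while a 32-bit bus at the
same clock speed has twice the headroom.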

Now that we know a lot more about the DRAM controller in A10/A13/A20,
we can generate exact timings for any DRAM clock speed. Targeting
480MHz for A10-OLinuXino-Lime would look something like this:

static struct dram_para dram_para = { /* DRAM timings: 7-7-7-17 */
        .clock     = 480,        /* DRAM clock in MHz */
        .type      = 3,          /* DDR3 */
        .rank_num  = 1,
        .density   = 4096,       /* chip density in Mbit */
        .io_width  = 16,
        .bus_width = 16,         /* 16-bit memory interface */
        .cas       = 7,
        .zq        = 0x7b,
        .odt_en    = 0,
        .size      = 512,        /* total DRAM size in MB */
        .tpr0      = 0x30917790,
        .tpr1      = 0xa078,
        .tpr2      = 0x23200,
        .tpr3      = 0x0,
        .tpr4      = 0x0,
        .tpr5      = 0x0,
        .emr1      = 0x4,
        .emr2      = 0x8,
        .emr3      = 0x0,
};

This has been generated by the a10-dram-timings-calculator.rb script
from https://github.com/ssvb/a10-meminfo
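
For reference, the basic CAS/RCD/RP/RAS cycle counts can be derived by
rounding the nanosecond timings from the DDR3 datasheet up to whole
clock cycles. Below is a minimal sketch of that rounding step in C,
assuming the common DDR3 datasheet minimums tAA = tRCD = tRP =
13.125 ns and tRAS = 35 ns (the actual values must be taken from the
datasheet of the chips on the board, and the real script additionally
has to encode such values into the tprN register words):

#include <math.h>
#include <stdio.h>

/* round a nanosecond timing spec up to a whole number of DRAM clock cycles */
static int ns_to_cycles(double ns, double clock_mhz)
{
        return (int)ceil(ns * clock_mhz / 1000.0);
}

int main(void)
{
        double clock = 480.0; /* DRAM clock in MHz, tCK ~ 2.083 ns */
        printf("CL=%d tRCD=%d tRP=%d tRAS=%d\n",
               ns_to_cycles(13.125, clock),   /* -> 7  */
               ns_to_cycles(13.125, clock),   /* -> 7  */
               ns_to_cycles(13.125, clock),   /* -> 7  */
               ns_to_cycles(35.0, clock));    /* -> 17 */
        return 0;
}

With these assumed datasheet values, the calculation reproduces the
7-7-7-17 timings shown in the struct above.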

Please note that these settings result in slightly worse performance
than the current CAS6 settings configured in u-boot-sunxi for
A10-OLinuXino-Lime, but they no longer violate the timing specs of
the DDR3 memory chips used. These new CAS7 settings are still faster
than the typical CAS9 settings used by u-boot-sunxi for the other
boards.

However, the following u-boot patch is of *critical* importance for
proper FullHD desktop resolution support on A10-OLinuXino-Lime:
    https://github.com/linux-sunxi/u-boot-sunxi/commit/4e1532df5ebc6e0d
Using DEFE instead of DEBE and fixing the arbitration of the DEFE
port in the DRAM controller provides a huge graphics performance
boost. It also resolves all the HDMI signal disruption glitches.

A quickly hacked test branch that enforces the use of DEFE scaled
layers in the xf86-video-fbturbo driver:
    https://github.com/ssvb/xf86-video-fbturbo/tree/20140504-test-enforced-defe-scaler
The drawback is that, relying only on DEFE, we are down to just 2
disp layers instead of 4. This causes a lot of practical issues
with multi-monitor support and libvdpau-sunxi compatibility. Unless
we can also find some magic fix for DEBE, some special layer
allocation logic may be necessary to decide whether the DEBE
layers can be safely used for each particular configuration
(depending on the SoC type, the DRAM clock speed, bus width, the
screen resolution, color depth and refresh rate). All the joy of
having a http://en.wikipedia.org/wiki/Leaky_abstraction :-(
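
Just to illustrate the idea, here is a purely hypothetical sketch of
such a decision function, comparing the estimated scanout bandwidth
against the theoretical peak bandwidth of the memory bus. The function
name and the 1/4 threshold are made up for this example and would need
tuning against real measurements, and a real implementation would also
have to special-case the SoC type:

#include <stdbool.h>
#include <stdint.h>

/* e.g. debe_layers_are_safe(480, 16, 1920, 1080, 32, 60) -> false
 *      debe_layers_are_safe(480, 16, 1280,  720, 32, 60) -> true  */
bool debe_layers_are_safe(uint32_t dram_clock_mhz,
                          uint32_t bus_width_bits,
                          uint32_t xres, uint32_t yres,
                          uint32_t bpp, uint32_t refresh_hz)
{
        /* theoretical peak: DDR transfers data twice per clock cycle */
        uint64_t peak_bw = (uint64_t)dram_clock_mhz * 1000000 * 2 *
                           (bus_width_bits / 8);
        /* bandwidth needed just to scan out the framebuffer */
        uint64_t scanout_bw = (uint64_t)xres * yres * (bpp / 8) *
                              refresh_hz;
        /* made up threshold: allow DEBE only while scanout traffic
           stays below a quarter of the theoretical peak bandwidth */
        return scanout_bw * 4 < peak_bw;
}

With this made up threshold, a 16-bit bus at 480MHz would reject DEBE
for a 1080p 32bpp desktop but still allow it for 720p, which at least
roughly matches the LIME benchmark results quoted above.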

-- 
Best regards,
Siarhei Siamashka
