On Mon, 31 Mar 2014 15:54:24 +0300 Pekka Paalanen <ppaala...@gmail.com> wrote:
> From: Ben Avison <bavi...@riscosopen.org>
>
> Benchmark results, "before" is the patch
> - ARMv6: Add fast path for over_n_8888_8888_ca
> and "after" contains the additional patches:
> - ARMv6: Add fast path flag to force no preload of destination buffer
> - ARMv6: Add fast path for in_reverse_8888_8888 (this patch)
>
> lowlevel-blt-bench, in_reverse_8888_8888, 100 iterations:
>
>         Before          After
>       Mean StdDev    Mean StdDev   Confidence   Change
> L1    21.1    0.1    32.0    0.1      100.00%   +51.9%
> L2    11.7    0.3    18.4    0.5      100.00%   +56.9%
> M     10.5    0.0    16.3    0.0      100.00%   +54.8%
> HT     8.2    0.0    12.0    0.0      100.00%   +46.7%
> VT     8.1    0.0    11.8    0.0      100.00%   +45.4%
> R      8.0    0.0    11.2    0.0      100.00%   +40.0%
> RT     4.7    0.0     6.0    0.1      100.00%   +28.1%
>
> At most 14 outliers rejected per case per set.
>
> cairo-perf-trace with trimmed traces, 30 iterations:
>
>                                     Before          After
>                                   Mean StdDev    Mean StdDev   Confidence   Change
> t-firefox-paintball.trace         17.9    0.0    14.0    0.0      100.00%   +27.8%
> t-firefox-chalkboard.trace        36.6    0.0    35.8    0.0      100.00%    +2.1%
> t-firefox-canvas-alpha.trace      20.7    0.3    20.3    0.3      100.00%    +1.7%
> t-firefox-particles.trace         27.5    0.1    27.1    0.1      100.00%    +1.3%
> t-chromium-tabs.trace              4.9    0.0     4.8    0.0      100.00%    +1.1%
> t-evolution.trace                 13.0    0.1    12.9    0.1      100.00%    +1.0%
> t-swfdec-youtube.trace             7.8    0.0     7.7    0.0      100.00%    +0.8%
> t-gvim.trace                      33.0    0.2    32.8    0.2      100.00%    +0.7%
> t-gnome-terminal-vim.trace        19.8    0.2    19.7    0.2       99.46%    +0.6%
> t-grads-heat-map.trace             4.4    0.0     4.4    0.0       99.32%    +0.6%
> t-firefox-fishbowl.trace          21.1    0.0    21.0    0.0      100.00%    +0.5%
> t-firefox-planet-gnome.trace      10.9    0.0    10.8    0.0      100.00%    +0.4%
> t-firefox-canvas-swscroll.trace   32.1    0.1    32.0    0.1      100.00%    +0.4%
> t-firefox-fishtank.trace          13.2    0.0    13.1    0.0      100.00%    +0.4%
> t-firefox-asteroids.trace         11.1    0.0    11.0    0.0      100.00%    +0.4%
> t-firefox-canvas.trace            17.9    0.0    17.9    0.0       99.99%    +0.3%
> t-poppler.trace                    9.7    0.1     9.7    0.1       79.51%    +0.2% (insignificant)
> t-firefox-talos-svg.trace         20.4    0.0    20.4    0.0       97.25%    +0.1% (insignificant)
> t-swfdec-giant-steps.trace        14.8    0.0    14.8    0.0       96.75%    +0.1% (insignificant)
> t-firefox-scrolling.trace         24.6    0.1    24.6    0.1       31.24%    +0.1% (insignificant)
> t-midori-zoomed.trace              8.0    0.0     8.0    0.0       50.76%    +0.0% (insignificant)
> t-gnome-system-monitor.trace      17.1    0.0    17.1    0.0        4.49%    -0.0% (insignificant)
> t-xfce4-terminal-a1.trace          4.8    0.0     4.8    0.0       98.08%    -0.2% (insignificant)
> t-poppler-reseau.trace            22.1    0.1    22.2    0.1       93.89%    -0.3% (insignificant)
> t-firefox-talos-gfx.trace         25.4    0.4    25.5    0.5       75.53%    -0.5% (insignificant)
>
> At most 4 outliers rejected per case per set.
>
> Cairo perf reports the running time, but the change is computed for
> operations per second instead (inverse of running time).
>
> Confidence is based on Welch's t-test. Absolute changes less than 1%
> can be accounted as measurement errors, even if statistically
> significant.
>
> There was a question of why FLAG_NO_PRELOAD_DST exists. If a patch
> removing that flag from pixman-arm-simd-asm.S is added on top, the
> change will be the following.
>
> Before: flag in use
> After: flag removed
>
>         Before          After
>       Mean StdDev    Mean StdDev   Confidence   Change
> L1    32.0    0.1    31.8    0.1      100.00%    -0.6%
> L2    18.4    0.5    25.0    0.5      100.00%   +36.0%
> M     16.3    0.0    25.7    0.0      100.00%   +57.9%
> HT    12.0    0.0    13.9    0.0      100.00%   +16.4%
> VT    11.8    0.0    13.2    0.0      100.00%   +12.4%
> R     11.2    0.0    14.0    0.0      100.00%   +24.3%
> RT     6.0    0.1     7.0    0.1      100.00%   +15.1%
>
>                                     Before          After
>                                   Mean StdDev    Mean StdDev   Confidence   Change
> t-chromium-tabs.trace              4.8    0.0     4.8    0.0      100.00%    +0.7%
> t-poppler-reseau.trace            22.2    0.1    22.1    0.1       99.98%    +0.6%
> t-poppler.trace                    9.7    0.1     9.6    0.1       99.70%    +0.5%
> t-firefox-talos-gfx.trace         25.5    0.5    25.4    0.3       72.06%    +0.5% (insignificant)
> t-firefox-canvas-alpha.trace      20.3    0.3    20.2    0.2       80.88%    +0.4% (insignificant)
> t-firefox-canvas.trace            17.9    0.0    17.8    0.0       99.36%    +0.2%
> t-firefox-canvas-swscroll.trace   32.0    0.1    31.9    0.1       84.83%    +0.1% (insignificant)
> t-firefox-asteroids.trace         11.0    0.0    11.0    0.0      100.00%    +0.1%
> t-midori-zoomed.trace              8.0    0.0     8.0    0.0       99.90%    +0.1%
> t-firefox-planet-gnome.trace      10.8    0.0    10.8    0.0       91.34%    +0.1% (insignificant)
> t-firefox-scrolling.trace         24.6    0.1    24.6    0.1        0.53%    +0.0% (insignificant)
> t-gnome-terminal-vim.trace        19.7    0.2    19.7    0.1       11.42%    -0.0% (insignificant)
> t-firefox-talos-svg.trace         20.4    0.0    20.4    0.0       54.68%    -0.0% (insignificant)
> t-swfdec-giant-steps.trace        14.8    0.0    14.8    0.0       78.92%    -0.0% (insignificant)
> t-firefox-fishtank.trace          13.1    0.0    13.1    0.0       97.09%    -0.0% (insignificant)
> t-gnome-system-monitor.trace      17.1    0.0    17.1    0.0       65.13%    -0.0% (insignificant)
> t-evolution.trace                 12.9    0.1    12.9    0.1       34.70%    -0.1% (insignificant)
> t-grads-heat-map.trace             4.4    0.0     4.4    0.0       28.95%    -0.1% (insignificant)
> t-firefox-fishbowl.trace          21.0    0.0    21.0    0.0       99.92%    -0.2%
> t-xfce4-terminal-a1.trace          4.8    0.0     4.8    0.0       98.78%    -0.2% (insignificant)
> t-firefox-particles.trace         27.1    0.1    27.3    0.1       99.89%    -0.5%
> t-swfdec-youtube.trace             7.7    0.0     7.8    0.0      100.00%    -0.7%
> t-gvim.trace                      32.8    0.2    33.1    0.2      100.00%    -0.9%
> t-firefox-chalkboard.trace        35.8    0.0    37.1    0.0      100.00%    -3.3%
> t-firefox-paintball.trace         14.0    0.0    15.0    0.0      100.00%    -6.2%
>
> IOW, the flag has adverse effects on lowlevel-blt-bench performance,
> but improves one or two Cairo traces slightly.
>
> v4, Pekka Paalanen <pekka.paala...@collabora.co.uk> :
>       Rebased, re-benchmarked on Raspberry Pi, commit message.

FYI, this is what Ben explained to me, when I asked about the preload
dst flag:

On Thu, 03 Apr 2014 14:40:23 +0100
"Ben Avison" <bavi...@riscosopen.org> wrote:

> The thing with the lowlevel-blt-bench benchmarks for the more
> sophisticated composite types (as a general rule, anything that involves
> branches at the per-pixel level) is that they are only profiling the case
> where you have mid-level alpha values in the source/mask/destination.
> Real-world images typically have a disproportionate number of fully
> opaque and fully transparent pixels, which is why when there's a
> discrepancy between which implementation performs best with cairo-perf
> trace versus lowlevel-blt-bench, I usually favour the Cairo winner.
>
> The results of removing FLAG_NO_PRELOAD_DST (in other words, adding
> preload of the destination buffer) are easy to explain in the
> lowlevel-blt-bench results. In the L1 case, the destination buffer is
> already in the L1 cache, so adding the preloads is simply adding extra
> instruction cycles that have no effect on memory operations. The "in"
> compositing operator depends upon the alpha of both source and
> destination, so if you use uniform mid-alpha, then you actually do need
> to read your destination pixels, so you benefit from preloading them. But
> for fully opaque or fully transparent source pixels, you don't need to
> read the corresponding destination pixel - it'll either be left alone or
> overwritten.
> Since the ARM11 doesn't use write-allocate caching, both of these cases
> avoid the time taken to load the extra cachelines and also increase the
> efficiency of the cache for other data. If you examine the source images
> being used by the Cairo test, you'll probably find they mostly use
> transparent or opaque pixels.

Thanks,
pq
_______________________________________________
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman
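
The special cases Ben describes can be illustrated with a short Python sketch. This is not the pixman code (the real fast path is ARMv6 assembly in pixman-arm-simd-asm.S); it only assumes premultiplied 32-bit ARGB pixels and the usual Porter-Duff meaning of in_reverse, where each destination channel is multiplied by the source alpha. The names mul_un8 and in_reverse_8888 are made up for this example. The function counts how many destination pixels actually had to be read, which is the quantity that destination preloading is meant to speed up.

```python
def mul_un8(x, a):
    """Multiply an 8-bit channel by an 8-bit alpha: round(x * a / 255)."""
    t = x * a + 0x80
    return (t + (t >> 8)) >> 8


def in_reverse_8888(src, dst):
    """Apply dst = dst * alpha(src) in place on lists of 32-bit ARGB pixels.

    Hypothetical helper for illustration only, not a pixman function.
    Returns the number of destination pixels that had to be read.
    """
    dst_reads = 0
    for i, s in enumerate(src):
        a = s >> 24
        if a == 0xFF:
            continue        # opaque source: destination is left alone
        if a == 0x00:
            dst[i] = 0      # transparent source: destination is overwritten
            continue
        d = dst[i]          # mid-level alpha: destination must be read
        dst_reads += 1
        dst[i] = (
            (mul_un8(d >> 24, a) << 24)
            | (mul_un8((d >> 16) & 0xFF, a) << 16)
            | (mul_un8((d >> 8) & 0xFF, a) << 8)
            | mul_un8(d & 0xFF, a)
        )
    return dst_reads
```

With a source made up only of opaque and transparent pixels the function returns 0, so preloading the destination would be wasted work in that case. With uniform mid-level alpha, as in lowlevel-blt-bench, every destination pixel is read.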