Attached is a simple patch that produces around 20 % Mpix/s improvement
for wide path processing due to significant optimization of
pixman_expand. On my i7 laptop, we go from:
src_8888_2x10 = L1: 62.08 L2: 60.73 M: 59.61
( 4.30%) HT: 46.81 VT: 42.17 R: 43.18 RT: 26.01 (
325Kops/s)
to
src_8888_2x10 = L1: 76.94 L2: 78.43 M: 75.87
( 5.59%) HT: 56.73 VT: 52.39 R: 53.00 RT: 29.29 (
363Kops/s)
The key of the patch is the observation that unorm_to_unorm's work can
more easily be done with a simple multiplication and shift, when the
function is applied repeatedly and the parameters are not compile-time
constants. For instance, converting from 0xfe to 0xfefe (expanding from
8 bits to 16 bits) can be done by calculating
c = c * 0x101
However, sometimes the result is not a neat replication of all the bits.
For instance, going from 10 bits to 16 bits can be done by calculating
c = c * 0x401UL >> 4
where the intermediate result is 20 bit wide repetition of the 10-bit
pattern followed by shifting off the unnecessary lowest bits.
The patch has the algorithm to calculate the factor and the shift, and
converts the code to use it.
--
Antti
>From f8229b93394cd9d39593d1c08ae515dbc43618ea Mon Sep 17 00:00:00 2001
From: "Antti S. Lankila" <alank...@bel.fi>
Date: Sun, 10 Jun 2012 19:22:56 +0300
Subject: [PATCH] Faster unorm_to_unorm for wide processing.
Because wide path doesn't have compile-time constants, unorm_to_unorm
compiles to rather complicated code that is evaluated per pixel. It's easier
to generate bit repetition according to multiplication and shift, for instance
from 10 bit value to 16-bit value you can do:
v = v * 0x401 >> 4
which is just two simple integer operations.
---
pixman/pixman-utils.c | 31 +++++++++++++++++++++++++++----
1 file changed, 27 insertions(+), 4 deletions(-)
diff --git a/pixman/pixman-utils.c b/pixman/pixman-utils.c
index 2ec2594..d33b10d 100644
--- a/pixman/pixman-utils.c
+++ b/pixman/pixman-utils.c
@@ -183,6 +183,21 @@ pixman_malloc_abc (unsigned int a,
return malloc (a * b * c);
}
+void
+unorm_to_unorm_params(int inwidth, int outwidth, uint32_t *factor, int *shift)
+{
+ *factor = 0;
+ int w = 0;
+ while (inwidth != 0 && w < outwidth) {
+ *factor |= 1 << w;
+ w += inwidth;
+ }
+
+ /* Did we generate too many bits? */
+ *shift = w - outwidth;
+ return;
+}
+
/*
* This function expands images from ARGB8 format to ARGB16. To preserve
* precision, it needs to know the original source format. For example, if the
@@ -214,6 +229,14 @@ pixman_expand (uint64_t * dst,
b_mask = ~(~0 << b_size);
int i;
+ uint32_t au_factor, ru_factor, gu_factor, bu_factor;
+ int au_shift, ru_shift, gu_shift, bu_shift;
+
+ unorm_to_unorm_params (a_size, 16, &au_factor, &au_shift);
+ unorm_to_unorm_params (r_size, 16, &ru_factor, &ru_shift);
+ unorm_to_unorm_params (g_size, 16, &gu_factor, &gu_shift);
+ unorm_to_unorm_params (b_size, 16, &bu_factor, &bu_shift);
+
/* Start at the end so that we can do the expansion in place
* when src == dst
*/
@@ -226,7 +249,7 @@ pixman_expand (uint64_t * dst,
if (a_size)
{
a = (pixel >> a_shift) & a_mask;
- a16 = unorm_to_unorm (a, a_size, 16);
+ a16 = a * au_factor >> au_shift;
}
else
{
@@ -238,9 +261,9 @@ pixman_expand (uint64_t * dst,
r = (pixel >> r_shift) & r_mask;
g = (pixel >> g_shift) & g_mask;
b = (pixel >> b_shift) & b_mask;
- r16 = unorm_to_unorm (r, r_size, 16);
- g16 = unorm_to_unorm (g, g_size, 16);
- b16 = unorm_to_unorm (b, b_size, 16);
+ r16 = r * ru_factor >> ru_shift;
+ g16 = g * gu_factor >> gu_shift;
+ b16 = b * bu_factor >> bu_shift;
}
else
{
--
1.7.9.5
_______________________________________________
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman