Re: [Pixman] [PATCH] Faster C variant of over_n_8_8888 fast path

2010-09-14 Thread Siarhei Siamashka
On Tuesday 14 September 2010 08:53:37 Soeren Sandmann wrote:
> Siarhei Siamashka  writes:
> > +/* A variant of 'over', which works faster for non-additive blending on
> > the + * platforms which do not have special instructions for saturated
> > addition + */
> > +static force_inline uint32_t
> > +over_a (uint32_t src, uint32_t dest, pixman_bool_t additive_blending)
> > +{
> > +uint32_t a = ~src >> 24;
> > +if (additive_blending)
> > +{
> > +   UN8x4_MUL_UN8_ADD_UN8x4 (dest, a, src);
> > +   return dest;
> > +}
> > +else
> > +{
> > +   UN8x4_MUL_UN8 (dest, a);
> > +   return dest + src;
> > +}
> > +}
> 
> Is there any reason to not just add a boolean "additive_blending" to
> the existing force_inline over() function?

No particular reason. The patch was just self contained and a bit less
intrusive this way (in the sense of having less places in the code changed). It
was mostly intended as a preview for the idea. The final implementation can
indeed be done so that it better blends with the rest of code.

So does it make sense to split the patch into parts, introducing this third
argument for 'over' function first?

> It might also be interesting to add the check as a new
> NOT_SUPER_LUMINESCENT flag and then simply require it for the source
> for all the over_n_*() functions.

I see many reasons *not* to add it as a new flag:

1. It takes one extra flag bit. There are already 24 bits used, with only 8 
remaining. We still need some flag(s) for rotation transforms:
http://lists.freedesktop.org/archives/pixman/2010-August/000420.html
I expect that compacting bits later may turn out to be tricky, so it may be 
wise not to waste them in the first place. Extending flag bits to 64-bit
variable is possible, but may reduce performance.

2. After introducing this bit, every compositing operation with a solid
source will do calculation for this flag, spending some time on it. But
calculation of this flag is not needed for many operators (SRC for example). 
Also it is only useful exclusively for C fast paths and simple SIMD-incapable 
processors, everyone else will just take a tiny performance hit.

The 'last mile' check as implemented in my patch should be fine as far as 
performance is concerned. The only drawback is that the one who implements the
fast path functions, will be forced to handle all possible types of input data.
And not be lazy providing just NOT_SUPER_LUMINESCENT operation only, relying on
pixman to fallback to someting else when needed.

BTW, I like this 'super-luminescent' term :) I tried to search for the 
information about the case when "color components exceed alpha in premultiplied 
format", and it looked like many (game developers) know about this thing and
its features, but seemed like nobody had a clear single-word definition for it.
Searching for "super-luminescent premultiplied" gives some references, all in 
cairo and freedesktop.org context. Anyway, let's indeed call this thing
'super-luminescent'. I think I need to update comments in the patch and also in
the commit message to use it instead of 'additive blending', which I took from:
http://home.comcast.net/~tom_forsyth/blog.wiki.html#[[Premultiplied%20alpha]]

> That would allow similar optimizations for the n_8_565 case and probably the
> n___ca() case as well.

Yes, and also 'over_n_' could make use of this optimization (if C fast path 
function even gets implemented for it).

> The flag could be set for all the gradients and any time an image is
> opaque.

I'm not quite sure about how useful this flag could be for gradients (it would 
have to be somehow propagated to the scanline combiner function?).

But another important operation is over__. And it is hard to do 
anything with it because we don't known if there are any super-luminescent
pixels in the source image. Maybe it can make sense reconsidering how the
super-luminiscent colors are handled in general? When discussing it on IRC the
other day, there were even concerns about where such pixels could possibly
come from and whether they are even used in cairo in any way.

But this stuff is only important for C implementation and simple processors.
So for now I would just go after the simple C fast path functions like
over_n_8_/over_n_8_0565 and maybe add the rest of ideas into TODO list.
Probably some people are more motivated in improving pixman performance on
simple processors? I'm adding Georgi Beloev to CC just in case because he
seems to be interested in MIPS32R2.

-- 
Best regards,
Siarhei Siamashka


signature.asc
Description: This is a digitally signed message part.
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


[Pixman] [PATCH] Add a lowlevel blitter test

2010-09-14 Thread Dmitri Vorobiev
From: Jonathan Morton 

 This test is a modified version of Siarhei's compositor throughput
 benchmark.  It's expanded with explicit reporting of memory bandwidth
 consumption for the M-test, and with an additional 8x8-random test
 intended to determine peak ops/sec capability.  There are also quite a
 lot more operations tested for.

---
 test/Makefile.am  |2 +
 test/lowlevel-blt-bench.c |  722 +
 2 files changed, 724 insertions(+), 0 deletions(-)
 create mode 100644 test/lowlevel-blt-bench.c

diff --git a/test/Makefile.am b/test/Makefile.am
index 3d98e17..94f6f93 100644
--- a/test/Makefile.am
+++ b/test/Makefile.am
@@ -18,6 +18,7 @@ TESTPROGRAMS =\
scaling-crash-test  \
blitters-test   \
scaling-test\
+   lowlevel-blt-bench  \
composite
 
 a1_trap_test_LDADD = $(TEST_LDADD)
@@ -29,6 +30,7 @@ oob_test_LDADD = $(TEST_LDADD)
 window_test_LDADD = $(TEST_LDADD)
 scaling_crash_test_LDADD = $(TEST_LDADD)
 region_translate_test_LDADD = $(TEST_LDADD)
+lowlevel_blt_bench_LDADD = $(TEST_LDADD)
 
 region_test_LDADD = $(TEST_LDADD)
 region_test_SOURCES = region-test.c utils.c utils.h
diff --git a/test/lowlevel-blt-bench.c b/test/lowlevel-blt-bench.c
new file mode 100644
index 000..cca7732
--- /dev/null
+++ b/test/lowlevel-blt-bench.c
@@ -0,0 +1,722 @@
+/*
+ * Copyright © 2010 Movial Creative Technologies Oy
+ *
+ * Permission to use, copy, modify, distribute, and sell this software and its
+ * documentation for any purpose is hereby granted without fee, provided that
+ * the above copyright notice appear in all copies and that both that
+ * copyright notice and this permission notice appear in supporting
+ * documentation, and that the name of ARM Ltd not be used in
+ * advertising or publicity pertaining to distribution of the software without
+ * specific, written prior permission.  ARM Ltd makes no
+ * representations about the suitability of this software for any purpose.  It
+ * is provided "as is" without express or implied warranty.
+ *
+ * THE COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO THIS
+ * SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
+ * FITNESS, IN NO EVENT SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY
+ * SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN
+ * AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING
+ * OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS
+ * SOFTWARE.
+ *
+ * Author:  Jonathan Morton (jonathan.mor...@movial.com)
+ *
+ * Based on the code by Siarhei Siamashka 
+ *
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#ifdef HAVE_CONFIG_H
+#include 
+#endif
+
+#include "pixman-private.h"
+
+#define SOLID_FLAG 1
+#define CA_FLAG2
+
+static double
+gettime ()
+{
+struct timeval tv;
+
+gettimeofday (&tv, NULL);
+return (double)((int64_t)tv.tv_sec * 100 + tv.tv_usec) / 100.;
+}
+
+#define L1CACHE_SIZE (8 * 1024)
+#define L2CACHE_SIZE (128 * 1024)
+
+#define WIDTH  1920
+#define HEIGHT 1080
+#define BUFSIZE (WIDTH * HEIGHT * 4)
+#define XWIDTH 256
+#define XHEIGHT 256
+#define TILEWIDTH 32
+#define TINYWIDTH 8
+
+#define EXCLUDE_OVERHEAD 1
+
+uint32_t *dst;
+uint32_t *src;
+uint32_t *mask;
+
+double bandwidth = 0;
+
+double
+bench_memcpy ()
+{
+int64_t n = 0, total;
+double  t1, t2;
+int x = 0;
+
+t1 = gettime ();
+while (1)
+{
+   memcpy (dst, src, BUFSIZE - 64);
+   memcpy (src, dst, BUFSIZE - 64);
+   n += 4 * (BUFSIZE - 64);
+   t2 = gettime ();
+   if (t2 - t1 > 0.5)
+   break;
+}
+n = total = n * 5;
+t1 = gettime ();
+while (n > 0)
+{
+   if (++x >= 64)
+   x = 0;
+   memcpy ((char *)dst + 1, (char *)src + x, BUFSIZE - 64);
+   memcpy ((char *)src + 1, (char *)dst + x, BUFSIZE - 64);
+   n -= 4 * (BUFSIZE - 64);
+}
+t2 = gettime ();
+return (double)total / (t2 - t1);
+}
+
+static void
+pixman_image_composite_wrapper (pixman_implementation_t *impl,
+pixman_op_t  op,
+pixman_image_t * src_image,
+pixman_image_t * mask_image,
+pixman_image_t * dst_image,
+int32_t  src_x,
+int32_t  src_y,
+int32_t  mask_x,
+int32_t  mask_y,
+int32_t  dest_x,
+int32_t  dest_y,
+int32_t  width,
+int32_t 

Re: [Pixman] [PATCH] Add a lowlevel blitter test

2010-09-14 Thread Siarhei Siamashka
On Tuesday 14 September 2010 15:24:40 Dmitri Vorobiev wrote:
> From: Jonathan Morton 
> 
>  This test is a modified version of Siarhei's compositor throughput
>  benchmark.  It's expanded with explicit reporting of memory bandwidth
>  consumption for the M-test, and with an additional 8x8-random test
>  intended to determine peak ops/sec capability.  There are also quite a
>  lot more operations tested for.

Thanks for pushing this benchmarking stuff forward. It's a bit embarrassing
and I made this benchmark program as a quick hack to get some performance
numbers when tuning ARM NEON optimizations. It worked for me, but it was
not good enough for adding into pixman tree the way it was.

The biggest problem (but easy to solve) is the use of 'gettimeofday' and
'memalign' functions, which are available not on all supported systems.

I listed some of the issues/todo items here:
http://lists.freedesktop.org/archives/pixman/2010-September/000470.html

I will not go into further details, because reviewing mostly my own code is 
kind of weird :) I would wait for Søren's comment(s) to see what he considers
to be real blockers.

But thanks again for picking up this code. That's much appreciated.

-- 
Best regards,
Siarhei Siamashka


signature.asc
Description: This is a digitally signed message part.
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


[Pixman] [PATCH 1/5] Add fence_malloc() and fence_free().

2010-09-14 Thread Søren Sandmann
From: Søren Sandmann Pedersen 

These variants of malloc() and free() try to surround the allocated
memory with protected pages so that out-of-bounds accessess will cause
a segmentation fault.

If mprotect() and getpagesize() are not available, these functions are
simply equivalent to malloc() and free().
---
 configure.ac |   10 +
 test/utils.c |  105 +-
 test/utils.h |   11 +-
 3 files changed, 124 insertions(+), 2 deletions(-)

diff --git a/configure.ac b/configure.ac
index dbff2a6..652fec9 100644
--- a/configure.ac
+++ b/configure.ac
@@ -623,6 +623,16 @@ if test x$have_alarm = xyes; then
AC_DEFINE(HAVE_ALARM, 1, [Whether we have alarm()])
 fi
 
+AC_CHECK_FUNC(mprotect, have_mprotect=yes, have_mprotect=no)
+if test x$have_mprotect = xyes; then
+   AC_DEFINE(HAVE_MPROTECT, 1, [Whether we have mprotect()])
+fi
+
+AC_CHECK_FUNC(getpagesize, have_getpagesize=yes, have_=no)
+if test x$have_getpagesize = xyes; then
+   AC_DEFINE(HAVE_GETPAGESIZE, 1, [Whether we have getpagesize()])
+fi
+
 dnl =
 dnl Thread local storage
 
diff --git a/test/utils.c b/test/utils.c
index d95cbc2..0134e04 100644
--- a/test/utils.c
+++ b/test/utils.c
@@ -5,6 +5,8 @@
 #include 
 #endif
 
+#include 
+
 /* Random number seed
  */
 
@@ -197,10 +199,111 @@ image_endian_swap (pixman_image_t *img, int bpp)
 }
 }
 
+#define N_LEADING_PROTECTED10
+#define N_TRAILING_PROTECTED   10
+#define INITIAL_PAGE
+
+typedef struct
+{
+void *addr;
+uint32_t len;
+uint8_t *trailing;
+} info_t;
+
+#if defined(HAVE_MPROTECT) && defined(HAVE_GETPAGESIZE)
+
+void *
+fence_malloc (uint32_t len)
+{
+unsigned long page_size = getpagesize();
+unsigned long page_mask = page_size - 1;
+uint32_t n_payload_bytes = (len + page_mask) & ~page_mask;
+uint32_t n_bytes =
+   (len +
+page_size * (N_LEADING_PROTECTED + N_TRAILING_PROTECTED + 2) +
+n_payload_bytes) & ~page_mask;
+uint8_t *initial_page;
+uint8_t *leading_protected;
+uint8_t *trailing_protected;
+uint8_t *payload;
+uint8_t *addr;
+
+addr = malloc (n_bytes);
+
+if (!addr)
+{
+   printf ("malloc failed on %u %u\n", len, n_bytes);
+   return NULL;
+}
+
+initial_page = (uint8_t *)(((unsigned long)addr + page_mask) & ~page_mask);
+leading_protected = initial_page + page_size;
+payload = leading_protected + N_LEADING_PROTECTED * page_size;
+trailing_protected = payload + n_payload_bytes;
+
+((info_t *)initial_page)->addr = addr;
+((info_t *)initial_page)->len = len;
+((info_t *)initial_page)->trailing = trailing_protected;
+
+if (mprotect (leading_protected, N_LEADING_PROTECTED * page_size,
+ PROT_NONE) == -1)
+{
+   free (addr);
+   return NULL;
+}
+
+if (mprotect (trailing_protected, N_TRAILING_PROTECTED * page_size,
+ PROT_NONE) == -1)
+{
+   mprotect (leading_protected, N_LEADING_PROTECTED * page_size,
+ PROT_READ | PROT_WRITE);
+
+   free (addr);
+   return NULL;
+}
+
+return payload;
+}
+
+void
+fence_free (void *data)
+{
+uint32_t page_size = getpagesize();
+uint8_t *payload = data;
+uint8_t *leading_protected = payload - N_LEADING_PROTECTED * page_size;
+uint8_t *initial_page = leading_protected - page_size;
+info_t *info = (info_t *)initial_page;
+uint8_t *trailing_protected = info->trailing;
+
+mprotect (leading_protected, N_LEADING_PROTECTED * page_size,
+ PROT_READ | PROT_WRITE);
+
+mprotect (trailing_protected, N_LEADING_PROTECTED * page_size,
+ PROT_READ | PROT_WRITE);
+
+free (info->addr);
+}
+
+#else
+
+void *
+fence_malloc (uint32_t len)
+{
+return malloc (len);
+}
+
+void
+fence_free (void *data)
+{
+free (data);
+}
+
+#endif
+
 uint8_t *
 make_random_bytes (int n_bytes)
 {
-uint8_t *bytes = malloc (n_bytes);
+uint8_t *bytes = fence_malloc (n_bytes);
 int i;
 
 if (!bytes)
diff --git a/test/utils.h b/test/utils.h
index a39af02..14e3c8b 100644
--- a/test/utils.h
+++ b/test/utils.h
@@ -51,7 +51,16 @@ compute_crc32 (uint32_tin_crc32,
 void
 image_endian_swap (pixman_image_t *img, int bpp);
 
-/* Generate n_bytes random bytes in malloced memory */
+/* Allocate memory that is bounded by protected pages,
+ * so that out-of-bounds access will cause segfaults
+ */
+void *
+fence_malloc (uint32_t len);
+
+void
+fence_free (void *data);
+
+/* Generate n_bytes random bytes in fence_malloced memory */
 uint8_t *
 make_random_bytes (int n_bytes);
 
-- 
1.7.1.1

___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


[Pixman] [PATCH 2/5] Update and extend the alphamap test

2010-09-14 Thread Søren Sandmann
From: Søren Sandmann Pedersen 

- Test many more combinations of formats

- Test destination alpha maps

- Test various different alpha origins

Also add a transformation to the destination, but comment it out
because it is actually broken at the moment (and pretty difficult to
fix).
---
 test/Makefile.am |2 +-
 test/alphamap.c  |  243 --
 2 files changed, 218 insertions(+), 27 deletions(-)

diff --git a/test/Makefile.am b/test/Makefile.am
index 3d98e17..f3b2b58 100644
--- a/test/Makefile.am
+++ b/test/Makefile.am
@@ -13,9 +13,9 @@ TESTPROGRAMS =\
window-test \
gradient-crash-test \
trap-crasher\
-   alphamap\
alpha-loop  \
scaling-crash-test  \
+   alphamap\
blitters-test   \
scaling-test\
composite
diff --git a/test/alphamap.c b/test/alphamap.c
index e6a25ef..09de387 100644
--- a/test/alphamap.c
+++ b/test/alphamap.c
@@ -2,47 +2,238 @@
 #include 
 #include "utils.h"
 
-#define WIDTH 400
-#define HEIGHT 200
+#define WIDTH 100
+#define HEIGHT 100
 
-int
-main (int argc, char **argv)
+static const pixman_format_code_t formats[] =
+{
+PIXMAN_a8r8g8b8,
+PIXMAN_a2r10g10b10,
+PIXMAN_a4r4g4b4,
+PIXMAN_a8
+};
+
+static const pixman_format_code_t alpha_formats[] =
+{
+PIXMAN_null,
+PIXMAN_a8,
+PIXMAN_a2r10g10b10,
+PIXMAN_a4r4g4b4
+};
+
+static const int origins[] =
 {
-uint8_t *alpha = make_random_bytes (WIDTH * HEIGHT);
-uint32_t *src = (uint32_t *)make_random_bytes (WIDTH * HEIGHT * 4);
-uint32_t *dest = (uint32_t *)make_random_bytes (WIDTH * HEIGHT * 4);
-int i;
+0, 10, -100
+};
 
-pixman_image_t *a = pixman_image_create_bits (PIXMAN_a8, WIDTH, HEIGHT, 
(uint32_t *)alpha, WIDTH);
-pixman_image_t *d = pixman_image_create_bits (PIXMAN_a8r8g8b8, WIDTH, 
HEIGHT, dest, WIDTH * 4);
+static const char *
+format_name (pixman_format_code_t format)
+{
+if (format == PIXMAN_a8)
+   return "a8";
+else if (format == PIXMAN_a2r10g10b10)
+   return "a2r10g10b10";
+else if (format == PIXMAN_a8r8g8b8)
+   return "a8r8g8b8";
+else if (format == PIXMAN_a4r4g4b4)
+   return "a4r4g4b4";
+else if (format == PIXMAN_null)
+   return "none";
+else
+   assert (0);
 
-for (i = 0; i < 2; ++i)
+return "";
+}
+
+static pixman_image_t *
+make_image (pixman_format_code_t format)
+{
+uint32_t *bits;
+uint8_t bpp = PIXMAN_FORMAT_BPP (format) / 8;
+
+bits = (uint32_t *)make_random_bytes (WIDTH * HEIGHT * bpp);
+
+return pixman_image_create_bits (format, WIDTH, HEIGHT, bits, WIDTH * bpp);
+}
+
+static pixman_image_t *
+create_image (pixman_format_code_t format, pixman_format_code_t alpha_format,
+ int alpha_origin_x, int alpha_origin_y)
+{
+pixman_image_t *image = make_image (format);
+
+if (alpha_format != PIXMAN_null)
 {
-   pixman_format_code_t sformat = (i == 0)? PIXMAN_a8r8g8b8 : 
PIXMAN_a2r10g10b10;
-   pixman_image_t *s = pixman_image_create_bits (sformat, WIDTH, HEIGHT, 
src, WIDTH * 4);
-   int j, k;
+   pixman_image_t *alpha = make_image (alpha_format);
 
-   pixman_image_set_alpha_map (s, a, 0, 0);
+   pixman_image_set_alpha_map (image, alpha,
+   alpha_origin_x, alpha_origin_y);
+}
 
-   pixman_image_composite (PIXMAN_OP_SRC, s, NULL, d, 0, 0, 0, 0, 0, 0, 
WIDTH, HEIGHT);
+return image;
+}
 
-   for (j = 0; j < HEIGHT; ++j)
+static uint8_t
+get_alpha (pixman_image_t *image, int x, int y, int orig_x, int orig_y)
+{
+uint8_t *bits;
+uint8_t r;
+
+if (image->common.alpha_map)
+{
+   if (x - orig_x >= 0 && x - orig_x < WIDTH &&
+   y - orig_y >= 0 && y - orig_y < HEIGHT)
+   {
+   image = (pixman_image_t *)image->common.alpha_map;
+
+   x -= orig_x;
+   y -= orig_y;
+   }
+   else
{
-   for (k = 0; k < WIDTH; ++k)
+   return 0;
+   }
+}
+
+bits = (uint8_t *)image->bits.bits;
+
+if (image->bits.format == PIXMAN_a8)
+{
+   r = bits[y * WIDTH + x];
+}
+else if (image->bits.format == PIXMAN_a2r10g10b10)
+{
+   r = ((uint32_t *)bits)[y * WIDTH + x] >> 30;
+   r |= r << 2;
+   r |= r << 4;
+}
+else if (image->bits.format == PIXMAN_a8r8g8b8)
+{
+   r = ((uint32_t *)bits)[y * WIDTH + x] >> 24;
+}
+else if (image->bits.format == PIXMAN_a4r4g4b4)
+{
+   r = ((uint16_t *)bits)[y * WIDTH + x] >> 12;
+   r |= r << 4;
+}
+else
+{
+   assert (0);
+}
+
+return r;
+}
+
+#define ARRAY_LENGTH(A) ((int) (sizeof (A) / sizeof ((A) [0])))
+
+static int
+run_test (int s, int d, int sa, int da, int soff, int doff)
+{
+pixman_format_code_t sf = formats[s];
+pixman_format_code_t df = formats[d];
+ 

[Pixman] [PATCH 3/5] Rename FAST_PATH_NO_WIDE_FORMAT to FAST_PATH_NARROW_FORMAT

2010-09-14 Thread Søren Sandmann
From: Søren Sandmann Pedersen 

This avoids a negative in the name. Also, by renaming the "wide"
variable in pixman-general.c to "narrow" and fixing up the logic
correspondingly, the code there reads a lot more straightforwardly.
---
 pixman/pixman-fast-path.c |2 +-
 pixman/pixman-general.c   |   46 ++--
 pixman/pixman-image.c |4 +-
 pixman/pixman-private.h   |6 ++--
 4 files changed, 29 insertions(+), 29 deletions(-)

diff --git a/pixman/pixman-fast-path.c b/pixman/pixman-fast-path.c
index f03752f..6dee472 100644
--- a/pixman/pixman-fast-path.c
+++ b/pixman/pixman-fast-path.c
@@ -1864,7 +1864,7 @@ static const pixman_fast_path_t c_fast_paths[] =
  FAST_PATH_NO_ALPHA_MAP|   \
  FAST_PATH_NEAREST_FILTER  |   \
  FAST_PATH_NO_ACCESSORS|   \
- FAST_PATH_NO_WIDE_FORMAT)
+ FAST_PATH_NARROW_FORMAT)
 
 #define SIMPLE_NEAREST_FAST_PATH(op,s,d,func)  \
 {   PIXMAN_OP_ ## op,  \
diff --git a/pixman/pixman-general.c b/pixman/pixman-general.c
index fc742c0..fc276bd 100644
--- a/pixman/pixman-general.c
+++ b/pixman/pixman-general.c
@@ -63,11 +63,11 @@ general_composite_rect  (pixman_implementation_t *imp,
mask && mask->type == BITS ? mask->bits.format : 0;
 const pixman_format_code_t dest_format =
dest->type == BITS ? dest->bits.format : 0;
-const int src_wide = PIXMAN_FORMAT_IS_WIDE (src_format);
-const int mask_wide = mask && PIXMAN_FORMAT_IS_WIDE (mask_format);
-const int dest_wide = PIXMAN_FORMAT_IS_WIDE (dest_format);
-const int wide = src_wide || mask_wide || dest_wide;
-const int Bpp = wide ? 8 : 4;
+const int src_narrow = !PIXMAN_FORMAT_IS_WIDE (src_format);
+const int mask_narrow = !mask || !PIXMAN_FORMAT_IS_WIDE (mask_format);
+const int dest_narrow = !PIXMAN_FORMAT_IS_WIDE (dest_format);
+const int narrow = src_narrow && mask_narrow && dest_narrow;
+const int Bpp = narrow ? 4 : 8;
 uint8_t *scanline_buffer = stack_scanline_buffer;
 uint8_t *src_buffer, *mask_buffer, *dest_buffer;
 fetch_scanline_t fetch_src = NULL, fetch_mask = NULL, fetch_dest = NULL;
@@ -106,29 +106,29 @@ general_composite_rect  (pixman_implementation_t *imp,
 
 if (op == PIXMAN_OP_CLEAR)
fetch_src = NULL;
-else if (wide)
-   fetch_src = _pixman_image_get_scanline_64;
-else
+else if (narrow)
fetch_src = _pixman_image_get_scanline_32;
+else
+   fetch_src = _pixman_image_get_scanline_64;
 
 if (!mask || op == PIXMAN_OP_CLEAR)
fetch_mask = NULL;
-else if (wide)
-   fetch_mask = _pixman_image_get_scanline_64;
-else
+else if (narrow)
fetch_mask = _pixman_image_get_scanline_32;
+else
+   fetch_mask = _pixman_image_get_scanline_64;
 
 if (op == PIXMAN_OP_CLEAR || op == PIXMAN_OP_SRC)
fetch_dest = NULL;
-else if (wide)
-   fetch_dest = _pixman_image_get_scanline_64;
-else
+else if (narrow)
fetch_dest = _pixman_image_get_scanline_32;
-
-if (wide)
-   store = _pixman_image_store_scanline_64;
 else
+   fetch_dest = _pixman_image_get_scanline_64;
+
+if (narrow)
store = _pixman_image_store_scanline_32;
+else
+   store = _pixman_image_store_scanline_64;
 
 /* Skip the store step and composite directly into the
  * destination if the output format of the compose func matches
@@ -148,7 +148,7 @@ general_composite_rect  (pixman_implementation_t *imp,
  op == PIXMAN_OP_OUT_REVERSE   ||
  op == PIXMAN_OP_DST)))
 {
-   if (!wide &&
+   if (narrow &&
!dest->common.alpha_map &&
!dest->bits.write_func)
{
@@ -175,19 +175,19 @@ general_composite_rect  (pixman_implementation_t *imp,
 mask->common.component_alpha&&
 PIXMAN_FORMAT_RGB (mask->bits.format);
 
-if (wide)
+if (narrow)
 {
if (component_alpha)
-   compose = 
(pixman_combine_32_func_t)_pixman_implementation_combine_64_ca;
+   compose = _pixman_implementation_combine_32_ca;
else
-   compose = 
(pixman_combine_32_func_t)_pixman_implementation_combine_64;
+   compose = _pixman_implementation_combine_32;
 }
 else
 {
if (component_alpha)
-   compose = _pixman_implementation_combine_32_ca;
+   compose = 
(pixman_combine_32_func_t)_pixman_implementation_combine_64_ca;
else
-   compose = _pixman_implementation_combine_32;
+   compose = 
(pixman_combine_32_func_t)_pixman_implementation_combine_64;
 }
 
 if (!compose)
diff --git a/pixman/pixman-image.c b/pixman/pixman-image.c
index cfee865..102df6c 100644
--- a/pixman/pixman-image.c
+++ b/pixman/pixman-image.c
@@ -379,7 +379,7 @@ compute_image_info (pixman_image_t *ima

[Pixman] [PATCH 4/5] Remove FAST_PATH_NARROW_FORMAT flag if there is a wide alpha map

2010-09-14 Thread Søren Sandmann
From: Søren Sandmann Pedersen 

If an image has an alpha map that has wide components, then we need to
use 64 bit processing for that image. We detect this situation in
pixman-image.c and remove the FAST_PATH_NARROW_FORMAT flag.

In pixman-general, the wide/narrow decision is now based on the flags
instead of on the formats.
---
 pixman/pixman-general.c |   18 +++---
 pixman/pixman-image.c   |   15 +++
 2 files changed, 18 insertions(+), 15 deletions(-)

diff --git a/pixman/pixman-general.c b/pixman/pixman-general.c
index fc276bd..4d234a0 100644
--- a/pixman/pixman-general.c
+++ b/pixman/pixman-general.c
@@ -57,17 +57,6 @@ general_composite_rect  (pixman_implementation_t *imp,
  int32_t  height)
 {
 uint8_t stack_scanline_buffer[SCANLINE_BUFFER_LENGTH * 3];
-const pixman_format_code_t src_format =
-   src->type == BITS ? src->bits.format : 0;
-const pixman_format_code_t mask_format =
-   mask && mask->type == BITS ? mask->bits.format : 0;
-const pixman_format_code_t dest_format =
-   dest->type == BITS ? dest->bits.format : 0;
-const int src_narrow = !PIXMAN_FORMAT_IS_WIDE (src_format);
-const int mask_narrow = !mask || !PIXMAN_FORMAT_IS_WIDE (mask_format);
-const int dest_narrow = !PIXMAN_FORMAT_IS_WIDE (dest_format);
-const int narrow = src_narrow && mask_narrow && dest_narrow;
-const int Bpp = narrow ? 4 : 8;
 uint8_t *scanline_buffer = stack_scanline_buffer;
 uint8_t *src_buffer, *mask_buffer, *dest_buffer;
 fetch_scanline_t fetch_src = NULL, fetch_mask = NULL, fetch_dest = NULL;
@@ -77,8 +66,15 @@ general_composite_rect  (pixman_implementation_t *imp,
 pixman_bool_t component_alpha;
 uint32_t *bits;
 int32_t stride;
+int narrow, Bpp;
 int i;
 
+narrow =
+   (src->common.flags & FAST_PATH_NARROW_FORMAT)   &&
+   (!mask || mask->common.flags & FAST_PATH_NARROW_FORMAT) &&
+   (dest->common.flags & FAST_PATH_NARROW_FORMAT);
+Bpp = narrow ? 4 : 8;
+
 if (width * Bpp > SCANLINE_BUFFER_LENGTH)
 {
scanline_buffer = pixman_malloc_abc (width, 3, Bpp);
diff --git a/pixman/pixman-image.c b/pixman/pixman-image.c
index 102df6c..029a1df 100644
--- a/pixman/pixman-image.c
+++ b/pixman/pixman-image.c
@@ -327,10 +327,6 @@ compute_image_info (pixman_image_t *image)
flags |= FAST_PATH_Y_UNIT_ZERO;
 }
 
-/* Alpha map */
-if (!image->common.alpha_map)
-   flags |= FAST_PATH_NO_ALPHA_MAP;
-
 /* Filter */
 switch (image->common.filter)
 {
@@ -454,6 +450,17 @@ compute_image_info (pixman_image_t *image)
break;
 }
 
+/* Alpha map */
+if (!image->common.alpha_map)
+{
+   flags |= FAST_PATH_NO_ALPHA_MAP;
+}
+else
+{
+   if (PIXMAN_FORMAT_IS_WIDE (image->common.alpha_map->format))
+   flags &= ~FAST_PATH_NARROW_FORMAT;
+}
+
 /* Both alpha maps and convolution filters can introduce
  * non-opaqueness in otherwise opaque images. Also
  * an image with component alpha turned on is only opaque
-- 
1.7.1.1

___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


[Pixman] [PATCH 5/5] Clip composite region against the destination alpha map extents.

2010-09-14 Thread Søren Sandmann
From: Søren Sandmann Pedersen 

Otherwise we can end up writing outside the alpha map.
---
 pixman/pixman.c |   21 +
 1 files changed, 17 insertions(+), 4 deletions(-)

diff --git a/pixman/pixman.c b/pixman/pixman.c
index cdf4b75..285bbfc 100644
--- a/pixman/pixman.c
+++ b/pixman/pixman.c
@@ -315,14 +315,27 @@ pixman_compute_composite_region32 (pixman_region32_t * 
region,
return FALSE;
 }
 
-if (dst_image->common.alpha_map && 
dst_image->common.alpha_map->common.have_clip_region)
+if (dst_image->common.alpha_map)
 {
-   if (!clip_general_image (region, 
&dst_image->common.alpha_map->common.clip_region,
--dst_image->common.alpha_origin_x,
--dst_image->common.alpha_origin_y))
+   if (!pixman_region32_intersect_rect (region, region,
+dst_image->common.alpha_origin_x,
+dst_image->common.alpha_origin_y,
+dst_image->common.alpha_map->width,
+
dst_image->common.alpha_map->height))
{
return FALSE;
}
+   if (!pixman_region32_not_empty (region))
+   return FALSE;
+   if (dst_image->common.alpha_map->common.have_clip_region)
+   {
+   if (!clip_general_image (region, 
&dst_image->common.alpha_map->common.clip_region,
+-dst_image->common.alpha_origin_x,
+-dst_image->common.alpha_origin_y))
+   {
+   return FALSE;
+   }
+   }
 }
 
 /* clip against src */
-- 
1.7.1.1

___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


[Pixman] [PATCH 0/5] Fix various alpha map related bugs

2010-09-14 Thread Søren Sandmann
From: Søren Sandmann Pedersen 

Hi,

The following patch series contains some fixes for various alpha-map
related bugs.

The bugs are:

- We don't currently clip the composite region against the extents of
  the alpha map of the destination. This means that if the alpha map
  doesn't cover the entire destination, we will make invalid writes to it.

- If the alpha map is one of the 10bpc formats, then it doesn't have a
  32 bit fetcher installed, which causes crashes. The fix for this is
  to make sure wide alpha maps result in wide processing.

There is also a stylistic change where the NO_WIDE_FORMAT flag is
renamed to FAST_PATH_NARROW_FORMAT to avoid the negative in the name.

Finally, there are some updates to the test suite:

- The alpha map test is much more aggressive. It now demonstrates the
  two bugs above and also tests that destination alpha maps are
  correctly read and written. 

- The alpha map test now also has code to set a transformation on the
  destination, but this is commented out because it causes crashes
  even with the other changes. This problem is somewhat difficult to
  fix because we currently don't have any fetchers that will do
  alphamap processing, but ignore transformations.

- There is a new fence_malloc()/fence_free() pair of utility
  functions. These attempt to allocate the memory surrounded by
  mprotect()ed pages so that invalid reads or writes will cause
  crashes.


Soren



Søren Sandmann Pedersen (5):
  Add fence_malloc() and fence_free().
  Update and extend the alphamap test
  Rename FAST_PATH_NO_WIDE_FORMAT to FAST_PATH_NARROW_FORMAT
  Remove FAST_PATH_NARROW_FORMAT flag if there is a wide alpha map
  Clip composite region against the destination alpha map extents.

 configure.ac  |   10 ++
 pixman/pixman-fast-path.c |2 +-
 pixman/pixman-general.c   |   54 +--
 pixman/pixman-image.c |   19 +++-
 pixman/pixman-private.h   |6 +-
 pixman/pixman.c   |   21 +++-
 test/Makefile.am  |2 +-
 test/alphamap.c   |  243 -
 test/utils.c  |  105 +++-
 test/utils.h  |   11 ++-
 10 files changed, 401 insertions(+), 72 deletions(-)
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH] Add a lowlevel blitter test

2010-09-14 Thread Dmitri Vorobiev
>
> The biggest problem (but easy to solve) is the use of 'gettimeofday' and
> 'memalign' functions, which are available not on all supported systems.

Are you aware of any specific systems (I assume Windows is among them)
that might have these calls unsupported? I may check those and see
what can be done.

Thanks
Dmitri
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH] Add a lowlevel blitter test

2010-09-14 Thread Siarhei Siamashka
On Tuesday 14 September 2010 16:37:14 Dmitri Vorobiev wrote:
> > The biggest problem (but easy to solve) is the use of 'gettimeofday' and
> > 'memalign' functions, which are available not on all supported systems.
> 
> Are you aware of any specific systems (I assume Windows is among them)
> that might have these calls unsupported? I may check those and see
> what can be done.

There was this bug reported earlier: 
http://bugs.freedesktop.org/show_bug.cgi?id=23260

So some people do care about at least MSYS/mingw. Now pixman already has a 
configure check for posix_memalign(), the same probably needs to be done
for gettimeofday().

-- 
Best regards,
Siarhei Siamashka


signature.asc
Description: This is a digitally signed message part.
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH] Add a lowlevel blitter test

2010-09-14 Thread Dmitri Vorobiev
On Tue, Sep 14, 2010 at 4:45 PM, Siarhei Siamashka
 wrote:
> On Tuesday 14 September 2010 16:37:14 Dmitri Vorobiev wrote:
>> > The biggest problem (but easy to solve) is the use of 'gettimeofday' and
>> > 'memalign' functions, which are available not on all supported systems.
>>
>> Are you aware of any specific systems (I assume Windows is among them)
>> that might have these calls unsupported? I may check those and see
>> what can be done.
>
> There was this bug reported earlier:
> http://bugs.freedesktop.org/show_bug.cgi?id=23260
>
> So some people do care about at least MSYS/mingw. Now pixman already has a
> configure check for posix_memalign(), the same probably needs to be done
> for gettimeofday().

OK. Would that be sufficient to just disable the blitter test like
this if either memalign() or gettimeofday() isn't found at configure
stage?

Dmitri
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH] Faster C variant of over_n_8_8888 fast path

2010-09-14 Thread Soeren Sandmann
Siarhei Siamashka  writes:

> So does it make sense to split the patch into parts, introducing this third
> argument for 'over' function first?

Yeah, makes sense.

> > It might also be interesting to add the check as a new
> > NOT_SUPER_LUMINESCENT flag and then simply require it for the source
> > for all the over_n_*() functions.
> 
> I see many reasons *not* to add it as a new flag:
> 
> 1. It takes one extra flag bit. There are already 24 bits used, with only 8 
> remaining. We still need some flag(s) for rotation transforms:
> http://lists.freedesktop.org/archives/pixman/2010-August/000420.html
> I expect that compacting bits later may turn out to be tricky, so it may be 
> wise not to waste them in the first place. Extending flag bits to 64-bit
> variable is possible, but may reduce performance.
> 
> 2. After introducing this bit, every compositing operation with a solid
> source will do calculation for this flag, spending some time on it. But
> calculation of this flag is not needed for many operators (SRC for example). 
> Also it is only useful exclusively for C fast paths and simple SIMD-incapable 
> processors, everyone else will just take a tiny performance hit.
> 
> The 'last mile' check as implemented in my patch should be fine as far as 
> performance is concerned. The only drawback is that the one who implements the
> fast path functions, will be forced to handle all possible types of input 
> data.
> And not be lazy providing just NOT_SUPER_LUMINESCENT operation only, relying 
> on
> pixman to fallback to someting else when needed.

On the other hand, there are drawbacks to having the check in each
fast path too:

- It adds a bunch of both source and binary code that nobody will ever
  run except through the test suite. It seems kind of pointless to
  have an optimization for such an uncommon case.

- The check will happen for each invocation of the composite
  operation, whereas with the flag, the computation can be cached in
  the source image.

I'm not terribly concerned about wasting flag bits. There are still
25% of them left. In 0.22 I think we can get rid of the
NEED_WORKAROUND one, and the NO_ALPHA_MAP, NO_CONVOLUTION_FILTER, and
NO_ACCESSORS could be collapsed to one NO_CRACK flag. Or they can be
extended to 64 bits.

> BTW, I like this 'super-luminescent' term :) I tried to search for the 
> information about the case when "color components exceed alpha in 
> premultiplied 
> format", and it looked like many (game developers) know about this thing and
> its features, but seemed like nobody had a clear single-word definition for 
> it.
> Searching for "super-luminescent premultiplied" gives some references, all in 
> cairo and freedesktop.org context. Anyway, let's indeed call this thing
> 'super-luminescent'. I think I need to update comments in the patch and also 
> in
> the commit message to use it instead of 'additive blending', which I took 
> from:
> http://home.comcast.net/~tom_forsyth/blog.wiki.html#[[Premultiplied%20alpha]]

I think the word originally comes from Jim Blinn's book "Dirty Pixels"
[1], except that apparently he calls them "superluminous". I'll highly
recommend that book for anyone intested in background material on
Render, pixman and cairo.

> > That would allow similar optimizations for the n_8_565 case and probably the
> > n___ca() case as well.
> 
> Yes, and also 'over_n_' could make use of this optimization (if C fast 
> path 
> function even gets implemented for it).

Existing fast path operations that can take advantage of this:

n_8_0565
n_8_0888
n_8_
n_1_
n_1_0565
n___ca
n__0565_ca
x888_8_ (because x888 is never superluminescent)

So if the optimization were fully implemented it would be quite a bit
of dead code.

> > The flag could be set for all the gradients and any time an image
> > is opaque.
> 
> I'm not quite sure about how useful this flag could be for gradients (it 
> would 
> have to be somehow propagated to the scanline combiner function?).

Right, it would have to be propagated to the scanline combiner to be
useful. 


Soren


[1] http://www.amazon.com/Jim-Blinns-Corner-Kaufmann-Computer/dp/1558604553
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [cairo] pixman 0.18 MinGW SSE2 error

2010-09-14 Thread Soeren Sandmann
Tor Lillqvist  writes:

> By the way, it seems that with gcc 4.5.0 from mingw.org, __thread, sse
> and mmx work fine.
> 
> I added the below to pixman 0.18 and as far as I can see, it works.
> make check reports no problems. (Earlier I had to use --disable-mmx
> and --disable-sse2.) Also gtk-demo and gimp run fine.
> 
> (Also a change to get rid of the warnings about -fvisibility being
> ignored.)

This patch looks good to me, so I'll merge it unless someone tells me
otherwise within the next few days.


Soren


> diff --git a/configure.ac b/configure.ac
> index 8b2b024..d9f9718 100644
> --- a/configure.ac
> +++ b/configure.ac
> @@ -145,6 +145,9 @@ have_gcc4=no
>  AC_MSG_CHECKING(for -fvisibility)
>  AC_COMPILE_IFELSE([
>  #if defined(__GNUC__) && (__GNUC__ >= 4)
> +#ifdef _WIN32
> +error Have -fvisibility but it is ignored and generates a warning
> +#endif
>  #else
>  error Need GCC 4.0 for visibility
>  #endif
> @@ -524,8 +527,8 @@ support_for__thread=no
> 
>  AC_MSG_CHECKING(for __thread)
>  AC_COMPILE_IFELSE([
> -#ifdef __MINGW32__
> -#error MinGW has broken __thread support
> +#if defined __MINGW32__ && !(__GNUC__ > 4 || (__GNUC__ == 4 &&
> __GNUC_MINOR__ >= 5))
> +#error This MinGW version has broken __thread support
>  #endif
>  __thread int x ;
>  int main () { return 0; }
> diff --git a/pixman/pixman-compiler.h b/pixman/pixman-compiler.h
> index 26f7071..affb236 100644
> --- a/pixman/pixman-compiler.h
> +++ b/pixman/pixman-compiler.h
> @@ -60,7 +60,7 @@
>  #endif
> 
>  /* GCC visibility */
> -#if defined(__GNUC__) && __GNUC__ >= 4
> +#if defined(__GNUC__) && __GNUC__ >= 4 && !defined(_WIN32)
>  #   define PIXMAN_EXPORT __attribute__ ((visibility("default")))
>  /* Sun Studio 8 visibility */
>  #elif defined(__SUNPRO_C) && (__SUNPRO_C >= 0x550)
> --
> cairo mailing list
> ca...@cairographics.org
> http://lists.cairographics.org/mailman/listinfo/cairo
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman