Re: [Pixman] [PATCH 0/11] Use macros to generate fetchers

2011-09-13 Thread Soeren Sandmann
Soeren Sandmann sandm...@cs.au.dk writes:

 Chris Wilson ch...@chris-wilson.co.uk writes:

 The final patch is a little scary though. Was that motivated by
 observation of the generated assembly or through performance testing?

 It was motivated by looking at the generated assembly, but I'll run some
 performance tests. 

Performance results below. I tested master, the series without the final
fast path patch, and the series with the final patch. The results are
that without wins 10/26, master 9/26, and fast paths 7/26, but
there is never a huge difference.

I guess the conclusion is that the fast paths can't be justified by
these measurements.


Soren

 minimum   median  std.dev. 
winner
---
 
[  0]imagefirefox-talos-gfx   21.013   21.058   0.11%6/6
without
[  0]imagefirefox-talos-gfx   20.714   20.790   0.27%6/6
[  0]imagefirefox-talos-gfx   21.249   21.312   0.15%6/6
---
[  0]  image16firefox-talos-gfx   19.623   19.630   0.04%5/6
without
[  0]  image16firefox-talos-gfx   19.596   19.618   0.14%6/6
[  0]  image16firefox-talos-gfx   19.925   19.948   0.12%6/6
---
[  0]imagefirefox-talos-svg  120.890  121.006   0.07%5/6
without
[  0]imagefirefox-talos-svg  120.623  120.647   0.03%5/6
[  0]imagefirefox-talos-svg  120.978  121.055   0.04%5/6
---
[  0]  image16firefox-talos-svg  138.907  139.182   0.12%5/6
master
[  0]  image16firefox-talos-svg  140.011  140.043   0.04%5/6
[  0]  image16firefox-talos-svg  139.609  139.825   0.08%5/6
---
[  0]imageevolution   25.322   25.456   0.27%6/6
without
[  0]imageevolution   25.248   25.414   0.27%6/6
[  0]imageevolution   26.654   26.732   0.15%6/6
---
[  0]  image16evolution   13.096   13.165   0.51%6/6
master
[  0]  image16evolution   13.217   13.250   0.17%5/6
[  0]  image16evolution   13.368   13.435   0.26%6/6
---
[  0]image gnome-system-monitor   10.192   10.222   0.13%6/6
without
[  0]image gnome-system-monitor   10.166   10.195   0.13%6/6
[  0]image gnome-system-monitor   10.488   10.499   0.12%6/6
---
[  0]  image16 gnome-system-monitor   10.210   10.255   0.34%6/6
without
[  0]  image16 gnome-system-monitor   10.191   10.236   0.34%6/6
[  0]  image16 gnome-system-monitor   10.503   10.537   0.40%6/6
---
[  0]image   gnome-terminal-vim6.5476.692   0.92%6/6
master
[  0]image   gnome-terminal-vim6.9927.098   1.07%6/6
[  0]image   gnome-terminal-vim6.7226.864   1.17%6/6
---
[  0]  image16   gnome-terminal-vim6.1046.242   0.95%6/6
fast paths
[  0]  image16   gnome-terminal-vim6.1756.271   0.74%6/6
[  0]  image16   gnome-terminal-vim6.0436.141   0.79%6/6
---
[  0]image   grads-heat-map0.5520.561   0.85%6/6
master
[  0]image   grads-heat-map0.5780.584   0.48%6/6
[  0]image   grads-heat-map0.5580.568   0.80%6/6
---
[  0]  image16   grads-heat-map0.5560.559   0.56%6/6
master
[  0]  image16   grads-heat-map0.5750.576   0.07%4/6
[  0]  image16   grads-heat-map0.5580.561   0.27%6/6
---
[  0]image gvim   32.374   32.459   0.15%6/6
fast paths
[  0]image gvim   32.414   32.505   0.11%6/6
[  0]image

Re: [Pixman] [PATCH 2/3] Optimize BILINEAR filter to NEAREST for identity transforms

2011-09-10 Thread Soeren Sandmann
Soeren Sandmann sandm...@cs.au.dk writes:

 The new code only computes the transformation once, and then it add the
 absolute value of the 00 and 10 components to get a conservative
 estimate of the transformed destination boundary. I'm especially
 interested in review of this part, both of the code and of whether it's
 actually correct mathematically.

It is in fact not at all correct. It's completely bogus in fact. For
affine matrices the idea may not be entirely wrong, but the correct
thing to add to the X coordinates would be |m00| + |m01| + |m11| and not
just |m00|. Similar for Y coordinates.

But that fails to take projective matrices into account, so I just went
back to computing the transformed bounding box twice. New patch below.


Soren


From e02d17612531fb8556d30fc2dde3ad2680f2b739 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?S=C3=B8ren=20Sandmann=20Pedersen?= s...@redhat.com
Date: Mon, 5 Sep 2011 00:19:51 -0400
Subject: [PATCH] Eliminate compute_sample_extents() function

In analyze_extents(), instead of calling compute_sample_extents() call
compute_transformed_extents() and inline the remaining part of
compute_sample_extents(). The upcoming bilinear-nearest optimization
will do something different with these two pieces of code.
---
 pixman/pixman.c |  100 +++
 1 files changed, 42 insertions(+), 58 deletions(-)

diff --git a/pixman/pixman.c b/pixman/pixman.c
index 264a56b..19eda08 100644
--- a/pixman/pixman.c
+++ b/pixman/pixman.c
@@ -514,45 +514,9 @@ compute_transformed_extents (pixman_transform_t *transform,
 return TRUE;
 }
 
-static pixman_bool_t
-compute_sample_extents (pixman_transform_t *transform,
-   pixman_box32_t *extents,
-   pixman_fixed_t x_off, pixman_fixed_t y_off,
-   pixman_fixed_t width, pixman_fixed_t height)
-{
-box_48_16_t transformed;
-
-if (!compute_transformed_extents (transform, extents, transformed))
-   return FALSE;
-
-/* Expand the source area by a tiny bit so account of different rounding 
that
- * may happen during sampling. Note that (8 * pixman_fixed_e) is very far 
from
- * 0.5 so this won't cause the area computed to be overly pessimistic.
- */
-transformed.x1 += x_off - 8 * pixman_fixed_e;
-transformed.y1 += y_off - 8 * pixman_fixed_e;
-transformed.x2 += x_off + width + 8 * pixman_fixed_e;
-transformed.y2 += y_off + height + 8 * pixman_fixed_e;
-
-if (transformed.x1  pixman_min_fixed_48_16 || transformed.x1  
pixman_max_fixed_48_16 ||
-   transformed.y1  pixman_min_fixed_48_16 || transformed.y1  
pixman_max_fixed_48_16 ||
-   transformed.x2  pixman_min_fixed_48_16 || transformed.x2  
pixman_max_fixed_48_16 ||
-   transformed.y2  pixman_min_fixed_48_16 || transformed.y2  
pixman_max_fixed_48_16)
-{
-   return FALSE;
-}
-else
-{
-   extents-x1 = pixman_fixed_to_int (transformed.x1);
-   extents-y1 = pixman_fixed_to_int (transformed.y1);
-   extents-x2 = pixman_fixed_to_int (transformed.x2) + 1;
-   extents-y2 = pixman_fixed_to_int (transformed.y2) + 1;
-
-   return TRUE;
-}
-}
-
 #define IS_16BIT(x) (((x) = INT16_MIN)  ((x) = INT16_MAX))
+#define ABS(f)  (((f)  0)?  (-(f)) : (f))
+#define IS_16_16(f) (((f) = pixman_min_fixed_48_16  ((f) = 
pixman_max_fixed_48_16)))
 
 static pixman_bool_t
 analyze_extent (pixman_image_t   *image,
@@ -563,7 +527,8 @@ analyze_extent (pixman_image_t   *image,
 pixman_fixed_t *params;
 pixman_fixed_t x_off, y_off;
 pixman_fixed_t width, height;
-pixman_box32_t ex;
+box_48_16_t transformed;
+pixman_box32_t exp_extents;
 
 if (!image)
return TRUE;
@@ -633,17 +598,6 @@ analyze_extent (pixman_image_t   *image,
default:
return FALSE;
}
-
-   /* Check whether the non-expanded, transformed extent is entirely within
-* the source image, and set the FAST_PATH_SAMPLES_COVER_CLIP if it is.
-*/
-   ex = *extents;
-   if (compute_sample_extents (transform, ex, x_off, y_off, width, 
height) 
-   ex.x1 = 0  ex.y1 = 0 
-   ex.x2 = image-bits.width  ex.y2 = image-bits.height)
-   {
-   *flags |= FAST_PATH_SAMPLES_COVER_CLIP;
-   }
 }
 else
 {
@@ -653,17 +607,47 @@ analyze_extent (pixman_image_t   *image,
height = 0;
 }
 
-/* Check that the extents expanded by one don't overflow. This ensures that
- * compositing functions can simply walk the source space using 16.16
- * variables without worrying about overflow.
+if (!compute_transformed_extents (transform, extents, transformed))
+   return FALSE;
+
+/* Expand the source area by a tiny bit so account of different rounding 
that
+ * may happen during sampling. Note that (8 * pixman_fixed_e) is very far 
from
+ * 0.5 so this won't cause the area computed to be overly pessimistic

[Pixman] [ANNOUNCE] pixman release 0.23.4 now available

2011-09-09 Thread Soeren Sandmann
A new pixman release 0.23.4 is now available. This is a development
release leading up to a stable release 0.24.0.

Note: If you run the test suite on this released tarball, this message:

*** BUG ***
In pixman_region32_union_rect: Invalid rectangle passed
Set a breakpoint on '_pixman_log_error' to debug

will be printed a number of times when the region-contains-test is
running. The test is still expected to pass, however.

Please report bugs to pixman@lists.freedesktop.org or file them at

https://bugs.freedesktop.org/enter_bug.cgi?product=pixman

Søren


tar.gz:
http://cairographics.org/snapshots/pixman-0.23.4.tar.gz
http://xorg.freedesktop.org/archive/individual/lib/pixman-0.23.4.tar.gz

tar.bz2:
http://xorg.freedesktop.org/archive/individual/lib/pixman-0.23.4.tar.bz2

Hashes:
MD5:  389ce497fcfb26eb99396ca5ed6e17ab  pixman-0.23.4.tar.gz
MD5:  1fb7c1ab150cfbc417290d5888d68ce8  pixman-0.23.4.tar.bz2
SHA1: 539f4875070ce836a109fa0fae7a3421d503ca46  pixman-0.23.4.tar.gz
SHA1: 5c85fff3f474436fdcc0c29139ce4ed87e38e24b  pixman-0.23.4.tar.bz2

GPG signature:
http://cairographics.org/snapshots/pixman-0.23.4.tar.gz.sha1.asc
(signed by Søren Sandmann Pedersen sandm...@cs.au.dk)

Git:
git://git.freedesktop.org/git/pixman
tag: pixman-0.23.4

Log:
Andrea Canciani (4):
  radial: Improve documentation and naming
  radial: Fix typos and trailing whitespace
  win32: Build benchmarks
  Workaround bug in llvm-gcc

Chris Wilson (1):
  bits: optimise fetching width==1 repeats

Siarhei Siamashka (2):
  C fast path for scaled src_x888_ with nearest filter
  ARM: workaround binutils bug #12931 (code sections alignment)

Søren Sandmann Pedersen (12):
  Post-release version bump to 0.23.3
  Makefile.am: Add pixman@lists.freedesktop.org to 
RELEASE_ANNOUNCE_LIST
  Fix lcg_rand_u32() to return 32 random bits.
  New test of pixman_region_contains_{rectangle,point}
  Speed up pixman_region{,32}_contains_rectangle()
  Use find_box_for_y() in pixman_region_contains_point() too
  Don't include stdint.h in lowlevel-blt-bench.c
  In pixman_image_create_bits() allow images larger than 2GB
  Rename pixman-fast-path.h to pixman-inlines.h
  Use repeat() function from pixman-inlines.h in pixman-bits-image.c
  Move bilinear interpolation to pixman-inlines.h
  Pre-release version bump to 0.23.4

Taekyun Kim (3):
  ARM NEON: Standard fast path out_reverse_8_
  ARM: NEON better instruction scheduling of over_n_8_
  ARM: NEON better instruction scheduling of over_n_
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


[Pixman] 0.23.4

2011-09-06 Thread Soeren Sandmann
Hi,

We've gone too long without a release, so I'm planning on making a
0.23.4 release some time this week.

If you have outstanding patches that are ready to go in, please push
them. Off the top of my head, I can think of the ARM scheduling patches,
and the 1 pixel fill patches, though there might be others.


Thanks,
Soren

___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH 2/3] Optimize BILINEAR filter to NEAREST for identity transforms

2011-09-05 Thread Soeren Sandmann
 It seems to me that the cause of the problem is that the meaning of
 COVER depends on what the filter is: it means there is enough space
 around the sample area for the chosen filter, which then becomes wrong
 when doing filter reductions.

 So maybe a solution would be to provide two separate flags:

 COVER_NEAREST:  there is enough space for a nearest filter
 COVER_BILINEAR: there is enough space for a bilinear filter

 These are unambiguous and because either there is enough space for a
 bilinear filter, or there isn't, they can be set independently of what
 the filter actually is.

 Code that currently depends on COVER generally also depends on filter,
 at least implicitly, so there shouldn't be any problem picking the
 correct version of COVER for existing fast paths.

 This would solve the problem because in the problematic case, the flags
 FILTER_NEAREST | COVER_NEAREST could be set, and the standard fast paths
 would run, but at the same time, bilinear/cover paths wouldn't be
 triggered because they would require FILTER_BILINEAR | COVER_BILINEAR.

 As a side effect, this also means we can drop the code that worries
 about what covering means when there is a convolution filter.

I tried implementing this scheme in the patch series that follows.

There is first a general cleanup of the analyze_extent routines. The old
code computed the transformed sample extent twice, once to determine
whether the clip was contained within the samples, and once to determine
whether the transformation of the destination rectangle, expanded by 1,
was contained within the 16 bit coordinate system.

The new code only computes the transformation once, and then it add the
absolute value of the 00 and 10 components to get a conservative
estimate of the transformed destination boundary. I'm especially
interested in review of this part, both of the code and of whether it's
actually correct mathematically.

Following that cleanup is Siarhei's change to affine-test and a change
to blitters test to make it use a bilinear filter occasionally. Then the
splitting of SAMPLES_COVER_CLIP and the bilinear reduction, followed by
Siarhei's patch to also do the reduction for 90/180/270 degree rotations
and integral translations.

The bug, for reference:

   https://bugs.freedesktop.org/show_bug.cgi?id=37421

Performance testing with Xephyr on a Core2 Duo @1GHz:

Current master:
*** ROUND 1 ***
---
Test: Test Xrender doing non-scaled Over blends
Time: 5.720 sec.
---
Test: Test Xrender (offscreen) doing non-scaled Over blends
Time: 5.149 sec.
---
Test: Test Imlib2 doing non-scaled Over blends
Time: 6.237 sec.

With bilinear reduction:
*** ROUND 1 ***
---
Test: Test Xrender doing non-scaled Over blends
Time: 4.947 sec.
---
Test: Test Xrender (offscreen) doing non-scaled Over blends
Time: 4.487 sec.
---
Test: Test Imlib2 doing non-scaled Over blends
Time: 6.235 sec.


Thanks,
Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH 2/3] Optimize BILINEAR filter to NEAREST for identity transforms

2011-09-05 Thread Soeren Sandmann
Soeren Sandmann sandm...@cs.au.dk writes:

 I tried implementing this scheme in the patch series that follows.

The patch series is also available here:

  http://cgit.freedesktop.org/~sandmann/pixman/log/?h=bilinear-reduction


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH 0/11] Use macros to generate fetchers

2011-09-02 Thread Soeren Sandmann
Chris Wilson ch...@chris-wilson.co.uk writes:

 The final patch is a little scary though. Was that motivated by
 observation of the generated assembly or through performance testing?

It was motivated by looking at the generated assembly, but I'll run some
performance tests. 

 Can pixman export a wrapper to its convert_pixel function? There are
 quite a few open-coded routines for doing the same spread around the X
 server and its brethren.

I wouldn't necessarily be opposed to exporting such a function if a
patch show up. It probably has to be driven by someone who has a need
for it though, to make sure we get the right API.


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] ARM iwmmxt patches

2011-08-31 Thread Soeren Sandmann
Matt Turner matts...@gmail.com writes:

 I've been trying to figure out if the ARM iwmmxt inline assembly makes
 any difference at all. I think the conclusion is that it does not.
 Updated code is here:
 http://cgit.freedesktop.org/~mattst88/pixman/log/?h=iwmmxt-optimizations

You mean that inline assembly in mmx_fill() doesn't make a difference?

 See
 http://people.freedesktop.org/~mattst88/pixman-iwmmxt-benchdata.txt

The lowlevel-blt benchmark doesn't hit the fill and blt routines at all,
so this data doesn't support the conclusion that inline assembly in
mmx_fill() and mmx_blt() makes no difference.

 Never does using inline assembly seem to make any sort of meaningful
 difference over simply compiling pixman-mmx.c for ARM/iwmmxt. I tried
 checking the alignment in the 'wip' commit in the blt function to
 avoid a lot of unnecessary walign instructions, but as you can see
 from the benchmark results, it doesn't help anything.

The cairo-trace tests are better benchmarks to use in general because
they reflect real-world use. lowlevel-blt-bench really should only be
used for the case where you are optimizing a specific compositing
routine.


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] ARM iwmmxt patches

2011-08-31 Thread Soeren Sandmann
Matt Turner matts...@gmail.com writes:

 Never does using inline assembly seem to make any sort of meaningful
 difference over simply compiling pixman-mmx.c for ARM/iwmmxt. I tried
 checking the alignment in the 'wip' commit in the blt function to
 avoid a lot of unnecessary walign instructions, but as you can see
 from the benchmark results, it doesn't help anything.

 The cairo-trace tests are better benchmarks to use in general because
 they reflect real-world use. lowlevel-blt-bench really should only be
 used for the case where you are optimizing a specific compositing
 routine.

 OK, I'll run cairo-trace to determine the effect of the inline
 assembly. I think the addition of inline assembly could go in as
 follow-on patches though, right?

Right, as long as they are not required to avoid regressions on either
x86 or ARM.

There are still some problems with the rest of the patch set
though. Several of the comments from Siarhei and me have not been
addressed, and compiling your iwmmxt-optimizations2 branch on x86
results in

../pixman/.libs/libpixman-1.so: undefined reference to `_mm_align_si64'


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


[Pixman] Shapes, etc.

2011-08-24 Thread Soeren Sandmann
Hi,


Here are some ideas on how to improve the support in pixman for cairo's
compositing operation.  There are several separate questions related to
that:

   1. Can we support cairo's compositing operation better?

   2. Can we pass shape information to pixman more efficiently?

and a tangential third one:

   3. Should the pixman API be based on something other than images?

The TL;DR is a strawman proposal to add:

   - pixman_image_set_clip_image() 
   - support for op_LERP operators
   - either a polygon or a spans image


1. Can we support cairo's compositing operation better?

The first question pertains to the fact that cairo's operation is more
complex than what pixman and X Render provides. This leads to extra work
in cairo to construct the desired compositing operation.

One aspect of this is that cairo supports clipping to arbitrary shapes,
whereas pixman only supports axis aligned regions. A straightforward fix
for this is just add support for clip images:

pixman_image_set_clip_image (pixman_image_t *image,
 pixman_image_t *clip);

which means that when @image is used as a destination, all rendering is
clipped to the alpha channel of @clip in addition to being clipped to
the clip region. (We need both the region and the clip image because the
X server will want to set a clip region based on the window hierarchy in
any case).

The simplest way to implement this is to have the general implementation
run an iterator for the clip image, and then pass a clip buffer to the
combiners. In addition to computing the regular operator, the combiners
would then also do LERP_clip onto the destination buffer.

pixman_image_composite_with_clip (src, mask, dest, clip, ...)

It may also be that it would more useful to instead add a new function
that takes a clip image in addition to src, mask, and dest.

Another aspect is that cairo uses a different rendering equation in some
cases. The two equations used are:

unbounded:   [(src IN shape) OP dest  ] LERP_clip dest

bounded: [(src OP dest) LERP_shape dest   ] LERP_clip dest

With shaped clips, the LERP_clip part is taken care of, and the first
equation is already directly supported by pixman. The second one could
be supported by adding new op_LERP operators that would use the

 (src OP dest) LERP_shape dest

equation. For cairo's purposes, all we need is CLEAR_LERP and SRC_LERP,
but I think it could be useful to have the full set of LERP operators.


2. Can we pass shape information to pixman more efficiently?

The second question is whether there is a more efficient way to pass
shape information to pixman than passing them as 8 bit alpha masks.

A polygon image is a possibility, and the one that I prefer as it allows
more processing to happen in one pass, and because it is a more compact
representation which is useful for the X protocol. There could be a
method on such an image to generate a list of spans, so that cairo
wouldn't need to maintain its own polygon rasterizer.

But passing the information as spans is another possibility. If we add
support for clip images, the obvious way to add spans support would be
through a spans image:

 pixman_image_create_spans (const pixman_span_t *spans, 
int  n_spans);

that could be used both as the shape and as the clip. The initial
implementation of such an image would be very simple: just implement an
iterator that walks the list of spans, generating argb scanline buffers.

An appealing prospect here is a fast path that processes both shape and
clip in one pass. If both span lists are known to be sorted in YX order,
then they could be processed in one linear pass through both.


3. Should the pixman API be based on something other than images?

Finally, the more tangential question is whether pixman's API should be
based on image objects in the future. They tend to be a little clumsy to
use in both cairo and X. I wrote down some halfbaked ideas here:

 http://cgit.freedesktop.org/~sandmann/pixman/tree/docs/new-api?h=docs

inspired by Chris' old suggestion that pixman should behave more like a
GPU.

The gist of it is that instead of operating on image objects, there is
just one pixman_context_t that has a bunch of properties that can be
set. Once all the properties are set, the user is expected to call
pixman_context_composite().

There is also an api that allows bulk setting of properties so that an
application can submit a list of commands and pixman would then carry
them out, pretty much like a GPU would.

It could be considered a generalization of the pixman_compositor_t idea
from Chris' patch.


Comments on all three questions welcome.


Thanks,
Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH] Workaround bug in llvm-gcc

2011-08-23 Thread Soeren Sandmann
Andrea Canciani ranm...@gmail.com writes:

 The same issue affects non-Apple llvm-gcc, but it's deprecated and
 there will be no further releases. I would not be surprised if Apple
 just started using clang as the default compiler in future XCode
 versions.

Okay, if there won't ever be a fixed version of llvm-gcc, then I guess
there is no point trying to be future-proof.


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH] Workaround bug in llvm-gcc

2011-08-22 Thread Soeren Sandmann
Andrea Canciani ranm...@gmail.com writes:

 llvm-gcc (shipped in Apple XCode 4.1.1 as the default compiler or in
 the 2.9 release of LLVM) performs an invalid optimization which
 unifies the empty_region and the bad_region structures because they
 have the same content.

 A bug has been filed against Apple Developers Tool for this
 issue. This commit works around this bug by making one of the two
 structures volatile, so that it cannot be merged.

Is the bug report publicly readable?

  static const box_type_t PREFIX (_empty_box_) = { 0, 0, 0, 0 };
  static const region_data_type_t PREFIX (_empty_data_) = { 0, 0 };
 +#if defined (__llvm__)  !defined (__clang__)
 +static const volatile region_data_type_t PREFIX (_broken_data_) = { 0, 0 };
 +#else
  static const region_data_type_t PREFIX (_broken_data_) = { 0, 0 };
 +#endif

It worries me a little that these #ifdefs will still trigger when Apple
fix their compiler. Can we detect it at configure time instead by
running something like this:

typedef struct { int x, y; } blah_t;

static const blah_t b1 = { 0, 0 };
static const blah_t b2 = { 0, 0 };

int
main ()
{
if (b1 == b2)
return 1;
else
return 0;
}


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH] bits: optimise fetching width==1 repeats

2011-08-22 Thread Soeren Sandmann
Chris Wilson ch...@chris-wilson.co.uk writes:

 diff --git a/pixman/pixman-bits-image.c b/pixman/pixman-bits-image.c
 index 4e9ed14..a762c27 100644
 --- a/pixman/pixman-bits-image.c
 +++ b/pixman/pixman-bits-image.c
 @@ -1149,7 +1149,34 @@ bits_image_fetch_untransformed_repeat_normal 
 (bits_image_t *image,
  while (y = image-height)
   y -= image-height;
  
 -while (width)
 +if (image-width == 1)
 +{
 + /* XXX duplication from fetch solid */
 + if (wide)
 + {
 + uint64_t color;
 + uint64_t *b = (uint64_t *)buffer;
 + uint64_t *end;
 +
 + color = image-fetch_pixel_64 (image, 0, y);
 +
 + end = b + width;
 + while (b  end)
 + *(b++) = color;
 + }
 + else
 + {
 + uint32_t color;
 + uint32_t *end;
 +
 + color = image-fetch_pixel_32 (image, 0, y);
 +
 + end = buffer + width;
 + while (buffer  end)
 + *(buffer++) = color;
 + }
 +}
 +else while (width)
  {
   while (x  0)
   x += image-width;

Makes sense to me, but I think the duplicated code should be in a new
force_inline function replicate_pixel() or something. Also, maybe the
patch got mangled in the mail, but it looks to me like it uses
eight-space indents.


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [RFC] A spans interface for pixman

2011-08-17 Thread Soeren Sandmann
Bill Spitzak spit...@gmail.com writes:

 Yes; to be a bit more precise, with the current code, the alpha channel
 is a sampled version of the polygon and the rgb channels are 0. Ie., the
 pixels are black with varying levels of opacity.

 An interesting alternative possibility would be to consider the RGB
 channels duplicates of the alpha channel, which would make the pixels
 shades of white instead.

 This is definitely better and much more useful. I would in fact have
 ALL 1-channel images treated this way, removing a lot of need to have
 different operators whose purpose is only to source a 1-channel image
 that defaults to acting like 0,0,0,x. I'm not sure what will break in
 pixman and/or cairo if this is done, however.

A way to introduce this behavior without breaking backwards
compatibility would be to just add new PIXMAN_w8/w4/w1 format that would
operate in this way.

I'm not sure which other one-channel images you mean. Aside from
a8/a4/a1, the only other one-channel images pixman supports are g8/g4/g1
and c8/c4/c1 which are paletted and not used much.

 It could be interesting to add support for polygon operators:

 pixman_image_create_polygon_intersection (polygon1, polygon2)
 pixman_image_create_polygon_union (polygon1, polygon2);
 ...

 These would return a new polygon image that would contain pointers to
 the other two.

 It may be possible to actually intersect these and make a new polygon
 with full accuracy. Though in some cases such as matching lines it can
 be painful and may produce a much larger number of path components
 than the two originals. But also if users do this enough times you
 will end up with a whole tree of polygons and CSG-like work.

What I meant was that you could rasterize all the polygons at the same
time, and then do the intersect/union at the subpixel scanline
level. That would produce the same accuracy as intersecting the polygons
up front.

 Questions:

 Can the polygon images be scaled or transformed when used as a source?
 (and produce a correct antialiased result, not scaled pixels).

Right now they can't, but they probably should considering all other
images can be. Is this any more complicated than simply transforming all
the edge endpoints?

 At the cairo level how could a user directly create a polygon surface?
 Could it just detect that you only drew paths filled with white?

I don't know if a polygon pattern is interesting as a user-visible
concept in cairo.


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH] In pixman_image_create_bits() allow images larger than 2GB

2011-08-11 Thread Soeren Sandmann
Chris Wilson ch...@chris-wilson.co.uk writes:

 On Thu, 11 Aug 2011 06:43:03 -0400, Søren Sandmann sandm...@cs.au.dk wrote:
 From: Søren Sandmann Pedersen s...@redhat.com
 
 There is no reason for pixman_image_create_bits() to check that the
 image size fits in int32_t. The correct check is against size_t since
 that is what the argument to calloc() is.

 The only question I have is whether ssize_t width and height is
 appropriate, the valid image dimensions are still only int32_t right?

Yeah, you are right. It's only an internal function, so using ssize_t is
harmless, but there's no need to introduce this confusion. New version
below.


Thanks,
Soren



commit cb49f9427ead511b9bd60e771fc1b0d74d72d22a
Author: Søren Sandmann Pedersen s...@redhat.com
Date:   Thu Aug 11 06:30:43 2011 -0400

In pixman_image_create_bits() allow images larger than 2GB

There is no reason for pixman_image_create_bits() to check that the
image size fits in int32_t. The correct check is against size_t since
that is what the argument to calloc() is.

This patch fixes this by adding a new _pixman_multiply_overflows_size()
and using it in create_bits(). Also prepend an underscore to the names
of other similar functions since they are internal to pixman.

V2: Use int, not ssize_t for the arguments in create_bits() since
width/height are still limited to 32 bits, as pointed out by Chris
Wilson.

diff --git a/pixman/pixman-bits-image.c b/pixman/pixman-bits-image.c
index 4e9ed14..f5b66dc 100644
--- a/pixman/pixman-bits-image.c
+++ b/pixman/pixman-bits-image.c
@@ -1490,10 +1490,10 @@ static uint32_t *
 create_bits (pixman_format_code_t format,
  int  width,
  int  height,
- int *rowstride_bytes)
+ int *   rowstride_bytes)
 {
 int stride;
-int buf_size;
+size_t buf_size;
 int bpp;
 
 /* what follows is a long-winded way, avoiding any possibility of integer
@@ -1502,11 +1502,11 @@ create_bits (pixman_format_code_t format,
  */
 
 bpp = PIXMAN_FORMAT_BPP (format);
-if (pixman_multiply_overflows_int (width, bpp))
+if (_pixman_multiply_overflows_int (width, bpp))
return NULL;
 
 stride = width * bpp;
-if (pixman_addition_overflows_int (stride, 0x1f))
+if (_pixman_addition_overflows_int (stride, 0x1f))
return NULL;
 
 stride += 0x1f;
@@ -1514,7 +1514,7 @@ create_bits (pixman_format_code_t format,
 
 stride *= sizeof (uint32_t);
 
-if (pixman_multiply_overflows_int (height, stride))
+if (_pixman_multiply_overflows_size (height, stride))
return NULL;
 
 buf_size = height * stride;
diff --git a/pixman/pixman-private.h b/pixman/pixman-private.h
index 6a3935e..a25897d 100644
--- a/pixman/pixman-private.h
+++ b/pixman/pixman-private.h
@@ -691,10 +691,13 @@ void *
 pixman_malloc_abc (unsigned int a, unsigned int b, unsigned int c);
 
 pixman_bool_t
-pixman_multiply_overflows_int (unsigned int a, unsigned int b);
+_pixman_multiply_overflows_size (size_t a, size_t b);
 
 pixman_bool_t
-pixman_addition_overflows_int (unsigned int a, unsigned int b);
+_pixman_multiply_overflows_int (unsigned int a, unsigned int b);
+
+pixman_bool_t
+_pixman_addition_overflows_int (unsigned int a, unsigned int b);
 
 /* Compositing utilities */
 void
diff --git a/pixman/pixman-utils.c b/pixman/pixman-utils.c
index cb4e621..49e3488 100644
--- a/pixman/pixman-utils.c
+++ b/pixman/pixman-utils.c
@@ -31,15 +31,19 @@
 #include pixman-private.h
 
 pixman_bool_t
-pixman_multiply_overflows_int (unsigned int a,
-   unsigned int b)
+_pixman_multiply_overflows_size (size_t a, size_t b)
+{
+return a = SIZE_MAX / b;
+}
+
+pixman_bool_t
+_pixman_multiply_overflows_int (unsigned int a, unsigned int b)
 {
 return a = INT32_MAX / b;
 }
 
 pixman_bool_t
-pixman_addition_overflows_int (unsigned int a,
-   unsigned int b)
+_pixman_addition_overflows_int (unsigned int a, unsigned int b)
 {
 return a  INT32_MAX - b;
 }
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH 2/3] mmx: fix unaligned accesses

2011-07-27 Thread Soeren Sandmann
matts...@gmail.com writes:

 @@ -2569,20 +2585,31 @@ mmx_composite_in_8_8 (pixman_implementation_t *imp,
   src_line += src_stride;
   w = width;
  
 - if unsigned long)dest_image  3) == 0) 
 - (((unsigned long)src_image  3) == 0))
 + while (w  (unsigned long)dst  3)
   {
 - while (w = 4)
 - {
 - uint32_t *s = (uint32_t *)src;
 - uint32_t *d = (uint32_t *)dst;
 + uint8_t s, d;
 + uint16_t tmp;
 +
 + s = *src;
 + d = *dst;
  
 - *d = store (in (load (*s), load (*d)));
 + *dst = MUL_UN8 (s, d, tmp);
  
 - w -= 4;
 - dst += 4;
 - src += 4;
 - }
 + src++;
 + dst++;
 + w--;
 + }
 +
 + while (w = 4)
 + {
 + uint32_t *s = (uint32_t *)src;
 + uint32_t *d = (uint32_t *)dst;
 +
 + *d = store (in (load (*s), load (*d)));
 +
 + w -= 4;
 + dst += 4;
 + src += 4;
   }
  
   while (w--)

The previous code here is totally bogus. These two checks:

   if unsigned long)dst_image  3) == 0) 
   (((unsigned long)src_image  3) == 0))

check that the pointer to the *image struct* is aligned, which is
completely useless. There are some other instances of this problem
(in_n_8_8, add_n_8_8). This was not introduced by your patch, but we may
as well get them fixed anyway, not the least because those will also
SIGBUS on ARM.

Your patch changes the code to just check that the destination is
aligned. Is that actually sufficient? In this code:

   while (w = 4)
{
uint32_t *s = (uint32_t *)src;
uint32_t *d = (uint32_t *)dst;

*d = store (in (load (*s), load (*d)));

w -= 4;
dst += 4;
src += 4;
}

there doesn't seem to be any protection against (*s) being an unaligned
access. The same thing seems to apply to other changes in the patch.


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH 0/4] ARM: REPEAT_NORMAL support for standard fast paths

2011-07-12 Thread Soeren Sandmann
Taekyun Kim podai...@gmail.com writes:

 On 07/11/2011 09:18 PM, Soeren Sandmann wrote:
 This performance regression was introduced when the simple repeat code
 was removed. But I'm not sure hacking it into the ARM backend is the
 right plan. See this mail for a different approach:

  http://lists.freedesktop.org/archives/pixman/2010-December/000815.html

 I have a branch with a start on doing it that way here:

  http://cgit.freedesktop.org/~sandmann/pixman/log/?h=simple-repeat

 which may or may not be useful as a starting point. (I'd be interested
 in seeing what the benchmark results of that branch are).

 It seems to be the right place where we can put simple repeat codes.
 It can handle simple repeat for both sse2 and ARM at common place.

 I'm a bit worried that tiling does not give us good memory access patterns
 causing cache overhead. 1 x n source images would be as slow as 90 degree
 rotation. Memory buffer will be accessed in vertical order.

Yeah, that is a problem, and that was in fact one of the reasons the
original 'simple repeat' code was deleted. It's memory access pattern
for 1xn images was really bad.  It may be that adding this support to
the ARM backend, as you did, is the better way.

 May I take yours as a starting point and integrate with mine?

Of course.

 Below is benchmark results. (Core2 Duo E5200)
 I couldn't see any noticeable performance changes.

No, it looks like a slowdown ...


Soren

___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH 0/4] ARM: REPEAT_NORMAL support for standard fast paths

2011-07-11 Thread Soeren Sandmann
Taekyun Kim podai...@gmail.com writes:

 Current standard fast paths do not support REPEAT_NORMAL causing such
 paths being on top of profile result of some popular web sites. They
 usually use REPEAT_NORMAL to draw some tiled background or gradient.

This has come up before:

http://lists.freedesktop.org/archives/pixman/2010-November/000731.html

This performance regression was introduced when the simple repeat code
was removed. But I'm not sure hacking it into the ARM backend is the
right plan. See this mail for a different approach:

http://lists.freedesktop.org/archives/pixman/2010-December/000815.html

I have a branch with a start on doing it that way here:

http://cgit.freedesktop.org/~sandmann/pixman/log/?h=simple-repeat

which may or may not be useful as a starting point. (I'd be interested
in seeing what the benchmark results of that branch are).


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


[Pixman] [ANNOUNCE] pixman release 0.23.2 now available

2011-07-04 Thread Soeren Sandmann
A new pixman release 0.23.2 is now available. This is the first
development release leading up to a stable 0.24 release.

- Improved support for tiled bilinear scaling on SSE2 and ARM [Taekyun Kim]

- Fix for a bug causing glyph corruption on ARM devices with a 16 bit
  frame buffer [Søren Sandmann]

Please report bugs to pixman at lists.freedesktop.org or file them at 

https://bugs.freedesktop.org/enter_bug.cgi?product=pixman

Søren


tar.gz:
http://cairographics.org/snapshots/pixman-0.23.2.tar.gz
http://xorg.freedesktop.org/archive/individual/lib/pixman-0.23.2.tar.gz

tar.bz2:
http://xorg.freedesktop.org/archive/individual/lib/pixman-0.23.2.tar.bz2

Hashes:
MD5:  775c5ecde7485bd69936dcc978f5aa58  pixman-0.23.2.tar.gz
MD5:  2e2805f5ca02edeb15a7862779670069  pixman-0.23.2.tar.bz2
SHA1: 5100c1e2d4566e507ead5c638d2a66bb165b2e63  pixman-0.23.2.tar.gz
SHA1: 62568ed6eb84c0312cc0e6f9c5f4bc12b8202492  pixman-0.23.2.tar.bz2

GPG signature:
http://cairographics.org/snapshots/pixman-0.23.2.tar.gz.sha1.asc
(signed by Soren Sandmann sandm...@daimi.au.dk)

Git:
git://git.freedesktop.org/git/pixman
tag: pixman-0.23.2

Log:
Andrea Canciani (3):
  test: Fix compilation on win32
  Include noop in win32 builds
  Silence autoconf warnings

Dave Yeo (1):
  Check for working mmap()

Nis Martensen (1):
  Fix a few typos in pixman-combine.c.template

Søren Sandmann (5):
  mmx: Delete some unused variables
  sse2: Delete some unused variables
  demos: Comment out some unused variables
  ARM: Fix two bugs in neon_composite_over_n__0565_ca().
  test: Make fuzzer-find-diff.pl executable

Søren Sandmann Pedersen (12):
  Post-release version bump to 0.23.1
  Add a noop implementation.
  Add a noop composite function for the DST operator
  Move noop dest fetching to noop implementation
  Add a noop src iterator
  Move NULL iterator into pixman-noop.c
  Move NOP src iterator into noop implementation.
  Replace instances of dst_* with dest_*
  In pixman-general.c rename image_parameters to {src, mask, 
dest}_image
  Replace argumentxs to composite functions with a pointer to a 
struct
  blitters-test: Make common formats more likely to be tested.
  Pre-release version bump to 0.23.2

Taekyun Kim (6):
  Replace boolean arguments with flags for bilinear fast path 
template
  REPEAT_NORMAL support for bilinear fast path template
  sse2: Declare bilinear src__ REPEAT_NORMAL composite 
function
  ARM: Add REPEAT_NORMAL functions to bilinear BIND macros
  Enable REPEAT_NORMAL bilinear fast path entries
  Bilinear REPEAT_NORMAL source line extension for too short 
src_width
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PIXMAN][PATCH 0/6] REPEAT_NORMAL support for bilinear scaled fast paths

2011-06-17 Thread Soeren Sandmann
Taekyun Kim podai...@gmail.com writes:

 I've done account request steps.
 It's my great pleasure to contribute to this project.
 Let me know if my request is missing something or there is anything that I 
 have
 to know.

I have approved the request; the account should be created soon.

Patches should still be sent to the list for review, but if make check
passes, and no one has said anything after some reasonable period of
time, feel free to just push.


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH 0/3] Composite args in stack allocated struct

2011-06-12 Thread Soeren Sandmann
Søren Sandmann sandm...@cs.au.dk writes:

 I'm actually all for this change if it gets confirmed to work a bit
 better and faster (and I expect that it should, considering that all
 this data can be passed through some nested calls multiple
 times). Hopefully we are not running in circles.

 The FbComposeData struct was used in precisely one place to pass
 information between two functions. It could not possibly have provided
 any benefit as it existed in X server 1.3, which is the version that
 eventually became pixman.

 I think what might have happened is that full support for FbComposeData
 was introduced in one of the several forks that existed around 2005 and
 never merged into Xorg, except for that one place. 

 So I'm not too worried about running in circles, but I don't know if it
 is actually a performance advantage. I tend to think that if such
 microoptimizations really result in measurable performance advantage,
 then then the problem should probably be fixed in some other way.

 The following emails contain a rebased version of this branch against
 master.

Performance numbers:

Before:
[ # ]  backend test   min(s) median(s) stddev. count
[ # ]image: pixman 0.23.1
[  0]image   firefox-talos-gfx-20090702   36.974   36.987   0.03%  4/6

After:
[ # ]  backend test   min(s) median(s) stddev. count
[ # ]image: pixman 0.23.1
[  0]image   firefox-talos-gfx-20090702   37.036   37.072   0.04%  5/6

So it's about 0.2% slower on this benchmark, which is very glyph heavy
and therefore calls pixman_image_composite32() a lot. But the main point
to me is performance, but the ability to pass more information to the
compositing routines. It's also a reduction in line count.


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH 2/3] Optimize BILINEAR filter to NEAREST for identity transforms

2011-05-26 Thread Soeren Sandmann
Siarhei Siamashka siarhei.siamas...@gmail.com writes:

 The problem for the analyze_extents() code is that if it chooses to
 provide the COVER flag, then some code might read common.filter and try
 to read outside the image to do bilinear filtering. But without COVER,
 the standard fast paths won't run.

 It seems to me that the cause of the problem is that the meaning of
 COVER depends on what the filter is: it means there is enough space
 around the sample area for the chosen filter, which then becomes wrong
 when doing filter reductions.

 The meaning of COVER flag does not become wrong after doing filter
 reduction, it's the common.filter variable which gets kind of wrong.
 This common.filter value means a filter value set by the user
 application via pixman_image_set_filter() call. When no filter
 reductions are done, it is also happens to be valid in any part of
 code.

Yes, we do in all cases need to preserve the state that the user
set. This is in fact one of the key properties of the flags: they
contain a 'summary' of the user set state, but they are not in
themselves a full description of the image. The full description is
stored elsewhere in the image struct.

 A subjective advantage is that it also gets rid of a somewhat
 illogical NO_CONVOLUTION_FILTER flag. This flag does not cause any
 problems itself, but just distracts attention when looking at the
 printout of fast path flags.

The flags with NO in their names are a bit confusing in that they lead
to double negatives, but they are needed, because fast paths need to say
I _can't_ deal with property X, so I'll set the FAST_PATH_NO_X
flag. This is a key part of how the flags work. As you wrote yourself:

The flags in pixman are implemented in such a way that they all
have some kind of positive meaning, indicating that the
operation is going to be more simple than the general case in one
way or another. Having more flags set on the initial analysis
stage is always good. Forgetting to set some of the flags is safe
in the sense that it will not make pixman misbehave, pixman will
just run slower because of not taking some fast paths.

Each fast path function has a mask with flags (provided separately
for source, mask and destination), with each bit set there telling
something like I can't handle case X (or with alternative
interpretation I can't handle cases other than Y). Fast path
functions with many flags set in their description are allowed to
cut more corners and run a lot faster because of this.

When asked to do something via pixman_image_composite() function
call, pixman first calculates the flags for the current operation
(again, separately for the source, mask and destination
images). Setting each bit is kind of like telling to the the rest
of the code: You are allowed not to care about case X.

  [ http://www.mail-archive.com/pixman@lists.freedesktop.org/msg00569.html ]

But this breaks down if the compositing code starts *relying* on the
flags being set, for example to know what the filter is; it will no
longer be the case that Forgetting to set some of the flags is safe in
the sense that it will not make pixman misbehave, pixman will just run
slower because of not taking some fast paths.

This is an important rule to preserve, because it allows people to work
on the flag computing code without knowing the details of the
compositing code, and it allows people to write a new compositing
routine or implementation without knowing that the flags even exist.

When such general rules break down is when the code base becomes
incomprehensible and only understandable to people who are intimately
familiar with it.

 Regarding the potential risk of using common.filter accidentally in
a wrong way, I think that this variable can be just renamed and get an
unattractive discouraging name, maybe something like
common.filter_before_optimization. Also for additional safety and
debugging purposes, maybe even valgrind client requests could be used
to mark this variable as unaccessible at the wrong time in a special
debug configuration, but that could be an overkill.

Having to rename a basic property of images to prevent other code from
reading it, it is a symptom of a misdesign taking place. It adds a
strong coupling between the compositing code and the flag computing code
that wasn't there before.

 So maybe a solution would be to provide two separate flags:

        COVER_NEAREST:  there is enough space for a nearest filter
        COVER_BILINEAR: there is enough space for a bilinear filter

 These are unambiguous and because either there is enough space for a
 bilinear filter, or there isn't, they can be set independently of what
 the filter actually is.

 Sure, it's one of the multiple possible options. But I think that
 evaluating and potentially setting both of these flags may introduce a
 bit more overhead on the common code path.

Could be, but not much difference in 

Re: [Pixman] [PATCH] fb: Fix memcpy abuse

2011-05-24 Thread Soeren Sandmann
Keith Packard kei...@keithp.com writes:

 On Mon, 23 May 2011 14:34:55 +0200, Soeren Sandmann sandm...@cs.au.dk wrote:

 I think Keith said that the merge window for 1.11 closes May 29th
 though, so there may not be enough time.

 That's the current schedule, with the release happening on August
 19. Let me know if that works for pixman and we can look at merging a
 requirement for 0.23 before the merge window closes this Friday.

If a requirement is merged for pixman 0.23, we can make 0.24 happen
before August 19th.


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [RFC] AVX codepaths

2011-05-16 Thread Soeren Sandmann
Matt Turner matts...@gmail.com writes:

 I took the SSE2 code paths and modified them slightly to generate AVX
 code. AVX doesn't have integer operations like SSE2, so the range of
 256-bit operations is mostly limited to copying pixels.

 Following this email are three patches:
   1) AVX fetcher for x8r8g8b8 (a8 and r5g6b5 require integer unpack 
 operations)
   2) AVX blt, composite_copy_area and associated fast paths
   3) AVX fill

 `make check` still passes all 18 tests.

 The only other bit of low-hanging fruit I see is
 composite_src_x888_, but perhaps there are more fast paths
 possible with AVX.

Of the low-hanging fruit, this is probably the main one. One other might
be to compile pixman-sse2.c with -mavx in order to take advantage of the
VEX prefix for the integer SSE2 instructions. I'm not sure it would be
worth the build system complications though.

Of the higher-hanging fruit that AVX could be used for, gradients is the
obvious candidate. Radial gradients in particular could become really
seriously faster. I think Andrea has some ideas about this. The
moonlight project has a copy of pixman with some gradient performance
improvements using SSE. See this git repository:

https://github.com/mono/moon

Maybe some of that code could be reused, but I think pixman master has
diverged a bit from when they forked.

It would also be useful to redo the wide path used for 10bpc formats to
make it use single precision floating point instead. There is a branch
here:

http://cgit.freedesktop.org/~ranma42/pixman/log/?h=wip/floatpipe4

that has a start on this. Clearly, if this branch of something similar
gets cleaned up and landed, AVX would be useful to accelerate it.

Also, more generally, it would be interesting to move the test suite
from the current bit-exact standard towards a more close-enough
standard, so that floating point SIMD or GPUs or narrower integer
arithmetic could be used in cases where we currently rely on
precisely-8-bit integer arithmetic.

 I played around with cairo-perf-trace on my Sandy Bridge, but wasn't
able to see any improvements, which may be for any number of reasons,
including but not limited to a) I don't know how to use
cairo-perf-trace, b) AVX doesn't yield any noticeable improvements, c)
the code needs to be better optimized.

It's pretty likely that these operations are already close to the memory
bandwidth limit. There is a program in test/lowlevel-blt-bench that
attempts to measure performance of various operations relative to memory
bandwidth. Maybe it can produce some interesting results.

 Still TODO are AVX cpuid detection and to figure out about SunCC
 support.

Regarding SunCC, feel free to just support GCC initially. The various
types of compiler support really have to be maintained by people who use
those compilers regularly.

I'll try to take a look at the patches later this week, but I don't have
any Sandy Bridge hardware atm., so I can't actually run it.


Thanks,
Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH]Compile failure on OS/2

2011-05-09 Thread Soeren Sandmann
Dave Yeo dave.r@gmail.com writes:

 Hi, compile on trunk currently dies here,
 ...
 utils.c: In function 'fence_malloc':
 utils.c:257: warning: implicit declaration of function 'mmap'
 utils.c:257: error: 'MAP_PRIVATE' undeclared (first use in this function)
 ...
 As we have sys/mman.h but no working mmap().
 One possible fix is attached, perhaps the check for sys/mman.h should
 be removed?

Pushed, thanks.


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [RFC] A couple of minor tweaks for sse2 blitting

2011-05-06 Thread Soeren Sandmann
Chris Wilson ch...@chris-wilson.co.uk writes:

 The generic implementation of the BLT routine is to deny the operation.
 So we need to add support to the sse2 implementation in order to perform
 8bpp. This looked relatively straightforward, so I'm obviously
 overlooking something.

I don't think you are overlooking anything - I added the blt() routine a
long time ago as a quick hack because fbBlt was showing up on RHEL4
profiles, and so pixman_blt() ended up as a not-very-well-though-out API
with a bare minimal implementation.

 Please review kindly. I think adding a fuzzer like blitters-test would
 be useful for validation of these changes. Presumably we would like to
 verify that we can blit between all identical formats from 1bpp to
 64bpp.

Indeed, such a test would be very useful.

 Should the generic implementation fallback to converting the BLT
 into an image composite operation?

Yes, it would make sense to do that and then never return FALSE. That
does complicate the situation a little if we add support for overlapping
blt at some point, though, but probably nothing that can't be handled.

These patches look basically right to me, but I'll follow up with a
couple of minor comments on each one.


Thanks,
Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH 1/2] sse2: if width == stride, convert to linear image

2011-05-06 Thread Soeren Sandmann
Chris Wilson ch...@chris-wilson.co.uk writes:

 +if (width == stride) {
 + width *= height;
 + height = 1;
 +}

Do we need a check here that the mulitplication doesn't overflow?
pixman_multiply_overflows_int() can be used for that.

 +if (width == src_stride  width == dst_stride) {
 + width *= height;
 + height = 1;
 +}

Same here, and also braces go on their own line in pixman.

It would also be useful to have a brief comment in the commit log about
what a linear image is.


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


[Pixman] [ANNOUNCE] pixman stable release 0.22.0 now available

2011-05-02 Thread Soeren Sandmann
A new stable pixman release 0.22.0 is now available. Highlights
of this release:

- New r8g8b8a8 and r8g8b8x8 image formats [Alexandros Frantzis]
- Much faster image scaling on ARM and x86 [Siarhei Siamashka, Taekyun Kim]
- Faster 90/270 degree image rotation [Siarhei Siamashka]
- More comprehensive support for compositing triangles and
  trapezoids [Søren Sandmann Pedersen]

Plus a large number of other performance improvements, bug and
portability fixes, and improvements to the test suite.

Thanks to everybody who contributed to pixman 0.22.0, including
Alan Coopersmith, Alexandros Frantzis, Andrea Canciani, Cyril
Brulebois, Gilles Espinasse, Jon TURNEY, Rolland Dudemaine,
Siarhei Siamashka, Søren Sandmann Pedersen, and Taekyun Kim.


Søren


tar.gz:
http://cairographics.org/releases/pixman-0.22.0.tar.gz
http://xorg.freedesktop.org/archive/individual/lib/pixman-0.22.0.tar.gz

tar.bz2:
http://xorg.freedesktop.org/archive/individual/lib/pixman-0.22.0.tar.bz2

Hashes:
MD5:  cb8f3cb5ce2c8d7294f73ecb7021fda6  pixman-0.22.0.tar.gz
MD5:  307fe4d7dc83b1a558c362907097c0d0  pixman-0.22.0.tar.bz2
SHA1: da0a9c63fa315f163a32c7e40dd0b2a0f88c0d21  pixman-0.22.0.tar.gz
SHA1: d24ea233755d7dce9f0d93136ad99fba8d4e4fa0  pixman-0.22.0.tar.bz2

GPG signature:
http://cairographics.org/releases/pixman-0.22.0.tar.gz.sha1.asc
(signed by Søren Sandmann Pedersen sandm...@daimi.au.dk)

Git:
git://git.freedesktop.org/git/pixman
tag: pixman-0.22.0

Log:
Alan Coopersmith (1):
  Sun's copyrights belong to Oracle now

Alexandros Frantzis (2):
  Add simple support for the r8g8b8a8 and r8g8b8x8 formats.
  Add support for the r8g8b8a8 and r8g8b8x8 formats to the tests.

Andrea Canciani (10):
  Remove unused stop_range field
  Fix opacity check
  Improve conical gradients opacity check
  Improve handling of tangent circles
  Add a test for radial gradients
  Fix compilation on Win32
  test: Fix tests for compilation on Windows
  test: Add Makefile for Win32
  Do not include unused headers
  test: Silence MSVC warnings

Cyril Brulebois (2):
  Fix argument quoting for AC_INIT.
  Fix linking issues when HAVE_FEENABLEEXCEPT is set.

Gilles Espinasse (2):
  Fix missing AC_MSG_RESULT value from Werror test
  Fix OpenMP not supported case

Jon TURNEY (1):
  Remove stray #include fenv.h

Rolland Dudemaine (4):
  test: Fix for mismatched 'fence_malloc' prototype/implementation
  Correct the initialization of 'max_vx'
  test: Use the right enum types instead of int to fix warnings
  Fix variable was set but never used warnings

Siarhei Siamashka (69):
  Fixed broken configure check for __thread support
  Do CPU features detection from 'constructor' function when 
compiled with 
  ARM: fix 'vld1.8'-'vld1.32' typo in add__ NEON fast path
  ARM: NEON: source image pixel fetcher can be overrided now
  ARM: nearest scaling support for NEON scanline compositing 
functions
  ARM: macro template in C code to simplify using scaled fast paths
  ARM: performance tuning of NEON nearest scaled pixel fetcher
  ARM: NEON optimization for scaled over__ with nearest 
filter
  ARM: NEON optimization for scaled over__0565 with nearest 
filter
  ARM: NEON optimization for scaled src__0565 with nearest 
filter
  ARM: NEON optimization for scaled src_0565_ with nearest 
filter
  ARM: optimization for scaled src_0565_0565 with nearest filter
  C fast path for a1 fill operation
  ARM: added 'neon_composite_over_n_8_8' fast path
  ARM: introduced 'fetch_mask_pixblock' macro to simplify code
  ARM: better NEON instructions scheduling for over_n_8_0565
  ARM: added 'neon_composite_over__n_0565' fast path
  ARM: reuse common NEON code for over_{n_8|_n|_8}_0565
  ARM: added 'neon_composite_over_0565_n_0565' fast path
  ARM: added 'neon_composite_add__8_' fast path
  ARM: better NEON instructions scheduling for add___
  ARM: added 'neon_composite_add_n_8_' fast path
  ARM: added 'neon_composite_add__n_' fast path
  ARM: added flags parameter to some asm fast path wrapper macros
  ARM: added 'neon_composite_in_n_8' fast path
  ARM: added 'neon_src_rpixbuf_' fast path
  Fix for potential unaligned memory accesses
  

Re: [Pixman] [PATCH 0/7] Faster pipelined ARM NEON bilinear scalers: 'src_8888_8888' and 'src_8888_0565'

2011-05-02 Thread Soeren Sandmann
Siarhei Siamashka siarhei.siamas...@gmail.com writes:

 As far as I know, GdkPixbuf uses the equivalent of a bilinear filter
 with four bits of precision. Once you scale more than 16x, banding
 artefacts are clearly visible, but people don't normally complain about
 its quality, and 16x scaling with bilinear interpolation won't look
 great in any case, so if dropping precision to four bits really makes a
 big performance difference, maybe we should just change
 PIXMAN_FILTER_BILINEAR to do that.

 Reducing bilinear interpolation precision provides a lot more
 performance benefits for x86 (at least 7 bits would be better) and 4
 bits should be really the best. Anyway, who is going to decide whether
 the reduced precision is still ok?

Well, if no one has a problem with dropping the precision, then dropping
precision is OK, I guess.

 One can see a heavy use of bilinear scaling (over__8_ fast
 path, also with optional horizontal flipping) on the web pages like
 this, especially if configured to animate 1000 fishes:
 http://ie.microsoft.com/testdrive/performance/fishietank/
 I wonder how well pixman can actually perform there if it gets some
 good SSSE3 bilinear optimizations and starts utilizing all CPU cores
 (maybe trying OpenMP should be easy)?

For upscaling in particular, it may be interesting to look at a
different algorithm where two horizontally expanded souce lines are
cached and the destination scanlines are computed as a linear
interpolation between those two scanlines. This drops a little precision
because the two cache scanlines would be stored with 8 bits of color
depth and not 16 as we do now, but that probably would be completely
invisible.


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH 2/2] ARM: Add 'neon_composite_over_n_8888_0565' fast path

2011-04-15 Thread Soeren Sandmann
Taekyun Kim podai...@gmail.com writes:

 I marked bubbles that I could find.
 Here we can make step 3 independent(or less dependent) from above step 6 and 7
 by proper allocation of registers.
 So we can insert some instructions of step 3 into the above bubble positions.
 Output of step 1(fetch dest) will be read in step 4 and output of step 2(fetch
 mask) will be read in step 3.
 So I think you can fetch mask first and then dest at the beginning of 
 tail_head
 block and remaining bubbles can be filled with instructions from step 3.

 Maybe this does not work, or there can be some other better ways to achieve
 optimal performance.

Thanks - these comments were helpful. There is a new patch below that
implements these suggestions. I can't find anymore stalls in the inner
loop. This version does produce some measurable speedup with data in L1
cache compared to the non-pipelined version. From typeically 85-95
Mpixels/s to 90-100 Mpixels/s.

The precision of these measurements still leave something to be desired,
but it's pretty clear that there is some amount of improvement here.


Soren


From 32cf3d76454c9984b3e79d92c691e46b8b0b12f7 Mon Sep 17 00:00:00 2001
From: =?utf-8?q?S=C3=B8ren=20Sandmann=20Pedersen?= s...@redhat.com
Date: Sat, 2 Apr 2011 23:24:48 -0400
Subject: [PATCH] ARM: Add 'neon_composite_over_n__0565' fast path

This improves the performance of the firefox-talos-gfx benchmark with
the image16 backend. Benchmark on an 800 MHz ARM Cortex A8:

Before:

[ # ]  backend test   min(s) median(s) stddev. count
[  0]  image16firefox-talos-gfx  121.773  122.218   0.15%6/6

After:

[ # ]  backend test   min(s) median(s) stddev. count
[  0]  image16firefox-talos-gfx   85.247   85.563   0.22%6/6

V2: Slightly better instruction scheduling based on comments from Taekyun Kim.
V3: Eliminate all stalls from the inner loop. Also based on comments from 
Taekyun Kim.
---
 pixman/pixman-arm-neon-asm.S |  169 ++
 pixman/pixman-arm-neon.c |4 +
 2 files changed, 173 insertions(+), 0 deletions(-)

diff --git a/pixman/pixman-arm-neon-asm.S b/pixman/pixman-arm-neon-asm.S
index 5e9fda3..833f18c 100644
--- a/pixman/pixman-arm-neon-asm.S
+++ b/pixman/pixman-arm-neon-asm.S
@@ -1426,6 +1426,175 @@ generate_composite_function \
 
 
/**/
 
+.macro pixman_composite_over_n__0565_ca_process_pixblock_head
+/*
+ * 'combine_mask_ca' replacement
+ *
+ * input:  solid src (n) in {d8,  d9,  d10, d11}  [B, G, R, A]
+ * mask in  {d24, d25, d26}   [B, G, R]
+ * output: updated src in   {d0,  d1,  d2 }   [B, G, R]
+ * updated mask in  {d24, d25, d26}   [B, G, R]
+ */
+vmull.u8q0,  d24, d8
+vmull.u8q1,  d25, d9
+vmull.u8q6,  d26, d10
+vmull.u8q9,  d11, d25
+vmull.u8q12, d11, d24
+vmull.u8q13, d11, d26
+vrshr.u16   q8,  q0,  #8
+vrshr.u16   q10, q1,  #8
+vrshr.u16   q11, q6,  #8
+vraddhn.u16 d0,  q0,  q8
+vraddhn.u16 d1,  q1,  q10
+vraddhn.u16 d2,  q6,  q11
+vrshr.u16   q11, q12, #8
+vrshr.u16   q8,  q9,  #8
+vrshr.u16   q6,  q13, #8
+vraddhn.u16 d24, q12, q11
+vraddhn.u16 d25, q9,  q8
+/*
+ * convert 8 r5g6b5 pixel data from {d4, d5} to planar 8-bit format
+ * and put data into d16 - blue, d17 - green, d18 - red
+ */
+   vshrn.u16   d17, q2,  #3
+   vshrn.u16   d18, q2,  #8
+vraddhn.u16 d26, q13, q6
+   vsli.u16q2,  q2,  #5
+   vsri.u8 d18, d18, #5
+   vsri.u8 d17, d17, #6
+/*
+ * 'combine_over_ca' replacement
+ *
+ * output: updated dest in d16 - blue, d17 - green, d18 - red
+ */
+vmvn.8  q12, q12
+   vshrn.u16   d16, q2,  #2
+vmvn.8  d26, d26
+vmull.u8q6,  d16, d24
+vmull.u8q7,  d17, d25
+vmull.u8q11, d18, d26
+.endm
+
+.macro pixman_composite_over_n__0565_ca_process_pixblock_tail
+/* ... continue 'combine_over_ca' replacement */
+vrshr.u16   q10, q6,  #8
+vrshr.u16   q14, q7,  #8
+vrshr.u16   q15, q11, #8
+vraddhn.u16 d16, q10, q6
+vraddhn.u16 d17, q14, q7
+vraddhn.u16 d18, q15, q11
+vqadd.u8q8,  q0,  q8
+vqadd.u8d18, d2,  d18
+/*
+ * convert the results in d16, d17, d18 to r5g6b5 and store
+ * them into {d28, d29}
+ */
+vshll.u8q14, d18, #8
+vshll.u8q10, d17, #8
+vshll.u8q15, d16, #8
+vsri.u16q14, q10, #5
+vsri.u16q14, q15, #11
+.endm
+
+.macro pixman_composite_over_n__0565_ca_process_pixblock_tail_head
+fetch_mask_pixblock
+vrshr.u16   q10, q6, #8
+vrshr.u16   q14, q7, #8
+vld1.16 {d4, d5}, [DST_R, :128]!
+vrshr.u16   q15, q11, #8
+vraddhn.u16 d16, q10, q6
+vraddhn.u16 d17, q14, q7
+

Re: [Pixman] Unknown thread local support for this system

2011-04-13 Thread Soeren Sandmann
Galen Lee gulin20...@gmail.com writes:

 Hi,gues!
     I compiled pixman0.20.0 use the NDK.With the configure as follow:
  CC=agcc CPPFLAGS=-I/data/local/include LDFLAGS=-L/data/local/lib \
  CXX=agcc LD=arm-eabi-ld RANLIB=arm-eabi-ranlib \
  PKG_CONFIG_LIBDIR=/data/local/lib/pkgconfig:/data/local/share/pkgconfig/  \
  ./configure \
 --prefix=/data/local  \
 --host=arm-eabi-linux \
 --enable-shared \
    and get the follow errors:
 In file included from pixman-private.h:17,
  from pixman-access.c:35:
 pixman-compiler.h:202:6: error: #error Unknown thread local support for this
 system. Pixman will not work with multiple threads. Define PIXMAN_NO_TLS to
 acknowledge and accept this limitation and compile pixman without 
 thread-safety
 support.
     How can I do to solve this problem?

Well, the question is what type of thread local storage does Android
provide?  Google seems to suggest that it supports
pthread_setspecific(). If so, it should be detected at configure
time. If it isn't, you'll probably need to look at config.log to find
out what went wrong.

Alternatively, if Android supports some other type of thread local
storage, you'll need to add support for that. Or, if you don't call
pixman from more than one thread, you can simply define PIXMAN_NO_TLS.


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH 2/3] ARM: NEON scanline functions for bilinear scaling with A8 mask

2011-04-13 Thread Soeren Sandmann
Siarhei Siamashka siarhei.siamas...@gmail.com writes:

 or are these macros being duplicated between
 pixman-arm-neon-asm.S and this new file? If so, maybe it would be better
 to put them in a header file and share them.

 The whole problem with sharing these macros is that they already did
 change a bit, and they are going to change even more in order to
 improve performance. Taekyun Kim took the initial implementation of
 NEON bilinear code and extended it to support more compositing
 operations such as OVER and ADD, thus branching the older code. And
 because we don't have a stable foundation yet, it seems to be not very
 practical updating his code right now to use newer revisions of these
 common macros because these macros are still going to be updated
 again. I suggested taking his code as-is for pixman-0.22.0 in order to
 give the rest of the world at least some more or less complete set of
 ARM NEON bilinear fast paths sooner.

Okay, that makes sense I guess.


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH 2/2] ARM: Add 'neon_composite_over_n_8888_0565' fast path

2011-04-12 Thread Soeren Sandmann
Hi,

 I'm also a bit new to NEON, however I hope this brief review can be
 helpful to you.  So let's start.

Thanks for the comments. I'm including a new version shortly that
incorporates some of the changes, but unfortunately, I couldn't reliably
measure much of an improvement. On my system, the L1 value from
lowlevel-blt-bench seems to fluctuate between 85 and 95 pretty much no
matter what I do.

 +    /*
 +     * 'combine_over_ca' replacement
 +     *
 +     * output: updated dest in d16 - blue, d17 - green, d18 - red
 +     */
 +    vmvn.8      q12, q12
 +    vmull.u8    q6,  d16, d24
 +    vmull.u8    q7,  d17, d25
 +    vmvn.8      d26, d26
 +    vmull.u8    q11, d18, d26
 +.endm
  
 There are some pipeline stalls, destination registers of vmvn will be 
 available
 at N3.
 I think we can reorder above code like this.
  
 vmvn.8      q12, q12
 vshrn.u16   d16, q2,  #2 // last line of previous block
 vmvn.8      d26, d26
 vmull.u8    q6,  d16, d24
 vmull.u8    q7,  d17, d25
 vmull.u8    q11, d18, d26

The new code incorpates this change, and I *think* the L1 value was
somewhat higher on most runs, but to be honest this might just be
confirmation bias on my part ...

 +    vshll.u8    q14, d18, #8
 +    vshll.u8    q10, d17, #8
 +    vshll.u8    q15, d16, #8
 +    vsri.u16    q14, q10, #5
 +    vsri.u16    q14, q15, #11

  
 It seems there are some what more stalls here.
 See this page. http://infocenter.arm.com/help/topic/com.arm.doc.ddi0344k/
 ch16s06s04.html
 It seems impossible to eliminate all stalls by reordering only in this code
 block.
 Maybe reordering together with some above code lines would work for this.

 vqadd.u8  d18, d2,  d18
 vshll.u8    q10, d17, #8
 vshll.u8    q14, d18, #8
 vqadd.u8  q8,  q0,  q8
 vsri.u16    q14, q10, #5
 vshll.u8    q15, d16, #8
 vsri.u16    q14, q15, #11

That particular version won't work because the first vshll instruction
uses d17 which is affected by the vqadd.u8 q8, q0, q8
instruction. However, it is possible to write it like this:

 vshll.u8    q14, d18, #8
  vqadd.u8  q8,  q0,  q8
 vshll.u8    q10, d17, #8
 vsri.u16    q14, q10, #5
 vshll.u8    q15, d16, #8
 vsri.u16    q14, q15, #11  

but then various other stalls appear. In my testing, this seemed to be a
slight slowdown (with the same caveat as above).

I think the main lesson is that we want to be serious about this level
of microoptimization, need more a more robust way of measuring
performance than lowlevel-blt-bench provides. In particular more
iteratons and tracking of standard deviations would be useful.

Also, I noticed that lowlevel-blt-bench copies the destination back into
the source in various places:

memcpy (src, dst, BUFSIZE);
memcpy (dst, src, BUFSIZE);

This could affect performance if dst ends up containing mostly zeros. In
the cases where the source is supposed to be solid, it could mean
operations like over_n_8_() will turn into noops completely. (This
happened to me in this case due to a bug in the over_n__0565_ca()
code).

I think the order of those memcpy() calls should at a minimum be
reversed, but I'm not completely clear on why they are there in the
first place. To put the sources and destinations into cache perhaps?



Soren



From 3d2c60bace9d5a49794cc5d0198ff6d3d8fa335d Mon Sep 17 00:00:00 2001
From: =?utf-8?q?S=C3=B8ren=20Sandmann=20Pedersen?= s...@redhat.com
Date: Sat, 2 Apr 2011 23:24:48 -0400
Subject: [PATCH] ARM: Add 'neon_composite_over_n__0565' fast path

This improves the performance of the firefox-talos-gfx benchmark with
the image16 backend. Benchmark on an 800 MHz ARM Cortex A8:

Before:

[ # ]  backend test   min(s) median(s) stddev. count
[  0]  image16firefox-talos-gfx  121.773  122.218   0.15%6/6

After:

[ # ]  backend test   min(s) median(s) stddev. count
[  0]  image16firefox-talos-gfx   85.247   85.563   0.22%6/6

V2: Slightly better instruction scheduling based on comments from
Taekyun Kim.
---
 pixman/pixman-arm-neon-asm.S |  122 ++
 pixman/pixman-arm-neon.c |4 ++
 2 files changed, 126 insertions(+), 0 deletions(-)

diff --git a/pixman/pixman-arm-neon-asm.S b/pixman/pixman-arm-neon-asm.S
index 5e9fda3..1d9232e 100644
--- a/pixman/pixman-arm-neon-asm.S
+++ b/pixman/pixman-arm-neon-asm.S
@@ -1426,6 +1426,128 @@ generate_composite_function \
 
 
/**/
 
+.macro pixman_composite_over_n__0565_ca_process_pixblock_head
+/*
+ * 'combine_mask_ca' replacement
+ *
+ * input:  solid src (n) in {d8,  d9,  d10, d11}  [B, G, R, A]
+ * mask in  {d24, d25, d26}   [B, G, R]
+ * output: updated src in   {d0,  d1,  d2 }   [B, G, R]
+ * updated mask in  {d24, d25, d26}   [B, G, R]
+ */
+vmull.u8  

Re: [Pixman] [PATCH] Adding TLS for Windows XP

2011-04-06 Thread Soeren Sandmann
Tristan Schmelcher tschmelc...@google.com writes:

 What would you think about removing EXPORT_SYMBOL, prepending
 underscores, and providing an example DllMain that uses the shutdown
 functions for developers that are compiling Pixman as a standalone
 DLL?

I guess if we have a fix that adds a DllMain and calls the init/shutdown
functions from it, it wouldn't be too bad to provide a preprocessor
symbol to allow DllMain to be left out.


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH 1/2] ARM: Tiny improvement in over_n_8888_8888_ca_process_pixblock_head

2011-04-06 Thread Soeren Sandmann
Siarhei Siamashka siarhei.siamas...@gmail.com writes:

 On a second look, other than that, the whole function is actually fine
 on Cortex-A8 and does not need any other performance tweaks because it
 did not have many bubbles to fill in the first place. You can push
 this patch with the additional vmvn.8 d26, d26 instruction move (I
 don't think a separate patch is needed for that).

Alright, pushed to master. I'll follow up on Taekyun Kim's and your
comments on the other patch later this week.


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH] Adding TLS for Windows XP

2011-04-05 Thread Soeren Sandmann
Hi,

 We don't want to add another API either.  It would be nice to make the library
 multi-thread safe in Windows XP and no memory leaks at the same time.  One
 question regarding your suggestions of freeing the memory in DLL unload.
  TlsFree will still need to know all the TLS indices 
 (http://msdn.microsoft.com
 /en-us/library/ms686804(v=vs.85).aspx) and Pixman core will have that
 information.  So we might still need a new API.

Isn't it possible to do something similar to what is described here:

http://msdn.microsoft.com/en-us/library/ms686997%28v=vs.85%29   ?

As I understand it, we only need one TLS index, which can be stored in a
global variable in a new pixman-win32.c file. That index is then used
with TlsSetValue and TlsGetValue to set and get the thread local data,
and can be released with TlsFree() when DllMain() is acalled with
DLL_PROCESS_DETACH.

Note, I don't really know the Windows API, so I might be
misunderstanding something here.


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] releases and snapshots

2011-04-02 Thread Soeren Sandmann
Maarten Bosmans mkbosm...@gmail.com writes:

 2011/4/2 Soeren Sandmann sandm...@cs.au.dk:
 Maarten Bosmans mkbosm...@gmail.com writes:

 At the releases page http://cairographics.org/releases/ both the
 latest stable (0.20.2) and unstable (0.21.6) versions are listed. This
 is confusing for people that don't know our versioning scheme
 (although it is no unusual scheme). I already know of one person who
 downloaded 0.21.x thinking it was the latest stable, as also the
 LATEST-pixman link links to the unstable series.

 I have been meaning to do this, actually, but have been procrastinating,
 mainly because doing it would also involve changing the pixman Makefile
 to do the right thing on make release ...

 The relevant snippet from cairo/build/Makefile.am/releasing is:

 RELEASE_OR_SNAPSHOT = $$(if test x$(CAIRO_VERSION_MINOR) = x$$(echo
 $(CAIRO_VERSION_MINOR)/2*2 | bc) ; then echo release; else echo
 snapshot; fi)
 RELEASE_UPLOAD_HOST =   cairographics.org
 RELEASE_UPLOAD_BASE = /srv/cairo.freedesktop.org/www
 RELEASE_UPLOAD_DIR =  $(RELEASE_UPLOAD_BASE)/$(RELEASE_OR_SNAPSHOT)s
 RELEASE_URL_BASE =  http://cairographics.org/$(RELEASE_OR_SNAPSHOT)s

 Perhaps it's even better to copy that file entirely into the pixman
 tree and tweak it for pixman, as some of the functionality is
 currently already in pixman's root Makefile.am.

 Maarten

I have moved the files on www.cairographics.org. Below is a patch that
changes the Makefile.am to upload to the correct directory. I'd
appreciate if someone could take a look to make sure that I didn't
overlook something in the patch.


Soren


From 2e1134c7f53f6e7bcfcb95001d86916533861513 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?S=C3=B8ren=20Sandmann=20Pedersen?= s...@redhat.com
Date: Sat, 2 Apr 2011 14:12:12 -0400
Subject: [PATCH] Makefile.am: Put development releases in snapshots directory

Up until now, all pixman release, both snapshots and releases were
uploaded to the releases directory on www.cairographics.org, but
it's better to development snapshots in the snapshots directory.

This patch changes Makefile.am to do that.
---
 Makefile.am |7 +++
 1 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/Makefile.am b/Makefile.am
index f479a66..658a375 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -12,10 +12,10 @@ snapshot:
 
 GPGKEY=6FF7C1A8
 USERNAME=$$USER
-RELEASE_OR_SNAPSHOT = $$(if test x$(CAIRO_VERSION_MINOR) = x$$(echo 
$(CAIRO_VERSION_MINOR)/2*2 | bc) ; then echo release; else echo snapshot; fi)
+RELEASE_OR_SNAPSHOT = $$(if test x$(PIXMAN_VERSION_MINOR) = x$$(echo 
$(PIXMAN_VERSION_MINOR)/2*2 | bc) ; then echo release; else echo snapshot; 
fi)
 RELEASE_CAIRO_HOST =   $(USERNAME)@cairographics.org
-RELEASE_CAIRO_DIR =/srv/cairo.freedesktop.org/www/releases
-RELEASE_CAIRO_URL =http://cairographics.org/releases
+RELEASE_CAIRO_DIR =/srv/cairo.freedesktop.org/www/$(RELEASE_OR_SNAPSHOT)s
+RELEASE_CAIRO_URL =http://cairographics.org/$(RELEASE_OR_SNAPSHOT)s
 RELEASE_XORG_URL = http://xorg.freedesktop.org/archive/individual/lib
 RELEASE_XORG_HOST =$(USERNAME)@xorg.freedesktop.org
 RELEASE_XORG_DIR = /srv/xorg.freedesktop.org/archive/individual/lib
@@ -83,7 +83,6 @@ release-tag:
git tag -u $(GPGKEY) -m $(PACKAGE) $(VERSION) release 
$(PACKAGE)-$(VERSION)
 
 release-upload: release-check $(tar_gz) $(tar_bz2) $(sha1_tgz) $(sha1_tbz2) 
$(md5_tgz) $(gpg_file)
-   mkdir -p releases
scp $(tar_gz) $(sha1_tgz) $(gpg_file) 
$(RELEASE_CAIRO_HOST):$(RELEASE_CAIRO_DIR)
scp $(tar_gz) $(tar_bz2) $(RELEASE_XORG_HOST):$(RELEASE_XORG_DIR)
ssh $(RELEASE_CAIRO_HOST) rm -f 
$(RELEASE_CAIRO_DIR)/LATEST-$(PACKAGE)-[0-9]*  ln -s $(tar_gz) 
$(RELEASE_CAIRO_DIR)/LATEST-$(PACKAGE)-$(VERSION)
-- 
1.7.4

___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH] Fixing the runtime error in VS debug build under Windows.

2011-03-30 Thread Soeren Sandmann
Hi,

 We have met a run time crash when using a debug VS build under Windows due to
 potential of using an uninitialized variable.  By looking at the code, max_vx
 could also be an uninitialized value.  The fix is to initial that variable to
 0.  The version we are using is 0.21.4.

Thanks for the patch, but this particular issue is already fixed in git
master and also in pixman 0.21.6 I believe.

I'm curious how this caused crashes? As far as I can tell, the variable
is not actually used uninitialized - it was only a compiler warning.


Thanks,
Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH] Adding TLS for Windows XP

2011-03-30 Thread Soeren Sandmann
Hi,

 It turned out that Windows XP does't support __declspec(thread).  In order to
 make a multi-thread safe app using pixman, we tried to add TLS for Windows XP.
 We have added using TLS primitives and tried to avoid memory leak during
 cleaning up.

I'm of two minds regarding this patch. On the one hand, it's definitely
good to get these TLS issues under control on Windows, so thanks for
looking at that.

On the other hand, I really don't want to export new API that an
application has to call simply for correct operation.

If you use pixman through cairo, for example, it is currently completely
safe to ignore pixman's existence. This is an important guarantee to
give because it allows us to update the pixman ABI at a later point
without affecting cairo applications.

Or suppose some other library has spawned a thread that uses pixman for
compositing. In general, the application has no way to know when that
thread exited so it can't know when to call these new functions.

Since you mention that __declspec(thread) doesn't work on XP, I assume
you are using pixman in a dynamically linked DLL. If so, a way to solve
your issue without adding new API might be to add a DllMain() function
and then clean up the thread local storage when it is called with
THREAD_DETACH.

Another possibility would be to avoid TLS altogether and instead just
use global variables protected by locks. Lock striping could be used to
prevent (CPU-)cache thrashing.

Regarding which variation of TLS works on which versions of Windows,
there are several separate variables:

- Whether the binary is statically or dynamically linked

- Which compiler is used (MSVC vs. MinGW32 4.5 vs. MinGW32 4.4)

- Which operating system is used (XP/2003 vs. later)

- Which variation of TLS (__thread, __declspec(thread), DllMain() + TlsAlloc*)

- Whether the operating system is 32 or 64 bit

It would be really useful to have a table of these combinations showing
what works and what doesn't. That will make it much simpler to decide
which combinations can actually be reasonably supported without
requiring new public API.

See also 

https://bugs.freedesktop.org/show_bug.cgi?id=34842

Regardless of all that, though, this patch clearly needs cleaning up
according to the CODING_STYLE document.


Thanks,
Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] Simplify iterator initializer signatures

2011-03-12 Thread Soeren Sandmann
Søren Sandmann sandm...@cs.au.dk writes:

 Here are three patches that simplify the type signature of the
 iterator initializers. The signature before:

 void (*pixman_iter_init_func_t) (pixman_implementation_t *imp,
  pixman_iter_t   *iter,
  pixman_image_t  *image,
  int  x,
  int  y,
  int  width,
  int  height,
  uint8_t *buffer,
  iter_flags_t flags);

 void (*pixman_iter_init_func_t) (pixman_implementation_t *imp,
  pixman_iter_t   *iter);
  
 They achieve this by passing all the information in the iterator
 itself rather than as arguments.

 This means the role of _pixman_implementation_{src,dest}_iter_init()
 changes. Before they were wrappers around the implementation
 functions; now they are convenience functions that store the various
 parameters in the iterator and then call into the implementation.

Sorry, this mail got sent prematurely. I meant to also include a
diffstat:

 pixman-bits-image.c   |   20 +---
 pixman-conical-gradient.c |7 +
 pixman-general.c  |   56 --
 pixman-implementation.c   |   46 ++---
 pixman-linear-gradient.c  |   16 -
 pixman-private.h  |   51 +
 pixman-radial-gradient.c  |7 +
 pixman-solid-fill.c   |   17 +
 pixman-sse2.c |   27 ++
 9 files changed, 84 insertions(+), 163 deletions(-)


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [cairo] Planar YUV support

2011-03-04 Thread Soeren Sandmann
Siarhei Siamashka siarhei.siamas...@gmail.com writes:

 On Wed, Mar 2, 2011 at 4:16 PM, Soeren Sandmann sandm...@cs.au.dk wrote:
 Siarhei Siamashka siarhei.siamas...@gmail.com writes:

 I'm not a big fan of let's make this totally universal and future
 proof approach if only a very small fraction of this functionality is
 going to be actually used. Moreover, I suspect that trying to be too
 general was responsible for slowing down the original Planar YUV
 support plan.

 Part of what was derailing that plan may have been my insisting on being
 precise about how these formats fit into the image processing pipeline,
 including how it related to gamma correction and other colorspace
 issues. I still think this is important. However, we can probably leave
 out some specific features, if there is a credible story about how to do
 them later.

 The pipeline as it is now:

    1 convert image sample to a8r8g8b8
    2 extend sample grid in all directions according to repeat
    3 interpolate between sample according to filter
    4 transform
    5 resample
    6 combine
    7 store

 What is the difference between 3 interpolate between sample according
 to filter and 5 resample?

The output of stage 3 is an image that is defined on all of the real
plane. There are no pixels any more, so there is no question about what
stage 4, transform, means. Stage 5 converts back to pixels by point
sampling.

Of course, pixman is implemented starting from the destination, so
thefunction of the real plane from stage 3 is never actually
computed. Because of the sampling, we only need the value of that
function at discrete locations.

(Better gradient and image quality would involve improving stage 5 to do
something better than point sampling).

 To add support for potentially subsampled YUV, some additional stages
 have to be inserted before the first:

   -2 interpolate subsampled components of YUV to get the same
      resolution as the Y plane

   -1 if the format is planar, stitch together components to form YUV
      pixels

    0 convert to sRGB

 Stage -2 is important because the filter used in that interpolation
 should probably be user-specifiable eventually, which has the
 implication that whatever simple support is added first, it needs to be
 clear what filter precisely is being used.

 Stage 0 is a color space conversion and need to eventually be
 configurable too, which means it has to be specified which matrix is
 being used.

 I'm not totally sure about stage -2. So source interpolation is going
 to happen twice in the pipeline, once when getting rid of subsampling,
 and the second time when applying transform? To me it looks more
 natural to just start with the transform, for each pixel in the
 destination space get a transformed fractional coordinate in the
 source image, look it up in the source YUV grid and interpolate each
 color component for each color plane separately according to the
 selected filter, and finally convert interpolated YUV color to
 a8r8g8b8 according to the used color matrix.

Yes, I thought those two stages were the same for a long time, to. In
fact, in this mail:

http://lists.freedesktop.org/archives/cairo/2010-May/019876.html

I argue that they are the same, and that the pipeline should look like
this:

  * Widen to 8 bit components
  * Extend 
  * Interpolate between samples according to filter
  * Transform
  * Convert to RGB coding
  * Resample
  * Combine
  * Store

which also what you are saying. However, as mentioned here:

http://lists.cairographics.org/archives/cairo/2010-June/020232.html

I have become convinced that this is wrong and the undoing the chroma
subsampling is something different than interpolation happening on sRGB
pixels, although it may be that they can be combined in some special
cases. In that mail I say that the pipeline should (eventually) look
like this:

extend
widen
  
chroma reconstruct  = upsample filter

convert to ar'g'b'  = \
linearize   = | color space convert
premultiply = /

interpolate = interpolation filter
transform
resample= resampling filter, plus sampling rate
combine

convert to dest format

store

The basic argument is that the interpolation filtering should be done in
premultiplied RGB (linear, ideally, but sRGB for now), whereas the
chroma reconstruction obviously has to be done on chroma
channels. Therefore, these two operations have to be considered
distinct.

Ie., since a YUV image is generated by this process:

 1. Start with sRGB
 2. Convert to YUV
 3. Filter and subsample chroma

if we are to do some operation on the image in sRGB, reversing this
process as the first step can't possibly be incorrect:

 1. Reconstruct chroma samples
 2. Convert to sRGB
 3. Interpolate sRGB samples
 4. Transform

And so if the alternative process:

 1. Interpolate YUV samples
 2

Re: [Pixman] [cairo] Planar YUV support

2011-03-02 Thread Soeren Sandmann
Siarhei Siamashka siarhei.siamas...@gmail.com writes:

 I'm not a big fan of let's make this totally universal and future
 proof approach if only a very small fraction of this functionality is
 going to be actually used. Moreover, I suspect that trying to be too
 general was responsible for slowing down the original Planar YUV
 support plan.

Part of what was derailing that plan may have been my insisting on being
precise about how these formats fit into the image processing pipeline,
including how it related to gamma correction and other colorspace
issues. I still think this is important. However, we can probably leave
out some specific features, if there is a credible story about how to do
them later.

The pipeline as it is now:

1 convert image sample to a8r8g8b8
2 extend sample grid in all directions according to repeat
3 interpolate between sample according to filter
4 transform 
5 resample
6 combine
7 store

To add support for potentially subsampled YUV, some additional stages
have to be inserted before the first:

   -2 interpolate subsampled components of YUV to get the same
  resolution as the Y plane

   -1 if the format is planar, stitch together components to form YUV
  pixels

0 convert to sRGB

Stage -2 is important because the filter used in that interpolation
should probably be user-specifiable eventually, which has the
implication that whatever simple support is added first, it needs to be
clear what filter precisely is being used.

Stage 0 is a color space conversion and need to eventually be
configurable too, which means it has to be specified which matrix is
being used.

There is also the question of how to *write* to a subsampled YUV
image. I don't particularly like read-only image formats, but writing to
YUV is not simple when you consider clip rectangles, because subsampling
involves a filter that probably extends outside the clip boxes.

What Andrea is getting at, is presumably how to specify image formats in
the API. A fully general API like he suggests is perhaps interesting at
some point, but I agree it shouldn't be prerequisite for getting some
YUV support in.

 Surely an alternative solution for html5 video fallbacks is to do
 YUV-RGB conversion of video frames outside of cairo/pixman, and only
 let cairo/pixman handle the RGB data. The only real problem with this
 solution is that such multi pass data processing is somewhat slower.
 We are speaking about at least tens of percents of performance here.
 And pixman support for native single pass YUV-RGB conversion combined
 with scaling could help.

YUV conversion is definitely within the scope of pixman, both YUV-RGB
and RGB-YUV. However, I'm not sure the existing YUV formats are very
well though out.


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH 3/7] C variant of bilinear scaled 'src_8888_8888' fast path

2011-02-26 Thread Soeren Sandmann
Hi,

Most of this series looks good to me, except that this:

  static force_inline uint32_t
 +bilinear_interpolation (uint32_t tl, uint32_t tr,
 + uint32_t bl, uint32_t br,
 + int distx, int wt, int wb)
 +{

is a duplication of the code from pixman-bits-image, except that disty
has been split into wt and wb so that it can deal with the top and
bottom edges of the image.

The bits_image_fetch_bilinear_no_repeat_() function handles the same
problem by simply passing 0 for tl/tr or bl/br. Could it be done the
same way here? Alternatively, if there is a good reason for the split,
presumably that same reason applies to the code in pixman-bits-image.c.

The pixman-fast-path.h file also contains a duplicated version of
repeat where the only difference is that the order of the arguments is
reversed and one uses MOD(), the other while() to do repeating. This
duplication was not introduced by you, but it would be useful to share
that code as well.

Maybe the file pixman-fast-path.h should be renamed to pixman-common.h
or pixman-inlines.h since it would no longer be specific to
pixman-fast-path.c


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH 5/7] C variant of bilinear scaled 'src_8888_n_8888' fast path

2011-02-26 Thread Soeren Sandmann
Siarhei Siamashka siarhei.siamas...@gmail.com writes:

 From: Siarhei Siamashka siarhei.siamas...@nokia.com

 Serves no real practical purpose other than testing solid mask support
 in bilinear scaling main loop template.

If these functions don't serve any practical purpose, is there a reason
to have them? The .text segment of pixman is already getting large. If
these functions end up in the middle of code that is actually used, they
would have to be paged in at application startup time, so they do have a
potential performance and (slight) memory cost.


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH 1/7] Main loop template for fast single pass bilinear scaling

2011-02-26 Thread Soeren Sandmann
Siarhei Siamashka siarhei.siamas...@gmail.com writes:

 +/*
 + * Identify 5 zones in each scanline for bilinear scaling. Depending on
 + * whether 2 pixels to be interpolated are fetched from the image itself,
 + * from the padding area around it or from both image and padding area.
 + */
 +static force_inline void
 +bilinear_pad_repeat_get_scanline_bounds (int32_t source_image_width,
 +  pixman_fixed_t  vx,
 +  pixman_fixed_t  unit_x,
 +  int32_t *   left_pad,
 +  int32_t *   left_tz,
 +  int32_t *   width,
 +  int32_t *   right_tz,
 +  int32_t *   right_pad)

What does tz stand for in these names? Presumably, the z is for zone,
but what about the t? 


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH 0/3] Pixman MIPS DSPASE1

2011-02-24 Thread Soeren Sandmann
Hi,

Thanks for picking up the MIPS work. There are some comments from last
time from Siarhei and myself that I don't think have been addressed. See
these mails:

http://lists.freedesktop.org/archives/pixman/2010-December/000773.html
http://lists.freedesktop.org/archives/pixman/2010-September/000496.html

- In Siarhei's testing, the new over_n_8_() on MIPS32r2 was slower
  than the C fast path. From
  http://lists.freedesktop.org/archives/pixman/2010-December/000773.html :

  One of the reasons for such a slowdown in gnome-system-monitor test is
   that it uses 'over_n_8_' operation with the mask where 96.5% of
   values are zero.  And your MIPS32R2 optimized code does not handle
   these special cases, always taking the slowest path [1].

  Ie., the way to make over_n_8_() fast is to skip compositing
  whenever the mask is 0x00 or 0xff.

  The same is likely also worthwhile even in the SIMD versions since
  memory access is so expensive.

From
http://lists.freedesktop.org/archives/pixman/2010-September/000496.html :

- The patch should be split such that one commit adds the MIPS32r2 part
  and one adds the DSPASE part

- Coding style:
  - Please use /* */ comments
  - Indents are four spaces
  - Put a space before parentheses
  - Don't leave in commented-out code like this:
 //  b = _pixman_implementation_fill(imp-delegate,
 bits, stride, bpp, x, y, width, height, xor);

And finally, while the lowlevel-blt benchmarks are convenient to use,
they are also synthetic, it is also important to test the performance
with real-world workloads such as those found in the cairo perf traces.


Thanks,
Soren



Veli-Matti Valtonen veli-matti.valto...@movial.com writes:

 I started working on this optimizing for MIPS32R2 code originally (Based on 
 the patch by Beloev), but the performance increases seem to be relatively 
 similar to what over_n_8_ shows. The dspase is much more promising in 
 this regard. It rather leaves me wondering if the mips32r2 should not be 
 included.

 It might however be related to the test system, which has a MIPS 74K core. 
 The original I assume was worked on with a MIPS 24K.

 I used pixman-arm-common.h for the assembler binding macros, which is the 
 reason for the 'ARM' found in the glue.

 Compiling the code will result in the gcc producing Warnings about macro 
 expansion, it'd be nice not to have these, but fixing them would have a 
 (slight) negative effect readability.

 PATCH 1 is the original patch by Georgi Beloev, but modified to apply against 
 pixman head.

 Implemented:
 Scanline add, out reverse, over
 fast path:
 over_n_8_
 add__
 add_n_

 Test hardware: Broadcom BCM4718, 453MHz, MIPS 74K V4.0 (Inc. DSP Rev2, 
 MIPS16), Little Endian

 All the test program builds used CFLAGS=-O2 -mdsp -mips32r2

 reference memcpy speed = 176.0MB/s (44.0MP/s for 32bpp fills)

 Optimizations disabled: --disable-mips32r2 --disable-mips-dspase1
 over_n_8_ =  L1:   6.16  L2:   5.34  M:  5.35 ( 19.24%)  HT:  4.78  VT:  
 4.62  R:  4.55  RT:  2.99 (  28Kops/s)
 add__ =  L1:  18.11  L2:  10.15  M:  9.98 ( 45.33%)  HT: 14.80  VT: 
 13.36  R: 13.41  RT:  6.17 (  46Kops/s)
 add_n_ =  L1:  14.26  L2:  10.30  M: 10.38 ( 23.59%)  HT:  8.05  VT:  
 7.64  R:  7.63  RT:  4.05 (  33Kops/s)

 MIPS32R2: --disable-mips-dspase1
 over_n_8_ =  L1:   6.17  L2:   5.62  M:  5.56 ( 20.33%)  HT:  5.00  VT:  
 4.83  R:  4.76  RT:  3.33 (  30Kops/s)

 MIPS DSPASE:
 over_n_8_ =  L1:   9.76  L2:   7.89  M:  7.93 ( 27.11%)  HT:  7.04  VT:  
 6.84  R:  6.63  RT:  4.06 (  34Kops/s)
 add__ =  L1: 117.36  L2:  20.67  M: 23.22 (105.50%)  HT: 17.40  VT: 
 15.96  R: 13.81  RT:  6.48 (  47Kops/s)
 add_n_ =  L1: 145.84  L2:  28.23  M: 31.11 ( 70.66%)  HT: 22.95  VT: 
 18.54  R: 19.99  RT:  8.93 (  50Kops/s)

 Scanline ops benchmarked using low-level-blit:

 I selected these ops by adding a printf to the scanline ops, and finding one 
 that triggers it, if there is a more convenient way to benchmark these ops, I 
 failed to find it.

 Optimizations disabled:
 add_8_8_8 =  L1:   3.31  L2:   5.25  M:  5.16 ( 11.73%)  HT:  3.61  VT:  3.60 
  R:  3.53  RT:  1.77 (  18Kops/s)
 add__1555 =  L1:   6.51  L2:   5.32  M:  5.34 ( 18.20%)  HT:  4.05  VT:  
 3.96  R:  3.94  RT:  2.21 (  22Kops/s)
 outrev_n_8_ =  L1:   6.33  L2:   5.25  M:  5.16 ( 17.60%)  HT:  4.11  VT: 
  4.02  R:  3.97  RT:  2.23 (  22Kops/s)
 over__n_0565 =  L1:   2.83  L2:   3.33  M:  3.21 ( 11.54%)  HT:  2.73  
 VT:  2.69  R:  2.68  RT:  1.67 (  17Kops/s)
 over_n_ =  L1:   7.45  L2:   6.65  M:  6.66 ( 15.14%)  HT:  5.65  VT:  
 5.43  R:  5.43  RT:  3.35 (  30Kops/s)

 MIPS DSPASE:
 add_8_8_8 =  L1:   8.81  L2:   7.67  M:  7.53 ( 17.11%)  HT:  4.62  VT:  4.68 
  R:  4.50  RT:  1.97 (  19Kops/s)
 add__1555 =  L1:   9.07  L2:   7.27  M:  7.29 ( 24.87%)  HT:  5.09  VT:  
 4.95  R:  4.93  RT:  2.50 (  23Kops/s)
 outrev_n_8_ =  L1:   8.48  L2:   6.82  M:  6.88 ( 

Re: [Pixman] Very Large PNGs

2011-02-24 Thread Soeren Sandmann
Frederik Ramm frede...@remote.org writes:

 Dear all,

I posted this to the cairo list but it occurred to me that this
 list is the appropriate place.

 I tried to convert an SVG file to a very large PNG with rsvg-convert.
 If the bitmap requested is larger than about 23k x 23k pixels, it is
 returned an invalid surface (where if you query the state it says out
 of memory), happily ignores that and continues to render which then
 results in no output.

 I then delved into the cairo library and found that it uses pixman to
 create the image; and pixman refuses to create an image that takes
 more than INT32_MAX raw bytes. With one pixel using 4 bytes, that
 means that an image with 23.000 x 23.000 pixels is still ok, but if it
 becomes any larger, that was it.

What pixman is trying to avoid is integer overflows. If you pass
an overflowed number to malloc(), you'll get buffer overruns which could
become security issues

Given that size_t on a 32 bit machine is 32 bits, the limit would be 4
GB in any case, but just to be on the safe side, pixman limits it to 2GB
so that the number will fit in a signed integer.

I think it should be safe to use size_t in create_bits() along with new
functions pixman_multiply/add_overflows_size() if you want to fix this.

However, not that pixman is still limited to 16.16 bits of coordinate
space in various places, so you wouldn't gain that much. This 16 bit
limitation is something that we would like to get rid of, however. 

Out of curiosity, exactly how large are the images you are trying to create?

 Which was a bit sad for me since the machine I was running this on
 could easily have created an image ten times as big, if only
 pixman/cairo had supported that.

 I briefly toyed with adding 64-bit support to the whole bundle but it
 became quickly apparent that this is more than an afternoon's job. I
 also tried to squeeze at least a little more out of the combo by using
 24-bit images instead of 32-bit (no difference since 24-bit images are
 internally stored with 32 bit), and even the deprecated 16-bit data
 type (doesn't work since you cannot create a surface for that).

In cairo 1.10, you actually can create a 16 bit image, and it is stored
internally in a 16 bit data type.


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] Win32 fixes and improvements

2011-02-24 Thread Soeren Sandmann
Andrea Canciani ranm...@gmail.com writes:

 I improved the patchset a little (I fixed the library makefile for real
 and silenced the test warnings) and pushed it as a git branch here:
 http://cgit.freedesktop.org/~ranma42/pixman/log/?h=wip/win32

I don't really know anything about Windows build systems, but regarding
the snprintf() thing: Can't we just use sprintf()? We know the maximum
sizes of everything involved, and in the end it's just a test program so
if there's a buffer overflow, it's not the end of the world.

The other changes in the branch look okay to me.


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [cairo] Compilation error undefined reference to `___tls_get_addr'

2011-02-23 Thread Soeren Sandmann
Victor Engmark victor.engm...@gmail.com writes:

 On Tue, Feb 22, 2011 at 12:29 PM, Victor Engmark
 victor.engm...@gmail.com wrote:
 Hi guys,

 I'm trying to create a pixman 0.20.2 package on an old platform, but I
 get the following output during the compile phase:

  CCLD   a1-trap-test
 ../pixman/.libs/libpixman-1.so: undefined reference to
 `___tls_get_addr'

Version 0.20.2 should detect automatically if linking fails with
thread local support. If you can find out why this clause:

AC_MSG_CHECKING(for __thread)
AC_LINK_IFELSE([
#if defined(__MINGW32__)  !(__GNUC__  4 || (__GNUC__ == 4  
__GNUC_MINOR__ = 5))
#error This MinGW version has broken __thread support
#endif


from configure.ac finishes successfully, that would be very helpful.

As mentioned, you can compile with -DPIXMAN_NO_TLS, but that means
pixman can't be used from multiple threads.

If __thread simply doesn't work on your platform, we need some kind of
hack in configure.ac to detect it and fall back to pthreads.



Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH 0/3] Some clean-ups of the test directory

2011-02-22 Thread Soeren Sandmann
Siarhei Siamashka siarhei.siamas...@gmail.com writes:

 On Thursday 10 February 2011 20:22:38 Søren Sandmann wrote:
  The following patches add a new directory demos and move all the
  GTK+ based test programs there. This allows the Makefiles in both test
  and demos to become much simpler with less redundancy.
  
  I'm not particularly happy about the demos name since the GTK+ tests
  aren't really demos, but I can't think of anything better. Suggestions
  are appreciated.
 
 It's a bit late comment, but eventually adding some real demo(s) which
 would display some nice looking animation and scare the users with huge
 FPS might be a good idea :)
 
Yes, I think various types of real demos would be a good idea, both
showing off performance and features. 

There is this branch:

http://cgit.freedesktop.org/~sandmann/pixman/log/?h=parrot

which makes the composite-test a little more interesting to look at,
and shows better how the compositing operators work. Screenshot:

http://www.daimi.au.dk/~sandmann/composite-test.png

 Well, that is if pixman actually needs any kind of such marketing
 stuff.

I have always thought that pixman should eventually be used in more
places than just the X server and cairo. For example, some features
that have been proposed, such as a floating point pipeline, a JIT
compiler and a shader language API, would let pixman serve the same
role as Core Image does on Mac OS X. And to get people to use it,
demos and other types of marketing would be useful, including getting
a website and a logo.


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [cairo] pixman: New ARM NEON optimizations

2011-02-22 Thread Soeren Sandmann
Siarhei Siamashka siarhei.siamas...@gmail.com writes:

 Regarding the (b) part, probably as a side effect of current implementation,
 right now it is possible to do some operations with images having
 non-premultiplied alpha:
 
 src_img = pixman_image_create_bits (
 PIXMAN_x8b8g8r8, width, height, src, stride);
 msk_img = pixman_image_create_bits (
 PIXMAN_a8b8g8r8, width, height, src, stride);
 dst_img = pixman_image_create_bits (
 PIXMAN_a8r8g8b8, width, height, dst, stride);
 
 pixman_image_composite (PIXMAN_OP_SRC, src_img, msk_img, dst_img,
 0, 0, 0, 0, 0, 0, width, height);
 
 We only need to wrap the same a8r8g8b8 buffer into x8r8g8b8
 and a8r8g8b8 pixman image, and use the latter as a mask for
 pixman_image_composite() calls. Any operations which don't
 need mask themselves can use this trick. By also specifying
 negative stride, this is useful for example when dealing with
 the data returned by glReadPixels().

Yeah, this is useful, and it wouldn't directly be possible to do if
the equation were changed to (s OP d) LERP_m d. However, a pretty
simple way to fix it would be to just add an unpremultiplied
format. Benjamin's video patches had this I believe.

Using such a format as a destination can be slow because it requires
divisions, but Joonas has a number of optimized implementations here:

http://cgit.freedesktop.org/~joonas/unpremultiply/tree/

 So I find it convenient that we are also allowed to work with
 masks which are basically interpreted as having a8x24 format. 

Right, having an a8x24 format would be another way to solve the
problem.


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] Image scaling with bilinear interpolation performance

2011-02-22 Thread Soeren Sandmann
김태균 podai...@gmail.com writes:

 original code : r = a*t0 + b*t1 + c*t2 + d*t3 (in 24 bits precision)
 optimized code : r' = a*(t0  8) + b*(t1  8) + c*(t2  8) + d*(t3  8)
 (in 16 bits precision)
 where t0 + t1 + t2 + t3 = 0x1
 
 Now we split t into two terms u, v where u is upper 8 bits of t and v is
 lower 8 bits of t. (note that t0 = u0*256 + v0, t0  8 = u0)
 
 So,
 
 r' = a*u0 + b*u1 + c*u2 + d*u3
 
 r = a*(u0*256 + v0) + b*(u1*256 + v1) + c*(u2*256 + v2) + d*(u3*256 + v3)
   = 256*(a*u0 + b*u1 + c*u2 + d*u3) + a*v0 + b*v1 + c*v2 + d*v3
   = 256*r' + a*v0 + b*v1 + c*v2 + d*v3
 
 Error would be
 e = (r - (r'  8))  16 = (r - 256*r')  16 = (a*v0 + b*v1 + c*v2 + d*v3)
  16
 
 Each value a, b, c and d can be 0xff at most, So
 
 max(e) = (0xff*(v0 + v1 + v2 + v3))  16 = (0xff*max(v0 + v1 + v2 + v3)) 
 16
 
 max(v0 + v1 + v2 + v3) = 0x300 (because lower 8 bits of t0 + t1 + t2 + t3
 should be 0x00)
 
 So max(e) = (0xff*0x300)  16 = 2
 
 But this does not satisfy rule 5 as you mentioned

Thanks for doing this analysis. A difference of just 2 would be fine
in my opinion, and as you mention the original code was an
approximation as well.

It would be possible to satisfy rule 5 using a kind of error
diffusion, as as demonstrated by this program:

static void
compute_weights (uint8_t distx, uint8_t disty)
{
uint32_t distxy, distxiy, distixy, distixiy;
int e, t;

distxy = distx * disty;
distxiy = (distx  8) - distxy;
distixy = (disty  8) - distxy;
distixiy = 256 * 256 - (disty  8) - (distx  8) + distxy;

t = distxy + 0x80;
e = (t  0xff00) - distxy;
distxy = t  8;

distxiy -= e;
t = distxiy + 0x80;
e = (t  0xff00) - distxiy;
distxiy = t  8;

distixy -= e;
t = distixy + 0x80;
e = (t  0xff00) - distixy;
distixy = t  8;

distixiy -= e;
t = distixiy + 0x80;
e = (t  0xff00) - distixiy;
distixiy = t  8;

assert (distxy + distxiy + distixy + distixiy == 256);
}

int
main ()
{
int i, j;

for (i = 0; i  256; ++i)
{
for (j = 0; j  256; ++j)
compute_weights (i, j);
}
}

although that does do a bit more arithmetic than your code.

  Now regarding accuracy. I have added some comments above regarding the
  potential solid color issue, but this should be relatively easy to
 address. I'm
  also a bit worried about one more thing (in the original pixman code too,
 but
  let's cover this too while we are discussing accuracy in general).
 Wouldn't it
  be a good idea to do shift with rounding for the final value instead of
  dropping the fractional part? And the 'distx'/'disty' variables are also
  obtained by right shifting 'ux' by 8 and dropping fractional part, maybe
  rounding would be more appropriate. Not doing rounding might cause slight
 image
  drift to the left (and top) on repeated rescaling, and also slight
 reduction of
  average brightness.
 
 I agree with that rounding is more appropriate.
 I think supplying distx and disty as properly rounded 4 bits values to
 interpolation function is the best choice we have.
 
 Analysis on error is some what complicated in this case.
 Error may be bigger than previous code, at least 15 (I've done some brute
 force jobs)

Rounding to four bits is going to be a quite visible drop in quality
though, especially if you zoom more than 16x. With four bits of
precision, there will be only 16 different colors in the gradients
generated by the filter, which will show up as banding.

But maybe it's good enough - 16x scaling is not going to look great
with bilinear filtering no matter what.

  I have only one concern about testing. Supposedly when we get both C and
 SSE2
  implementations, it would be much easier for testing if they produce
 identical
  results. Otherwise tests need to be improved to somehow be able to take
 slight
  differences into account.
 
 I think the requirement of producing same results for both C  SIMD(maybe
 sse2, NEON, mmx) is relatively easy.
 But SIMD can produce much better result with less time spent, which can be
 horribly slow with general C implementation.
 I think it is much desirable to keep both C and SIMD code optimized in spite
 of producing slightly different results.

Having the C and SIMD code produce different results is not a problem
in itself, but as Siarhei says, but we would need to make sure the
test suite reflects that decision.

If we decide to move away from bit-exact testing, we would need to
decide on an acceptable deviation from ideal, and then update the
tests to verify that both the C and SIMD implementations are within
that deviation.

For example there could be a reference implementation that computes
the bilinear filtering in double precision floating point, adds +/-
x%, then converts that range into the target 

Re: [Pixman] [PATCH] Add forgotten _mm_empty() calls in the SSE2 fetchers.

2011-02-18 Thread Soeren Sandmann
Siarhei Siamashka siarhei.siamas...@gmail.com writes:

 I just wonder whether you have investigated why these missing _mm_empty() 
 calls
 are not actually causing pixman tests failure? Is it because fetch operation 
 is
 typically followed by combiner which has _mm_empty()? But SRC combiner should
 not be using MMX/SSE2. Or no MMX instructions are actually used by these
 fetchers (only SSE2)?

It's because these fetchers don't use any MMX instructions, so the
patch is actually pointless. I'll drop it.

It may even be worth considering just dropping all MMX instructions
from the SSE2 implementation. That would avoid all the emms related
problems. The places where we use MMX instructions could just as well
use SSE2 - we never take advantage of the fact that it is possible use
both SSE and MMX registers at the same time.


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] Image scaling with bilinear interpolation performance

2011-02-17 Thread Soeren Sandmann
김태균 podai...@gmail.com writes:

 I'm using cairo and pixman to composite images.
 I tested image scaling performance with bilinear filtering.
 But it's a little bit slower than I expected.

Bilinear filtering is indeed one of the remaining places where pixman
could use some performance improvement, so thanks for looking into it.

 What I understand about image scaling procedure is as follows
 
 1. Allocate(or acquire statically allocated buffer) horizontal source
 scanline buffer to store interpolated pixels.
 2. Fetch a scanline from source image to source scanline buffer with
 bilinear interpolation.
 3. Composite source scanline and destination scanline with operator SRC.
 4. Go to next scanline.
 5. Free(or release) source scanline buffer.

Yes, this is how it works currently, though note that there have been
some changes since 0.21.2: we now use iterators to access the
images. The overall algorithm is still the same though.

 My optimization point is
 
 1. It seems possible to avoid fetching by directly combining bilinear
 interpolated result with destination.
 2. Bilinear interpolation can be optimized using double-blend trick with a
 little bit of precision loss.
 
 Optimization 1 can reduce one memory write and one memory read operation.
 I think it will not affect the performance significantly.
 However, it can be helpful on some machines with limited cache.

Indeed, it would likely be helpful to do this, even on machines with
where the temporary scanline fits in cache. There is some discussion
in this thread

http://lists.freedesktop.org/archives/pixman/2011-January/000934.html

about one way to implement this optimization.

 Optimization 2 is more critical for image scaling with bilinear filtering.
 I wrote some codes for bilinear_interpolation@pixman-bits-image.c
 
 // Optimization 2
 static force_inline uint32_t bilinear_interpolation (uint32_t tl, uint32_t
 tr, uint32_t bl, uint32_t br, int distx, int disty)
 {
 int distxy, distxiy, distixy, distixiy;
 uint32_t rb, ga;
 
 distxy = distx * disty;
 distxiy = (disty  8) - distxy;
 distixy = (distx  8) - distxy;
 distixiy = 256*256 - (disty  8) - (distx  8) + distxy;
 
 distxy = distxy  8;
 distxiy = distxiy  8;
 distixy = distixy  8;
 distixiy = distixiy  8;
 
 /* Red and Blue */
 rb = (0x00FF00FF  tl)*distixiy + (0x00FF00FF  tr)*distixy +
 (0x00FF00FF  bl)*distxiy + (0x00FF00FF  br)*distxy;
 rb = (rb  8)  0x00FF00FF;
 
 /* Green and Alpha */
 ga = (0x00FF00FF  (tl  8))*distixiy + (0x00FF00FF  (tr  8))*distixy 
 +
 (0x00FF00FF  (bl  8))*distxiy + (0x00FF00FF  (br  8))*distxy;
 ga = ga  0xFF00FF00;
 
 return rb | ga;
 }
 
 Optimization 2 increased the performance by more than twice.
 But it does not always produce the same result as the original code, which
 is also an approximation of bilinear interpolation.
 So it can cause pixel correctness issues on some test cases.

 I've done some numerical analysis and it appears that at most you would have
 a difference of 1 for each RGBA value as the original code.
 
 Let me know if you need for information / explanation.

I'd like to see the analysis that show that the difference is at most
1. If the difference really is that small, then your code above looks
like a big performance win. (Although it looks to me like tr and bl
may have been swapped in the code).



Thanks,
Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH] Avoid marking images dirty when properties are reset

2011-02-16 Thread Soeren Sandmann
Andrea Canciani ranm...@gmail.com writes:

  @@ -502,7 +502,7 @@ pixman_image_set_transform (pixman_image_t * ═ ═ ═ ═ 
  ═image,
  ═ ═ if (common-transform == transform)
  ═ ═ ═ ═return TRUE;
 
  - ═ ═if (memcmp (id, transform, sizeof (pixman_transform_t)) == 0)
  + ═ ═if (!transform || memcmp (id, transform, sizeof (pixman_transform_t)) 
  == 0)
  ═ ═ {
  ═ ═ ═ ═free (common-transform);
  ═ ═ ═ ═common-transform = NULL;
 
 This change looks like a bugfix unrelated to the commit message.
 Overall, the patch looks correct.

Yeah, or I guess you could call it a new feature; either way, I'll
split it into a separate commit before pushing.

Thanks,
Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


[Pixman] [PATCH] Avoid marking images dirty when properties are reset

2011-02-15 Thread Soeren Sandmann
When an image property is set to the same value that it already is,
there is no reason to mark the image dirty and incur a recomputation
of the flags.
---
 pixman/pixman-image.c |   20 +++-
 1 files changed, 19 insertions(+), 1 deletions(-)

diff --git a/pixman/pixman-image.c b/pixman/pixman-image.c
index e91d87c..9103ca6 100644
--- a/pixman/pixman-image.c
+++ b/pixman/pixman-image.c
@@ -502,7 +502,7 @@ pixman_image_set_transform (pixman_image_t *  image,
 if (common-transform == transform)
return TRUE;
 
-if (memcmp (id, transform, sizeof (pixman_transform_t)) == 0)
+if (!transform || memcmp (id, transform, sizeof (pixman_transform_t)) == 
0)
 {
free (common-transform);
common-transform = NULL;
@@ -511,6 +511,12 @@ pixman_image_set_transform (pixman_image_t *  
image,
goto out;
 }
 
+if (common-transform 
+   memcmp (common-transform, transform, sizeof (pixman_transform_t) == 0))
+{
+   return TRUE;
+}
+
 if (common-transform == NULL)
common-transform = malloc (sizeof (pixman_transform_t));
 
@@ -535,6 +541,9 @@ PIXMAN_EXPORT void
 pixman_image_set_repeat (pixman_image_t *image,
  pixman_repeat_t repeat)
 {
+if (image-common.repeat == repeat)
+   return;
+
 image-common.repeat = repeat;
 
 image_property_changed (image);
@@ -579,6 +588,9 @@ PIXMAN_EXPORT void
 pixman_image_set_source_clipping (pixman_image_t *image,
   pixman_bool_t   clip_sources)
 {
+if (image-common.clip_sources == clip_sources)
+   return;
+
 image-common.clip_sources = clip_sources;
 
 image_property_changed (image);
@@ -594,6 +606,9 @@ pixman_image_set_indexed (pixman_image_t *image,
 {
 bits_image_t *bits = (bits_image_t *)image;
 
+if (bits-indexed == indexed)
+   return;
+
 bits-indexed = indexed;
 
 image_property_changed (image);
@@ -656,6 +671,9 @@ PIXMAN_EXPORT void
 pixman_image_set_component_alpha   (pixman_image_t *image,
 pixman_bool_t   component_alpha)
 {
+if (image-common.component_alpha == component_alpha)
+   return;
+
 image-common.component_alpha = component_alpha;
 
 image_property_changed (image);
-- 
1.6.0.6

___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [cairo] pixman: New ARM NEON optimizations

2011-02-12 Thread Soeren Sandmann
Bill Spitzak spit...@gmail.com writes:

 Soeren Sandmann wrote:
 
  (1) The equation would be (src OP dst) LERP_mask dst
  and not (src IN mask) OP dst
  (2) The RGB channels of Alpha-only images would be considered to be
  the same as the alpha channel, and not 0 as they are now. For
  example, a 0xb9 pixel in an a8 image would be considered
  equivalent to 0xb9b9b9b9 and not to 0xb900. That is, they
  would be considered a translucent white rather than a translucent
  black.
 
 Why not just try making these changes directly to Render? This is
 pretty much what Cairo did and nobody complained. Panicking about
 back-compatibility when it is likely that nobody relies on the current
 behavior is silly.
 
 The only software I know of using Render for anything other than OVER
 is Cairo, and it must be avoiding all the cases where changing this
 would make a difference because it does not match it's rules anyway.

Well, Qt exposes these operators to applications, and there is at
least one place where it depends on the existing semantics for SRC
(unless I'm misreading the code).

There are also drivers, including the binary NVIDIA driver, who have
implemented the existing semantics already.


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH 4/8] Better support for NONE repeat in nearest scaling main loop template

2011-02-10 Thread Soeren Sandmann
Siarhei Siamashka siarhei.siamas...@gmail.com writes:

 From: Siarhei Siamashka siarhei.siamas...@nokia.com
 
 Scaling function now gets an extra boolean argument, which is set
 to TRUE when we are fetching padding pixels for NONE repeat. This
 allows to make a decision whether to interpret alpha as 0xFF or 0x00
 for such pixels when working with formats which don't have alpha
 channel (for example x8r8g8b8 and r5g6b5).

Another way to do this would be to clip away zeros up front for source
images. For example, a new function pointer could be added to the
image struct:

clip_zeros (pixman_image_t *image, pixman_region_t *region)

that would clip away those parts of region where the image is known to
be zero. Then, in pixman.c, this function could be called for both
source and mask if the operator was one where zeros had no effect.

That would benefit all operations and avoid cluttering these
macros. It would also cause the SAMPLES_COVER_CLIP to be set in more
cases.

I'm fine with this patch as is too though.


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH 3/8] Support for a8 and solid mask in nearest scaling main loop template

2011-02-10 Thread Soeren Sandmann
Siarhei Siamashka siarhei.siamas...@gmail.com writes:

 +#define SIMPLE_NEAREST_A8MASK_FAST_PATH_NORMAL(op,s,d,func)  \
 +#define SIMPLE_NEAREST_SOLIDMASK_FAST_PATH_NORMAL(op,s,d,func) 

Is there any reason to not use A8_MASK and SOLID_MASK instead?
Those names read better to me.


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] pixman-conical-gradient.c is missing from GNU make + MSVC build

2011-02-10 Thread Soeren Sandmann
Kirill Tishin si...@bk.ru writes:

 pixman-conical-gradient.c is missing from GNU make + MSVC build
 (pixman/Makefile.win32) (in release 0.21.4).
 Which leads to unresolved symbol _pixman_conical_gradient_iter_init
 when compiling cairo later on.

Thanks, I have added it in master. 

The win32 Makefile is unfortunately not really maintained by anyone -
if you or anyone else is interested in maintaining a build system for
MSVC, that would welcome.


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH 2/5] Add SSE2 fetcher for x8r8g8b8

2011-02-02 Thread Soeren Sandmann
Siarhei Siamashka siarhei.siamas...@gmail.com writes:

 Are we going to have this new code performing linear search in this array
 once per each compositing operation now? It may be bad for performance unless
 some kind of caching (at the time of pixman_image_t creation?) can be
 added later.

Well for now, there are only three fetchers, so a cache seems like
overkill, but yes, some sort of caching could be added. It could be
done similarly to the fast path cache, where the cache is thread
private and indexed by (format, iter_flags, image_flags) or something
like that. Then it wouldn't have to happen at image creation time and
there would be no issues with locking the images.

 Also is there a plan to provide SSE2 optimized store functions in addition
 to fetchers later?

I don't have any plans personally, but if someone wants to write some,
that would certainly be useful.

 Anyway, with such changes coming, I see more and more reasons to avoid general
 compositing path in pixman as much as possible :)

I'm not sure I follow? Certainly, if the general path is getting
faster, that would mean there are fewer reasons to avoid it?


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH] Move fallback decisions from implementations into pixman-cpu.c.

2011-01-26 Thread Soeren Sandmann
Siarhei Siamashka siarhei.siamas...@gmail.com writes:

 By the way, fallbacks may need to be tweaked a bit to be sure that
 really the fastest code is selected. For example, after the introduction
 of faster fetchers [1], now the performance of 'over__'
 operation (translucent case) with the nearest scaling of the source
 image from 2000x2000 to approximately same size looks like this:
 
 C fetcher + C combiner: ~76.71 MPix/s
 C fast path:~106.47 MPix/s
 C fetcher + SSE2 combiner:  ~182.65 MPix/s
 SSE2 fast path: ~270.57 MPix/s
 
 The important part is that C fetcher + SSE2 combiner is now faster
 than full C fast path, which performs everything in a single pass, but
 without SSE2. So when SSE2 combiners are available, it makes sense to
 deactivate nearest scaling C fast paths for OVER operator. Not everything
 is so simple though, because this C fast path has special processing for
 fully transparent or opaque pixels, and may in some cases be actually
 better. But it is clearly slower if the performance of the worst case is
 important.
 
 Something similar may apply to PPC Altivec optimizations. Because there
 are only Altivec combiners but not full Altivec fast paths, the existing
 C fast paths will be executed for some compositing operations, preventing
 the use of Altivec combiners.

Indeed, this is increasingly a problem. Another variation is if there
is a SIMD fast path that can deal with affine transformations, and a
general fast path specialized for scaling transformations. Which one
do you pick for a scaling operation? It isn't obvious.

Or, suppose there is an MMX fast path for over__, but also a
noop source iterator for a8r8g8b8. Then the general path would
degenerate to just calling sse2_combine_over_u() in a loop with no
fetching, which is likely faster than mmx_composite_over__();

I don't have a good answer, but here are some random ideas:

- Classify compositing routines in terms of performance. Ie., try and
  statically associate each compositing routine and fetcher with a
  number, then pick the one that should be fastest.

- If the fast path cache is empty for the operation in question, try
  all possibilities and store the best one in the cache. This will
  probably be unpleasant and complicated.

- Tweak the fallbacks so that things work in practice. This is ugly
  and fragile.

- JIT compiler. This is a good solution, but a lot of work.

It isn't a huge problem yet, but it will only get worse.


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] warnings in pixman 0.21.4

2011-01-25 Thread Soeren Sandmann
Rolland Dudemaine roll...@ghs.com writes:

 I'm not sure what a clever compiled would do with a value that
 doesn't belong to the enum is casted to the enum, but I would expect
 it to complain that the value is not defined in the enum, resulting in
 a different but nonetheless replacement warning.

Such a compiler would be wrong, in my opinion. In C (though not C++),
it is perfectly legal to mix enums and ints interchangably. It *might*
be reasonable for a compiler to warn if you do so without casting, but
if it warns *with* the cast, it's just broken.

 I'm not sure what the issue is with adding those dummy ones to the
 enum. If your concern is that you cannot count the number of elements
 in the enum, then maybe one entry should be added to define a _MAX
 value.

The issue is that pixman.h is a public header containing API that we
are committed to maintain for the forseeable future. The PIXMAN_solid
etc. values are internal API that shouldn't be used by users of
pixman.


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH 0/3] ARM: fixes and workarounds for using pixman with QEMU

2011-01-12 Thread Soeren Sandmann
Andrea Canciani ranm...@gmail.com writes:

 Patch 3/3 looks like an hack... is it possible to avoid parsing auxv
 if its content is not correct/meaningful?

 Otherwise I'd rather attempt a clean and correct implementation of
 cpu features detection using SIGILL.

In the end, how to do CPU detection on ARM is Siarhei's call, but IMO,
the auxv parsing hack seems much more likely to be robust than trying
to guard against all possible signal issues. 

An important advantage to the auxv parsing is that we wouldn't depend
on what other signal handling the rest of the process might have set
up.

It doesn't even matter whether POSIX says it's possible in theory,
because if we are relying on a close reading of some standard, then
it's pretty likely that we or someone else read it wrong or
implemented it wrong.

For example, I still don't know exactly what was causing the SIGILL
loop mentioned here:

http://lists.freedesktop.org/archives/pixman/2010-December/000789.html

The auxv parsing may be a hack, but it's also something that would
only be used in the qemu case, so few people would be affected even if
there were a bug in it.

 I'm going through the documentation you linked in
 http://lists.freedesktop.org/archives/pixman/2010-December/000786.html
 and I think that the portability issue can be fixed.
 
 Would SIGILL be considered a viable alternative if it was done in a
 thread-safe way? How much portability do we want to provide?
 In particular, should we only assume C90 or can we assume that
 sigsetjmp/siglongjmp are available (they are part of POSIX.1-1988).

The answer to that question depends on how many platforms are lacking
the support, how many users and developers those platforms have, and
how much work it is to support an alternative, so it's difficult to
give a general answer.


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH 0/18] Add iterators to images

2011-01-12 Thread Soeren Sandmann
Siarhei Siamashka siarhei.siamas...@gmail.com writes:

   15 files changed, 743 insertions(+), 470 deletions(-)
 
 On the other hand, I'm not so happy about the implementation. The number of
 lines of code as well as the complexity has increased, and the performance
 seems to have dropped a little bit in some of the cases. 

The performance numbers you posted seem very close to noise to me; in
some cases the new code is faster. Considering that this was measured
with the CPU specific fast paths turned off, most users won't actually
see even this small slowdown.

 I really want to see some practical benefits which depend on these
 added iterators in the near future.

It isn't clear to me what specifically you are suggesting.

One practical benefit of the patch set is that it fixes the bug where
destination properties aren't ignored as they should be. This bug
could probably be fixed some other way, but it doesn't seem to be
possible with separating source- and destination accessors in some
way. This separation is where most of the additional new lines are
coming from.

In fact, as far as I can tell, the iterators make the code _less_
complex. Compare the new loop in pixman-general.c:

for (i = 0; i  height; ++i)
{
uint32_t *s, *m, *d;

m = mask_iter.get_scanline (mask_iter, NULL);
s = src_iter.get_scanline (src_iter, m);
d = dest_iter.get_scanline (dest_iter, NULL);

compose (imp-toplevel, op, d, s, m, width);

dest_iter.write_back (dest_iter);
}

with the old one:

for (i = 0; i  height; ++i)
{
/* fill first half of scanline with source */
if (fetch_src)
{
if (fetch_mask)
{
/* fetch mask before source so that fetching of
   source can be optimized */
fetch_mask (mask, mask_x, mask_y + i,
width, (void *)mask_buffer, 0);

if (mask_class == SOURCE_IMAGE_CLASS_HORIZONTAL)
fetch_mask = NULL;
}

if (src_class == SOURCE_IMAGE_CLASS_HORIZONTAL)
{
fetch_src (src, src_x, src_y + i,
   width, (void *)src_buffer, 0);
fetch_src = NULL;
}
else
{
fetch_src (src, src_x, src_y + i,
   width, (void *)src_buffer, (void
*)mask_buffer);
}
}
else if (fetch_mask)
{
fetch_mask (mask, mask_x, mask_y + i,
width, (void *)mask_buffer, 0);
}

if (store)
{
/* fill dest into second half of scanline */
if (fetch_dest)
{
fetch_dest (dest, dest_x, dest_y + i,
width, (void *)dest_buffer, 0);
}

/* blend */
compose (imp-toplevel, op,
 (void *)dest_buffer,
 (void *)src_buffer,
 (void *)mask_buffer,
 width);

/* write back */
store ((dest-bits), dest_x, dest_y + i, width,
   (void *)dest_buffer);
}
else
{
/* blend */
compose (imp-toplevel, op,
 bits + (dest_y + i) * stride + dest_x,
 (void *)src_buffer, (void *)mask_buffer, width);
}
}



Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH 0/18] Add iterators to images

2011-01-08 Thread Soeren Sandmann
 Iterators gets initialized with a buffer which can always contain at
 least width pixels. This is sufficient for current image types,
 but future image types might need additional space.
 
 Would it be possible to have the general_composite_rect ask to the
 iterator about how much memory it will need (and provide a buffer
 with at least this much memory)?

If future images need something like this, the iterators can be
extended to support it. It's always difficult to predict what
infrastructure might be needed in the future.

 Or would it be better to have the iterator allocate that memory internally
 upon initialization?
 In this case, we should probably have the initializer return success/error
 and also have some kind of iter_fini()

Right, that's another possibility, although if memory allocation
fails, the image might also simply install a noop() fetcher. If we get
out-of-memory errors during rendering, we generally don't guarantee
the results, except that it obeys they clipping rules. This is
something pixman has inherited from the X server.

 Apart from this, I skimmed through the patches and noticed that
 sometimes a change is fixed or improved by a following patch
 (example: patch 15/18 is immediately generalized by patch 16/18). Is
 this intentional?

Yeah, generally, I'm trying to avoid patches being too
clairvoyant. In principle, new infrastructure shouldn't be
introduced until something else makes it necessary. It isn't always
possible or convenient to do it that way, though.

Wrt. localized alpha:

 I thought I understood the definition after the first two lines, then
 the OVER/ADD explanation confused me. After reading again everything,
 I was convinced again that I had understood it correctly the first
 time.

 I don't know if this is actually better (I still think that it needs a
 better wording), but I'd suggest something like:

 +/* Separable alpha is when the alpha channel is used only to compute 
 the
 + * alpha value of the destination. This means that the computation of the
 + * RGB values of the result is independent of the alpha value.
 + *
 + * For example, the OVER operator has separable alpha for the 
 destination,
 + * because the RGB values of the result can be computed without knowing
 + * the destination alpha. Similarly, ADD has separable alpha for both 
 source
 + * and destination because the RGB values of the result can be
 + * computed without knowing the alpha value of source or destination.

 I'm proposing the separable term because it is consistent with
 separable blend modes (compositing operators in which each channel
 can be computed individually), whereas localized didn't tell me
 anything until I read the explanation.

I think your new description is better, but I don't like using the
word separable for two things that are not really the same. A
separable operator is one that can be applied separately for each R,
G, and B channel, whereas the localized alpha means that you can
compute the R, G, and B channels independent from the alpha channel in
question.

I have pushed a new branch here:

http://cgit.freedesktop.org/~sandmann/pixman/log/?h=iterators4

that shold take care of all your other comments. In addition, I also
squashed together moving the gradient initializers; there didn't seem
to be much gained from having them in separate commits.

Patches to come in follow-up emails.


Thanks for the review,
Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] Making ref counting thread safe

2011-01-08 Thread Soeren Sandmann
Siarhei Siamashka siarhei.siamas...@gmail.com writes:

 This problem looks bad enough to me, or at least makes me feel uneasy. 
 Changing
 how the caching works could also cause some weird problems in the future 
 unless
 everything is carefully taken into account. Also various tools like helgrind
 will complain a lot and it would be hard to differentiate real problems from
 just some mostly safe ugliness.

I agree this really should be fixed, my main point is that I don't
think it's a huge problem in current practice - in particular,
versions of pixman with these caches have been used with no problems
with cairo 1.8.x, including presumably in some of those applications
that uses images in a read-only way from multiple threads.

 In addition, requiring not to change the properties may be a bit restrictive,
 limiting the usefulness of this model.
 
 I would suggest to generally forbid using the same pixman_image_t from 
 multiple
 threads, but also add a function for creating pixman_image_t clone, which
 would share the pixel buffer with the original image and track pixel buffer
 lifetime via thread-safe refcounting. So using the same image from another 
 thread would just require creating a clone instead of doing 
 'pixman_image_ref', 
 all the rest should be exactly the same. 

It might be possible to do both: guarantee that read-only use will
work, but then also add a pixman_image_clone() function for the case
where you want the same pixels with different properties.


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH 0/18] Add iterators to images

2011-01-05 Thread Soeren Sandmann
Søren Sandmann sandm...@cs.au.dk writes:

 The following patch series changes the scanline access to be based on
 iterators instead of direct calls to virtual functions. There are
 several benefits to this:

The patches are also available in the iterators3 branch here:

http://cgit.freedesktop.org/~sandmann/pixman/log/?h=iterators3


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] Making ref counting thread safe

2010-12-23 Thread Soeren Sandmann
Christian Hergert christian.herg...@gmail.com writes:

 If I where to go through and work on making the reference counting
 thread safe, what things would I need to make sure I take into
 account for the patch to be accepted? For example:

I put up some notes on the subject here:


http://cgit.freedesktop.org/~sandmann/pixman/commit/?h=docsid=773a9833cb4bb887239dd358cf584e1499276f4a

Basically, the only two types of systems that really matter is Windows
and pthreads.

On Windows we always have atomic primitives available through the
Interlocked* functions. On pthreads, we can statically initialize
mutexes, so if worst comes to worst, we can fake the atomics with
mutexes.

   * Do we just need to use __sync_fetch_and_add()/__sync_sub_and_fetch()
 within the functions?

I think they should be hidden behind PIXMAN_ATOMIC_* macros.

   * Should we abstract the atomic operations to support compilers other
 than GCC? If so, what compilers do we need to support?

With the approach described in the link above, I think it should be
possible to support most compilers without too much trouble.

   * Do we wan't to fallback to ++/-- when used under X since threading
 isn't needed? (Add something like pixman_thread_init()).

Wouldn't any such mechanism have overhead in itself? In any case, I
don't think this would be necessary until benchmarks say otherwise.

   * Is it okay to start with just pixman_image_t and support others
 in future patches?

I think that's fine.


Thanks,
Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH 3/7] Extend gradient-crash-test

2010-12-18 Thread Soeren Sandmann
Jon TURNEY jon.tur...@dronecode.org.uk writes:

 This seems to have a stray #include fenv.h, which causes compilation to fail
 when fenv.h doesn't exist, see [1].

Applied.

Thanks,
Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH 2/2] Delete simple repeat code

2010-12-16 Thread Soeren Sandmann
Siarhei Siamashka siarhei.siamas...@gmail.com writes:

  As it turns out, non-scaled images having NORMAL repeat (tiled pictures)
  are actually used occasionally (typically with SRC operator). And removing
  simple repeat code caused ~4x performance regression in these cases. Looks
  like a new fast path is needed to solve the problem.
 
 Just an update to this. I tried to check whether this is easily solvable, but
 looks like there are a handful of missing fast paths for NORMAL repeat (they
 primarily show up when browsing various pages). Just doing it in a simple and
 straightforward way by adding all of them as new fast path functions may
 cause more redundancy in the pixman source code. Modifying the existing
 scaled fast paths to accept non-scaled cases also works, but it does not
 provide the best performance.

An idea might be to move the lookup_fast_path() into pixman-utils.c,
and then write a new fast path that would call this function to look
up another fast path and call it through the simple repeat code.

This would require the full set of flags to be passed down to the fast
paths, but we need that for the performance reporting anyway.

Just a suggestion, I don't know whether it would work in practice.

(An interesting possibility from this would be to use the regular
blitters as the combiners in the general code. That way, if you
have, say, a

gradient IN a8 OVER 565

operation, then instead of converting both a8 and 565 to , an
existing over__8_565 function could just be called with pointers
directly into the mask and destination. Some changes to the general
code would obviously be required though).

 For now I just reverted this patch in my pixman tree to workaround
 the problem and buy some more time to think about how it can be
 solved.

Does reverting the patch actually work? The standard flags currently
include SAMPLES_COVER_CLIP, and for a typical repeating blit, that
would not be set.


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH] Add support for AltiVec detection for OpenBSD/PowerPC.

2010-12-15 Thread Soeren Sandmann
Brad b...@comstyle.com writes:

 On Tuesday 14 December 2010 14:44:09 Søren Sandmann wrote:
  From: Brad Smith b...@comstyle.com
 
  Bug 29331.
 
 Why was this posted? It was already commited 4 months ago.

I mistakenly sent out a bunch of old patches.  Sorry about that.


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH] Check for NEON using a signal handler to catch SIGILL

2010-12-14 Thread Soeren Sandmann
Mike McCormack mj.mccorm...@samsung.com writes:

 Pixman already uses the SIGILL mechanism elsewhere in the same file.

FWIW, the SIGILL mechanism for PowerPC is only used if the operating
system is neither OS X, OpenBSD, or Linux. 

It used to be used on Linux, but it caused some kind of a infinite
loop in the build system at Red Hat, where it kept generating and
handling SIGILL, so David Woodhouse changed it to parse
/proc/auxv. See cd2a79ab81045aa7e35bc901081e57

I don't know whether the SIGILL handling in pixman was buggy, or
whether there was some other underlying reason.


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH] Fix argument quoting for AC_INIT.

2010-11-19 Thread Soeren Sandmann
Cyril Brulebois k...@debian.org writes:

 One gets rid of this accordingly:
 | autoreconf -vfi
 | autoreconf: Entering directory `.'
 | autoreconf: configure.ac: not using Gettext
 | autoreconf: running: aclocal --force
 | configure.ac:61: warning: AC_INIT: not a literal: 
 pixman@lists.freedesktop.org
 | autoreconf: configure.ac: tracing
 | configure.ac:61: warning: AC_INIT: not a literal: 
 pixman@lists.freedesktop.org
 
 Signed-off-by: Cyril Brulebois k...@debian.org

Thanks - applied.


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8

2010-11-19 Thread Soeren Sandmann
Xu, Samuel samuel...@intel.com writes:

 Appreciate comments on this patch

I'll be away for the next three weeks, so I won't be able to review
anything until then. 


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


[Pixman] [ANNOUNCE] pixman release 0.21.2 now available

2010-11-16 Thread Soeren Sandmann
A new pixman release 0.21.2 is now available. This is the first
development snapshot leading up to a stable 0.22 release.

News:
  ARM: Performance improvements for image scaling [Siarhei Siamashka]
  Performance improvements for affine transformations [Soren Sandmann]

Plus bug fixes and other improvements [Andrea Canciani, Siarhei, Soren].

NOTE: In this release a workaround for a bug in older version of the X
server has been removed. If your X server is version 1.6 or older, you
may see image corruption bugs with this version of pixman. 


Thanks,
Soren


tar.gz:
http://cairographics.org/releases/pixman-0.21.2.tar.gz
http://xorg.freedesktop.org/archive/individual/lib/pixman-0.21.2.tar.gz

tar.bz2:
http://xorg.freedesktop.org/archive/individual/lib/pixman-0.21.2.tar.bz2

Hashes:
MD5:  9e09fd6e58cbf9717140891e0b7d4a7a  pixman-0.21.2.tar.gz
MD5:  4bc4cf052635265f7a98ad3e890ae329  pixman-0.21.2.tar.bz2
SHA1: c0ff07d7e4877dd4d0d369ca09e50ca956e3386e  pixman-0.21.2.tar.gz
SHA1: bb5cba514fe6756f9264a6995ec9dfed4d7439b5  pixman-0.21.2.tar.bz2

GPG signature:
http://cairographics.org/releases/pixman-0.21.2.tar.gz.sha1.asc
(signed by Søren Sandmann Pedersen sandm...@daimi.au.dk

Git:
git://git.freedesktop.org/git/pixman
tag: pixman-0.21.2

Log:
Andrea Canciani (3):
  Remove unused stop_range field
  Fix opacity check
  Improve conical gradients opacity check

Siarhei Siamashka (12):
  Fixed broken configure check for __thread support
  Do CPU features detection from 'constructor' function when 
compiled with 
  ARM: fix 'vld1.8'-'vld1.32' typo in add__ NEON fast path
  ARM: NEON: source image pixel fetcher can be overrided now
  ARM: nearest scaling support for NEON scanline compositing 
functions
  ARM: macro template in C code to simplify using scaled fast paths
  ARM: performance tuning of NEON nearest scaled pixel fetcher
  ARM: NEON optimization for scaled over__ with nearest 
filter
  ARM: NEON optimization for scaled over__0565 with nearest 
filter
  ARM: NEON optimization for scaled src__0565 with nearest 
filter
  ARM: NEON optimization for scaled src_0565_ with nearest 
filter
  ARM: optimization for scaled src_0565_0565 with nearest filter

Søren Sandmann Pedersen (8):
  Post-release version bump to 0.20.1
  Version bump 0.21.1.
  COPYING: Stop saying that a modification is currently under 
discussion.
  Remove workaround for a bug in the 1.6 X server.
  [mmx] Mark some of the output variables as early-clobber.
  Delete the source_image_t struct.
  Generate {a,x}8r8g8b8, a8, 565 fetchers for nearest/affine images
  Pre-release version bump
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH 2/3] Fix opacity check

2010-11-03 Thread Soeren Sandmann
Andrea Canciani ranm...@gmail.com writes:

 @@ -436,6 +436,12 @@ compute_image_info (pixman_image_t *image)
   {
   int i;
  
 + if (image-common.type == RADIAL 
 + image-radial.a = 0)
 + {
 + break;
 + }

I think this code needs a comment explaining why it's necessary, and
why checking for a = 0 works. Also, instead of having the extra if
statement, I think I'd prefer to refactor the switch like this:

case RADIAL:
/* Comment explaining why this test is necessary */
if (image-radial.a = 0)
break;

/* Fall through */

case LINEAR:
...

Other than, this and the other two patches look good to me.

It would of course also be very useful to extend the test suite to
catch this problem.


Thanks,
Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH] Plug leak in the alphamap test.

2010-10-26 Thread Soeren Sandmann
Jon TURNEY jon.tur...@dronecode.org.uk writes:

 Even with this fix, the cygwin tinderbox is still failing (silently)
 [1]. The alphamap process grows to 2G in size and then mmap() starts
 failing.
 
 It looks like this is because we are still leaking the alphamap
 pixmaps. Attached is a patch to fix this, although I am not very
 familiar with the pixman API, so this might not be the best way to do
 this :-)

It's close enough. I have pushed it to master.

 It would be better if this failure wasn't silent, I only noticed it
 by accident.

On the other hand, isn't it a little iffy to fail the test suite just
because malloc() or mmap() failed? Though, I guess many of the tests
would actually fail in that case anyway.


Thanks,
Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] Radial gradients (again)

2010-10-18 Thread Soeren Sandmann
Andrea Canciani ranm...@gmail.com writes:

 I don't remember the timeline for pixman 0.20, 

Basically, I wanted to get a new release out tomorrow and call it a
release candidate, but it looks like that's not going to happen.

I think pixman 0.20 by now is feature frozen, and the next development
release will become 0.20 unless major bugs turn up.

The one thing that could still make it is the linear gradient fix. As
I recall, there was some issues about the classify method. If someone
wants to take a look at that patch:

http://cgit.freedesktop.org/~ranma42/pixman/commit/?h=wip/linear-masterid=7a9433f3942cf4f0c929fb3907d89af87a2d477c

it would be very helpful.

 I want to port this test or something very similar to pixman
 and check it against a different (and slow, but numerically
 stable) implementation of the gradients, but it will take some
 time, because I will be quite busy until the end of October.

This sounds like it would be extremely useful.


Thanks,
Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] Gradients patches

2010-10-12 Thread Soeren Sandmann
Andrea Canciani ranm...@gmail.com writes:

  Indeed, the code looks mostly fine to me; I couldn't find any
  functional problems with it. The main complaints I have:
 
  - In the affine case, some of the computation is done in 32.32, and
   some in double precision. Is there any reason for this? Doing all of
   it in double precision would let us get rid of the dot() function
   and the fdot() one could then be renamed to just dot().
 
 To keep the whole precision, using 32.32 is needed. If we want to use
 double there, we should at least stop doing the differences trick (and still
 the precision would be lower).

Aside from the formatting issues mentioned in IRC, I think the latest
code is probably fine.

We will need to find out exactly what numerical guarantees should be
given and then add tests to the test suite to verify that we meet them
in both the affine and the projective. However, simply having a
well-defined specification for the radials is a big step forward.

  I think this *does* need to be cross posted to xorg-devel because even
  though the spec says that it only allows gradients where one circle is
  contained in the other, in practice, it doesn't check for that. And
  cairo does not check that the circles are contained within eachother
  before sending them off to the X server.
 
 I see that now xorg-devel is in the CC addresses, anyway Cairo should pay
 more attention to what it does. Latest RENDER specification says:
 The array of stops has to contain values between 0 and 1 (inclusive) and
 has to be ordered in increasing size or a Value error is generated. The inner
 circle has to be completely contained inside the outer one or a Value error is
 generated.

It would be useful for someone to update the Render spec to say that
the radials are PDF Type 3 Shadings, defined in section 8.7.4.5.4 of
the PDF specification, and delete the stuff about the circles being
contained in each other.


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] Gradients patches

2010-10-10 Thread Soeren Sandmann
Andrea Canciani ranm...@gmail.com writes:

 On Mon, Oct 4, 2010 at 12:07 PM, Andrea Canciani ranm...@gmail.com wrote:
  This morning I pushed
  http://cgit.freedesktop.org/~ranma42/pixman/log/?h=radial-for-master
  It should be ready for master since it is documented, tested (at least
  on my laptop) and not using
  any new features (so it should not be broken on other
  architectures/compilers/etc).
 
 As suggested on IRC, I tried to cleanup the patch and improve code reuse.
 The result is 
 http://cgit.freedesktop.org/~ranma42/pixman/commit/?h=wip/radial-master
 Except for differentiation, there is basically no clever trick and the code
 should look quite straightforward and minimal.

Indeed, the code looks mostly fine to me; I couldn't find any
functional problems with it. The main complaints I have:

- In the affine case, some of the computation is done in 32.32, and
  some in double precision. Is there any reason for this? Doing all of
  it in double precision would let us get rid of the dot() function
  and the fdot() one could then be renamed to just dot().

- The various values that don't depend on the coordinates (a, cdx, cdy
  etc) can be precomputed and stored in the radial struct. The current
  code does that (and at the very least, if you are not going to
  precompute them, the existing ones should be removed).

- I think both the big comment and the commit message should mention
  the section of the PDF spec where the gradients are specified.

- Formatting. Please format according to CODING_STYLE. In particular:

- braces on their own line

- blank lines

- between logically distinct pieces of code

  For example there should be a blank line between the
  computation of db and the computation of dc and ddc since
  these relate to two different variables (b and c).

- after blocks of variable declarations

- in the big comment that describes the equation

  Also, please indent the formulas to set them off from the
  rest of the text and surround them with blank lines.

 I'm omitting the crosspost to the X mailing list because I checked
 again the RENDER extension and it currently only allows radial
 gradients with one circle completely contained in the other one (and
 in this case the old definition gives the same results as the new
 one).

I think this *does* need to be cross posted to xorg-devel because even
though the spec says that it only allows gradients where one circle is
contained in the other, in practice, it doesn't check for that. And
cairo does not check that the circles are contained within eachother
before sending them off to the X server.

 Not all of them were actually true, so I took the linear-float branch and
 worked on it:
 http://cgit.freedesktop.org/~ranma42/pixman/commit/?h=wip/linear-master
 I cleaned it up and made it more robust with respect to error propagation.
 The performance is just slightly worse than its fixed-point counterpart
 (around 10% on x86_64), but it has the big advantage of avoiding all the
 overflows.
 (For the cairo-interested ones, this fixes gradient-linear-large).
 
 I didn't change linear_gradient_classify (), but it should be corrected or,
 if it does not provide a measurable performance gain, removed.
 (The code seem to indicate that now the only two meaningful classes
 are UNKNOWN and HORIZONTAL).

HORIZONTAL is still important because for horizontal gradients, only
one scanline needs to be rendered; that one can be repeated over and
over. See general_composite_rect(). But yes,
linear_gradient_classify() should be corrected. If vertical is not
used in the gradient code itself anymore, that enum value can be
removed.


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH] Plug leak in the alphamap test.

2010-10-08 Thread Soeren Sandmann
Søren Sandmann sandm...@daimi.au.dk writes:

  static pixman_image_t *
  make_image (pixman_format_code_t format)
  {
  uint32_t *bits;
  uint8_t bpp = PIXMAN_FORMAT_BPP (format) / 8;
 +pixman_image_t *image;
  
  bits = (uint32_t *)make_random_bytes (WIDTH * HEIGHT * bpp);
  
 -return pixman_image_create_bits (format, WIDTH, HEIGHT, bits, WIDTH * 
 bpp);
 +image = pixman_image_create_bits (format, WIDTH, HEIGHT, bits, WIDTH * 
 bpp);
 +
 +if (image  bits)
 + pixman_image_set_destroy_function (image, on_destroy, NULL);
  }

I need a return image there, of course. Somehow this didn't actually
cause the test to fail. Maybe image was in eax or something.


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH 4/5] test: Parallize composite.c with OpenMP

2010-10-06 Thread Soeren Sandmann
Maarten Bosmans mkbosm...@gmail.com writes:

 Why would you use OpenMP at all, just to run the tests in parallel?
 I'd imagine that the thread-level parallelism it gives isn't necessary
 and the tests can just as well be parallized on process-level. Isn't
 it easier to avoid the dependency on OpenMP and just use a clever
 Makefile to run multiple tests at the same time?

We already use OpenMP for a number of tests. It's not a hard
dependency - if the configure script doesn't detect OpenMP, it will
just run them serially. 

I don't think Makefile hackery would be simpler since right now all we
have to do to get the test suite running is this:

TESTS = $(TESTPROGRAMS)


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] FPU-based implementation of the core pixel pipeline

2010-10-05 Thread Soeren Sandmann
Dmitri Vorobiev dmitri.vorob...@movial.com writes:

 As for the conversion table, what comes into my mind is keeping a
 global table, and accessing it using inter-process communication
 mechanisms such as a shared memory area. Any comments on this?

A pre-generated 'static const' table would certainly be better, but
half a megabyte is probably still too much.

How must of a performance improvement is that table really over just
using regular conversions?

 Finally, what exactly are the coding style problems?

Please see the CODING_STYLE file. 


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH] Some clean-ups in fence_malloc() and fence_free()

2010-10-05 Thread Soeren Sandmann
Dmitri Vorobiev dmitri.vorob...@movial.com writes:

 This patch removes an unnecessary typecast of MAP_FAILED,
 replaces an erroneous free() by the correct munmap() in the
 error path for a failing mprotect(), and, finally, removes
 redundant calls to mprotect() that aren't necessary, because
 munmap() doesn't call for any specific memory protection.

Applied, thanks.


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH] test: Change composite so that it tests randomly generated images

2010-10-05 Thread Soeren Sandmann
Siarhei Siamashka siarhei.siamas...@gmail.com writes:

 On Sunday 07 March 2010, Søren Sandmann wrote:
  Previously this test would try to exhaustively test all combinations
  of formats and operators, which meant that it would take years to run.
 
  Instead, generate random images and test those. The random seed is
  based on time(), so we will get different tests every time it is run.
  Whenever the test fails, it will print what the value of lcg_seed was
  when the test began.
 
  This patch also adds a table of random seeds that have failed at some
  point. These seeds are run first before the random testing begins.
  Whenever this tests fails, you are supposed to add the seed that
  failed to the table, so that we can track that the bug doesn't get
  reintroduced.
 
 I don't quite like any nondeterministic random factor in the standard 
 regression tests. Preferably the results of such tests should be
 reproducible from run to run, even if they are not perfect and do not
 provide full coverage.

I just sent a new set of patches that don't have the nondeterministic
behavior. However, considering that several of the tinderboxes here:

http://tinderbox.x.org/

are running make check over and over it would be really useful to make
them run different sets of tests each time.

 This is quite important for having a clearly defined formal patch submission
 process (Before submitting a patch, one needs to make sure that the regression
 tests pass. If they don't pass, the problem has to be investigated and patch
 fixed or regression tests updated if needed).
 
 With the randomness in the tests, patch contributor may end up in different
 confusing situations:
 - regression test fails for him, even if his patch is fine (if the problem was
 introduced by somebody else earlier)
 - regression test passes for him, but fails for the others later (due to the
 bug in the patch). In this case it would be hard to say if the contributor did
 proper job running the regression tests in the first place.

I'm not sure this is a big problem. If the test fails, it would print
out the seed that failed so the contributor can try the test again
without the patch applied. This will allow him to determine whether
the problem was introduced by him or not. If it wasn't, hopefully he
would report the bug.

It's true that some people, if the test fails intermittently for them,
might try submitting anyway hoping that nobody will notice. However, I
don't think most people would do that, and if the test fails so rarely
that they could get away with something like that, any fixed subset of
the test would likely also miss the problem.



Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [RFC] Performance reporting capabilities for pixman?

2010-09-27 Thread Soeren Sandmann
Siarhei Siamashka siarhei.siamas...@gmail.com writes:

 Let's face it, I think the end users rarely report performance problems in
 pixman unless the performance is really notoriously bad (and not ignored
 because it's software rendering, so it is expected to be slow). Finding 
 real 
 performance bottlenecks in pixman requires running a profiler and being able 
 to
 actually interpret its logs.

Yeah, in fact almost all users never report *anything* unless it's so
broken that they can't possibly work around it. They have been trained
to do this because many projects simply ignore bugs for years in some
cases, and because bugzilla is a huge pain to deal with.

 I wonder if it would make sense to add an extra configure option for the
 special profiling enabled pixman build which would do:
 1. Avoid caching fast path lookup failures
 2. Once hitting slow path, record the type of operation, image color formats
 and image flags.
 3. For each unique slow path variant, accumulate the number of times it got
 hit, the total number of (destination?) pixels which got processed, the total
 number of scanlines.
 4. Once per N minutes, spam into syslog with the current accumulated 
 statistics.

This all makes sense to me. It might even be interesting to have it
compiled in by default, but disabled unless it is turned on through an
environment variable.


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] Build failure with pixman-0.19.4 - under MinGW64

2010-09-21 Thread Soeren Sandmann
Tilton, James C. (GSFC-6063) james.c.til...@nasa.gov writes:

 The following build error was encountered:
 
 $ make
 make  all-recursive
 make[1]: Entering directory `/home/jtilton/Downloads/pixman-0.19.4'
 Making all in pixman
 make[2]: Entering directory `/home/jtilton/Downloads/pixman-0.19.4/pixman'
 make  all-am
 make[3]: Entering directory `/home/jtilton/Downloads/pixman-0.19.4/pixman'
 make[3]: Nothing to be done for `all-am'.
 make[3]: Leaving directory `/home/jtilton/Downloads/pixman-0.19.4/pixman'
 make[2]: Leaving directory `/home/jtilton/Downloads/pixman-0.19.4/pixman'
 Making all in test
 make[2]: Entering directory `/home/jtilton/Downloads/pixman-0.19.4/test'
   CC utils.o
 In file included from utils.c:1:0:
 utils.h:12:0: warning: ignoring #pragma omp threadprivate
 utils.c: In function 'fence_malloc':
 utils.c:226:5: warning: implicit declaration of function 'getpagesize'
 utils.c:238:5: warning: implicit declaration of function 'mmap'
 utils.c:238:33: error: 'PROT_READ' undeclared (first use in this function)
 utils.c:238:33: note: each undeclared identifier is reported only once for 
 each
 function it appears in
 utils.c:238:45: error: 'PROT_WRITE' undeclared (first use in this function)
 utils.c:238:57: error: 'MAP_PRIVATE' undeclared (first use in this function)
 utils.c:238:71: error: 'MAP_ANONYMOUS' undeclared (first use in this function)
 utils.c:241:25: error: 'MAP_FAILED' undeclared (first use in this function)
 utils.c:247:33: warning: cast from pointer to integer of different size
 utils.c:247:20: warning: cast to pointer from integer of different size
 utils.c:257:5: warning: implicit declaration of function 'mprotect'
 utils.c:258:5: error: 'PROT_NONE' undeclared (first use in this function)
 utils.c: In function 'fence_free':
 utils.c:288:8: error: 'PROT_READ' undeclared (first use in this function)
 utils.c:288:20: error: 'PROT_WRITE' undeclared (first use in this function)
 utils.c:293:5: warning: implicit declaration of function 'munmap'

Thanks for the bug report. It would be helpful if you can send the
output of the configure script. It looks like it detects mmap() etc.,
but not the sys/mman.h header file.

 utils.c: In function 'fuzzer_test_main':
 utils.c:407:0: warning: ignoring #pragma omp parallel
 utils.c: At top level:
 utils.c:457:1: warning: 'on_alarm' defined but not used
 make[2]: *** [utils.o] Error 1
 make[2]: Leaving directory `/home/jtilton/Downloads/pixman-0.19.4/test'
 make[1]: *** [all-recursive] Error 1
 make[1]: Leaving directory `/home/jtilton/Downloads/pixman-0.19.4'
 make: *** [all] Error 2
 
 I am not able to find where PROT_READ, PROT_WRITE, PROT_NONE, MAP_PRIVATE, 
 MAP_ANONYMOUS, and MAP_FAILED are defined.

On a regular Unix system, these are defined in the sys/mman.h
header. Do you happen to know if MinGW has that header file or the
equivalent?


Thanks,
Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH 1/1] Add a lowlevel blitter benchmark

2010-09-17 Thread Soeren Sandmann
Hi,

 +#include stdint.h
 +#include stdio.h
 +#include string.h
 +#include stdlib.h
 +#include unistd.h
 +#include stdlib.h
 +#include malloc.h

I don't think either unistd.h or malloc.h needs to be included here.

 +void __attribute__((noinline))

__attribute__((noinline)) is GCC specific, so it needs to be defined
in pixman-compiler.h similarly to force_inline. I think the Visual C
version is called __declspec(noinline).

Also, in the gettime patch, utils.c needs to include sys/time.h to
get the prototype for gettimeofday(). It should be checked for in
configure.ac with AC_CHECK_HEADER().

Other than that, these patches look good to me.


Thanks,
Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH 1/5] Add fence_malloc() and fence_free().

2010-09-17 Thread Soeren Sandmann
Søren Sandmann sandm...@daimi.au.dk writes:

 From: Søren Sandmann Pedersen s...@redhat.com
 
 These variants of malloc() and free() try to surround the allocated
 memory with protected pages so that out-of-bounds accessess will cause
 a segmentation fault.
 
 If mprotect() and getpagesize() are not available, these functions are
 simply equivalent to malloc() and free().

This patch was pretty broken, in two ways. 

 +AC_CHECK_FUNC(getpagesize, have_getpagesize=yes, have_=no)
 ^ 
A typo here.

 --- a/test/utils.c
 +++ b/test/utils.c
 @@ -5,6 +5,8 @@
  #include unistd.h
  #endif
  
 +#include sys/mman.h
 +

This should be guarded by HAVE_SYS_MMAN_H.

New version below.

Soren


commit 9b735ad08f9ad1b150534739a9d7ac66ae67d9de
Author: Søren Sandmann Pedersen s...@redhat.com
Date:   Mon Sep 13 14:34:34 2010 -0400

Add fence_malloc() and fence_free().

These variants of malloc() and free() try to surround the allocated
memory with protected pages so that out-of-bounds accessess will cause
a segmentation fault.

If mprotect() and getpagesize() are not available, these functions are
simply equivalent to malloc() and free().

diff --git a/configure.ac b/configure.ac
index dbff2a6..d3b71fa 100644
--- a/configure.ac
+++ b/configure.ac
@@ -623,6 +623,19 @@ if test x$have_alarm = xyes; then
AC_DEFINE(HAVE_ALARM, 1, [Whether we have alarm()])
 fi
 
+AC_CHECK_HEADER([sys/mman.h],
+   [AC_DEFINE(HAVE_SYS_MMAN_H, [1], [Define to 1 if we have sys/mman.h])])
+
+AC_CHECK_FUNC(mprotect, have_mprotect=yes, have_mprotect=no)
+if test x$have_mprotect = xyes; then
+   AC_DEFINE(HAVE_MPROTECT, 1, [Whether we have mprotect()])
+fi
+
+AC_CHECK_FUNC(getpagesize, have_getpagesize=yes, have_getpagesize=no)
+if test x$have_getpagesize = xyes; then
+   AC_DEFINE(HAVE_GETPAGESIZE, 1, [Whether we have getpagesize()])
+fi
+
 dnl =
 dnl Thread local storage
 
diff --git a/test/utils.c b/test/utils.c
index d95cbc2..ec6fd4e 100644
--- a/test/utils.c
+++ b/test/utils.c
@@ -5,6 +5,10 @@
 #include unistd.h
 #endif
 
+#ifdef HAVE_SYS_MMAN_H
+#include sys/mman.h
+#endif
+
 /* Random number seed
  */
 
@@ -197,10 +201,111 @@ image_endian_swap (pixman_image_t *img, int bpp)
 }
 }
 
+#define N_LEADING_PROTECTED10
+#define N_TRAILING_PROTECTED   10
+#define INITIAL_PAGE
+
+typedef struct
+{
+void *addr;
+uint32_t len;
+uint8_t *trailing;
+} info_t;
+
+#if defined(HAVE_MPROTECT)  defined(HAVE_GETPAGESIZE)
+
+void *
+fence_malloc (uint32_t len)
+{
+unsigned long page_size = getpagesize();
+unsigned long page_mask = page_size - 1;
+uint32_t n_payload_bytes = (len + page_mask)  ~page_mask;
+uint32_t n_bytes =
+   (len +
+page_size * (N_LEADING_PROTECTED + N_TRAILING_PROTECTED + 2) +
+n_payload_bytes)  ~page_mask;
+uint8_t *initial_page;
+uint8_t *leading_protected;
+uint8_t *trailing_protected;
+uint8_t *payload;
+uint8_t *addr;
+
+addr = malloc (n_bytes);
+
+if (!addr)
+{
+   printf (malloc failed on %u %u\n, len, n_bytes);
+   return NULL;
+}
+
+initial_page = (uint8_t *)(((unsigned long)addr + page_mask)  ~page_mask);
+leading_protected = initial_page + page_size;
+payload = leading_protected + N_LEADING_PROTECTED * page_size;
+trailing_protected = payload + n_payload_bytes;
+
+((info_t *)initial_page)-addr = addr;
+((info_t *)initial_page)-len = len;
+((info_t *)initial_page)-trailing = trailing_protected;
+
+if (mprotect (leading_protected, N_LEADING_PROTECTED * page_size,
+ PROT_NONE) == -1)
+{
+   free (addr);
+   return NULL;
+}
+
+if (mprotect (trailing_protected, N_TRAILING_PROTECTED * page_size,
+ PROT_NONE) == -1)
+{
+   mprotect (leading_protected, N_LEADING_PROTECTED * page_size,
+ PROT_READ | PROT_WRITE);
+
+   free (addr);
+   return NULL;
+}
+
+return payload;
+}
+
+void
+fence_free (void *data)
+{
+uint32_t page_size = getpagesize();
+uint8_t *payload = data;
+uint8_t *leading_protected = payload - N_LEADING_PROTECTED * page_size;
+uint8_t *initial_page = leading_protected - page_size;
+info_t *info = (info_t *)initial_page;
+uint8_t *trailing_protected = info-trailing;
+
+mprotect (leading_protected, N_LEADING_PROTECTED * page_size,
+ PROT_READ | PROT_WRITE);
+
+mprotect (trailing_protected, N_LEADING_PROTECTED * page_size,
+ PROT_READ | PROT_WRITE);
+
+free (info-addr);
+}
+
+#else
+
+void *
+fence_malloc (uint32_t len)
+{
+return malloc (len);
+}
+
+void
+fence_free (void *data)
+{
+free (data);
+}
+
+#endif
+
 uint8_t *
 make_random_bytes (int n_bytes)
 {
-uint8_t *bytes = malloc (n_bytes);
+uint8_t *bytes = fence_malloc (n_bytes);
 int i;
 
 if (!bytes)
diff --git a/test/utils.h b/test/utils.h
index 

Re: [Pixman] [PATCH] Add a lowlevel blitter benchmark

2010-09-15 Thread Soeren Sandmann
Dmitri Vorobiev dmitri.vorob...@movial.com writes:

 diff --git a/test/Makefile.am b/test/Makefile.am
 index 3d98e17..8f3fdaf 100644
 --- a/test/Makefile.am
 +++ b/test/Makefile.am
 @@ -18,7 +18,8 @@ TESTPROGRAMS =  \
   scaling-crash-test  \
   blitters-test   \
   scaling-test\
 - composite
 + composite   \
 + lowlevel-blt-bench

As Siarhei mentioned, this shouldn't go in TESTPROGRAMS. Eventually,
I'd like to have a separate performance test suite, perhaps even in
its own directory next to test, but for now, let's just add it in the
test directory.

 +++ b/test/lowlevel-blt-bench.c
 @@ -0,0 +1,713 @@
 +/*
 + * Copyright © 2010 Movial Creative Technologies Oy
 + *
 + * Permission to use, copy, modify, distribute, and sell this software and 
 its
 + * documentation for any purpose is hereby granted without fee, provided that
 + * the above copyright notice appear in all copies and that both that
 + * copyright notice and this permission notice appear in supporting
 + * documentation, and that the name of ARM Ltd not be used in
 + * advertising or publicity pertaining to distribution of the software 
 without
 + * specific, written prior permission.  ARM Ltd makes no
 + * representations about the suitability of this software for any purpose.  
 It
 + * is provided as is without express or implied warranty.
 + *
 + * THE COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO THIS
 + * SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND
 + * FITNESS, IN NO EVENT SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY
 + * SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
 + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN
 + * AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING
 + * OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS
 + * SOFTWARE.
 + *
 + * Author:  Jonathan Morton (jonathan.mor...@movial.com)
 + *
 + * Based on the code by Siarhei Siamashka siarhei.siamas...@gmail.com
 + *
 + */

This says copyright Movial, but mentions ARM Ltd in the text. 

Unless you have a reason to not use it, the preferred license is the
one in COPYING. Its main advantage is that it avoids this kind of
cut-and-paste issue.

Also, is this code mainly written by Siarhei? If so, presumably either
Nokia or Siarhei are also copyright holders.

 +tests_tbl[] =
 +{
  
 +{ src_n_,PIXMAN_a8r8g8b8,1, PIXMAN_OP_SRC, 
 PIXMAN_null, 0, PIXMAN_a1r5g5b5 },

This one looks like it should have a2r2g2b2 in the destination

 +{ src_n_,PIXMAN_a8r8g8b8,1, PIXMAN_OP_SRC, 
 PIXMAN_null, 0, PIXMAN_a1r5g5b5 },

This one looks wrong too.

 +/* Return time since the Epoch in seconds */
 +double
 +gettime (void);

Like the malloc() changes, this could usefully be done in its own patch.



Thanks,
Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [cairo] [PATCH] Added MIPS32R2 and MIPS DSP ASE optimized functions

2010-09-15 Thread Soeren Sandmann
 The DSP ASE version uses the MIPS32R2 version of
 pixman_fill32(). Initially the plan was to develop separate versions
 of each function for each target instruction set (and DSP ASE Rev 2
 was also on the list) but the DSP ASE version of fill32 was
 eventually dropped. There are no MIPS processors that implement DSP
 ASE but are not MIPS32R2-compatible. It is convenient to be able to
 fall back to MIPS32R2 implementation if the DSP ASE code does not
 provide enough speedup to justify its use.

Right, I'm not complaining about the fact that it falls back. I'm
saying that the MIPS32R2 code should be added in one patch, followed
by another patch that adds the DSP ASE code.

  - Is there a reason to not do runtime checking? I realize that most
   people using MIPS will likely do so on an embedded system where they
   know ahead of time what the CPU supports, but we do have runtime
   checking for the other CPU specific implementations.
 
 Yes, at the present time this isn't very easy to do on MIPS Linux --
 checking if DSP ASE is implemented requires the use of privileged
 instructions. MIPS is aware of this problem and will push an update
 to the kernel that enables the use of AT_HWCAP for this purpose.

OK.

 Thanks for the all the comments! I'll update the code per your
 comments and submit another patch. Should I mail it just to the
 pixman list or CC both cairo and pixman?

Usually for pixman patches, it's fine to just mail the pixman lists,
although if a patch has implications for X or cairo, then those lists
should be CC'd as well.

One other thing: We need a copyright statement for the new files. The
canonical copyright text can be found in the pixman/COPYING file.


Soren


___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH] Faster C variant of over_n_8_8888 fast path

2010-09-13 Thread Soeren Sandmann
Siarhei Siamashka siarhei.siamas...@gmail.com writes:

 +/* A variant of 'over', which works faster for non-additive blending on the
 + * platforms which do not have special instructions for saturated addition
 + */
 +static force_inline uint32_t
 +over_a (uint32_t src, uint32_t dest, pixman_bool_t additive_blending)
 +{
 +uint32_t a = ~src  24;
 +if (additive_blending)
 +{
 + UN8x4_MUL_UN8_ADD_UN8x4 (dest, a, src);
 + return dest;
 +}
 +else
 +{
 + UN8x4_MUL_UN8 (dest, a);
 + return dest + src;
 +}
 +}

Is there any reason to not just add a boolean additive_blending to
the existing force_inline over() function?

It might also be interesting to add the check as a new
NOT_SUPER_LUMINESCENT flag and then simply require it for the source
for all the over_n_*() functions. That would allow similar
optimizations for the n_8_565 case and probably the n___ca()
case as well.

The flag could be set for all the gradients and any time an image is
opaque.


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] PDF radial gradients

2010-09-10 Thread Soeren Sandmann
Andrea Canciani ranm...@gmail.com writes:

 I updated the documentation of radial gradients in my wip/radial branch
 (http://cgit.freedesktop.org/~ranma42/pixman/log/?h=wip/radial) to reflect
 my changes (which make pixman gradients behave as specified by pdf
 reference manual).

It's of course important to point out here that changing the radial
gradients to behave as specified in the PDF spec, really is a change
from they used to do.

As I understand it, for most common cases, the output is identical to
what it used be, is that correct? It would be helpful to have a
description of the cases where the behavior differs.

Other than that, I'm in favor of just using the PDF spec here. I don't
think there is much real value in coming up with our own
specification.

Regarding this:

When a point does not belong to any of the circles, it is
transparent.

It's probably worthwhile to say explicitly that the point has the RGBA
value (0, 0, 0, 0) instead of transparent. That way it's clear that
you can't leave out the point when using the SOURCE operator for
example.

I'm not completely clear on what cairo is doing (or supposed to do) in
this case. The cairo SOURCE operator is bounded by the mask, but is it
also bounded by the source?

 The code is not yet ready to be merged (it uses __int128 variables, which
 probably won't work on 32 bit architectures), but I would appreciate reviews
 (expecially of the new documentation) and testing.

If __int128 is intended to help with non-FPU systems, then let me
repeat that using floating point in pixman is totally fine, and almost
certainly faster than any __int128 would be.


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] PDF radial gradients

2010-09-10 Thread Soeren Sandmann
Andrea Canciani ranm...@gmail.com writes:

 The behavior is the same when one of the two circle completely contains
 the other one, otherwise it is different.
 What is the right place for this observation? The code documentation?
 Or is it better to write it in the commit message?

The commit message, I'd say, though it's probably a good idea to also
CC the X and cairo lists to make sure they know this is happening.

  The code is not yet ready to be merged (it uses __int128 variables, which
  probably won't work on 32 bit architectures), but I would appreciate
  reviews
  (expecially of the new documentation) and testing.
 
  If __int128 is intended to help with non-FPU systems, then let me
  repeat that using floating point in pixman is totally fine, and almost
  certainly faster than any __int128 would be.
 
 The purpose of using 128 bits is avoiding approximations (thus retaining
 full precision), but I also expect it to be faster than floating point, 
 because
 in the inner loop the values are updated using multiple differentiation
 (i.e. just 3 additions), beside the double-precision computations when
 actually needed.

Okay, if it's necessary for precision, that's fine. I haven't looked
at the code yet.

 Why do you think that floating point would be faster?

I'd guess on most systems, arithmetic on floating point values will be
much faster than arithmetic on int128s, simply because an int128 would
have to be built up from two 64 bit, or four 32 bit registers.

 (I expect that int to double conversion is the slow operation in this code,
 but it only happens when a sqrt is computed, so it's probably not a big deal)
 Did you also consider the changes required to retain full precision?
 
 Thank you for your comments, I'll fix the documentation and benchmark
 fixed vs float soon

Thanks for fixing the gradients btw. At this point they are by far the
most embarrassing part of pixman.


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] Valgrind-clean pixman

2010-08-30 Thread Soeren Sandmann
Andrea Canciani ranm...@gmail.com writes:

  I may be wrong here, but seems like everyone has his own definition of what 
  is
  considered to be thread safe. Currently pixman appears to be thread safe 
  as
  long as same pixman objects (pixman_image_t and the others) are not used 
  from
  multiple threads simultaneously. This is somewhat similar to the thread 
  safety
  assumptions of the GNU implementation of a standard C++ template library. 
  I'm
  adding this link here because they took care of documenting what can be
  guaranteed when working in a multi-threaded environment, and what can't be:
  http://gcc.gnu.org/onlinedocs/libstdc++/manual/using_concurrency.html#manual.intro.using.concurrency.thread_safety
  http://www.sgi.com/tech/stl/thread_safety.html

 I see. Currently cairo makes stronger assumptions about pixman thread safety
 (something like: a pixman_image can be used as source from multiple threads
 as long as no threads is writing to it).

I think allowing an image to be used as a source in multiple threads
and as a destination in only one, is a reasonable guarantee to
give. How far from that are we at the moment?

  I'm working on a library of simple inline functions (based on cairo
  atomic and mutex implementations) to share some common utility functions
  which have different availability on different platforms (right now: 
  atomics,
  wideint, mutexes). I hope to make it available soon (after I get autotools
  do the right things to make it work).

I'd just as well have these things in pixman directly, unless your new
library can become some sort of 'libfreedesktop' type thing that
contains the various data structures and portability gunk that all
projects end up having.

I wrote some notes at one point on how mutexes and atomics could be
added:

http://cgit.freedesktop.org/~sandmann/pixman/commit/?h=threadsid=99c5d5edfef6275ed38e4ec9881a59f5b4ba9492


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] Valgrind-clean pixman

2010-08-29 Thread Soeren Sandmann
Siarhei Siamashka siarhei.siamas...@gmail.com writes:

  We probably want to implement DllMain on win32 to allocate/free TLS anyway.
 
 Yes, I think it makes sense to give this a try. The constructor attribute is 
 supported since at least gcc 2.95, and it seems to be fine in clang 2.7 too.
 There's always a risk that this stuff may be broken for some gcc versions or 
 target systems, but we can never know before trying :)

The only problematic case I can think of is Solaris with older
versions of the Sun compiler. However, we may be able to use the
_init() and _fini() functions there.


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH] Use windows.h directly for mingw32 build

2010-08-29 Thread Soeren Sandmann
Maarten Bosmans mkbosm...@gmail.com writes:

 This patch adresses the issue discussed in
 http://lists.freedesktop.org/archives/pixman/2010-April/000163.html
 
 There were only two clashing identifiers.  The first one is IN, which
 obviously causes problems in Pixman for lines like
 PIXMAN_STD_FAST_PATH (IN, solid, a8, a8, fast_composite_in_n_8_8),
 Fortunately the mingw headers provide a solution: by defining
 _NO_W32_PSEUDO_MODIFIERS, these stupid symbols are skipped.
 
 The other name is UINT64, used in pixman-mmx.c. I renamed that
 function to to_uint64, but may be another name is more appropriate. It
 may also be good to rename the other function M64 to to_m64, but I
 left that one alone for now.

I do think M64 should be renamed to to_m64() for consistency, but
other than that, the patch looks good to me.


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] is whitespace clean-up good for the upstream?

2010-08-28 Thread Soeren Sandmann
Dmitri Vorobiev dmitri.vorob...@movial.com writes:

 While working on a task, which is only loosely related to Pixman
 upstream, I noticed that Pixman sources contain quite many lines
 consisting of whitespace only. Since such lines have an adverse effect
 on what I am doing now, I am cleaning away such whitespaces. I could
 submit a patch doing such clean-up, however, I would like to know
 first if it is interesting for Pixman maintainers and would be
 applied.

I'm curious why this whitespace is a problem. The CODING_STYLE
document used to says something about avoiding trailing whitespace,
but I removed it recently because it just doesn't bother me.

 So, my question is: it there any interest in applying a code clean-up
 patch that does not change any functionality (and even does not change
 any compiled code)?

If there is some good reason to do this, it could be considered, but
this kind of formatting change tends to introduce gratuitous conflicts
in people's branches.


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH 2/2] ARM: added 'neon_composite_src_8888_8_0565' fast path

2010-08-26 Thread Soeren Sandmann
Siarhei Siamashka siarhei.siamas...@gmail.com writes:

 +.macro pixman_composite_src_n_8_0565_process_pixblock_head
 +/* in */
 +vmull.u8q15, d24, d2
 +vmull.u8q3,  d24, d1
 +vmull.u8q2,  d24, d0
 +vrshr.u16   q12, q15, #8
 +vrshr.u16   q11, q3,  #8
 +vrshr.u16   q10, q2,  #8
 +vraddhn.u16 d16, q15, q12
 +vraddhn.u16 d19, q3,  q11
 +vraddhn.u16 d18, q2,  q10
 +.endm
 +
 +.macro pixman_composite_src_n_8_0565_process_pixblock_tail
 +/* convert to r5g6b5 */
 +vshll.u8q14, d16, #8
 +vshll.u8q8,  d19, #8
 +vshll.u8q9,  d18, #8
 +vsri.u16q14, q8,  #5
 +vsri.u16q14, q9,  #11
 +.endm

It's remarkable how much NEON can do with so few instructions.

 @@ -90,6 +90,8 @@ PIXMAN_ARM_BIND_FAST_PATH_SRC_MASK_DST (neon, 
 over___,
  uint32_t, 1, uint32_t, 1, uint32_t, 
 1)
  PIXMAN_ARM_BIND_FAST_PATH_SRC_MASK_DST (neon, over__8_0565,
  uint32_t, 1, uint8_t, 1, uint16_t, 1)
 +PIXMAN_ARM_BIND_FAST_PATH_SRC_MASK_DST (neon, src__8_0565,
 +uint32_t, 1, uint8_t, 1, uint16_t, 1)
  
  void
  pixman_composite_src_n_8_asm_neon (int32_t   w,
 @@ -198,6 +200,8 @@ pixman_blt_neon (uint32_t *src_bits,
  
  static const pixman_fast_path_t arm_neon_fast_paths[] =
  {
 +PIXMAN_STD_FAST_PATH (SRC,  a8r8g8b8, a8,   r5g6b5,   
 neon_composite_src__8_0565),
 +PIXMAN_STD_FAST_PATH (SRC,  a8b8g8r8, a8,   b5g6r5,   
 neon_composite_src__8_0565),

It seems like this fast path will work as src_x888_8_0565 as well
since the alpha channel is completely ignored.


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


  1   2   >