Re: [Pixman] [PATCH 0/11] Use macros to generate fetchers
Soeren Sandmann sandm...@cs.au.dk writes: Chris Wilson ch...@chris-wilson.co.uk writes: The final patch is a little scary though. Was that motivated by observation of the generated assembly or through performance testing? It was motivated by looking at the generated assembly, but I'll run some performance tests. Performance results below. I tested master, the series without the final fast path patch, and the series with the final patch. The results are that without wins 10/26, master 9/26, and fast paths 7/26, but there is never a huge difference. I guess the conclusion is that the fast paths can't be justified by these measurements. Soren minimum median std.dev. winner --- [ 0]imagefirefox-talos-gfx 21.013 21.058 0.11%6/6 without [ 0]imagefirefox-talos-gfx 20.714 20.790 0.27%6/6 [ 0]imagefirefox-talos-gfx 21.249 21.312 0.15%6/6 --- [ 0] image16firefox-talos-gfx 19.623 19.630 0.04%5/6 without [ 0] image16firefox-talos-gfx 19.596 19.618 0.14%6/6 [ 0] image16firefox-talos-gfx 19.925 19.948 0.12%6/6 --- [ 0]imagefirefox-talos-svg 120.890 121.006 0.07%5/6 without [ 0]imagefirefox-talos-svg 120.623 120.647 0.03%5/6 [ 0]imagefirefox-talos-svg 120.978 121.055 0.04%5/6 --- [ 0] image16firefox-talos-svg 138.907 139.182 0.12%5/6 master [ 0] image16firefox-talos-svg 140.011 140.043 0.04%5/6 [ 0] image16firefox-talos-svg 139.609 139.825 0.08%5/6 --- [ 0]imageevolution 25.322 25.456 0.27%6/6 without [ 0]imageevolution 25.248 25.414 0.27%6/6 [ 0]imageevolution 26.654 26.732 0.15%6/6 --- [ 0] image16evolution 13.096 13.165 0.51%6/6 master [ 0] image16evolution 13.217 13.250 0.17%5/6 [ 0] image16evolution 13.368 13.435 0.26%6/6 --- [ 0]image gnome-system-monitor 10.192 10.222 0.13%6/6 without [ 0]image gnome-system-monitor 10.166 10.195 0.13%6/6 [ 0]image gnome-system-monitor 10.488 10.499 0.12%6/6 --- [ 0] image16 gnome-system-monitor 10.210 10.255 0.34%6/6 without [ 0] image16 gnome-system-monitor 10.191 10.236 0.34%6/6 [ 0] image16 gnome-system-monitor 10.503 10.537 0.40%6/6 --- [ 0]image gnome-terminal-vim6.5476.692 0.92%6/6 master [ 0]image gnome-terminal-vim6.9927.098 1.07%6/6 [ 0]image gnome-terminal-vim6.7226.864 1.17%6/6 --- [ 0] image16 gnome-terminal-vim6.1046.242 0.95%6/6 fast paths [ 0] image16 gnome-terminal-vim6.1756.271 0.74%6/6 [ 0] image16 gnome-terminal-vim6.0436.141 0.79%6/6 --- [ 0]image grads-heat-map0.5520.561 0.85%6/6 master [ 0]image grads-heat-map0.5780.584 0.48%6/6 [ 0]image grads-heat-map0.5580.568 0.80%6/6 --- [ 0] image16 grads-heat-map0.5560.559 0.56%6/6 master [ 0] image16 grads-heat-map0.5750.576 0.07%4/6 [ 0] image16 grads-heat-map0.5580.561 0.27%6/6 --- [ 0]image gvim 32.374 32.459 0.15%6/6 fast paths [ 0]image gvim 32.414 32.505 0.11%6/6 [ 0]image
Re: [Pixman] [PATCH 2/3] Optimize BILINEAR filter to NEAREST for identity transforms
Soeren Sandmann sandm...@cs.au.dk writes: The new code only computes the transformation once, and then it add the absolute value of the 00 and 10 components to get a conservative estimate of the transformed destination boundary. I'm especially interested in review of this part, both of the code and of whether it's actually correct mathematically. It is in fact not at all correct. It's completely bogus in fact. For affine matrices the idea may not be entirely wrong, but the correct thing to add to the X coordinates would be |m00| + |m01| + |m11| and not just |m00|. Similar for Y coordinates. But that fails to take projective matrices into account, so I just went back to computing the transformed bounding box twice. New patch below. Soren From e02d17612531fb8556d30fc2dde3ad2680f2b739 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?S=C3=B8ren=20Sandmann=20Pedersen?= s...@redhat.com Date: Mon, 5 Sep 2011 00:19:51 -0400 Subject: [PATCH] Eliminate compute_sample_extents() function In analyze_extents(), instead of calling compute_sample_extents() call compute_transformed_extents() and inline the remaining part of compute_sample_extents(). The upcoming bilinear-nearest optimization will do something different with these two pieces of code. --- pixman/pixman.c | 100 +++ 1 files changed, 42 insertions(+), 58 deletions(-) diff --git a/pixman/pixman.c b/pixman/pixman.c index 264a56b..19eda08 100644 --- a/pixman/pixman.c +++ b/pixman/pixman.c @@ -514,45 +514,9 @@ compute_transformed_extents (pixman_transform_t *transform, return TRUE; } -static pixman_bool_t -compute_sample_extents (pixman_transform_t *transform, - pixman_box32_t *extents, - pixman_fixed_t x_off, pixman_fixed_t y_off, - pixman_fixed_t width, pixman_fixed_t height) -{ -box_48_16_t transformed; - -if (!compute_transformed_extents (transform, extents, transformed)) - return FALSE; - -/* Expand the source area by a tiny bit so account of different rounding that - * may happen during sampling. Note that (8 * pixman_fixed_e) is very far from - * 0.5 so this won't cause the area computed to be overly pessimistic. - */ -transformed.x1 += x_off - 8 * pixman_fixed_e; -transformed.y1 += y_off - 8 * pixman_fixed_e; -transformed.x2 += x_off + width + 8 * pixman_fixed_e; -transformed.y2 += y_off + height + 8 * pixman_fixed_e; - -if (transformed.x1 pixman_min_fixed_48_16 || transformed.x1 pixman_max_fixed_48_16 || - transformed.y1 pixman_min_fixed_48_16 || transformed.y1 pixman_max_fixed_48_16 || - transformed.x2 pixman_min_fixed_48_16 || transformed.x2 pixman_max_fixed_48_16 || - transformed.y2 pixman_min_fixed_48_16 || transformed.y2 pixman_max_fixed_48_16) -{ - return FALSE; -} -else -{ - extents-x1 = pixman_fixed_to_int (transformed.x1); - extents-y1 = pixman_fixed_to_int (transformed.y1); - extents-x2 = pixman_fixed_to_int (transformed.x2) + 1; - extents-y2 = pixman_fixed_to_int (transformed.y2) + 1; - - return TRUE; -} -} - #define IS_16BIT(x) (((x) = INT16_MIN) ((x) = INT16_MAX)) +#define ABS(f) (((f) 0)? (-(f)) : (f)) +#define IS_16_16(f) (((f) = pixman_min_fixed_48_16 ((f) = pixman_max_fixed_48_16))) static pixman_bool_t analyze_extent (pixman_image_t *image, @@ -563,7 +527,8 @@ analyze_extent (pixman_image_t *image, pixman_fixed_t *params; pixman_fixed_t x_off, y_off; pixman_fixed_t width, height; -pixman_box32_t ex; +box_48_16_t transformed; +pixman_box32_t exp_extents; if (!image) return TRUE; @@ -633,17 +598,6 @@ analyze_extent (pixman_image_t *image, default: return FALSE; } - - /* Check whether the non-expanded, transformed extent is entirely within -* the source image, and set the FAST_PATH_SAMPLES_COVER_CLIP if it is. -*/ - ex = *extents; - if (compute_sample_extents (transform, ex, x_off, y_off, width, height) - ex.x1 = 0 ex.y1 = 0 - ex.x2 = image-bits.width ex.y2 = image-bits.height) - { - *flags |= FAST_PATH_SAMPLES_COVER_CLIP; - } } else { @@ -653,17 +607,47 @@ analyze_extent (pixman_image_t *image, height = 0; } -/* Check that the extents expanded by one don't overflow. This ensures that - * compositing functions can simply walk the source space using 16.16 - * variables without worrying about overflow. +if (!compute_transformed_extents (transform, extents, transformed)) + return FALSE; + +/* Expand the source area by a tiny bit so account of different rounding that + * may happen during sampling. Note that (8 * pixman_fixed_e) is very far from + * 0.5 so this won't cause the area computed to be overly pessimistic
[Pixman] [ANNOUNCE] pixman release 0.23.4 now available
A new pixman release 0.23.4 is now available. This is a development release leading up to a stable release 0.24.0. Note: If you run the test suite on this released tarball, this message: *** BUG *** In pixman_region32_union_rect: Invalid rectangle passed Set a breakpoint on '_pixman_log_error' to debug will be printed a number of times when the region-contains-test is running. The test is still expected to pass, however. Please report bugs to pixman@lists.freedesktop.org or file them at https://bugs.freedesktop.org/enter_bug.cgi?product=pixman Søren tar.gz: http://cairographics.org/snapshots/pixman-0.23.4.tar.gz http://xorg.freedesktop.org/archive/individual/lib/pixman-0.23.4.tar.gz tar.bz2: http://xorg.freedesktop.org/archive/individual/lib/pixman-0.23.4.tar.bz2 Hashes: MD5: 389ce497fcfb26eb99396ca5ed6e17ab pixman-0.23.4.tar.gz MD5: 1fb7c1ab150cfbc417290d5888d68ce8 pixman-0.23.4.tar.bz2 SHA1: 539f4875070ce836a109fa0fae7a3421d503ca46 pixman-0.23.4.tar.gz SHA1: 5c85fff3f474436fdcc0c29139ce4ed87e38e24b pixman-0.23.4.tar.bz2 GPG signature: http://cairographics.org/snapshots/pixman-0.23.4.tar.gz.sha1.asc (signed by Søren Sandmann Pedersen sandm...@cs.au.dk) Git: git://git.freedesktop.org/git/pixman tag: pixman-0.23.4 Log: Andrea Canciani (4): radial: Improve documentation and naming radial: Fix typos and trailing whitespace win32: Build benchmarks Workaround bug in llvm-gcc Chris Wilson (1): bits: optimise fetching width==1 repeats Siarhei Siamashka (2): C fast path for scaled src_x888_ with nearest filter ARM: workaround binutils bug #12931 (code sections alignment) Søren Sandmann Pedersen (12): Post-release version bump to 0.23.3 Makefile.am: Add pixman@lists.freedesktop.org to RELEASE_ANNOUNCE_LIST Fix lcg_rand_u32() to return 32 random bits. New test of pixman_region_contains_{rectangle,point} Speed up pixman_region{,32}_contains_rectangle() Use find_box_for_y() in pixman_region_contains_point() too Don't include stdint.h in lowlevel-blt-bench.c In pixman_image_create_bits() allow images larger than 2GB Rename pixman-fast-path.h to pixman-inlines.h Use repeat() function from pixman-inlines.h in pixman-bits-image.c Move bilinear interpolation to pixman-inlines.h Pre-release version bump to 0.23.4 Taekyun Kim (3): ARM NEON: Standard fast path out_reverse_8_ ARM: NEON better instruction scheduling of over_n_8_ ARM: NEON better instruction scheduling of over_n_ ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
[Pixman] 0.23.4
Hi, We've gone too long without a release, so I'm planning on making a 0.23.4 release some time this week. If you have outstanding patches that are ready to go in, please push them. Off the top of my head, I can think of the ARM scheduling patches, and the 1 pixel fill patches, though there might be others. Thanks, Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 2/3] Optimize BILINEAR filter to NEAREST for identity transforms
It seems to me that the cause of the problem is that the meaning of COVER depends on what the filter is: it means there is enough space around the sample area for the chosen filter, which then becomes wrong when doing filter reductions. So maybe a solution would be to provide two separate flags: COVER_NEAREST: there is enough space for a nearest filter COVER_BILINEAR: there is enough space for a bilinear filter These are unambiguous and because either there is enough space for a bilinear filter, or there isn't, they can be set independently of what the filter actually is. Code that currently depends on COVER generally also depends on filter, at least implicitly, so there shouldn't be any problem picking the correct version of COVER for existing fast paths. This would solve the problem because in the problematic case, the flags FILTER_NEAREST | COVER_NEAREST could be set, and the standard fast paths would run, but at the same time, bilinear/cover paths wouldn't be triggered because they would require FILTER_BILINEAR | COVER_BILINEAR. As a side effect, this also means we can drop the code that worries about what covering means when there is a convolution filter. I tried implementing this scheme in the patch series that follows. There is first a general cleanup of the analyze_extent routines. The old code computed the transformed sample extent twice, once to determine whether the clip was contained within the samples, and once to determine whether the transformation of the destination rectangle, expanded by 1, was contained within the 16 bit coordinate system. The new code only computes the transformation once, and then it add the absolute value of the 00 and 10 components to get a conservative estimate of the transformed destination boundary. I'm especially interested in review of this part, both of the code and of whether it's actually correct mathematically. Following that cleanup is Siarhei's change to affine-test and a change to blitters test to make it use a bilinear filter occasionally. Then the splitting of SAMPLES_COVER_CLIP and the bilinear reduction, followed by Siarhei's patch to also do the reduction for 90/180/270 degree rotations and integral translations. The bug, for reference: https://bugs.freedesktop.org/show_bug.cgi?id=37421 Performance testing with Xephyr on a Core2 Duo @1GHz: Current master: *** ROUND 1 *** --- Test: Test Xrender doing non-scaled Over blends Time: 5.720 sec. --- Test: Test Xrender (offscreen) doing non-scaled Over blends Time: 5.149 sec. --- Test: Test Imlib2 doing non-scaled Over blends Time: 6.237 sec. With bilinear reduction: *** ROUND 1 *** --- Test: Test Xrender doing non-scaled Over blends Time: 4.947 sec. --- Test: Test Xrender (offscreen) doing non-scaled Over blends Time: 4.487 sec. --- Test: Test Imlib2 doing non-scaled Over blends Time: 6.235 sec. Thanks, Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 2/3] Optimize BILINEAR filter to NEAREST for identity transforms
Soeren Sandmann sandm...@cs.au.dk writes: I tried implementing this scheme in the patch series that follows. The patch series is also available here: http://cgit.freedesktop.org/~sandmann/pixman/log/?h=bilinear-reduction Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 0/11] Use macros to generate fetchers
Chris Wilson ch...@chris-wilson.co.uk writes: The final patch is a little scary though. Was that motivated by observation of the generated assembly or through performance testing? It was motivated by looking at the generated assembly, but I'll run some performance tests. Can pixman export a wrapper to its convert_pixel function? There are quite a few open-coded routines for doing the same spread around the X server and its brethren. I wouldn't necessarily be opposed to exporting such a function if a patch show up. It probably has to be driven by someone who has a need for it though, to make sure we get the right API. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] ARM iwmmxt patches
Matt Turner matts...@gmail.com writes: I've been trying to figure out if the ARM iwmmxt inline assembly makes any difference at all. I think the conclusion is that it does not. Updated code is here: http://cgit.freedesktop.org/~mattst88/pixman/log/?h=iwmmxt-optimizations You mean that inline assembly in mmx_fill() doesn't make a difference? See http://people.freedesktop.org/~mattst88/pixman-iwmmxt-benchdata.txt The lowlevel-blt benchmark doesn't hit the fill and blt routines at all, so this data doesn't support the conclusion that inline assembly in mmx_fill() and mmx_blt() makes no difference. Never does using inline assembly seem to make any sort of meaningful difference over simply compiling pixman-mmx.c for ARM/iwmmxt. I tried checking the alignment in the 'wip' commit in the blt function to avoid a lot of unnecessary walign instructions, but as you can see from the benchmark results, it doesn't help anything. The cairo-trace tests are better benchmarks to use in general because they reflect real-world use. lowlevel-blt-bench really should only be used for the case where you are optimizing a specific compositing routine. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] ARM iwmmxt patches
Matt Turner matts...@gmail.com writes: Never does using inline assembly seem to make any sort of meaningful difference over simply compiling pixman-mmx.c for ARM/iwmmxt. I tried checking the alignment in the 'wip' commit in the blt function to avoid a lot of unnecessary walign instructions, but as you can see from the benchmark results, it doesn't help anything. The cairo-trace tests are better benchmarks to use in general because they reflect real-world use. lowlevel-blt-bench really should only be used for the case where you are optimizing a specific compositing routine. OK, I'll run cairo-trace to determine the effect of the inline assembly. I think the addition of inline assembly could go in as follow-on patches though, right? Right, as long as they are not required to avoid regressions on either x86 or ARM. There are still some problems with the rest of the patch set though. Several of the comments from Siarhei and me have not been addressed, and compiling your iwmmxt-optimizations2 branch on x86 results in ../pixman/.libs/libpixman-1.so: undefined reference to `_mm_align_si64' Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
[Pixman] Shapes, etc.
Hi, Here are some ideas on how to improve the support in pixman for cairo's compositing operation. There are several separate questions related to that: 1. Can we support cairo's compositing operation better? 2. Can we pass shape information to pixman more efficiently? and a tangential third one: 3. Should the pixman API be based on something other than images? The TL;DR is a strawman proposal to add: - pixman_image_set_clip_image() - support for op_LERP operators - either a polygon or a spans image 1. Can we support cairo's compositing operation better? The first question pertains to the fact that cairo's operation is more complex than what pixman and X Render provides. This leads to extra work in cairo to construct the desired compositing operation. One aspect of this is that cairo supports clipping to arbitrary shapes, whereas pixman only supports axis aligned regions. A straightforward fix for this is just add support for clip images: pixman_image_set_clip_image (pixman_image_t *image, pixman_image_t *clip); which means that when @image is used as a destination, all rendering is clipped to the alpha channel of @clip in addition to being clipped to the clip region. (We need both the region and the clip image because the X server will want to set a clip region based on the window hierarchy in any case). The simplest way to implement this is to have the general implementation run an iterator for the clip image, and then pass a clip buffer to the combiners. In addition to computing the regular operator, the combiners would then also do LERP_clip onto the destination buffer. pixman_image_composite_with_clip (src, mask, dest, clip, ...) It may also be that it would more useful to instead add a new function that takes a clip image in addition to src, mask, and dest. Another aspect is that cairo uses a different rendering equation in some cases. The two equations used are: unbounded: [(src IN shape) OP dest ] LERP_clip dest bounded: [(src OP dest) LERP_shape dest ] LERP_clip dest With shaped clips, the LERP_clip part is taken care of, and the first equation is already directly supported by pixman. The second one could be supported by adding new op_LERP operators that would use the (src OP dest) LERP_shape dest equation. For cairo's purposes, all we need is CLEAR_LERP and SRC_LERP, but I think it could be useful to have the full set of LERP operators. 2. Can we pass shape information to pixman more efficiently? The second question is whether there is a more efficient way to pass shape information to pixman than passing them as 8 bit alpha masks. A polygon image is a possibility, and the one that I prefer as it allows more processing to happen in one pass, and because it is a more compact representation which is useful for the X protocol. There could be a method on such an image to generate a list of spans, so that cairo wouldn't need to maintain its own polygon rasterizer. But passing the information as spans is another possibility. If we add support for clip images, the obvious way to add spans support would be through a spans image: pixman_image_create_spans (const pixman_span_t *spans, int n_spans); that could be used both as the shape and as the clip. The initial implementation of such an image would be very simple: just implement an iterator that walks the list of spans, generating argb scanline buffers. An appealing prospect here is a fast path that processes both shape and clip in one pass. If both span lists are known to be sorted in YX order, then they could be processed in one linear pass through both. 3. Should the pixman API be based on something other than images? Finally, the more tangential question is whether pixman's API should be based on image objects in the future. They tend to be a little clumsy to use in both cairo and X. I wrote down some halfbaked ideas here: http://cgit.freedesktop.org/~sandmann/pixman/tree/docs/new-api?h=docs inspired by Chris' old suggestion that pixman should behave more like a GPU. The gist of it is that instead of operating on image objects, there is just one pixman_context_t that has a bunch of properties that can be set. Once all the properties are set, the user is expected to call pixman_context_composite(). There is also an api that allows bulk setting of properties so that an application can submit a list of commands and pixman would then carry them out, pretty much like a GPU would. It could be considered a generalization of the pixman_compositor_t idea from Chris' patch. Comments on all three questions welcome. Thanks, Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] Workaround bug in llvm-gcc
Andrea Canciani ranm...@gmail.com writes: The same issue affects non-Apple llvm-gcc, but it's deprecated and there will be no further releases. I would not be surprised if Apple just started using clang as the default compiler in future XCode versions. Okay, if there won't ever be a fixed version of llvm-gcc, then I guess there is no point trying to be future-proof. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] Workaround bug in llvm-gcc
Andrea Canciani ranm...@gmail.com writes: llvm-gcc (shipped in Apple XCode 4.1.1 as the default compiler or in the 2.9 release of LLVM) performs an invalid optimization which unifies the empty_region and the bad_region structures because they have the same content. A bug has been filed against Apple Developers Tool for this issue. This commit works around this bug by making one of the two structures volatile, so that it cannot be merged. Is the bug report publicly readable? static const box_type_t PREFIX (_empty_box_) = { 0, 0, 0, 0 }; static const region_data_type_t PREFIX (_empty_data_) = { 0, 0 }; +#if defined (__llvm__) !defined (__clang__) +static const volatile region_data_type_t PREFIX (_broken_data_) = { 0, 0 }; +#else static const region_data_type_t PREFIX (_broken_data_) = { 0, 0 }; +#endif It worries me a little that these #ifdefs will still trigger when Apple fix their compiler. Can we detect it at configure time instead by running something like this: typedef struct { int x, y; } blah_t; static const blah_t b1 = { 0, 0 }; static const blah_t b2 = { 0, 0 }; int main () { if (b1 == b2) return 1; else return 0; } Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] bits: optimise fetching width==1 repeats
Chris Wilson ch...@chris-wilson.co.uk writes: diff --git a/pixman/pixman-bits-image.c b/pixman/pixman-bits-image.c index 4e9ed14..a762c27 100644 --- a/pixman/pixman-bits-image.c +++ b/pixman/pixman-bits-image.c @@ -1149,7 +1149,34 @@ bits_image_fetch_untransformed_repeat_normal (bits_image_t *image, while (y = image-height) y -= image-height; -while (width) +if (image-width == 1) +{ + /* XXX duplication from fetch solid */ + if (wide) + { + uint64_t color; + uint64_t *b = (uint64_t *)buffer; + uint64_t *end; + + color = image-fetch_pixel_64 (image, 0, y); + + end = b + width; + while (b end) + *(b++) = color; + } + else + { + uint32_t color; + uint32_t *end; + + color = image-fetch_pixel_32 (image, 0, y); + + end = buffer + width; + while (buffer end) + *(buffer++) = color; + } +} +else while (width) { while (x 0) x += image-width; Makes sense to me, but I think the duplicated code should be in a new force_inline function replicate_pixel() or something. Also, maybe the patch got mangled in the mail, but it looks to me like it uses eight-space indents. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [RFC] A spans interface for pixman
Bill Spitzak spit...@gmail.com writes: Yes; to be a bit more precise, with the current code, the alpha channel is a sampled version of the polygon and the rgb channels are 0. Ie., the pixels are black with varying levels of opacity. An interesting alternative possibility would be to consider the RGB channels duplicates of the alpha channel, which would make the pixels shades of white instead. This is definitely better and much more useful. I would in fact have ALL 1-channel images treated this way, removing a lot of need to have different operators whose purpose is only to source a 1-channel image that defaults to acting like 0,0,0,x. I'm not sure what will break in pixman and/or cairo if this is done, however. A way to introduce this behavior without breaking backwards compatibility would be to just add new PIXMAN_w8/w4/w1 format that would operate in this way. I'm not sure which other one-channel images you mean. Aside from a8/a4/a1, the only other one-channel images pixman supports are g8/g4/g1 and c8/c4/c1 which are paletted and not used much. It could be interesting to add support for polygon operators: pixman_image_create_polygon_intersection (polygon1, polygon2) pixman_image_create_polygon_union (polygon1, polygon2); ... These would return a new polygon image that would contain pointers to the other two. It may be possible to actually intersect these and make a new polygon with full accuracy. Though in some cases such as matching lines it can be painful and may produce a much larger number of path components than the two originals. But also if users do this enough times you will end up with a whole tree of polygons and CSG-like work. What I meant was that you could rasterize all the polygons at the same time, and then do the intersect/union at the subpixel scanline level. That would produce the same accuracy as intersecting the polygons up front. Questions: Can the polygon images be scaled or transformed when used as a source? (and produce a correct antialiased result, not scaled pixels). Right now they can't, but they probably should considering all other images can be. Is this any more complicated than simply transforming all the edge endpoints? At the cairo level how could a user directly create a polygon surface? Could it just detect that you only drew paths filled with white? I don't know if a polygon pattern is interesting as a user-visible concept in cairo. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] In pixman_image_create_bits() allow images larger than 2GB
Chris Wilson ch...@chris-wilson.co.uk writes: On Thu, 11 Aug 2011 06:43:03 -0400, Søren Sandmann sandm...@cs.au.dk wrote: From: Søren Sandmann Pedersen s...@redhat.com There is no reason for pixman_image_create_bits() to check that the image size fits in int32_t. The correct check is against size_t since that is what the argument to calloc() is. The only question I have is whether ssize_t width and height is appropriate, the valid image dimensions are still only int32_t right? Yeah, you are right. It's only an internal function, so using ssize_t is harmless, but there's no need to introduce this confusion. New version below. Thanks, Soren commit cb49f9427ead511b9bd60e771fc1b0d74d72d22a Author: Søren Sandmann Pedersen s...@redhat.com Date: Thu Aug 11 06:30:43 2011 -0400 In pixman_image_create_bits() allow images larger than 2GB There is no reason for pixman_image_create_bits() to check that the image size fits in int32_t. The correct check is against size_t since that is what the argument to calloc() is. This patch fixes this by adding a new _pixman_multiply_overflows_size() and using it in create_bits(). Also prepend an underscore to the names of other similar functions since they are internal to pixman. V2: Use int, not ssize_t for the arguments in create_bits() since width/height are still limited to 32 bits, as pointed out by Chris Wilson. diff --git a/pixman/pixman-bits-image.c b/pixman/pixman-bits-image.c index 4e9ed14..f5b66dc 100644 --- a/pixman/pixman-bits-image.c +++ b/pixman/pixman-bits-image.c @@ -1490,10 +1490,10 @@ static uint32_t * create_bits (pixman_format_code_t format, int width, int height, - int *rowstride_bytes) + int * rowstride_bytes) { int stride; -int buf_size; +size_t buf_size; int bpp; /* what follows is a long-winded way, avoiding any possibility of integer @@ -1502,11 +1502,11 @@ create_bits (pixman_format_code_t format, */ bpp = PIXMAN_FORMAT_BPP (format); -if (pixman_multiply_overflows_int (width, bpp)) +if (_pixman_multiply_overflows_int (width, bpp)) return NULL; stride = width * bpp; -if (pixman_addition_overflows_int (stride, 0x1f)) +if (_pixman_addition_overflows_int (stride, 0x1f)) return NULL; stride += 0x1f; @@ -1514,7 +1514,7 @@ create_bits (pixman_format_code_t format, stride *= sizeof (uint32_t); -if (pixman_multiply_overflows_int (height, stride)) +if (_pixman_multiply_overflows_size (height, stride)) return NULL; buf_size = height * stride; diff --git a/pixman/pixman-private.h b/pixman/pixman-private.h index 6a3935e..a25897d 100644 --- a/pixman/pixman-private.h +++ b/pixman/pixman-private.h @@ -691,10 +691,13 @@ void * pixman_malloc_abc (unsigned int a, unsigned int b, unsigned int c); pixman_bool_t -pixman_multiply_overflows_int (unsigned int a, unsigned int b); +_pixman_multiply_overflows_size (size_t a, size_t b); pixman_bool_t -pixman_addition_overflows_int (unsigned int a, unsigned int b); +_pixman_multiply_overflows_int (unsigned int a, unsigned int b); + +pixman_bool_t +_pixman_addition_overflows_int (unsigned int a, unsigned int b); /* Compositing utilities */ void diff --git a/pixman/pixman-utils.c b/pixman/pixman-utils.c index cb4e621..49e3488 100644 --- a/pixman/pixman-utils.c +++ b/pixman/pixman-utils.c @@ -31,15 +31,19 @@ #include pixman-private.h pixman_bool_t -pixman_multiply_overflows_int (unsigned int a, - unsigned int b) +_pixman_multiply_overflows_size (size_t a, size_t b) +{ +return a = SIZE_MAX / b; +} + +pixman_bool_t +_pixman_multiply_overflows_int (unsigned int a, unsigned int b) { return a = INT32_MAX / b; } pixman_bool_t -pixman_addition_overflows_int (unsigned int a, - unsigned int b) +_pixman_addition_overflows_int (unsigned int a, unsigned int b) { return a INT32_MAX - b; } ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 2/3] mmx: fix unaligned accesses
matts...@gmail.com writes: @@ -2569,20 +2585,31 @@ mmx_composite_in_8_8 (pixman_implementation_t *imp, src_line += src_stride; w = width; - if unsigned long)dest_image 3) == 0) - (((unsigned long)src_image 3) == 0)) + while (w (unsigned long)dst 3) { - while (w = 4) - { - uint32_t *s = (uint32_t *)src; - uint32_t *d = (uint32_t *)dst; + uint8_t s, d; + uint16_t tmp; + + s = *src; + d = *dst; - *d = store (in (load (*s), load (*d))); + *dst = MUL_UN8 (s, d, tmp); - w -= 4; - dst += 4; - src += 4; - } + src++; + dst++; + w--; + } + + while (w = 4) + { + uint32_t *s = (uint32_t *)src; + uint32_t *d = (uint32_t *)dst; + + *d = store (in (load (*s), load (*d))); + + w -= 4; + dst += 4; + src += 4; } while (w--) The previous code here is totally bogus. These two checks: if unsigned long)dst_image 3) == 0) (((unsigned long)src_image 3) == 0)) check that the pointer to the *image struct* is aligned, which is completely useless. There are some other instances of this problem (in_n_8_8, add_n_8_8). This was not introduced by your patch, but we may as well get them fixed anyway, not the least because those will also SIGBUS on ARM. Your patch changes the code to just check that the destination is aligned. Is that actually sufficient? In this code: while (w = 4) { uint32_t *s = (uint32_t *)src; uint32_t *d = (uint32_t *)dst; *d = store (in (load (*s), load (*d))); w -= 4; dst += 4; src += 4; } there doesn't seem to be any protection against (*s) being an unaligned access. The same thing seems to apply to other changes in the patch. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 0/4] ARM: REPEAT_NORMAL support for standard fast paths
Taekyun Kim podai...@gmail.com writes: On 07/11/2011 09:18 PM, Soeren Sandmann wrote: This performance regression was introduced when the simple repeat code was removed. But I'm not sure hacking it into the ARM backend is the right plan. See this mail for a different approach: http://lists.freedesktop.org/archives/pixman/2010-December/000815.html I have a branch with a start on doing it that way here: http://cgit.freedesktop.org/~sandmann/pixman/log/?h=simple-repeat which may or may not be useful as a starting point. (I'd be interested in seeing what the benchmark results of that branch are). It seems to be the right place where we can put simple repeat codes. It can handle simple repeat for both sse2 and ARM at common place. I'm a bit worried that tiling does not give us good memory access patterns causing cache overhead. 1 x n source images would be as slow as 90 degree rotation. Memory buffer will be accessed in vertical order. Yeah, that is a problem, and that was in fact one of the reasons the original 'simple repeat' code was deleted. It's memory access pattern for 1xn images was really bad. It may be that adding this support to the ARM backend, as you did, is the better way. May I take yours as a starting point and integrate with mine? Of course. Below is benchmark results. (Core2 Duo E5200) I couldn't see any noticeable performance changes. No, it looks like a slowdown ... Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 0/4] ARM: REPEAT_NORMAL support for standard fast paths
Taekyun Kim podai...@gmail.com writes: Current standard fast paths do not support REPEAT_NORMAL causing such paths being on top of profile result of some popular web sites. They usually use REPEAT_NORMAL to draw some tiled background or gradient. This has come up before: http://lists.freedesktop.org/archives/pixman/2010-November/000731.html This performance regression was introduced when the simple repeat code was removed. But I'm not sure hacking it into the ARM backend is the right plan. See this mail for a different approach: http://lists.freedesktop.org/archives/pixman/2010-December/000815.html I have a branch with a start on doing it that way here: http://cgit.freedesktop.org/~sandmann/pixman/log/?h=simple-repeat which may or may not be useful as a starting point. (I'd be interested in seeing what the benchmark results of that branch are). Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
[Pixman] [ANNOUNCE] pixman release 0.23.2 now available
A new pixman release 0.23.2 is now available. This is the first development release leading up to a stable 0.24 release. - Improved support for tiled bilinear scaling on SSE2 and ARM [Taekyun Kim] - Fix for a bug causing glyph corruption on ARM devices with a 16 bit frame buffer [Søren Sandmann] Please report bugs to pixman at lists.freedesktop.org or file them at https://bugs.freedesktop.org/enter_bug.cgi?product=pixman Søren tar.gz: http://cairographics.org/snapshots/pixman-0.23.2.tar.gz http://xorg.freedesktop.org/archive/individual/lib/pixman-0.23.2.tar.gz tar.bz2: http://xorg.freedesktop.org/archive/individual/lib/pixman-0.23.2.tar.bz2 Hashes: MD5: 775c5ecde7485bd69936dcc978f5aa58 pixman-0.23.2.tar.gz MD5: 2e2805f5ca02edeb15a7862779670069 pixman-0.23.2.tar.bz2 SHA1: 5100c1e2d4566e507ead5c638d2a66bb165b2e63 pixman-0.23.2.tar.gz SHA1: 62568ed6eb84c0312cc0e6f9c5f4bc12b8202492 pixman-0.23.2.tar.bz2 GPG signature: http://cairographics.org/snapshots/pixman-0.23.2.tar.gz.sha1.asc (signed by Soren Sandmann sandm...@daimi.au.dk) Git: git://git.freedesktop.org/git/pixman tag: pixman-0.23.2 Log: Andrea Canciani (3): test: Fix compilation on win32 Include noop in win32 builds Silence autoconf warnings Dave Yeo (1): Check for working mmap() Nis Martensen (1): Fix a few typos in pixman-combine.c.template Søren Sandmann (5): mmx: Delete some unused variables sse2: Delete some unused variables demos: Comment out some unused variables ARM: Fix two bugs in neon_composite_over_n__0565_ca(). test: Make fuzzer-find-diff.pl executable Søren Sandmann Pedersen (12): Post-release version bump to 0.23.1 Add a noop implementation. Add a noop composite function for the DST operator Move noop dest fetching to noop implementation Add a noop src iterator Move NULL iterator into pixman-noop.c Move NOP src iterator into noop implementation. Replace instances of dst_* with dest_* In pixman-general.c rename image_parameters to {src, mask, dest}_image Replace argumentxs to composite functions with a pointer to a struct blitters-test: Make common formats more likely to be tested. Pre-release version bump to 0.23.2 Taekyun Kim (6): Replace boolean arguments with flags for bilinear fast path template REPEAT_NORMAL support for bilinear fast path template sse2: Declare bilinear src__ REPEAT_NORMAL composite function ARM: Add REPEAT_NORMAL functions to bilinear BIND macros Enable REPEAT_NORMAL bilinear fast path entries Bilinear REPEAT_NORMAL source line extension for too short src_width ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PIXMAN][PATCH 0/6] REPEAT_NORMAL support for bilinear scaled fast paths
Taekyun Kim podai...@gmail.com writes: I've done account request steps. It's my great pleasure to contribute to this project. Let me know if my request is missing something or there is anything that I have to know. I have approved the request; the account should be created soon. Patches should still be sent to the list for review, but if make check passes, and no one has said anything after some reasonable period of time, feel free to just push. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 0/3] Composite args in stack allocated struct
Søren Sandmann sandm...@cs.au.dk writes: I'm actually all for this change if it gets confirmed to work a bit better and faster (and I expect that it should, considering that all this data can be passed through some nested calls multiple times). Hopefully we are not running in circles. The FbComposeData struct was used in precisely one place to pass information between two functions. It could not possibly have provided any benefit as it existed in X server 1.3, which is the version that eventually became pixman. I think what might have happened is that full support for FbComposeData was introduced in one of the several forks that existed around 2005 and never merged into Xorg, except for that one place. So I'm not too worried about running in circles, but I don't know if it is actually a performance advantage. I tend to think that if such microoptimizations really result in measurable performance advantage, then then the problem should probably be fixed in some other way. The following emails contain a rebased version of this branch against master. Performance numbers: Before: [ # ] backend test min(s) median(s) stddev. count [ # ]image: pixman 0.23.1 [ 0]image firefox-talos-gfx-20090702 36.974 36.987 0.03% 4/6 After: [ # ] backend test min(s) median(s) stddev. count [ # ]image: pixman 0.23.1 [ 0]image firefox-talos-gfx-20090702 37.036 37.072 0.04% 5/6 So it's about 0.2% slower on this benchmark, which is very glyph heavy and therefore calls pixman_image_composite32() a lot. But the main point to me is performance, but the ability to pass more information to the compositing routines. It's also a reduction in line count. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 2/3] Optimize BILINEAR filter to NEAREST for identity transforms
Siarhei Siamashka siarhei.siamas...@gmail.com writes: The problem for the analyze_extents() code is that if it chooses to provide the COVER flag, then some code might read common.filter and try to read outside the image to do bilinear filtering. But without COVER, the standard fast paths won't run. It seems to me that the cause of the problem is that the meaning of COVER depends on what the filter is: it means there is enough space around the sample area for the chosen filter, which then becomes wrong when doing filter reductions. The meaning of COVER flag does not become wrong after doing filter reduction, it's the common.filter variable which gets kind of wrong. This common.filter value means a filter value set by the user application via pixman_image_set_filter() call. When no filter reductions are done, it is also happens to be valid in any part of code. Yes, we do in all cases need to preserve the state that the user set. This is in fact one of the key properties of the flags: they contain a 'summary' of the user set state, but they are not in themselves a full description of the image. The full description is stored elsewhere in the image struct. A subjective advantage is that it also gets rid of a somewhat illogical NO_CONVOLUTION_FILTER flag. This flag does not cause any problems itself, but just distracts attention when looking at the printout of fast path flags. The flags with NO in their names are a bit confusing in that they lead to double negatives, but they are needed, because fast paths need to say I _can't_ deal with property X, so I'll set the FAST_PATH_NO_X flag. This is a key part of how the flags work. As you wrote yourself: The flags in pixman are implemented in such a way that they all have some kind of positive meaning, indicating that the operation is going to be more simple than the general case in one way or another. Having more flags set on the initial analysis stage is always good. Forgetting to set some of the flags is safe in the sense that it will not make pixman misbehave, pixman will just run slower because of not taking some fast paths. Each fast path function has a mask with flags (provided separately for source, mask and destination), with each bit set there telling something like I can't handle case X (or with alternative interpretation I can't handle cases other than Y). Fast path functions with many flags set in their description are allowed to cut more corners and run a lot faster because of this. When asked to do something via pixman_image_composite() function call, pixman first calculates the flags for the current operation (again, separately for the source, mask and destination images). Setting each bit is kind of like telling to the the rest of the code: You are allowed not to care about case X. [ http://www.mail-archive.com/pixman@lists.freedesktop.org/msg00569.html ] But this breaks down if the compositing code starts *relying* on the flags being set, for example to know what the filter is; it will no longer be the case that Forgetting to set some of the flags is safe in the sense that it will not make pixman misbehave, pixman will just run slower because of not taking some fast paths. This is an important rule to preserve, because it allows people to work on the flag computing code without knowing the details of the compositing code, and it allows people to write a new compositing routine or implementation without knowing that the flags even exist. When such general rules break down is when the code base becomes incomprehensible and only understandable to people who are intimately familiar with it. Regarding the potential risk of using common.filter accidentally in a wrong way, I think that this variable can be just renamed and get an unattractive discouraging name, maybe something like common.filter_before_optimization. Also for additional safety and debugging purposes, maybe even valgrind client requests could be used to mark this variable as unaccessible at the wrong time in a special debug configuration, but that could be an overkill. Having to rename a basic property of images to prevent other code from reading it, it is a symptom of a misdesign taking place. It adds a strong coupling between the compositing code and the flag computing code that wasn't there before. So maybe a solution would be to provide two separate flags: COVER_NEAREST: there is enough space for a nearest filter COVER_BILINEAR: there is enough space for a bilinear filter These are unambiguous and because either there is enough space for a bilinear filter, or there isn't, they can be set independently of what the filter actually is. Sure, it's one of the multiple possible options. But I think that evaluating and potentially setting both of these flags may introduce a bit more overhead on the common code path. Could be, but not much difference in
Re: [Pixman] [PATCH] fb: Fix memcpy abuse
Keith Packard kei...@keithp.com writes: On Mon, 23 May 2011 14:34:55 +0200, Soeren Sandmann sandm...@cs.au.dk wrote: I think Keith said that the merge window for 1.11 closes May 29th though, so there may not be enough time. That's the current schedule, with the release happening on August 19. Let me know if that works for pixman and we can look at merging a requirement for 0.23 before the merge window closes this Friday. If a requirement is merged for pixman 0.23, we can make 0.24 happen before August 19th. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [RFC] AVX codepaths
Matt Turner matts...@gmail.com writes: I took the SSE2 code paths and modified them slightly to generate AVX code. AVX doesn't have integer operations like SSE2, so the range of 256-bit operations is mostly limited to copying pixels. Following this email are three patches: 1) AVX fetcher for x8r8g8b8 (a8 and r5g6b5 require integer unpack operations) 2) AVX blt, composite_copy_area and associated fast paths 3) AVX fill `make check` still passes all 18 tests. The only other bit of low-hanging fruit I see is composite_src_x888_, but perhaps there are more fast paths possible with AVX. Of the low-hanging fruit, this is probably the main one. One other might be to compile pixman-sse2.c with -mavx in order to take advantage of the VEX prefix for the integer SSE2 instructions. I'm not sure it would be worth the build system complications though. Of the higher-hanging fruit that AVX could be used for, gradients is the obvious candidate. Radial gradients in particular could become really seriously faster. I think Andrea has some ideas about this. The moonlight project has a copy of pixman with some gradient performance improvements using SSE. See this git repository: https://github.com/mono/moon Maybe some of that code could be reused, but I think pixman master has diverged a bit from when they forked. It would also be useful to redo the wide path used for 10bpc formats to make it use single precision floating point instead. There is a branch here: http://cgit.freedesktop.org/~ranma42/pixman/log/?h=wip/floatpipe4 that has a start on this. Clearly, if this branch of something similar gets cleaned up and landed, AVX would be useful to accelerate it. Also, more generally, it would be interesting to move the test suite from the current bit-exact standard towards a more close-enough standard, so that floating point SIMD or GPUs or narrower integer arithmetic could be used in cases where we currently rely on precisely-8-bit integer arithmetic. I played around with cairo-perf-trace on my Sandy Bridge, but wasn't able to see any improvements, which may be for any number of reasons, including but not limited to a) I don't know how to use cairo-perf-trace, b) AVX doesn't yield any noticeable improvements, c) the code needs to be better optimized. It's pretty likely that these operations are already close to the memory bandwidth limit. There is a program in test/lowlevel-blt-bench that attempts to measure performance of various operations relative to memory bandwidth. Maybe it can produce some interesting results. Still TODO are AVX cpuid detection and to figure out about SunCC support. Regarding SunCC, feel free to just support GCC initially. The various types of compiler support really have to be maintained by people who use those compilers regularly. I'll try to take a look at the patches later this week, but I don't have any Sandy Bridge hardware atm., so I can't actually run it. Thanks, Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH]Compile failure on OS/2
Dave Yeo dave.r@gmail.com writes: Hi, compile on trunk currently dies here, ... utils.c: In function 'fence_malloc': utils.c:257: warning: implicit declaration of function 'mmap' utils.c:257: error: 'MAP_PRIVATE' undeclared (first use in this function) ... As we have sys/mman.h but no working mmap(). One possible fix is attached, perhaps the check for sys/mman.h should be removed? Pushed, thanks. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [RFC] A couple of minor tweaks for sse2 blitting
Chris Wilson ch...@chris-wilson.co.uk writes: The generic implementation of the BLT routine is to deny the operation. So we need to add support to the sse2 implementation in order to perform 8bpp. This looked relatively straightforward, so I'm obviously overlooking something. I don't think you are overlooking anything - I added the blt() routine a long time ago as a quick hack because fbBlt was showing up on RHEL4 profiles, and so pixman_blt() ended up as a not-very-well-though-out API with a bare minimal implementation. Please review kindly. I think adding a fuzzer like blitters-test would be useful for validation of these changes. Presumably we would like to verify that we can blit between all identical formats from 1bpp to 64bpp. Indeed, such a test would be very useful. Should the generic implementation fallback to converting the BLT into an image composite operation? Yes, it would make sense to do that and then never return FALSE. That does complicate the situation a little if we add support for overlapping blt at some point, though, but probably nothing that can't be handled. These patches look basically right to me, but I'll follow up with a couple of minor comments on each one. Thanks, Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 1/2] sse2: if width == stride, convert to linear image
Chris Wilson ch...@chris-wilson.co.uk writes: +if (width == stride) { + width *= height; + height = 1; +} Do we need a check here that the mulitplication doesn't overflow? pixman_multiply_overflows_int() can be used for that. +if (width == src_stride width == dst_stride) { + width *= height; + height = 1; +} Same here, and also braces go on their own line in pixman. It would also be useful to have a brief comment in the commit log about what a linear image is. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
[Pixman] [ANNOUNCE] pixman stable release 0.22.0 now available
A new stable pixman release 0.22.0 is now available. Highlights of this release: - New r8g8b8a8 and r8g8b8x8 image formats [Alexandros Frantzis] - Much faster image scaling on ARM and x86 [Siarhei Siamashka, Taekyun Kim] - Faster 90/270 degree image rotation [Siarhei Siamashka] - More comprehensive support for compositing triangles and trapezoids [Søren Sandmann Pedersen] Plus a large number of other performance improvements, bug and portability fixes, and improvements to the test suite. Thanks to everybody who contributed to pixman 0.22.0, including Alan Coopersmith, Alexandros Frantzis, Andrea Canciani, Cyril Brulebois, Gilles Espinasse, Jon TURNEY, Rolland Dudemaine, Siarhei Siamashka, Søren Sandmann Pedersen, and Taekyun Kim. Søren tar.gz: http://cairographics.org/releases/pixman-0.22.0.tar.gz http://xorg.freedesktop.org/archive/individual/lib/pixman-0.22.0.tar.gz tar.bz2: http://xorg.freedesktop.org/archive/individual/lib/pixman-0.22.0.tar.bz2 Hashes: MD5: cb8f3cb5ce2c8d7294f73ecb7021fda6 pixman-0.22.0.tar.gz MD5: 307fe4d7dc83b1a558c362907097c0d0 pixman-0.22.0.tar.bz2 SHA1: da0a9c63fa315f163a32c7e40dd0b2a0f88c0d21 pixman-0.22.0.tar.gz SHA1: d24ea233755d7dce9f0d93136ad99fba8d4e4fa0 pixman-0.22.0.tar.bz2 GPG signature: http://cairographics.org/releases/pixman-0.22.0.tar.gz.sha1.asc (signed by Søren Sandmann Pedersen sandm...@daimi.au.dk) Git: git://git.freedesktop.org/git/pixman tag: pixman-0.22.0 Log: Alan Coopersmith (1): Sun's copyrights belong to Oracle now Alexandros Frantzis (2): Add simple support for the r8g8b8a8 and r8g8b8x8 formats. Add support for the r8g8b8a8 and r8g8b8x8 formats to the tests. Andrea Canciani (10): Remove unused stop_range field Fix opacity check Improve conical gradients opacity check Improve handling of tangent circles Add a test for radial gradients Fix compilation on Win32 test: Fix tests for compilation on Windows test: Add Makefile for Win32 Do not include unused headers test: Silence MSVC warnings Cyril Brulebois (2): Fix argument quoting for AC_INIT. Fix linking issues when HAVE_FEENABLEEXCEPT is set. Gilles Espinasse (2): Fix missing AC_MSG_RESULT value from Werror test Fix OpenMP not supported case Jon TURNEY (1): Remove stray #include fenv.h Rolland Dudemaine (4): test: Fix for mismatched 'fence_malloc' prototype/implementation Correct the initialization of 'max_vx' test: Use the right enum types instead of int to fix warnings Fix variable was set but never used warnings Siarhei Siamashka (69): Fixed broken configure check for __thread support Do CPU features detection from 'constructor' function when compiled with ARM: fix 'vld1.8'-'vld1.32' typo in add__ NEON fast path ARM: NEON: source image pixel fetcher can be overrided now ARM: nearest scaling support for NEON scanline compositing functions ARM: macro template in C code to simplify using scaled fast paths ARM: performance tuning of NEON nearest scaled pixel fetcher ARM: NEON optimization for scaled over__ with nearest filter ARM: NEON optimization for scaled over__0565 with nearest filter ARM: NEON optimization for scaled src__0565 with nearest filter ARM: NEON optimization for scaled src_0565_ with nearest filter ARM: optimization for scaled src_0565_0565 with nearest filter C fast path for a1 fill operation ARM: added 'neon_composite_over_n_8_8' fast path ARM: introduced 'fetch_mask_pixblock' macro to simplify code ARM: better NEON instructions scheduling for over_n_8_0565 ARM: added 'neon_composite_over__n_0565' fast path ARM: reuse common NEON code for over_{n_8|_n|_8}_0565 ARM: added 'neon_composite_over_0565_n_0565' fast path ARM: added 'neon_composite_add__8_' fast path ARM: better NEON instructions scheduling for add___ ARM: added 'neon_composite_add_n_8_' fast path ARM: added 'neon_composite_add__n_' fast path ARM: added flags parameter to some asm fast path wrapper macros ARM: added 'neon_composite_in_n_8' fast path ARM: added 'neon_src_rpixbuf_' fast path Fix for potential unaligned memory accesses
Re: [Pixman] [PATCH 0/7] Faster pipelined ARM NEON bilinear scalers: 'src_8888_8888' and 'src_8888_0565'
Siarhei Siamashka siarhei.siamas...@gmail.com writes: As far as I know, GdkPixbuf uses the equivalent of a bilinear filter with four bits of precision. Once you scale more than 16x, banding artefacts are clearly visible, but people don't normally complain about its quality, and 16x scaling with bilinear interpolation won't look great in any case, so if dropping precision to four bits really makes a big performance difference, maybe we should just change PIXMAN_FILTER_BILINEAR to do that. Reducing bilinear interpolation precision provides a lot more performance benefits for x86 (at least 7 bits would be better) and 4 bits should be really the best. Anyway, who is going to decide whether the reduced precision is still ok? Well, if no one has a problem with dropping the precision, then dropping precision is OK, I guess. One can see a heavy use of bilinear scaling (over__8_ fast path, also with optional horizontal flipping) on the web pages like this, especially if configured to animate 1000 fishes: http://ie.microsoft.com/testdrive/performance/fishietank/ I wonder how well pixman can actually perform there if it gets some good SSSE3 bilinear optimizations and starts utilizing all CPU cores (maybe trying OpenMP should be easy)? For upscaling in particular, it may be interesting to look at a different algorithm where two horizontally expanded souce lines are cached and the destination scanlines are computed as a linear interpolation between those two scanlines. This drops a little precision because the two cache scanlines would be stored with 8 bits of color depth and not 16 as we do now, but that probably would be completely invisible. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 2/2] ARM: Add 'neon_composite_over_n_8888_0565' fast path
Taekyun Kim podai...@gmail.com writes: I marked bubbles that I could find. Here we can make step 3 independent(or less dependent) from above step 6 and 7 by proper allocation of registers. So we can insert some instructions of step 3 into the above bubble positions. Output of step 1(fetch dest) will be read in step 4 and output of step 2(fetch mask) will be read in step 3. So I think you can fetch mask first and then dest at the beginning of tail_head block and remaining bubbles can be filled with instructions from step 3. Maybe this does not work, or there can be some other better ways to achieve optimal performance. Thanks - these comments were helpful. There is a new patch below that implements these suggestions. I can't find anymore stalls in the inner loop. This version does produce some measurable speedup with data in L1 cache compared to the non-pipelined version. From typeically 85-95 Mpixels/s to 90-100 Mpixels/s. The precision of these measurements still leave something to be desired, but it's pretty clear that there is some amount of improvement here. Soren From 32cf3d76454c9984b3e79d92c691e46b8b0b12f7 Mon Sep 17 00:00:00 2001 From: =?utf-8?q?S=C3=B8ren=20Sandmann=20Pedersen?= s...@redhat.com Date: Sat, 2 Apr 2011 23:24:48 -0400 Subject: [PATCH] ARM: Add 'neon_composite_over_n__0565' fast path This improves the performance of the firefox-talos-gfx benchmark with the image16 backend. Benchmark on an 800 MHz ARM Cortex A8: Before: [ # ] backend test min(s) median(s) stddev. count [ 0] image16firefox-talos-gfx 121.773 122.218 0.15%6/6 After: [ # ] backend test min(s) median(s) stddev. count [ 0] image16firefox-talos-gfx 85.247 85.563 0.22%6/6 V2: Slightly better instruction scheduling based on comments from Taekyun Kim. V3: Eliminate all stalls from the inner loop. Also based on comments from Taekyun Kim. --- pixman/pixman-arm-neon-asm.S | 169 ++ pixman/pixman-arm-neon.c |4 + 2 files changed, 173 insertions(+), 0 deletions(-) diff --git a/pixman/pixman-arm-neon-asm.S b/pixman/pixman-arm-neon-asm.S index 5e9fda3..833f18c 100644 --- a/pixman/pixman-arm-neon-asm.S +++ b/pixman/pixman-arm-neon-asm.S @@ -1426,6 +1426,175 @@ generate_composite_function \ /**/ +.macro pixman_composite_over_n__0565_ca_process_pixblock_head +/* + * 'combine_mask_ca' replacement + * + * input: solid src (n) in {d8, d9, d10, d11} [B, G, R, A] + * mask in {d24, d25, d26} [B, G, R] + * output: updated src in {d0, d1, d2 } [B, G, R] + * updated mask in {d24, d25, d26} [B, G, R] + */ +vmull.u8q0, d24, d8 +vmull.u8q1, d25, d9 +vmull.u8q6, d26, d10 +vmull.u8q9, d11, d25 +vmull.u8q12, d11, d24 +vmull.u8q13, d11, d26 +vrshr.u16 q8, q0, #8 +vrshr.u16 q10, q1, #8 +vrshr.u16 q11, q6, #8 +vraddhn.u16 d0, q0, q8 +vraddhn.u16 d1, q1, q10 +vraddhn.u16 d2, q6, q11 +vrshr.u16 q11, q12, #8 +vrshr.u16 q8, q9, #8 +vrshr.u16 q6, q13, #8 +vraddhn.u16 d24, q12, q11 +vraddhn.u16 d25, q9, q8 +/* + * convert 8 r5g6b5 pixel data from {d4, d5} to planar 8-bit format + * and put data into d16 - blue, d17 - green, d18 - red + */ + vshrn.u16 d17, q2, #3 + vshrn.u16 d18, q2, #8 +vraddhn.u16 d26, q13, q6 + vsli.u16q2, q2, #5 + vsri.u8 d18, d18, #5 + vsri.u8 d17, d17, #6 +/* + * 'combine_over_ca' replacement + * + * output: updated dest in d16 - blue, d17 - green, d18 - red + */ +vmvn.8 q12, q12 + vshrn.u16 d16, q2, #2 +vmvn.8 d26, d26 +vmull.u8q6, d16, d24 +vmull.u8q7, d17, d25 +vmull.u8q11, d18, d26 +.endm + +.macro pixman_composite_over_n__0565_ca_process_pixblock_tail +/* ... continue 'combine_over_ca' replacement */ +vrshr.u16 q10, q6, #8 +vrshr.u16 q14, q7, #8 +vrshr.u16 q15, q11, #8 +vraddhn.u16 d16, q10, q6 +vraddhn.u16 d17, q14, q7 +vraddhn.u16 d18, q15, q11 +vqadd.u8q8, q0, q8 +vqadd.u8d18, d2, d18 +/* + * convert the results in d16, d17, d18 to r5g6b5 and store + * them into {d28, d29} + */ +vshll.u8q14, d18, #8 +vshll.u8q10, d17, #8 +vshll.u8q15, d16, #8 +vsri.u16q14, q10, #5 +vsri.u16q14, q15, #11 +.endm + +.macro pixman_composite_over_n__0565_ca_process_pixblock_tail_head +fetch_mask_pixblock +vrshr.u16 q10, q6, #8 +vrshr.u16 q14, q7, #8 +vld1.16 {d4, d5}, [DST_R, :128]! +vrshr.u16 q15, q11, #8 +vraddhn.u16 d16, q10, q6 +vraddhn.u16 d17, q14, q7 +
Re: [Pixman] Unknown thread local support for this system
Galen Lee gulin20...@gmail.com writes: Hi,gues! I compiled pixman0.20.0 use the NDK.With the configure as follow: CC=agcc CPPFLAGS=-I/data/local/include LDFLAGS=-L/data/local/lib \ CXX=agcc LD=arm-eabi-ld RANLIB=arm-eabi-ranlib \ PKG_CONFIG_LIBDIR=/data/local/lib/pkgconfig:/data/local/share/pkgconfig/ \ ./configure \ --prefix=/data/local \ --host=arm-eabi-linux \ --enable-shared \ and get the follow errors: In file included from pixman-private.h:17, from pixman-access.c:35: pixman-compiler.h:202:6: error: #error Unknown thread local support for this system. Pixman will not work with multiple threads. Define PIXMAN_NO_TLS to acknowledge and accept this limitation and compile pixman without thread-safety support. How can I do to solve this problem? Well, the question is what type of thread local storage does Android provide? Google seems to suggest that it supports pthread_setspecific(). If so, it should be detected at configure time. If it isn't, you'll probably need to look at config.log to find out what went wrong. Alternatively, if Android supports some other type of thread local storage, you'll need to add support for that. Or, if you don't call pixman from more than one thread, you can simply define PIXMAN_NO_TLS. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 2/3] ARM: NEON scanline functions for bilinear scaling with A8 mask
Siarhei Siamashka siarhei.siamas...@gmail.com writes: or are these macros being duplicated between pixman-arm-neon-asm.S and this new file? If so, maybe it would be better to put them in a header file and share them. The whole problem with sharing these macros is that they already did change a bit, and they are going to change even more in order to improve performance. Taekyun Kim took the initial implementation of NEON bilinear code and extended it to support more compositing operations such as OVER and ADD, thus branching the older code. And because we don't have a stable foundation yet, it seems to be not very practical updating his code right now to use newer revisions of these common macros because these macros are still going to be updated again. I suggested taking his code as-is for pixman-0.22.0 in order to give the rest of the world at least some more or less complete set of ARM NEON bilinear fast paths sooner. Okay, that makes sense I guess. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 2/2] ARM: Add 'neon_composite_over_n_8888_0565' fast path
Hi, I'm also a bit new to NEON, however I hope this brief review can be helpful to you. So let's start. Thanks for the comments. I'm including a new version shortly that incorporates some of the changes, but unfortunately, I couldn't reliably measure much of an improvement. On my system, the L1 value from lowlevel-blt-bench seems to fluctuate between 85 and 95 pretty much no matter what I do. + /* + * 'combine_over_ca' replacement + * + * output: updated dest in d16 - blue, d17 - green, d18 - red + */ + vmvn.8 q12, q12 + vmull.u8 q6, d16, d24 + vmull.u8 q7, d17, d25 + vmvn.8 d26, d26 + vmull.u8 q11, d18, d26 +.endm There are some pipeline stalls, destination registers of vmvn will be available at N3. I think we can reorder above code like this. vmvn.8 q12, q12 vshrn.u16 d16, q2, #2 // last line of previous block vmvn.8 d26, d26 vmull.u8 q6, d16, d24 vmull.u8 q7, d17, d25 vmull.u8 q11, d18, d26 The new code incorpates this change, and I *think* the L1 value was somewhat higher on most runs, but to be honest this might just be confirmation bias on my part ... + vshll.u8 q14, d18, #8 + vshll.u8 q10, d17, #8 + vshll.u8 q15, d16, #8 + vsri.u16 q14, q10, #5 + vsri.u16 q14, q15, #11 It seems there are some what more stalls here. See this page. http://infocenter.arm.com/help/topic/com.arm.doc.ddi0344k/ ch16s06s04.html It seems impossible to eliminate all stalls by reordering only in this code block. Maybe reordering together with some above code lines would work for this. vqadd.u8 d18, d2, d18 vshll.u8 q10, d17, #8 vshll.u8 q14, d18, #8 vqadd.u8 q8, q0, q8 vsri.u16 q14, q10, #5 vshll.u8 q15, d16, #8 vsri.u16 q14, q15, #11 That particular version won't work because the first vshll instruction uses d17 which is affected by the vqadd.u8 q8, q0, q8 instruction. However, it is possible to write it like this: vshll.u8 q14, d18, #8 vqadd.u8 q8, q0, q8 vshll.u8 q10, d17, #8 vsri.u16 q14, q10, #5 vshll.u8 q15, d16, #8 vsri.u16 q14, q15, #11 but then various other stalls appear. In my testing, this seemed to be a slight slowdown (with the same caveat as above). I think the main lesson is that we want to be serious about this level of microoptimization, need more a more robust way of measuring performance than lowlevel-blt-bench provides. In particular more iteratons and tracking of standard deviations would be useful. Also, I noticed that lowlevel-blt-bench copies the destination back into the source in various places: memcpy (src, dst, BUFSIZE); memcpy (dst, src, BUFSIZE); This could affect performance if dst ends up containing mostly zeros. In the cases where the source is supposed to be solid, it could mean operations like over_n_8_() will turn into noops completely. (This happened to me in this case due to a bug in the over_n__0565_ca() code). I think the order of those memcpy() calls should at a minimum be reversed, but I'm not completely clear on why they are there in the first place. To put the sources and destinations into cache perhaps? Soren From 3d2c60bace9d5a49794cc5d0198ff6d3d8fa335d Mon Sep 17 00:00:00 2001 From: =?utf-8?q?S=C3=B8ren=20Sandmann=20Pedersen?= s...@redhat.com Date: Sat, 2 Apr 2011 23:24:48 -0400 Subject: [PATCH] ARM: Add 'neon_composite_over_n__0565' fast path This improves the performance of the firefox-talos-gfx benchmark with the image16 backend. Benchmark on an 800 MHz ARM Cortex A8: Before: [ # ] backend test min(s) median(s) stddev. count [ 0] image16firefox-talos-gfx 121.773 122.218 0.15%6/6 After: [ # ] backend test min(s) median(s) stddev. count [ 0] image16firefox-talos-gfx 85.247 85.563 0.22%6/6 V2: Slightly better instruction scheduling based on comments from Taekyun Kim. --- pixman/pixman-arm-neon-asm.S | 122 ++ pixman/pixman-arm-neon.c |4 ++ 2 files changed, 126 insertions(+), 0 deletions(-) diff --git a/pixman/pixman-arm-neon-asm.S b/pixman/pixman-arm-neon-asm.S index 5e9fda3..1d9232e 100644 --- a/pixman/pixman-arm-neon-asm.S +++ b/pixman/pixman-arm-neon-asm.S @@ -1426,6 +1426,128 @@ generate_composite_function \ /**/ +.macro pixman_composite_over_n__0565_ca_process_pixblock_head +/* + * 'combine_mask_ca' replacement + * + * input: solid src (n) in {d8, d9, d10, d11} [B, G, R, A] + * mask in {d24, d25, d26} [B, G, R] + * output: updated src in {d0, d1, d2 } [B, G, R] + * updated mask in {d24, d25, d26} [B, G, R] + */ +vmull.u8
Re: [Pixman] [PATCH] Adding TLS for Windows XP
Tristan Schmelcher tschmelc...@google.com writes: What would you think about removing EXPORT_SYMBOL, prepending underscores, and providing an example DllMain that uses the shutdown functions for developers that are compiling Pixman as a standalone DLL? I guess if we have a fix that adds a DllMain and calls the init/shutdown functions from it, it wouldn't be too bad to provide a preprocessor symbol to allow DllMain to be left out. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 1/2] ARM: Tiny improvement in over_n_8888_8888_ca_process_pixblock_head
Siarhei Siamashka siarhei.siamas...@gmail.com writes: On a second look, other than that, the whole function is actually fine on Cortex-A8 and does not need any other performance tweaks because it did not have many bubbles to fill in the first place. You can push this patch with the additional vmvn.8 d26, d26 instruction move (I don't think a separate patch is needed for that). Alright, pushed to master. I'll follow up on Taekyun Kim's and your comments on the other patch later this week. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] Adding TLS for Windows XP
Hi, We don't want to add another API either. It would be nice to make the library multi-thread safe in Windows XP and no memory leaks at the same time. One question regarding your suggestions of freeing the memory in DLL unload. TlsFree will still need to know all the TLS indices (http://msdn.microsoft.com /en-us/library/ms686804(v=vs.85).aspx) and Pixman core will have that information. So we might still need a new API. Isn't it possible to do something similar to what is described here: http://msdn.microsoft.com/en-us/library/ms686997%28v=vs.85%29 ? As I understand it, we only need one TLS index, which can be stored in a global variable in a new pixman-win32.c file. That index is then used with TlsSetValue and TlsGetValue to set and get the thread local data, and can be released with TlsFree() when DllMain() is acalled with DLL_PROCESS_DETACH. Note, I don't really know the Windows API, so I might be misunderstanding something here. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] releases and snapshots
Maarten Bosmans mkbosm...@gmail.com writes: 2011/4/2 Soeren Sandmann sandm...@cs.au.dk: Maarten Bosmans mkbosm...@gmail.com writes: At the releases page http://cairographics.org/releases/ both the latest stable (0.20.2) and unstable (0.21.6) versions are listed. This is confusing for people that don't know our versioning scheme (although it is no unusual scheme). I already know of one person who downloaded 0.21.x thinking it was the latest stable, as also the LATEST-pixman link links to the unstable series. I have been meaning to do this, actually, but have been procrastinating, mainly because doing it would also involve changing the pixman Makefile to do the right thing on make release ... The relevant snippet from cairo/build/Makefile.am/releasing is: RELEASE_OR_SNAPSHOT = $$(if test x$(CAIRO_VERSION_MINOR) = x$$(echo $(CAIRO_VERSION_MINOR)/2*2 | bc) ; then echo release; else echo snapshot; fi) RELEASE_UPLOAD_HOST = cairographics.org RELEASE_UPLOAD_BASE = /srv/cairo.freedesktop.org/www RELEASE_UPLOAD_DIR = $(RELEASE_UPLOAD_BASE)/$(RELEASE_OR_SNAPSHOT)s RELEASE_URL_BASE = http://cairographics.org/$(RELEASE_OR_SNAPSHOT)s Perhaps it's even better to copy that file entirely into the pixman tree and tweak it for pixman, as some of the functionality is currently already in pixman's root Makefile.am. Maarten I have moved the files on www.cairographics.org. Below is a patch that changes the Makefile.am to upload to the correct directory. I'd appreciate if someone could take a look to make sure that I didn't overlook something in the patch. Soren From 2e1134c7f53f6e7bcfcb95001d86916533861513 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?S=C3=B8ren=20Sandmann=20Pedersen?= s...@redhat.com Date: Sat, 2 Apr 2011 14:12:12 -0400 Subject: [PATCH] Makefile.am: Put development releases in snapshots directory Up until now, all pixman release, both snapshots and releases were uploaded to the releases directory on www.cairographics.org, but it's better to development snapshots in the snapshots directory. This patch changes Makefile.am to do that. --- Makefile.am |7 +++ 1 files changed, 3 insertions(+), 4 deletions(-) diff --git a/Makefile.am b/Makefile.am index f479a66..658a375 100644 --- a/Makefile.am +++ b/Makefile.am @@ -12,10 +12,10 @@ snapshot: GPGKEY=6FF7C1A8 USERNAME=$$USER -RELEASE_OR_SNAPSHOT = $$(if test x$(CAIRO_VERSION_MINOR) = x$$(echo $(CAIRO_VERSION_MINOR)/2*2 | bc) ; then echo release; else echo snapshot; fi) +RELEASE_OR_SNAPSHOT = $$(if test x$(PIXMAN_VERSION_MINOR) = x$$(echo $(PIXMAN_VERSION_MINOR)/2*2 | bc) ; then echo release; else echo snapshot; fi) RELEASE_CAIRO_HOST = $(USERNAME)@cairographics.org -RELEASE_CAIRO_DIR =/srv/cairo.freedesktop.org/www/releases -RELEASE_CAIRO_URL =http://cairographics.org/releases +RELEASE_CAIRO_DIR =/srv/cairo.freedesktop.org/www/$(RELEASE_OR_SNAPSHOT)s +RELEASE_CAIRO_URL =http://cairographics.org/$(RELEASE_OR_SNAPSHOT)s RELEASE_XORG_URL = http://xorg.freedesktop.org/archive/individual/lib RELEASE_XORG_HOST =$(USERNAME)@xorg.freedesktop.org RELEASE_XORG_DIR = /srv/xorg.freedesktop.org/archive/individual/lib @@ -83,7 +83,6 @@ release-tag: git tag -u $(GPGKEY) -m $(PACKAGE) $(VERSION) release $(PACKAGE)-$(VERSION) release-upload: release-check $(tar_gz) $(tar_bz2) $(sha1_tgz) $(sha1_tbz2) $(md5_tgz) $(gpg_file) - mkdir -p releases scp $(tar_gz) $(sha1_tgz) $(gpg_file) $(RELEASE_CAIRO_HOST):$(RELEASE_CAIRO_DIR) scp $(tar_gz) $(tar_bz2) $(RELEASE_XORG_HOST):$(RELEASE_XORG_DIR) ssh $(RELEASE_CAIRO_HOST) rm -f $(RELEASE_CAIRO_DIR)/LATEST-$(PACKAGE)-[0-9]* ln -s $(tar_gz) $(RELEASE_CAIRO_DIR)/LATEST-$(PACKAGE)-$(VERSION) -- 1.7.4 ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] Fixing the runtime error in VS debug build under Windows.
Hi, We have met a run time crash when using a debug VS build under Windows due to potential of using an uninitialized variable. By looking at the code, max_vx could also be an uninitialized value. The fix is to initial that variable to 0. The version we are using is 0.21.4. Thanks for the patch, but this particular issue is already fixed in git master and also in pixman 0.21.6 I believe. I'm curious how this caused crashes? As far as I can tell, the variable is not actually used uninitialized - it was only a compiler warning. Thanks, Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] Adding TLS for Windows XP
Hi, It turned out that Windows XP does't support __declspec(thread). In order to make a multi-thread safe app using pixman, we tried to add TLS for Windows XP. We have added using TLS primitives and tried to avoid memory leak during cleaning up. I'm of two minds regarding this patch. On the one hand, it's definitely good to get these TLS issues under control on Windows, so thanks for looking at that. On the other hand, I really don't want to export new API that an application has to call simply for correct operation. If you use pixman through cairo, for example, it is currently completely safe to ignore pixman's existence. This is an important guarantee to give because it allows us to update the pixman ABI at a later point without affecting cairo applications. Or suppose some other library has spawned a thread that uses pixman for compositing. In general, the application has no way to know when that thread exited so it can't know when to call these new functions. Since you mention that __declspec(thread) doesn't work on XP, I assume you are using pixman in a dynamically linked DLL. If so, a way to solve your issue without adding new API might be to add a DllMain() function and then clean up the thread local storage when it is called with THREAD_DETACH. Another possibility would be to avoid TLS altogether and instead just use global variables protected by locks. Lock striping could be used to prevent (CPU-)cache thrashing. Regarding which variation of TLS works on which versions of Windows, there are several separate variables: - Whether the binary is statically or dynamically linked - Which compiler is used (MSVC vs. MinGW32 4.5 vs. MinGW32 4.4) - Which operating system is used (XP/2003 vs. later) - Which variation of TLS (__thread, __declspec(thread), DllMain() + TlsAlloc*) - Whether the operating system is 32 or 64 bit It would be really useful to have a table of these combinations showing what works and what doesn't. That will make it much simpler to decide which combinations can actually be reasonably supported without requiring new public API. See also https://bugs.freedesktop.org/show_bug.cgi?id=34842 Regardless of all that, though, this patch clearly needs cleaning up according to the CODING_STYLE document. Thanks, Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] Simplify iterator initializer signatures
Søren Sandmann sandm...@cs.au.dk writes: Here are three patches that simplify the type signature of the iterator initializers. The signature before: void (*pixman_iter_init_func_t) (pixman_implementation_t *imp, pixman_iter_t *iter, pixman_image_t *image, int x, int y, int width, int height, uint8_t *buffer, iter_flags_t flags); void (*pixman_iter_init_func_t) (pixman_implementation_t *imp, pixman_iter_t *iter); They achieve this by passing all the information in the iterator itself rather than as arguments. This means the role of _pixman_implementation_{src,dest}_iter_init() changes. Before they were wrappers around the implementation functions; now they are convenience functions that store the various parameters in the iterator and then call into the implementation. Sorry, this mail got sent prematurely. I meant to also include a diffstat: pixman-bits-image.c | 20 +--- pixman-conical-gradient.c |7 + pixman-general.c | 56 -- pixman-implementation.c | 46 ++--- pixman-linear-gradient.c | 16 - pixman-private.h | 51 + pixman-radial-gradient.c |7 + pixman-solid-fill.c | 17 + pixman-sse2.c | 27 ++ 9 files changed, 84 insertions(+), 163 deletions(-) Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [cairo] Planar YUV support
Siarhei Siamashka siarhei.siamas...@gmail.com writes: On Wed, Mar 2, 2011 at 4:16 PM, Soeren Sandmann sandm...@cs.au.dk wrote: Siarhei Siamashka siarhei.siamas...@gmail.com writes: I'm not a big fan of let's make this totally universal and future proof approach if only a very small fraction of this functionality is going to be actually used. Moreover, I suspect that trying to be too general was responsible for slowing down the original Planar YUV support plan. Part of what was derailing that plan may have been my insisting on being precise about how these formats fit into the image processing pipeline, including how it related to gamma correction and other colorspace issues. I still think this is important. However, we can probably leave out some specific features, if there is a credible story about how to do them later. The pipeline as it is now: 1 convert image sample to a8r8g8b8 2 extend sample grid in all directions according to repeat 3 interpolate between sample according to filter 4 transform 5 resample 6 combine 7 store What is the difference between 3 interpolate between sample according to filter and 5 resample? The output of stage 3 is an image that is defined on all of the real plane. There are no pixels any more, so there is no question about what stage 4, transform, means. Stage 5 converts back to pixels by point sampling. Of course, pixman is implemented starting from the destination, so thefunction of the real plane from stage 3 is never actually computed. Because of the sampling, we only need the value of that function at discrete locations. (Better gradient and image quality would involve improving stage 5 to do something better than point sampling). To add support for potentially subsampled YUV, some additional stages have to be inserted before the first: -2 interpolate subsampled components of YUV to get the same resolution as the Y plane -1 if the format is planar, stitch together components to form YUV pixels 0 convert to sRGB Stage -2 is important because the filter used in that interpolation should probably be user-specifiable eventually, which has the implication that whatever simple support is added first, it needs to be clear what filter precisely is being used. Stage 0 is a color space conversion and need to eventually be configurable too, which means it has to be specified which matrix is being used. I'm not totally sure about stage -2. So source interpolation is going to happen twice in the pipeline, once when getting rid of subsampling, and the second time when applying transform? To me it looks more natural to just start with the transform, for each pixel in the destination space get a transformed fractional coordinate in the source image, look it up in the source YUV grid and interpolate each color component for each color plane separately according to the selected filter, and finally convert interpolated YUV color to a8r8g8b8 according to the used color matrix. Yes, I thought those two stages were the same for a long time, to. In fact, in this mail: http://lists.freedesktop.org/archives/cairo/2010-May/019876.html I argue that they are the same, and that the pipeline should look like this: * Widen to 8 bit components * Extend * Interpolate between samples according to filter * Transform * Convert to RGB coding * Resample * Combine * Store which also what you are saying. However, as mentioned here: http://lists.cairographics.org/archives/cairo/2010-June/020232.html I have become convinced that this is wrong and the undoing the chroma subsampling is something different than interpolation happening on sRGB pixels, although it may be that they can be combined in some special cases. In that mail I say that the pipeline should (eventually) look like this: extend widen chroma reconstruct = upsample filter convert to ar'g'b' = \ linearize = | color space convert premultiply = / interpolate = interpolation filter transform resample= resampling filter, plus sampling rate combine convert to dest format store The basic argument is that the interpolation filtering should be done in premultiplied RGB (linear, ideally, but sRGB for now), whereas the chroma reconstruction obviously has to be done on chroma channels. Therefore, these two operations have to be considered distinct. Ie., since a YUV image is generated by this process: 1. Start with sRGB 2. Convert to YUV 3. Filter and subsample chroma if we are to do some operation on the image in sRGB, reversing this process as the first step can't possibly be incorrect: 1. Reconstruct chroma samples 2. Convert to sRGB 3. Interpolate sRGB samples 4. Transform And so if the alternative process: 1. Interpolate YUV samples 2
Re: [Pixman] [cairo] Planar YUV support
Siarhei Siamashka siarhei.siamas...@gmail.com writes: I'm not a big fan of let's make this totally universal and future proof approach if only a very small fraction of this functionality is going to be actually used. Moreover, I suspect that trying to be too general was responsible for slowing down the original Planar YUV support plan. Part of what was derailing that plan may have been my insisting on being precise about how these formats fit into the image processing pipeline, including how it related to gamma correction and other colorspace issues. I still think this is important. However, we can probably leave out some specific features, if there is a credible story about how to do them later. The pipeline as it is now: 1 convert image sample to a8r8g8b8 2 extend sample grid in all directions according to repeat 3 interpolate between sample according to filter 4 transform 5 resample 6 combine 7 store To add support for potentially subsampled YUV, some additional stages have to be inserted before the first: -2 interpolate subsampled components of YUV to get the same resolution as the Y plane -1 if the format is planar, stitch together components to form YUV pixels 0 convert to sRGB Stage -2 is important because the filter used in that interpolation should probably be user-specifiable eventually, which has the implication that whatever simple support is added first, it needs to be clear what filter precisely is being used. Stage 0 is a color space conversion and need to eventually be configurable too, which means it has to be specified which matrix is being used. There is also the question of how to *write* to a subsampled YUV image. I don't particularly like read-only image formats, but writing to YUV is not simple when you consider clip rectangles, because subsampling involves a filter that probably extends outside the clip boxes. What Andrea is getting at, is presumably how to specify image formats in the API. A fully general API like he suggests is perhaps interesting at some point, but I agree it shouldn't be prerequisite for getting some YUV support in. Surely an alternative solution for html5 video fallbacks is to do YUV-RGB conversion of video frames outside of cairo/pixman, and only let cairo/pixman handle the RGB data. The only real problem with this solution is that such multi pass data processing is somewhat slower. We are speaking about at least tens of percents of performance here. And pixman support for native single pass YUV-RGB conversion combined with scaling could help. YUV conversion is definitely within the scope of pixman, both YUV-RGB and RGB-YUV. However, I'm not sure the existing YUV formats are very well though out. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 3/7] C variant of bilinear scaled 'src_8888_8888' fast path
Hi, Most of this series looks good to me, except that this: static force_inline uint32_t +bilinear_interpolation (uint32_t tl, uint32_t tr, + uint32_t bl, uint32_t br, + int distx, int wt, int wb) +{ is a duplication of the code from pixman-bits-image, except that disty has been split into wt and wb so that it can deal with the top and bottom edges of the image. The bits_image_fetch_bilinear_no_repeat_() function handles the same problem by simply passing 0 for tl/tr or bl/br. Could it be done the same way here? Alternatively, if there is a good reason for the split, presumably that same reason applies to the code in pixman-bits-image.c. The pixman-fast-path.h file also contains a duplicated version of repeat where the only difference is that the order of the arguments is reversed and one uses MOD(), the other while() to do repeating. This duplication was not introduced by you, but it would be useful to share that code as well. Maybe the file pixman-fast-path.h should be renamed to pixman-common.h or pixman-inlines.h since it would no longer be specific to pixman-fast-path.c Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 5/7] C variant of bilinear scaled 'src_8888_n_8888' fast path
Siarhei Siamashka siarhei.siamas...@gmail.com writes: From: Siarhei Siamashka siarhei.siamas...@nokia.com Serves no real practical purpose other than testing solid mask support in bilinear scaling main loop template. If these functions don't serve any practical purpose, is there a reason to have them? The .text segment of pixman is already getting large. If these functions end up in the middle of code that is actually used, they would have to be paged in at application startup time, so they do have a potential performance and (slight) memory cost. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 1/7] Main loop template for fast single pass bilinear scaling
Siarhei Siamashka siarhei.siamas...@gmail.com writes: +/* + * Identify 5 zones in each scanline for bilinear scaling. Depending on + * whether 2 pixels to be interpolated are fetched from the image itself, + * from the padding area around it or from both image and padding area. + */ +static force_inline void +bilinear_pad_repeat_get_scanline_bounds (int32_t source_image_width, + pixman_fixed_t vx, + pixman_fixed_t unit_x, + int32_t * left_pad, + int32_t * left_tz, + int32_t * width, + int32_t * right_tz, + int32_t * right_pad) What does tz stand for in these names? Presumably, the z is for zone, but what about the t? Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 0/3] Pixman MIPS DSPASE1
Hi, Thanks for picking up the MIPS work. There are some comments from last time from Siarhei and myself that I don't think have been addressed. See these mails: http://lists.freedesktop.org/archives/pixman/2010-December/000773.html http://lists.freedesktop.org/archives/pixman/2010-September/000496.html - In Siarhei's testing, the new over_n_8_() on MIPS32r2 was slower than the C fast path. From http://lists.freedesktop.org/archives/pixman/2010-December/000773.html : One of the reasons for such a slowdown in gnome-system-monitor test is that it uses 'over_n_8_' operation with the mask where 96.5% of values are zero. And your MIPS32R2 optimized code does not handle these special cases, always taking the slowest path [1]. Ie., the way to make over_n_8_() fast is to skip compositing whenever the mask is 0x00 or 0xff. The same is likely also worthwhile even in the SIMD versions since memory access is so expensive. From http://lists.freedesktop.org/archives/pixman/2010-September/000496.html : - The patch should be split such that one commit adds the MIPS32r2 part and one adds the DSPASE part - Coding style: - Please use /* */ comments - Indents are four spaces - Put a space before parentheses - Don't leave in commented-out code like this: // b = _pixman_implementation_fill(imp-delegate, bits, stride, bpp, x, y, width, height, xor); And finally, while the lowlevel-blt benchmarks are convenient to use, they are also synthetic, it is also important to test the performance with real-world workloads such as those found in the cairo perf traces. Thanks, Soren Veli-Matti Valtonen veli-matti.valto...@movial.com writes: I started working on this optimizing for MIPS32R2 code originally (Based on the patch by Beloev), but the performance increases seem to be relatively similar to what over_n_8_ shows. The dspase is much more promising in this regard. It rather leaves me wondering if the mips32r2 should not be included. It might however be related to the test system, which has a MIPS 74K core. The original I assume was worked on with a MIPS 24K. I used pixman-arm-common.h for the assembler binding macros, which is the reason for the 'ARM' found in the glue. Compiling the code will result in the gcc producing Warnings about macro expansion, it'd be nice not to have these, but fixing them would have a (slight) negative effect readability. PATCH 1 is the original patch by Georgi Beloev, but modified to apply against pixman head. Implemented: Scanline add, out reverse, over fast path: over_n_8_ add__ add_n_ Test hardware: Broadcom BCM4718, 453MHz, MIPS 74K V4.0 (Inc. DSP Rev2, MIPS16), Little Endian All the test program builds used CFLAGS=-O2 -mdsp -mips32r2 reference memcpy speed = 176.0MB/s (44.0MP/s for 32bpp fills) Optimizations disabled: --disable-mips32r2 --disable-mips-dspase1 over_n_8_ = L1: 6.16 L2: 5.34 M: 5.35 ( 19.24%) HT: 4.78 VT: 4.62 R: 4.55 RT: 2.99 ( 28Kops/s) add__ = L1: 18.11 L2: 10.15 M: 9.98 ( 45.33%) HT: 14.80 VT: 13.36 R: 13.41 RT: 6.17 ( 46Kops/s) add_n_ = L1: 14.26 L2: 10.30 M: 10.38 ( 23.59%) HT: 8.05 VT: 7.64 R: 7.63 RT: 4.05 ( 33Kops/s) MIPS32R2: --disable-mips-dspase1 over_n_8_ = L1: 6.17 L2: 5.62 M: 5.56 ( 20.33%) HT: 5.00 VT: 4.83 R: 4.76 RT: 3.33 ( 30Kops/s) MIPS DSPASE: over_n_8_ = L1: 9.76 L2: 7.89 M: 7.93 ( 27.11%) HT: 7.04 VT: 6.84 R: 6.63 RT: 4.06 ( 34Kops/s) add__ = L1: 117.36 L2: 20.67 M: 23.22 (105.50%) HT: 17.40 VT: 15.96 R: 13.81 RT: 6.48 ( 47Kops/s) add_n_ = L1: 145.84 L2: 28.23 M: 31.11 ( 70.66%) HT: 22.95 VT: 18.54 R: 19.99 RT: 8.93 ( 50Kops/s) Scanline ops benchmarked using low-level-blit: I selected these ops by adding a printf to the scanline ops, and finding one that triggers it, if there is a more convenient way to benchmark these ops, I failed to find it. Optimizations disabled: add_8_8_8 = L1: 3.31 L2: 5.25 M: 5.16 ( 11.73%) HT: 3.61 VT: 3.60 R: 3.53 RT: 1.77 ( 18Kops/s) add__1555 = L1: 6.51 L2: 5.32 M: 5.34 ( 18.20%) HT: 4.05 VT: 3.96 R: 3.94 RT: 2.21 ( 22Kops/s) outrev_n_8_ = L1: 6.33 L2: 5.25 M: 5.16 ( 17.60%) HT: 4.11 VT: 4.02 R: 3.97 RT: 2.23 ( 22Kops/s) over__n_0565 = L1: 2.83 L2: 3.33 M: 3.21 ( 11.54%) HT: 2.73 VT: 2.69 R: 2.68 RT: 1.67 ( 17Kops/s) over_n_ = L1: 7.45 L2: 6.65 M: 6.66 ( 15.14%) HT: 5.65 VT: 5.43 R: 5.43 RT: 3.35 ( 30Kops/s) MIPS DSPASE: add_8_8_8 = L1: 8.81 L2: 7.67 M: 7.53 ( 17.11%) HT: 4.62 VT: 4.68 R: 4.50 RT: 1.97 ( 19Kops/s) add__1555 = L1: 9.07 L2: 7.27 M: 7.29 ( 24.87%) HT: 5.09 VT: 4.95 R: 4.93 RT: 2.50 ( 23Kops/s) outrev_n_8_ = L1: 8.48 L2: 6.82 M: 6.88 (
Re: [Pixman] Very Large PNGs
Frederik Ramm frede...@remote.org writes: Dear all, I posted this to the cairo list but it occurred to me that this list is the appropriate place. I tried to convert an SVG file to a very large PNG with rsvg-convert. If the bitmap requested is larger than about 23k x 23k pixels, it is returned an invalid surface (where if you query the state it says out of memory), happily ignores that and continues to render which then results in no output. I then delved into the cairo library and found that it uses pixman to create the image; and pixman refuses to create an image that takes more than INT32_MAX raw bytes. With one pixel using 4 bytes, that means that an image with 23.000 x 23.000 pixels is still ok, but if it becomes any larger, that was it. What pixman is trying to avoid is integer overflows. If you pass an overflowed number to malloc(), you'll get buffer overruns which could become security issues Given that size_t on a 32 bit machine is 32 bits, the limit would be 4 GB in any case, but just to be on the safe side, pixman limits it to 2GB so that the number will fit in a signed integer. I think it should be safe to use size_t in create_bits() along with new functions pixman_multiply/add_overflows_size() if you want to fix this. However, not that pixman is still limited to 16.16 bits of coordinate space in various places, so you wouldn't gain that much. This 16 bit limitation is something that we would like to get rid of, however. Out of curiosity, exactly how large are the images you are trying to create? Which was a bit sad for me since the machine I was running this on could easily have created an image ten times as big, if only pixman/cairo had supported that. I briefly toyed with adding 64-bit support to the whole bundle but it became quickly apparent that this is more than an afternoon's job. I also tried to squeeze at least a little more out of the combo by using 24-bit images instead of 32-bit (no difference since 24-bit images are internally stored with 32 bit), and even the deprecated 16-bit data type (doesn't work since you cannot create a surface for that). In cairo 1.10, you actually can create a 16 bit image, and it is stored internally in a 16 bit data type. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] Win32 fixes and improvements
Andrea Canciani ranm...@gmail.com writes: I improved the patchset a little (I fixed the library makefile for real and silenced the test warnings) and pushed it as a git branch here: http://cgit.freedesktop.org/~ranma42/pixman/log/?h=wip/win32 I don't really know anything about Windows build systems, but regarding the snprintf() thing: Can't we just use sprintf()? We know the maximum sizes of everything involved, and in the end it's just a test program so if there's a buffer overflow, it's not the end of the world. The other changes in the branch look okay to me. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [cairo] Compilation error undefined reference to `___tls_get_addr'
Victor Engmark victor.engm...@gmail.com writes: On Tue, Feb 22, 2011 at 12:29 PM, Victor Engmark victor.engm...@gmail.com wrote: Hi guys, I'm trying to create a pixman 0.20.2 package on an old platform, but I get the following output during the compile phase: CCLD a1-trap-test ../pixman/.libs/libpixman-1.so: undefined reference to `___tls_get_addr' Version 0.20.2 should detect automatically if linking fails with thread local support. If you can find out why this clause: AC_MSG_CHECKING(for __thread) AC_LINK_IFELSE([ #if defined(__MINGW32__) !(__GNUC__ 4 || (__GNUC__ == 4 __GNUC_MINOR__ = 5)) #error This MinGW version has broken __thread support #endif from configure.ac finishes successfully, that would be very helpful. As mentioned, you can compile with -DPIXMAN_NO_TLS, but that means pixman can't be used from multiple threads. If __thread simply doesn't work on your platform, we need some kind of hack in configure.ac to detect it and fall back to pthreads. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 0/3] Some clean-ups of the test directory
Siarhei Siamashka siarhei.siamas...@gmail.com writes: On Thursday 10 February 2011 20:22:38 Søren Sandmann wrote: The following patches add a new directory demos and move all the GTK+ based test programs there. This allows the Makefiles in both test and demos to become much simpler with less redundancy. I'm not particularly happy about the demos name since the GTK+ tests aren't really demos, but I can't think of anything better. Suggestions are appreciated. It's a bit late comment, but eventually adding some real demo(s) which would display some nice looking animation and scare the users with huge FPS might be a good idea :) Yes, I think various types of real demos would be a good idea, both showing off performance and features. There is this branch: http://cgit.freedesktop.org/~sandmann/pixman/log/?h=parrot which makes the composite-test a little more interesting to look at, and shows better how the compositing operators work. Screenshot: http://www.daimi.au.dk/~sandmann/composite-test.png Well, that is if pixman actually needs any kind of such marketing stuff. I have always thought that pixman should eventually be used in more places than just the X server and cairo. For example, some features that have been proposed, such as a floating point pipeline, a JIT compiler and a shader language API, would let pixman serve the same role as Core Image does on Mac OS X. And to get people to use it, demos and other types of marketing would be useful, including getting a website and a logo. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [cairo] pixman: New ARM NEON optimizations
Siarhei Siamashka siarhei.siamas...@gmail.com writes: Regarding the (b) part, probably as a side effect of current implementation, right now it is possible to do some operations with images having non-premultiplied alpha: src_img = pixman_image_create_bits ( PIXMAN_x8b8g8r8, width, height, src, stride); msk_img = pixman_image_create_bits ( PIXMAN_a8b8g8r8, width, height, src, stride); dst_img = pixman_image_create_bits ( PIXMAN_a8r8g8b8, width, height, dst, stride); pixman_image_composite (PIXMAN_OP_SRC, src_img, msk_img, dst_img, 0, 0, 0, 0, 0, 0, width, height); We only need to wrap the same a8r8g8b8 buffer into x8r8g8b8 and a8r8g8b8 pixman image, and use the latter as a mask for pixman_image_composite() calls. Any operations which don't need mask themselves can use this trick. By also specifying negative stride, this is useful for example when dealing with the data returned by glReadPixels(). Yeah, this is useful, and it wouldn't directly be possible to do if the equation were changed to (s OP d) LERP_m d. However, a pretty simple way to fix it would be to just add an unpremultiplied format. Benjamin's video patches had this I believe. Using such a format as a destination can be slow because it requires divisions, but Joonas has a number of optimized implementations here: http://cgit.freedesktop.org/~joonas/unpremultiply/tree/ So I find it convenient that we are also allowed to work with masks which are basically interpreted as having a8x24 format. Right, having an a8x24 format would be another way to solve the problem. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] Image scaling with bilinear interpolation performance
김태균 podai...@gmail.com writes: original code : r = a*t0 + b*t1 + c*t2 + d*t3 (in 24 bits precision) optimized code : r' = a*(t0 8) + b*(t1 8) + c*(t2 8) + d*(t3 8) (in 16 bits precision) where t0 + t1 + t2 + t3 = 0x1 Now we split t into two terms u, v where u is upper 8 bits of t and v is lower 8 bits of t. (note that t0 = u0*256 + v0, t0 8 = u0) So, r' = a*u0 + b*u1 + c*u2 + d*u3 r = a*(u0*256 + v0) + b*(u1*256 + v1) + c*(u2*256 + v2) + d*(u3*256 + v3) = 256*(a*u0 + b*u1 + c*u2 + d*u3) + a*v0 + b*v1 + c*v2 + d*v3 = 256*r' + a*v0 + b*v1 + c*v2 + d*v3 Error would be e = (r - (r' 8)) 16 = (r - 256*r') 16 = (a*v0 + b*v1 + c*v2 + d*v3) 16 Each value a, b, c and d can be 0xff at most, So max(e) = (0xff*(v0 + v1 + v2 + v3)) 16 = (0xff*max(v0 + v1 + v2 + v3)) 16 max(v0 + v1 + v2 + v3) = 0x300 (because lower 8 bits of t0 + t1 + t2 + t3 should be 0x00) So max(e) = (0xff*0x300) 16 = 2 But this does not satisfy rule 5 as you mentioned Thanks for doing this analysis. A difference of just 2 would be fine in my opinion, and as you mention the original code was an approximation as well. It would be possible to satisfy rule 5 using a kind of error diffusion, as as demonstrated by this program: static void compute_weights (uint8_t distx, uint8_t disty) { uint32_t distxy, distxiy, distixy, distixiy; int e, t; distxy = distx * disty; distxiy = (distx 8) - distxy; distixy = (disty 8) - distxy; distixiy = 256 * 256 - (disty 8) - (distx 8) + distxy; t = distxy + 0x80; e = (t 0xff00) - distxy; distxy = t 8; distxiy -= e; t = distxiy + 0x80; e = (t 0xff00) - distxiy; distxiy = t 8; distixy -= e; t = distixy + 0x80; e = (t 0xff00) - distixy; distixy = t 8; distixiy -= e; t = distixiy + 0x80; e = (t 0xff00) - distixiy; distixiy = t 8; assert (distxy + distxiy + distixy + distixiy == 256); } int main () { int i, j; for (i = 0; i 256; ++i) { for (j = 0; j 256; ++j) compute_weights (i, j); } } although that does do a bit more arithmetic than your code. Now regarding accuracy. I have added some comments above regarding the potential solid color issue, but this should be relatively easy to address. I'm also a bit worried about one more thing (in the original pixman code too, but let's cover this too while we are discussing accuracy in general). Wouldn't it be a good idea to do shift with rounding for the final value instead of dropping the fractional part? And the 'distx'/'disty' variables are also obtained by right shifting 'ux' by 8 and dropping fractional part, maybe rounding would be more appropriate. Not doing rounding might cause slight image drift to the left (and top) on repeated rescaling, and also slight reduction of average brightness. I agree with that rounding is more appropriate. I think supplying distx and disty as properly rounded 4 bits values to interpolation function is the best choice we have. Analysis on error is some what complicated in this case. Error may be bigger than previous code, at least 15 (I've done some brute force jobs) Rounding to four bits is going to be a quite visible drop in quality though, especially if you zoom more than 16x. With four bits of precision, there will be only 16 different colors in the gradients generated by the filter, which will show up as banding. But maybe it's good enough - 16x scaling is not going to look great with bilinear filtering no matter what. I have only one concern about testing. Supposedly when we get both C and SSE2 implementations, it would be much easier for testing if they produce identical results. Otherwise tests need to be improved to somehow be able to take slight differences into account. I think the requirement of producing same results for both C SIMD(maybe sse2, NEON, mmx) is relatively easy. But SIMD can produce much better result with less time spent, which can be horribly slow with general C implementation. I think it is much desirable to keep both C and SIMD code optimized in spite of producing slightly different results. Having the C and SIMD code produce different results is not a problem in itself, but as Siarhei says, but we would need to make sure the test suite reflects that decision. If we decide to move away from bit-exact testing, we would need to decide on an acceptable deviation from ideal, and then update the tests to verify that both the C and SIMD implementations are within that deviation. For example there could be a reference implementation that computes the bilinear filtering in double precision floating point, adds +/- x%, then converts that range into the target
Re: [Pixman] [PATCH] Add forgotten _mm_empty() calls in the SSE2 fetchers.
Siarhei Siamashka siarhei.siamas...@gmail.com writes: I just wonder whether you have investigated why these missing _mm_empty() calls are not actually causing pixman tests failure? Is it because fetch operation is typically followed by combiner which has _mm_empty()? But SRC combiner should not be using MMX/SSE2. Or no MMX instructions are actually used by these fetchers (only SSE2)? It's because these fetchers don't use any MMX instructions, so the patch is actually pointless. I'll drop it. It may even be worth considering just dropping all MMX instructions from the SSE2 implementation. That would avoid all the emms related problems. The places where we use MMX instructions could just as well use SSE2 - we never take advantage of the fact that it is possible use both SSE and MMX registers at the same time. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] Image scaling with bilinear interpolation performance
김태균 podai...@gmail.com writes: I'm using cairo and pixman to composite images. I tested image scaling performance with bilinear filtering. But it's a little bit slower than I expected. Bilinear filtering is indeed one of the remaining places where pixman could use some performance improvement, so thanks for looking into it. What I understand about image scaling procedure is as follows 1. Allocate(or acquire statically allocated buffer) horizontal source scanline buffer to store interpolated pixels. 2. Fetch a scanline from source image to source scanline buffer with bilinear interpolation. 3. Composite source scanline and destination scanline with operator SRC. 4. Go to next scanline. 5. Free(or release) source scanline buffer. Yes, this is how it works currently, though note that there have been some changes since 0.21.2: we now use iterators to access the images. The overall algorithm is still the same though. My optimization point is 1. It seems possible to avoid fetching by directly combining bilinear interpolated result with destination. 2. Bilinear interpolation can be optimized using double-blend trick with a little bit of precision loss. Optimization 1 can reduce one memory write and one memory read operation. I think it will not affect the performance significantly. However, it can be helpful on some machines with limited cache. Indeed, it would likely be helpful to do this, even on machines with where the temporary scanline fits in cache. There is some discussion in this thread http://lists.freedesktop.org/archives/pixman/2011-January/000934.html about one way to implement this optimization. Optimization 2 is more critical for image scaling with bilinear filtering. I wrote some codes for bilinear_interpolation@pixman-bits-image.c // Optimization 2 static force_inline uint32_t bilinear_interpolation (uint32_t tl, uint32_t tr, uint32_t bl, uint32_t br, int distx, int disty) { int distxy, distxiy, distixy, distixiy; uint32_t rb, ga; distxy = distx * disty; distxiy = (disty 8) - distxy; distixy = (distx 8) - distxy; distixiy = 256*256 - (disty 8) - (distx 8) + distxy; distxy = distxy 8; distxiy = distxiy 8; distixy = distixy 8; distixiy = distixiy 8; /* Red and Blue */ rb = (0x00FF00FF tl)*distixiy + (0x00FF00FF tr)*distixy + (0x00FF00FF bl)*distxiy + (0x00FF00FF br)*distxy; rb = (rb 8) 0x00FF00FF; /* Green and Alpha */ ga = (0x00FF00FF (tl 8))*distixiy + (0x00FF00FF (tr 8))*distixy + (0x00FF00FF (bl 8))*distxiy + (0x00FF00FF (br 8))*distxy; ga = ga 0xFF00FF00; return rb | ga; } Optimization 2 increased the performance by more than twice. But it does not always produce the same result as the original code, which is also an approximation of bilinear interpolation. So it can cause pixel correctness issues on some test cases. I've done some numerical analysis and it appears that at most you would have a difference of 1 for each RGBA value as the original code. Let me know if you need for information / explanation. I'd like to see the analysis that show that the difference is at most 1. If the difference really is that small, then your code above looks like a big performance win. (Although it looks to me like tr and bl may have been swapped in the code). Thanks, Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] Avoid marking images dirty when properties are reset
Andrea Canciani ranm...@gmail.com writes: @@ -502,7 +502,7 @@ pixman_image_set_transform (pixman_image_t * ═ ═ ═ ═ ═image, ═ ═ if (common-transform == transform) ═ ═ ═ ═return TRUE; - ═ ═if (memcmp (id, transform, sizeof (pixman_transform_t)) == 0) + ═ ═if (!transform || memcmp (id, transform, sizeof (pixman_transform_t)) == 0) ═ ═ { ═ ═ ═ ═free (common-transform); ═ ═ ═ ═common-transform = NULL; This change looks like a bugfix unrelated to the commit message. Overall, the patch looks correct. Yeah, or I guess you could call it a new feature; either way, I'll split it into a separate commit before pushing. Thanks, Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
[Pixman] [PATCH] Avoid marking images dirty when properties are reset
When an image property is set to the same value that it already is, there is no reason to mark the image dirty and incur a recomputation of the flags. --- pixman/pixman-image.c | 20 +++- 1 files changed, 19 insertions(+), 1 deletions(-) diff --git a/pixman/pixman-image.c b/pixman/pixman-image.c index e91d87c..9103ca6 100644 --- a/pixman/pixman-image.c +++ b/pixman/pixman-image.c @@ -502,7 +502,7 @@ pixman_image_set_transform (pixman_image_t * image, if (common-transform == transform) return TRUE; -if (memcmp (id, transform, sizeof (pixman_transform_t)) == 0) +if (!transform || memcmp (id, transform, sizeof (pixman_transform_t)) == 0) { free (common-transform); common-transform = NULL; @@ -511,6 +511,12 @@ pixman_image_set_transform (pixman_image_t * image, goto out; } +if (common-transform + memcmp (common-transform, transform, sizeof (pixman_transform_t) == 0)) +{ + return TRUE; +} + if (common-transform == NULL) common-transform = malloc (sizeof (pixman_transform_t)); @@ -535,6 +541,9 @@ PIXMAN_EXPORT void pixman_image_set_repeat (pixman_image_t *image, pixman_repeat_t repeat) { +if (image-common.repeat == repeat) + return; + image-common.repeat = repeat; image_property_changed (image); @@ -579,6 +588,9 @@ PIXMAN_EXPORT void pixman_image_set_source_clipping (pixman_image_t *image, pixman_bool_t clip_sources) { +if (image-common.clip_sources == clip_sources) + return; + image-common.clip_sources = clip_sources; image_property_changed (image); @@ -594,6 +606,9 @@ pixman_image_set_indexed (pixman_image_t *image, { bits_image_t *bits = (bits_image_t *)image; +if (bits-indexed == indexed) + return; + bits-indexed = indexed; image_property_changed (image); @@ -656,6 +671,9 @@ PIXMAN_EXPORT void pixman_image_set_component_alpha (pixman_image_t *image, pixman_bool_t component_alpha) { +if (image-common.component_alpha == component_alpha) + return; + image-common.component_alpha = component_alpha; image_property_changed (image); -- 1.6.0.6 ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [cairo] pixman: New ARM NEON optimizations
Bill Spitzak spit...@gmail.com writes: Soeren Sandmann wrote: (1) The equation would be (src OP dst) LERP_mask dst and not (src IN mask) OP dst (2) The RGB channels of Alpha-only images would be considered to be the same as the alpha channel, and not 0 as they are now. For example, a 0xb9 pixel in an a8 image would be considered equivalent to 0xb9b9b9b9 and not to 0xb900. That is, they would be considered a translucent white rather than a translucent black. Why not just try making these changes directly to Render? This is pretty much what Cairo did and nobody complained. Panicking about back-compatibility when it is likely that nobody relies on the current behavior is silly. The only software I know of using Render for anything other than OVER is Cairo, and it must be avoiding all the cases where changing this would make a difference because it does not match it's rules anyway. Well, Qt exposes these operators to applications, and there is at least one place where it depends on the existing semantics for SRC (unless I'm misreading the code). There are also drivers, including the binary NVIDIA driver, who have implemented the existing semantics already. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 4/8] Better support for NONE repeat in nearest scaling main loop template
Siarhei Siamashka siarhei.siamas...@gmail.com writes: From: Siarhei Siamashka siarhei.siamas...@nokia.com Scaling function now gets an extra boolean argument, which is set to TRUE when we are fetching padding pixels for NONE repeat. This allows to make a decision whether to interpret alpha as 0xFF or 0x00 for such pixels when working with formats which don't have alpha channel (for example x8r8g8b8 and r5g6b5). Another way to do this would be to clip away zeros up front for source images. For example, a new function pointer could be added to the image struct: clip_zeros (pixman_image_t *image, pixman_region_t *region) that would clip away those parts of region where the image is known to be zero. Then, in pixman.c, this function could be called for both source and mask if the operator was one where zeros had no effect. That would benefit all operations and avoid cluttering these macros. It would also cause the SAMPLES_COVER_CLIP to be set in more cases. I'm fine with this patch as is too though. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 3/8] Support for a8 and solid mask in nearest scaling main loop template
Siarhei Siamashka siarhei.siamas...@gmail.com writes: +#define SIMPLE_NEAREST_A8MASK_FAST_PATH_NORMAL(op,s,d,func) \ +#define SIMPLE_NEAREST_SOLIDMASK_FAST_PATH_NORMAL(op,s,d,func) Is there any reason to not use A8_MASK and SOLID_MASK instead? Those names read better to me. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] pixman-conical-gradient.c is missing from GNU make + MSVC build
Kirill Tishin si...@bk.ru writes: pixman-conical-gradient.c is missing from GNU make + MSVC build (pixman/Makefile.win32) (in release 0.21.4). Which leads to unresolved symbol _pixman_conical_gradient_iter_init when compiling cairo later on. Thanks, I have added it in master. The win32 Makefile is unfortunately not really maintained by anyone - if you or anyone else is interested in maintaining a build system for MSVC, that would welcome. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 2/5] Add SSE2 fetcher for x8r8g8b8
Siarhei Siamashka siarhei.siamas...@gmail.com writes: Are we going to have this new code performing linear search in this array once per each compositing operation now? It may be bad for performance unless some kind of caching (at the time of pixman_image_t creation?) can be added later. Well for now, there are only three fetchers, so a cache seems like overkill, but yes, some sort of caching could be added. It could be done similarly to the fast path cache, where the cache is thread private and indexed by (format, iter_flags, image_flags) or something like that. Then it wouldn't have to happen at image creation time and there would be no issues with locking the images. Also is there a plan to provide SSE2 optimized store functions in addition to fetchers later? I don't have any plans personally, but if someone wants to write some, that would certainly be useful. Anyway, with such changes coming, I see more and more reasons to avoid general compositing path in pixman as much as possible :) I'm not sure I follow? Certainly, if the general path is getting faster, that would mean there are fewer reasons to avoid it? Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] Move fallback decisions from implementations into pixman-cpu.c.
Siarhei Siamashka siarhei.siamas...@gmail.com writes: By the way, fallbacks may need to be tweaked a bit to be sure that really the fastest code is selected. For example, after the introduction of faster fetchers [1], now the performance of 'over__' operation (translucent case) with the nearest scaling of the source image from 2000x2000 to approximately same size looks like this: C fetcher + C combiner: ~76.71 MPix/s C fast path:~106.47 MPix/s C fetcher + SSE2 combiner: ~182.65 MPix/s SSE2 fast path: ~270.57 MPix/s The important part is that C fetcher + SSE2 combiner is now faster than full C fast path, which performs everything in a single pass, but without SSE2. So when SSE2 combiners are available, it makes sense to deactivate nearest scaling C fast paths for OVER operator. Not everything is so simple though, because this C fast path has special processing for fully transparent or opaque pixels, and may in some cases be actually better. But it is clearly slower if the performance of the worst case is important. Something similar may apply to PPC Altivec optimizations. Because there are only Altivec combiners but not full Altivec fast paths, the existing C fast paths will be executed for some compositing operations, preventing the use of Altivec combiners. Indeed, this is increasingly a problem. Another variation is if there is a SIMD fast path that can deal with affine transformations, and a general fast path specialized for scaling transformations. Which one do you pick for a scaling operation? It isn't obvious. Or, suppose there is an MMX fast path for over__, but also a noop source iterator for a8r8g8b8. Then the general path would degenerate to just calling sse2_combine_over_u() in a loop with no fetching, which is likely faster than mmx_composite_over__(); I don't have a good answer, but here are some random ideas: - Classify compositing routines in terms of performance. Ie., try and statically associate each compositing routine and fetcher with a number, then pick the one that should be fastest. - If the fast path cache is empty for the operation in question, try all possibilities and store the best one in the cache. This will probably be unpleasant and complicated. - Tweak the fallbacks so that things work in practice. This is ugly and fragile. - JIT compiler. This is a good solution, but a lot of work. It isn't a huge problem yet, but it will only get worse. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] warnings in pixman 0.21.4
Rolland Dudemaine roll...@ghs.com writes: I'm not sure what a clever compiled would do with a value that doesn't belong to the enum is casted to the enum, but I would expect it to complain that the value is not defined in the enum, resulting in a different but nonetheless replacement warning. Such a compiler would be wrong, in my opinion. In C (though not C++), it is perfectly legal to mix enums and ints interchangably. It *might* be reasonable for a compiler to warn if you do so without casting, but if it warns *with* the cast, it's just broken. I'm not sure what the issue is with adding those dummy ones to the enum. If your concern is that you cannot count the number of elements in the enum, then maybe one entry should be added to define a _MAX value. The issue is that pixman.h is a public header containing API that we are committed to maintain for the forseeable future. The PIXMAN_solid etc. values are internal API that shouldn't be used by users of pixman. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 0/3] ARM: fixes and workarounds for using pixman with QEMU
Andrea Canciani ranm...@gmail.com writes: Patch 3/3 looks like an hack... is it possible to avoid parsing auxv if its content is not correct/meaningful? Otherwise I'd rather attempt a clean and correct implementation of cpu features detection using SIGILL. In the end, how to do CPU detection on ARM is Siarhei's call, but IMO, the auxv parsing hack seems much more likely to be robust than trying to guard against all possible signal issues. An important advantage to the auxv parsing is that we wouldn't depend on what other signal handling the rest of the process might have set up. It doesn't even matter whether POSIX says it's possible in theory, because if we are relying on a close reading of some standard, then it's pretty likely that we or someone else read it wrong or implemented it wrong. For example, I still don't know exactly what was causing the SIGILL loop mentioned here: http://lists.freedesktop.org/archives/pixman/2010-December/000789.html The auxv parsing may be a hack, but it's also something that would only be used in the qemu case, so few people would be affected even if there were a bug in it. I'm going through the documentation you linked in http://lists.freedesktop.org/archives/pixman/2010-December/000786.html and I think that the portability issue can be fixed. Would SIGILL be considered a viable alternative if it was done in a thread-safe way? How much portability do we want to provide? In particular, should we only assume C90 or can we assume that sigsetjmp/siglongjmp are available (they are part of POSIX.1-1988). The answer to that question depends on how many platforms are lacking the support, how many users and developers those platforms have, and how much work it is to support an alternative, so it's difficult to give a general answer. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 0/18] Add iterators to images
Siarhei Siamashka siarhei.siamas...@gmail.com writes: 15 files changed, 743 insertions(+), 470 deletions(-) On the other hand, I'm not so happy about the implementation. The number of lines of code as well as the complexity has increased, and the performance seems to have dropped a little bit in some of the cases. The performance numbers you posted seem very close to noise to me; in some cases the new code is faster. Considering that this was measured with the CPU specific fast paths turned off, most users won't actually see even this small slowdown. I really want to see some practical benefits which depend on these added iterators in the near future. It isn't clear to me what specifically you are suggesting. One practical benefit of the patch set is that it fixes the bug where destination properties aren't ignored as they should be. This bug could probably be fixed some other way, but it doesn't seem to be possible with separating source- and destination accessors in some way. This separation is where most of the additional new lines are coming from. In fact, as far as I can tell, the iterators make the code _less_ complex. Compare the new loop in pixman-general.c: for (i = 0; i height; ++i) { uint32_t *s, *m, *d; m = mask_iter.get_scanline (mask_iter, NULL); s = src_iter.get_scanline (src_iter, m); d = dest_iter.get_scanline (dest_iter, NULL); compose (imp-toplevel, op, d, s, m, width); dest_iter.write_back (dest_iter); } with the old one: for (i = 0; i height; ++i) { /* fill first half of scanline with source */ if (fetch_src) { if (fetch_mask) { /* fetch mask before source so that fetching of source can be optimized */ fetch_mask (mask, mask_x, mask_y + i, width, (void *)mask_buffer, 0); if (mask_class == SOURCE_IMAGE_CLASS_HORIZONTAL) fetch_mask = NULL; } if (src_class == SOURCE_IMAGE_CLASS_HORIZONTAL) { fetch_src (src, src_x, src_y + i, width, (void *)src_buffer, 0); fetch_src = NULL; } else { fetch_src (src, src_x, src_y + i, width, (void *)src_buffer, (void *)mask_buffer); } } else if (fetch_mask) { fetch_mask (mask, mask_x, mask_y + i, width, (void *)mask_buffer, 0); } if (store) { /* fill dest into second half of scanline */ if (fetch_dest) { fetch_dest (dest, dest_x, dest_y + i, width, (void *)dest_buffer, 0); } /* blend */ compose (imp-toplevel, op, (void *)dest_buffer, (void *)src_buffer, (void *)mask_buffer, width); /* write back */ store ((dest-bits), dest_x, dest_y + i, width, (void *)dest_buffer); } else { /* blend */ compose (imp-toplevel, op, bits + (dest_y + i) * stride + dest_x, (void *)src_buffer, (void *)mask_buffer, width); } } Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 0/18] Add iterators to images
Iterators gets initialized with a buffer which can always contain at least width pixels. This is sufficient for current image types, but future image types might need additional space. Would it be possible to have the general_composite_rect ask to the iterator about how much memory it will need (and provide a buffer with at least this much memory)? If future images need something like this, the iterators can be extended to support it. It's always difficult to predict what infrastructure might be needed in the future. Or would it be better to have the iterator allocate that memory internally upon initialization? In this case, we should probably have the initializer return success/error and also have some kind of iter_fini() Right, that's another possibility, although if memory allocation fails, the image might also simply install a noop() fetcher. If we get out-of-memory errors during rendering, we generally don't guarantee the results, except that it obeys they clipping rules. This is something pixman has inherited from the X server. Apart from this, I skimmed through the patches and noticed that sometimes a change is fixed or improved by a following patch (example: patch 15/18 is immediately generalized by patch 16/18). Is this intentional? Yeah, generally, I'm trying to avoid patches being too clairvoyant. In principle, new infrastructure shouldn't be introduced until something else makes it necessary. It isn't always possible or convenient to do it that way, though. Wrt. localized alpha: I thought I understood the definition after the first two lines, then the OVER/ADD explanation confused me. After reading again everything, I was convinced again that I had understood it correctly the first time. I don't know if this is actually better (I still think that it needs a better wording), but I'd suggest something like: +/* Separable alpha is when the alpha channel is used only to compute the + * alpha value of the destination. This means that the computation of the + * RGB values of the result is independent of the alpha value. + * + * For example, the OVER operator has separable alpha for the destination, + * because the RGB values of the result can be computed without knowing + * the destination alpha. Similarly, ADD has separable alpha for both source + * and destination because the RGB values of the result can be + * computed without knowing the alpha value of source or destination. I'm proposing the separable term because it is consistent with separable blend modes (compositing operators in which each channel can be computed individually), whereas localized didn't tell me anything until I read the explanation. I think your new description is better, but I don't like using the word separable for two things that are not really the same. A separable operator is one that can be applied separately for each R, G, and B channel, whereas the localized alpha means that you can compute the R, G, and B channels independent from the alpha channel in question. I have pushed a new branch here: http://cgit.freedesktop.org/~sandmann/pixman/log/?h=iterators4 that shold take care of all your other comments. In addition, I also squashed together moving the gradient initializers; there didn't seem to be much gained from having them in separate commits. Patches to come in follow-up emails. Thanks for the review, Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] Making ref counting thread safe
Siarhei Siamashka siarhei.siamas...@gmail.com writes: This problem looks bad enough to me, or at least makes me feel uneasy. Changing how the caching works could also cause some weird problems in the future unless everything is carefully taken into account. Also various tools like helgrind will complain a lot and it would be hard to differentiate real problems from just some mostly safe ugliness. I agree this really should be fixed, my main point is that I don't think it's a huge problem in current practice - in particular, versions of pixman with these caches have been used with no problems with cairo 1.8.x, including presumably in some of those applications that uses images in a read-only way from multiple threads. In addition, requiring not to change the properties may be a bit restrictive, limiting the usefulness of this model. I would suggest to generally forbid using the same pixman_image_t from multiple threads, but also add a function for creating pixman_image_t clone, which would share the pixel buffer with the original image and track pixel buffer lifetime via thread-safe refcounting. So using the same image from another thread would just require creating a clone instead of doing 'pixman_image_ref', all the rest should be exactly the same. It might be possible to do both: guarantee that read-only use will work, but then also add a pixman_image_clone() function for the case where you want the same pixels with different properties. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 0/18] Add iterators to images
Søren Sandmann sandm...@cs.au.dk writes: The following patch series changes the scanline access to be based on iterators instead of direct calls to virtual functions. There are several benefits to this: The patches are also available in the iterators3 branch here: http://cgit.freedesktop.org/~sandmann/pixman/log/?h=iterators3 Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] Making ref counting thread safe
Christian Hergert christian.herg...@gmail.com writes: If I where to go through and work on making the reference counting thread safe, what things would I need to make sure I take into account for the patch to be accepted? For example: I put up some notes on the subject here: http://cgit.freedesktop.org/~sandmann/pixman/commit/?h=docsid=773a9833cb4bb887239dd358cf584e1499276f4a Basically, the only two types of systems that really matter is Windows and pthreads. On Windows we always have atomic primitives available through the Interlocked* functions. On pthreads, we can statically initialize mutexes, so if worst comes to worst, we can fake the atomics with mutexes. * Do we just need to use __sync_fetch_and_add()/__sync_sub_and_fetch() within the functions? I think they should be hidden behind PIXMAN_ATOMIC_* macros. * Should we abstract the atomic operations to support compilers other than GCC? If so, what compilers do we need to support? With the approach described in the link above, I think it should be possible to support most compilers without too much trouble. * Do we wan't to fallback to ++/-- when used under X since threading isn't needed? (Add something like pixman_thread_init()). Wouldn't any such mechanism have overhead in itself? In any case, I don't think this would be necessary until benchmarks say otherwise. * Is it okay to start with just pixman_image_t and support others in future patches? I think that's fine. Thanks, Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 3/7] Extend gradient-crash-test
Jon TURNEY jon.tur...@dronecode.org.uk writes: This seems to have a stray #include fenv.h, which causes compilation to fail when fenv.h doesn't exist, see [1]. Applied. Thanks, Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 2/2] Delete simple repeat code
Siarhei Siamashka siarhei.siamas...@gmail.com writes: As it turns out, non-scaled images having NORMAL repeat (tiled pictures) are actually used occasionally (typically with SRC operator). And removing simple repeat code caused ~4x performance regression in these cases. Looks like a new fast path is needed to solve the problem. Just an update to this. I tried to check whether this is easily solvable, but looks like there are a handful of missing fast paths for NORMAL repeat (they primarily show up when browsing various pages). Just doing it in a simple and straightforward way by adding all of them as new fast path functions may cause more redundancy in the pixman source code. Modifying the existing scaled fast paths to accept non-scaled cases also works, but it does not provide the best performance. An idea might be to move the lookup_fast_path() into pixman-utils.c, and then write a new fast path that would call this function to look up another fast path and call it through the simple repeat code. This would require the full set of flags to be passed down to the fast paths, but we need that for the performance reporting anyway. Just a suggestion, I don't know whether it would work in practice. (An interesting possibility from this would be to use the regular blitters as the combiners in the general code. That way, if you have, say, a gradient IN a8 OVER 565 operation, then instead of converting both a8 and 565 to , an existing over__8_565 function could just be called with pointers directly into the mask and destination. Some changes to the general code would obviously be required though). For now I just reverted this patch in my pixman tree to workaround the problem and buy some more time to think about how it can be solved. Does reverting the patch actually work? The standard flags currently include SAMPLES_COVER_CLIP, and for a typical repeating blit, that would not be set. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] Add support for AltiVec detection for OpenBSD/PowerPC.
Brad b...@comstyle.com writes: On Tuesday 14 December 2010 14:44:09 Søren Sandmann wrote: From: Brad Smith b...@comstyle.com Bug 29331. Why was this posted? It was already commited 4 months ago. I mistakenly sent out a bunch of old patches. Sorry about that. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] Check for NEON using a signal handler to catch SIGILL
Mike McCormack mj.mccorm...@samsung.com writes: Pixman already uses the SIGILL mechanism elsewhere in the same file. FWIW, the SIGILL mechanism for PowerPC is only used if the operating system is neither OS X, OpenBSD, or Linux. It used to be used on Linux, but it caused some kind of a infinite loop in the build system at Red Hat, where it kept generating and handling SIGILL, so David Woodhouse changed it to parse /proc/auxv. See cd2a79ab81045aa7e35bc901081e57 I don't know whether the SIGILL handling in pixman was buggy, or whether there was some other underlying reason. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] Fix argument quoting for AC_INIT.
Cyril Brulebois k...@debian.org writes: One gets rid of this accordingly: | autoreconf -vfi | autoreconf: Entering directory `.' | autoreconf: configure.ac: not using Gettext | autoreconf: running: aclocal --force | configure.ac:61: warning: AC_INIT: not a literal: pixman@lists.freedesktop.org | autoreconf: configure.ac: tracing | configure.ac:61: warning: AC_INIT: not a literal: pixman@lists.freedesktop.org Signed-off-by: Cyril Brulebois k...@debian.org Thanks - applied. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [ssse3]Optimization for fetch_scanline_x8r8g8b8
Xu, Samuel samuel...@intel.com writes: Appreciate comments on this patch I'll be away for the next three weeks, so I won't be able to review anything until then. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
[Pixman] [ANNOUNCE] pixman release 0.21.2 now available
A new pixman release 0.21.2 is now available. This is the first development snapshot leading up to a stable 0.22 release. News: ARM: Performance improvements for image scaling [Siarhei Siamashka] Performance improvements for affine transformations [Soren Sandmann] Plus bug fixes and other improvements [Andrea Canciani, Siarhei, Soren]. NOTE: In this release a workaround for a bug in older version of the X server has been removed. If your X server is version 1.6 or older, you may see image corruption bugs with this version of pixman. Thanks, Soren tar.gz: http://cairographics.org/releases/pixman-0.21.2.tar.gz http://xorg.freedesktop.org/archive/individual/lib/pixman-0.21.2.tar.gz tar.bz2: http://xorg.freedesktop.org/archive/individual/lib/pixman-0.21.2.tar.bz2 Hashes: MD5: 9e09fd6e58cbf9717140891e0b7d4a7a pixman-0.21.2.tar.gz MD5: 4bc4cf052635265f7a98ad3e890ae329 pixman-0.21.2.tar.bz2 SHA1: c0ff07d7e4877dd4d0d369ca09e50ca956e3386e pixman-0.21.2.tar.gz SHA1: bb5cba514fe6756f9264a6995ec9dfed4d7439b5 pixman-0.21.2.tar.bz2 GPG signature: http://cairographics.org/releases/pixman-0.21.2.tar.gz.sha1.asc (signed by Søren Sandmann Pedersen sandm...@daimi.au.dk Git: git://git.freedesktop.org/git/pixman tag: pixman-0.21.2 Log: Andrea Canciani (3): Remove unused stop_range field Fix opacity check Improve conical gradients opacity check Siarhei Siamashka (12): Fixed broken configure check for __thread support Do CPU features detection from 'constructor' function when compiled with ARM: fix 'vld1.8'-'vld1.32' typo in add__ NEON fast path ARM: NEON: source image pixel fetcher can be overrided now ARM: nearest scaling support for NEON scanline compositing functions ARM: macro template in C code to simplify using scaled fast paths ARM: performance tuning of NEON nearest scaled pixel fetcher ARM: NEON optimization for scaled over__ with nearest filter ARM: NEON optimization for scaled over__0565 with nearest filter ARM: NEON optimization for scaled src__0565 with nearest filter ARM: NEON optimization for scaled src_0565_ with nearest filter ARM: optimization for scaled src_0565_0565 with nearest filter Søren Sandmann Pedersen (8): Post-release version bump to 0.20.1 Version bump 0.21.1. COPYING: Stop saying that a modification is currently under discussion. Remove workaround for a bug in the 1.6 X server. [mmx] Mark some of the output variables as early-clobber. Delete the source_image_t struct. Generate {a,x}8r8g8b8, a8, 565 fetchers for nearest/affine images Pre-release version bump ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 2/3] Fix opacity check
Andrea Canciani ranm...@gmail.com writes: @@ -436,6 +436,12 @@ compute_image_info (pixman_image_t *image) { int i; + if (image-common.type == RADIAL + image-radial.a = 0) + { + break; + } I think this code needs a comment explaining why it's necessary, and why checking for a = 0 works. Also, instead of having the extra if statement, I think I'd prefer to refactor the switch like this: case RADIAL: /* Comment explaining why this test is necessary */ if (image-radial.a = 0) break; /* Fall through */ case LINEAR: ... Other than, this and the other two patches look good to me. It would of course also be very useful to extend the test suite to catch this problem. Thanks, Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] Plug leak in the alphamap test.
Jon TURNEY jon.tur...@dronecode.org.uk writes: Even with this fix, the cygwin tinderbox is still failing (silently) [1]. The alphamap process grows to 2G in size and then mmap() starts failing. It looks like this is because we are still leaking the alphamap pixmaps. Attached is a patch to fix this, although I am not very familiar with the pixman API, so this might not be the best way to do this :-) It's close enough. I have pushed it to master. It would be better if this failure wasn't silent, I only noticed it by accident. On the other hand, isn't it a little iffy to fail the test suite just because malloc() or mmap() failed? Though, I guess many of the tests would actually fail in that case anyway. Thanks, Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] Radial gradients (again)
Andrea Canciani ranm...@gmail.com writes: I don't remember the timeline for pixman 0.20, Basically, I wanted to get a new release out tomorrow and call it a release candidate, but it looks like that's not going to happen. I think pixman 0.20 by now is feature frozen, and the next development release will become 0.20 unless major bugs turn up. The one thing that could still make it is the linear gradient fix. As I recall, there was some issues about the classify method. If someone wants to take a look at that patch: http://cgit.freedesktop.org/~ranma42/pixman/commit/?h=wip/linear-masterid=7a9433f3942cf4f0c929fb3907d89af87a2d477c it would be very helpful. I want to port this test or something very similar to pixman and check it against a different (and slow, but numerically stable) implementation of the gradients, but it will take some time, because I will be quite busy until the end of October. This sounds like it would be extremely useful. Thanks, Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] Gradients patches
Andrea Canciani ranm...@gmail.com writes: Indeed, the code looks mostly fine to me; I couldn't find any functional problems with it. The main complaints I have: - In the affine case, some of the computation is done in 32.32, and some in double precision. Is there any reason for this? Doing all of it in double precision would let us get rid of the dot() function and the fdot() one could then be renamed to just dot(). To keep the whole precision, using 32.32 is needed. If we want to use double there, we should at least stop doing the differences trick (and still the precision would be lower). Aside from the formatting issues mentioned in IRC, I think the latest code is probably fine. We will need to find out exactly what numerical guarantees should be given and then add tests to the test suite to verify that we meet them in both the affine and the projective. However, simply having a well-defined specification for the radials is a big step forward. I think this *does* need to be cross posted to xorg-devel because even though the spec says that it only allows gradients where one circle is contained in the other, in practice, it doesn't check for that. And cairo does not check that the circles are contained within eachother before sending them off to the X server. I see that now xorg-devel is in the CC addresses, anyway Cairo should pay more attention to what it does. Latest RENDER specification says: The array of stops has to contain values between 0 and 1 (inclusive) and has to be ordered in increasing size or a Value error is generated. The inner circle has to be completely contained inside the outer one or a Value error is generated. It would be useful for someone to update the Render spec to say that the radials are PDF Type 3 Shadings, defined in section 8.7.4.5.4 of the PDF specification, and delete the stuff about the circles being contained in each other. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] Gradients patches
Andrea Canciani ranm...@gmail.com writes: On Mon, Oct 4, 2010 at 12:07 PM, Andrea Canciani ranm...@gmail.com wrote: This morning I pushed http://cgit.freedesktop.org/~ranma42/pixman/log/?h=radial-for-master It should be ready for master since it is documented, tested (at least on my laptop) and not using any new features (so it should not be broken on other architectures/compilers/etc). As suggested on IRC, I tried to cleanup the patch and improve code reuse. The result is http://cgit.freedesktop.org/~ranma42/pixman/commit/?h=wip/radial-master Except for differentiation, there is basically no clever trick and the code should look quite straightforward and minimal. Indeed, the code looks mostly fine to me; I couldn't find any functional problems with it. The main complaints I have: - In the affine case, some of the computation is done in 32.32, and some in double precision. Is there any reason for this? Doing all of it in double precision would let us get rid of the dot() function and the fdot() one could then be renamed to just dot(). - The various values that don't depend on the coordinates (a, cdx, cdy etc) can be precomputed and stored in the radial struct. The current code does that (and at the very least, if you are not going to precompute them, the existing ones should be removed). - I think both the big comment and the commit message should mention the section of the PDF spec where the gradients are specified. - Formatting. Please format according to CODING_STYLE. In particular: - braces on their own line - blank lines - between logically distinct pieces of code For example there should be a blank line between the computation of db and the computation of dc and ddc since these relate to two different variables (b and c). - after blocks of variable declarations - in the big comment that describes the equation Also, please indent the formulas to set them off from the rest of the text and surround them with blank lines. I'm omitting the crosspost to the X mailing list because I checked again the RENDER extension and it currently only allows radial gradients with one circle completely contained in the other one (and in this case the old definition gives the same results as the new one). I think this *does* need to be cross posted to xorg-devel because even though the spec says that it only allows gradients where one circle is contained in the other, in practice, it doesn't check for that. And cairo does not check that the circles are contained within eachother before sending them off to the X server. Not all of them were actually true, so I took the linear-float branch and worked on it: http://cgit.freedesktop.org/~ranma42/pixman/commit/?h=wip/linear-master I cleaned it up and made it more robust with respect to error propagation. The performance is just slightly worse than its fixed-point counterpart (around 10% on x86_64), but it has the big advantage of avoiding all the overflows. (For the cairo-interested ones, this fixes gradient-linear-large). I didn't change linear_gradient_classify (), but it should be corrected or, if it does not provide a measurable performance gain, removed. (The code seem to indicate that now the only two meaningful classes are UNKNOWN and HORIZONTAL). HORIZONTAL is still important because for horizontal gradients, only one scanline needs to be rendered; that one can be repeated over and over. See general_composite_rect(). But yes, linear_gradient_classify() should be corrected. If vertical is not used in the gradient code itself anymore, that enum value can be removed. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] Plug leak in the alphamap test.
Søren Sandmann sandm...@daimi.au.dk writes: static pixman_image_t * make_image (pixman_format_code_t format) { uint32_t *bits; uint8_t bpp = PIXMAN_FORMAT_BPP (format) / 8; +pixman_image_t *image; bits = (uint32_t *)make_random_bytes (WIDTH * HEIGHT * bpp); -return pixman_image_create_bits (format, WIDTH, HEIGHT, bits, WIDTH * bpp); +image = pixman_image_create_bits (format, WIDTH, HEIGHT, bits, WIDTH * bpp); + +if (image bits) + pixman_image_set_destroy_function (image, on_destroy, NULL); } I need a return image there, of course. Somehow this didn't actually cause the test to fail. Maybe image was in eax or something. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 4/5] test: Parallize composite.c with OpenMP
Maarten Bosmans mkbosm...@gmail.com writes: Why would you use OpenMP at all, just to run the tests in parallel? I'd imagine that the thread-level parallelism it gives isn't necessary and the tests can just as well be parallized on process-level. Isn't it easier to avoid the dependency on OpenMP and just use a clever Makefile to run multiple tests at the same time? We already use OpenMP for a number of tests. It's not a hard dependency - if the configure script doesn't detect OpenMP, it will just run them serially. I don't think Makefile hackery would be simpler since right now all we have to do to get the test suite running is this: TESTS = $(TESTPROGRAMS) Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] FPU-based implementation of the core pixel pipeline
Dmitri Vorobiev dmitri.vorob...@movial.com writes: As for the conversion table, what comes into my mind is keeping a global table, and accessing it using inter-process communication mechanisms such as a shared memory area. Any comments on this? A pre-generated 'static const' table would certainly be better, but half a megabyte is probably still too much. How must of a performance improvement is that table really over just using regular conversions? Finally, what exactly are the coding style problems? Please see the CODING_STYLE file. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] Some clean-ups in fence_malloc() and fence_free()
Dmitri Vorobiev dmitri.vorob...@movial.com writes: This patch removes an unnecessary typecast of MAP_FAILED, replaces an erroneous free() by the correct munmap() in the error path for a failing mprotect(), and, finally, removes redundant calls to mprotect() that aren't necessary, because munmap() doesn't call for any specific memory protection. Applied, thanks. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] test: Change composite so that it tests randomly generated images
Siarhei Siamashka siarhei.siamas...@gmail.com writes: On Sunday 07 March 2010, Søren Sandmann wrote: Previously this test would try to exhaustively test all combinations of formats and operators, which meant that it would take years to run. Instead, generate random images and test those. The random seed is based on time(), so we will get different tests every time it is run. Whenever the test fails, it will print what the value of lcg_seed was when the test began. This patch also adds a table of random seeds that have failed at some point. These seeds are run first before the random testing begins. Whenever this tests fails, you are supposed to add the seed that failed to the table, so that we can track that the bug doesn't get reintroduced. I don't quite like any nondeterministic random factor in the standard regression tests. Preferably the results of such tests should be reproducible from run to run, even if they are not perfect and do not provide full coverage. I just sent a new set of patches that don't have the nondeterministic behavior. However, considering that several of the tinderboxes here: http://tinderbox.x.org/ are running make check over and over it would be really useful to make them run different sets of tests each time. This is quite important for having a clearly defined formal patch submission process (Before submitting a patch, one needs to make sure that the regression tests pass. If they don't pass, the problem has to be investigated and patch fixed or regression tests updated if needed). With the randomness in the tests, patch contributor may end up in different confusing situations: - regression test fails for him, even if his patch is fine (if the problem was introduced by somebody else earlier) - regression test passes for him, but fails for the others later (due to the bug in the patch). In this case it would be hard to say if the contributor did proper job running the regression tests in the first place. I'm not sure this is a big problem. If the test fails, it would print out the seed that failed so the contributor can try the test again without the patch applied. This will allow him to determine whether the problem was introduced by him or not. If it wasn't, hopefully he would report the bug. It's true that some people, if the test fails intermittently for them, might try submitting anyway hoping that nobody will notice. However, I don't think most people would do that, and if the test fails so rarely that they could get away with something like that, any fixed subset of the test would likely also miss the problem. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [RFC] Performance reporting capabilities for pixman?
Siarhei Siamashka siarhei.siamas...@gmail.com writes: Let's face it, I think the end users rarely report performance problems in pixman unless the performance is really notoriously bad (and not ignored because it's software rendering, so it is expected to be slow). Finding real performance bottlenecks in pixman requires running a profiler and being able to actually interpret its logs. Yeah, in fact almost all users never report *anything* unless it's so broken that they can't possibly work around it. They have been trained to do this because many projects simply ignore bugs for years in some cases, and because bugzilla is a huge pain to deal with. I wonder if it would make sense to add an extra configure option for the special profiling enabled pixman build which would do: 1. Avoid caching fast path lookup failures 2. Once hitting slow path, record the type of operation, image color formats and image flags. 3. For each unique slow path variant, accumulate the number of times it got hit, the total number of (destination?) pixels which got processed, the total number of scanlines. 4. Once per N minutes, spam into syslog with the current accumulated statistics. This all makes sense to me. It might even be interesting to have it compiled in by default, but disabled unless it is turned on through an environment variable. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] Build failure with pixman-0.19.4 - under MinGW64
Tilton, James C. (GSFC-6063) james.c.til...@nasa.gov writes: The following build error was encountered: $ make make all-recursive make[1]: Entering directory `/home/jtilton/Downloads/pixman-0.19.4' Making all in pixman make[2]: Entering directory `/home/jtilton/Downloads/pixman-0.19.4/pixman' make all-am make[3]: Entering directory `/home/jtilton/Downloads/pixman-0.19.4/pixman' make[3]: Nothing to be done for `all-am'. make[3]: Leaving directory `/home/jtilton/Downloads/pixman-0.19.4/pixman' make[2]: Leaving directory `/home/jtilton/Downloads/pixman-0.19.4/pixman' Making all in test make[2]: Entering directory `/home/jtilton/Downloads/pixman-0.19.4/test' CC utils.o In file included from utils.c:1:0: utils.h:12:0: warning: ignoring #pragma omp threadprivate utils.c: In function 'fence_malloc': utils.c:226:5: warning: implicit declaration of function 'getpagesize' utils.c:238:5: warning: implicit declaration of function 'mmap' utils.c:238:33: error: 'PROT_READ' undeclared (first use in this function) utils.c:238:33: note: each undeclared identifier is reported only once for each function it appears in utils.c:238:45: error: 'PROT_WRITE' undeclared (first use in this function) utils.c:238:57: error: 'MAP_PRIVATE' undeclared (first use in this function) utils.c:238:71: error: 'MAP_ANONYMOUS' undeclared (first use in this function) utils.c:241:25: error: 'MAP_FAILED' undeclared (first use in this function) utils.c:247:33: warning: cast from pointer to integer of different size utils.c:247:20: warning: cast to pointer from integer of different size utils.c:257:5: warning: implicit declaration of function 'mprotect' utils.c:258:5: error: 'PROT_NONE' undeclared (first use in this function) utils.c: In function 'fence_free': utils.c:288:8: error: 'PROT_READ' undeclared (first use in this function) utils.c:288:20: error: 'PROT_WRITE' undeclared (first use in this function) utils.c:293:5: warning: implicit declaration of function 'munmap' Thanks for the bug report. It would be helpful if you can send the output of the configure script. It looks like it detects mmap() etc., but not the sys/mman.h header file. utils.c: In function 'fuzzer_test_main': utils.c:407:0: warning: ignoring #pragma omp parallel utils.c: At top level: utils.c:457:1: warning: 'on_alarm' defined but not used make[2]: *** [utils.o] Error 1 make[2]: Leaving directory `/home/jtilton/Downloads/pixman-0.19.4/test' make[1]: *** [all-recursive] Error 1 make[1]: Leaving directory `/home/jtilton/Downloads/pixman-0.19.4' make: *** [all] Error 2 I am not able to find where PROT_READ, PROT_WRITE, PROT_NONE, MAP_PRIVATE, MAP_ANONYMOUS, and MAP_FAILED are defined. On a regular Unix system, these are defined in the sys/mman.h header. Do you happen to know if MinGW has that header file or the equivalent? Thanks, Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 1/1] Add a lowlevel blitter benchmark
Hi, +#include stdint.h +#include stdio.h +#include string.h +#include stdlib.h +#include unistd.h +#include stdlib.h +#include malloc.h I don't think either unistd.h or malloc.h needs to be included here. +void __attribute__((noinline)) __attribute__((noinline)) is GCC specific, so it needs to be defined in pixman-compiler.h similarly to force_inline. I think the Visual C version is called __declspec(noinline). Also, in the gettime patch, utils.c needs to include sys/time.h to get the prototype for gettimeofday(). It should be checked for in configure.ac with AC_CHECK_HEADER(). Other than that, these patches look good to me. Thanks, Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 1/5] Add fence_malloc() and fence_free().
Søren Sandmann sandm...@daimi.au.dk writes: From: Søren Sandmann Pedersen s...@redhat.com These variants of malloc() and free() try to surround the allocated memory with protected pages so that out-of-bounds accessess will cause a segmentation fault. If mprotect() and getpagesize() are not available, these functions are simply equivalent to malloc() and free(). This patch was pretty broken, in two ways. +AC_CHECK_FUNC(getpagesize, have_getpagesize=yes, have_=no) ^ A typo here. --- a/test/utils.c +++ b/test/utils.c @@ -5,6 +5,8 @@ #include unistd.h #endif +#include sys/mman.h + This should be guarded by HAVE_SYS_MMAN_H. New version below. Soren commit 9b735ad08f9ad1b150534739a9d7ac66ae67d9de Author: Søren Sandmann Pedersen s...@redhat.com Date: Mon Sep 13 14:34:34 2010 -0400 Add fence_malloc() and fence_free(). These variants of malloc() and free() try to surround the allocated memory with protected pages so that out-of-bounds accessess will cause a segmentation fault. If mprotect() and getpagesize() are not available, these functions are simply equivalent to malloc() and free(). diff --git a/configure.ac b/configure.ac index dbff2a6..d3b71fa 100644 --- a/configure.ac +++ b/configure.ac @@ -623,6 +623,19 @@ if test x$have_alarm = xyes; then AC_DEFINE(HAVE_ALARM, 1, [Whether we have alarm()]) fi +AC_CHECK_HEADER([sys/mman.h], + [AC_DEFINE(HAVE_SYS_MMAN_H, [1], [Define to 1 if we have sys/mman.h])]) + +AC_CHECK_FUNC(mprotect, have_mprotect=yes, have_mprotect=no) +if test x$have_mprotect = xyes; then + AC_DEFINE(HAVE_MPROTECT, 1, [Whether we have mprotect()]) +fi + +AC_CHECK_FUNC(getpagesize, have_getpagesize=yes, have_getpagesize=no) +if test x$have_getpagesize = xyes; then + AC_DEFINE(HAVE_GETPAGESIZE, 1, [Whether we have getpagesize()]) +fi + dnl = dnl Thread local storage diff --git a/test/utils.c b/test/utils.c index d95cbc2..ec6fd4e 100644 --- a/test/utils.c +++ b/test/utils.c @@ -5,6 +5,10 @@ #include unistd.h #endif +#ifdef HAVE_SYS_MMAN_H +#include sys/mman.h +#endif + /* Random number seed */ @@ -197,10 +201,111 @@ image_endian_swap (pixman_image_t *img, int bpp) } } +#define N_LEADING_PROTECTED10 +#define N_TRAILING_PROTECTED 10 +#define INITIAL_PAGE + +typedef struct +{ +void *addr; +uint32_t len; +uint8_t *trailing; +} info_t; + +#if defined(HAVE_MPROTECT) defined(HAVE_GETPAGESIZE) + +void * +fence_malloc (uint32_t len) +{ +unsigned long page_size = getpagesize(); +unsigned long page_mask = page_size - 1; +uint32_t n_payload_bytes = (len + page_mask) ~page_mask; +uint32_t n_bytes = + (len + +page_size * (N_LEADING_PROTECTED + N_TRAILING_PROTECTED + 2) + +n_payload_bytes) ~page_mask; +uint8_t *initial_page; +uint8_t *leading_protected; +uint8_t *trailing_protected; +uint8_t *payload; +uint8_t *addr; + +addr = malloc (n_bytes); + +if (!addr) +{ + printf (malloc failed on %u %u\n, len, n_bytes); + return NULL; +} + +initial_page = (uint8_t *)(((unsigned long)addr + page_mask) ~page_mask); +leading_protected = initial_page + page_size; +payload = leading_protected + N_LEADING_PROTECTED * page_size; +trailing_protected = payload + n_payload_bytes; + +((info_t *)initial_page)-addr = addr; +((info_t *)initial_page)-len = len; +((info_t *)initial_page)-trailing = trailing_protected; + +if (mprotect (leading_protected, N_LEADING_PROTECTED * page_size, + PROT_NONE) == -1) +{ + free (addr); + return NULL; +} + +if (mprotect (trailing_protected, N_TRAILING_PROTECTED * page_size, + PROT_NONE) == -1) +{ + mprotect (leading_protected, N_LEADING_PROTECTED * page_size, + PROT_READ | PROT_WRITE); + + free (addr); + return NULL; +} + +return payload; +} + +void +fence_free (void *data) +{ +uint32_t page_size = getpagesize(); +uint8_t *payload = data; +uint8_t *leading_protected = payload - N_LEADING_PROTECTED * page_size; +uint8_t *initial_page = leading_protected - page_size; +info_t *info = (info_t *)initial_page; +uint8_t *trailing_protected = info-trailing; + +mprotect (leading_protected, N_LEADING_PROTECTED * page_size, + PROT_READ | PROT_WRITE); + +mprotect (trailing_protected, N_LEADING_PROTECTED * page_size, + PROT_READ | PROT_WRITE); + +free (info-addr); +} + +#else + +void * +fence_malloc (uint32_t len) +{ +return malloc (len); +} + +void +fence_free (void *data) +{ +free (data); +} + +#endif + uint8_t * make_random_bytes (int n_bytes) { -uint8_t *bytes = malloc (n_bytes); +uint8_t *bytes = fence_malloc (n_bytes); int i; if (!bytes) diff --git a/test/utils.h b/test/utils.h index
Re: [Pixman] [PATCH] Add a lowlevel blitter benchmark
Dmitri Vorobiev dmitri.vorob...@movial.com writes: diff --git a/test/Makefile.am b/test/Makefile.am index 3d98e17..8f3fdaf 100644 --- a/test/Makefile.am +++ b/test/Makefile.am @@ -18,7 +18,8 @@ TESTPROGRAMS = \ scaling-crash-test \ blitters-test \ scaling-test\ - composite + composite \ + lowlevel-blt-bench As Siarhei mentioned, this shouldn't go in TESTPROGRAMS. Eventually, I'd like to have a separate performance test suite, perhaps even in its own directory next to test, but for now, let's just add it in the test directory. +++ b/test/lowlevel-blt-bench.c @@ -0,0 +1,713 @@ +/* + * Copyright © 2010 Movial Creative Technologies Oy + * + * Permission to use, copy, modify, distribute, and sell this software and its + * documentation for any purpose is hereby granted without fee, provided that + * the above copyright notice appear in all copies and that both that + * copyright notice and this permission notice appear in supporting + * documentation, and that the name of ARM Ltd not be used in + * advertising or publicity pertaining to distribution of the software without + * specific, written prior permission. ARM Ltd makes no + * representations about the suitability of this software for any purpose. It + * is provided as is without express or implied warranty. + * + * THE COPYRIGHT HOLDERS DISCLAIM ALL WARRANTIES WITH REGARD TO THIS + * SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND + * FITNESS, IN NO EVENT SHALL THE COPYRIGHT HOLDERS BE LIABLE FOR ANY + * SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES + * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN + * AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING + * OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS + * SOFTWARE. + * + * Author: Jonathan Morton (jonathan.mor...@movial.com) + * + * Based on the code by Siarhei Siamashka siarhei.siamas...@gmail.com + * + */ This says copyright Movial, but mentions ARM Ltd in the text. Unless you have a reason to not use it, the preferred license is the one in COPYING. Its main advantage is that it avoids this kind of cut-and-paste issue. Also, is this code mainly written by Siarhei? If so, presumably either Nokia or Siarhei are also copyright holders. +tests_tbl[] = +{ +{ src_n_,PIXMAN_a8r8g8b8,1, PIXMAN_OP_SRC, PIXMAN_null, 0, PIXMAN_a1r5g5b5 }, This one looks like it should have a2r2g2b2 in the destination +{ src_n_,PIXMAN_a8r8g8b8,1, PIXMAN_OP_SRC, PIXMAN_null, 0, PIXMAN_a1r5g5b5 }, This one looks wrong too. +/* Return time since the Epoch in seconds */ +double +gettime (void); Like the malloc() changes, this could usefully be done in its own patch. Thanks, Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [cairo] [PATCH] Added MIPS32R2 and MIPS DSP ASE optimized functions
The DSP ASE version uses the MIPS32R2 version of pixman_fill32(). Initially the plan was to develop separate versions of each function for each target instruction set (and DSP ASE Rev 2 was also on the list) but the DSP ASE version of fill32 was eventually dropped. There are no MIPS processors that implement DSP ASE but are not MIPS32R2-compatible. It is convenient to be able to fall back to MIPS32R2 implementation if the DSP ASE code does not provide enough speedup to justify its use. Right, I'm not complaining about the fact that it falls back. I'm saying that the MIPS32R2 code should be added in one patch, followed by another patch that adds the DSP ASE code. - Is there a reason to not do runtime checking? I realize that most people using MIPS will likely do so on an embedded system where they know ahead of time what the CPU supports, but we do have runtime checking for the other CPU specific implementations. Yes, at the present time this isn't very easy to do on MIPS Linux -- checking if DSP ASE is implemented requires the use of privileged instructions. MIPS is aware of this problem and will push an update to the kernel that enables the use of AT_HWCAP for this purpose. OK. Thanks for the all the comments! I'll update the code per your comments and submit another patch. Should I mail it just to the pixman list or CC both cairo and pixman? Usually for pixman patches, it's fine to just mail the pixman lists, although if a patch has implications for X or cairo, then those lists should be CC'd as well. One other thing: We need a copyright statement for the new files. The canonical copyright text can be found in the pixman/COPYING file. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] Faster C variant of over_n_8_8888 fast path
Siarhei Siamashka siarhei.siamas...@gmail.com writes: +/* A variant of 'over', which works faster for non-additive blending on the + * platforms which do not have special instructions for saturated addition + */ +static force_inline uint32_t +over_a (uint32_t src, uint32_t dest, pixman_bool_t additive_blending) +{ +uint32_t a = ~src 24; +if (additive_blending) +{ + UN8x4_MUL_UN8_ADD_UN8x4 (dest, a, src); + return dest; +} +else +{ + UN8x4_MUL_UN8 (dest, a); + return dest + src; +} +} Is there any reason to not just add a boolean additive_blending to the existing force_inline over() function? It might also be interesting to add the check as a new NOT_SUPER_LUMINESCENT flag and then simply require it for the source for all the over_n_*() functions. That would allow similar optimizations for the n_8_565 case and probably the n___ca() case as well. The flag could be set for all the gradients and any time an image is opaque. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] PDF radial gradients
Andrea Canciani ranm...@gmail.com writes: I updated the documentation of radial gradients in my wip/radial branch (http://cgit.freedesktop.org/~ranma42/pixman/log/?h=wip/radial) to reflect my changes (which make pixman gradients behave as specified by pdf reference manual). It's of course important to point out here that changing the radial gradients to behave as specified in the PDF spec, really is a change from they used to do. As I understand it, for most common cases, the output is identical to what it used be, is that correct? It would be helpful to have a description of the cases where the behavior differs. Other than that, I'm in favor of just using the PDF spec here. I don't think there is much real value in coming up with our own specification. Regarding this: When a point does not belong to any of the circles, it is transparent. It's probably worthwhile to say explicitly that the point has the RGBA value (0, 0, 0, 0) instead of transparent. That way it's clear that you can't leave out the point when using the SOURCE operator for example. I'm not completely clear on what cairo is doing (or supposed to do) in this case. The cairo SOURCE operator is bounded by the mask, but is it also bounded by the source? The code is not yet ready to be merged (it uses __int128 variables, which probably won't work on 32 bit architectures), but I would appreciate reviews (expecially of the new documentation) and testing. If __int128 is intended to help with non-FPU systems, then let me repeat that using floating point in pixman is totally fine, and almost certainly faster than any __int128 would be. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] PDF radial gradients
Andrea Canciani ranm...@gmail.com writes: The behavior is the same when one of the two circle completely contains the other one, otherwise it is different. What is the right place for this observation? The code documentation? Or is it better to write it in the commit message? The commit message, I'd say, though it's probably a good idea to also CC the X and cairo lists to make sure they know this is happening. The code is not yet ready to be merged (it uses __int128 variables, which probably won't work on 32 bit architectures), but I would appreciate reviews (expecially of the new documentation) and testing. If __int128 is intended to help with non-FPU systems, then let me repeat that using floating point in pixman is totally fine, and almost certainly faster than any __int128 would be. The purpose of using 128 bits is avoiding approximations (thus retaining full precision), but I also expect it to be faster than floating point, because in the inner loop the values are updated using multiple differentiation (i.e. just 3 additions), beside the double-precision computations when actually needed. Okay, if it's necessary for precision, that's fine. I haven't looked at the code yet. Why do you think that floating point would be faster? I'd guess on most systems, arithmetic on floating point values will be much faster than arithmetic on int128s, simply because an int128 would have to be built up from two 64 bit, or four 32 bit registers. (I expect that int to double conversion is the slow operation in this code, but it only happens when a sqrt is computed, so it's probably not a big deal) Did you also consider the changes required to retain full precision? Thank you for your comments, I'll fix the documentation and benchmark fixed vs float soon Thanks for fixing the gradients btw. At this point they are by far the most embarrassing part of pixman. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] Valgrind-clean pixman
Andrea Canciani ranm...@gmail.com writes: I may be wrong here, but seems like everyone has his own definition of what is considered to be thread safe. Currently pixman appears to be thread safe as long as same pixman objects (pixman_image_t and the others) are not used from multiple threads simultaneously. This is somewhat similar to the thread safety assumptions of the GNU implementation of a standard C++ template library. I'm adding this link here because they took care of documenting what can be guaranteed when working in a multi-threaded environment, and what can't be: http://gcc.gnu.org/onlinedocs/libstdc++/manual/using_concurrency.html#manual.intro.using.concurrency.thread_safety http://www.sgi.com/tech/stl/thread_safety.html I see. Currently cairo makes stronger assumptions about pixman thread safety (something like: a pixman_image can be used as source from multiple threads as long as no threads is writing to it). I think allowing an image to be used as a source in multiple threads and as a destination in only one, is a reasonable guarantee to give. How far from that are we at the moment? I'm working on a library of simple inline functions (based on cairo atomic and mutex implementations) to share some common utility functions which have different availability on different platforms (right now: atomics, wideint, mutexes). I hope to make it available soon (after I get autotools do the right things to make it work). I'd just as well have these things in pixman directly, unless your new library can become some sort of 'libfreedesktop' type thing that contains the various data structures and portability gunk that all projects end up having. I wrote some notes at one point on how mutexes and atomics could be added: http://cgit.freedesktop.org/~sandmann/pixman/commit/?h=threadsid=99c5d5edfef6275ed38e4ec9881a59f5b4ba9492 Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] Valgrind-clean pixman
Siarhei Siamashka siarhei.siamas...@gmail.com writes: We probably want to implement DllMain on win32 to allocate/free TLS anyway. Yes, I think it makes sense to give this a try. The constructor attribute is supported since at least gcc 2.95, and it seems to be fine in clang 2.7 too. There's always a risk that this stuff may be broken for some gcc versions or target systems, but we can never know before trying :) The only problematic case I can think of is Solaris with older versions of the Sun compiler. However, we may be able to use the _init() and _fini() functions there. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] Use windows.h directly for mingw32 build
Maarten Bosmans mkbosm...@gmail.com writes: This patch adresses the issue discussed in http://lists.freedesktop.org/archives/pixman/2010-April/000163.html There were only two clashing identifiers. The first one is IN, which obviously causes problems in Pixman for lines like PIXMAN_STD_FAST_PATH (IN, solid, a8, a8, fast_composite_in_n_8_8), Fortunately the mingw headers provide a solution: by defining _NO_W32_PSEUDO_MODIFIERS, these stupid symbols are skipped. The other name is UINT64, used in pixman-mmx.c. I renamed that function to to_uint64, but may be another name is more appropriate. It may also be good to rename the other function M64 to to_m64, but I left that one alone for now. I do think M64 should be renamed to to_m64() for consistency, but other than that, the patch looks good to me. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] is whitespace clean-up good for the upstream?
Dmitri Vorobiev dmitri.vorob...@movial.com writes: While working on a task, which is only loosely related to Pixman upstream, I noticed that Pixman sources contain quite many lines consisting of whitespace only. Since such lines have an adverse effect on what I am doing now, I am cleaning away such whitespaces. I could submit a patch doing such clean-up, however, I would like to know first if it is interesting for Pixman maintainers and would be applied. I'm curious why this whitespace is a problem. The CODING_STYLE document used to says something about avoiding trailing whitespace, but I removed it recently because it just doesn't bother me. So, my question is: it there any interest in applying a code clean-up patch that does not change any functionality (and even does not change any compiled code)? If there is some good reason to do this, it could be considered, but this kind of formatting change tends to introduce gratuitous conflicts in people's branches. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH 2/2] ARM: added 'neon_composite_src_8888_8_0565' fast path
Siarhei Siamashka siarhei.siamas...@gmail.com writes: +.macro pixman_composite_src_n_8_0565_process_pixblock_head +/* in */ +vmull.u8q15, d24, d2 +vmull.u8q3, d24, d1 +vmull.u8q2, d24, d0 +vrshr.u16 q12, q15, #8 +vrshr.u16 q11, q3, #8 +vrshr.u16 q10, q2, #8 +vraddhn.u16 d16, q15, q12 +vraddhn.u16 d19, q3, q11 +vraddhn.u16 d18, q2, q10 +.endm + +.macro pixman_composite_src_n_8_0565_process_pixblock_tail +/* convert to r5g6b5 */ +vshll.u8q14, d16, #8 +vshll.u8q8, d19, #8 +vshll.u8q9, d18, #8 +vsri.u16q14, q8, #5 +vsri.u16q14, q9, #11 +.endm It's remarkable how much NEON can do with so few instructions. @@ -90,6 +90,8 @@ PIXMAN_ARM_BIND_FAST_PATH_SRC_MASK_DST (neon, over___, uint32_t, 1, uint32_t, 1, uint32_t, 1) PIXMAN_ARM_BIND_FAST_PATH_SRC_MASK_DST (neon, over__8_0565, uint32_t, 1, uint8_t, 1, uint16_t, 1) +PIXMAN_ARM_BIND_FAST_PATH_SRC_MASK_DST (neon, src__8_0565, +uint32_t, 1, uint8_t, 1, uint16_t, 1) void pixman_composite_src_n_8_asm_neon (int32_t w, @@ -198,6 +200,8 @@ pixman_blt_neon (uint32_t *src_bits, static const pixman_fast_path_t arm_neon_fast_paths[] = { +PIXMAN_STD_FAST_PATH (SRC, a8r8g8b8, a8, r5g6b5, neon_composite_src__8_0565), +PIXMAN_STD_FAST_PATH (SRC, a8b8g8r8, a8, b5g6r5, neon_composite_src__8_0565), It seems like this fast path will work as src_x888_8_0565 as well since the alpha channel is completely ignored. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman