Re: [Pixman] Benchmarked: [PATCH 1/4] Change conditions for setting FAST_PATH_SAMPLES_COVER_CLIP flags
On Sun, 20 Sep 2015 15:47:01 +0300 Oded Gabbaywrote: > On Sun, Sep 20, 2015 at 1:22 PM, Oded Gabbay wrote: > > On Wed, Sep 16, 2015 at 2:25 PM, Pekka Paalanen wrote: > >> The series as benchmarked is available at: > >> https://git.collabora.com/cgit/user/pq/pixman.git/log/?h=cover-benchmark-1 > >> > >> The benchmark points are: > >> > >> - baseline: "test: Add cover-test v5" > >> > >> - cleanup: "affine-bench: remove 8e margin from COVER area" > >> Includes the 8e extra safety margin removal. > >> > >> - tight: "pixman-fast-path: Make bilinear cover fetcher use > >> COVER_CLIP_TIGHT flag" > >> Includes all the COVER_CLIP_BILINEAR related patches from > >> Ben. > > I decided to also run the cairo trimmed benchmarks on my POWER8 > > ppc64le and POWER7 ppc64. > > To make things clearer, I used the same definitions for "baseline", > > "cleanup" and "tight". > > > > I used Cairo version 1.14.3, actually from git with head set to 6f7a9b4 > > I run the benchmarks doing (it's from inside a script): > > "cairo-perf-trace benchmark -r -i8 > ../${__output}.perf" > Please disregard the email above - the results there are bogus because > my server is inside a VM!!! > > After I sent my email, I run the cleanup version 5 times in a row. The > first 4 times were identical, but the 5th time showed major slowdown. > > I also run the tight version 5 times in a row. The 2nd run showed > major improvement over the 1st run, the 3rd run showed an additional > improvement on top of that, and the 4th and 5th runs were identical to > the 3rd run. > > On the one hand, I'm not running anything else on this server. On the > other hand, this is a VM, so maybe the host machine is > over-subscribed. > > I then went to test it on a physical server without VM (the ppc64 > version). I run cleanup and tight 5 times, and all results were > identical. So I think the issue is definitely with the VM. > > And as for the results, I'm happy to say that there is no change > between cleanup and tight :) Hi Oded, heh, yeah, double-checking is worth it. :-) Thank you for confirming my results, even though that's not exactly good news for the patches. ;-) Not bad either. I will be trying to record a Cairo trace that would represent the usage patterns we saw in Raspbian which is the root of Ben's efforts. Thanks, pq pgpwNxZCi5PLK.pgp Description: OpenPGP digital signature ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] Benchmarked: [PATCH 1/4] Change conditions for setting FAST_PATH_SAMPLES_COVER_CLIP flags
On Wed, Sep 16, 2015 at 2:25 PM, Pekka Paalanenwrote: > On Fri, 4 Sep 2015 03:09:20 +0100 > Ben Avison wrote: > >> As discussed in >> http://lists.freedesktop.org/archives/pixman/2015-August/003905.html >> >> the 8 * pixman_fixed_e adjustment which was applied to the transformed >> coordinates is a legacy of rounding errors which used to occur in old >> versions of Pixman, but which no longer apply. For any affine transform, >> you are now guaranteed to get the same result by transforming the upper >> coordinate as though you transform the lower coordinate and add (size-1) >> steps of the increment in source coordinate space. No projective >> transform routines use the COVER_CLIP flags, so they cannot be affected. > > Hi all, > > as we doing these things not just for cleaning up but with the premise > that there are missed optimization opportunities, I have benchmarked > this patch series. > > The series as benchmarked is available at: > https://git.collabora.com/cgit/user/pq/pixman.git/log/?h=cover-benchmark-1 > > The benchmark points are: > > - baseline: "test: Add cover-test v5" > > - cleanup: "affine-bench: remove 8e margin from COVER area" > Includes the 8e extra safety margin removal. > > - tight: "pixman-fast-path: Make bilinear cover fetcher use > COVER_CLIP_TIGHT flag" > Includes all the COVER_CLIP_BILINEAR related patches from > Ben. > > Note, that ssse3_iters[] in pixman-ssse3.c still contains > FAST_PATH_SAMPLES_COVER_CLIP_BILINEAR. > > Cairo version is 1.14.2 for the benchmarks, which are run like: > $ CAIRO_TEST_TARGET=image cairo-perf-trace -r -v -i8 > baseline-image-2.txt > > I tried both "image" and "image16" on an x86_64 (Sandybridge), and got > no performance differences in the trimmed-cairo-traces set in either > baseline/cleanup or cleanup/tight. > > I also tried with PIXMAN_DISABLE=ssse3 and still got no difference. I > verified I am really running what I think I am by editing Pixman and > seeing the effect in the benchmark. > > Am I missing something? > > I thought we would see at least some improvements also on x86_64 when > comparing cleanup/tight. > > Should I run the same on rpi2? Or is the best effect on the fast paths > we haven't merged yet? > > I'd rather not run this on rpi1 due to the function address / > performance quirk, doing the required iterations there would probably > take too long and I'd need to rearrange the result files too. > > Or maybe our test set is not enough? I recall having some problems with > that in the past. > > So, I patched Pixman to yell whenever TIGHT is set but > COVER_CLIP_BILINEAR is not set. Only t-firefox-canvas-swscroll and > t-firefox-fishtank hit it with source image, each twice per iteration. > Definitely seems like this test set is not hitting the cases we are > interested in. I think I need to dig up our old performance profiles > and see if we could record a trace from a real app that would hit these > cases, now that Cairo's trace recording is supposedly fixed. > > The removal of the 8e extra safety margins shouldn't need performance > profiles as justification, but for the tightening patches they'd be > nice to have, especially since the usefulness of them has been > questioned. > > > Thanks, > pq Hi Pekka, Ben I decided to also run the cairo trimmed benchmarks on my POWER8 ppc64le and POWER7 ppc64. To make things clearer, I used the same definitions for "baseline", "cleanup" and "tight". I used Cairo version 1.14.3, actually from git with head set to 6f7a9b4 I run the benchmarks doing (it's from inside a script): "cairo-perf-trace benchmark -r -i8 > ../${__output}.perf" First of all, diff between baseline/cleanup showed no change, in both platforms, so that's good :) Now, for cleanup/tight: With POWER8 ppc64le, I got the following very modest boost: imaget-firefox-asteroids 483.10 (523.85 3.49%) -> 452.84 (480.34 3.16%): 1.07x speedup image t-firefox-chalkboard 691.38 (692.09 0.06%) -> 653.07 (654.60 0.26%): 1.06x speedup However, with POWER7 ppc64, I got the following regressions, which is quite bad: imaget-firefox-asteroids 545.55 (559.64 1.79%) -> 781.07 (791.83 2.33%): 1.43x slowdown imaget-firefox-scrolling 1185.45 (1186.02 0.05%) -> 1748.76 (1754.85 0.20%): 1.48x slowdown image t-firefox-chalkboard 1444.76 (1464.55 0.88%) -> 2315.76 (2333.10 0.34%): 1.60x slowdown imaget-firefox-paintball 681.43 (682.28 0.10%) -> 1138.15 (1140.19 0.08%): 1.67x slowdown image t-firefox-canvas 890.14 (890.90 0.10%) -> 1492.83 (1493.51 0.20%): 1.68x slowdown image t-firefox-canvas-swscroll 1369.94 (1371.66 0.05%) -> 2297.53 (2305.70 0.18%): 1.68x slowdown imaget-xfce4-terminal-a1 829.35 (832.39 0.16%) -> 1392.50 (1414.69 1.08%): 1.68x slowdown image t-firefox-fishbowl 3112.93 (3114.13 0.02%) -> 5227.18 (5229.05 0.03%): 1.68x slowdown image t-poppler
Re: [Pixman] Benchmarked: [PATCH 1/4] Change conditions for setting FAST_PATH_SAMPLES_COVER_CLIP flags
On Sun, Sep 20, 2015 at 1:22 PM, Oded Gabbaywrote: > On Wed, Sep 16, 2015 at 2:25 PM, Pekka Paalanen wrote: >> On Fri, 4 Sep 2015 03:09:20 +0100 >> Ben Avison wrote: >> >>> As discussed in >>> http://lists.freedesktop.org/archives/pixman/2015-August/003905.html >>> >>> the 8 * pixman_fixed_e adjustment which was applied to the transformed >>> coordinates is a legacy of rounding errors which used to occur in old >>> versions of Pixman, but which no longer apply. For any affine transform, >>> you are now guaranteed to get the same result by transforming the upper >>> coordinate as though you transform the lower coordinate and add (size-1) >>> steps of the increment in source coordinate space. No projective >>> transform routines use the COVER_CLIP flags, so they cannot be affected. >> >> Hi all, >> >> as we doing these things not just for cleaning up but with the premise >> that there are missed optimization opportunities, I have benchmarked >> this patch series. >> >> The series as benchmarked is available at: >> https://git.collabora.com/cgit/user/pq/pixman.git/log/?h=cover-benchmark-1 >> >> The benchmark points are: >> >> - baseline: "test: Add cover-test v5" >> >> - cleanup: "affine-bench: remove 8e margin from COVER area" >> Includes the 8e extra safety margin removal. >> >> - tight: "pixman-fast-path: Make bilinear cover fetcher use >> COVER_CLIP_TIGHT flag" >> Includes all the COVER_CLIP_BILINEAR related patches from >> Ben. >> >> Note, that ssse3_iters[] in pixman-ssse3.c still contains >> FAST_PATH_SAMPLES_COVER_CLIP_BILINEAR. >> >> Cairo version is 1.14.2 for the benchmarks, which are run like: >> $ CAIRO_TEST_TARGET=image cairo-perf-trace -r -v -i8 > baseline-image-2.txt >> >> I tried both "image" and "image16" on an x86_64 (Sandybridge), and got >> no performance differences in the trimmed-cairo-traces set in either >> baseline/cleanup or cleanup/tight. >> >> I also tried with PIXMAN_DISABLE=ssse3 and still got no difference. I >> verified I am really running what I think I am by editing Pixman and >> seeing the effect in the benchmark. >> >> Am I missing something? >> >> I thought we would see at least some improvements also on x86_64 when >> comparing cleanup/tight. >> >> Should I run the same on rpi2? Or is the best effect on the fast paths >> we haven't merged yet? >> >> I'd rather not run this on rpi1 due to the function address / >> performance quirk, doing the required iterations there would probably >> take too long and I'd need to rearrange the result files too. >> >> Or maybe our test set is not enough? I recall having some problems with >> that in the past. >> >> So, I patched Pixman to yell whenever TIGHT is set but >> COVER_CLIP_BILINEAR is not set. Only t-firefox-canvas-swscroll and >> t-firefox-fishtank hit it with source image, each twice per iteration. >> Definitely seems like this test set is not hitting the cases we are >> interested in. I think I need to dig up our old performance profiles >> and see if we could record a trace from a real app that would hit these >> cases, now that Cairo's trace recording is supposedly fixed. >> >> The removal of the 8e extra safety margins shouldn't need performance >> profiles as justification, but for the tightening patches they'd be >> nice to have, especially since the usefulness of them has been >> questioned. >> >> >> Thanks, >> pq > > Hi Pekka, Ben > > I decided to also run the cairo trimmed benchmarks on my POWER8 > ppc64le and POWER7 ppc64. > To make things clearer, I used the same definitions for "baseline", > "cleanup" and "tight". > > I used Cairo version 1.14.3, actually from git with head set to 6f7a9b4 > I run the benchmarks doing (it's from inside a script): > "cairo-perf-trace benchmark -r -i8 > ../${__output}.perf" > > First of all, diff between baseline/cleanup showed no change, in both > platforms, so that's good :) > > Now, for cleanup/tight: > > With POWER8 ppc64le, I got the following very modest boost: > > imaget-firefox-asteroids 483.10 (523.85 3.49%) -> 452.84 > (480.34 3.16%): 1.07x speedup > image t-firefox-chalkboard 691.38 (692.09 0.06%) -> 653.07 > (654.60 0.26%): 1.06x speedup > > However, with POWER7 ppc64, I got the following regressions, which is quite > bad: > > imaget-firefox-asteroids 545.55 (559.64 1.79%) -> 781.07 > (791.83 2.33%): 1.43x slowdown > imaget-firefox-scrolling 1185.45 (1186.02 0.05%) -> 1748.76 > (1754.85 0.20%): 1.48x slowdown > image t-firefox-chalkboard 1444.76 (1464.55 0.88%) -> 2315.76 > (2333.10 0.34%): 1.60x slowdown > imaget-firefox-paintball 681.43 (682.28 0.10%) -> 1138.15 > (1140.19 0.08%): 1.67x slowdown > image t-firefox-canvas 890.14 (890.90 0.10%) -> 1492.83 > (1493.51 0.20%): 1.68x slowdown > image t-firefox-canvas-swscroll 1369.94 (1371.66 0.05%) -> 2297.53 > (2305.70 0.18%): 1.68x slowdown > image
Re: [Pixman] Benchmarked: [PATCH 1/4] Change conditions for setting FAST_PATH_SAMPLES_COVER_CLIP flags
On Fri, 18 Sep 2015 19:35:02 +0100, Bill Spitzakwrote: On Wed, Sep 16, 2015 at 7:30 AM, Ben Avison wrote: Just by thinking things through, I realised that we would regularly fail to hit COVER paths if Pixman's caller set the scale factors such that the centre of the outermost destination pixels aligned with the centre of the outermost source pixels. There has been some argument about whether this is representative of how Pixman should be used. I happen to think this is a perfectly reasonable thing to expect Pixman to support, but there are other models you can follow, notably setting the scale factors such that the outer edges of the outermost destination pixels align to the outer edges of the outermost source pixels. If the Cairo traces are using this latter model, then it's understandable if you aren't hitting the edge case that I'm concerned about very often. There are currently very good reasons for clients to use the first model: it is a method of removing undesirable "fuzzy edges" from zoomed- in images. The alternative of scaling the outer edge, while not hitting this fast path, will not hit the other fast path either! (the other fast path is the one that allows a zero-weighted pixel at one end to be read from the image). I'm generally in agreement with you. For the sake of completeness though, I should point out that when zooming *out*, both models will match the conditions for COVER paths. But then again, when you're zooming out, particularly by a large factor, it is more appropriate to use a multi-tap filter rather than bilinear scaling, so maybe that's not much of an argument. NOTE: the method of scaling the centers of the pixels is generally wrong. The math is wonky: does scaling an image by 2 produce a 2*w-1 image, or does it spread the pixels slightly more than 2 apart? Programs are going to disagree. I'd suggest that if the highest layer software specifies a particular scale factor, then that's what should be used, but if it requests a particular destination size (e.g. "full screen") then that should be the primary consideration even if it means the corresponding scale factor is not the "obvious" one. But this is all really a concern for the layers above Pixman, because the starting fractional position and per-pixel increment is part of the Pixman API. Pixman just has to do whatever has been asked of it. Just because some of the ways the layers above Pixman do their calculations are impossible for Pixman to optimise further shouldn't be a reason for Pixman not to optimise the cases that it can do, especially when that set of cases include many that use a scale factor of 1 on one axis. Ben ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] Benchmarked: [PATCH 1/4] Change conditions for setting FAST_PATH_SAMPLES_COVER_CLIP flags
On Wed, Sep 16, 2015 at 7:30 AM, Ben Avisonwrote: Just by thinking things through, I realised that we would regularly fail > to hit COVER paths if Pixman's caller set the scale factors such that the > centre of the outermost destination pixels aligned with the centre of the > outermost source pixels. There has been some argument about whether this > is representative of how Pixman should be used. I happen to think this is > a perfectly reasonable thing to expect Pixman to support, but there are > other models you can follow, notably setting the scale factors such that > the outer edges of the outermost destination pixels align to the outer > edges of the outermost source pixels. If the Cairo traces are using this > latter model, then it's understandable if you aren't hitting the edge > case that I'm concerned about very often. > There are currently very good reasons for clients to use the first model: it is a method of removing undesirable "fuzzy edges" from zoomed-in images. The alternative of scaling the outer edge, while not hitting this fast path, will not hit the other fast path either! (the other fast path is the one that allows a zero-weighted pixel at one end to be read from the image). Therefore I believe there is actual usage of this new fast path, while there is almost zero usage of the old fast path. This is the main reason I think the current flag should be removed, and a new one indicating that all the non-zero samples are inside the image, added. This second one will actually be used by real code. NOTE: the method of scaling the centers of the pixels is generally wrong. The math is wonky: does scaling an image by 2 produce a 2*w-1 image, or does it spread the pixels slightly more than 2 apart? Programs are going to disagree. And you have to alter the math as soon as the scale goes below 1 or you will get fuzzy edges again. And lots of code are going to get coordinates in the resulting image wrong, resulting in annoying problems like overlaid lines not lining up or shifting their alignment from one side to another. I very much believe Cairo should change how CAIRO_EXTEND_NONE is done. It should instead mean that the path is intersected with a quadrilateral that is the transform of the outer edge of the source, then drawing as though using CAIRO_EXTEND_REPEAT. This is what most users of scaling expect, and anybody wanting the previous effect can get it by adding a black pixel to the outer edge of their source surface. Now it is true that this result can be achieved with Cairo right now. The reason for making this the default is because non-experts are unlikely to figure this out. Currently filling a path equal to the source boundary results in a double-mulitplication of the edge pixels, a subtle error that can be very frustrating when graphics gets more complex. Making this error not happen by default would be very helpful. I also believe making this the default will result in faster implementations, as EXTEND_REPEAT fast paths can be used, and this is often directly supported by graphics hardware. ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
[Pixman] Benchmarked: [PATCH 1/4] Change conditions for setting FAST_PATH_SAMPLES_COVER_CLIP flags
On Fri, 4 Sep 2015 03:09:20 +0100 Ben Avisonwrote: > As discussed in > http://lists.freedesktop.org/archives/pixman/2015-August/003905.html > > the 8 * pixman_fixed_e adjustment which was applied to the transformed > coordinates is a legacy of rounding errors which used to occur in old > versions of Pixman, but which no longer apply. For any affine transform, > you are now guaranteed to get the same result by transforming the upper > coordinate as though you transform the lower coordinate and add (size-1) > steps of the increment in source coordinate space. No projective > transform routines use the COVER_CLIP flags, so they cannot be affected. Hi all, as we doing these things not just for cleaning up but with the premise that there are missed optimization opportunities, I have benchmarked this patch series. The series as benchmarked is available at: https://git.collabora.com/cgit/user/pq/pixman.git/log/?h=cover-benchmark-1 The benchmark points are: - baseline: "test: Add cover-test v5" - cleanup: "affine-bench: remove 8e margin from COVER area" Includes the 8e extra safety margin removal. - tight: "pixman-fast-path: Make bilinear cover fetcher use COVER_CLIP_TIGHT flag" Includes all the COVER_CLIP_BILINEAR related patches from Ben. Note, that ssse3_iters[] in pixman-ssse3.c still contains FAST_PATH_SAMPLES_COVER_CLIP_BILINEAR. Cairo version is 1.14.2 for the benchmarks, which are run like: $ CAIRO_TEST_TARGET=image cairo-perf-trace -r -v -i8 > baseline-image-2.txt I tried both "image" and "image16" on an x86_64 (Sandybridge), and got no performance differences in the trimmed-cairo-traces set in either baseline/cleanup or cleanup/tight. I also tried with PIXMAN_DISABLE=ssse3 and still got no difference. I verified I am really running what I think I am by editing Pixman and seeing the effect in the benchmark. Am I missing something? I thought we would see at least some improvements also on x86_64 when comparing cleanup/tight. Should I run the same on rpi2? Or is the best effect on the fast paths we haven't merged yet? I'd rather not run this on rpi1 due to the function address / performance quirk, doing the required iterations there would probably take too long and I'd need to rearrange the result files too. Or maybe our test set is not enough? I recall having some problems with that in the past. So, I patched Pixman to yell whenever TIGHT is set but COVER_CLIP_BILINEAR is not set. Only t-firefox-canvas-swscroll and t-firefox-fishtank hit it with source image, each twice per iteration. Definitely seems like this test set is not hitting the cases we are interested in. I think I need to dig up our old performance profiles and see if we could record a trace from a real app that would hit these cases, now that Cairo's trace recording is supposedly fixed. The removal of the 8e extra safety margins shouldn't need performance profiles as justification, but for the tightening patches they'd be nice to have, especially since the usefulness of them has been questioned. Thanks, pq pgp3rSGxn2kg1.pgp Description: OpenPGP digital signature ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] Benchmarked: [PATCH 1/4] Change conditions for setting FAST_PATH_SAMPLES_COVER_CLIP flags
On Wed, 16 Sep 2015 12:25:59 +0100, Pekka Paalanenwrote: I tried both "image" and "image16" on an x86_64 (Sandybridge), and got no performance differences in the trimmed-cairo-traces set in either baseline/cleanup or cleanup/tight. I also tried with PIXMAN_DISABLE=ssse3 and still got no difference. I verified I am really running what I think I am by editing Pixman and seeing the effect in the benchmark. Am I missing something? Well, there has been a long and tangled history behind this change at this point. I'll admit my twin motivations were code cleanup and making it possible to measure the speed of bilinear operations using a benchmarker, but it's also ended up spawning the creation of affine-bench and cover-test, which isn't a bad thing. Just by thinking things through, I realised that we would regularly fail to hit COVER paths if Pixman's caller set the scale factors such that the centre of the outermost destination pixels aligned with the centre of the outermost source pixels. There has been some argument about whether this is representative of how Pixman should be used. I happen to think this is a perfectly reasonable thing to expect Pixman to support, but there are other models you can follow, notably setting the scale factors such that the outer edges of the outermost destination pixels align to the outer edges of the outermost source pixels. If the Cairo traces are using this latter model, then it's understandable if you aren't hitting the edge case that I'm concerned about very often. The useful thing about the scaled-in-one-axis-only example is that on the axis that isn't scaled, there is no dispute about how you set the scale factors, so it sidesteps this argument. Should I run the same on rpi2? Or is the best effect on the fast paths we haven't merged yet? The get a sizeable speed difference, you need two things: 1) a plot geometry that aligns the centres of the high-coordinate pixels of the source and destination 2) a reasonable speed difference between COVER and repeat fast paths or iterators At present, the speed difference will be most marked when you are using the bilinear iterators from pixman-fast-path.c or pixman-ssse3.c, which are both currently restricted to COVER plots of a8r8g8b8 source images only. If you test on a Raspberry Pi 2, then you also need to allow for the fact that the ARMv7 implementation has bilinear fast paths for src__, src__0565, over__ and add__, for most types of image repeat as well as for COVER, so these reduce the applicability of the iterator still further. In my patch series from last year, I added ARMv6 bilinear iterators for COVER plots of a8r8g8b8, x8r8g8b8, r5g6b5 and a8 images. Of course, you won't be seeing the effect of these because they haven't been accepted yet - but when (I hope) they are then having the new COVER_CLIP definition in place should help demonstrate their effectiveness. Basically, these are all just pieces of one big jigsaw puzzle. Or maybe our test set is not enough? I recall having some problems with that in the past. Yes, I know I have a number of fast paths queued up which were not represented at all in the Cairo perf traces, but were being hammered by the Epiphany browser on the Raspberry Pi. I'm not sure how we justify their inclusion if the Cairo perf traces is the only criterion allowed. I hope there will be some flexibility in this respect. If you want a more real-world example of when you might encounter bilinear scaling in one axis only, here's one that I think is plausible. Consider screen grabs of taken of an NTSC SD video at ITU656 sample positions (720x480) - or equally the output of a codec for a video at that resolution. Now try to display it with bilinear scaling at the correct aspect ratio on a display with square pixels: the destination rectangle will likely be chosen to be 640x480, with a vertical pixel increment of 1 and a horizontal pixel increment of 1.125. There's also a reasonable chance that you'll also be wanting to re-plot this 30 times per second, for obvious reasons. You'd hope that this could be achieved using a COVER fast path or iterator, but with the current flag definitions, they can't be used. Ben ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman