Re: [Pixman] [PATCH] vmx: implement fast path vmx_composite_over_n_8888

2015-09-10 Thread Oded Gabbay
On Sat, Sep 5, 2015 at 10:03 PM, Oded Gabbay  wrote:
>
> On Fri, Sep 4, 2015 at 3:39 PM, Siarhei Siamashka
>  wrote:
> > Running "lowlevel-blt-bench over_n_8888" on Playstation3 3.2GHz,
> > Gentoo ppc (32-bit userland) gave the following results:
> >
> > before:  over_n_8888 =  L1: 147.47  L2: 205.86  M:121.07
> > after:   over_n_8888 =  L1: 287.27  L2: 261.09  M:133.48
> >
> > Signed-off-by: Siarhei Siamashka 
> > ---
> >  pixman/pixman-vmx.c |   54 
> > +++
> >  1 files changed, 54 insertions(+), 0 deletions(-)
> >
> > diff --git a/pixman/pixman-vmx.c b/pixman/pixman-vmx.c
> > index a9bd024..9e551b3 100644
> > --- a/pixman/pixman-vmx.c
> > +++ b/pixman/pixman-vmx.c
> > @@ -2745,6 +2745,58 @@ vmx_composite_src_x888_8888 (pixman_implementation_t *imp,
> >  }
> >
> >  static void
> > +vmx_composite_over_n_8888 (pixman_implementation_t *imp,
> > +   pixman_composite_info_t *info)
> > +{
> > +PIXMAN_COMPOSITE_ARGS (info);
> > +uint32_t *dst_line, *dst;
> > +uint32_t src, ia;
> > +int  i, w, dst_stride;
> > +vector unsigned int vdst, vsrc, via;
> > +
> > +src = _pixman_image_get_solid (imp, src_image, 
> > dest_image->bits.format);
> > +
> > +if (src == 0)
> > +   return;
> > +
> > +PIXMAN_IMAGE_GET_LINE (
> > +   dest_image, dest_x, dest_y, uint32_t, dst_stride, dst_line, 1);
> > +
> > +vsrc = (vector unsigned int){src, src, src, src};
> > +via = negate (splat_alpha (vsrc));
> If we use the over function (see my next comment), we need to
> remove the negate() from the above statement, as it is done in the
> over function.
>
> > +ia = ALPHA_8 (~src);
> > +
> > +while (height--)
> > +{
> > +   dst = dst_line;
> > +   dst_line += dst_stride;
> > +   w = width;
> > +
> > +   while (w && ((uintptr_t)dst & 15))
> > +   {
> > +   uint32_t d = *dst;
> > +   UN8x4_MUL_UN8_ADD_UN8x4 (d, ia, src);
> > +   *dst++ = d;
> > +   w--;
> > +   }
> > +
> > +   for (i = w / 4; i > 0; i--)
> > +   {
> > +   vdst = pix_multiply (load_128_aligned (dst), via);
> > +   save_128_aligned (dst, pix_add (vsrc, vdst));
>
> Instead of the above two lines, I would simply use the over function
> in vmx, which does exactly that. So:
> vdst = over(vsrc, via, load_128_aligned(dst))
> save_128_aligned (dst, vdst);
>
> I prefer this as it reuses an existing function which helps
> maintainability, and using it has no impact on performance.
>
> > +   dst += 4;
> > +   }
> > +
> > +   for (i = w % 4; --i >= 0;)
> > +   {
> > +   uint32_t d = dst[i];
> > +   UN8x4_MUL_UN8_ADD_UN8x4 (d, ia, src);
> > +   dst[i] = d;
> > +   }
> > +}
> > +}
> > +
> > +static void
> >  vmx_composite_over_8888_8888 (pixman_implementation_t *imp,
> > pixman_composite_info_t *info)
> >  {
> > @@ -3079,6 +3131,8 @@ FAST_NEAREST_MAINLOOP (vmx_8888_8888_normal_OVER,
> >
> >  static const pixman_fast_path_t vmx_fast_paths[] =
> >  {
> > +PIXMAN_STD_FAST_PATH (OVER, solid,    null, a8r8g8b8, vmx_composite_over_n_8888),
> > +PIXMAN_STD_FAST_PATH (OVER, solid,    null, x8r8g8b8, vmx_composite_over_n_8888),
> >  PIXMAN_STD_FAST_PATH (OVER, a8r8g8b8, null, a8r8g8b8, vmx_composite_over_8888_8888),
> >  PIXMAN_STD_FAST_PATH (OVER, a8r8g8b8, null, x8r8g8b8, vmx_composite_over_8888_8888),
> >  PIXMAN_STD_FAST_PATH (OVER, a8b8g8r8, null, a8b8g8r8, vmx_composite_over_8888_8888),
> > --
> > 1.7.8.6
> >
>
> Indeed, this implementation is much better than what I did.
> Apparently, converting sse2 to vmx calls isn't the optimal way.
> On my POWER8 machine, I get:
>
> reference memcpy speed = 24764.8MB/s (6191.2MP/s for 32bpp fills)
>
>          before     after
> L1       572.29    1539.47   +169.00%
> L2      1038.08    1549.04    +49.22%
> M       1104.1     1522.22    +37.87%
> HT       447.45     676.32    +51.15%
> VT       520.82     764.82    +46.85%
> R        407.92     570.54    +39.87%
> RT       148.9      208.77    +40.21%
> Kops/s  1100       1418       +28.91%
>
> So, assuming the change above, this patch is:
>
> Reviewed-by: Oded Gabbay 
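
For readers following the patch above: everything here boils down to the OVER
operator applied with a solid, premultiplied source. Below is a minimal scalar
model of the per-pixel computation performed by UN8x4_MUL_UN8_ADD_UN8x4 (d, ia, src);
it is illustration only, not code taken from pixman, and mul_un8,
over_n_8888_pixel and the main() driver are names invented here. The VMX loop
in the patch does the same thing four pixels at a time, once the destination
pointer has been brought to 16-byte alignment by the scalar head loop; it can
be spelled either as pix_add (vsrc, pix_multiply (vdst, via)) or, as suggested
in the review comment above, through the existing over() helper, which applies
the negate() to the splatted source alpha internally.

/* Scalar model of OVER with a solid source:
 * dst = src + dst * (255 - src_a) / 255 per channel, premultiplied alpha.
 * Illustration only. */
#include <stdint.h>
#include <stdio.h>

/* Exact x * a / 255 with the usual rounding trick used by pixman-style code. */
static uint32_t
mul_un8 (uint32_t x, uint32_t a)
{
    uint32_t t = x * a + 0x80;
    return (t + (t >> 8)) >> 8;
}

static uint32_t
over_n_8888_pixel (uint32_t src, uint32_t dst)
{
    uint32_t ia  = 255 - (src >> 24);   /* inverse source alpha, like ALPHA_8 (~src) */
    uint32_t out = 0;
    int      shift;

    for (shift = 0; shift < 32; shift += 8)   /* b, g, r, a channels */
    {
        uint32_t s = (src >> shift) & 0xff;
        uint32_t d = (dst >> shift) & 0xff;

        /* s + d * ia / 255 cannot overflow 8 bits for premultiplied input. */
        out |= (s + mul_un8 (d, ia)) << shift;
    }
    return out;
}

int
main (void)
{
    uint32_t src = 0x80800000;   /* 50% opaque red, premultiplied */
    uint32_t dst = 0xff00ff00;   /* opaque green                  */

    printf ("%08x\n", over_n_8888_pixel (src, dst));   /* prints ff807f00 */
    return 0;
}

Either spelling of the vector loop should produce identical results; the over()
form simply reuses a helper that already exists in pixman-vmx.c, which is the
maintainability argument made above.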


Hi Siarhei,

After I fixed my cairo setup (See
http://lists.freedesktop.org/archives/pixman/2015-September/003987.html),
I went and re-tested your patch with the trimmed cairo benchmark against
current pixman master.
Unfortunately, it gives a minor slowdown:

Slowdowns
=========
t-firefox-scrolling  1232.30 -> 1295.75 :  1.05x slowdown

Even if I apply your patch over my latest patch-set (that was inspired
by your patch), I still get a slowdown, albeit in 

Re: [Pixman] [PATCH 3/3] test: Add cover-test v4

2015-09-10 Thread Pekka Paalanen
On Wed, 09 Sep 2015 18:31:42 +0100
"Ben Avison"  wrote:

> On Wed, 09 Sep 2015 09:37:41 +0100, Pekka Paalanen  
> wrote:
> > I think we need some indication whether cover-test runs with or without
> > fencing. So far I have thought that if fence-image-self-test is
> > skipped, then cover-test can only run without fencing. If
> > fence-image-self-test is not skipped, then cover-test uses fencing if
> > it is not skipped.
> >
> > It's perhaps a bit too subtle.
> 
> Too subtle for me :)
> 
> > Maybe cover-test should have a single printf telling if it is fenced or
> > not? That would show up on old autotools, but on new ones you have to
> > go look in the logs anyway.
> >
> > Maybe it would be most obvious if cover-test either always used fencing
> > or skipped. We'd lose the CRC check on too-large-page systems, but at
> > least if we see it as a PASS, we can be sure it used fencing. How's that?
> 
> Since one test is compile-time (availability of fencing) and the other is
> runtime (reading the page size), I admit it's easier to settle on skipping
> the test in both cases - the alternative would need to be a layer of
> runtime abstraction between fenced and non-fenced images.

Yup, skip in both cases would at least be obvious.

> Perhaps a compromise is to
> 
> a) skip the test if the page size is too large (i.e. treat this as an
> error condition, until someone is motivated to either abstract the fence
> image code so it can be disabled at runtime, or to rework it to support
> larger page sizes)

So a skip or an error? Error wouldn't be nice for Oded, since he
wouldn't be able to have 'make check' pass on his PPC boxes. Would be
rude to dump this on him.

> 
> b) printf a warning iff fencing isn't available (a bit like the way the
> PIXMAN_DISABLE parser doesn't feel the need to list the implementations
> that aren't being skipped)

I don't think we can see the warning in buildbot logs.

Anyway, the case we are considering here is when fencing is not
available. I cannot tell what platform that would be; I assume it to be
very rare. IMHO skipping the whole cover-test on such platforms is not
that bad. And it's more an OS feature than a CPU instruction set - all
the asm paths we have can be tested by someone on fence-supporting
platforms, right?

I'll go with the skip, and send v5 just in case.


Thanks,
pq
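
To make the skip concrete, here is a rough sketch of the idea, not the actual
cover-test or fence-image code: fenced_alloc and EXPECTED_PAGE_SIZE are
invented names, the guard pages use plain POSIX mmap/mprotect (which is why
fencing is an OS feature rather than a CPU one), and exit status 77 is the
automake convention for a skipped test.

/* Illustration only: a simplified guard-page allocator plus the automake
 * "skip" exit status.  Not pixman's fence_malloc. */
#define _GNU_SOURCE          /* for MAP_ANONYMOUS on some systems */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

#define EXPECTED_PAGE_SIZE 4096

/* Allocate 'size' usable bytes with one PROT_NONE page on each side, so
 * any read or write outside the buffer faults immediately. */
static void *
fenced_alloc (size_t size)
{
    long     page    = sysconf (_SC_PAGESIZE);
    size_t   rounded = (size + page - 1) / page * page;
    uint8_t *base    = mmap (NULL, rounded + 2 * page,
                             PROT_READ | PROT_WRITE,
                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

    if (base == MAP_FAILED)
        return NULL;

    mprotect (base, page, PROT_NONE);                  /* leading fence  */
    mprotect (base + page + rounded, page, PROT_NONE); /* trailing fence */

    return base + page;
}

int
main (void)
{
    uint8_t *buf;

    if (sysconf (_SC_PAGESIZE) > EXPECTED_PAGE_SIZE)
    {
        printf ("page size too large for fencing, skipping\n");
        return 77;   /* automake: test skipped */
    }

    buf = fenced_alloc (1024);
    if (!buf)
        return 77;

    buf[0] = 0xff;        /* fine                                   */
    /* buf[-1] = 0xff;       would SIGSEGV thanks to the guard page */
    return 0;
}

As I understand it, pixman's real fence images build on the same mechanism,
additionally placing the pixel data right against the guard pages so that any
out-of-bounds fetch by a fast path faults instead of silently reading garbage.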




Re: [Pixman] [PATCH 0/4] More VMX enhancements

2015-09-10 Thread Pekka Paalanen
On Thu, 10 Sep 2015 11:55:56 +0300
Oded Gabbay  wrote:

> On Mon, Sep 7, 2015 at 2:09 PM, Oded Gabbay  wrote:
> 
> > On Mon, Sep 7, 2015 at 2:04 PM, Pekka Paalanen 
> > wrote:
> > > On Sun,  6 Sep 2015 18:27:07 +0300
> > > Oded Gabbay  wrote:
> > >
> > >> This patch-set contains optimizations for two existing VMX fast-paths
> > and a new
> > >> VMX fast-path function.
> > >>
> > >> The optimization ideas came from Siarhei's recent implementation of
> > >> over_n_8888
> > >> VMX fast path (see
> > http://lists.freedesktop.org/archives/pixman/2015-September/003951.html).
> > >>
> > >> The new function I added is actually one that I already implemented a
> > couple
> > >> of months ago, but it produced conflicting results regarding the
> > performance.
> > >> However, I now optimized it and it now shows considerable performance
> > >> improvement over the non-vmx path.
> > >>
> > >> The last patch removes many helper functions that caused the less than
> > stellar
> > >> performance the current fast-paths provide. I removed them as I don't
> > want
> > >> anyone to try and use them, because there are much better alternatives,
> > as
> > >> I've demonstrated with this patch-set.
> > >>
> > >> Thanks,
> > >>
> > >>   Oded
> > >>
> > >> Oded Gabbay (4):
> > >>   vmx: optimize scaled_nearest_scanline_vmx_8888_8888_OVER
> > >>   vmx: optimize vmx_composite_over_n_8888_8888_ca
> > >>   vmx: implement fast path vmx_composite_over_n_8_8888
> > >>   vmx: Remove unused expensive functions
> > >>
> > >>  pixman/pixman-vmx.c | 439
> > ++--
> > >>  1 file changed, 150 insertions(+), 289 deletions(-)
> > >>
> > >
> > > Hi Oded,
> > >
> > > nice diffstat. :-)
> > >
> > > This series is:
> > > Acked-by: Pekka Paalanen 
> > >
> > > I did notice a few minor issues. Patch 1 has a dereference before
> > > NULL-check, and you sometimes forget the space before an opening
> > > parenthesis.
> > >
> > > I suppose there is no danger of regressing operations you didn't
> > > touch? ;-)
> > >
> > >
> > > Thanks,
> > > pq
> >
> > Hi Pekka,
> > I ran the cairo benchmark (trimmed) and there was *no* regression.
> > I don't think optimizing some fast-paths affects other, non-related,
> > fast-paths. And, of course, I don't think it has *any* impact on
> > non-POWER systems.
> > However, if someone thinks of a specific other function I need to
> > check for regression, I'm open for suggestions :)
> >
> >  Oded
> >
> 
> It bugged me that there was no change, neither up nor down, in the
> cairo benchmark.
> So I rechecked it and I had a wrong setup - cairo used the system-installed
> pixman instead of my pixman.
> 
> After fixing that, I saw several modest speedups for this patch series:
> 
> Speedups
> 
> image t-firefox-scrolling    1232.30 (1237.81 0.40%) -> 1080.17 (1097.06 0.99%):  1.14x speedup
> image t-gnome-terminal-vim    613.86 ( 615.04 0.12%) ->  549.73 ( 551.32 0.13%):  1.12x speedup
> image t-evolution             405.54 ( 412.06 0.81%) ->  370.57 ( 379.11 1.89%):  1.09x speedup
> image t-gvim                  653.02 ( 655.16 0.16%) ->  615.31 ( 618.40 1.68%):  1.06x speedup
> image t-firefox-talos-gfx     919.31 ( 926.31 0.36%) ->  867.05 ( 870.01 0.35%):  1.06x speedup
>
> I'll add it to the last commit of this patch-set for future reference.

Paranoia pays off!


Cheers,
pq




Re: [Pixman] [PATCH 1/4] Change conditions for setting FAST_PATH_SAMPLES_COVER_CLIP flags

2015-09-10 Thread Pekka Paalanen
On Wed, 09 Sep 2015 18:09:04 +0100
"Ben Avison"  wrote:

> On Wed, 09 Sep 2015 11:42:26 +0100, Pekka Paalanen 
> wrote:
> 
> > On Wed, 9 Sep 2015 09:39:07 +0300
> > Oded Gabbay  wrote:
> >
> >> On Fri, Sep 4, 2015 at 5:09 AM, Ben Avison  wrote:
> >> > Many bilinear fast paths currently assume that COVER_CLIP_BILINEAR is
> >> > not set when the transformed upper coordinate corresponds exactly
> >> > to the last row/pixel in source space. This is due to a detail of
> >> > many current implementations - they assume they can always load the
> >> > pixel after the one you get by dividing the coordinate by 2^16 with
> >> > rounding to -infinity.
> >> >
> >> > To avoid causing these implementations to exceed array bounds in
> >> > these cases, the COVER_CLIP_BILINEAR flag retains this feature in
> >> > this patch. Subsequent patches deal with removing this assumption,
> >> > to enable cover fast paths to be used in the maximum number of cases.
> >
> > I'm not sure if these two paragraphs about bilinear are really
> > necessary here. The essence is that we remove the extra margins from
> > both NEAREST (not 8*eps, but 7*eps and 9*eps, see below) and BILINEAR
> > (8*eps), without changing them in any other way.
> >
> > The subsequent patches are still under discussion, and we have to see
> > how they work out.
> 
> I felt that since the point of the patch is about getting the thresholds
> correct to the exact multiple of pixman_fixed_e, and I'd been asked for a
> "strict proof of correctness" in the commit message, this had to be
> said - after all, I don't think this behaviour is documented
> anywhere else.
> 
> Just witness the confusion about the issue - even Søren posted to say that
> when the coordinate was exactly coincident with a source pixel, the
> pixel with the next-lowest coordinate was loaded and multiplied by 0,
> before correcting himself shortly afterwards to say it was the
> next-highest one.
> 
> I hope I've demonstrated, via three completely different approaches, that
> there's no reason why the upper bound has to be set the way it is - you
> don't even have to sacrifice the ability to do SIMD loads. Even with the
> existing implementations, most of those that do load pixels from the
> next-highest coordinate when we end aligned on a source pixel in the
> horizontal direction, don't do the same in the vertical direction.
> 
> I'm also playing a longer game - but because people get overfaced with
> very long patch series, I'm holding back on some others that build on this
> change. Eventually I re-use the same assembly functions in a context where
> they need to adhere to my tighter limits, but I expect the pool of people
> able to review the assembly code will be pretty small, so the ability to
> demonstrate that they obey the limits when used for a COVER operation is
> very useful.
> 
> Granted, perhaps the last sentence you quoted probably belonged after the
> --- separator, especially if the whole series doesn't get pushed at once,
> but I really want to see all of them to go through. I did originally post
> the whole thing as a single patch, after all (mind you, I've spent a lot
> of time working on bilinear scaling for ARMv6 and ARMv7, and this feels
> like a really tiny and obvious part of it to me now...)

Hi Ben,

you're right, documenting this is important. However, I think this
particular patch is not the best place, and here is why.

When we recently discussed this, both I and Siarhei had the opinion
that this needs to be done in two separate steps:

1. Remove the useless 8e fuzz margins.

2. Change the meaning of the COVER_CLIP_BILINEAR flag so that it is no
   longer safe for fetchers to always fetch a 2x2 pixel block.

In that sense, patch 1/4 of this series is step 1. Patches 2, 3 and 4
are step 2, which I assumed to be a follow-up patch series.
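
To spell out what "always fetch a 2x2 pixel block" means in step 2, here is a
purely illustrative sketch rather than pixman's fetcher code: coordinates are
16.16 fixed point as with pixman_fixed_t, the half-pixel centre offset is left
out, and naive_bilinear_fetch is an invented name. The point is only that a
fetcher written this way touches the pixel to the right of floor(x) even when
that pixel's weight is zero, which is why the current flag semantics have to
exclude samples landing exactly on the last row/column until such fetchers are
reworked.

/* Illustration only: a naive bilinear fetch that always reads a 2x2 block
 * (shown for one scanline pair of 8-bit gray pixels). */
#include <stdint.h>

static uint32_t
naive_bilinear_fetch (const uint8_t *row0,
                      const uint8_t *row1,
                      int32_t        x)      /* 16.16 fixed-point x */
{
    int      x0 = x >> 16;                   /* floor (x)               */
    uint32_t wx = (x >> 8) & 0xff;           /* 8-bit right-hand weight */

    /* row?[x0 + 1] is read even when wx == 0, i.e. when x lies exactly on
     * a source pixel.  At the last column this is one element past the end
     * of the row unless the caller guarantees an extra margin. */
    uint32_t top = row0[x0] * (256 - wx) + row0[x0 + 1] * wx;
    uint32_t bot = row1[x0] * (256 - wx) + row1[x0 + 1] * wx;

    /* The vertical weight is omitted; the two rows are simply averaged. */
    return (top + bot) >> 9;
}

int
main (void)
{
    uint8_t row[4] = { 10, 20, 30, 40 };

    /* x = 2.0 exactly: the right-hand weight is zero, yet row[3] is still
     * read.  With x = 3.0 (the last pixel) the fetch would touch row[4],
     * one past the end of the array. */
    return naive_bilinear_fetch (row, row, 2 << 16) == 30 ? 0 : 1;
}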

We're still pending on cover-test, too, so we're getting quite a lot
ahead of ourselves. Thankfully cover-test is essentially sorted by now.

It seems to me the old maintainers processed each patch series as a
single unit. It makes sense because a series is usually interdependent.
I am more liberal in that, I can accept patches from the top or even
the middle if there are no dependencies. That's why I am looking only
at the step 1 patch right now and making sure it is what we want and
get it merged.

That is why I consider talking about the zero-weight pixel off-topic
for *this one* patch. It is very much on topic on the three other
patches that focus on the definition of COVER in the BILINEAR case,
specifically on the point of zero-weight input pixels.

All in due time, IMHO. At least we are moving now. :-)

> > All the above otherwise looks good to me, but there is one more place
> > that has 8*eps, analyze_extent() in pixman.c has:
> >
> > if (!compute_transformed_extents (transform, extents, &transformed))
> >     return FALSE;
> 

Re: [Pixman] [PATCH] vmx: implement fast path vmx_composite_over_n_8888

2015-09-10 Thread Siarhei Siamashka
On Thu, 10 Sep 2015 12:27:18 +0300
Oded Gabbay  wrote:

> 
> 
> Hi Siarhei,

Hi,
 
> After I fixed my cairo setup (See
> 

Re: [Pixman] [PATCH 1/4] Change conditions for setting FAST_PATH_SAMPLES_COVER_CLIP flags

2015-09-10 Thread Bill Spitzak
On Thu, Sep 10, 2015 at 9:35 AM, Ben Avison  wrote:

> On Thu, 10 Sep 2015 12:46:50 +0100, Pekka Paalanen 
> wrote:
>
>> you're right, documenting this is important. However, I think this
>> particular patch is not the best place, and here is why.
>>
>> When we recently discussed this, both I and Siarhei had the opinion
>> that this needs to be done in two separate steps:
>>
>> 1. Remove the useless 8e fuzz margins.
>>
>> 2. Change the meaning of the COVER_CLIP_BILINEAR flag so that it is no
>>longer safe for fetchers to always fetch a 2x2 pixel block.
>>
>
This sounds exactly right to me.

Another way to say it is that it is safe to fetch all pixels that have a
non-zero weight. Certain bilinear positions will produce 2x2 blocks that
contain up to 3 pixels with a zero weight; those pixels may be outside the
source clip even when this flag is on.


> I sense we're taking a slightly different perspective on the problem
> here. I don't really see these two steps as different in spirit. In the
> first one, the flag calculation was allowing extra space to permit the
> corresponding fast path to be a bit sloppy with its coordinate
> transformations. In the second one, the flag calculation was allowing
> extra space to permit the corresponding fast path to be a bit sloppy with
> loading data that it doesn't need. Apart from meaning that less efficient
> fast paths sometimes get used, this also means a lot of unnecessary cache
> lines get loaded in many cases, which has got to hurt performance.
>

I believe you are confusing the implementation of the bilinear filter with
this flag.

For a coordinate of .5, pixel 0 will have a weight of 1.0, and all other
pixels will have a weight of 0.0. This includes both the pixel at -1 and
the pixel at 1. They both have a weight of zero.
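
Worked through in fixed point (an illustrative calculation only, with invented
variable names; pixman's actual interpolation uses fewer weight bits, but the
degenerate case is the same):

/* Weights for one axis of a bilinear sample at coordinate x, in 16.16
 * fixed point, using the convention that pixel i is centred at i + 0.5.
 * Illustration only. */
#include <stdint.h>
#include <stdio.h>

int
main (void)
{
    int32_t x    = 0x8000;        /* 0.5 in 16.16 fixed point            */
    int32_t t    = x - 0x8000;    /* shift into pixel-centre space       */
    int     left = t >> 16;       /* index of the left pixel of the pair */
    int32_t frac = t & 0xffff;    /* weight of the right pixel           */

    /* x = 0.5 gives left = 0 and frac = 0: pixel 0 carries the full weight
     * and both neighbours (pixels -1 and 1) have weight zero, whichever of
     * the two possible pairs an implementation chooses to load. */
    printf ("left=%d  w_left=%.3f  w_right=%.3f\n",
            left, 1.0 - frac / 65536.0, frac / 65536.0);
    return 0;
}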

It is useful for a bilinear algorithm to think about pairs of pixels, and
depending on the implementation the pair produced for the .5 coordinate may
be pixels 0 and 1, or pixels -1 and 0.

However this does not change the setting of COVER_CLIP_BILINEAR! No pixel
has changed its weight; both pixel -1 and 1 remain with a weight of zero,
no matter which is chosen for the pair.

I think you are right that there are reasons for the bilinear
*implementation* to round down, so the zero-weight pixel is at the lower
coordinate. This is because the special code to avoid fetching it can be at
the start of the loop, rather than at the end. But this has nothing to do
with COVER_CLIP_BILINEAR and should not be part of what sets it.

And it is true that there are a lot of broken bilinear fetches that do read
the weight-zero pixel. These need to be fixed. But this still does not
change the meaning of the flag.