Re: [Pixman] [PATCH] Don't discriminate PAD and REFLECT repeat in standard fast paths

2010-09-29 Thread Jonathan Morton
On Wed, 2010-09-29 at 14:39 +0200, Soeren Sandmann wrote:
> I think the right fix is to use different flags for solids and
> non-solids.

Fine by me.

-- 
--
From: Jonathan Morton
  jonathan.mor...@movial.com


___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH] Don't discriminate PAD and REFLECT repeat in standard fast paths

2010-09-29 Thread Soeren Sandmann
Siarhei Siamashka  writes:

> Standard fast paths can be used in the following two cases:
> a) pixels outside image boundaries are not accessed
> b) solid source (special cases of 1x1 image with repeat set)
> And can't be when:
> c) pixels outside image boundaries are accessed (excluding solid case)
> 
> The problem is that there is only a single flag bit to differentiate all
> these three cases at the moment. It means that for each particular repeat 
> type,
> either (a) case is accelerated, or it is (b). But not both.

I think the right fix is to use different flags for solids and
non-solids. The reason it was using the same flags for both is simply
that it was translated more or less directly from an if statement from
back before the flags were introduced.

> > Actually, I don't even think a large scale refactoring is necessary -
> > most of the real fix is just deleting code. The main piece of new
> > logic is the extending of PIXMAN_STD_FAST_PATH to generate different
> > flags for solid and bits images.

Patches to do this follows. The deletion of the simple repeat code is
actually orthogonal to this. I still think it's worth doing though.


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH] Don't discriminate PAD and REFLECT repeat in standard fast paths

2010-09-28 Thread Soeren Sandmann
Siarhei Siamashka  writes:

> On Tuesday 28 September 2010 17:24:14 Siarhei Siamashka wrote:
> > The flags in pixman are implemented in such a way that they all have some
> > kind of "positive" meaning, indicating that the operation is going to be
> > more simple than the general case in one way or another. Having more flags
> > set on the initial analysis stage is always good. Forgetting to set some
> > of the flags is safe in the sense that it will not make pixman misbehave,
> > pixman will just run slower because of not taking some fast paths.

To rephrase: the flags should be though of as properties that the
images have.

   If the FAST_PATH_BLAH flag is set, then the image has property
   BLAH.

   If a fast path includes BLAH in its mask, then it is saying "I can
   only run if the image has property BLAH".

For fast paths, that's really all there is to it, bugs in the
PIXMAN_STD_FAST_PATH macro notwithstanding.

So the logic we rely on is:

  BLAH is set  =>  image has property BLAH

But there is no need for the arrow to go the other way because if an
image has property BLAH and the flag isn't set, then we just won't run
that particular fast path.

Ie., the flags have a positive meaning, as you say.

> Tried to actually verify this claim by artificially clearing flags in 
> 'compute_image_info' and running pixman tests. Appears that not everything
> is so simple.
> 
> Some flags are a bit special:
> - NARROW_FORMAT flag needs to be propagated to 'general_composite_rect'
> function because it is used there
> - calculation of NEEDS_WORKAROUND flag also can't be skipped

That is a good point. 

Yes, in the cases you point out (NARROW_FORMAT and NEEDS_WORKAROUND),
we do make use of the negative meaning of the flag. If NARROW_FORMAT
is *not* set, then wide fetchers are used. So, we rely on this:

  BLAH is not set => image does not have property BLAH

which by contraposition gives

  image has property BLAH  =>  BLAH is set

So those two flags really are special in that they *must* be set if
the image actually has the property.

This is actually kind of bad. I don't like special cases, and I hadn't
realized we had one here. But maybe we can get rid of NEEDS_WORKAROUND
in 0.22, and if we add a floating point implementation as a fallback
for the general one, that would allow the general one to just put
NARROW_FORMAT flags on its "fast path", using the flag in a more
conventional way.

> - flags ID_TRANSFORM, NO_ALPHA_MAP, NO_CONVOLUTION_FILTER, NO_PAD_REPEAT
> and NO_REFLECT_REPEAT are needed to select untransformed fetchers 
> bit_image_fetch_untransformed_32/bit_image_fetch_untransformed_64 instead of
> bit_image_fetch_general/_pixman_image_get_scanline_generic_64, and apparently 
> these behave differently

This is in *principle* the same positive use of the flags. It should
in *principle* be possible to ignore all these fetchers and just use
the bits_image_fetch_general() function. However, there are
differences because when you have a transformation, the 64 bit
pipeline is actually using 32 bit pixels expanded to 64 bits.

Which is broken, of course. So hopefully, this is another case where a
floating point implementation can help. However, for that to work, we
will need a couple of things from the floating point implemenetation:

- The floating point implementation must do transformations and
  gradients (unlike the existing 64 bit pipe).

- We need the ability to plug in implementation specific
  fetchers. Otherwise we can't do proper fallback from the general to
  the floating point implementation.

So if Dmitri and Jonathan are still reading, I think the above points
are worth keeping in mind.

Here are some notes on how CPU specific fetchers could be done:

http://cgit.freedesktop.org/~sandmann/pixman/tree/docs/fetchers?h=docs&id=2dcb46955869ed042d1bc32f738ab23ab569b32e


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH] Don't discriminate PAD and REFLECT repeat in standard fast paths

2010-09-28 Thread Siarhei Siamashka
On Tuesday 28 September 2010 17:24:14 Siarhei Siamashka wrote:
> The flags in pixman are implemented in such a way that they all have some
> kind of "positive" meaning, indicating that the operation is going to be
> more simple than the general case in one way or another. Having more flags
> set on the initial analysis stage is always good. Forgetting to set some
> of the flags is safe in the sense that it will not make pixman misbehave,
> pixman will just run slower because of not taking some fast paths.

Tried to actually verify this claim by artificially clearing flags in 
'compute_image_info' and running pixman tests. Appears that not everything
is so simple.

Some flags are a bit special:
- NARROW_FORMAT flag needs to be propagated to 'general_composite_rect'
function because it is used there
- calculation of NEEDS_WORKAROUND flag also can't be skipped
- flags ID_TRANSFORM, NO_ALPHA_MAP, NO_CONVOLUTION_FILTER, NO_PAD_REPEAT
and NO_REFLECT_REPEAT are needed to select untransformed fetchers 
bit_image_fetch_untransformed_32/bit_image_fetch_untransformed_64 instead of
bit_image_fetch_general/_pixman_image_get_scanline_generic_64, and apparently 
these behave differently

The rest of flags seem to be really optional and only result in skipping fast
paths if not set. But of course current tests do not provide full coverage of
pixman functionality and some other quirks could have been overlooked.

-- 
Best regards,
Siarhei Siamashka


signature.asc
Description: This is a digitally signed message part.
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH] Don't discriminate PAD and REFLECT repeat in standard fast paths

2010-09-28 Thread Siarhei Siamashka
On Tuesday 28 September 2010 14:55:41 Siarhei Siamashka wrote:
> On Monday 27 September 2010 17:51:32 Jonathan Morton wrote:
> > If I may inject an alternative point of view into this lively
> > discussion...
> > 
> > I think the current flags system is overcomplicated for the common case.
> > If any one of a wide variety of flags is set for an operation, this
> > excludes a huge number of potential fastpaths.
> 
> That's not quite true. Adding new flags has no effect on the selection of
> the existing fast paths. New flags provide a possibility to add more
> special fast paths which require these flags.

I think a bit more detailed explanation might be useful.

The flags in pixman are implemented in such a way that they all have some kind
of "positive" meaning, indicating that the operation is going to be more simple
than the general case in one way or another. Having more flags set on the 
initial analysis stage is always good. Forgetting to set some of the flags is
safe in the sense that it will not make pixman misbehave, pixman will just run
slower because of not taking some fast paths.

Each fast path function has a mask with flags (provided separately for source,
mask and destination), with each bit set there telling something like "I can't
handle case X" (or with alternative interpretation "I can't handle cases other
than Y"). Fast path functions with many flags set in their description are
allowed to cut more corners and run a lot faster because of this.

When asked to do something via pixman_image_composite() function call, pixman
first calculates the flags for the current operation (again, separately for the
source, mask and destination images). Setting each bit is kind of like telling
to the the rest of the code: "You are allowed not to care about case X". 

With all the flags for the current operation available, fast path selection
code just walks over the list of available fast path functions and selects the
first one with matching flags. It is possible to have multiple functions in the
list with the flags matching to the current operation, so the list should be
just ordered so that the better fast path functions are checked first. Platform
specific SIMD optimizations naturally come first in the list, then followed by
fast path functions implemented in C, and finally falling back to a very slow
general implementation (which is capable of handling any operation).

There may be also some overlap in fast path functions functionality, providing
multiple alternatives to do the same operation in some corner cases. For
example, when having nearest neighbor scaling for a8r8g8b8 images with NORMAL
repeat but not trying to access any pixels outside image boundaries, either 
'fast_composite_scaled_nearest___cover_SRC()' or alternatively 
'fast_composite_scaled_nearest___normal_SRC()' can be selected.
The former fast path function works faster, so it is better to be put in the
list first.

Walking over the list of fast path functions can be relatively slow. That's why
whenever some fast path function is found in the list, it is put into fast path
cache. This improves performance when similar operations are repeatedly used.


Regarding possible optimization when pixman flag types become more settled
less likely to change. Evaluating all the possible flags in the initial
analysis stage may be somewhat slow. Theoretically, if we know that we have
some extremely fast function which only needs flag A, then we can cut some
corners skipping calculation of the rest of flags B, C, D, E and move to fast
path selection code immediately. One of the examples could be (yet to be
introduced?) SOLID flag. If we already have it set, then we don't care much
whether we also have any transformation, its type, filter, repeat, etc. But
this kind of optimization may make code less readable/maintainable, so IMHO can
be only used when really justified by a significant performance increase.

-- 
Best regards,
Siarhei Siamashka


signature.asc
Description: This is a digitally signed message part.
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH] Don't discriminate PAD and REFLECT repeat in standard fast paths

2010-09-28 Thread Siarhei Siamashka
On Monday 27 September 2010 17:51:32 Jonathan Morton wrote:
> If I may inject an alternative point of view into this lively
> discussion...
> 
> I think the current flags system is overcomplicated for the common case.
> If any one of a wide variety of flags is set for an operation, this
> excludes a huge number of potential fastpaths.

That's not quite true. Adding new flags has no effect on the selection of the 
existing fast paths. New flags provide a possibility to add more special fast
paths which require these flags.

> Meanwhile we seem to have a problem identifying the cases when the standard-
> form fastpaths *can* be used.

That's just bug, partially caused by having somewhat more complicated than
necessary legacy code inherited from xserver. The flags system is not at
fault. I think the problem is going to be fixed and everyone will be happy
again :)

> Oddly enough, I have to solve much the same problem when writing EXA
> drivers, since EXA presents information about the operation in a similar
> way as to Pixman.  I do so in a much simpler manner:
> 
> - I go through each of the features that I know the hardware doesn't
> support, such as unusual image formats, alpha maps or component-alpha.
> If any of these are set, I tell EXA to fall back to software
> immediately.
> 
> In Pixman itself, this could simply be setting an ON_CRACK flag (at
> least for alpha maps and custom accessors) which forces use of the
> general path.
> 
> - Next, I examine the transform, filter and repeat state, and use that
> to construct a "fetchmode" index.  This has distinct values for "solid"
> and "untransformed and covers clip" which are easily tested for later.
> The fetchmode fits in 8 bits per source, so 16 bits are enough for a
> masked operation.  This is small enough to make a direct lookup table
> feasible if necessary.
> 
> A big feature of this is that the definitions of the "covers clip" and
> "solid" modes are mutually exclusive, and the "solid" mode doesn't care
> what repeat mode is used to trigger it as long as it's not "normal".  If
> an image is solid, I don't even need to check whether a transform is set
> on it - I only need the dimensions and the repeat mode.

Thanks for sharing the details about your implementation. For this particular 
"covers clip" vs. "solid" case, I think it's not like we don't know what to do
to fix it. It's quite the opposite, we actually have multiple alternatives to
select from  ;)

> Some fetchmodes may not be supported by the hardware, so I make these
> fall back to software at this point too.  This is equivalent to
> searching the fastpath list in Pixman.
> 
> The common cases are as follows:
> 
> - Source covers clip without transform, no mask, no crack.
> - Source is solid, mask covers clip without transform, no crack.
> - Source is transformed (typically in a less-than-general manner) and
> may or may not cover clip, no mask, no crack.
> 
> Surely it's worth optimising the flags generation for these common
> cases?

I think pixman still does not have all the needed hooks for plugging
different types of optimizations yet, so it's a bit too early to "tighten"
this code and add shortcuts based on some assumptions.

Currently I need to somehow introduce ARM NEON optimized over_0565_8_0565
operation when the source image is scaled (using nearest filer so far) and
actively uses PAD repeat. Mask image fortunately does not use any kind of
transformation and "covers clip". This case is a little bit more complex
than the ones you described above. And I guess, more complex compositing
optimizations may need to be added later.

That said, I think that having a new microbenchmark program for measuring
pixman function call and fast path selection overhead could be very useful.
Just in order to be able to easily spot obvious (and easily fixable) 
performance problems/regressions when tweaking this flags setting and fast
path selection code.

-- 
Best regards,
Siarhei Siamashka


signature.asc
Description: This is a digitally signed message part.
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH] Don't discriminate PAD and REFLECT repeat in standard fast paths

2010-09-27 Thread Jonathan Morton
If I may inject an alternative point of view into this lively
discussion...

I think the current flags system is overcomplicated for the common case.
If any one of a wide variety of flags is set for an operation, this
excludes a huge number of potential fastpaths.  Meanwhile we seem to
have a problem identifying the cases when the standard-form fastpaths
*can* be used.

Oddly enough, I have to solve much the same problem when writing EXA
drivers, since EXA presents information about the operation in a similar
way as to Pixman.  I do so in a much simpler manner:

- I go through each of the features that I know the hardware doesn't
support, such as unusual image formats, alpha maps or component-alpha.
If any of these are set, I tell EXA to fall back to software
immediately.

In Pixman itself, this could simply be setting an ON_CRACK flag (at
least for alpha maps and custom accessors) which forces use of the
general path.

- Next, I examine the transform, filter and repeat state, and use that
to construct a "fetchmode" index.  This has distinct values for "solid"
and "untransformed and covers clip" which are easily tested for later.
The fetchmode fits in 8 bits per source, so 16 bits are enough for a
masked operation.  This is small enough to make a direct lookup table
feasible if necessary.

A big feature of this is that the definitions of the "covers clip" and
"solid" modes are mutually exclusive, and the "solid" mode doesn't care
what repeat mode is used to trigger it as long as it's not "normal".  If
an image is solid, I don't even need to check whether a transform is set
on it - I only need the dimensions and the repeat mode.

Some fetchmodes may not be supported by the hardware, so I make these
fall back to software at this point too.  This is equivalent to
searching the fastpath list in Pixman.

The common cases are as follows:

- Source covers clip without transform, no mask, no crack.
- Source is solid, mask covers clip without transform, no crack.
- Source is transformed (typically in a less-than-general manner) and
may or may not cover clip, no mask, no crack.

Surely it's worth optimising the flags generation for these common
cases?

-- 
--
From: Jonathan Morton
  jonathan.mor...@movial.com


___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH] Don't discriminate PAD and REFLECT repeat in standard fast paths

2010-09-27 Thread Siarhei Siamashka
On Friday 24 September 2010 19:39:10 Soeren Sandmann wrote:
> Siarhei Siamashka  writes:
> > On Wednesday 22 September 2010 17:56:20 Soeren Sandmann wrote:
> > > Siarhei Siamashka  writes:
> > > > But right now setting PAD repeat for everying will cause serious
> > > > performance problems for pixman, because it will stop using simple
> > > > non-transformed fast paths.
> > > > 
> > > > This patch tries to address the problem. It passes current pixman
> > > > tests and I think that it is most likely fine. But I'm still not
> > > > totally sure if it is really safe in all possible cases.
> > > 
> > > I can't think of a case where it would be broken, but I don't think
> > > it's really *right* either. The COVERS_CLIP flag is supposed to
> > > indicate that the image doesn't have any undefined parts, which is
> > > certainly the case when you have PAD or REFLECT set.
> > 
> > I don't think the patch is more broken than the current code. Both NONE
> > and PAD repeats are very similar, as they both define the values of
> > pixels outside image boundaries according to some predefined rules. Both
> > can't be used with simple non-transformed fast paths if the operation
> > involves accessing any pixels outside image boundaries. So in my opinion
> > treating them equally in this part of code is the right thing.
> 
> You can make exactly the same argument about NORMAL: it defines values
> of pixels outside image boundaries according to some predefined
> rules.

Exactly. This is one of the possible interpretations. A bit more details below.

> The reason you can't remove COVERS_CLIP in that case is that solid
> images created by NORMAL repeating a 1x1 image is still a common way
> to generate solid images. But PAD and REFLECT *could* be used that way
> too, even if in practice they aren't.
>
> I guess that's an example of something that is in principle more
> broken with the patch that without: with the patch you can't use PAD
> and REFLECT with a 1x1 image to make solid images.

Well, you still can and it will even work. But admittedly it will be just
ridiculously slow. Flipping these flag bits just changes where we are going to
take serious performance hit: for nonscaled standard fast paths when the source
image has PAD/REFLECT repeat or for the solid sources which are represented by
1x1 images with PAD/REFLECT repeat

Anyway, that's a good point and a proof that this stuff really needs a better
fix. Having performance regressions for any use cases is not good.

> But my main objection is still that it uses the flags in a meaningless
> way that just happen to give a desired result. With the patch you
> can't assign any real meaning to the COVERS_CLIP flag.

Looking from the end of the pipeline, the COVERS_CLIP flag is set when there is 
no risk accessing pixels outside image boundaries from the fast path functions 
(which implies some knowledge about what fast path functions are actually
doing). And it is one of the requirements for selecting standard fast paths.

Standard fast paths can be used in the following two cases:
a) pixels outside image boundaries are not accessed
b) solid source (special cases of 1x1 image with repeat set)
And can't be when:
c) pixels outside image boundaries are accessed (excluding solid case)

The problem is that there is only a single flag bit to differentiate all
these three cases at the moment. It means that for each particular repeat type,
either (a) case is accelerated, or it is (b). But not both.

Before the patch it was:
NONE:a)
PAD: b)
REFLECT: b)
NORMAL:  b)

After the patch it is going to be:
NONE:a)
PAD: a)
REFLECT: a)
NORMAL:  b)

Either way is not optimal.

One of the possible solutions is probably to explicitly set COVERS_CLIP flag
when the image is known to be solid, but not just based on the repeat type.
Like in the attached patch variant.

But surely the problem can be solved better, additionally accelerating solid
case even when transformation is set.

> Ie., it's a hack to get around some admittedly not-too-well-though-out
> existing code, instead of fixing it.
> 
> > Even though this treatment itself is not perfect and can be improved
> > as you explained below.
> > 
> > Would it be a bad idea to apply this patch now, and then proceed with
> > a larger scale refactoring next?
> 
> If I thought that a larger scale refactoring was likely to actually
> happen, then I'd be more willing to consider applying this patch.

The most difficult part is that any problem can be typically solved in many
different ways, with pretty much equal end result. So some misunderstandings
can be always possible unless a really unambiguous explanation is provided
about how exactly you would like to see it fixed. I understand that providing
such clear explanation may take even more time than fixing the problem
yourself and that's unfortunate.

I would actually prefer to see you taking care of all the general
infrastructure for the fast path code selection in pixman. Yes, may

Re: [Pixman] [PATCH] Don't discriminate PAD and REFLECT repeat in standard fast paths

2010-09-24 Thread Soeren Sandmann
Siarhei Siamashka  writes:

> On Wednesday 22 September 2010 17:56:20 Soeren Sandmann wrote:
> > Siarhei Siamashka  writes:
> > > But right now setting PAD repeat for everying will cause serious
> > > performance problems for pixman, because it will stop using simple
> > > non-transformed fast paths.
> > > 
> > > This patch tries to address the problem. It passes current pixman tests
> > > and I think that it is most likely fine. But I'm still not totally sure
> > > if it is really safe in all possible cases.
> > 
> > I can't think of a case where it would be broken, but I don't think
> > it's really *right* either. The COVERS_CLIP flag is supposed to
> > indicate that the image doesn't have any undefined parts, which is
> > certainly the case when you have PAD or REFLECT set.
> 
> I don't think the patch is more broken than the current code. Both NONE
> and PAD repeats are very similar, as they both define the values of pixels
> outside image boundaries according to some predefined rules. Both can't
> be used with simple non-transformed fast paths if the operation involves
> accessing any pixels outside image boundaries. So in my opinion treating
> them equally in this part of code is the right thing.

You can make exactly the same argument about NORMAL: it defines values
of pixels outside image boundaries according to some predefined
rules.

The reason you can't remove COVERS_CLIP in that case is that solid
images created by NORMAL repeating a 1x1 image is still a common way
to generate solid images. But PAD and REFLECT *could* be used that way
too, even if in practice they aren't. 

I guess that's an example of something that is in principle more
broken with the patch that without: with the patch you can't use PAD
and REFLECT with a 1x1 image to make solid images.

But my main objection is still that it uses the flags in a meaningless
way that just happen to give a desired result. With the patch you
can't assign any real meaning to the COVERS_CLIP flag. 

Ie., it's a hack to get around some admittedly not-too-well-though-out
existing code, instead of fixing it.

> Even though this treatment itself is not perfect and can be improved
> as you explained below.
> 
> Would it be a bad idea to apply this patch now, and then proceed with
> a larger scale refactoring next?

If I thought that a larger scale refactoring was likely to actually
happen, then I'd be more willing to consider applying this patch. But
once the immediate symptom is solved, people inevitably end up not
fixing the real problem.

Actually, I don't even think a large scale refactoring is necessary -
most of the real fix is just deleting code. The main piece of new
logic is the extending of PIXMAN_STD_FAST_PATH to generate different
flags for solid and bits images.


Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH] Don't discriminate PAD and REFLECT repeat in standard fast paths

2010-09-23 Thread Siarhei Siamashka
Hi Bill,

It seems to me that you are looking at the problem from a bit different angle.

The whole issue is that the developers of radeon drivers are recommending
to use PAD repeat in applications in order to avoid some performance problems
on this type of hardware. But if the application developers listen to that and
start using PAD repeat more often, then software rendering done via pixman is
going to take a severe performance hit unless this minor issue is addressed.

Putting myself in application developer's shoes, I can imagine that deciding
whether to use PAD repeat may be non-obvious if there are cases (like the use
of pixman) where it causes performance regressions.

So let's just make PAD repeat fast in pixman in order not to hinder its
adoption anywhere else :)

The patch discussed here is not supposed to change pixman behavior in any way,
it's just a performance optimization.

On Wednesday 22 September 2010 21:03:34 Bill Spitzak wrote:
> What I do in OpenGL is draw a rectangle (well two triangles) textured
> with the image with PAD repeat. Outside the rectangle is 0,0,0,0. This
> produces the sharp edge I wanted and allows full acceleration to be
> used. Because OpenGL and DirectX assume that such sharp edges are how
> images will be drawn with translations, they obviously made this the
> accelerated path.
> 
> Even if you don't want to use geometry, you could also intersect the
> clip with a quad (using the current clip implementation) and then draw
> all images with PAD.
> 
> The problem is that you have to get rid of the current, and useless,
> definition of NONE. It must be changed to make a "sharp" edge. This will
> not change the output when the image scale is 1 or less, and I believe
> the change when the image scale is > 1 will produce the results desired
> by most users (ie they expect that drawing a 10x10 image will cover the
> same pixels as a 10x10 rectangle).
> 
> Soeren Sandmann wrote:
> > Bill Spitzak  writes:
> >> This is in fact what users expect, and also what other graphics
> >> libraries are doing (which explains why the hardware is supporting PAD
> >> directly).
> > 
> > The reason Render's NONE is difficult to accelerate in hardware is
> > that for formats without alpha (such as x8r8g8b8), Render wants the
> > border to be (0, 0, 0, 0), whereas GL will give you (0xFF, 0, 0, 0).
> > 
> > I thought I heard at one point that Direct3D can do (0, 0, 0, 0), but
> > maybe not.
> > 
> > 
> > Soren


signature.asc
Description: This is a digitally signed message part.
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH] Don't discriminate PAD and REFLECT repeat in standard fast paths

2010-09-23 Thread Siarhei Siamashka
On Wednesday 22 September 2010 17:56:20 Soeren Sandmann wrote:
> Siarhei Siamashka  writes:
> > But right now setting PAD repeat for everying will cause serious
> > performance problems for pixman, because it will stop using simple
> > non-transformed fast paths.
> > 
> > This patch tries to address the problem. It passes current pixman tests
> > and I think that it is most likely fine. But I'm still not totally sure
> > if it is really safe in all possible cases.
> 
> I can't think of a case where it would be broken, but I don't think
> it's really *right* either. The COVERS_CLIP flag is supposed to
> indicate that the image doesn't have any undefined parts, which is
> certainly the case when you have PAD or REFLECT set.

I don't think the patch is more broken than the current code. Both NONE
and PAD repeats are very similar, as they both define the values of pixels
outside image boundaries according to some predefined rules. Both can't
be used with simple non-transformed fast paths if the operation involves
accessing any pixels outside image boundaries. So in my opinion treating
them equally in this part of code is the right thing. Even though this
treatment itself is not perfect and can be improved as you explained below.

Would it be a bad idea to apply this patch now, and then proceed with
a larger scale refactoring next?

> The fundamental problem is that the PIXMAN_STD_FAST_PATH macro is
> mixing up several things. There are two kinds of images involved in
> the standard fast paths: solids and bits. The problem is that the
> required flags for those images are quite different.
> 
> A solid image has basically no requirements on the flags other than
> the usual NO_ALPHAMAP, NO_CONVOLUTION etc. Generally, as long as it's
> a solid color, the fast path will work.
> 
> The bits images have basically the requirement that we now call
> SAMPLES_COVER_CLIP. That allows the biltters to simply walk the image
> without worrying about reading out of bounds.
> 
> Then there is the 'simple repeat' stuff, which is an optimization that
> notices that in some cases you can do a repeat operation by repeating
> the compositing instead of sampling with wrap-around addressing. This
> code has survived since the dawn of time, when pixman was called fb,
> didn't have transformations, and NORMAL was the only repeat mode
> available. Back then this code was the way to do repeating.
> 
> On IRC you pointed out that for 1xn repeated images, this simple
> repeat stuff is not an optimization at all because it results in
> terrible memory access patterns. It has historically had other
> pathological cases, such as one it would end up calling a fast path a
> zillion times with a 1x1 rectangle.
> 
> In fact, it was never written as an optimization, it was only left in
> because it was assumed to be one. I don't think anyone has ever
> measured the impact compared to the general path.
> 
> Considering that it is really at the wrong level in the pipeline too
> (repeating conceptually happens during sampling, not after
> compositing), maybe it's time to retire this code. There are cases
> where it is probably still useful, but I'd rather avoid the headache
> of thinking about special cases where it can't or shouldn't be used.
> 
> So, basically, what I think could be done is:
> 
> - Change the PIXMAN_STD_FAST_PATH to put in different flags when an
>   image is solid vs. when it is bits. The flags in the solid case
>   would be just the usual ones, in the bits case, it would be the
>   usual ones plus SAMPLES_COVER_CLIP.
> 
> - Delete the COVERS_CLIP flag, since it would not be used in either
>   case and isn't used elsewhere.
> 
> - Delete the simple_repeat stuff (along with the SIMPLE_REPEAT flags)
> 
> If there are specific benchmarks where the simple_repeat stuff made a
> noticable difference, then it's probably better to just do a regular
> fast path for them. A scanline approach would likely be faster than a
> tiled approach anyway in those cases.

All of this looks like a good plan to me.

-- 
Best regards,
Siarhei Siamashka


signature.asc
Description: This is a digitally signed message part.
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH] Don't discriminate PAD and REFLECT repeat in standard fast paths

2010-09-22 Thread Bill Spitzak
What I do in OpenGL is draw a rectangle (well two triangles) textured 
with the image with PAD repeat. Outside the rectangle is 0,0,0,0. This 
produces the sharp edge I wanted and allows full acceleration to be 
used. Because OpenGL and DirectX assume that such sharp edges are how 
images will be drawn with translations, they obviously made this the 
accelerated path.


Even if you don't want to use geometry, you could also intersect the 
clip with a quad (using the current clip implementation) and then draw 
all images with PAD.


The problem is that you have to get rid of the current, and useless, 
definition of NONE. It must be changed to make a "sharp" edge. This will 
not change the output when the image scale is 1 or less, and I believe 
the change when the image scale is > 1 will produce the results desired 
by most users (ie they expect that drawing a 10x10 image will cover the 
same pixels as a 10x10 rectangle).


Soeren Sandmann wrote:

Bill Spitzak  writes:


This is in fact what users expect, and also what other graphics
libraries are doing (which explains why the hardware is supporting PAD
directly).


The reason Render's NONE is difficult to accelerate in hardware is
that for formats without alpha (such as x8r8g8b8), Render wants the
border to be (0, 0, 0, 0), whereas GL will give you (0xFF, 0, 0, 0).

I thought I heard at one point that Direct3D can do (0, 0, 0, 0), but
maybe not.


Soren


___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH] Don't discriminate PAD and REFLECT repeat in standard fast paths

2010-09-22 Thread Soeren Sandmann
Bill Spitzak  writes:

> This is in fact what users expect, and also what other graphics
> libraries are doing (which explains why the hardware is supporting PAD
> directly).

The reason Render's NONE is difficult to accelerate in hardware is
that for formats without alpha (such as x8r8g8b8), Render wants the
border to be (0, 0, 0, 0), whereas GL will give you (0xFF, 0, 0, 0).

I thought I heard at one point that Direct3D can do (0, 0, 0, 0), but
maybe not.


Soren

___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH] Don't discriminate PAD and REFLECT repeat in standard fast paths

2010-09-22 Thread Soeren Sandmann
Siarhei Siamashka  writes:

> From: Siarhei Siamashka 
> 
> Without this fix, setting PAD repeat on a source image prevents
> the use of any nonscaled standard fast paths, affecting performance
> a lot. But as long as no pixels outside the source image boundaries
> are touched by the compositing operation, all the repeat types
> behave the same and can take the same fast paths.
> 
> This is important because setting PAD repeat instead of NONE is
> more hardware acceleration friendly (for the drivers implementing
> RENDER extension) and does not inhibit OVER->SRC operator
> optimization in pixman.
> ---
> 
> We discussed this problem and the related code with Soeren on IRC the
> other day, and it may be that the code dealing with FAST_PATH_COVERS_CLIP
> and FAST_PATH_SAMPLES_COVER_CLIP would better to be changed a bit. Along
> with the selection of fast path functions dealing with SOLID sources or
> masks (for example, any transformation set on a SOLID source image does
> not actually make any difference for end result, so it can be ignored).
> And there is a special optimization for NORMAL repeat in 
> walk_region_internal()
> function which is also somehow related. Well, I guess this all stuff
> carried us away from the original problem :)
> 
> But now returning back to this issue. Ideally, PAD repeat would be a better
> default instead of NONE. Just because NONE repeat is a bit strange and
> hard to accelerate in hardware. This is one example from freedesktop.org
> bugzilla (and there are more similar reports):
> https://bugs.freedesktop.org/show_bug.cgi?id=27954
> 
> But right now setting PAD repeat for everying will cause serious performance
> problems for pixman, because it will stop using simple non-transformed fast
> paths.
> 
> This patch tries to address the problem. It passes current pixman tests and
> I think that it is most likely fine. But I'm still not totally sure if it is
> really safe in all possible cases.

I can't think of a case where it would be broken, but I don't think
it's really *right* either. The COVERS_CLIP flag is supposed to
indicate that the image doesn't have any undefined parts, which is
certainly the case when you have PAD or REFLECT set.

The fundamental problem is that the PIXMAN_STD_FAST_PATH macro is
mixing up several things. There are two kinds of images involved in
the standard fast paths: solids and bits. The problem is that the
required flags for those images are quite different.

A solid image has basically no requirements on the flags other than
the usual NO_ALPHAMAP, NO_CONVOLUTION etc. Generally, as long as it's
a solid color, the fast path will work.

The bits images have basically the requirement that we now call
SAMPLES_COVER_CLIP. That allows the biltters to simply walk the image
without worrying about reading out of bounds.

Then there is the 'simple repeat' stuff, which is an optimization that
notices that in some cases you can do a repeat operation by repeating
the compositing instead of sampling with wrap-around addressing. This
code has survived since the dawn of time, when pixman was called fb,
didn't have transformations, and NORMAL was the only repeat mode
available. Back then this code was the way to do repeating.

On IRC you pointed out that for 1xn repeated images, this simple
repeat stuff is not an optimization at all because it results in
terrible memory access patterns. It has historically had other
pathological cases, such as one it would end up calling a fast path a
zillion times with a 1x1 rectangle.

In fact, it was never written as an optimization, it was only left in
because it was assumed to be one. I don't think anyone has ever
measured the impact compared to the general path.

Considering that it is really at the wrong level in the pipeline too
(repeating conceptually happens during sampling, not after
compositing), maybe it's time to retire this code. There are cases
where it is probably still useful, but I'd rather avoid the headache
of thinking about special cases where it can't or shouldn't be used.

So, basically, what I think could be done is:

- Change the PIXMAN_STD_FAST_PATH to put in different flags when an
  image is solid vs. when it is bits. The flags in the solid case
  would be just the usual ones, in the bits case, it would be the
  usual ones plus SAMPLES_COVER_CLIP.

- Delete the COVERS_CLIP flag, since it would not be used in either
  case and isn't used elsewhere.

- Delete the simple_repeat stuff (along with the SIMPLE_REPEAT flags)

If there are specific benchmarks where the simple_repeat stuff made a
noticable difference, then it's probably better to just do a regular
fast path for them. A scanline approach would likely be faster than a
tiled approach anyway in those cases.



Soren
___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman


Re: [Pixman] [PATCH] Don't discriminate PAD and REFLECT repeat in standard fast paths

2010-09-21 Thread Bill Spitzak

On Sep 21, 2010, at 9:04 PM, Siarhei Siamashka wrote:


But now returning back to this issue. Ideally, PAD repeat would be a  
better

default instead of NONE. Just because NONE repeat is a bit strange and
hard to accelerate in hardware. This is one example from  
freedesktop.org

bugzilla (and there are more similar reports):
https://bugs.freedesktop.org/show_bug.cgi?id=27954



Rather than PAD being the default, I think NONE should be changed to  
the equivalent of:


 - intersecting the current clip with a quadrilateral that is equal  
to the bounds of the source surface transformed to the destination

 - draw the image using PAD REPEAT
 - restore the clip to as before.

The result is the black edges of the image are sharp antialiased lines  
no matter what the transform.


This is in fact what users expect, and also what other graphics  
libraries are doing (which explains why the hardware is supporting PAD  
directly).


___
Pixman mailing list
Pixman@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/pixman