Re: [Pixman] [PATCH] Don't discriminate PAD and REFLECT repeat in standard fast paths
On Wed, 2010-09-29 at 14:39 +0200, Soeren Sandmann wrote: > I think the right fix is to use different flags for solids and > non-solids. Fine by me. -- -- From: Jonathan Morton jonathan.mor...@movial.com ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] Don't discriminate PAD and REFLECT repeat in standard fast paths
Siarhei Siamashka writes: > Standard fast paths can be used in the following two cases: > a) pixels outside image boundaries are not accessed > b) solid source (special cases of 1x1 image with repeat set) > And can't be when: > c) pixels outside image boundaries are accessed (excluding solid case) > > The problem is that there is only a single flag bit to differentiate all > these three cases at the moment. It means that for each particular repeat > type, > either (a) case is accelerated, or it is (b). But not both. I think the right fix is to use different flags for solids and non-solids. The reason it was using the same flags for both is simply that it was translated more or less directly from an if statement from back before the flags were introduced. > > Actually, I don't even think a large scale refactoring is necessary - > > most of the real fix is just deleting code. The main piece of new > > logic is the extending of PIXMAN_STD_FAST_PATH to generate different > > flags for solid and bits images. Patches to do this follows. The deletion of the simple repeat code is actually orthogonal to this. I still think it's worth doing though. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] Don't discriminate PAD and REFLECT repeat in standard fast paths
Siarhei Siamashka writes: > On Tuesday 28 September 2010 17:24:14 Siarhei Siamashka wrote: > > The flags in pixman are implemented in such a way that they all have some > > kind of "positive" meaning, indicating that the operation is going to be > > more simple than the general case in one way or another. Having more flags > > set on the initial analysis stage is always good. Forgetting to set some > > of the flags is safe in the sense that it will not make pixman misbehave, > > pixman will just run slower because of not taking some fast paths. To rephrase: the flags should be though of as properties that the images have. If the FAST_PATH_BLAH flag is set, then the image has property BLAH. If a fast path includes BLAH in its mask, then it is saying "I can only run if the image has property BLAH". For fast paths, that's really all there is to it, bugs in the PIXMAN_STD_FAST_PATH macro notwithstanding. So the logic we rely on is: BLAH is set => image has property BLAH But there is no need for the arrow to go the other way because if an image has property BLAH and the flag isn't set, then we just won't run that particular fast path. Ie., the flags have a positive meaning, as you say. > Tried to actually verify this claim by artificially clearing flags in > 'compute_image_info' and running pixman tests. Appears that not everything > is so simple. > > Some flags are a bit special: > - NARROW_FORMAT flag needs to be propagated to 'general_composite_rect' > function because it is used there > - calculation of NEEDS_WORKAROUND flag also can't be skipped That is a good point. Yes, in the cases you point out (NARROW_FORMAT and NEEDS_WORKAROUND), we do make use of the negative meaning of the flag. If NARROW_FORMAT is *not* set, then wide fetchers are used. So, we rely on this: BLAH is not set => image does not have property BLAH which by contraposition gives image has property BLAH => BLAH is set So those two flags really are special in that they *must* be set if the image actually has the property. This is actually kind of bad. I don't like special cases, and I hadn't realized we had one here. But maybe we can get rid of NEEDS_WORKAROUND in 0.22, and if we add a floating point implementation as a fallback for the general one, that would allow the general one to just put NARROW_FORMAT flags on its "fast path", using the flag in a more conventional way. > - flags ID_TRANSFORM, NO_ALPHA_MAP, NO_CONVOLUTION_FILTER, NO_PAD_REPEAT > and NO_REFLECT_REPEAT are needed to select untransformed fetchers > bit_image_fetch_untransformed_32/bit_image_fetch_untransformed_64 instead of > bit_image_fetch_general/_pixman_image_get_scanline_generic_64, and apparently > these behave differently This is in *principle* the same positive use of the flags. It should in *principle* be possible to ignore all these fetchers and just use the bits_image_fetch_general() function. However, there are differences because when you have a transformation, the 64 bit pipeline is actually using 32 bit pixels expanded to 64 bits. Which is broken, of course. So hopefully, this is another case where a floating point implementation can help. However, for that to work, we will need a couple of things from the floating point implemenetation: - The floating point implementation must do transformations and gradients (unlike the existing 64 bit pipe). - We need the ability to plug in implementation specific fetchers. Otherwise we can't do proper fallback from the general to the floating point implementation. So if Dmitri and Jonathan are still reading, I think the above points are worth keeping in mind. Here are some notes on how CPU specific fetchers could be done: http://cgit.freedesktop.org/~sandmann/pixman/tree/docs/fetchers?h=docs&id=2dcb46955869ed042d1bc32f738ab23ab569b32e Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] Don't discriminate PAD and REFLECT repeat in standard fast paths
On Tuesday 28 September 2010 17:24:14 Siarhei Siamashka wrote: > The flags in pixman are implemented in such a way that they all have some > kind of "positive" meaning, indicating that the operation is going to be > more simple than the general case in one way or another. Having more flags > set on the initial analysis stage is always good. Forgetting to set some > of the flags is safe in the sense that it will not make pixman misbehave, > pixman will just run slower because of not taking some fast paths. Tried to actually verify this claim by artificially clearing flags in 'compute_image_info' and running pixman tests. Appears that not everything is so simple. Some flags are a bit special: - NARROW_FORMAT flag needs to be propagated to 'general_composite_rect' function because it is used there - calculation of NEEDS_WORKAROUND flag also can't be skipped - flags ID_TRANSFORM, NO_ALPHA_MAP, NO_CONVOLUTION_FILTER, NO_PAD_REPEAT and NO_REFLECT_REPEAT are needed to select untransformed fetchers bit_image_fetch_untransformed_32/bit_image_fetch_untransformed_64 instead of bit_image_fetch_general/_pixman_image_get_scanline_generic_64, and apparently these behave differently The rest of flags seem to be really optional and only result in skipping fast paths if not set. But of course current tests do not provide full coverage of pixman functionality and some other quirks could have been overlooked. -- Best regards, Siarhei Siamashka signature.asc Description: This is a digitally signed message part. ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] Don't discriminate PAD and REFLECT repeat in standard fast paths
On Tuesday 28 September 2010 14:55:41 Siarhei Siamashka wrote: > On Monday 27 September 2010 17:51:32 Jonathan Morton wrote: > > If I may inject an alternative point of view into this lively > > discussion... > > > > I think the current flags system is overcomplicated for the common case. > > If any one of a wide variety of flags is set for an operation, this > > excludes a huge number of potential fastpaths. > > That's not quite true. Adding new flags has no effect on the selection of > the existing fast paths. New flags provide a possibility to add more > special fast paths which require these flags. I think a bit more detailed explanation might be useful. The flags in pixman are implemented in such a way that they all have some kind of "positive" meaning, indicating that the operation is going to be more simple than the general case in one way or another. Having more flags set on the initial analysis stage is always good. Forgetting to set some of the flags is safe in the sense that it will not make pixman misbehave, pixman will just run slower because of not taking some fast paths. Each fast path function has a mask with flags (provided separately for source, mask and destination), with each bit set there telling something like "I can't handle case X" (or with alternative interpretation "I can't handle cases other than Y"). Fast path functions with many flags set in their description are allowed to cut more corners and run a lot faster because of this. When asked to do something via pixman_image_composite() function call, pixman first calculates the flags for the current operation (again, separately for the source, mask and destination images). Setting each bit is kind of like telling to the the rest of the code: "You are allowed not to care about case X". With all the flags for the current operation available, fast path selection code just walks over the list of available fast path functions and selects the first one with matching flags. It is possible to have multiple functions in the list with the flags matching to the current operation, so the list should be just ordered so that the better fast path functions are checked first. Platform specific SIMD optimizations naturally come first in the list, then followed by fast path functions implemented in C, and finally falling back to a very slow general implementation (which is capable of handling any operation). There may be also some overlap in fast path functions functionality, providing multiple alternatives to do the same operation in some corner cases. For example, when having nearest neighbor scaling for a8r8g8b8 images with NORMAL repeat but not trying to access any pixels outside image boundaries, either 'fast_composite_scaled_nearest___cover_SRC()' or alternatively 'fast_composite_scaled_nearest___normal_SRC()' can be selected. The former fast path function works faster, so it is better to be put in the list first. Walking over the list of fast path functions can be relatively slow. That's why whenever some fast path function is found in the list, it is put into fast path cache. This improves performance when similar operations are repeatedly used. Regarding possible optimization when pixman flag types become more settled less likely to change. Evaluating all the possible flags in the initial analysis stage may be somewhat slow. Theoretically, if we know that we have some extremely fast function which only needs flag A, then we can cut some corners skipping calculation of the rest of flags B, C, D, E and move to fast path selection code immediately. One of the examples could be (yet to be introduced?) SOLID flag. If we already have it set, then we don't care much whether we also have any transformation, its type, filter, repeat, etc. But this kind of optimization may make code less readable/maintainable, so IMHO can be only used when really justified by a significant performance increase. -- Best regards, Siarhei Siamashka signature.asc Description: This is a digitally signed message part. ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] Don't discriminate PAD and REFLECT repeat in standard fast paths
On Monday 27 September 2010 17:51:32 Jonathan Morton wrote: > If I may inject an alternative point of view into this lively > discussion... > > I think the current flags system is overcomplicated for the common case. > If any one of a wide variety of flags is set for an operation, this > excludes a huge number of potential fastpaths. That's not quite true. Adding new flags has no effect on the selection of the existing fast paths. New flags provide a possibility to add more special fast paths which require these flags. > Meanwhile we seem to have a problem identifying the cases when the standard- > form fastpaths *can* be used. That's just bug, partially caused by having somewhat more complicated than necessary legacy code inherited from xserver. The flags system is not at fault. I think the problem is going to be fixed and everyone will be happy again :) > Oddly enough, I have to solve much the same problem when writing EXA > drivers, since EXA presents information about the operation in a similar > way as to Pixman. I do so in a much simpler manner: > > - I go through each of the features that I know the hardware doesn't > support, such as unusual image formats, alpha maps or component-alpha. > If any of these are set, I tell EXA to fall back to software > immediately. > > In Pixman itself, this could simply be setting an ON_CRACK flag (at > least for alpha maps and custom accessors) which forces use of the > general path. > > - Next, I examine the transform, filter and repeat state, and use that > to construct a "fetchmode" index. This has distinct values for "solid" > and "untransformed and covers clip" which are easily tested for later. > The fetchmode fits in 8 bits per source, so 16 bits are enough for a > masked operation. This is small enough to make a direct lookup table > feasible if necessary. > > A big feature of this is that the definitions of the "covers clip" and > "solid" modes are mutually exclusive, and the "solid" mode doesn't care > what repeat mode is used to trigger it as long as it's not "normal". If > an image is solid, I don't even need to check whether a transform is set > on it - I only need the dimensions and the repeat mode. Thanks for sharing the details about your implementation. For this particular "covers clip" vs. "solid" case, I think it's not like we don't know what to do to fix it. It's quite the opposite, we actually have multiple alternatives to select from ;) > Some fetchmodes may not be supported by the hardware, so I make these > fall back to software at this point too. This is equivalent to > searching the fastpath list in Pixman. > > The common cases are as follows: > > - Source covers clip without transform, no mask, no crack. > - Source is solid, mask covers clip without transform, no crack. > - Source is transformed (typically in a less-than-general manner) and > may or may not cover clip, no mask, no crack. > > Surely it's worth optimising the flags generation for these common > cases? I think pixman still does not have all the needed hooks for plugging different types of optimizations yet, so it's a bit too early to "tighten" this code and add shortcuts based on some assumptions. Currently I need to somehow introduce ARM NEON optimized over_0565_8_0565 operation when the source image is scaled (using nearest filer so far) and actively uses PAD repeat. Mask image fortunately does not use any kind of transformation and "covers clip". This case is a little bit more complex than the ones you described above. And I guess, more complex compositing optimizations may need to be added later. That said, I think that having a new microbenchmark program for measuring pixman function call and fast path selection overhead could be very useful. Just in order to be able to easily spot obvious (and easily fixable) performance problems/regressions when tweaking this flags setting and fast path selection code. -- Best regards, Siarhei Siamashka signature.asc Description: This is a digitally signed message part. ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] Don't discriminate PAD and REFLECT repeat in standard fast paths
If I may inject an alternative point of view into this lively discussion... I think the current flags system is overcomplicated for the common case. If any one of a wide variety of flags is set for an operation, this excludes a huge number of potential fastpaths. Meanwhile we seem to have a problem identifying the cases when the standard-form fastpaths *can* be used. Oddly enough, I have to solve much the same problem when writing EXA drivers, since EXA presents information about the operation in a similar way as to Pixman. I do so in a much simpler manner: - I go through each of the features that I know the hardware doesn't support, such as unusual image formats, alpha maps or component-alpha. If any of these are set, I tell EXA to fall back to software immediately. In Pixman itself, this could simply be setting an ON_CRACK flag (at least for alpha maps and custom accessors) which forces use of the general path. - Next, I examine the transform, filter and repeat state, and use that to construct a "fetchmode" index. This has distinct values for "solid" and "untransformed and covers clip" which are easily tested for later. The fetchmode fits in 8 bits per source, so 16 bits are enough for a masked operation. This is small enough to make a direct lookup table feasible if necessary. A big feature of this is that the definitions of the "covers clip" and "solid" modes are mutually exclusive, and the "solid" mode doesn't care what repeat mode is used to trigger it as long as it's not "normal". If an image is solid, I don't even need to check whether a transform is set on it - I only need the dimensions and the repeat mode. Some fetchmodes may not be supported by the hardware, so I make these fall back to software at this point too. This is equivalent to searching the fastpath list in Pixman. The common cases are as follows: - Source covers clip without transform, no mask, no crack. - Source is solid, mask covers clip without transform, no crack. - Source is transformed (typically in a less-than-general manner) and may or may not cover clip, no mask, no crack. Surely it's worth optimising the flags generation for these common cases? -- -- From: Jonathan Morton jonathan.mor...@movial.com ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] Don't discriminate PAD and REFLECT repeat in standard fast paths
On Friday 24 September 2010 19:39:10 Soeren Sandmann wrote: > Siarhei Siamashka writes: > > On Wednesday 22 September 2010 17:56:20 Soeren Sandmann wrote: > > > Siarhei Siamashka writes: > > > > But right now setting PAD repeat for everying will cause serious > > > > performance problems for pixman, because it will stop using simple > > > > non-transformed fast paths. > > > > > > > > This patch tries to address the problem. It passes current pixman > > > > tests and I think that it is most likely fine. But I'm still not > > > > totally sure if it is really safe in all possible cases. > > > > > > I can't think of a case where it would be broken, but I don't think > > > it's really *right* either. The COVERS_CLIP flag is supposed to > > > indicate that the image doesn't have any undefined parts, which is > > > certainly the case when you have PAD or REFLECT set. > > > > I don't think the patch is more broken than the current code. Both NONE > > and PAD repeats are very similar, as they both define the values of > > pixels outside image boundaries according to some predefined rules. Both > > can't be used with simple non-transformed fast paths if the operation > > involves accessing any pixels outside image boundaries. So in my opinion > > treating them equally in this part of code is the right thing. > > You can make exactly the same argument about NORMAL: it defines values > of pixels outside image boundaries according to some predefined > rules. Exactly. This is one of the possible interpretations. A bit more details below. > The reason you can't remove COVERS_CLIP in that case is that solid > images created by NORMAL repeating a 1x1 image is still a common way > to generate solid images. But PAD and REFLECT *could* be used that way > too, even if in practice they aren't. > > I guess that's an example of something that is in principle more > broken with the patch that without: with the patch you can't use PAD > and REFLECT with a 1x1 image to make solid images. Well, you still can and it will even work. But admittedly it will be just ridiculously slow. Flipping these flag bits just changes where we are going to take serious performance hit: for nonscaled standard fast paths when the source image has PAD/REFLECT repeat or for the solid sources which are represented by 1x1 images with PAD/REFLECT repeat Anyway, that's a good point and a proof that this stuff really needs a better fix. Having performance regressions for any use cases is not good. > But my main objection is still that it uses the flags in a meaningless > way that just happen to give a desired result. With the patch you > can't assign any real meaning to the COVERS_CLIP flag. Looking from the end of the pipeline, the COVERS_CLIP flag is set when there is no risk accessing pixels outside image boundaries from the fast path functions (which implies some knowledge about what fast path functions are actually doing). And it is one of the requirements for selecting standard fast paths. Standard fast paths can be used in the following two cases: a) pixels outside image boundaries are not accessed b) solid source (special cases of 1x1 image with repeat set) And can't be when: c) pixels outside image boundaries are accessed (excluding solid case) The problem is that there is only a single flag bit to differentiate all these three cases at the moment. It means that for each particular repeat type, either (a) case is accelerated, or it is (b). But not both. Before the patch it was: NONE:a) PAD: b) REFLECT: b) NORMAL: b) After the patch it is going to be: NONE:a) PAD: a) REFLECT: a) NORMAL: b) Either way is not optimal. One of the possible solutions is probably to explicitly set COVERS_CLIP flag when the image is known to be solid, but not just based on the repeat type. Like in the attached patch variant. But surely the problem can be solved better, additionally accelerating solid case even when transformation is set. > Ie., it's a hack to get around some admittedly not-too-well-though-out > existing code, instead of fixing it. > > > Even though this treatment itself is not perfect and can be improved > > as you explained below. > > > > Would it be a bad idea to apply this patch now, and then proceed with > > a larger scale refactoring next? > > If I thought that a larger scale refactoring was likely to actually > happen, then I'd be more willing to consider applying this patch. The most difficult part is that any problem can be typically solved in many different ways, with pretty much equal end result. So some misunderstandings can be always possible unless a really unambiguous explanation is provided about how exactly you would like to see it fixed. I understand that providing such clear explanation may take even more time than fixing the problem yourself and that's unfortunate. I would actually prefer to see you taking care of all the general infrastructure for the fast path code selection in pixman. Yes, may
Re: [Pixman] [PATCH] Don't discriminate PAD and REFLECT repeat in standard fast paths
Siarhei Siamashka writes: > On Wednesday 22 September 2010 17:56:20 Soeren Sandmann wrote: > > Siarhei Siamashka writes: > > > But right now setting PAD repeat for everying will cause serious > > > performance problems for pixman, because it will stop using simple > > > non-transformed fast paths. > > > > > > This patch tries to address the problem. It passes current pixman tests > > > and I think that it is most likely fine. But I'm still not totally sure > > > if it is really safe in all possible cases. > > > > I can't think of a case where it would be broken, but I don't think > > it's really *right* either. The COVERS_CLIP flag is supposed to > > indicate that the image doesn't have any undefined parts, which is > > certainly the case when you have PAD or REFLECT set. > > I don't think the patch is more broken than the current code. Both NONE > and PAD repeats are very similar, as they both define the values of pixels > outside image boundaries according to some predefined rules. Both can't > be used with simple non-transformed fast paths if the operation involves > accessing any pixels outside image boundaries. So in my opinion treating > them equally in this part of code is the right thing. You can make exactly the same argument about NORMAL: it defines values of pixels outside image boundaries according to some predefined rules. The reason you can't remove COVERS_CLIP in that case is that solid images created by NORMAL repeating a 1x1 image is still a common way to generate solid images. But PAD and REFLECT *could* be used that way too, even if in practice they aren't. I guess that's an example of something that is in principle more broken with the patch that without: with the patch you can't use PAD and REFLECT with a 1x1 image to make solid images. But my main objection is still that it uses the flags in a meaningless way that just happen to give a desired result. With the patch you can't assign any real meaning to the COVERS_CLIP flag. Ie., it's a hack to get around some admittedly not-too-well-though-out existing code, instead of fixing it. > Even though this treatment itself is not perfect and can be improved > as you explained below. > > Would it be a bad idea to apply this patch now, and then proceed with > a larger scale refactoring next? If I thought that a larger scale refactoring was likely to actually happen, then I'd be more willing to consider applying this patch. But once the immediate symptom is solved, people inevitably end up not fixing the real problem. Actually, I don't even think a large scale refactoring is necessary - most of the real fix is just deleting code. The main piece of new logic is the extending of PIXMAN_STD_FAST_PATH to generate different flags for solid and bits images. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] Don't discriminate PAD and REFLECT repeat in standard fast paths
Hi Bill, It seems to me that you are looking at the problem from a bit different angle. The whole issue is that the developers of radeon drivers are recommending to use PAD repeat in applications in order to avoid some performance problems on this type of hardware. But if the application developers listen to that and start using PAD repeat more often, then software rendering done via pixman is going to take a severe performance hit unless this minor issue is addressed. Putting myself in application developer's shoes, I can imagine that deciding whether to use PAD repeat may be non-obvious if there are cases (like the use of pixman) where it causes performance regressions. So let's just make PAD repeat fast in pixman in order not to hinder its adoption anywhere else :) The patch discussed here is not supposed to change pixman behavior in any way, it's just a performance optimization. On Wednesday 22 September 2010 21:03:34 Bill Spitzak wrote: > What I do in OpenGL is draw a rectangle (well two triangles) textured > with the image with PAD repeat. Outside the rectangle is 0,0,0,0. This > produces the sharp edge I wanted and allows full acceleration to be > used. Because OpenGL and DirectX assume that such sharp edges are how > images will be drawn with translations, they obviously made this the > accelerated path. > > Even if you don't want to use geometry, you could also intersect the > clip with a quad (using the current clip implementation) and then draw > all images with PAD. > > The problem is that you have to get rid of the current, and useless, > definition of NONE. It must be changed to make a "sharp" edge. This will > not change the output when the image scale is 1 or less, and I believe > the change when the image scale is > 1 will produce the results desired > by most users (ie they expect that drawing a 10x10 image will cover the > same pixels as a 10x10 rectangle). > > Soeren Sandmann wrote: > > Bill Spitzak writes: > >> This is in fact what users expect, and also what other graphics > >> libraries are doing (which explains why the hardware is supporting PAD > >> directly). > > > > The reason Render's NONE is difficult to accelerate in hardware is > > that for formats without alpha (such as x8r8g8b8), Render wants the > > border to be (0, 0, 0, 0), whereas GL will give you (0xFF, 0, 0, 0). > > > > I thought I heard at one point that Direct3D can do (0, 0, 0, 0), but > > maybe not. > > > > > > Soren signature.asc Description: This is a digitally signed message part. ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] Don't discriminate PAD and REFLECT repeat in standard fast paths
On Wednesday 22 September 2010 17:56:20 Soeren Sandmann wrote: > Siarhei Siamashka writes: > > But right now setting PAD repeat for everying will cause serious > > performance problems for pixman, because it will stop using simple > > non-transformed fast paths. > > > > This patch tries to address the problem. It passes current pixman tests > > and I think that it is most likely fine. But I'm still not totally sure > > if it is really safe in all possible cases. > > I can't think of a case where it would be broken, but I don't think > it's really *right* either. The COVERS_CLIP flag is supposed to > indicate that the image doesn't have any undefined parts, which is > certainly the case when you have PAD or REFLECT set. I don't think the patch is more broken than the current code. Both NONE and PAD repeats are very similar, as they both define the values of pixels outside image boundaries according to some predefined rules. Both can't be used with simple non-transformed fast paths if the operation involves accessing any pixels outside image boundaries. So in my opinion treating them equally in this part of code is the right thing. Even though this treatment itself is not perfect and can be improved as you explained below. Would it be a bad idea to apply this patch now, and then proceed with a larger scale refactoring next? > The fundamental problem is that the PIXMAN_STD_FAST_PATH macro is > mixing up several things. There are two kinds of images involved in > the standard fast paths: solids and bits. The problem is that the > required flags for those images are quite different. > > A solid image has basically no requirements on the flags other than > the usual NO_ALPHAMAP, NO_CONVOLUTION etc. Generally, as long as it's > a solid color, the fast path will work. > > The bits images have basically the requirement that we now call > SAMPLES_COVER_CLIP. That allows the biltters to simply walk the image > without worrying about reading out of bounds. > > Then there is the 'simple repeat' stuff, which is an optimization that > notices that in some cases you can do a repeat operation by repeating > the compositing instead of sampling with wrap-around addressing. This > code has survived since the dawn of time, when pixman was called fb, > didn't have transformations, and NORMAL was the only repeat mode > available. Back then this code was the way to do repeating. > > On IRC you pointed out that for 1xn repeated images, this simple > repeat stuff is not an optimization at all because it results in > terrible memory access patterns. It has historically had other > pathological cases, such as one it would end up calling a fast path a > zillion times with a 1x1 rectangle. > > In fact, it was never written as an optimization, it was only left in > because it was assumed to be one. I don't think anyone has ever > measured the impact compared to the general path. > > Considering that it is really at the wrong level in the pipeline too > (repeating conceptually happens during sampling, not after > compositing), maybe it's time to retire this code. There are cases > where it is probably still useful, but I'd rather avoid the headache > of thinking about special cases where it can't or shouldn't be used. > > So, basically, what I think could be done is: > > - Change the PIXMAN_STD_FAST_PATH to put in different flags when an > image is solid vs. when it is bits. The flags in the solid case > would be just the usual ones, in the bits case, it would be the > usual ones plus SAMPLES_COVER_CLIP. > > - Delete the COVERS_CLIP flag, since it would not be used in either > case and isn't used elsewhere. > > - Delete the simple_repeat stuff (along with the SIMPLE_REPEAT flags) > > If there are specific benchmarks where the simple_repeat stuff made a > noticable difference, then it's probably better to just do a regular > fast path for them. A scanline approach would likely be faster than a > tiled approach anyway in those cases. All of this looks like a good plan to me. -- Best regards, Siarhei Siamashka signature.asc Description: This is a digitally signed message part. ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] Don't discriminate PAD and REFLECT repeat in standard fast paths
What I do in OpenGL is draw a rectangle (well two triangles) textured with the image with PAD repeat. Outside the rectangle is 0,0,0,0. This produces the sharp edge I wanted and allows full acceleration to be used. Because OpenGL and DirectX assume that such sharp edges are how images will be drawn with translations, they obviously made this the accelerated path. Even if you don't want to use geometry, you could also intersect the clip with a quad (using the current clip implementation) and then draw all images with PAD. The problem is that you have to get rid of the current, and useless, definition of NONE. It must be changed to make a "sharp" edge. This will not change the output when the image scale is 1 or less, and I believe the change when the image scale is > 1 will produce the results desired by most users (ie they expect that drawing a 10x10 image will cover the same pixels as a 10x10 rectangle). Soeren Sandmann wrote: Bill Spitzak writes: This is in fact what users expect, and also what other graphics libraries are doing (which explains why the hardware is supporting PAD directly). The reason Render's NONE is difficult to accelerate in hardware is that for formats without alpha (such as x8r8g8b8), Render wants the border to be (0, 0, 0, 0), whereas GL will give you (0xFF, 0, 0, 0). I thought I heard at one point that Direct3D can do (0, 0, 0, 0), but maybe not. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] Don't discriminate PAD and REFLECT repeat in standard fast paths
Bill Spitzak writes: > This is in fact what users expect, and also what other graphics > libraries are doing (which explains why the hardware is supporting PAD > directly). The reason Render's NONE is difficult to accelerate in hardware is that for formats without alpha (such as x8r8g8b8), Render wants the border to be (0, 0, 0, 0), whereas GL will give you (0xFF, 0, 0, 0). I thought I heard at one point that Direct3D can do (0, 0, 0, 0), but maybe not. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] Don't discriminate PAD and REFLECT repeat in standard fast paths
Siarhei Siamashka writes: > From: Siarhei Siamashka > > Without this fix, setting PAD repeat on a source image prevents > the use of any nonscaled standard fast paths, affecting performance > a lot. But as long as no pixels outside the source image boundaries > are touched by the compositing operation, all the repeat types > behave the same and can take the same fast paths. > > This is important because setting PAD repeat instead of NONE is > more hardware acceleration friendly (for the drivers implementing > RENDER extension) and does not inhibit OVER->SRC operator > optimization in pixman. > --- > > We discussed this problem and the related code with Soeren on IRC the > other day, and it may be that the code dealing with FAST_PATH_COVERS_CLIP > and FAST_PATH_SAMPLES_COVER_CLIP would better to be changed a bit. Along > with the selection of fast path functions dealing with SOLID sources or > masks (for example, any transformation set on a SOLID source image does > not actually make any difference for end result, so it can be ignored). > And there is a special optimization for NORMAL repeat in > walk_region_internal() > function which is also somehow related. Well, I guess this all stuff > carried us away from the original problem :) > > But now returning back to this issue. Ideally, PAD repeat would be a better > default instead of NONE. Just because NONE repeat is a bit strange and > hard to accelerate in hardware. This is one example from freedesktop.org > bugzilla (and there are more similar reports): > https://bugs.freedesktop.org/show_bug.cgi?id=27954 > > But right now setting PAD repeat for everying will cause serious performance > problems for pixman, because it will stop using simple non-transformed fast > paths. > > This patch tries to address the problem. It passes current pixman tests and > I think that it is most likely fine. But I'm still not totally sure if it is > really safe in all possible cases. I can't think of a case where it would be broken, but I don't think it's really *right* either. The COVERS_CLIP flag is supposed to indicate that the image doesn't have any undefined parts, which is certainly the case when you have PAD or REFLECT set. The fundamental problem is that the PIXMAN_STD_FAST_PATH macro is mixing up several things. There are two kinds of images involved in the standard fast paths: solids and bits. The problem is that the required flags for those images are quite different. A solid image has basically no requirements on the flags other than the usual NO_ALPHAMAP, NO_CONVOLUTION etc. Generally, as long as it's a solid color, the fast path will work. The bits images have basically the requirement that we now call SAMPLES_COVER_CLIP. That allows the biltters to simply walk the image without worrying about reading out of bounds. Then there is the 'simple repeat' stuff, which is an optimization that notices that in some cases you can do a repeat operation by repeating the compositing instead of sampling with wrap-around addressing. This code has survived since the dawn of time, when pixman was called fb, didn't have transformations, and NORMAL was the only repeat mode available. Back then this code was the way to do repeating. On IRC you pointed out that for 1xn repeated images, this simple repeat stuff is not an optimization at all because it results in terrible memory access patterns. It has historically had other pathological cases, such as one it would end up calling a fast path a zillion times with a 1x1 rectangle. In fact, it was never written as an optimization, it was only left in because it was assumed to be one. I don't think anyone has ever measured the impact compared to the general path. Considering that it is really at the wrong level in the pipeline too (repeating conceptually happens during sampling, not after compositing), maybe it's time to retire this code. There are cases where it is probably still useful, but I'd rather avoid the headache of thinking about special cases where it can't or shouldn't be used. So, basically, what I think could be done is: - Change the PIXMAN_STD_FAST_PATH to put in different flags when an image is solid vs. when it is bits. The flags in the solid case would be just the usual ones, in the bits case, it would be the usual ones plus SAMPLES_COVER_CLIP. - Delete the COVERS_CLIP flag, since it would not be used in either case and isn't used elsewhere. - Delete the simple_repeat stuff (along with the SIMPLE_REPEAT flags) If there are specific benchmarks where the simple_repeat stuff made a noticable difference, then it's probably better to just do a regular fast path for them. A scanline approach would likely be faster than a tiled approach anyway in those cases. Soren ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman
Re: [Pixman] [PATCH] Don't discriminate PAD and REFLECT repeat in standard fast paths
On Sep 21, 2010, at 9:04 PM, Siarhei Siamashka wrote: But now returning back to this issue. Ideally, PAD repeat would be a better default instead of NONE. Just because NONE repeat is a bit strange and hard to accelerate in hardware. This is one example from freedesktop.org bugzilla (and there are more similar reports): https://bugs.freedesktop.org/show_bug.cgi?id=27954 Rather than PAD being the default, I think NONE should be changed to the equivalent of: - intersecting the current clip with a quadrilateral that is equal to the bounds of the source surface transformed to the destination - draw the image using PAD REPEAT - restore the clip to as before. The result is the black edges of the image are sharp antialiased lines no matter what the transform. This is in fact what users expect, and also what other graphics libraries are doing (which explains why the hardware is supporting PAD directly). ___ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman