On Sat, 24 Nov 2012 04:57:22 +0100 sandm...@cs.au.dk (Søren Sandmann) wrote:
> Hi, > > Reviewing the supersampling patch here: > > http://cgit.freedesktop.org/~ajohnson/pixman/log/?h=supersampling > > I wasn't happy with either the performance and image quality, and I > realized that the whole supersampling approach just isn't going to > fly. Since I told people to do it that way, I apologize for that. The > approach advocated by Bill Spitzak in the various downsampling threads > of computing a convolution kernel up front, is the much better way to > go. To make up for being misleading, the following patches implement > comprehensive support for high-quality image scaling filters. > > Pixman already has a convolution filter, but since it only allows one > sample per pixel of the filter, it is limited in the quality that it can > support, so the following patches (to be applied on top of the three > rounding patches) add a new filter type > > PIXMAN_FILTER_SEPARABLE_CONVOLUTION > > that supports multiple different convolution matrices that are chosen > between based on the subpixel source location. The matrices are > specified as tensor products of x/y vectors, which makes them > separable by definition. I like this approach. This is a clean design, which look simple enough to optimize for better performance with SIMD. > -=- Further work and examples > > There is some additional work that could be done: > > - Performance improvements. Low-hanging fruit includes adding new fast > path iterators that assume the source is a8r8g8b8 or r5g6b5. Higher > hanging fruit is SIMD optimziations and implementations that take > advantage of separability. It may also be interesting to speed up > pixman_filter_create_separable_convolution() by tabularizing some of > the trigonometric functions etc. From what I see, the separable convolution filter shares a lot of similarities with the existing pixman SIMD code for bilinear scaling, which could be extended with relatively little effort. Bilinear scaling uses weighted average of 2 pixels (in one direction), with weights calculated on the go. Separable convolution uses weighted average of N pixels, with weights obtained by table lookups. Both use subpixel positions (7 phase bits or 128 phases for current bilinear implementation) to lookup or calculate weights. Bilinear filter is naturally a subset of separable convolution. The biggest challenge for optimized bilinear scaling (compared to nearest) had been properly implementing different types of repeat on image edges due to sampling of some pixels outside of the image boundaries. But "many" pixels from the separable convolution filter is not much different from just "two" from the (bi)linear filter in this respect, so this should already work fine also for separable convolution with just minor tweaks. Regarding SIMD optimized iterators (for both bilinear and separable convolution), they are rather simple to implement as well. Just have a look at my old OpenMP proof of concept patch: http://lists.freedesktop.org/archives/pixman/2012-June/002071.html The OMP_BILINEAR_PARALLEL_FOR define lists all the variables which fully represent the state of the iterator, needed to walk over the source image and scale it (only the loop counter 'i' is different for each scanline). This state can be calculated when creating an iterator, and then we can simply pull one scaled scanline at a time. The fact that the current bilinear code is not separable is a minor implementation detail, we could as well have 2x1 sized filter instead of 2x2 and then do vertical scaling separately with the help of a temporary buffer in L1 cache like you tried before in the "separable-bilinear" branch: http://lists.freedesktop.org/archives/pixman/2012-June/002140.html And I tried a similar "two-passes in L1 cache" approach for YUV->RGB scaling code prototype much earlier, with reasonably good results: https://bugzilla.mozilla.org/show_bug.cgi?id=634557#c53 So this is expected to provide good performance. Affine transformations (still to be implemented with SIMD) are a bit different and iterators are not a very good choice for them due to less than perfect memory access pattern. -- Best regards, Siarhei Siamashka _______________________________________________ Pixman mailing list Pixman@lists.freedesktop.org http://lists.freedesktop.org/mailman/listinfo/pixman